免責聲明
本文中使用的股票數據完全是虛構的。僅供演示目的。請不要使用該數據做出任何金融決策。
在之前的文章中,我們看到了在本地使用Ollama進行RAG應用的好處。在本文中,我們將通過使用LangChain的SQLDatabaseToolkit
對自然語言(NL)查詢與數據庫系統進行測試,來擴展我們對Ollama的評估。SQL將作為基準系統,我們將探索OpenAI和Ollama提供的結果質量。
本文使用的筆記本文件可以在GitHub上找到。
介紹
LangChain的SQLDatabaseToolkit
是一個強大的工具,旨在將NL處理能力與關聯數據庫系統集成。它使用戶能夠使用NL輸入查詢數據庫,利用大型語言模型(LLMs)的功能動態生成SQL查詢。這使其尤其適用於非技術用戶或自動系統需要與結構化數據互動的應用。
LangChain支持多種LLM。LangChain還支持Ollama。在本文中,我們將評估LangChain與Ollama集成的情況以及在本地環境中使用SQLDatabaseToolkit
的可行性。
創建SingleStore Cloud帳戶
一篇先前的文章展示了創建免費的SingleStore Cloud帳戶的步驟。我們將使用免費的共享層。
選擇入門工作區 > 連線 > CLI客戶端將為我們提供後續所需的詳細信息,如用戶名
、密碼
、主機
、端口
和資料庫
。
創建數據庫表
對於我們的測試環境,我們將使用在雲中運行的SingleStore作為目標數據庫系統,並通過在本地系統運行的Jupyter筆記本安全連接到這個環境。
從SingleStore雲閃電面板的左側導航窗格中,我們將選擇開發 > 數據工作室 > 打開SQL編輯器。我們將創建三個表,如下:
CREATE TABLE IF NOT EXISTS tick (
symbol VARCHAR(10),
ts DATETIME SERIES TIMESTAMP,
open NUMERIC(18, 2),
high NUMERIC(18, 2),
low NUMERIC(18, 2),
price NUMERIC(18, 2),
volume INT,
KEY(ts)
);
CREATE TABLE IF NOT EXISTS portfolio (
symbol VARCHAR(10),
shares_held INT,
purchase_date DATE,
purchase_price NUMERIC(18, 2)
);
CREATE TABLE IF NOT EXISTS stock_sentiment (
headline VARCHAR(250),
positive FLOAT,
negative FLOAT,
neutral FLOAT,
url TEXT,
publisher VARCHAR(30),
ts DATETIME,
symbol VARCHAR(10)
);
我們將使用以下虛構數據將投資組合
表載入:
INSERT INTO portfolio (symbol, shares_held, purchase_date, purchase_price) VALUES
('AAPL', 100, '2022-01-15', 150.25),
('MSFT', 50, '2021-12-10', 305.50),
('GOOGL', 25, '2021-11-05', 2800.75),
('AMZN', 10, '2020-07-20', 3200.00),
('TSLA', 40, '2022-02-18', 900.60),
('NFLX', 15, '2021-09-01', 550.00);
對於股票情緒
表,我們將下載stock_sentiment.sql.zip文件並解壓縮。我們將使用MySQL客戶端將數據加載到表中,如下:
mysql -u "<username>" -p"<password>" -h "<host>" -P <port> -D <database> < stock_sentiment.sql
我們將使用之前保存的<用戶名>
、<密碼>
、<主機>
、<端口>
和<資料庫>
的值。
最後,對於tick
表,我們將創建一個流水線:
CREATE PIPELINE tick
AS LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/stockticker'
BATCH_INTERVAL 45000
INTO TABLE tick
FIELDS TERMINATED BY ','
(symbol,ts,open,high,low,price,volume);
我們將調整以獲取最早的數據:
ALTER PIPELINE tick SET OFFSETS EARLIEST;
並測試管道:
TEST PIPELINE tick LIMIT 1;
示例輸出:
+--------+---------------------+--------+--------+--------+--------+--------+
| symbol | ts | open | high | low | price | volume |
+--------+---------------------+--------+--------+--------+--------+--------+
| MMM | 2025-01-23 21:40:32 | 178.34 | 178.43 | 178.17 | 178.24 | 38299 |
+--------+---------------------+--------+--------+--------+--------+--------+
然後我們將啟動管道:
START PIPELINE tick;
幾分鐘後,我們將檢查到目前為止加載的數據量:
SELECT COUNT(*)
FROM tick;
本地測試環境
從一篇之前的文章中,我們將按照相同的步驟來設置我們的本地測試環境,如這些部分所述:
- 介紹。使用虛擬機器或
venv
。 - 創建 SingleStore Cloud 帳戶。 此步驟已在上面完成。
- 創建數據庫。 免費共享層已提供數據庫,我們只需記下數據庫名稱。
- 安裝 Jupyter。
純文本
pip install notebook
- 安裝Ollama。
純文字
curl -fsSL https://ollama.com/install.sh | sh
- 環境變數.
純文字:
@ : / “” data-lang=”text/plain”> export SINGLESTOREDB_URL="<使用者名稱>:<密碼>@<主機>:<埠號>/<資料庫>"
將
<使用者名稱>
、<密碼>
、<主機>
、<埠號>
和<資料庫>
替換為您環境中的值。純文字“” data-lang=”text/plain”>export OPENAI_API_KEY="<OpenAI API 金鑰>"
將
<OpenAI API 金鑰>
替換為您的金鑰。 - 啟動 Jupyter。
普通文本
jupyter notebook
我們將使用來自GitHub的 Jupyter 筆記本。這些筆記本已配置為使用 OpenAI 和 Ollama。對於 Ollama,我們將使用工具支持中列出的 LLMs 之一。我們將測試以下四個查詢。
第一個查詢
SQL
SELECT symbol, (MAX(high) - MIN(low)) AS volatility
FROM tick
GROUP BY symbol
ORDER BY volatility ASC
LIMIT 1;
自然語言
"For each stock symbol, calculate the volatility as the difference\n"
"between the highest recorded price and the lowest recorded price over time.\n"
"Which stock symbol has the least volatility?"
結果
SQL
+--------+------------+
| symbol | volatility |
+--------+------------+
| FTR | 0.55 |
+--------+------------+
OpenAI
The stock symbol with the least volatility is FTR, with a volatility of 0.55.
Ollama
To find the stock with the highest price, we need to compare the prices of all the given stocks and find the maximum value. However, I don't have real-time access to financial data or the ability to execute code that interacts with external APIs. Therefore, I can't directly calculate the highest price from this list.
However, if you provide me with a specific date or time range, I can help you find the stock with the highest price during that period. For example, if you want to know the stock with the highest price on a particular day in the past, I can assist you with that.
If you're looking for the current highest-priced stock, I recommend checking a financial news website or an API that provides real-time stock data, such as Yahoo Finance, Google Finance, or a service like Alpha Vantage or Finnhub.io. These platforms can give you the most up-to-date information on stock prices.`
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE
第二個查詢
SQL
SELECT COUNT(*)
FROM tick;
自然語言
"How many rows are in the tick table?"
結果
SQL
+----------+
| COUNT(*) |
+----------+
| 22367162 |
+----------+
OpenAI
There are 22,367,162 rows in the tick table.
Ollama
The "tick" table has 3 rows.
第三個查詢
SQL
-- 步驟 1: 獲取每個符號的最新價格
WITH latest_prices AS (
SELECT symbol, price
FROM tick t1
WHERE ts = (
SELECT MAX(ts)
FROM tick t2
WHERE t2.symbol = t1.symbol
)
)
-- 步驟 2: 計算總投資組合價值
SELECT SUM(p.shares_held * lp.price) AS total_portfolio_value
FROM portfolio p, latest_prices lp
WHERE p.symbol = lp.symbol;
自然語言
"Taking all the stock symbols from the portfolio table,\n"
"and using the latest value for each stock symbol from the tick table,\n"
"calculate the grand total value of all the shares listed in the portfolio table."
結果
SQL
+-----------------------+
| total_portfolio_value |
+-----------------------+
| 44540.60 |
+-----------------------+
OpenAI
$44,540.60
Ollama
I don't know. The provided SQL query does not contain any of the common mistakes listed. It uses the correct join conditions, functions, and data types. The query also properly quotes identifiers and uses the correct number of arguments for functions. Therefore, no changes are needed. However, without access to the actual database schema and table data, I cannot provide a specific answer to the question.
第四個查詢
SQL
SELECT
(
SELECT ts
FROM stock_sentiment
WHERE symbol = 'AAPL'
ORDER BY positive DESC
LIMIT 1
) AS sentiment_date,
(
SELECT positive
FROM stock_sentiment
WHERE symbol = 'AAPL'
ORDER BY positive DESC
LIMIT 1
) AS most_positive_sentiment,
(
SELECT price
FROM tick
WHERE symbol = 'AAPL'
ORDER BY ts DESC
LIMIT 1
) AS current_best_price;
"Using the symbol AAPL, show me the date and the most\n"
"positive sentiment in the stock sentiment table and the\n"
"current best price for this symbol from the tick table."
結果
SQL
+---------------------+-------------------------+--------------------+
| sentiment_date | most_positive_sentiment | current_best_price |
+---------------------+-------------------------+--------------------+
| 2020-05-28 00:00:00 | 0.331509 | 116.27 |
+---------------------+-------------------------+--------------------+
OpenAI
On 2020-05-28, the most positive sentiment for AAPL was 0.331509, and the current best price for AAPL is 116.27.
Ollama
The provided list contains decimal numbers, which appear to be the results of some kind of experiment or analysis. Without additional context, it's difficult to determine the exact nature of these results. However, we can observe that the majority of the numbers are between 116.85 and 117.27, with a few outliers at 115.99 and 117.30. The smallest number in the list is 115.99, and the largest is 117.30.`
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE
摘要
分析結果,我們看到 SQL 和 OpenAI 在所有四個查詢中產生一致的輸出。然而,Ollama 表現出明顯的問題。一個在 GitHub 上的 討論串 指出,雖然 LLM 模型應該支持工具調用,但這一功能在 Ollama 中並不是原生可用的。
如果您能在其中一個支持的 LLM 中使這個 LangChain 功能與 Ollama 一起運作,請發送消息給我,我會更新文章並感謝您的幫助。
Source:
https://dzone.com/articles/openai-vs-ollama-langchain-sqldatabasetoolkit