OpenAI vs. Ollama Using LangChain's SQLDatabaseToolkit

Disclaimer

The stock data used in this article is entirely fictitious. It is used for demonstration purposes only. Please do not use this data to make any financial decisions.

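In a previous article, we looked at the benefits of using Ollama locally for a RAG application. In this article, we'll extend our evaluation of Ollama by testing natural language (NL) queries against a database system using LangChain's SQLDatabaseToolkit. We'll compare the quality of the results from OpenAI and Ollama, using the output of hand-written SQL as the baseline.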

The notebook file used in this article is available on GitHub.

Introduction

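LangChain's SQLDatabaseToolkit is a tool designed to integrate NL processing with relational database systems. It lets users query a database with NL input, using a large language model (LLM) to generate the SQL queries dynamically. This is especially useful for applications where non-technical users or automated systems need to interact with structured data.

LangChain supports a wide range of LLMs, including models served through Ollama. In this article, we'll evaluate how well LangChain integrates with Ollama and whether SQLDatabaseToolkit can be used in a local setup.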

Create a SingleStore Cloud Account

A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Free Shared Tier.

Selecting Starter Workspace > Connect > CLI Client gives us the details we'll need later: username, password, host, port, and database.

Create Database Tables

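For our test environment, we'll use SingleStore running in the cloud as the target database system, and we'll connect to it securely from Jupyter notebooks running on a local system.

From the left navigation pane in the SingleStore cloud portal, we'll select Develop > Data Studio > Open SQL Editor. We'll create three tables, as follows: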

SQL

CREATE TABLE IF NOT EXISTS tick (
    symbol VARCHAR(10),
    ts     DATETIME SERIES TIMESTAMP,
    open   NUMERIC(18, 2),
    high   NUMERIC(18, 2),
    low    NUMERIC(18, 2),
    price  NUMERIC(18, 2),
    volume INT,
    KEY(ts)
);

CREATE TABLE IF NOT EXISTS portfolio (
    symbol         VARCHAR(10),
    shares_held    INT,
    purchase_date  DATE,
    purchase_price NUMERIC(18, 2)
);

CREATE TABLE IF NOT EXISTS stock_sentiment (
    headline  VARCHAR(250),
    positive  FLOAT,
    negative  FLOAT,
    neutral   FLOAT,
    url       TEXT,
    publisher VARCHAR(30),
    ts        DATETIME,
    symbol    VARCHAR(10)
);

We'll load the following fictitious data into the portfolio table:

SQL

INSERT INTO portfolio (symbol, shares_held, purchase_date, purchase_price) VALUES
('AAPL', 100, '2022-01-15',  150.25),
('MSFT',  50, '2021-12-10',  305.50),
('GOOGL', 25, '2021-11-05', 2800.75),
('AMZN',  10, '2020-07-20', 3200.00),
('TSLA',  40, '2022-02-18',  900.60),
('NFLX',  15, '2021-09-01',  550.00);

For the stock_sentiment table, we'll download the stock_sentiment.sql.zip file, unpack it, and load the data into the table using a MySQL client:

Shell

mysql -u "<username>" -p"<password>" -h "<host>" -P <port> -D <database> < stock_sentiment.sql

We'll use the values for <username>, <password>, <host>, <port>, and <database> that we saved earlier.

Finally, for the tick table, we'll create a pipeline:

SQL

CREATE PIPELINE tick
AS LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/stockticker'
BATCH_INTERVAL 45000
INTO TABLE tick
FIELDS TERMINATED BY ','
(symbol,ts,open,high,low,price,volume);

We'll adjust the pipeline so that it starts from the earliest available data:

SQL

ALTER PIPELINE tick SET OFFSETS EARLIEST;

And we'll test the pipeline:

SQL

TEST PIPELINE tick LIMIT 1;

Example output:

Plain Text

+--------+---------------------+--------+--------+--------+--------+--------+
| symbol | ts                  | open   | high   | low    | price  | volume |
+--------+---------------------+--------+--------+--------+--------+--------+
| MMM    | 2025-01-23 21:40:32 | 178.34 | 178.43 | 178.17 | 178.24 |  38299 |
+--------+---------------------+--------+--------+--------+--------+--------+

Then we'll start the pipeline:

SQL

START PIPELINE tick;

After a few minutes, we'll check how much data has been loaded so far:

SQL

SELECT COUNT(*)
FROM tick;

Local Test Environment

We'll follow the same steps as in a previous article to set up our local test environment, as described in these sections:

Introduction. Use a virtual machine or venv.
Create a SingleStore Cloud account. This step was completed above.
Create a database. The Free Shared Tier already provides a database, so we only need to note down the database name.
Install Jupyter.

Shell

pip install notebook

Install Ollama.

Shell

curl -fsSL https://ollama.com/install.sh | sh

Environment variables.

Shell

export SINGLESTOREDB_URL="<username>:<password>@<host>:<port>/<database>"

Replace <username>, <password>, <host>, <port>, and <database> with the values for your environment.

Shell

export OPENAI_API_KEY="<OpenAI API Key>"

Replace <OpenAI API Key> with your key.
Start Jupyter.

Shell

jupyter notebook
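
With the environment variables in place, the wiring between LangChain and the database can be sketched roughly as below. This is not the code from the GitHub notebook, only a minimal outline under several assumptions: the `singlestoredb://` SQLAlchemy dialect prefix, the `langchain-community`, `langchain-openai`, and `langchain-ollama` packages being installed, and the helper names `sqlalchemy_uri` and `build_agent`, which are hypothetical. The article does not say which models were used, so the model name is left as a parameter.

```python
import os


def sqlalchemy_uri(url: str) -> str:
    """Turn '<username>:<password>@<host>:<port>/<database>' into a SQLAlchemy URI."""
    # Assumption: the SingleStore SQLAlchemy dialect is installed and
    # registered under the name 'singlestoredb'.
    return "singlestoredb://" + url


def build_agent(provider: str, model: str):
    """Create a SQL agent backed by OpenAI or by a local Ollama model."""
    # LangChain is imported lazily so that sqlalchemy_uri() above can be
    # used on a machine where these packages are not installed.
    from langchain_community.agent_toolkits import SQLDatabaseToolkit, create_sql_agent
    from langchain_community.utilities import SQLDatabase

    if provider == "openai":
        from langchain_openai import ChatOpenAI  # reads OPENAI_API_KEY
        llm = ChatOpenAI(model=model, temperature=0)
    elif provider == "ollama":
        from langchain_ollama import ChatOllama  # talks to the local Ollama server
        llm = ChatOllama(model=model, temperature=0)
    else:
        raise ValueError(f"unknown provider: {provider!r}")

    # Restrict the agent to the three tables created earlier.
    db = SQLDatabase.from_uri(
        sqlalchemy_uri(os.environ["SINGLESTOREDB_URL"]),
        include_tables=["tick", "portfolio", "stock_sentiment"],
    )
    toolkit = SQLDatabaseToolkit(db=db, llm=llm)
    return create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)


# Example use (the model name is a placeholder, not the one from the article):
#   agent = build_agent("ollama", "<model with tool support>")
#   answer = agent.invoke({"input": "How many rows are in the tick table?"})
#   print(answer["output"])
```

Keeping the provider as a parameter means the same question can be sent to both back ends without changing anything else, which is the comparison carried out in the rest of the article.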

We'll use the Jupyter notebook from GitHub. The notebook is configured to use both OpenAI and Ollama. For Ollama, we'll use one of the LLMs listed with Tools support. We'll test the following four queries.

First Query

SQL

SELECT symbol, (MAX(high) - MIN(low)) AS volatility
FROM tick
GROUP BY symbol
ORDER BY volatility ASC
LIMIT 1;
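
To make the baseline concrete, here is a small sketch that runs the same SQL against an in-memory SQLite table. SQLite only stands in for SingleStore here, and the symbols and prices are made up for illustration; they are not the article's data.

```python
import sqlite3

# Hypothetical sample rows: (symbol, high, low).
ROWS = [
    ("AAA", 11.0, 10.0),
    ("AAA", 12.0, 10.5),
    ("BBB", 20.5, 20.0),
    ("BBB", 20.75, 20.25),
]

QUERY = """
SELECT symbol, (MAX(high) - MIN(low)) AS volatility
FROM tick
GROUP BY symbol
ORDER BY volatility ASC
LIMIT 1;
"""


def least_volatile(rows):
    """Load the rows into an in-memory table and run the baseline query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tick (symbol TEXT, high REAL, low REAL)")
        conn.executemany("INSERT INTO tick VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


# AAA spans 10.0 to 12.0 (2.0); BBB spans 20.0 to 20.75 (0.75).
result = least_volatile(ROWS)
print(result)
```

Note that the query measures the spread between the highest high and the lowest low over the whole table, so a symbol's volatility can only grow as the pipeline loads more rows.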

Natural Language

Plain Text

"For each stock symbol, calculate the volatility as the difference\n"
"between the highest recorded price and the lowest recorded price over time.\n"
"Which stock symbol has the least volatility?"

Results

SQL

Plain Text

+--------+------------+
| symbol | volatility |
+--------+------------+
| FTR    |       0.55 |
+--------+------------+

OpenAI

Plain Text

The stock symbol with the least volatility is FTR, with a volatility of 0.55.

Ollama

Plain Text

To find the stock with the highest price, we need to compare the prices of all the given stocks and find the maximum value. However, I don't have real-time access to financial data or the ability to execute code that interacts with external APIs. Therefore, I can't directly calculate the highest price from this list.

However, if you provide me with a specific date or time range, I can help you find the stock with the highest price during that period. For example, if you want to know the stock with the highest price on a particular day in the past, I can assist you with that.

If you're looking for the current highest-priced stock, I recommend checking a financial news website or an API that provides real-time stock data, such as Yahoo Finance, Google Finance, or a service like Alpha Vantage or Finnhub.io. These platforms can give you the most up-to-date information on stock prices.`
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE

Second Query

SQL

SELECT COUNT(*)
FROM tick;

Natural Language

Plain Text

"How many rows are in the tick table?"

Results

SQL

Plain Text

+----------+
| COUNT(*) |
+----------+
| 22367162 |
+----------+

OpenAI

Plain Text

There are 22,367,162 rows in the tick table.

Ollama

Plain Text

The "tick" table has 3 rows.
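
The comparisons in this article are made by reading each answer against the SQL result. As a rough sketch of how that check could be automated, the helper below pulls the numbers out of a free-text answer and looks for the baseline value among them. The function names and the tolerance are assumptions for illustration and are not part of the notebook.

```python
import re

# Matches digits with optional thousands separators and an optional fraction.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_numbers(text: str) -> list[float]:
    """Return every number found in the text, with thousands separators removed."""
    return [float(match.replace(",", "")) for match in NUMBER.findall(text)]


def answer_matches(answer: str, expected: float, tol: float = 1e-6) -> bool:
    """True if any number in the answer is within tol of the SQL baseline."""
    return any(abs(value - expected) <= tol for value in extract_numbers(answer))


openai_answer = "There are 22,367,162 rows in the tick table."
ollama_answer = 'The "tick" table has 3 rows.'

print(answer_matches(openai_answer, 22367162))
print(answer_matches(ollama_answer, 22367162))
```

A check like this only works for answers whose correctness comes down to a single number. It would say nothing useful about an answer such as Ollama's response to the first query, which never attempts the calculation.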

Third Query

SQL

-- Step 1: Get the latest price for each symbol
WITH latest_prices AS (
    SELECT symbol, price
    FROM tick t1
    WHERE ts = (
        SELECT MAX(ts)
        FROM tick t2
        WHERE t2.symbol = t1.symbol
    )
)

-- Step 2: Calculate the total portfolio value
SELECT SUM(p.shares_held * lp.price) AS total_portfolio_value
FROM portfolio p, latest_prices lp
WHERE p.symbol = lp.symbol;
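
The two steps can be traced on a tiny example. The sketch below runs the same query shape against in-memory SQLite, again as a stand-in for SingleStore, with invented symbols, timestamps, prices, and holdings.

```python
import sqlite3

# Hypothetical ticks: (symbol, ts, price). The later row per symbol is the latest price.
TICKS = [
    ("AAA", "2025-01-23 21:40:00", 10.0),
    ("AAA", "2025-01-23 21:41:00", 12.0),
    ("BBB", "2025-01-23 21:40:00", 5.0),
    ("BBB", "2025-01-23 21:42:00", 4.0),
]
# Hypothetical holdings: (symbol, shares_held).
HOLDINGS = [("AAA", 10), ("BBB", 5)]

QUERY = """
WITH latest_prices AS (
    SELECT symbol, price
    FROM tick t1
    WHERE ts = (SELECT MAX(ts) FROM tick t2 WHERE t2.symbol = t1.symbol)
)
SELECT SUM(p.shares_held * lp.price) AS total_portfolio_value
FROM portfolio p, latest_prices lp
WHERE p.symbol = lp.symbol;
"""


def portfolio_value(ticks, holdings):
    """Value each holding at the most recent tick price and sum the results."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tick (symbol TEXT, ts TEXT, price REAL)")
        conn.execute("CREATE TABLE portfolio (symbol TEXT, shares_held INTEGER)")
        conn.executemany("INSERT INTO tick VALUES (?, ?, ?)", ticks)
        conn.executemany("INSERT INTO portfolio VALUES (?, ?)", holdings)
        return conn.execute(QUERY).fetchone()[0]
    finally:
        conn.close()


# Latest prices are AAA = 12.0 and BBB = 4.0, so 10 * 12.0 + 5 * 4.0 = 140.0.
total = portfolio_value(TICKS, HOLDINGS)
print(total)
```

Any portfolio symbol with no rows in the tick table simply drops out of the join, so it contributes nothing to the total.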

Natural Language

Plain Text

"Taking all the stock symbols from the portfolio table,\n"
"and using the latest value for each stock symbol from the tick table,\n"
"calculate the grand total value of all the shares listed in the portfolio table."

Results

SQL

Plain Text

+-----------------------+
| total_portfolio_value |
+-----------------------+
|              44540.60 |
+-----------------------+

OpenAI

Plain Text

$44,540.60

Ollama

Plain Text

I don't know. The provided SQL query does not contain any of the common mistakes listed. It uses the correct join conditions, functions, and data types. The query also properly quotes identifiers and uses the correct number of arguments for functions. Therefore, no changes are needed. However, without access to the actual database schema and table data, I cannot provide a specific answer to the question.

Fourth Query

SQL

SELECT
    (
        SELECT ts
        FROM stock_sentiment
        WHERE symbol = 'AAPL'
        ORDER BY positive DESC
        LIMIT 1
    ) AS sentiment_date,
    (
        SELECT positive
        FROM stock_sentiment
        WHERE symbol = 'AAPL'
        ORDER BY positive DESC
        LIMIT 1
    ) AS most_positive_sentiment,
    (
        SELECT price
        FROM tick
        WHERE symbol = 'AAPL'
        ORDER BY ts DESC
        LIMIT 1
    ) AS current_best_price;

Natural Language

Plain Text

"Using the symbol AAPL, show me the date and the most\n"
"positive sentiment in the stock sentiment table and the\n"
"current best price for this symbol from the tick table."

Results

SQL

Plain Text

+---------------------+-------------------------+--------------------+
| sentiment_date      | most_positive_sentiment | current_best_price |
+---------------------+-------------------------+--------------------+
| 2020-05-28 00:00:00 |                0.331509 |             116.27 |
+---------------------+-------------------------+--------------------+

OpenAI

Plain Text

On 2020-05-28, the most positive sentiment for AAPL was 0.331509, and the current best price for AAPL is 116.27.

Ollama

Plain Text

The provided list contains decimal numbers, which appear to be the results of some kind of experiment or analysis. Without additional context, it's difficult to determine the exact nature of these results. However, we can observe that the majority of the numbers are between 116.85 and 117.27, with a few outliers at 115.99 and 117.30. The smallest number in the list is 115.99, and the largest is 117.30.`
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE

Summary

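Looking at the results, SQL and OpenAI produce consistent output across all four queries, while Ollama shows clear problems: it either fails to answer or returns an incorrect result each time. A discussion thread on GitHub points out that the LLM must support tool calling, but that this capability is not natively available in Ollama.

If you manage to get this LangChain functionality working with Ollama using one of the supported LLMs, please send me a message, and I'll update the article and acknowledge your help.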

Source:
https://dzone.com/articles/openai-vs-ollama-langchain-sqldatabasetoolkit