OpenAI vs Ollama Using LangChain's SQLDatabaseToolkit

Disclaimer

The stock data used in this article is entirely fictitious. It is purely for demonstration purposes. Please do not use this data to make financial decisions.

In a previous article, we saw the benefits of using Ollama locally for a RAG application. In this article, we'll extend our evaluation of Ollama by testing natural language (NL) queries against a database system, using LangChain's SQLDatabaseToolkit. SQL will serve as the baseline for comparison as we explore the quality of the results provided by OpenAI and Ollama.

The notebook files used in this article are available on GitHub.

Introduction

LangChain's SQLDatabaseToolkit is a powerful tool designed to integrate natural language processing capabilities with relational database systems. It lets users query databases using NL input, relying on large language models (LLMs) to generate SQL queries dynamically. This makes it particularly useful for applications where non-technical users or automated systems need to interact with structured data.

LangChain supports a number of LLMs well, and it also provides support for Ollama. In this article, we'll evaluate how well LangChain integrates with Ollama and how feasible it is to use the SQLDatabaseToolkit in a local setup.

Create a SingleStore Cloud Account

A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Free Shared Tier.

Selecting the Starter Workspace > Connect > CLI Client will give us the details we need later, such as username, password, host, port and database.

Create Database Tables

For our test environment, we'll use SingleStore running in the Cloud as our target database system, and we'll connect securely to this environment using Jupyter notebooks running on a local system.

From the left navigation pane in the SingleStore cloud portal, we'll select DEVELOP > Data Studio > Open SQL Editor. We'll create three tables, as follows:

SQL

CREATE TABLE IF NOT EXISTS tick (
    symbol VARCHAR(10),
    ts     DATETIME SERIES TIMESTAMP,
    open   NUMERIC(18, 2),
    high   NUMERIC(18, 2),
    low    NUMERIC(18, 2),
    price  NUMERIC(18, 2),
    volume INT,
    KEY(ts)
);

CREATE TABLE IF NOT EXISTS portfolio (
    symbol         VARCHAR(10),
    shares_held    INT,
    purchase_date  DATE,
    purchase_price NUMERIC(18, 2)
);

CREATE TABLE IF NOT EXISTS stock_sentiment (
    headline  VARCHAR(250),
    positive  FLOAT,
    negative  FLOAT,
    neutral   FLOAT,
    url       TEXT,
    publisher VARCHAR(30),
    ts        DATETIME,
    symbol    VARCHAR(10)
);

We'll load the portfolio table with the following fictitious data:

SQL

INSERT INTO portfolio (symbol, shares_held, purchase_date, purchase_price) VALUES
('AAPL', 100, '2022-01-15',  150.25),
('MSFT',  50, '2021-12-10',  305.50),
('GOOGL', 25, '2021-11-05', 2800.75),
('AMZN',  10, '2020-07-20', 3200.00),
('TSLA',  40, '2022-02-18',  900.60),
('NFLX',  15, '2021-09-01',  550.00);

For the stock_sentiment table, we'll download the stock_sentiment.sql.zip file and unpack it. We'll load the data into the table using a MySQL client, as follows:

Shell

mysql -u "<username>" -p"<password>" -h "<host>" -P <port> -D <database> < stock_sentiment.sql

We'll use the values for <username>, <password>, <host>, <port> and <database> that we saved earlier.
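The unpack-and-load step can also be scripted. The following is a minimal sketch, assuming the mysql client is on the PATH and that the archive contains a single .sql file; the function names extract_dump, mysql_command and load_dump are hypothetical helpers, not part of the original notebooks.

```python
import subprocess
import zipfile
from pathlib import Path


def extract_dump(zip_path, dest_dir="."):
    """Unpack the archive and return the paths of the extracted files."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return [Path(dest_dir) / name for name in archive.namelist()]


def mysql_command(username, password, host, port, database):
    """Build the same argument list as the shell command shown above."""
    return [
        "mysql",
        "-u", username,
        f"-p{password}",   # no space between -p and the password
        "-h", host,
        "-P", str(port),
        "-D", database,
    ]


def load_dump(sql_path, **connection):
    """Feed the SQL file to the mysql client on standard input."""
    with open(sql_path, "rb") as sql_file:
        subprocess.run(mysql_command(**connection), stdin=sql_file, check=True)


# Example usage (placeholders must be replaced with real values):
# sql_files = extract_dump("stock_sentiment.sql.zip")
# load_dump(sql_files[0], username="<username>", password="<password>",
#           host="<host>", port="<port>", database="<database>")
```

As with the shell command, the password appears in the process arguments, so this is only suitable for a throwaway test environment.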

Finally, for the tick table, we'll create a pipeline:

SQL

CREATE PIPELINE tick
AS LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/stockticker'
BATCH_INTERVAL 45000
INTO TABLE tick
FIELDS TERMINATED BY ','
(symbol,ts,open,high,low,price,volume);

We'll adjust the pipeline so that it starts from the earliest available data:

SQL

ALTER PIPELINE tick SET OFFSETS EARLIEST;

And we'll test the pipeline:

SQL

TEST PIPELINE tick LIMIT 1;

Example output:

Plain Text

+--------+---------------------+--------+--------+--------+--------+--------+
| symbol | ts                  | open   | high   | low    | price  | volume |
+--------+---------------------+--------+--------+--------+--------+--------+
| MMM    | 2025-01-23 21:40:32 | 178.34 | 178.43 | 178.17 | 178.24 |  38299 |
+--------+---------------------+--------+--------+--------+--------+--------+
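To make the field mapping concrete, here is a minimal sketch of how one comma-separated message would line up with the column list given in the pipeline definition. The message layout and the timestamp format are assumptions inferred from the sample row above, not confirmed details of the Kafka topic, and parse_tick is a hypothetical helper.

```python
from datetime import datetime
from decimal import Decimal

# Column order taken from the pipeline definition.
TICK_COLUMNS = ("symbol", "ts", "open", "high", "low", "price", "volume")


def parse_tick(line):
    """Split one comma-separated message into a typed row dictionary."""
    values = line.strip().split(",")
    if len(values) != len(TICK_COLUMNS):
        raise ValueError(f"expected {len(TICK_COLUMNS)} fields, got {len(values)}")
    row = dict(zip(TICK_COLUMNS, values))
    # Assumed timestamp format, based on the sample output.
    row["ts"] = datetime.strptime(row["ts"], "%Y-%m-%d %H:%M:%S")
    for name in ("open", "high", "low", "price"):
        row[name] = Decimal(row[name])  # mirrors the NUMERIC(18, 2) columns
    row["volume"] = int(row["volume"])
    return row


# Assumed message, reconstructed from the sample row above.
sample = "MMM,2025-01-23 21:40:32,178.34,178.43,178.17,178.24,38299"
print(parse_tick(sample))
```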

And then we'll start the pipeline:

SQL

START PIPELINE tick;

After a few minutes, we'll check how much data has been loaded so far:

SQL

SELECT COUNT(*)
FROM tick;

Local Test Environment

We'll follow the same steps from a previous article to set up our local test environment, as described in these sections:

Introduction. Use a Virtual Machine or venv.
Create a SingleStore Cloud account. This step was completed above.
Create a database. The Free Shared Tier already provides a database, and we just need to note down the database name.
Install Jupyter.
Plain Text
```
pip install notebook
```

Install Ollama.

Plain Text

curl -fsSL https://ollama.com/install.sh | sh

Environment variables.
Plain Text

export SINGLESTOREDB_URL="<username>:<password>@<host>:<port>/<database>"
Replace <username>, <password>, <host>, <port> and <database> with the values for your environment.
Plain Text
```
export OPENAI_API_KEY="<OpenAI API Key>"
```
Replace <OpenAI API Key> with your key.
Start Jupyter.
Plain Text
```
jupyter notebook
```
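Inside a notebook, the two environment variables set above can be read and checked before anything tries to connect. This is a minimal sketch: build_db_uri and load_settings are hypothetical helpers, and the mysql+pymysql SQLAlchemy scheme is an assumption based on SingleStore accepting MySQL clients (a SingleStore-specific dialect, if installed, could be passed as the scheme instead).

```python
import os


def build_db_uri(singlestore_url, scheme="mysql+pymysql"):
    """Prefix the bare '<user>:<password>@<host>:<port>/<database>' value.

    The scheme is an assumption; pass a different one if your SQLAlchemy
    setup uses another driver.
    """
    if "://" in singlestore_url:
        return singlestore_url  # already a full URI, leave it alone
    return f"{scheme}://{singlestore_url}"


def load_settings(env=os.environ):
    """Read the required variables and fail early if one is missing."""
    required = ("SINGLESTOREDB_URL", "OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {
        "db_uri": build_db_uri(env["SINGLESTOREDB_URL"]),
        "openai_api_key": env["OPENAI_API_KEY"],
    }
```

Failing early here gives a clearer message than a connection error raised later from inside the agent.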

We'll use the Jupyter notebooks from GitHub. These notebooks are configured to use OpenAI and Ollama. For Ollama, we'll use one of the LLMs listed with Tools support. We'll test the following four queries.
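For orientation, the general shape of such a notebook cell might look like the sketch below. It is not the code from the GitHub notebooks: make_llm and ask are hypothetical helpers, the model names are left to the caller, and the sketch assumes the langchain-community, langchain-openai and langchain-ollama packages are installed. The LangChain imports sit inside the functions so that only the provider actually used needs to be installed.

```python
def make_llm(provider, model):
    """Return a LangChain chat model for 'openai' or 'ollama'."""
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model=model, temperature=0)
    if provider == "ollama":
        from langchain_ollama import ChatOllama
        return ChatOllama(model=model, temperature=0)
    raise ValueError(f"unknown provider: {provider!r}")


def ask(provider, model, db_uri, question):
    """Run one natural language question through a SQL agent."""
    from langchain_community.agent_toolkits import (
        SQLDatabaseToolkit,
        create_sql_agent,
    )
    from langchain_community.utilities import SQLDatabase

    llm = make_llm(provider, model)
    db = SQLDatabase.from_uri(db_uri)              # SQLAlchemy-style URI
    toolkit = SQLDatabaseToolkit(db=db, llm=llm)   # schema, query and checker tools
    agent = create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)
    result = agent.invoke({"input": question})
    return result["output"]


# Example usage (requires a reachable database and a configured model):
# print(ask("openai", "<model name>", "<SQLAlchemy URI>",
#           "How many rows are in the tick table?"))
```

Keeping the provider as the only thing that changes between runs means any difference in the answers can be attributed to the model rather than to the surrounding code.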

First Query

SQL

SELECT symbol, (MAX(high) - MIN(low)) AS volatility
FROM tick
GROUP BY symbol
ORDER BY volatility ASC
LIMIT 1;

Natural Language

Plain Text

"For each stock symbol, calculate the volatility as the difference\n"
"between the highest recorded price and the lowest recorded price over time.\n"
"Which stock symbol has the least volatility?"

Results

SQL

Plain Text

+--------+------------+
| symbol | volatility |
+--------+------------+
| FTR    |       0.55 |
+--------+------------+

OpenAI

Plain Text

The stock symbol with the least volatility is FTR, with a volatility of 0.55.

Ollama

Plain Text

To find the stock with the highest price, we need to compare the prices of all the given stocks and find the maximum value. However, I don't have real-time access to financial data or the ability to execute code that interacts with external APIs. Therefore, I can't directly calculate the highest price from this list.

However, if you provide me with a specific date or time range, I can help you find the stock with the highest price during that period. For example, if you want to know the stock with the highest price on a particular day in the past, I can assist you with that.

If you're looking for the current highest-priced stock, I recommend checking a financial news website or an API that provides real-time stock data, such as Yahoo Finance, Google Finance, or a service like Alpha Vantage or Finnhub.io. These platforms can give you the most up-to-date information on stock prices.`
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE
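The reference SQL for the first query can be checked on a small scale without a SingleStore connection. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in database and a handful of made-up rows. The table is reduced to the three columns the query touches, and the symbols AAA and BBB are hypothetical.

```python
import sqlite3

# The reference SQL from the article, unchanged.
VOLATILITY_SQL = """
SELECT symbol, (MAX(high) - MIN(low)) AS volatility
FROM tick
GROUP BY symbol
ORDER BY volatility ASC
LIMIT 1;
"""


def least_volatile(rows):
    """Load (symbol, high, low) rows into SQLite and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tick (symbol TEXT, high REAL, low REAL)")
        conn.executemany("INSERT INTO tick VALUES (?, ?, ?)", rows)
        return conn.execute(VOLATILITY_SQL).fetchone()
    finally:
        conn.close()


# Made-up data: AAA ranges from 10.0 to 11.0, BBB from 20.0 to 20.25.
SAMPLE_ROWS = [
    ("AAA", 10.5, 10.0),
    ("AAA", 11.0, 10.25),
    ("BBB", 20.25, 20.0),
    ("BBB", 20.125, 20.0),
]

print(least_volatile(SAMPLE_ROWS))  # ('BBB', 0.25)
```

Note that volatility here is the spread between the highest high and the lowest low across all rows for a symbol, not a per-row figure.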

Second Query

SQL

SELECT COUNT(*)
FROM tick;

Natural Language

Plain Text

"How many rows are in the tick table?"

Results

SQL

Plain Text

+----------+
| COUNT(*) |
+----------+
| 22367162 |
+----------+

OpenAI

Plain Text

There are 22,367,162 rows in the tick table.

Ollama

Plain Text

The "tick" table has 3 rows.

Third Query

SQL

-- Step 1: Get the latest price for each symbol
WITH latest_prices AS (
    SELECT symbol, price
    FROM tick t1
    WHERE ts = (
        SELECT MAX(ts)
        FROM tick t2
        WHERE t2.symbol = t1.symbol
    )
)

-- Step 2: Calculate the total portfolio value
SELECT SUM(p.shares_held * lp.price) AS total_portfolio_value
FROM portfolio p, latest_prices lp
WHERE p.symbol = lp.symbol;

Natural Language

Plain Text

"Taking all the stock symbols from the portfolio table,\n"
"and using the latest value for each stock symbol from the tick table,\n"
"calculate the grand total value of all the shares listed in the portfolio table."

Results

SQL

Plain Text

+-----------------------+
| total_portfolio_value |
+-----------------------+
|              44540.60 |
+-----------------------+

OpenAI

Plain Text

$44,540.60

Ollama

Plain Text

I don't know. The provided SQL query does not contain any of the common mistakes listed. It uses the correct join conditions, functions, and data types. The query also properly quotes identifiers and uses the correct number of arguments for functions. Therefore, no changes are needed. However, without access to the actual database schema and table data, I cannot provide a specific answer to the question.
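The two-step logic of the third reference query (latest price per symbol, then shares multiplied by price and summed) can be traced with a few made-up rows. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in database; the tables are cut down to the columns the query uses, and the symbols and prices are hypothetical.

```python
import sqlite3

# The reference SQL from the article, without the comments.
PORTFOLIO_VALUE_SQL = """
WITH latest_prices AS (
    SELECT symbol, price
    FROM tick t1
    WHERE ts = (
        SELECT MAX(ts)
        FROM tick t2
        WHERE t2.symbol = t1.symbol
    )
)
SELECT SUM(p.shares_held * lp.price) AS total_portfolio_value
FROM portfolio p, latest_prices lp
WHERE p.symbol = lp.symbol;
"""


def portfolio_value(ticks, holdings):
    """Load sample rows into SQLite and return the total portfolio value."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tick (symbol TEXT, ts TEXT, price REAL)")
        conn.execute("CREATE TABLE portfolio (symbol TEXT, shares_held INTEGER)")
        conn.executemany("INSERT INTO tick VALUES (?, ?, ?)", ticks)
        conn.executemany("INSERT INTO portfolio VALUES (?, ?)", holdings)
        return conn.execute(PORTFOLIO_VALUE_SQL).fetchone()[0]
    finally:
        conn.close()


SAMPLE_TICKS = [
    ("AAA", "2025-01-01 10:00:00", 100.0),  # older AAA price, ignored
    ("AAA", "2025-01-01 11:00:00", 110.0),  # latest AAA price
    ("BBB", "2025-01-01 10:30:00", 200.0),
    ("CCC", "2025-01-01 10:45:00", 300.0),  # not held, so not counted
]
SAMPLE_HOLDINGS = [("AAA", 10), ("BBB", 5)]

# 10 * 110.0 + 5 * 200.0
print(portfolio_value(SAMPLE_TICKS, SAMPLE_HOLDINGS))  # 2100.0
```

One property worth keeping in mind when comparing answers: if a symbol has two rows with the same latest timestamp, the query as written counts both.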

Fourth Query

SQL

SELECT
    (
        SELECT ts
        FROM stock_sentiment
        WHERE symbol = 'AAPL'
        ORDER BY positive DESC
        LIMIT 1
    ) AS sentiment_date,
    (
        SELECT positive
        FROM stock_sentiment
        WHERE symbol = 'AAPL'
        ORDER BY positive DESC
        LIMIT 1
    ) AS most_positive_sentiment,
    (
        SELECT price
        FROM tick
        WHERE symbol = 'AAPL'
        ORDER BY ts DESC
        LIMIT 1
    ) AS current_best_price;

Natural Language

Plain Text

"Using the symbol AAPL, show me the date and the most\n"
"positive sentiment in the stock sentiment table and the\n"
"current best price for this symbol from the tick table."

Results

SQL

Plain Text

+---------------------+-------------------------+--------------------+
| sentiment_date      | most_positive_sentiment | current_best_price |
+---------------------+-------------------------+--------------------+
| 2020-05-28 00:00:00 |                0.331509 |             116.27 |
+---------------------+-------------------------+--------------------+

OpenAI

Plain Text

On 2020-05-28, the most positive sentiment for AAPL was 0.331509, and the current best price for AAPL is 116.27.

Ollama

Plain Text

The provided list contains decimal numbers, which appear to be the results of some kind of experiment or analysis. Without additional context, it's difficult to determine the exact nature of these results. However, we can observe that the majority of the numbers are between 116.85 and 117.27, with a few outliers at 115.99 and 117.30. The smallest number in the list is 115.99, and the largest is 117.30.`
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE

Summary

Looking at the results, SQL and OpenAI produce consistent outputs across all four queries. Ollama, however, presents clear problems: it answered none of the four queries correctly, and two of its runs ended in an output parsing failure. A discussion thread on GitHub highlights that while an LLM may support tool calling, this functionality is not natively available in Ollama.
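The comparison against the SQL baseline was done by reading the outputs, but it could be automated in a simple way. The sketch below is one possible approach, not the method used in the notebooks: it pulls the numbers out of a free-text answer and checks whether the baseline value is among them. The helper names numbers_in and matches_baseline are hypothetical.

```python
import re
from decimal import Decimal

# Digits with optional thousands separators and an optional decimal part.
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def numbers_in(text):
    """Return every number found in the text as a set of Decimal values."""
    return {Decimal(token.replace(",", "")) for token in NUMBER_RE.findall(text)}


def matches_baseline(answer, expected):
    """True if the expected baseline value appears in the answer."""
    return Decimal(expected) in numbers_in(answer)


# Answers quoted from the second query above.
print(matches_baseline("There are 22,367,162 rows in the tick table.", "22367162"))  # True
print(matches_baseline('The "tick" table has 3 rows.', "22367162"))                  # False
```

A check like this only confirms that the right figure appears somewhere in the answer, so it is a coarse filter rather than a full correctness test.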

If you are able to get this LangChain functionality working with Ollama in one of the supported LLMs, please send me a message and I'll update the article and acknowledge your help.

Source:
https://dzone.com/articles/openai-vs-ollama-langchain-sqldatabasetoolkit