OpenAI vs Ollama Using LangChain's SQLDatabaseToolkit

Disclaimer

The stock data used in this article is entirely fictitious. It is for demonstration purposes only. Please do not use this data to make financial decisions.

In a previous article, we saw the benefits of using Ollama locally for a RAG application. In this article, we'll extend our evaluation of Ollama by testing natural language (NL) queries against a database system, using LangChain's SQLDatabaseToolkit. SQL will serve as the baseline for comparison as we explore the quality of the results provided by OpenAI and Ollama.

The notebook files used in this article are available on GitHub.

Introduction

LangChain's SQLDatabaseToolkit is a tool designed to integrate NL processing capabilities with relational database systems. It lets users query databases with NL input, using large language models (LLMs) to generate SQL queries dynamically. This makes it especially useful for applications where non-technical users or automated systems need to interact with structured data.

LangChain supports many LLMs well, and it also provides support for Ollama. In this article, we'll evaluate how well LangChain integrates with Ollama and the feasibility of using the SQLDatabaseToolkit in a local setup.
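As a rough illustration of the wiring described above, the sketch below connects an LLM to a database through the toolkit. It is not taken from the article's notebooks. It assumes the `langchain-community` and `langchain-openai` packages, a SQLAlchemy dialect registered under the `singlestoredb` scheme (for example from `sqlalchemy-singlestoredb`), and the `SINGLESTOREDB_URL` value that the article exports in its local setup section. The helper names and the model name are hypothetical.

```python
import os


def sqlalchemy_uri(env_value: str, scheme: str = "singlestoredb") -> str:
    """Prefix '<username>:<password>@<host>:<port>/<database>' with a driver scheme.

    The default scheme is an assumption; use whichever SQLAlchemy dialect
    is installed for the target database.
    """
    return f"{scheme}://{env_value}"


def build_sql_agent(llm, db_uri: str):
    """Create an agent that answers NL questions by generating and running SQL."""
    # Imported lazily so sqlalchemy_uri() can be used without LangChain installed.
    from langchain_community.agent_toolkits import SQLDatabaseToolkit, create_sql_agent
    from langchain_community.utilities import SQLDatabase

    db = SQLDatabase.from_uri(db_uri)              # SQLAlchemy connection wrapper
    toolkit = SQLDatabaseToolkit(db=db, llm=llm)   # schema, query, and checker tools
    return create_sql_agent(llm=llm, toolkit=toolkit, verbose=True)


def ask_openai(question: str, model: str = "gpt-4o-mini") -> str:
    """Run one question through an OpenAI-backed agent (model name is an assumption)."""
    from langchain_openai import ChatOpenAI  # reads OPENAI_API_KEY from the environment

    llm = ChatOpenAI(model=model, temperature=0)
    agent = build_sql_agent(llm, sqlalchemy_uri(os.environ["SINGLESTOREDB_URL"]))
    return agent.invoke({"input": question})["output"]
```

Calling `ask_openai("How many rows are in the tick table?")` would send the question through the agent. To try a local model, a `ChatOllama` instance from the `langchain_ollama` package could be passed to `build_sql_agent` in place of the OpenAI model.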

Create a SingleStore Cloud Account

A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Free Shared Tier.

Selecting the Starter Workspace > Connect > CLI Client will give us the details we need later, such as username, password, host, port and database.

Create Database Tables

For our test environment, we'll use SingleStore running in the cloud as our target database system, and we'll connect securely to this environment using Jupyter notebooks running on a local system.

From the left navigation pane in the SingleStore cloud portal, we'll select DEVELOP > Data Studio > Open SQL Editor. We'll create three tables, as follows:

SQL

CREATE TABLE IF NOT EXISTS tick (
    symbol VARCHAR(10),
    ts     DATETIME SERIES TIMESTAMP,
    open   NUMERIC(18, 2),
    high   NUMERIC(18, 2),
    low    NUMERIC(18, 2),
    price  NUMERIC(18, 2),
    volume INT,
    KEY(ts)
);

CREATE TABLE IF NOT EXISTS portfolio (
    symbol         VARCHAR(10),
    shares_held    INT,
    purchase_date  DATE,
    purchase_price NUMERIC(18, 2)
);

CREATE TABLE IF NOT EXISTS stock_sentiment (
    headline  VARCHAR(250),
    positive  FLOAT,
    negative  FLOAT,
    neutral   FLOAT,
    url       TEXT,
    publisher VARCHAR(30),
    ts        DATETIME,
    symbol    VARCHAR(10)
);

We'll load the portfolio table with the following fictitious data:

SQL

INSERT INTO portfolio (symbol, shares_held, purchase_date, purchase_price) VALUES
('AAPL', 100, '2022-01-15',  150.25),
('MSFT',  50, '2021-12-10',  305.50),
('GOOGL', 25, '2021-11-05', 2800.75),
('AMZN',  10, '2020-07-20', 3200.00),
('TSLA',  40, '2022-02-18',  900.60),
('NFLX',  15, '2021-09-01',  550.00);

For the stock_sentiment table, we'll download the stock_sentiment.sql.zip file and unpack it. We'll load the data into the table using a MySQL client, as follows:

Shell

mysql -u "<username>" -p"<password>" -h "<host>" -P <port> -D <database> < stock_sentiment.sql

We'll use the values for <username>, <password>, <host>, <port> and <database> that we saved earlier.

Finally, for the tick table, we'll create a pipeline:

SQL

CREATE PIPELINE tick
AS LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/stockticker'
BATCH_INTERVAL 45000
INTO TABLE tick
FIELDS TERMINATED BY ','
(symbol,ts,open,high,low,price,volume);

We'll adjust the pipeline to read from the earliest available offset:

SQL

ALTER PIPELINE tick SET OFFSETS EARLIEST;

And test the pipeline:

SQL

TEST PIPELINE tick LIMIT 1;

Example output:

Plain Text

+--------+---------------------+--------+--------+--------+--------+--------+
| symbol | ts                  | open   | high   | low    | price  | volume |
+--------+---------------------+--------+--------+--------+--------+--------+
| MMM    | 2025-01-23 21:40:32 | 178.34 | 178.43 | 178.17 | 178.24 |  38299 |
+--------+---------------------+--------+--------+--------+--------+--------+
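To make the column mapping concrete, here is a minimal sketch that parses one comma-separated tick record into typed fields, in the same order the pipeline lists them (symbol, ts, open, high, low, price, volume). The `Tick` class and `parse_tick` function are hypothetical, and the exact message format on the Kafka topic, including the timestamp layout, is an assumption based on the example output above.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass
class Tick:
    symbol: str
    ts: datetime
    open: Decimal
    high: Decimal
    low: Decimal
    price: Decimal
    volume: int


def parse_tick(line: str) -> Tick:
    """Parse 'symbol,ts,open,high,low,price,volume' into a Tick."""
    symbol, ts, open_, high, low, price, volume = line.strip().split(",")
    return Tick(
        symbol=symbol,
        ts=datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),  # assumed timestamp layout
        open=Decimal(open_),    # Decimal mirrors the NUMERIC(18, 2) columns
        high=Decimal(high),
        low=Decimal(low),
        price=Decimal(price),
        volume=int(volume),
    )


# Values copied from the example output row above.
tick = parse_tick("MMM,2025-01-23 21:40:32,178.34,178.43,178.17,178.24,38299")
print(tick.symbol, tick.high - tick.low)  # MMM 0.26
```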

And then we'll start the pipeline:

SQL

START PIPELINE tick;

After a few minutes, we'll check the quantity of data loaded so far:

SQL

SELECT COUNT(*)
FROM tick;

Local Test Environment

From a previous article, we'll follow the same steps to set up our local test environment as described in these sections:

Introduction. Use a virtual machine or venv.
Create a SingleStore Cloud account. This step was completed above.
Create a database. The Free Shared Tier already provides a database, and we just need to note down the database name.
Install Jupyter.
Plain Text
```
pip install notebook
```

Install Ollama.

Plain Text

curl -fsSL https://ollama.com/install.sh | sh

Environment variables.
Plain Text

export SINGLESTOREDB_URL="<username>:<password>@<host>:<port>/<database>"
Replace <username>, <password>, <host>, <port> and <database> with the values for your environment.
Plain Text
```
export OPENAI_API_KEY="<OpenAI API Key>"
```
Replace <OpenAI API Key> with your key.
Launch Jupyter.
Plain Text
```
jupyter notebook
```
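Before launching the notebooks, it can help to confirm that the SINGLESTOREDB_URL value set in the steps above has the expected shape. The sketch below splits a string of that form into its parts using only the standard library. The function name and the sample credentials are hypothetical placeholders, not real connection details.

```python
from urllib.parse import urlsplit


def parse_singlestore_url(value: str) -> dict:
    """Split '<username>:<password>@<host>:<port>/<database>' into its parts."""
    # urlsplit only parses the user/host section after '//', so prepend a dummy scheme.
    parts = urlsplit("db://" + value)
    return {
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


# Placeholder value; in practice this would come from os.environ["SINGLESTOREDB_URL"].
settings = parse_singlestore_url("admin:secret@svc-example.invalid:3333/demo_db")
print(settings)
```

A missing port or database name shows up here as `None` or an empty string, which is easier to spot than a connection error later in the notebook.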

We'll use the Jupyter notebooks from GitHub. These notebooks are configured to use OpenAI and Ollama. For Ollama, we'll use one of the LLMs listed with tool support. We'll test the following four queries.

First Query

SQL

SELECT symbol, (MAX(high) - MIN(low)) AS volatility
FROM tick
GROUP BY symbol
ORDER BY volatility ASC
LIMIT 1;

Natural Language

Plain Text

"For each stock symbol, calculate the volatility as the difference\n"
"between the highest recorded price and the lowest recorded price over time.\n"
"Which stock symbol has the least volatility?"

Results

SQL

Plain Text

+--------+------------+
| symbol | volatility |
+--------+------------+
| FTR    |       0.55 |
+--------+------------+

OpenAI

Plain Text

The stock symbol with the least volatility is FTR, with a volatility of 0.55.

Ollama

Plain Text

To find the stock with the highest price, we need to compare the prices of all the given stocks and find the maximum value. However, I don't have real-time access to financial data or the ability to execute code that interacts with external APIs. Therefore, I can't directly calculate the highest price from this list.

However, if you provide me with a specific date or time range, I can help you find the stock with the highest price during that period. For example, if you want to know the stock with the highest price on a particular day in the past, I can assist you with that.

If you're looking for the current highest-priced stock, I recommend checking a financial news website or an API that provides real-time stock data, such as Yahoo Finance, Google Finance, or a service like Alpha Vantage or Finnhub.io. These platforms can give you the most up-to-date information on stock prices.`
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE

Second Query

SQL

SELECT COUNT(*)
FROM tick;

Natural Language

Plain Text

"How many rows are in the tick table?"

Results

SQL

Plain Text

+----------+
| COUNT(*) |
+----------+
| 22367162 |
+----------+

OpenAI

Plain Text

There are 22,367,162 rows in the tick table.

Ollama

Plain Text

The "tick" table has 3 rows.
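Since SQL is the baseline, one simple way to score a model's free-text answer is to check whether the number returned by the SQL query appears anywhere in it. The helper below is a minimal sketch of that idea, not the method used in the article's notebooks. The function names are hypothetical, and the approach only suits questions with a single numeric answer.

```python
import re


def numbers_in(text: str) -> list[float]:
    """Return every number found in text, ignoring thousands separators."""
    matches = re.findall(r"\d[\d,]*(?:\.\d+)?", text)
    return [float(m.replace(",", "")) for m in matches]


def agrees_with_sql(answer: str, expected: float, tolerance: float = 1e-6) -> bool:
    """True if any number in the answer matches the SQL baseline value."""
    return any(abs(n - expected) <= tolerance for n in numbers_in(answer))


# Answers quoted from the results above, checked against the SQL count.
openai_answer = "There are 22,367,162 rows in the tick table."
ollama_answer = 'The "tick" table has 3 rows.'
print(agrees_with_sql(openai_answer, 22367162))  # True
print(agrees_with_sql(ollama_answer, 22367162))  # False
```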

Third Query

SQL

-- Step 1: Get the latest price for each symbol
WITH latest_prices AS (
    SELECT symbol, price
    FROM tick t1
    WHERE ts = (
        SELECT MAX(ts)
        FROM tick t2
        WHERE t2.symbol = t1.symbol
    )
)

-- Step 2: Calculate the total portfolio value
SELECT SUM(p.shares_held * lp.price) AS total_portfolio_value
FROM portfolio p, latest_prices lp
WHERE p.symbol = lp.symbol;

Natural Language

Plain Text

"Taking all the stock symbols from the portfolio table,\n"
"and using the latest value for each stock symbol from the tick table,\n"
"calculate the grand total value of all the shares listed in the portfolio table."

Results

SQL

Plain Text

+-----------------------+
| total_portfolio_value |
+-----------------------+
|              44540.60 |
+-----------------------+

OpenAI

Plain Text

$44,540.60

Ollama

Plain Text

I don't know. The provided SQL query does not contain any of the common mistakes listed. It uses the correct join conditions, functions, and data types. The query also properly quotes identifiers and uses the correct number of arguments for functions. Therefore, no changes are needed. However, without access to the actual database schema and table data, I cannot provide a specific answer to the question.
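The logic of the reference SQL for this query can be checked by hand on a tiny dataset. The sketch below runs the same two-step query against an in-memory SQLite database instead of SingleStore. The symbols, prices, and share counts are made up purely for illustration, and the table definitions are reduced to the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tick (symbol TEXT, ts TEXT, price REAL);
CREATE TABLE portfolio (symbol TEXT, shares_held INTEGER);
-- Made-up rows: two ticks per symbol, inserted out of time order on purpose.
INSERT INTO tick VALUES
    ('AAA', '2025-01-01 10:00:00', 10.00),
    ('AAA', '2025-01-01 11:00:00', 12.50),
    ('BBB', '2025-01-01 10:00:00', 20.00),
    ('BBB', '2025-01-01 09:00:00', 99.00);
INSERT INTO portfolio VALUES ('AAA', 4), ('BBB', 3);
""")

query = """
WITH latest_prices AS (
    SELECT symbol, price
    FROM tick t1
    WHERE ts = (SELECT MAX(ts) FROM tick t2 WHERE t2.symbol = t1.symbol)
)
SELECT SUM(p.shares_held * lp.price) AS total_portfolio_value
FROM portfolio p, latest_prices lp
WHERE p.symbol = lp.symbol;
"""

total = conn.execute(query).fetchone()[0]
print(total)  # 4 * 12.50 + 3 * 20.00 = 110.0
```

The correlated subquery keeps only the row with the latest timestamp per symbol, so the older 99.00 price for BBB does not affect the total.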

Fourth Query

SQL

SELECT
    (
        SELECT ts
        FROM stock_sentiment
        WHERE symbol = 'AAPL'
        ORDER BY positive DESC
        LIMIT 1
    ) AS sentiment_date,
    (
        SELECT positive
        FROM stock_sentiment
        WHERE symbol = 'AAPL'
        ORDER BY positive DESC
        LIMIT 1
    ) AS most_positive_sentiment,
    (
        SELECT price
        FROM tick
        WHERE symbol = 'AAPL'
        ORDER BY ts DESC
        LIMIT 1
    ) AS current_best_price;

Natural Language

Plain Text

"Using the symbol AAPL, show me the date and the most\n"
"positive sentiment in the stock sentiment table and the\n"
"current best price for this symbol from the tick table."

Results

SQL

Plain Text

+---------------------+-------------------------+--------------------+
| sentiment_date      | most_positive_sentiment | current_best_price |
+---------------------+-------------------------+--------------------+
| 2020-05-28 00:00:00 |                0.331509 |             116.27 |
+---------------------+-------------------------+--------------------+

OpenAI

Plain Text

On 2020-05-28, the most positive sentiment for AAPL was 0.331509, and the current best price for AAPL is 116.27.

Ollama

Plain Text

The provided list contains decimal numbers, which appear to be the results of some kind of experiment or analysis. Without additional context, it's difficult to determine the exact nature of these results. However, we can observe that the majority of the numbers are between 116.85 and 117.27, with a few outliers at 115.99 and 117.30. The smallest number in the list is 115.99, and the largest is 117.30.`
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE

Summary

Looking at the results, we see that SQL and OpenAI produce consistent outputs across all four queries. Ollama, however, shows clear problems. A discussion thread on GitHub highlights that, although an LLM should support tool calling, this functionality is not natively available in Ollama.

If you manage to get this LangChain functionality working with Ollama on one of the supported LLMs, please send me a message, and I'll update the article and acknowledge your help.

Source:
https://dzone.com/articles/openai-vs-ollama-langchain-sqldatabasetoolkit