OpenAI vs Ollama Using LangChain's SQLDatabaseToolkit

Disclaimer

The stock data used in this article is entirely fictitious. It is purely for demonstration purposes. Please do not use this data to make financial decisions.

In a previous article, we saw the benefits of using Ollama locally for a RAG application. In this article, we'll extend our evaluation of Ollama by testing natural language (NL) queries against a database system, using LangChain's SQLDatabaseToolkit. SQL will serve as the baseline for comparison as we explore the quality of the results provided by OpenAI and Ollama.

The notebook files used in this article are available on GitHub.

Introduction

LangChain's SQLDatabaseToolkit is a powerful tool designed to integrate NL processing capabilities with relational database systems. It lets users query databases using NL inputs, relying on large language models (LLMs) to generate SQL queries dynamically. This makes it especially useful for applications where non-technical users or automated systems need to interact with structured data.

LangChain supports a wide range of LLMs, including those served through Ollama. In this article, we'll evaluate how well LangChain integrates with Ollama and how feasible it is to use the SQLDatabaseToolkit in a local setup.
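To make the flow concrete, here is a minimal sketch of the loop such a toolkit automates: read the schema, hand it to a model together with the question, and run the SQL that comes back. It is not LangChain's API. It uses Python's built-in sqlite3 module as a stand-in database, and `canned_model` is a hypothetical placeholder for a real LLM call that always returns the same query.

```python
import sqlite3
from typing import Callable


def describe_schema(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements so a model can see the schema."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def answer_question(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: Callable[[str], str],
) -> list:
    """Build a prompt from schema + question, get SQL back, and execute it."""
    prompt = f"Schema:\n{describe_schema(conn)}\n\nQuestion: {question}\nSQL:"
    sql = generate_sql(prompt)
    return conn.execute(sql).fetchall()


def canned_model(prompt: str) -> str:
    # Hypothetical stand-in for an LLM: ignores the prompt, returns fixed SQL.
    return "SELECT COUNT(*) FROM portfolio"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE portfolio (symbol TEXT, shares_held INTEGER)")
conn.executemany(
    "INSERT INTO portfolio VALUES (?, ?)", [("AAPL", 100), ("MSFT", 50)]
)

result = answer_question(
    conn, "How many rows are in the portfolio table?", canned_model
)
print(result)  # [(2,)]
```

The quality of the final answer depends entirely on what `generate_sql` returns, which is the part this article compares between OpenAI and Ollama.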

Create a SingleStore Cloud Account

A previous article showed the steps to create a free SingleStore Cloud account. We'll use the Free Shared Tier.

Selecting Starter Workspace > Connect > CLI Client gives us the details we'll need later, such as username, password, host, port and database.

Create Database Tables

For our test environment, we'll use SingleStore running in the Cloud as the target database system, and we'll connect securely to it using Jupyter notebooks running on a local system.

From the left navigation pane in the SingleStore cloud portal, we'll select DEVELOP > Data Studio > Open SQL Editor. We'll create three tables, as follows:

SQL

CREATE TABLE IF NOT EXISTS tick (
    symbol VARCHAR(10),
    ts     DATETIME SERIES TIMESTAMP,
    open   NUMERIC(18, 2),
    high   NUMERIC(18, 2),
    low    NUMERIC(18, 2),
    price  NUMERIC(18, 2),
    volume INT,
    KEY(ts)
);

CREATE TABLE IF NOT EXISTS portfolio (
    symbol         VARCHAR(10),
    shares_held    INT,
    purchase_date  DATE,
    purchase_price NUMERIC(18, 2)
);

CREATE TABLE IF NOT EXISTS stock_sentiment (
    headline  VARCHAR(250),
    positive  FLOAT,
    negative  FLOAT,
    neutral   FLOAT,
    url       TEXT,
    publisher VARCHAR(30),
    ts        DATETIME,
    symbol    VARCHAR(10)
);

We'll load the portfolio table with the following fictitious data:

SQL

INSERT INTO portfolio (symbol, shares_held, purchase_date, purchase_price) VALUES
('AAPL', 100, '2022-01-15',  150.25),
('MSFT',  50, '2021-12-10',  305.50),
('GOOGL', 25, '2021-11-05', 2800.75),
('AMZN',  10, '2020-07-20', 3200.00),
('TSLA',  40, '2022-02-18',  900.60),
('NFLX',  15, '2021-09-01',  550.00);

For the stock_sentiment table, we'll download the stock_sentiment.sql.zip file and unpack it. We'll load the data into the table using a MySQL client, as follows:

Shell

mysql -u "<username>" -p"<password>" -h "<host>" -P <port> -D <database> < stock_sentiment.sql

We'll use the values for <username>, <password>, <host>, <port> and <database> that we saved earlier.

Finally, for the tick table, we'll create a pipeline:

SQL

CREATE PIPELINE tick
AS LOAD DATA KAFKA 'public-kafka.memcompute.com:9092/stockticker'
BATCH_INTERVAL 45000
INTO TABLE tick
FIELDS TERMINATED BY ','
(symbol,ts,open,high,low,price,volume);

We'll adjust the pipeline so that it starts from the earliest available data:

SQL

ALTER PIPELINE tick SET OFFSETS EARLIEST;

And test the pipeline:

SQL

TEST PIPELINE tick LIMIT 1;

Example output:

Plain Text

+--------+---------------------+--------+--------+--------+--------+--------+
| symbol | ts                  | open   | high   | low    | price  | volume |
+--------+---------------------+--------+--------+--------+--------+--------+
| MMM    | 2025-01-23 21:40:32 | 178.34 | 178.43 | 178.17 | 178.24 |  38299 |
+--------+---------------------+--------+--------+--------+--------+--------+
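As a rough illustration of what the pipeline does with each message, the sketch below parses one comma-separated line into the seven tick columns. The raw message layout is an assumption inferred from the `FIELDS TERMINATED BY ','` clause and the column list; the sample line simply reuses the values from the output above, and `parse_tick` is a hypothetical helper, not part of SingleStore.

```python
import csv
from datetime import datetime
from decimal import Decimal

# Column order taken from the CREATE PIPELINE statement.
COLUMNS = ("symbol", "ts", "open", "high", "low", "price", "volume")


def parse_tick(line: str) -> dict:
    """Split one comma-separated message and convert each field to a typed value."""
    values = next(csv.reader([line]))
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = dict(zip(COLUMNS, values))
    return {
        "symbol": row["symbol"],
        # Assumed timestamp format, matching the sample output.
        "ts": datetime.strptime(row["ts"], "%Y-%m-%d %H:%M:%S"),
        "open": Decimal(row["open"]),
        "high": Decimal(row["high"]),
        "low": Decimal(row["low"]),
        "price": Decimal(row["price"]),
        "volume": int(row["volume"]),
    }


tick = parse_tick("MMM,2025-01-23 21:40:32,178.34,178.43,178.17,178.24,38299")
print(tick["symbol"], tick["price"], tick["volume"])  # MMM 178.24 38299
```

Decimal is used rather than float to mirror the NUMERIC(18, 2) columns in the table definition.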

And then we'll start the pipeline:

SQL

START PIPELINE tick;

After a few minutes, we'll check the quantity of data loaded so far:

SQL

SELECT COUNT(*)
FROM tick;

Local Test Environment

We'll follow the same steps from a previous article to set up our local test environment, as described in these sections:

Introduction. Use a virtual machine or venv.
Create a SingleStore Cloud account. This step was completed above.
Create a database. The Free Shared Tier already provides a database, so we only need to note its name.
Install Jupyter.
Plain Text
```
pip install notebook
```

Install Ollama.

Plain Text

curl -fsSL https://ollama.com/install.sh | sh

Environment variables.
Plain Text

export SINGLESTOREDB_URL="<username>:<password>@<host>:<port>/<database>"
Replace <username>, <password>, <host>, <port> and <database> with the values for your environment.
Plain Text
```
export OPENAI_API_KEY="<OpenAI API Key>"
```
Replace <OpenAI API Key> with your key.
Start Jupyter.
Plain Text
```
jupyter notebook
```
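Before opening the notebooks, it can help to confirm that the SINGLESTOREDB_URL value has all five parts in the `<username>:<password>@<host>:<port>/<database>` shape shown above. The following is a small sketch using only the standard library; `check_singlestore_url` is a hypothetical helper, and the example value is made up.

```python
from urllib.parse import urlsplit


def check_singlestore_url(value: str) -> dict:
    """Split a user:password@host:port/database string and report missing parts."""
    # Prefixing "//" makes urlsplit treat the string as a network location.
    parts = urlsplit("//" + value)
    fields = {
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }
    missing = [name for name, found in fields.items() if not found]
    if parts.password is None:
        missing.append("password")
    if missing:
        raise ValueError("missing: " + ", ".join(missing))
    return fields  # the password is deliberately left out of the result


# Made-up example value; in practice read os.environ["SINGLESTOREDB_URL"].
details = check_singlestore_url("admin:secret@svc-example.singlestore.com:3306/db_demo")
print(details["host"], details["port"], details["database"])
```

A password containing characters such as `@` or `/` would confuse this simple split, so treat the check as a quick sanity test rather than a full parser.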

We'll use the Jupyter notebooks from GitHub. These notebooks are configured to use OpenAI and Ollama. For Ollama, we'll use one of the LLMs listed with Tools support. We'll test the following four queries.

First Query

SQL

SELECT symbol, (MAX(high) - MIN(low)) AS volatility
FROM tick
GROUP BY symbol
ORDER BY volatility ASC
LIMIT 1;

Natural Language

Plain Text

"For each stock symbol, calculate the volatility as the difference\n"
"between the highest recorded price and the lowest recorded price over time.\n"
"Which stock symbol has the least volatility?"
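To make the definition in the prompt concrete, here is a toy run of the same calculation. It is only a sketch: it uses Python's built-in sqlite3 module instead of SingleStore, a cut-down tick table, and four invented rows for two made-up symbols.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tick (symbol TEXT, high REAL, low REAL)")
conn.executemany(
    "INSERT INTO tick VALUES (?, ?, ?)",
    [
        # Invented sample rows (symbol, high, low).
        ("AAA", 10.50, 10.00),
        ("AAA", 11.25, 10.75),
        ("BBB", 20.50, 20.25),
        ("BBB", 20.75, 20.50),
    ],
)

# Same shape as the article's query: highest high minus lowest low per symbol.
least = conn.execute(
    """
    SELECT symbol, (MAX(high) - MIN(low)) AS volatility
    FROM tick
    GROUP BY symbol
    ORDER BY volatility ASC
    LIMIT 1
    """
).fetchone()
print(least)  # ('BBB', 0.5)
```

AAA spans 10.00 to 11.25 across its two rows, a range of 1.25, while BBB spans 20.25 to 20.75, a range of 0.5. The range is taken over all rows for a symbol, not within a single row.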

Results

SQL

Plain Text

+--------+------------+
| symbol | volatility |
+--------+------------+
| FTR    |       0.55 |
+--------+------------+

OpenAI

Plain Text

The stock symbol with the least volatility is FTR, with a volatility of 0.55.

Ollama

Plain Text

To find the stock with the highest price, we need to compare the prices of all the given stocks and find the maximum value. However, I don't have real-time access to financial data or the ability to execute code that interacts with external APIs. Therefore, I can't directly calculate the highest price from this list.

However, if you provide me with a specific date or time range, I can help you find the stock with the highest price during that period. For example, if you want to know the stock with the highest price on a particular day in the past, I can assist you with that.

If you're looking for the current highest-priced stock, I recommend checking a financial news website or an API that provides real-time stock data, such as Yahoo Finance, Google Finance, or a service like Alpha Vantage or Finnhub.io. These platforms can give you the most up-to-date information on stock prices.`
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE

Second Query

SQL

SELECT COUNT(*)
FROM tick;

Natural Language

Plain Text

"How many rows are in the tick table?"

Results

SQL

Plain Text

+----------+
| COUNT(*) |
+----------+
| 22367162 |
+----------+

OpenAI

Plain Text

There are 22,367,162 rows in the tick table.

Ollama

Plain Text

The "tick" table has 3 rows.

Third Query

SQL

-- Step 1: Get the latest price for each symbol
WITH latest_prices AS (
    SELECT symbol, price
    FROM tick t1
    WHERE ts = (
        SELECT MAX(ts)
        FROM tick t2
        WHERE t2.symbol = t1.symbol
    )
)

-- Step 2: Calculate the total portfolio value
SELECT SUM(p.shares_held * lp.price) AS total_portfolio_value
FROM portfolio p, latest_prices lp
WHERE p.symbol = lp.symbol;

Natural Language

Plain Text

"Taking all the stock symbols from the portfolio table,\n"
"and using the latest value for each stock symbol from the tick table,\n"
"calculate the grand total value of all the shares listed in the portfolio table."
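The two steps in this query (pick the latest price per symbol, then multiply by shares held and sum) can be checked by hand on a tiny dataset. The sketch below runs the same query against Python's built-in sqlite3 module rather than SingleStore; the table definitions are simplified and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tick (symbol TEXT, ts TEXT, price REAL);
    CREATE TABLE portfolio (symbol TEXT, shares_held INTEGER);
    """
)
conn.executemany(
    "INSERT INTO tick VALUES (?, ?, ?)",
    [
        # Invented rows; only the newest row per symbol should count.
        ("AAA", "2025-01-23 21:40:00", 10.0),
        ("AAA", "2025-01-23 21:41:00", 12.0),
        ("BBB", "2025-01-23 21:40:00", 5.0),
        ("BBB", "2025-01-23 21:42:00", 4.5),
        ("CCC", "2025-01-23 21:40:00", 99.0),  # not held in the portfolio
    ],
)
conn.executemany("INSERT INTO portfolio VALUES (?, ?)", [("AAA", 10), ("BBB", 4)])

QUERY = """
WITH latest_prices AS (
    SELECT symbol, price
    FROM tick t1
    WHERE ts = (SELECT MAX(ts) FROM tick t2 WHERE t2.symbol = t1.symbol)
)
SELECT SUM(p.shares_held * lp.price) AS total_portfolio_value
FROM portfolio p, latest_prices lp
WHERE p.symbol = lp.symbol;
"""

total = conn.execute(QUERY).fetchone()[0]
print(total)  # 138.0
```

The expected total is 10 × 12.0 + 4 × 4.5 = 138.0. The older prices and the CCC symbol, which is not in the portfolio, do not contribute.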

Results

SQL

Plain Text

+-----------------------+
| total_portfolio_value |
+-----------------------+
|              44540.60 |
+-----------------------+

OpenAI

Plain Text

$44,540.60

Ollama

Plain Text

I don't know. The provided SQL query does not contain any of the common mistakes listed. It uses the correct join conditions, functions, and data types. The query also properly quotes identifiers and uses the correct number of arguments for functions. Therefore, no changes are needed. However, without access to the actual database schema and table data, I cannot provide a specific answer to the question.

Fourth Query

SQL

SELECT
    (
        SELECT ts
        FROM stock_sentiment
        WHERE symbol = 'AAPL'
        ORDER BY positive DESC
        LIMIT 1
    ) AS sentiment_date,
    (
        SELECT positive
        FROM stock_sentiment
        WHERE symbol = 'AAPL'
        ORDER BY positive DESC
        LIMIT 1
    ) AS most_positive_sentiment,
    (
        SELECT price
        FROM tick
        WHERE symbol = 'AAPL'
        ORDER BY ts DESC
        LIMIT 1
    ) AS current_best_price;

Natural Language

Plain Text

"Using the symbol AAPL, show me the date and the most\n"
"positive sentiment in the stock sentiment table and the\n"
"current best price for this symbol from the tick table."

Results

SQL

Plain Text

+---------------------+-------------------------+--------------------+
| sentiment_date      | most_positive_sentiment | current_best_price |
+---------------------+-------------------------+--------------------+
| 2020-05-28 00:00:00 |                0.331509 |             116.27 |
+---------------------+-------------------------+--------------------+

OpenAI

Plain Text

On 2020-05-28, the most positive sentiment for AAPL was 0.331509, and the current best price for AAPL is 116.27.

Ollama

Plain Text

The provided list contains decimal numbers, which appear to be the results of some kind of experiment or analysis. Without additional context, it's difficult to determine the exact nature of these results. However, we can observe that the majority of the numbers are between 116.85 and 117.27, with a few outliers at 115.99 and 117.30. The smallest number in the list is 115.99, and the largest is 117.30.`
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE

Summary

Looking at the results, SQL and OpenAI produce consistent outputs across all four queries. Ollama, however, has clear problems: none of its four answers matched the SQL baseline. A discussion thread on GitHub highlights that, although an LLM should support tool calling, this functionality is not natively available in Ollama.
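The comparison against the SQL baseline was done by eye here, but for numeric answers it could be roughly automated. The sketch below is one possible approach, not something from the notebooks: `numbers_in` and `agrees_with_baseline` are hypothetical helpers that pull numbers out of a model's reply and check whether the baseline value is among them, using answers quoted earlier in this article.

```python
import re
from decimal import Decimal


def numbers_in(text: str) -> list:
    """Extract numeric tokens such as 22,367,162 or 44,540.60 as Decimals."""
    tokens = re.findall(r"\d[\d,]*(?:\.\d+)?", text)
    return [Decimal(token.replace(",", "")) for token in tokens]


def agrees_with_baseline(answer: str, expected) -> bool:
    """True if the expected baseline value appears anywhere in the answer."""
    return Decimal(str(expected)) in numbers_in(answer)


openai_rows = "There are 22,367,162 rows in the tick table."
ollama_rows = 'The "tick" table has 3 rows.'

print(agrees_with_baseline(openai_rows, 22367162))  # True
print(agrees_with_baseline(ollama_rows, 22367162))  # False
print(agrees_with_baseline("$44,540.60", "44540.60"))  # True
```

A number match is a crude signal. It says nothing about whether the model ran sensible SQL to get there, so it complements reading the agent's trace rather than replacing it.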

If you're able to get this LangChain functionality working with Ollama on one of the supported LLMs, please send me a message, and I'll update the article and acknowledge your help.

Source:
https://dzone.com/articles/openai-vs-ollama-langchain-sqldatabasetoolkit