In a previous article, we used Ollama with LangChain and SingleStore. LangChain provided an efficient and compact solution for integrating Ollama with SingleStore. However, what if we were to remove LangChain? In this article, we'll demonstrate an example of using Ollama with SingleStore without relying on LangChain. We'll see that while we can achieve the same results described in the previous article, the quantity of code increases, requiring us to manage more of the plumbing that LangChain normally handles.
The notebook file used in this article is available on GitHub.
Introduction
From the previous article, we'll follow the same steps to set up our test environment as described in these sections:
- Introduction
  - Use a Virtual Machine or venv.
- Create a SingleStoreDB Cloud account
  - Use Ollama Demo Group as the Workspace Group Name and ollama-demo as the Workspace Name. Make a note of the password and host name. Temporarily allow access from anywhere by configuring the firewall under Ollama Demo Group > Firewall.
- Create a database
CREATE DATABASE IF NOT EXISTS ollama_demo;
- Install Jupyter
pip install notebook
- Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
- Environment variable
export SINGLESTOREDB_URL="admin:<password>@<host>:3306/ollama_demo"
Replace <password> and <host> with the values for your environment.
- Launch Jupyter
jupyter notebook
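As a quick sanity check before moving on, the environment variable can be read back from Python. This is only an illustrative sketch: it sets a placeholder value so it runs standalone, whereas in practice the value comes from the export step above.

```python
import os

# Placeholder credentials for illustration only; in a real session the
# variable is already set by the `export` command above
os.environ.setdefault("SINGLESTOREDB_URL", "admin:secret@svc-example.com:3306/ollama_demo")

url = os.environ.get("SINGLESTOREDB_URL")

# A usable value has the form user:password@host:port/database
assert url is not None and "@" in url and "/" in url
print("SINGLESTOREDB_URL looks well-formed")
```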
Fill Out the Notebook
First, some packages:
!pip install ollama numpy pandas sqlalchemy-singlestoredb --quiet --no-warn-script-location
Next, we'll import some libraries:
import ollama
import os
import numpy as np
import pandas as pd
from sqlalchemy import create_engine, text
We'll use all-minilm (45 MB at the time of writing) to create the embeddings:
ollama.pull("all-minilm")
{'status': 'success'}
For our LLM, we'll use llama2 (3.8 GB at the time of writing):
ollama.pull("llama2")
Example output:
{'status': 'success'}
Next, we'll use some example text from the Ollama website:
documents = [
    "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
    "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
    "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
    "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
    "Llamas are vegetarians and have very efficient digestive systems",
    "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old"
]
df_data = []
for doc in documents:
    response = ollama.embeddings(
        model = "all-minilm",
        prompt = doc
    )
    embedding = response["embedding"]
    embedding_array = np.array(embedding).astype(np.float32)
    df_data.append({"content": doc, "vector": embedding_array})
df = pd.DataFrame(df_data)
dimensions = len(df.at[0, "vector"])
We set the embedding model to all-minilm and iterate through each document to build the content for a Pandas DataFrame. Additionally, we convert the embeddings to a 32-bit format, as this is SingleStore's default for the VECTOR data type. Finally, we determine the number of embedding dimensions for the first document in the Pandas DataFrame.
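As a self-contained illustration of the 32-bit conversion, here is the same transformation applied to a made-up four-dimensional embedding (real all-minilm embeddings are much longer):

```python
import numpy as np
import pandas as pd

# Made-up embedding values; Ollama returns ordinary Python floats (64-bit)
embedding = [0.12, -0.45, 0.07, 0.99]

# Convert to 32-bit, matching SingleStore's default for the VECTOR type
embedding_array = np.array(embedding).astype(np.float32)

df = pd.DataFrame([{"content": "example document", "vector": embedding_array}])
dimensions = len(df.at[0, "vector"])

print(embedding_array.dtype)  # float32
print(dimensions)             # 4
```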
Next, we'll establish a connection to our SingleStore instance:
connection_url = "singlestoredb://" + os.environ.get("SINGLESTOREDB_URL")
db_connection = create_engine(connection_url)
Now, we'll create a table with a vector column, using the dimensions we determined earlier:
query = text("""
CREATE TABLE IF NOT EXISTS pandas_docs (
id BIGINT AUTO_INCREMENT NOT NULL,
content LONGTEXT,
vector VECTOR(:dimensions) NOT NULL,
PRIMARY KEY(id)
);
""")
with db_connection.connect() as conn:
    conn.execute(query, {"dimensions": dimensions})
We'll now write the Pandas DataFrame to the table:
df.to_sql(
    "pandas_docs",
    con = db_connection,
    if_exists = "append",
    index = False,
    chunksize = 1000
)
Example output:
6
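The 6 here is the number of rows written. The same call shape can be exercised standalone by substituting an in-memory SQLite database for SingleStore (illustrative only; SQLite has no VECTOR type, so the toy DataFrame holds text only):

```python
import pandas as pd
from sqlalchemy import create_engine

# Toy DataFrame with six rows, standing in for the embeddings DataFrame
df = pd.DataFrame({"content": ["a", "b", "c", "d", "e", "f"]})

# In-memory SQLite replaces SingleStore purely for illustration
engine = create_engine("sqlite:///:memory:")

# Same call shape as the notebook; recent pandas returns the rows written
rows_written = df.to_sql(
    "pandas_docs",
    con = engine,
    if_exists = "append",
    index = False,
    chunksize = 1000
)
print(rows_written)
```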
We'll now create an index to match the one we created in the previous article:
query = text("""
ALTER TABLE pandas_docs ADD VECTOR INDEX (vector)
INDEX_OPTIONS '{
"metric_type": "EUCLIDEAN_DISTANCE"
}';
""")
with db_connection.connect() as conn:
    conn.execute(query)
We'll now ask a question, as follows:
prompt = "What animals are llamas related to?"
response = ollama.embeddings(
    prompt = prompt,
    model = "all-minilm"
)
embedding = response["embedding"]
embedding_array = np.array(embedding).astype(np.float32)
query = text("""
SELECT content
FROM pandas_docs
ORDER BY vector <-> :embedding_array ASC
LIMIT 1;
""")
with db_connection.connect() as conn:
    results = conn.execute(query, {"embedding_array": embedding_array})
    row = results.fetchone()
    data = row[0]
    print(data)
We convert the prompt to embeddings, ensure the embeddings are converted to a 32-bit format, and then execute the SQL query, which uses the infix notation <-> for Euclidean distance.
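For intuition, <-> computes the Euclidean distance between two vectors. A rough NumPy equivalent of the ORDER BY ... LIMIT 1 ranking, using made-up three-dimensional vectors in place of real embeddings, looks like this:

```python
import numpy as np

# Made-up vectors standing in for stored document embeddings
doc_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.7],
], dtype=np.float32)
contents = ["doc A", "doc B", "doc C"]

# Made-up query embedding
query_vector = np.array([1.0, 0.0, 0.0], dtype=np.float32)

# Euclidean distance from the query to each stored vector,
# i.e. what `vector <-> :embedding_array` computes per row
distances = np.linalg.norm(doc_vectors - query_vector, axis=1)

# Equivalent of ORDER BY ... ASC LIMIT 1: the nearest document wins
best = contents[int(np.argmin(distances))]
print(best)  # doc A
```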
Example output:
Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels
Next, we'll use the LLM, as follows:
output = ollama.generate(
    model = "llama2",
    prompt = f"Using this data: {data}. Respond to this prompt: {prompt}"
)
print(output["response"])
Example output:
Llamas are members of the camelid family, which means they are closely related to other animals such as:
1. Vicuñas: Vicuñas are small, wild camelids that are native to South America. They are known for their soft, woolly coats and are considered an endangered species due to habitat loss and poaching.
2. Camels: Camels are large, even-toed ungulates that are native to Africa and the Middle East. They are known for their distinctive humps on their backs, which store water and food for long periods of time.
Both llamas and vicuñas are classified as members of the family Camelidae, while camels are classified as belonging to the family Dromedaryae. Despite their differences in size and habitat, all three species share many similarities in terms of their physical characteristics and behavior.
Summary
In this article, we replicated the steps we followed in the previous article and achieved similar results. However, we had to write a series of SQL statements and manage several steps that LangChain would have handled for us. Additionally, long-term maintenance of the code base could involve more time and cost compared to the LangChain solution.
Using LangChain instead of writing custom code for database access offers several advantages, such as efficiency, scalability, and reliability.
LangChain provides a library of prebuilt modules for database interaction, reducing development time and effort. Developers can use these modules to implement various database operations quickly, without having to start from scratch.
LangChain abstracts away many of the complexities involved in database management, allowing developers to focus on high-level tasks rather than low-level implementation details. This improves productivity and time-to-market.
LangChain has a large, active, and growing developer community, with extensive documentation and examples available on GitHub.
In summary, LangChain offers developers a powerful, efficient, and reliable platform for building database-driven applications, allowing them to focus on working with high-level abstractions instead of reinventing the wheel. Comparing the example in this article with the one from the previous article, we can see these advantages.
Source:
https://dzone.com/articles/ollama-plus-singlestore-minus-langchain