Ollama + SingleStore – LangChain = :-(

Pandas

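In a previous article, we used Ollama with LangChain and SingleStore. LangChain provided an efficient and compact solution for integrating Ollama with SingleStore. But what if we remove LangChain? In this article, we'll show an example of using Ollama with SingleStore without relying on LangChain. We'll see that we can achieve the same results described in the previous article, but the amount of code increases and we have to manage more of the plumbing that LangChain normally handles.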

The notebook file used in this article is available on GitHub.

Introduction

Following the previous article, we'll set up our test environment using the steps described in these sections:

Introduction
- Use a virtual machine or venv.
Create a SingleStoreDB Cloud account
- Use Ollama Demo Group as the Workspace Group Name and ollama-demo as the Workspace Name. Make a note of the password and host name. Temporarily allow access from anywhere by configuring the firewall under Ollama Demo Group > Firewall.
Create a Database

CREATE DATABASE IF NOT EXISTS ollama_demo;

Install Jupyter
- pip install notebook
Install Ollama
- curl -fsSL https://ollama.com/install.sh | sh
Environment Variable
- export SINGLESTOREDB_URL="admin:<password>@<host>:3306/ollama_demo"
- Replace <password> and <host> with the values for your environment.
Launch Jupyter
- jupyter notebook

Fill Out the Notebook

First, we need some packages:

Shell

!pip install ollama numpy pandas sqlalchemy-singlestoredb --quiet --no-warn-script-location

Next, we'll import some libraries:

Python

import ollama
import os
import numpy as np
import pandas as pd
from sqlalchemy import create_engine, text

We'll create embeddings using all-minilm (45 MB at the time of writing):

Python

ollama.pull("all-minilm")

Example output:

Plain Text

{'status': 'success'}

For our LLM, we'll use llama2 (3.8 GB at the time of writing):

Python

ollama.pull("llama2")

Example output:

Plain Text

{'status': 'success'}

Next, we'll use the example text from the Ollama website:

Python

documents = [
    "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
    "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
    "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
    "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
    "Llamas are vegetarians and have very efficient digestive systems",
    "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old"
]

# Collect one row per document: the text plus its embedding.
df_data = []

for doc in documents:
    response = ollama.embeddings(
        model = "all-minilm",
        prompt = doc
    )
    embedding = response["embedding"]
    # Store the embedding as 32-bit floats.
    embedding_array = np.array(embedding).astype(np.float32)
    df_data.append({"content": doc, "vector": embedding_array})

df = pd.DataFrame(df_data)

# Number of dimensions, taken from the first embedding.
dimensions = len(df.at[0, "vector"])

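We set the embedding model to all-minilm and iterate over each document to build the content for a Pandas DataFrame. We also convert the embeddings to a 32-bit format, since this is SingleStore's default for the VECTOR data type. Finally, we determine the number of embedding dimensions from the first document in the Pandas DataFrame.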
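To make the 32-bit conversion concrete, here is a minimal sketch that uses made-up embedding values instead of a call to Ollama. The helper name `to_float32_vector` and the sample numbers are assumptions for illustration only.

```python
import numpy as np

def to_float32_vector(embedding):
    """Convert a list of Python floats (64-bit) to a 32-bit NumPy array."""
    return np.asarray(embedding, dtype=np.float32)

# Hypothetical 4-dimensional embedding; real all-minilm embeddings are much longer.
sample_embedding = [0.25, -0.5, 0.125, 1.0]

vector = to_float32_vector(sample_embedding)

print(vector.dtype)    # element type of the array
print(len(vector))     # number of dimensions
print(vector.nbytes)   # total size in bytes (4 bytes per float32 element)
```

Taking `len()` of one converted vector is the same idea the notebook uses to derive the dimension count for the table definition.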

Next, we'll create a connection to our SingleStore instance:

Python

connection_url = "singlestoredb://" + os.environ.get("SINGLESTOREDB_URL")

db_connection = create_engine(connection_url)
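The connection code above assumes that SINGLESTOREDB_URL is set. If it is missing, `os.environ.get` returns `None` and the string concatenation fails with a `TypeError`. The following sketch shows one possible way to report that more clearly. The helper name `build_connection_url` and the demo variable name `DEMO_SINGLESTOREDB_URL` are hypothetical, and the credentials are placeholders.

```python
import os

def build_connection_url(env_var="SINGLESTOREDB_URL"):
    """Return a SQLAlchemy URL for SingleStore, or raise if the variable is unset."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return "singlestoredb://" + value

# Demo with placeholder credentials under a separate, made-up variable name,
# so a real SINGLESTOREDB_URL in the environment is left untouched.
os.environ["DEMO_SINGLESTOREDB_URL"] = "admin:<password>@<host>:3306/ollama_demo"
print(build_connection_url("DEMO_SINGLESTOREDB_URL"))
```

In the notebook, the returned string could be passed to `create_engine` in place of the concatenated value.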

We'll now create a table with a vector column, using the dimensions that we determined earlier:

Python

query = text("""
CREATE TABLE IF NOT EXISTS pandas_docs (
    id BIGINT AUTO_INCREMENT NOT NULL,
    content LONGTEXT,
    vector VECTOR(:dimensions) NOT NULL,
    PRIMARY KEY(id)
);
""")

with db_connection.connect() as conn:
    conn.execute(query, {"dimensions": dimensions})
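The statement above passes the dimension as a bound parameter. As an alternative sketch, assuming you would rather have a literal integer in the DDL text, the value can be validated and formatted into the statement before it is executed. The function name `create_table_ddl` is hypothetical, and 384 is used only as a sample value.

```python
def create_table_ddl(table_name, dimensions):
    """Build a CREATE TABLE statement with a validated vector dimension."""
    # Reject anything that is not a positive integer (bool is excluded explicitly).
    if isinstance(dimensions, bool) or not isinstance(dimensions, int) or dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    # Only allow simple identifiers, since the name is formatted into the SQL text.
    if not table_name.isidentifier():
        raise ValueError("table_name must be a simple identifier")
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name} (\n"
        "    id BIGINT AUTO_INCREMENT NOT NULL,\n"
        "    content LONGTEXT,\n"
        f"    vector VECTOR({dimensions}) NOT NULL,\n"
        "    PRIMARY KEY(id)\n"
        ");"
    )

ddl = create_table_ddl("pandas_docs", 384)
print(ddl)
```

The resulting string could be wrapped in `text(...)` and executed in the same way as the query above.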

Next, we'll write the Pandas DataFrame to the table:

Python

df.to_sql(
    "pandas_docs",
    con = db_connection,
    if_exists = "append",
    index = False,
    chunksize = 1000
)

Example output:

Plain Text

6

We'll now create an index to match the one we created in the previous article:

Python

query = text("""
ALTER TABLE pandas_docs ADD VECTOR INDEX (vector)
    INDEX_OPTIONS '{
          "metric_type": "EUCLIDEAN_DISTANCE"
     }';
""")

with db_connection.connect() as conn:
    conn.execute(query)

We'll now ask a question, as follows:

Python

prompt = "What animals are llamas related to?"

response = ollama.embeddings(
    prompt = prompt,
    model = "all-minilm"
)

embedding = response["embedding"]
embedding_array = np.array(embedding).astype(np.float32)

query = text("""
SELECT content
FROM pandas_docs
ORDER BY vector <-> :embedding_array ASC
LIMIT 1;
""")

with db_connection.connect() as conn:
    results = conn.execute(query, {"embedding_array": embedding_array})
    row = results.fetchone()

data = row[0]
print(data)

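We convert the prompt to an embedding, make sure the embedding is in 32-bit format, and then run the SQL query. The query uses the infix notation <-> for Euclidean distance, so the row whose vector is closest to the prompt's embedding is returned first.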
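To illustrate what the ORDER BY ... LIMIT 1 query computes, the following sketch performs the same nearest-neighbor search in NumPy over made-up two-dimensional vectors. It is a brute-force comparison for illustration only, not a description of how SingleStore evaluates the query, and the function name `nearest_by_euclidean` is hypothetical.

```python
import numpy as np

def nearest_by_euclidean(query_vector, vectors):
    """Return (index, distance) of the row in `vectors` closest to `query_vector`."""
    # Euclidean distance from the query to every stored vector.
    distances = np.linalg.norm(vectors - query_vector, axis=1)
    best = int(np.argmin(distances))
    return best, float(distances[best])

# Made-up vectors standing in for the stored embeddings.
stored = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]], dtype=np.float32)
query_vector = np.array([0.9, 1.2], dtype=np.float32)

index, distance = nearest_by_euclidean(query_vector, stored)
print(index, distance)
```

Here the third stored vector is the closest one, which corresponds to the single row that the SQL query returns.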

Example output:

Plain Text

Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels

Next, we'll use the LLM, as follows:

Python

output = ollama.generate(
    model = "llama2",
    prompt = f"Using this data: {data}. Respond to this prompt: {prompt}"
)

print(output["response"])

Example output:

Plain Text

Llamas are members of the camelid family, which means they are closely related to other animals such as:

1. Vicuñas: Vicuñas are small, wild camelids that are native to South America. They are known for their soft, woolly coats and are considered an endangered species due to habitat loss and poaching.
2. Camels: Camels are large, even-toed ungulates that are native to Africa and the Middle East. They are known for their distinctive humps on their backs, which store water and food for long periods of time.

Both llamas and vicuñas are classified as members of the family Camelidae, while camels are classified as belonging to the family Dromedaryae. Despite their differences in size and habitat, all three species share many similarities in terms of their physical characteristics and behavior.
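The prompt sent to the LLM is assembled by hand from the retrieved row and the original question. As a small sketch of that step, the hypothetical helper `build_prompt` below uses the same template and, as an assumption beyond the article's single-row query, joins several retrieved snippets if more than one is supplied.

```python
def build_prompt(contexts, question):
    """Combine retrieved text snippets and a question into one prompt string."""
    data = " ".join(contexts)
    return f"Using this data: {data}. Respond to this prompt: {question}"

# One retrieved snippet, matching the row returned by the query in this article.
contexts = [
    "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
]
question = "What animals are llamas related to?"

full_prompt = build_prompt(contexts, question)
print(full_prompt)
```

The resulting string could be passed as the `prompt` argument to `ollama.generate`.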

Summary

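In this article, we replicated the steps from the previous article and achieved similar results. However, we had to write a series of SQL statements and manage several steps that LangChain would have handled for us. Maintaining this code base over the long term may also take more time and cost than the LangChain solution.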

Using LangChain instead of writing custom code for database access offers several advantages, such as efficiency, scalability, and reliability.

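LangChain provides a library of prebuilt modules for database interaction, which reduces development time and effort. Developers can use these modules to implement various database operations quickly without starting from scratch.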

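LangChain abstracts away many of the complexities of database management, so developers can focus on high-level tasks rather than low-level implementation details. This improves productivity and time to market for database-driven applications.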

LangChain has a large, active, and growing developer community, is available on GitHub, and comes with extensive documentation and examples.

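In summary, LangChain offers developers a powerful, efficient, and reliable platform that lets them focus on business problems using a higher level of abstraction, rather than reinventing the wheel with custom code. Comparing the example in this article with the one in the previous article makes the benefits clear.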

Source:
https://dzone.com/articles/ollama-plus-singlestore-minus-langchain