In the previous article, we used Ollama with LangChain and SingleStore. LangChain provided an efficient and compact solution for integrating Ollama with SingleStore. But what if we remove LangChain? In this article, we'll demonstrate an example of using Ollama with SingleStore without relying on LangChain. We'll see that, although we can achieve the same results described in the previous article, the quantity of code increases, and we have to manage more of the plumbing that LangChain would normally handle.
The notebook file used in this article is available on GitHub.
Introduction
From the previous article, we'll follow the same steps to set up our test environment, as described in these sections:
- Use a virtual machine or venv.
- Create a SingleStoreDB Cloud account
- Use Ollama Demo Group as the Workspace Group Name and ollama-demo as the Workspace Name. Make a note of the password and host name. Temporarily allow access from anywhere by configuring the firewall under Ollama Demo Group > Firewall.
- Create a database
CREATE DATABASE IF NOT EXISTS ollama_demo;
- Install Jupyter
pip install notebook
- Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
- Environment variable
export SINGLESTOREDB_URL="admin:<password>@<host>:3306/ollama_demo"
Replace <password> and <host> with the values for your environment.
- Launch Jupyter
jupyter notebook
Fill Out the Notebook
First, some packages:
!pip install ollama numpy pandas sqlalchemy-singlestoredb --quiet --no-warn-script-location
Next, we'll import some libraries:
import ollama
import os
import numpy as np
import pandas as pd
from sqlalchemy import create_engine, text
We'll use all-minilm to create our embeddings (45 MB at the time of writing):
ollama.pull("all-minilm")
Example output:
{'status': 'success'}
We'll use llama2 (3.8 GB at the time of writing) as our language model:
ollama.pull("llama2")
Example output:
{'status': 'success'}
Next, we'll use the example text from the Ollama website:
documents = [
"Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
"Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
"Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
"Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
"Llamas are vegetarians and have very efficient digestive systems",
"Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old"
]
df_data = []
for doc in documents:
    response = ollama.embeddings(
        model = "all-minilm",
        prompt = doc
    )
    embedding = response["embedding"]
    embedding_array = np.array(embedding).astype(np.float32)
    df_data.append({"content": doc, "vector": embedding_array})
df = pd.DataFrame(df_data)
dimensions = len(df.at[0, "vector"])
We set the embedding model to all-minilm and loop through each document to build the content for a Pandas DataFrame. Additionally, we convert the embeddings to 32-bit format, as this is SingleStore's default for the VECTOR data type. Finally, we determine the number of embedding dimensions for the first document in the Pandas DataFrame.
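The 32-bit conversion can be illustrated in isolation with plain NumPy (the vector values below are placeholders, not real embeddings):

```python
import numpy as np

# Ollama returns embeddings as Python floats (64-bit by default);
# SingleStore's VECTOR type stores 32-bit (F32) elements by default.
embedding = [0.12, -0.34, 0.56]  # placeholder values
embedding_array = np.array(embedding).astype(np.float32)

print(embedding_array.dtype)     # float32
print(embedding_array.itemsize)  # 4 bytes per element, half of float64's 8
```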
Next, we'll create a connection to our SingleStore instance:
connection_url = "singlestoredb://" + os.environ.get("SINGLESTOREDB_URL")
db_connection = create_engine(connection_url)
Now, we'll create a table with a vector column, using the dimensions we determined earlier:
query = text("""
CREATE TABLE IF NOT EXISTS pandas_docs (
    id BIGINT AUTO_INCREMENT NOT NULL,
    content LONGTEXT,
    vector VECTOR(:dimensions) NOT NULL,
    PRIMARY KEY(id)
);
""")
with db_connection.connect() as conn:
    conn.execute(query, {"dimensions": dimensions})
Now, we'll write the Pandas DataFrame to the table:
df.to_sql(
    "pandas_docs",
    con = db_connection,
    if_exists = "append",
    index = False,
    chunksize = 1000
)
Example output:
6
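The 6 above is the return value of to_sql: the number of rows written. This behavior can be seen in isolation with an in-memory SQLite database (used here only as a stand-in, since SQLite has no VECTOR type):

```python
import pandas as pd
from sqlalchemy import create_engine

# In-memory SQLite engine standing in for the SingleStore connection.
engine = create_engine("sqlite://")

df = pd.DataFrame({"content": ["doc one", "doc two", "doc three"]})

# to_sql returns the number of rows affected by the write.
rows_written = df.to_sql(
    "demo_docs",
    con = engine,
    if_exists = "append",
    index = False
)
print(rows_written)  # 3
```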
Next, we'll create an index to match the one we created in the previous article:
query = text("""
ALTER TABLE pandas_docs ADD VECTOR INDEX (vector)
INDEX_OPTIONS '{
"metric_type": "EUCLIDEAN_DISTANCE"
}';
""")
with db_connection.connect() as conn:
    conn.execute(query)
Now, we'll ask a question, as follows:
prompt = "What animals are llamas related to?"
response = ollama.embeddings(
    prompt = prompt,
    model = "all-minilm"
)
embedding = response["embedding"]
embedding_array = np.array(embedding).astype(np.float32)
query = text("""
SELECT content
FROM pandas_docs
ORDER BY vector <-> :embedding_array ASC
LIMIT 1;
""")
with db_connection.connect() as conn:
    results = conn.execute(query, {"embedding_array": embedding_array})
    row = results.fetchone()

data = row[0]
print(data)
We convert the prompt to an embedding, ensure the embedding is converted to 32-bit format, and then execute a SQL query that uses the infix notation <-> for Euclidean distance.
Example output:
Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels
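For reference, the distance that the <-> operator computes on the SingleStore side is plain Euclidean (L2) distance. A minimal NumPy equivalent, with placeholder vectors standing in for a stored embedding and a query embedding:

```python
import numpy as np

# Placeholder 32-bit vectors: a stored document embedding and a query embedding.
doc_vec = np.array([0.1, 0.2, 0.3], dtype=np.float32)
query_vec = np.array([0.2, 0.0, 0.3], dtype=np.float32)

# Euclidean (L2) distance: sqrt(sum((a - b)^2)).
# ORDER BY vector <-> :embedding_array ASC sorts rows by this value,
# so LIMIT 1 returns the closest document.
distance = np.linalg.norm(doc_vec - query_vec)
print(distance)
```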
Next, we'll use the LLM, as follows:
output = ollama.generate(
    model = "llama2",
    prompt = f"Using this data: {data}. Respond to this prompt: {prompt}"
)
print(output["response"])
Example output:
Llamas are members of the camelid family, which means they are closely related to other animals such as:
1. Vicuñas: Vicuñas are small, wild camelids that are native to South America. They are known for their soft, woolly coats and are considered an endangered species due to habitat loss and poaching.
2. Camels: Camels are large, even-toed ungulates that are native to Africa and the Middle East. They are known for their distinctive humps on their backs, which store water and food for long periods of time.
Both llamas and vicuñas are classified as members of the family Camelidae, while camels are classified as belonging to the family Dromedaryae. Despite their differences in size and habitat, all three species share many similarities in terms of their physical characteristics and behavior.
Summary
In this article, we replicated the steps we followed in the previous article and achieved similar results. However, we had to write a series of SQL statements and manage several steps that LangChain would have handled for us. Additionally, there may be more time and cost involved in maintaining the code base long-term compared to the LangChain solution.
Using LangChain instead of writing custom code for database access offers several advantages, such as efficiency, scalability, and reliability.
LangChain offers a library of prebuilt modules for database interaction, reducing development time and effort. Developers can use these modules to quickly implement various database operations without starting from scratch.
LangChain abstracts away many of the complexities involved in database management, allowing developers to focus on high-level tasks rather than low-level implementation details. This improves productivity and time-to-market for database-driven applications.
LangChain has a large, active, and growing developer community, is available on GitHub, and provides extensive documentation and examples.
In summary, LangChain offers developers a powerful, efficient, and reliable platform for building database-driven applications, enabling them to focus on business problems using higher-level abstractions rather than reinventing the wheel with custom code. Comparing the example in this article with the example in the previous article, we can see the benefits.
Source:
https://dzone.com/articles/ollama-plus-singlestore-minus-langchain