Ollama + SingleStore - LangChain = :-(


In a previous article, we used Ollama with LangChain and SingleStore. LangChain provided an efficient and compact solution for integrating Ollama with SingleStore. But what if we removed LangChain? In this article, we'll work through an example of using Ollama with SingleStore without relying on LangChain. We'll see that we can achieve the same results as in the previous article, but the amount of code increases, and we have to manage more of the plumbing that LangChain normally handles.

The notebook file used in this article is available on GitHub.

Introduction

We'll follow the same steps as the previous article to set up our test environment, as described in these sections:

Introduction
- Use a virtual machine or venv.
Create a SingleStoreDB Cloud account
- Use Ollama Demo Group as the Workspace Group Name and ollama-demo as the Workspace Name. Make a note of the password and host name. Temporarily allow access from anywhere by configuring the firewall under Ollama Demo Group > Firewall.
Create a database

CREATE DATABASE IF NOT EXISTS ollama_demo;

Install Jupyter
- pip install notebook
Install Ollama
- curl -fsSL https://ollama.com/install.sh | sh
Environment variable
- export SINGLESTOREDB_URL="admin:<password>@<host>:3306/ollama_demo"
- Replace <password> and <host> with the values for your environment.
Launch Jupyter
- jupyter notebook

Fill Out the Notebook

First, some packages:

Shell

!pip install ollama numpy pandas sqlalchemy-singlestoredb --quiet --no-warn-script-location

Next, we'll import some libraries:

Python

import ollama
import os
import numpy as np
import pandas as pd
from sqlalchemy import create_engine, text

We'll create embeddings using all-minilm (45 MB at the time of writing):

Python

ollama.pull("all-minilm")

Plain Text

{'status': 'success'}

For our LLM, we'll use llama2 (3.8 GB at the time of writing):

Python

ollama.pull("llama2")

Example output:

Plain Text

{'status': 'success'}

Next, we'll use the example text from the Ollama website:

Python

documents = [
    "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels",
    "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands",
    "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall",
    "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight",
    "Llamas are vegetarians and have very efficient digestive systems",
    "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old"
]

df_data = []

for doc in documents:
    response = ollama.embeddings(
        model = "all-minilm",
        prompt = doc
    )
    embedding = response["embedding"]
    embedding_array = np.array(embedding).astype(np.float32)
    df_data.append({"content": doc, "vector": embedding_array})

df = pd.DataFrame(df_data)

dimensions = len(df.at[0, "vector"])

We use all-minilm to create the embeddings and iterate over each document to build up the content for a Pandas DataFrame. We also convert the embeddings to a 32-bit format, as this is SingleStore's default for the VECTOR data type. Lastly, we determine the number of embedding dimensions from the first document in the Pandas DataFrame.
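The dimension count above is read from the first row only. As a minimal sketch, the check below confirms that every vector has the same length before the table is created. The helper names `to_float32_vector` and `common_dimension` are hypothetical, and the sample values are made-up stand-ins rather than real all-minilm output.

```python
import numpy as np

def to_float32_vector(embedding):
    """Convert a sequence of Python floats to a 1-D float32 NumPy array."""
    return np.asarray(embedding, dtype=np.float32)

def common_dimension(vectors):
    """Return the length shared by all vectors, or raise if they disagree."""
    lengths = {len(v) for v in vectors}
    if len(lengths) != 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(lengths)}")
    return lengths.pop()

# Stand-in embeddings (illustrative values only)
sample_vectors = [
    to_float32_vector([0.1, 0.2, 0.3]),
    to_float32_vector([0.4, 0.5, 0.6]),
]
sample_dimensions = common_dimension(sample_vectors)
```

In the notebook, the same check could be applied as `common_dimension(df["vector"])`.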

Next, we'll create a connection to our SingleStore instance:

Python

connection_url = "singlestoredb://" + os.environ.get("SINGLESTOREDB_URL")

db_connection = create_engine(connection_url)
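If SINGLESTOREDB_URL is not set, `os.environ.get` returns `None` and the string concatenation fails with a `TypeError` that does not name the cause. A small guard, sketched here with a hypothetical helper called `build_connection_url`, can fail earlier with a clearer message:

```python
import os

def build_connection_url(env_var="SINGLESTOREDB_URL", scheme="singlestoredb://"):
    """Return the SQLAlchemy URL, raising a clear error if the variable is missing."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set; export it before launching Jupyter")
    return scheme + value
```

The result can then be passed to `create_engine` in place of the concatenated string.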

We'll now create a table with a vector column, using the dimensions that we determined earlier:

Python

query = text("""
CREATE TABLE IF NOT EXISTS pandas_docs (
    id BIGINT AUTO_INCREMENT NOT NULL,
    content LONGTEXT,
    vector VECTOR(:dimensions) NOT NULL,
    PRIMARY KEY(id)
);
""")

with db_connection.connect() as conn:
    conn.execute(query, {"dimensions": dimensions})

We'll now write the Pandas DataFrame to the table:

Python

df.to_sql(
    "pandas_docs",
    con = db_connection,
    if_exists = "append",
    index = False,
    chunksize = 1000
)

Example output:

Plain Text

6
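The write above relies on the database driver to serialize the NumPy arrays in the vector column. Should that cause problems in a particular environment, one possible alternative is to store each vector as a JSON array string, a textual form that SingleStore's documentation shows for VECTOR values. This is only a sketch under that assumption, and `vector_to_json` is a hypothetical helper name.

```python
import json
import numpy as np

def vector_to_json(vector):
    """Serialize a numeric vector as a JSON array string such as '[0.5, 0.25]'."""
    # Convert NumPy scalars to plain Python floats so json.dumps accepts them
    return json.dumps([float(x) for x in vector])

# Illustrative values only
sample_json = vector_to_json(np.array([0.5, 0.25], dtype=np.float32))
```

With a DataFrame, this could be applied as `df["vector"].apply(vector_to_json)` before calling `to_sql`.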

We'll now create an index to match the one we created in the previous article:

Python

query = text("""
ALTER TABLE pandas_docs ADD VECTOR INDEX (vector)
    INDEX_OPTIONS '{
          "metric_type": "EUCLIDEAN_DISTANCE"
     }';
""")

with db_connection.connect() as conn:
    conn.execute(query)

We'll now ask a question, as follows:

Python

prompt = "What animals are llamas related to?"

response = ollama.embeddings(
    prompt = prompt,
    model = "all-minilm"
)

embedding = response["embedding"]
embedding_array = np.array(embedding).astype(np.float32)

query = text("""
SELECT content
FROM pandas_docs
ORDER BY vector <-> :embedding_array ASC
LIMIT 1;
""")

with db_connection.connect() as conn:
    results = conn.execute(query, {"embedding_array": embedding_array})
    row = results.fetchone()

data = row[0]
print(data)

We convert the prompt to embeddings, ensure that the embeddings are converted to a 32-bit format, and then execute the SQL query, which uses the infix notation <-> for Euclidean distance.
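To make the ordering in the query concrete, here is a minimal local sketch of the same idea in NumPy: compute the Euclidean distance from the query vector to each stored vector and keep the closest. The function names and toy vectors are illustrative assumptions; in the notebook, the database performs this ranking.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean (L2) distance between two vectors, computed in float32."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.linalg.norm(a - b))

def nearest(query_vector, rows, k=1):
    """rows is a list of (content, vector) pairs; return the k closest contents, nearest first."""
    ranked = sorted(rows, key=lambda row: euclidean_distance(query_vector, row[1]))
    return [content for content, _ in ranked[:k]]

# Toy two-dimensional data (illustrative values only)
toy_rows = [
    ("far", [3.0, 4.0]),
    ("near", [1.0, 0.0]),
]
closest = nearest([0.0, 0.0], toy_rows)
```

Sorting ascending and taking the first k entries mirrors `ORDER BY ... ASC LIMIT 1` in the SQL query.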

Example output:

Plain Text

Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels

Next, we'll use the LLM, as follows:

Python

output = ollama.generate(
    model = "llama2",
    prompt = f"Using this data: {data}. Respond to this prompt: {prompt}"
)

print(output["response"])

Example output:

Plain Text

Llamas are members of the camelid family, which means they are closely related to other animals such as:

1. Vicuñas: Vicuñas are small, wild camelids that are native to South America. They are known for their soft, woolly coats and are considered an endangered species due to habitat loss and poaching.
2. Camels: Camels are large, even-toed ungulates that are native to Africa and the Middle East. They are known for their distinctive humps on their backs, which store water and food for long periods of time.

Both llamas and vicuñas are classified as members of the family Camelidae, while camels are classified as belonging to the family Dromedaryae. Despite their differences in size and habitat, all three species share many similarities in terms of their physical characteristics and behavior.
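The prompt sent to the LLM is assembled by hand from the single retrieved row. If the query's LIMIT were raised to return several rows, the same template could be filled from a list. The sketch below assumes a hypothetical helper named `build_prompt` and reuses the template wording from the `ollama.generate` call.

```python
def build_prompt(question, contexts):
    """Fill the 'Using this data: ... Respond to this prompt: ...' template from retrieved rows."""
    # Skip empty entries and trim stray whitespace before joining
    data = " ".join(c.strip() for c in contexts if c and c.strip())
    return f"Using this data: {data}. Respond to this prompt: {question}"

# Illustrative usage with placeholder strings
sample_prompt = build_prompt("Q?", ["A", " B "])
```

With one retrieved row, `build_prompt(prompt, [data])` yields the same string as the f-string used in the notebook.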

Summary

In this article, we replicated the steps from the previous article and achieved similar results. However, we had to write a series of SQL statements and manage several steps that LangChain would have handled for us. In addition, maintaining this code base over the long term may take more time and cost more than the LangChain solution.

Using LangChain instead of writing custom code for database access provides several advantages, such as efficiency, scalability, and reliability.

LangChain offers a library of prebuilt modules for interacting with databases, which reduces development time and effort. Developers can use these modules to implement various database operations quickly, without starting from scratch.

LangChain abstracts many of the complexities of database management, allowing developers to focus on high-level tasks rather than low-level implementation details. This improves productivity and time to market for database-driven applications.

LangChain has a large, active, and growing community of developers. It is available on GitHub and provides extensive documentation and many examples.

In conclusion, LangChain offers developers a powerful, efficient, and reliable platform for building database applications, allowing them to focus on business problems using high-level abstractions rather than ...

Source:
https://dzone.com/articles/ollama-plus-singlestore-minus-langchain