Pandas
-
Ollama + SingleStore – LangChain = :-(
In a previous article, we used Ollama with LangChain and SingleStore. LangChain provided an efficient and compact solution for integrating Ollama with SingleStore. However, what if we were to remove LangChain? In this article, we’ll demonstrate an example of using Ollama with SingleStore without relying on LangChain. We’ll see that while we can achieve the same results described in the previous article, the amount of code increases, requiring us to manage more of the plumbing that LangChain normally handles. The…
-
Data Warehouse for Data Science: Adopting Arrow Flight SQL for 10X Data Transfer
For years, JDBC and ODBC have been the commonly adopted norms for database interaction. Now, as we gaze upon the vast expanse of the data realm, the rise of data science and data lake analytics brings bigger and bigger datasets. Correspondingly, we need faster and faster data reading and transmission, so we have started to look for better answers than JDBC and ODBC. Thus, we included the Arrow Flight SQL protocol in Apache Doris 2.1, which speeds up data transfer by tens of times. …
-
Performing Advanced Facebook Event Data Analysis With a Vector Database
In today’s digital age, professionals across all industries must stay updated with upcoming events, conferences, and workshops. However, efficiently finding events that align with one’s interests amidst the vast ocean of online information presents a significant challenge. This blog introduces an innovative solution to this challenge: a comprehensive application designed to scrape event data from Facebook and analyze the scraped data using MyScale. While MyScale is commonly associated with the RAG tech stack or used as a vector database, its…
-
Harnessing Generative AI in Data Analysis With PandasAI
Ever wish your data would analyze itself? Well, we are one step closer to that day. PandasAI is a groundbreaking tool that significantly streamlines data analysis. This Python library expands on the capabilities of the popular Pandas library with the help of generative AI, making automated yet sophisticated data analysis a reality. By applying generative models like OpenAI’s GPT-3.5, PandasAI can understand and respond to human-like queries, execute complex data manipulations, and generate visual representations. Data analysis and AI combine…
-
ClickHouse: Windows Functions From Scratch
ClickHouse is a highly scalable, column-oriented, relational database management system optimized for analytical workloads. It is an open-source product developed by Yandex, a search engine company. One of the key features of ClickHouse is its support for advanced analytical functions, including window functions. Window functions were first introduced in the late 1990s by SQL Server, and since then, have become a standard feature in many relational databases, including ClickHouse. Today, window functions are an indispensable tool for data analysts and…
-
How To Use Python pandas dropna() to Drop NA Values from DataFrame
Introduction: In this tutorial, you’ll learn how to use the pandas DataFrame dropna() function. NA values are “Not Available”. This can apply to Null, None, pandas.NaT, or numpy.nan. Using dropna() will drop the rows or columns that contain these values, which is useful when you want to work with only valid data. By default, this function returns a new DataFrame and the source DataFrame remains unchanged. This tutorial was verified with Python 3.10.9, pandas 1.5.2, and NumPy 1.24.1. Syntax: dropna() takes the following…
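As a rough illustration of the dropna() behavior described in this excerpt, here is a minimal sketch. The DataFrame contents and column names (`id`, `name`, `score`, `when`) are made up for the example and do not come from the tutorial itself.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mixing None, numpy.nan, and pandas.NaT.
df = pd.DataFrame({
    "id": [1, 2, 3],                       # no missing values
    "name": ["a", "b", None],              # None counts as NA
    "score": [1.0, np.nan, 3.0],           # numpy.nan counts as NA
    "when": [pd.Timestamp("2023-01-01"), pd.NaT, pd.NaT],  # NaT counts as NA
})

# Default: drop every row that contains at least one NA value.
rows_dropped = df.dropna()

# axis="columns": drop every column that contains at least one NA value.
cols_dropped = df.dropna(axis="columns")

# dropna() returns a new DataFrame, so df itself still holds all rows and columns.
print(rows_dropped)
print(cols_dropped)
print(df.shape)
```

Only the first row is complete, so `rows_dropped` keeps that row alone, while `cols_dropped` keeps only the `id` column. The original `df` is left unchanged in both cases.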
-
Parquet Data Filtering With Pandas
When it comes to filtering data from Parquet files using pandas, several strategies can be employed. While it’s widely recognized that partitioning data can significantly enhance the efficiency of filtering operations, there are additional methods to optimize the performance of querying data stored in Parquet files. Partitioning is just one of the options. Filtering by Partitioned Fields: As previously mentioned, this approach is not only the most familiar but also typically the most impactful in terms of performance optimization. The…
-
Visualize Real-Time Data With Python, Dash, and RisingWave
Real-time data is important for businesses to make quick decisions. Seeing this data visually can help make decisions even faster. We can create visual representations of data using various data apps or dashboards. Dash is an open-source Python library that provides a wide range of built-in components for creating interactive charts, graphs, tables, and other UI elements. RisingWave is a SQL-based streaming database for real-time data processing. This article will explain how to use Python, Dash, and RisingWave to make…
-
How To Use Pandas and Matplotlib To Perform EDA In Python
Exploratory Data Analysis (EDA) is an essential step in any data science project, as it allows us to understand the data, detect patterns, and identify potential issues. In this article, we will explore how to use two popular Python libraries, Pandas and Matplotlib, to perform EDA. Pandas is a powerful library for data manipulation and analysis, while Matplotlib is a versatile library for data visualization. We will cover the basics of loading data into a pandas DataFrame, exploring the data…