In the advanced text-to-SQL notebook, there was a code snippet that converts a CSV table and stores it in a vector index. Based on that code, can I save a pandas DataFrame to a vector index?
Code snippet for CSV to vector store:
I want to do the same with a pandas DataFrame called 'df'.
Obtained from the LlamaIndex notebook: https://www.youtube.com/watch?v=L1o1VPVfbb0
To index a Pandas DataFrame in a similar manner to the provided code snippet for a SQL table, you can follow these steps. The key is to iterate over the DataFrame rows, create a TextNode object for each row, and build a VectorStoreIndex from those nodes. Here is an example of how you can achieve this:
import pandas as pd
from llama_index.core.schema import TextNode
from llama_index.core.storage import StorageContext
from llama_index.core import VectorStoreIndex
import os
from pathlib import Path

def index_dataframe(df: pd.DataFrame, index_dir: str = "dataframe_index_dir") -> VectorStoreIndex:
    """Index a Pandas DataFrame."""
    if not Path(index_dir).exists():
        os.makedirs(index_dir)
    # Convert DataFrame rows to TextNode objects
    nodes = [TextNode(text=str(row.to_dict())) for _, row in df.iterrows()]
    # Create a VectorStoreIndex with the nodes; the embedding model is
    # taken from the global llama_index.core.Settings
    index = VectorStoreIndex(nodes)
    # Save the index
    index.set_index_id("vector_index")
    index.storage_context.persist(index_dir)
    return index

# Example usage
data = {
    'city': ['Toronto', 'Tokyo', 'Berlin'],
    'population': [2930000, 13960000, 3645000]
}
df = pd.DataFrame(data)
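vector_index = index_dataframe(df)
In this code, each DataFrame row is converted to a TextNode, the nodes are indexed in a VectorStoreIndex, and the index is persisted to index_dir.
This approach allows you to index a Pandas DataFrame similarly to how you would index rows from a SQL table [1].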
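The row conversion above uses str(row.to_dict()), which puts Python dict syntax (braces and quotes) into the node text. Below is a minimal sketch of one alternative formatting, assuming the rows are available as plain dicts, for example from df.to_dict(orient="records"). The helper name row_to_text is hypothetical and is not part of LlamaIndex or pandas.

```python
from typing import Any


def row_to_text(row: dict[str, Any]) -> str:
    """Render one row as 'column: value' pairs joined by commas."""
    return ", ".join(f"{column}: {value}" for column, value in row.items())


# Same sample data as the example above, written as records. With pandas,
# this list would come from df.to_dict(orient="records").
records = [
    {"city": "Toronto", "population": 2930000},
    {"city": "Tokyo", "population": 13960000},
    {"city": "Berlin", "population": 3645000},
]

texts = [row_to_text(record) for record in records]
print(texts[0])  # city: Toronto, population: 2930000
```

Each resulting string could then be passed as TextNode(text=...) in place of str(row.to_dict()). Whether this improves retrieval depends on the embedding model and the data, so treat it as an option to try rather than a required change.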
To fix the AttributeError: 'str' object has no attribute 'text' error, you need to ensure that the documents being passed to VectorStoreIndex are in the correct format and not just plain strings. Here is an example of how you can load documents correctly and index them:
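A minimal sketch of the wrapping step, assuming the error comes from passing raw Python strings where objects with a .text attribute are expected. The names raw_texts and the fallback Document class are hypothetical; the fallback exists only so the snippet runs without LlamaIndex installed. With the library present, Document is imported from llama_index.core.

```python
from dataclasses import dataclass

try:
    from llama_index.core import Document
except ImportError:
    # Hypothetical stand-in so this sketch runs without LlamaIndex installed.
    @dataclass
    class Document:
        text: str


# Plain strings like these have no .text attribute, which is what the
# AttributeError is complaining about.
raw_texts = [
    "Toronto has a population of 2930000.",
    "Tokyo has a population of 13960000.",
]

# Wrap each string so that every item exposes a .text attribute.
documents = [Document(text=t) for t in raw_texts]

for doc in documents:
    print(doc.text)

# With LlamaIndex installed and an embedding model configured, the wrapped
# documents can then be indexed:
#   from llama_index.core import VectorStoreIndex
#   index = VectorStoreIndex.from_documents(documents)
```

If the strings originate from DataFrame rows, the same wrapping applies: build one Document (or TextNode) per row rather than passing the row text directly.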