Issue with Feed Address Recognition & Querying Local Text Document Vectors in Pinecone #1

Open
norsizu opened this issue Mar 21, 2023 · 2 comments


norsizu commented Mar 21, 2023

This project has proven incredibly helpful and effective. However, I am running into an issue with the feed address recognition feature. When I try to import my desired feed address, the system fails to recognize it, and I receive the following error message: "TypeError: expected string or bytes-like object, got 'NoneType'".

To work around this problem and keep using the project, I was wondering whether there is an alternative way to achieve my goal. Specifically, I would like to upload a batch of text documents from my local machine to Pinecone as vectors and then query them. Could you please provide guidance on how to do this, or suggest any other solution that may address the issue?

Thank you for your time and support.

Owner

gbaeke commented Mar 21, 2023

Hi, glad you find it useful. Feedparser only works with RSS feeds. To use local text documents, you can iterate over the files in a folder, read each one, and create an embedding for it. Something like:

import os
import openai

# Set up the OpenAI API
openai.api_key = "your_api_key"

# Define a function to get embeddings for given text
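def get_embedding(text):
    # Assumption: pre-1.0 openai package and the text-embedding-ada-002 model;
    # swap in whichever embedding call and model you actually use
    response = openai.Embedding.create(input=text, model="text-embedding-ada-002")
    return response["data"][0]["embedding"]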

# Define the folder path containing the text files
folder_path = "path/to/your/text/files"

# Iterate over all text files in the folder
for file_name in os.listdir(folder_path):
    if file_name.endswith(".txt"):
        file_path = os.path.join(folder_path, file_name)
        
        # Read the content of the file
        with open(file_path, "r", encoding="utf-8") as file:
            content = file.read()
        
        # Get the embedding using the OpenAI API
        embedding = get_embedding(content)
        
        # upload to pinecone
        ...
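
For the upload and query steps, here is a rough sketch rather than a definitive implementation. The helper names (`iter_records`, `upsert_folder`, `query_index`) are made up for illustration, as is the choice of using the file name as the vector ID and storing a 500-character text preview as metadata. It assumes `index` is a Pinecone index handle exposing `upsert(vectors=...)` and `query(vector=..., top_k=..., include_metadata=...)`, and that `embed` is a function such as `get_embedding` from the snippet above.

```python
import os
from typing import Callable, Iterator, List, Sequence, Tuple

# One record per document: (id, vector values, metadata)
Record = Tuple[str, List[float], dict]


def iter_records(folder_path: str,
                 embed: Callable[[str], Sequence[float]]) -> Iterator[Record]:
    """Yield one (id, vector, metadata) tuple per .txt file in folder_path."""
    for file_name in sorted(os.listdir(folder_path)):
        if not file_name.endswith(".txt"):
            continue
        with open(os.path.join(folder_path, file_name), "r", encoding="utf-8") as f:
            content = f.read()
        # Illustrative choices: file name as ID, short text preview as metadata
        metadata = {"source": file_name, "text": content[:500]}
        yield file_name, list(embed(content)), metadata


def upsert_folder(index, folder_path: str,
                  embed: Callable[[str], Sequence[float]],
                  batch_size: int = 100) -> int:
    """Embed every .txt file and upsert the vectors in batches.

    Returns the number of vectors sent to the index.
    """
    batch: List[Record] = []
    total = 0
    for record in iter_records(folder_path, embed):
        batch.append(record)
        if len(batch) == batch_size:
            index.upsert(vectors=batch)
            total += len(batch)
            batch = []
    if batch:  # send the final, partially filled batch
        index.upsert(vectors=batch)
        total += len(batch)
    return total


def query_index(index, question: str,
                embed: Callable[[str], Sequence[float]],
                top_k: int = 3):
    """Embed the question and return the index's nearest matches."""
    return index.query(vector=list(embed(question)),
                       top_k=top_k,
                       include_metadata=True)
```

With a real index this would be called as `upsert_folder(index, folder_path, get_embedding)` followed by `query_index(index, "your question", get_embedding)`. The index dimension has to match the embedding model (1536 for text-embedding-ada-002), and very long files may need to be split into chunks before embedding.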

Author

norsizu commented Mar 21, 2023

Thank you so much, I will give this method a try.
