Commit

Fix GitHub ingest: separate files created properly (#76)
* modified 'document' which is uploaded to supabase

* delete comments

---------

Co-authored-by: Kastan Day <[email protected]>
star-nox and KastanDay authored Sep 6, 2023
1 parent 32abf39 commit 6a1a38f
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions ai_ta_backend/vector_database.py
@@ -810,14 +810,14 @@ def split_and_upload(self, texts: List[str], metadatas: List[Dict[str, Any]]):
       "embedding": embeddings_dict[context.page_content]
   } for context in contexts]

-  document = {
-      "course_name": contexts[0].metadata.get('course_name'),
-      "s3_path": contexts[0].metadata.get('s3_path'),
-      "readable_filename": contexts[0].metadata.get('readable_filename'),
-      "url": contexts[0].metadata.get('url'),
-      "base_url": contexts[0].metadata.get('base_url'),
-      "contexts": contexts_for_supa,
-  }
+  document = [{
+      "course_name": context.metadata.get('course_name'),
+      "s3_path": context.metadata.get('s3_path'),
+      "readable_filename": context.metadata.get('readable_filename'),
+      "url": context.metadata.get('url'),
+      "base_url": context.metadata.get('base_url'),
+      "contexts": contexts_for_supa, # should ideally be just one context but getting JSON serialization error when I do that
+  } for context in contexts]

   count = self.supabase_client.table(os.getenv('NEW_NEW_NEWNEW_MATERIALS_SUPABASE_TABLE')).insert(document).execute() # type: ignore
   print("successful END OF split_and_upload")

1 comment on commit 6a1a38f

@KastanDay

@star-nox this commit introduced SEVERE bugs. It re-uploaded each PDF once for every page in it, so a 400-page PDF had 400 entries. Fixed here: 4f6a863
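
To illustrate the row-count problem described in this comment, here is a minimal sketch. It is not the fix from 4f6a863, whose contents are not shown on this page. `Context`, `rows_per_context`, and `rows_per_file` are hypothetical names introduced only for this example, and grouping by `s3_path` is an assumption about what identifies a single file.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Context:
    """Hypothetical stand-in for the chunk objects used in split_and_upload."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def rows_per_context(contexts: List[Context]) -> List[Dict[str, Any]]:
    """Same shape as the comprehension in this commit:
    one row per chunk, and every row carries the full list of chunks."""
    all_chunks = [{"text": c.page_content} for c in contexts]
    return [{
        "s3_path": c.metadata.get("s3_path"),
        "readable_filename": c.metadata.get("readable_filename"),
        "contexts": all_chunks,
    } for c in contexts]


def rows_per_file(contexts: List[Context]) -> List[Dict[str, Any]]:
    """Assumed alternative: one row per distinct s3_path,
    holding only the chunks that belong to that file."""
    grouped: Dict[Any, Dict[str, Any]] = {}
    for c in contexts:
        key = c.metadata.get("s3_path")
        row = grouped.setdefault(key, {
            "s3_path": key,
            "readable_filename": c.metadata.get("readable_filename"),
            "contexts": [],
        })
        row["contexts"].append({"text": c.page_content})
    return list(grouped.values())


# A 400-page PDF split into one chunk per page (illustrative data only).
pages = [
    Context(f"page {i}", {"s3_path": "demo/book.pdf", "readable_filename": "book.pdf"})
    for i in range(400)
]
print(len(rows_per_context(pages)))  # 400 rows, one per page
print(len(rows_per_file(pages)))     # 1 row for the whole file
```

Grouping also keeps the behavior the commit title asks for: when a batch such as a GitHub ingest contains chunks from several files, each distinct `s3_path` still gets its own row.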
