@rickryan

Description

This PR fixes a bug in how the grounding module adds documents. Documents implement llama-index's Document interface, in which the .text attribute is read-only. The add_document method attempted to write that attribute directly with the result of the sanitize_raw_string function, causing a runtime error. For example, the error occurs when adding mental faculties in the investment_firm.ipynb example:

grounding_faculty = FilesAndWebGroundingFaculty(folders_paths=["../data/grounding_examples/finance"])

Changes Made

  1. Created a new method, clone_document_with_new_text, that returns a new document carrying all the metadata and attributes of a source document but with different text.
  2. Created a new document, sanitized_document, from the result of sanitize_raw_string.
  3. Removed the line that added all the "new" documents to the list of documents at once.
  4. Added each document to the list of documents after it is sanitized.
  5. Added imports for Document and SimpleWebPageReader.
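
The clone-then-append approach in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: ReadOnlyTextDocument is a hypothetical stand-in for a document class whose .text has no setter, sanitize_text is a simplified placeholder for sanitize_raw_string, and add_documents is an assumed helper name.

```python
from typing import Any, Dict, List, Optional


class ReadOnlyTextDocument:
    """Hypothetical stand-in for a document whose .text is read-only."""

    def __init__(self, text: str, metadata: Optional[Dict[str, Any]] = None,
                 doc_id: str = "") -> None:
        self._text = text
        self.metadata = dict(metadata or {})  # copy so clones do not share state
        self.doc_id = doc_id

    @property
    def text(self) -> str:
        # No setter is defined, so `doc.text = ...` raises AttributeError.
        return self._text


def sanitize_text(raw: str) -> str:
    """Simplified placeholder for sanitize_raw_string: drops non-printable
    characters while keeping newlines and tabs."""
    return "".join(ch for ch in raw if ch.isprintable() or ch in "\n\t")


def clone_document_with_new_text(source: ReadOnlyTextDocument,
                                 new_text: str) -> ReadOnlyTextDocument:
    """Build a new document with the source's metadata and id but new text."""
    return ReadOnlyTextDocument(text=new_text, metadata=source.metadata,
                                doc_id=source.doc_id)


def add_documents(store: List[ReadOnlyTextDocument],
                  new_documents: List[ReadOnlyTextDocument]) -> None:
    """Sanitize each incoming document and append the sanitized clone."""
    for doc in new_documents:
        sanitized_document = clone_document_with_new_text(
            doc, sanitize_text(doc.text))
        store.append(sanitized_document)
```

Appending inside the loop means only sanitized clones ever reach the document list, which is why the earlier bulk append of the unsanitized documents is no longer needed.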

Testing

Tested grounding for files in a directory and also for a web page within the context of the investment_firm example.

Fixed a bug in the grounding module that caused a crash whenever directories or web URLs were added.
@rickryan

@microsoft-github-policy-service agree

@paulosalem paulosalem changed the base branch from main to development August 5, 2025 14:23