Add Incremental Indexing v1 #1318

Merged
merged 32 commits into from
Oct 30, 2024

@AlonsoGuevara (Contributor) commented Oct 24, 2024

Description

Add Incremental Indexing support.

Usage

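Drop the new files into the existing input folder, then update settings.yaml so that storage points at the existing index artifacts and update_index_storage points at the location where the new index will be created: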

storage:
  type: file
  base_dir: "path/to/existing/artifacts"

update_index_storage:
  type: file
  base_dir: "path/to/create/new/index"
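
An update run has to work out which files in the input folder are new relative to the existing index; the commit list for this PR mentions calculating new and deleted inputs on update. The sketch below illustrates that idea only. `InputDelta` and `compute_input_delta` are hypothetical names, not the functions added by this PR, and comparing documents by file name is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputDelta:
    """Hypothetical container for the difference between two input sets."""

    new_inputs: list[str]
    deleted_inputs: list[str]


def compute_input_delta(indexed: list[str], current: list[str]) -> InputDelta:
    """Compare already-indexed file names with those now in the input folder.

    Assumption: a document is identified by its file name alone.
    """
    indexed_set = set(indexed)
    current_set = set(current)
    return InputDelta(
        # Present in the input folder but not yet in the index.
        new_inputs=sorted(current_set - indexed_set),
        # Present in the index but no longer in the input folder.
        deleted_inputs=sorted(indexed_set - current_set),
    )


delta = compute_input_delta(
    indexed=["a.txt", "b.txt"],
    current=["b.txt", "c.txt"],
)
print(delta.new_inputs)      # ['c.txt']
print(delta.deleted_inputs)  # ['a.txt']
```

Only the files in `new_inputs` would need to go through the indexing workflows again; how the resulting artifacts are merged with the existing ones is outside the scope of this sketch.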
  

Related Issues

#741

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

AlonsoGuevara and others added 27 commits August 30, 2024 15:26
* Add cli and api entrypoints for update index

* Semver

* Update docs

* Run tests on feature branch main

* Better /main handling in tests
* Calculate new inputs and deleted inputs on update

* Semver

* Clear ruff checks

* Fix pyright

* Fix PyRight

* Ruff again
* Remove extraneous param

* Add community report mocking assertions

* Collapse primary report generation

* Collapse embeddings

* Format

* Semver

* Remove extraneous check

* Move option set
* Collapse create_base_entity_graph

* Format/typing

* Semver

* Fix smoke tests

* Simplify assignment
* Collapse entity summarize

* Semver
* Set up base assertions

* Replace entity_extract

* Finish collapsing workflow

* Semver

* Update smoke tests
* Update final text units

* Format

* Address comments
* Add naive community merge using time period

* formatting

* Query fixes

* Add descriptions from merged_entities

* Add summarization and embeddings

* Use iso format

* Ruff

* Pyright and smoke tests

* Pyright

* Pyright

* Update parquet for verb tests

* Fix smoke tests

* Remove sorting

* Update smoke tests

* Smoke tests

* Smoke tests

* Updated verb test to account for latest changes on covariates
* Add config for incremental index + Bug fixes

* Ruff

* Fix smoke tests
AlonsoGuevara requested review from a team as code owners October 24, 2024 23:06
andresmor-ms previously approved these changes Oct 24, 2024
AlonsoGuevara merged commit 7235c6f into main Oct 30, 2024
25 checks passed
AlonsoGuevara deleted the incremental_indexing/main branch October 30, 2024 17:59
@shamalgithub

Thank you for this feature update!

Based on the instructions, I believe all one has to do is add a new file to the 'input' folder and update the settings.yaml file with the new output file name.

Looking at the issue, the proposed fix suggests that a graphrag.append command would be added to work with incremental indexing, but that does not seem to be present.

I might be missing something here. Can someone kindly point out the correct way to do incremental indexing?


4 participants