@rbrugaro
Collaborator

Several enhancements:

  1. Trim the input to TGI when the community partial answers do not all fit in the input context of the final answer generation. Previously this caused an error.
  2. In the previous implementation, clustering and summary extraction were done at query time, resulting in a slow time to first token. Moved clustering and full-dataset summarization to the dataprep step. In addition to storing the graph in Neo4j, we now also store the entity_info and the community_summaries so the retriever code can fetch them with Cypher queries.
  3. Fix gateway input.
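
As a rough illustration of item 1, here is a minimal sketch of trimming partial answers to a token budget before the final generation call. This is not the code from this PR: `trim_partial_answers`, `whitespace_token_count`, and the `reserved_tokens` parameter are hypothetical names, and the whitespace counter is only a stand-in for the model's real tokenizer.

```python
from typing import Callable, List


def whitespace_token_count(text: str) -> int:
    """Stand-in token counter (assumption); a real setup would use the model's tokenizer."""
    return len(text.split())


def trim_partial_answers(
    partial_answers: List[str],
    max_input_tokens: int,
    reserved_tokens: int = 0,
    count_tokens: Callable[[str], int] = whitespace_token_count,
) -> List[str]:
    """Keep partial answers, in order, until the token budget is used up.

    reserved_tokens is a hypothetical allowance for the prompt template and query.
    """
    budget = max_input_tokens - reserved_tokens
    kept: List[str] = []
    used = 0
    for answer in partial_answers:
        cost = count_tokens(answer)
        if used + cost > budget:
            break  # drop the remaining answers instead of overflowing the context
        kept.append(answer)
        used += cost
    return kept


if __name__ == "__main__":
    answers = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
    print(trim_partial_answers(answers, max_input_tokens=6, reserved_tokens=1))
```

The sketch assumes the answers are already ordered by relevance, so dropping from the tail loses the least useful ones first; passing a tokenizer-backed `count_tokens` would make the budget match what the serving endpoint actually enforces.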

@codecov

codecov bot commented Nov 12, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| comps/cores/mega/gateway.py | 0.00% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| comps/cores/mega/gateway.py | 29.82% <0.00%> (ø) |

@rbrugaro rbrugaro added the WIP label Nov 12, 2024
@rbrugaro rbrugaro marked this pull request as ready for review November 12, 2024 22:00
@rbrugaro rbrugaro added this to the v1.1 milestone Nov 14, 2024
@ashahba ashahba removed the WIP label Nov 14, 2024
Collaborator

@ashahba ashahba left a comment

Thanks @rbrugaro for this PR.
My feedback should be pretty straightforward.
Most of my questions about timeouts are just to bring them to your attention; it's your call to decide on the final default values.

Signed-off-by: Rita Brugarolas <[email protected]>
Signed-off-by: Rita Brugarolas <[email protected]>
Collaborator

@ashahba ashahba left a comment

LGTM!

@ashahba ashahba merged commit 0163ea6 into opea-project:main Nov 14, 2024
madison-evans pushed a commit to SAPD-Intel/GenAIComps that referenced this pull request May 12, 2025
… store in DB (opea-project#893)

* trim input to TGI, moved clustering and summarization to dataprep and DB store

Signed-off-by: Rita Brugarolas <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed inspect_db causing error in precommit

Signed-off-by: Rita Brugarolas <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add HF token to dataprep container because tokenizer is used now

Signed-off-by: Rita Brugarolas <[email protected]>

* updated READMEs to reflect latest changes

Signed-off-by: Rita Brugarolas <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* bug fix: all files are ingested and the graph extracted first, followed by one cluster call for the full graph in the database

Signed-off-by: Rita Brugarolas <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update README based on fix for multifile

Signed-off-by: Rita Brugarolas <[email protected]>

* Changes to make graphrag ui work

Signed-off-by: theresa <[email protected]>

* fix bug: build communities once, at the end of ingestion

Signed-off-by: Rita Brugarolas <[email protected]>

* minor fixes

Signed-off-by: Rita Brugarolas <[email protected]>

* README fixes

Signed-off-by: Rita Brugarolas <[email protected]>

---------

Signed-off-by: Rita Brugarolas <[email protected]>
Signed-off-by: theresa <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: theresa <[email protected]>
3 participants