Dev to staging (#1372)

prakriti-solankey · kartikpersistent · kaustubh-darekar · web-flow · commit 37b45b4b8171 · 2025-08-25T11:04:46.000+05:30
* Read only mode for unauthenticated users (#1046)

* llm name changes

* build fix

* default mode fix

* ragas model names update

* lint fixes

* Chunk Entities API condition

* added the tooltip for unsupported llms for ragas metric loading

* removed unused imports

* multimode fix when we get error response

* mode changes for score display

* fix: Fixed the details state handling between multiple chats
feature: Added the warning banner if the selected llm model is not supported for ragas evaluation

* Fix: Entity Mode Width Fix

* diffbot fix for async (#797)

* Minor changes (#798)

* added config variable for default diffbot chat model

* fulltext index creation is skipped when the labels are empty

* entity vector change

* added optional to communities for entity mode

* updated the entity query

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* New: Added the supported llm models for ragas evaluation

* Fix: Communities Tab is displayed based on communities length

* added the conversation download button (#800)

* model name correction

* chatmode switch mode fix

* Add API payload GCP logging (#805)

* Adding Links to get neighboring nodes (#796)

* addition of link

* added neighbours query

* implemented with driver

* updated the query

* communitiesInfo name change

* communities.tsx removed

* api integration

* modified response

* entities change

* chunk and communities

* chunk space removal

* added element id to chunks

* loading on click

* format changes

* added file name for Document node

* chat token cut off model name update

* icon change

* duplicate sources removal

* Entity change

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* added error message for doc retriever (#807)

* copy row (#803)

* copy row

* column for copy

* column copy

* Ragas Evaluation For Multi Modes (#806)

* Updated models for ragas eval

* context utilization metrics removed

* updated supported llms for ragas

* removed context utilization

* Implemented Parallel API

* multi api calls error resolved

* MultiMode Metrics

* Fix: Metric Evaluation For Single Mode

* multi modes ragas evaluation

* api payload changes

* metric api output format changed

* multi mode ragas changes

* removed pre process dataset

* api response changes

* Multimode metrics api integration

* nan error for no answer resolved

* QA integration changes

---------

Co-authored-by: kaustubh-darekar <kaustubh_darekar@persistent.com>

* lint fixes

* fix: multimode metrics state handling
fix: lint fixes

* fix: Multimode metrics mode change state issue
fix: chunk list style issue

* fix: list style fix

* Correct TYPO mistake

* added new env for ragas embedding model

* Props name changes (#811)

* Props name changes

* removed the accesstoken from row on copy action

* props changes for dropzone component

* graph view changes

---------

Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>

* test

* view graph

* node count and relationship count update fix

* sourceUrl Fix

* empty string "" fix: to keep the default values, leave the value blank instead of ""

* prop changes

* props changes

* retry condition update for failed files (#820)

* Chat modes name changes (#815)

* Props name changes

* removed the accesstoken from row on copy action

* updated chat mode names

* Chat Modes Name Changes

* lint fixes

* using readable format in UI

* removal of size to avoid console warning

* key add

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>

* Youtube transcript fix with proxy (#822)

* update script for async func

* ragas changes for graph retrieval mode. context added in api output (#825)

* Remove extract latency from logging and add LIMIT in duplicate nodes

* Document updates (#828)

* document updated with ragas evaluation information

* formatting changes

* chatbot api documentation updated

* api details added in document

* function name changed for drop create vector index api

* Update README.md

* updated api structure in docs (#827)

* Update backend_docs.adoc

* 821 llm model listing (#823)

* added logic for document filters

* LLM models

* message change

* link added

* removed the text

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* Exclude session label node from duplicate nodes list

* Added the tooltip for disabled llm option (#835)

* node size changes

* mode removal of rows check

* formatting

* Exclude __Entity__ node label from duplicate node list

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* fixed the youtube link

* Security header and GZIPMiddleware (#847)

* Added security header all API

* Add GZipMiddleware

* Chunk Text Details (#850)

* Community title added

* Added api for fetching chunk text details

* output format changed for chunk text

* integrated the service layer for chunkdata

* added the chunks

* formatting output of llm call for title generation

* formatting llm output for title generation

* added flex row

* Changes related to pagination of fetch chunk api

* Integrated the pagination

* page changes error resolved for fetch chunk api

* for get neighbours api , community title added in properties

* moving community title related changes to separate branch

* Removed Query module from fastapi import statement

* icon changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Communities Id to Title (#851)

* Staging to main (#735)

* Dev (#537)

* format fixes and graph schema indication fix

* Update README.md

* added chat modes variable in env updated the readme

* spell fix

* added the chat mode in env table

* added the logos

* fixed the overflow issues

* removed the extra fix

* Fixed specific scenario  "when the text from schema closes it should reopen the previous modal"

* readme changes

* removed dev console logs

* added new retrieval query (#533)

* format fixes and tab rendering fix

* fixed the setting modal reopen issue

---------

Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>

* disabled the submit button on loading

* Deduplication tab (#566)

* de-duplication API

* Update De-Duplicate query

* created the Deduplication tab

* added the API service

* added the removable tags for similar nodes in deduplication tab

* Integrate Tag

* added GraphLabel

* added loader state

* added the merge service

* integrated the merge API

* Merge Query issue fixed

* Auto refresh the duplicate nodes after merging operation

* added the description for de duplication

* reset on merging

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Update frontend_docs.adoc (#538)

* Update frontend_docs.adoc

* doc update

* Images

* Images folder change

* Images folder change

* test image

* Update frontend_docs.adoc

* image change

* Update frontend_docs.adoc

* Update frontend_docs.adoc

* added the Graph Mode SS

* added the Query SS

* Update frontend_docs.adoc

* conflicts fix

* conflict fix

* Update frontend_docs.adoc

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* updated langchain versions (#565)

* Update the De-Duplication query

* Node relationship id type none issue (#547)

* de-duplication API

* Update De-Duplicate query

* Issue fixed Nodes,Relationship Id and Type None or Blank

* added the tooltips

* type fix

* Unnecessary import

* added score threshold and added some error handling (#571)

* Update requirements.txt

* Tooltip and other UI fixes (#572)

* Staging To Main (#495)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* Dev (#433)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNan check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* GCS local upload fixed issue and delete file from GCS after processing and failed or cancelled

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement foldername hashed in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructuredfiles to gcs (#455)

* Modified chunk query (#454)

* Added libre office for fixing error -- "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

* DEV to STAGING (#461)

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* recent merges

* pdf deletion due to out of diskspace

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* Convert is_cancelled value from string to bool

* added the default page size

* Issue fixed Processed chunked as 0 when file re-process again

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* offset in chunks (#389)

* page number in gcs loader (#393)

* added youtube timestamps (#392)

* chat pop up button (#387)

* expand

* minimize-icon

* css changes

* chat history

* chatbot wider Side Nav

* expand icon

* chatbot UI

* Delete

* merge fixes

* code suggestions

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* chunks create before extraction using is_pre_process variable (#383)

* chunks create before extraction using is_pre_process variable

* Return total pages for Model

* update requirement.txt

* total pages on upload API

* added the Confirmation Dialog

* added the selected files into the confirmation modal

* format and lint fixes

* added the stop watch image

* fileselection on alert dialog

* Add timeout in docker for gunicorn workers

* Add cancel icon to info popup (#384)

* Info Modal Changes

* css changes

* recent merges

* Integration_qa test (#375)

* Test IntegrationQA added

* update test cases

* update test

* update node count assertions

* test changes

* update changes

* modification test

* Code refactor test cases

* Handle allowedlist issue in test

* test changes

* update test

* test case execution

* test chatbot updates

* test case update file

* added file

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* fixed status blank issue

* Rendering the file name instead of link for gcs and s3 sources in the info modal

* added the default page size

* Convert is_cancelled value from string to bool

* Fixed issue: processed chunks shown as 0 when a file is re-processed

* Youtube timestamps (#386)

* Wikipedia source to accept all valid urls

* wikipedia url to support multiple languages

* integrated wiki language param for extract api

* Youtube video timestamps

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* groq llm integration backend (#286)

* groq llm integration backend

* groq and description in node properties

* added groq in options

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Save Total Pages in DB

* Added total Pages

* file selection when we didn't select anything from Main table

* added the danger icon only for large files

* added the overflow for more files and file selection for all new files

* moved the interface to types

* added the icon according to the source

* set total page for wiki and youtube

* h3 heading

* merge

* updated the alert on the basis of total pages

* deleted chunks

* polling based on total pages

* isNaN check

* large file based on file size for s3 and gcs

* file source in server side event

* time calculation based on chunks for gcs and s3

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* fixed the layout issue

* Populate graph schema (#399)

* create new endpoint populate_graph_schema and update the query for getting labels from DB

* Added main.py changes

* conditionally-including-the-gcs-login-flow-in-gcs-as-source (#396)

* added the condition

* removed llms

* Fixed issue : Remove extra unused param

* get emb only if used (#278)

* Chatbot chunks (#402)

* Added file name to the content  sent to LLM

* added chunk text in the response

* increased the docs parts sent to llm

* Modified graph query

* markdown rendering

* youtube starttime

* icons

* offset changes

* removed the files due to codespace space issue

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user (#405)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* fixed css issue

* fixed status blank issue

* Modified response when no docs are retrieved (#413)

* Fixed env/docker-compose for local deployments + README doc (#410)

* Fixed env/docker-compose for local deployments + README doc

* wrong place for ENV in README

* by default, removed langsmith + fixed knn score string to float

* by default, removed langsmith + fixed knn score string to float

* Fixed strings in docker-compose env

* Added requirements (neo4j 5.15 or later, APOC, and instructions for Neo4j Desktop)

* Missed the TIME_PER_PAGE env, was causing NaN issue in the approx time processing notification. fixed that

* Support for all unstructured files (#401)

* all unstructured files

* responsiveness

* added file type

* added the extensions

* spell mistake

* ppt file changes

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Settings modal to support generating the labels from the llm by using text given by user with checkbox (#415)

* added the json

* added schema from text dialog

* integrated the schemaAPI

* added the alert

* resize fixes

* Extract schema using direct ChatOpenAI API and Chain

* integrated the checkbox for schema to text dialog

* Update SettingModal.tsx

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* gcs file content read via storage client (#417)

* gcs file content read via storage client

* added the access token to the file state

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* pypdf2 to read files from gcs (#420)

* 407 remove driver from frontend (#416)

* removed driver

* removed API

* connecting to database on page refresh

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Css handling of info modal and Tooltips (#418)

* css change

* toolTips

* Sidebar Tooltips

* copy to clip

* css change

* added image types

* added gcs

* type fix

* docker changes

* speech

* added the tooltip for dropzone sources

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed retrieval bugs (#421)

* yarn format fixes

* changed the delete message

* added the cancel  button

* changed the message on tooltip

* added space

* UI fixes

* tooltip for setting

* updated req

* wikipedia URL input (#424)

* accept only wikipedia links

* added wikipedia link

* added wikilink regex

* wikipedia single url only

* changed the alert message

* wording change

* pushed validation state persist error

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* speech and copy (#422)

* speech and copy

* startTime

* added chunk properties

* tooltips

---------

Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* Fixed issue for out of range in KNN API

* solved conflicts

* conflict solved

* Remove logging info from update KNN API

* tooltip changes

* format and lint fixes

* responsiveness changes

* Fixed issue for total pages GCS, S3

* UI polishing (#428)

* button and tooltip changes

* checking validation on change

* settings module populate fix

* format fixes

* opening the modal after auth success

* removed the limit

* added the scrollbar for dropdowns

* speech state (#426)

* speech state

* Button Details changes

* delete wording change

* Total pages in buckets (#431)

* page number NA for buckets

* added N/A for gcs and s3 pages

* total pages for gcs

* remove unwanted logger

---------

Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>

* removed the max width

* Update FileTable.tsx

* Update the docker file

* Modified prompt (#438)

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* rendering Fix

* Local file upload gcs (#442)

* Upload file to GCS

* Fixed GCS local upload issue; delete file from GCS after processing, failure, or cancellation

* Add life cycle rule on uploaded bucket

* pdf upload local and gcs bucket check

* delete files when processed and extract changes

---------

Co-authored-by: Pravesh Kumar <121786590+praveshkumar1988@users.noreply.github.com>

* Modified chat length and entities used (#443)

* metadata for unstructured files (#446)

* Unstructured file metadata (#447)

* metadata for unstructured files

* sleep in gcs upload

* updated

* icons added to chunks (#435)

* icons added to chunks

* info modal icons

* fixed gcs status message issue

* added if check for failed count

* Null issue Fixed from backend for upload API and graph_document when model name mismatch

* added word break issue

* Added neo4j-rust-ext

* processing time estimation based on bytes

* File extension upper case fixed, File delete from GCS or local based on env variable.

* timer per byte

* Update Dockerfile

* Adding sort rows on the table (#451)

* Gcs upload folder hashed (#453)

* implement hashed folder name in GCS bucket upload

* Raise exception if invalid model selected

* folder name for gcs upload

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* upload all unstructured files to gcs (#455)

* Modified chunk query (#454)

* Added LibreOffice to fix the error "soffice command was not found. Please install libreoffice
on your system and try again.

- Install instructions: https://www.libreoffice.org/get-help/install-howto/
- Mac: https://formulae.brew.sh/cask/libreoffice
- Debian: https://wiki.debian.org/LibreOffice"

* Fix the PARTIAL CONTENT issue

* File-table no data found (#456)

* 'file-table'

* review comment

* Llm format change (#459)

* changed the llm models format to lowercase

* added the error message

* llm model changes

* format fixes

* removed unused import

* added the capitalize method

* delete files from merged_file_path only if source is local file

---------

Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>

* commented total page code (#460)

* format fixes

* removed the disabled check on dropdown

* Large file env

---------

Co-authored-by: abhishekkumar-27 <164544129+abhishekkumar-27@users.noreply.github.com>
Co-authored-by: kartikpersistent <101251502+kartikpersistent@users.noreply.github.com>
Co-authored-by: aashipandya <156318202+aashipandya@users.noreply.github.com>
Co-authored-by: vasanthasaikalluri <165021735+vasanthasaikalluri@users.noreply.github.com>
Co-authored-by: Prakriti Solankey <156313631+prakriti-solankey@users.noreply.github.com>
Co-authored-by: Ajay Meena <meenajy1996@gmail.com>
Co-authored-by: Morgan Senechal <morgan@neo4j.com>
Co-authored-by: karanchellani <142801957+karanchellani@users.noreply.github.com>

* DEV to STAGING (#462)

* added upload api

* changed the dropzone error message

* Dev to staging (#466)

diff --git a/backend/Dockerfile b/backend/Dockerfile
@@ -6,20 +6,33 @@ EXPOSE 8000
 RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        libmagic1 \
-       libgl1-mesa-glx \
+       libgl1 \
+       libglx-mesa0 \
        libreoffice \
        cmake \
        poppler-utils \
        tesseract-ocr && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
+
 # Set LD_LIBRARY_PATH
 ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
 # Copy requirements file and install Python dependencies
 COPY requirements.txt constraints.txt /code/
 # --no-cache-dir --upgrade 
 RUN pip install --upgrade pip
 RUN pip install -r requirements.txt -c constraints.txt
+
+RUN python -c "from transformers import AutoTokenizer, AutoModel; \
+   name='sentence-transformers/all-MiniLM-L6-v2'; \
+   tok=AutoTokenizer.from_pretrained(name); \
+   mod=AutoModel.from_pretrained(name); \
+   tok.save_pretrained('./local_model'); \
+   mod.save_pretrained('./local_model')"
+
+RUN python -m nltk.downloader -d /usr/local/nltk_data punkt
+RUN python -m nltk.downloader -d /usr/local/nltk_data averaged_perceptron_tagger
+
 # Copy application code
 COPY . /code
 # Set command
diff --git a/backend/requirements.txt b/backend/requirements.txt
@@ -53,12 +53,12 @@ wrapt==1.17.2
 yarl==1.20.1
 youtube-transcript-api==1.1.0
 zipp==3.23.0
-sentence-transformers==4.1.0
+sentence-transformers==5.0.0
 google-cloud-logging==3.12.1
 pypandoc==1.15
 graphdatascience==1.15.1
 Secweb==1.18.1
-ragas==0.2.15
+ragas==0.3.1
 rouge_score==0.1.2
 langchain-neo4j==0.4.0
 pypandoc-binary==1.15
diff --git a/backend/src/QA_integration.py b/backend/src/QA_integration.py
@@ -38,7 +38,6 @@
 load_dotenv() 
 
 EMBEDDING_MODEL = os.getenv('EMBEDDING_MODEL')
-EMBEDDING_FUNCTION , _ = load_embedding_model(EMBEDDING_MODEL) 
 
 class SessionChatHistory:
     history_dict = {}
@@ -304,6 +303,7 @@ def create_document_retriever_chain(llm, retriever):
         output_parser = StrOutputParser()
 
         splitter = TokenTextSplitter(chunk_size=CHAT_DOC_SPLIT_SIZE, chunk_overlap=0)
+        EMBEDDING_FUNCTION , _ = load_embedding_model(EMBEDDING_MODEL) 
         embeddings_filter = EmbeddingsFilter(
             embeddings=EMBEDDING_FUNCTION,
             similarity_threshold=CHAT_EMBEDDING_FILTER_SCORE_THRESHOLD
@@ -344,7 +344,7 @@ def initialize_neo4j_vector(graph, chat_mode_settings):
 
         if not retrieval_query or not index_name:
             raise ValueError("Required settings 'retrieval_query' or 'index_name' are missing.")
-
+        EMBEDDING_FUNCTION , _ = load_embedding_model(EMBEDDING_MODEL) 
         if keyword_index:
             neo_db = Neo4jVector.from_existing_graph(
                 embedding=EMBEDDING_FUNCTION,
diff --git a/backend/src/document_sources/gcs_bucket.py b/backend/src/document_sources/gcs_bucket.py
@@ -46,46 +46,58 @@ def gcs_loader_func(file_path):
    return loader
 
 def get_documents_from_gcs(gcs_project_id, gcs_bucket_name, gcs_bucket_folder, gcs_blob_filename, access_token=None):
-  nltk.download('punkt')
-  nltk.download('averaged_perceptron_tagger')
-  if gcs_bucket_folder is not None and gcs_bucket_folder.strip()!="":
-    if gcs_bucket_folder.endswith('/'):
-      blob_name = gcs_bucket_folder+gcs_blob_filename
+
+  nltk.data.path.append("/usr/local/nltk_data")
+  nltk.data.path.append(os.path.expanduser("~/.nltk_data"))
+  try:
+      nltk.data.find("tokenizers/punkt")
+  except LookupError:
+    for resource in ["punkt", "averaged_perceptron_tagger"]:
+      try:
+          nltk.data.find(f"tokenizers/{resource}" if resource == "punkt" else f"taggers/{resource}")
+      except LookupError:
+          logging.info(f"Downloading NLTK resource: {resource}")
+          nltk.download(resource, download_dir=os.path.expanduser("~/.nltk_data"))
+          
+    logging.info("NLTK resources downloaded successfully.")
+    if gcs_bucket_folder is not None and gcs_bucket_folder.strip()!="":
+      if gcs_bucket_folder.endswith('/'):
+        blob_name = gcs_bucket_folder+gcs_blob_filename
+      else:
+        blob_name = gcs_bucket_folder+'/'+gcs_blob_filename 
     else:
-      blob_name = gcs_bucket_folder+'/'+gcs_blob_filename 
-  else:
-      blob_name = gcs_blob_filename  
-  
-  logging.info(f"GCS project_id : {gcs_project_id}")  
- 
-  if access_token is None:
-    storage_client = storage.Client(project=gcs_project_id)
-    bucket = storage_client.bucket(gcs_bucket_name)
-    blob = bucket.blob(blob_name) 
+        blob_name = gcs_blob_filename  
     
-    if blob.exists():
-        loader = GCSFileLoader(project_name=gcs_project_id, bucket=gcs_bucket_name, blob=blob_name, loader_func=gcs_loader_func)
-        pages = loader.load() 
-    else :
-      raise LLMGraphBuilderException('File does not exist, Please re-upload the file and try again.')
-  else:
-    creds= Credentials(access_token)
-    storage_client = storage.Client(project=gcs_project_id, credentials=creds)
+    logging.info(f"GCS project_id : {gcs_project_id}")  
   
-    bucket = storage_client.bucket(gcs_bucket_name)
-    blob = bucket.blob(blob_name) 
-    if blob.exists():
-      content = blob.download_as_bytes()
-      pdf_file = io.BytesIO(content)
-      pdf_reader = PdfReader(pdf_file)
-      # Extract text from all pages
-      text = ""
-      for page in pdf_reader.pages:
-            text += page.extract_text()
-      pages = [Document(page_content = text)]
+    if access_token is None:
+      storage_client = storage.Client(project=gcs_project_id)
+      bucket = storage_client.bucket(gcs_bucket_name)
+      blob = bucket.blob(blob_name) 
+      
+      if blob.exists():
+          loader = GCSFileLoader(project_name=gcs_project_id, bucket=gcs_bucket_name, blob=blob_name, loader_func=gcs_loader_func)
+          pages = loader.load() 
+      else :
+        raise LLMGraphBuilderException('File does not exist, Please re-upload the file and try again.')
     else:
-      raise LLMGraphBuilderException(f'File Not Found in GCS bucket - {gcs_bucket_name}')
-  return gcs_blob_filename, pages
+      creds= Credentials(access_token)
+      storage_client = storage.Client(project=gcs_project_id, credentials=creds)
+    
+      bucket = storage_client.bucket(gcs_bucket_name)
+      blob = bucket.blob(blob_name) 
+      if blob.exists():
+        content = blob.download_as_bytes()
+        pdf_file = io.BytesIO(content)
+        pdf_reader = PdfReader(pdf_file)
+        # Extract text from all pages
+        text = ""
+        for page in pdf_reader.pages:
+              text += page.extract_text()
+        pages = [Document(page_content = text)]
+      else:
+        raise LLMGraphBuilderException(f'File Not Found in GCS bucket - {gcs_bucket_name}')
+    return gcs_blob_filename, pages
 
 def upload_file_to_gcs(file_chunk, chunk_number, original_file_name, bucket_name, folder_name_sha1_hashed):
   try:
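In the `get_documents_from_gcs` hunk above, the blob-name and download logic is indented under the `except LookupError:` branch, so it only runs when `punkt` is missing. One way to keep the resource check separate is a small helper that the function calls before its unindented body. This is a minimal sketch under stated assumptions: `ensure_nltk_resources`, its `find` and `download` parameters, and the `RESOURCE_PATHS` mapping are hypothetical names, not part of the repository. In the real module, `find` would be `nltk.data.find` and `download` would be `nltk.download`.

```python
import os
from typing import Callable, Iterable, List

# Hypothetical mapping from resource name to its NLTK lookup path.
RESOURCE_PATHS = {
    "punkt": "tokenizers/punkt",
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",
}


def ensure_nltk_resources(
    resources: Iterable[str],
    find: Callable[[str], object],
    download: Callable[..., object],
    download_dir: str = os.path.expanduser("~/.nltk_data"),
) -> List[str]:
    """Download only the resources that `find` cannot locate.

    Returns the names that had to be downloaded.
    """
    downloaded = []
    for resource in resources:
        try:
            # `find` is expected to raise LookupError when the resource is absent.
            find(RESOURCE_PATHS[resource])
        except LookupError:
            download(resource, download_dir=download_dir)
            downloaded.append(resource)
    return downloaded


if __name__ == "__main__":
    # Stand-ins for nltk.data.find / nltk.download so the sketch runs without NLTK.
    cached = {"tokenizers/punkt"}

    def fake_find(path):
        if path not in cached:
            raise LookupError(path)
        return path

    def fake_download(name, download_dir):
        cached.add(RESOURCE_PATHS[name])
        return True

    print(ensure_nltk_resources(RESOURCE_PATHS, fake_find, fake_download))
```

With NLTK installed, the call would be `ensure_nltk_resources(RESOURCE_PATHS, nltk.data.find, nltk.download)`, after which the rest of the function can run at its original indentation whether or not anything was downloaded.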
diff --git a/backend/src/make_relationships.py b/backend/src/make_relationships.py
@@ -12,7 +12,6 @@
 logging.basicConfig(format='%(asctime)s - %(message)s',level='INFO')
 
 EMBEDDING_MODEL = os.getenv('EMBEDDING_MODEL')
-EMBEDDING_FUNCTION , EMBEDDING_DIMENSION = load_embedding_model(EMBEDDING_MODEL)
 
 def merge_relationship_between_chunk_and_entites(graph: Neo4jGraph, graph_documents_chunk_chunk_Id : list):
     batch_data = []
@@ -41,7 +40,7 @@ def merge_relationship_between_chunk_and_entites(graph: Neo4jGraph, graph_docume
 def create_chunk_embeddings(graph, chunkId_chunkDoc_list, file_name):
     isEmbedding = os.getenv('IS_EMBEDDING')
     
-    embeddings, dimension = EMBEDDING_FUNCTION , EMBEDDING_DIMENSION
+    embeddings, dimension = load_embedding_model(EMBEDDING_MODEL)
     logging.info(f'embedding model:{embeddings} and dimesion:{dimension}')
     data_for_query = []
     logging.info(f"update embedding and vector index for chunks")
@@ -161,6 +160,7 @@ def create_chunk_vector_index(graph):
         vector_index_query = "SHOW INDEXES YIELD name, type, labelsOrTypes, properties WHERE name = 'vector' AND type = 'VECTOR' AND 'Chunk' IN labelsOrTypes AND 'embedding' IN properties RETURN name"
         vector_index = execute_graph_query(graph,vector_index_query)
         if not vector_index:
+            EMBEDDING_FUNCTION , EMBEDDING_DIMENSION = load_embedding_model(EMBEDDING_MODEL)
             vector_store = Neo4jVector(embedding=EMBEDDING_FUNCTION,
                                     graph=graph,
                                     node_label="Chunk", 
diff --git a/backend/src/ragas_eval.py b/backend/src/ragas_eval.py
@@ -13,7 +13,13 @@
 from ragas.embeddings import LangchainEmbeddingsWrapper
 import nltk
 
-nltk.download('punkt')
+nltk.data.path.append("/usr/local/nltk_data")
+nltk.data.path.append(os.path.expanduser("~/.nltk_data"))
+try:
+    nltk.data.find("tokenizers/punkt")
+except LookupError:
+    nltk.download("punkt", download_dir=os.path.expanduser("~/.nltk_data"))
+    
 load_dotenv()
 
 EMBEDDING_MODEL = os.getenv("RAGAS_EMBEDDING_MODEL")
diff --git a/backend/src/shared/common_fn.py b/backend/src/shared/common_fn.py
@@ -1,7 +1,10 @@
 import hashlib
+import os
+from transformers import AutoTokenizer, AutoModel
+from langchain_huggingface import HuggingFaceEmbeddings
+from threading import Lock
 import logging
 from src.document_sources.youtube import create_youtube_url
-from langchain_huggingface import HuggingFaceEmbeddings
 from langchain_google_vertexai import VertexAIEmbeddings
 from langchain_openai import OpenAIEmbeddings
 from langchain_neo4j import Neo4jGraph
@@ -16,6 +19,40 @@
 import boto3
 from langchain_community.embeddings import BedrockEmbeddings
 
+MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"
+MODEL_PATH = "./local_model"
+_lock = Lock()
+_embedding_instance = None
+
+def ensure_sentence_transformer_model_downloaded():
+   if os.path.isdir(MODEL_PATH):
+       print("Model already downloaded at:", MODEL_PATH)
+       return
+   else:
+       print("Downloading model to:", MODEL_PATH)
+       tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
+       model = AutoModel.from_pretrained(MODEL_NAME)
+       tokenizer.save_pretrained(MODEL_PATH)
+       model.save_pretrained(MODEL_PATH)
+   print("Model downloaded and saved.")
+
+def get_local_sentence_transformer_embedding():
+   """
+   Lazy, threadsafe singleton. Caller does not need to worry about
+   import-time initialization or download race.
+   """
+   global _embedding_instance
+   if _embedding_instance is not None:
+       return _embedding_instance
+   with _lock:
+       if _embedding_instance is not None:
+           return _embedding_instance
+       # Ensure model is present before instantiating
+       ensure_sentence_transformer_model_downloaded()
+       _embedding_instance = HuggingFaceEmbeddings(model_name=MODEL_PATH)
+       print("Embedding model initialized.")
+       return _embedding_instance
+
 def check_url_source(source_type, yt_url:str=None, wiki_query:str=None):
     language=''
     try:
@@ -85,9 +122,8 @@ def load_embedding_model(embedding_model_name: str):
         dimension = 1536
         logging.info(f"Embedding: Using bedrock titan Embeddings , Dimension:{dimension}")
     else:
-        embeddings = HuggingFaceEmbeddings(
-            model_name="all-MiniLM-L6-v2"#, cache_folder="/embedding_model"
-        )
+        # embeddings = HuggingFaceEmbeddings(model_name="./local_model")
+        embeddings = get_local_sentence_transformer_embedding()
         dimension = 384
         logging.info(f"Embedding: Using Langchain HuggingFaceEmbeddings , Dimension:{dimension}")
     return embeddings, dimension
diff --git a/docker-compose.yml b/docker-compose.yml
@@ -7,13 +7,9 @@ services:
       dockerfile: Dockerfile
     volumes:
       - ./backend:/code
+    env_file:
+      - ./backend/.env
     environment:
-      - NEO4J_URI=${NEO4J_URI-neo4j://database:7687}
-      - NEO4J_PASSWORD=${NEO4J_PASSWORD-password}
-      - NEO4J_USERNAME=${NEO4J_USERNAME-neo4j}
-      - OPENAI_API_KEY=${OPENAI_API_KEY-}
-      - DIFFBOT_API_KEY=${DIFFBOT_API_KEY-}
-      - EMBEDDING_MODEL=${EMBEDDING_MODEL-all-MiniLM-L6-v2}
       - LANGCHAIN_ENDPOINT=${LANGCHAIN_ENDPOINT-}
       - LANGCHAIN_TRACING_V2=${LANGCHAIN_TRACING_V2-}
       - LANGCHAIN_PROJECT=${LANGCHAIN_PROJECT-}
diff --git a/frontend/src/components/Layout/PageLayout.tsx b/frontend/src/components/Layout/PageLayout.tsx
@@ -24,8 +24,6 @@ import { SKIP_AUTH } from '../../utils/Constants';
 import { useNavigate } from 'react-router';
 import { deduplicateByFullPattern, deduplicateNodeByValue } from '../../utils/Utils';
 import DataImporterSchemaDialog from '../Popups/GraphEnhancementDialog/EnitityExtraction/DataImporter';
-
-
 const GCSModal = lazy(() => import('../DataSources/GCS/GCSModal'));
 const S3Modal = lazy(() => import('../DataSources/AWS/S3Modal'));
 const GenericModal = lazy(() => import('../WebSources/GenericSourceModal'));
diff --git a/frontend/src/components/Popups/GraphEnhancementDialog/EnitityExtraction/GraphPattern.tsx b/frontend/src/components/Popups/GraphEnhancementDialog/EnitityExtraction/GraphPattern.tsx
@@ -38,6 +38,16 @@ const GraphPattern: React.FC<TupleCreationProps> = ({
   });
   const sourceRef = useRef<HTMLDivElement | null>(null);
   const { userCredentials } = useCredentials();
+  const deduplicateOptions = (options: OptionType[]): OptionType[] => {
+    const seen = new Set<string>();
+    return options.filter((option) => {
+      if (seen.has(option.value)) {
+        return false;
+      }
+      seen.add(option.value);
+      return true;
+    });
+  };
 
   useEffect(() => {
     const isGlobalStateSet =
@@ -64,17 +74,53 @@ const GraphPattern: React.FC<TupleCreationProps> = ({
             target: { value: targetVal, label: targetVal },
           };
         });
-        const savedSources: OptionType[] = Array.from(sourceSet).map((val) => ({ value: val, label: val }));
         const savedTypes: OptionType[] = Array.from(typeSet).map((val) => ({ value: val, label: val }));
-        const savedTargets: OptionType[] = Array.from(targetSet).map((val) => ({ value: val, label: val }));
+        const combinedSourceTarget = new Set([...sourceSet, ...targetSet]);
+        const combinedSourceTargetOptions: OptionType[] = Array.from(combinedSourceTarget).map((val) => ({
+          value: val,
+          label: val,
+        }));
+
         setSelectedRels(mappedRels);
-        setSourceOptions(savedSources);
+        setSourceOptions(combinedSourceTargetOptions);
         setTypeOptions(savedTypes);
-        setTargetOptions(savedTargets);
+        setTargetOptions(combinedSourceTargetOptions);
       }
     }
   }, []);
 
+  useEffect(() => {
+    let timeoutId: NodeJS.Timeout;
+    timeoutId = setTimeout(() => {
+      if (sourceOptions.length > 0) {
+        const deduped = deduplicateOptions(sourceOptions);
+        if (deduped.length !== sourceOptions.length) {
+          setSourceOptions(deduped);
+        }
+      }
+
+      if (targetOptions.length > 0) {
+        const deduped = deduplicateOptions(targetOptions);
+        if (deduped.length !== targetOptions.length) {
+          setTargetOptions(deduped);
+        }
+      }
+
+      if (typeOptions.length > 0) {
+        const deduped = deduplicateOptions(typeOptions);
+        if (deduped.length !== typeOptions.length) {
+          setTypeOptions(deduped);
+        }
+      }
+    }, 1000);
+
+    return () => {
+      if (timeoutId) {
+        clearTimeout(timeoutId);
+      }
+    };
+  }, []);
+
   const handleNewValue = (newValue: string, type: 'source' | 'type' | 'target') => {
     const regex = /^[^,]*$/;
     if (!newValue.trim()) {
@@ -92,8 +138,12 @@ const GraphPattern: React.FC<TupleCreationProps> = ({
     } else {
       setShowWarning((old) => ({ ...old, [type]: { showError: false, errorMessage: '' } }));
       const newOption: OptionType = { value: newValue.trim(), label: newValue.trim() };
-      const checkUniqueValue = (list: OptionType[], value: OptionType) =>
-        (list.some((opt) => opt.value === value.value) ? list : [...list, value]);
+      const checkUniqueValue = (list: OptionType[], value: OptionType) => {
+        const exists = list.some((opt) => opt.value === value.value);
+        const updatedList = exists ? list : [...list, value];
+        return deduplicateOptions(updatedList);
+      };
+
       switch (type) {
         case 'source':
           setSourceOptions((prev) => checkUniqueValue(prev, newOption));
@@ -110,7 +160,7 @@ const GraphPattern: React.FC<TupleCreationProps> = ({
           onPatternChange(selectedSource as OptionType, selectedType as OptionType, newOption);
           break;
         default:
-          console.log('wrong type added');
+          // Invalid type provided
           break;
       }
       setInputValues((prev) => ({ ...prev, [type]: '' }));