For several days, I’ve noticed a peculiar “bug” on the platform. When I upload a relatively large file (in terms of word count), the native embedding model throws an error like `SyntaxError: Unexpected token 's', "stream timeout" is not valid JSON`; in other words, the server replied with the plain-text message "stream timeout", and the client then failed to parse that reply as JSON.
This error suggests that the process failed and that the file wasn’t embedded. However, upon refreshing the page, the file magically appears as embedded.
When using the platform through the web interface, this wasn’t a major issue: I could simply try again. When using the API, however, the same error occurred and the connection was closed, so the call never reported success.
After much analysis, I suspected the issue was CPU-related. I’m running the system on a 4-core VM in GCP. Previously, the same VM’s container used to crash when I also ran the built-in vector database; I resolved that by moving the vectors to Pinecone.
While monitoring the server's CPU usage, I noticed that even after the mentioned error, CPU usage remained high. This led me to discover that despite the interface or API returning an error, the file-to-vector conversion process continued running. That’s why, after refreshing the page, the file was already embedded.
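Since the embedding job keeps running server-side even after the HTTP response times out, one workaround on the API side is to treat the timeout as "result unknown" and poll until the document shows up as embedded, retrying the upload only if it never appears. A minimal sketch of the polling helper; the `is_embedded` callable is a hypothetical placeholder for whatever API call confirms the document's status (e.g., listing workspace documents), not an actual AnythingLLM endpoint:

```python
import time

def wait_until_embedded(is_embedded, timeout_s=300.0, interval_s=5.0):
    """Poll is_embedded() until it returns True or timeout_s elapses.

    is_embedded: zero-argument callable that asks the server whether the
    uploaded file is now embedded (placeholder for a real status check).
    Returns True if the document appeared, False if the deadline passed.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_embedded():
            return True
        time.sleep(interval_s)
    return False
```

After catching the "stream timeout" error from the upload call, a script could call `wait_until_embedded` before retrying, to avoid embedding the same file twice while the original background job is still finishing.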
My question: what would be the best solution to address this issue?

- Increase the VM’s resources?
- Use an external embedding service like Cohere (which provides dedicated support), similar to how I resolved the vector database issue with Pinecone?
- Adopt a newer version of the embedding model available in the latest builds of AnythingLLM?
I’m working on a solution that will operate in automated routines. At certain intervals, the system will automatically send documents to the Workspace and pin them for user queries. However, this bug is creating challenges for implementing this functionality.
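For the automated routine, one way to stay under the server's timeout regardless of VM resources is to split large documents into smaller pieces before sending them, so each embedding request finishes quickly. A rough word-count splitter; the 5,000-word limit is an arbitrary assumption to tune against the timeouts actually observed:

```python
def split_by_words(text, max_words=5000):
    """Split text into consecutive chunks of at most max_words words each.

    max_words is an assumed per-chunk budget, not a documented limit;
    adjust it until individual uploads reliably finish before the timeout.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]
```

Each chunk could then be uploaded and pinned as its own document in the scheduled routine, trading a few extra API calls for predictable per-request embedding time.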
I appreciate any feedback or suggestions!
Are there known steps to reproduce?
No response
How are you running AnythingLLM?
Docker (VM Google Cloud 4vCPU - 4GB RAM)