
[BUG]: Native Embedder Bug: Successful Indexing Despite Error Message #2994

Open · Peterson047 opened this issue Jan 20, 2025 · 0 comments

Labels: possible bug (Bug was reported but is not confirmed or is unable to be replicated.)

Peterson047 commented Jan 20, 2025

How are you running AnythingLLM?

Docker (Google Cloud VM, 4 vCPU, 4 GB RAM)

What happened?

For several days, I’ve noticed a peculiar “bug” on the platform. When I upload a file with a relatively large word count, the native embedding model throws an error like:
SyntaxError: Unexpected token 's', "stream timeout" is not valid JSON.

This error suggests that the process failed and that the file wasn’t embedded. However, upon refreshing the page, the file magically appears as embedded.
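From the error text, my guess at the mechanism (this is a stand-alone illustration, not AnythingLLM’s actual code) is that the backend replies with the plain-text body “stream timeout” and the client then tries to parse that body as JSON, which produces exactly this SyntaxError:

```ts
// Stand-alone reproduction of the parsing failure only; "stream timeout"
// here is the literal plain-text body I assume the server sends back.
const timeoutBody = "stream timeout";

try {
  JSON.parse(timeoutBody);
} catch (err) {
  // Prints: SyntaxError: Unexpected token 's', "stream timeout" is not valid JSON
  console.error(err);
}
```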

When using the platform through the interface, this wasn’t a major issue; I could simply try again. When using the API, however, the same error occurred and the connection was closed, preventing the process from completing.

After much analysis, I suspected the issue might be related to CPU load. I’m running the system on a 4-core VM in GCP. Previously, the same VM would crash the container when I also ran the built-in vector database; I resolved that by hosting the vectors on Pinecone.

While monitoring the server’s CPU usage, I noticed that even after the error above, CPU usage remained high. This led me to discover that, although the interface or API returned an error, the file-to-vector conversion kept running in the background. That’s why the file was already embedded after a refresh.
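To illustrate what I think is going on (a generic Node sketch under my own assumptions, not AnythingLLM’s server code): the HTTP response can time out and close with a plain-text error while the job it started keeps running, which would explain both the error and the sustained CPU usage.

```ts
import http from "node:http";

// Stand-in for the CPU-heavy file-to-vector conversion.
function embedLargeFile(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 60_000));
}

const server = http.createServer((_req, res) => {
  // Start the work; nothing cancels it if the response dies first.
  const job = embedLargeFile();

  // Simulated stream timeout: the client receives a plain-text error...
  res.setTimeout(5_000, () => {
    res.statusCode = 504;
    res.end("stream timeout"); // ...which JSON.parse on the caller's side rejects
  });

  job.then(() => {
    console.log("embedding finished"); // still runs after the timeout fired
    if (!res.writableEnded) {
      res.end(JSON.stringify({ ok: true }));
    }
  });
});

server.listen(3001);
```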

My question:

What would be the best solution to address this issue?

  • Increase the VM’s resources?
  • Offload embedding to a dedicated service such as Cohere, similar to how I resolved the vector database issue with Pinecone?
  • Adopt a newer version of the embedding model available in the latest builds of AnythingLLM?

I’m working on a solution that will run as an automated routine: at set intervals, the system will automatically send documents to the Workspace and pin them for user queries. This bug is making that hard to implement reliably; my current workaround idea is sketched below.
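As a stopgap for that routine, I’m considering treating the timeout error as non-fatal and polling until the document actually shows up as embedded. A rough sketch of that idea follows; the endpoint path, response shape, and field names are my assumptions for illustration, not verified AnythingLLM API details:

```ts
// Polling workaround sketch. BASE, the /workspace/{slug} endpoint, and the
// workspace.documents[].filename shape are assumptions, not verified API details.
const BASE = "http://localhost:3001/api/v1";
const API_KEY = process.env.ANYTHINGLLM_API_KEY ?? "";

async function isDocumentEmbedded(slug: string, docName: string): Promise<boolean> {
  const res = await fetch(`${BASE}/workspace/${slug}`, {
    headers: { Authorization: `Bearer ${API_KEY}` },
  });
  const text = await res.text();
  try {
    // Tolerate non-JSON bodies like "stream timeout" instead of crashing.
    const body = JSON.parse(text);
    return body?.workspace?.documents?.some(
      (d: { filename?: string }) => d.filename === docName,
    ) ?? false;
  } catch {
    return false;
  }
}

// After an upload that errored with "stream timeout", poll instead of
// treating the error as fatal, since the embedding may still complete.
async function waitForEmbedding(slug: string, docName: string): Promise<void> {
  for (let attempt = 0; attempt < 30; attempt++) {
    if (await isDocumentEmbedded(slug, docName)) return;
    await new Promise((r) => setTimeout(r, 10_000)); // 10 s between checks
  }
  throw new Error(`document ${docName} never appeared as embedded`);
}
```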

I appreciate any feedback or suggestions!

Are there known steps to reproduce?

No response
