How are you running AnythingLLM?
Docker (local)
What happened?
When uploading bulk files into documents, I receive an error message "document processing api is not online" randomly on different files as they're being uploaded.
In experimentation, I selected eight PDF files, each over 300 MB. One of the eight failed with the above error. If I wait for the other seven to complete and then re-upload the one that failed, it uploads successfully.
In small batches, this is manageable, as I can pinpoint the one that failed and re-upload it. However, in bulk testing, multiple files fail and it's impossible to keep track of which ones succeeded and which failed, so the only solution I've found is to delete all the files and re-upload them 4 to 6 at a time (which takes HOURS when uploading hundreds of documents).
It appears that the API managing the upload is limited in the number of documents it can process at one time, and/or that if an upload starts while the API is busy handling other files, it fails with the "not online" error.
If a file fails, the system doesn't appear to retry the upload. It just errors, and the user must track which file failed and re-submit it after the queue has finished. This is next to impossible with bulk files.
A) When uploading files in bulk or uploading large files, it would be nice to control how many documents the processor handles at once. For example, if I am uploading 1,500 PDF files, a setting to limit the processor to no more than 4 documents at a time would help minimize failures and make it easier to track which files failed on upload.
B) It would be nice if there was a log file or report produced after a bulk upload that would list which files failed and which were successful. This would make it easier to identify which files need to be re-uploaded.
C) During the upload process, if a file fails because the API is unavailable, have the system automatically retry it: either move the file to the bottom of the queue and try again, or retry automatically and give up after X attempts.
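The behavior requested in A, B, and C could be sketched client-side roughly as follows. This is only an illustrative sketch, not AnythingLLM's actual upload code: `uploadFn` is a placeholder for whatever call actually sends a file to the document processor, and the `limit`/`maxRetries` option names are hypothetical.

```javascript
// Sketch: bounded-concurrency upload queue with automatic retries.
// `uploadFn(file)` is a stand-in for the real upload call and is
// expected to reject when the processing API reports "not online".
async function uploadWithLimit(files, uploadFn, { limit = 4, maxRetries = 3 } = {}) {
  const results = { succeeded: [], failed: [] }; // final report (request B)
  const queue = [...files];

  async function worker() {
    while (queue.length > 0) {
      const file = queue.shift();
      let done = false;
      for (let attempt = 1; attempt <= maxRetries && !done; attempt++) {
        try {
          await uploadFn(file);
          done = true;
        } catch (err) {
          // brief backoff before retrying (request C)
          await new Promise((resolve) => setTimeout(resolve, 100 * attempt));
        }
      }
      (done ? results.succeeded : results.failed).push(file);
    }
  }

  // at most `limit` uploads in flight at once (request A)
  const workers = Array.from({ length: Math.min(limit, files.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

With something like this, a transient "document processing api is not online" failure would be retried automatically, and the returned `results` object would tell the user exactly which files (if any) still need attention.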
Thank you.
Are there known steps to reproduce?
On Windows running the Docker version, upload 100+ large (30 MB+) documents into the document manager.