[Bug]: Certain PDF crashes RAG pipeline #98

Open
rjakomin opened this issue Feb 25, 2025 · 2 comments
Labels
bug Something isn't working

Comments

rjakomin commented Feb 25, 2025

Steps to reproduce

Hi,
Whenever I copy the PDF file linked below into the data folder monitored by the private RAG pipeline, the engine crashes (the whole app Docker container goes down) without any error message. It happens every time with this PDF document: https://www.dancilla.com/PDF/Dancilla_alle_Volkstaenze.pdf

Relevant log output

2025-02-25 14:59:40 pathway_engine.connectors.monitoring INFO FileSystem(data): 1 entries (3530 minibatch(es)) have been sent to the engine
2025-02-25 15:00:32 root INFO {"_type": "request_payload", "session_id": "uuid-29d4de7b-6cd3-4f92-ab6e-5111029c3157", "payload": {}}
2025-02-25 15:00:37 root INFO {"_type": "request_payload", "session_id": "uuid-65828918-7085-4c99-8385-7a9c93320895", "payload": {}}
2025-02-25 15:00:44 pathway_engine.connectors.monitoring INFO FileSystem(data): 0 entries (1 minibatch(es)) have been sent to the engine
2025-02-25 15:00:44 pathway_engine.connectors.monitoring INFO PythonReader: 2 entries (87119 minibatch(es)) have been sent to the engine
2025-02-25 15:01:02 pathway_engine.connectors.monitoring INFO PythonReader: 0 entries (5 minibatch(es)) have been sent to the engine

What did you expect to happen?

The newly copied PDF file is processed and its content is added to the vector database used by the private RAG.

Version

current

Docker Versions (if used)

27.4.0, build bde2b89 (running on Windows 11)

OS

Windows 11

@rjakomin rjakomin added the bug Something isn't working label Feb 25, 2025
@rjakomin rjakomin changed the title [Bug]: PDF crashes RAG pipeline [Bug]: Certain PDF crashes RAG pipeline Feb 25, 2025
Member

dxtrous commented Feb 25, 2025

Thank you for the report @rjakomin. While we investigate this, could you share any relevant statistics (such as memory usage from docker stats and the CPU usage profile directly before the crash) that could explain the cause of the crash?

Contributor

XGendre commented Feb 26, 2025

Hi @rjakomin, thanks for your report. I ran multiple tests in various environments, including a Docker container on Windows 11. No problem occurs with the PDF file you gave: the file is read correctly and there is no crash.

Could you provide additional details about your environment and the Docker statistics @dxtrous mentioned?
