-
Notifications
You must be signed in to change notification settings - Fork 3
feat: [MLT-48] Object crops on PDFs #135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Encord Agents test report90 tests 89 ✅ 3m 18s ⏱️ Results for commit 17d4bb9. ♻️ This comment has been updated with latest results. |
| import pymupdf | ||
|
|
||
|
|
||
| def extract_page( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't really be done by like other method as doc.close() apparently divorces pix so best to cast to file immediately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should fix this straight away. The pumupdf.Pixmap has a pil_image method so we can make the optional dependencies [mymupdf, PILLOW] and then use that. Then, no need to store on disk.
|
On a high level, this would be improved if instead we didn't write the intermediary crops to a file. As is, I'm just emulating the behaviour for videos but this could be improved for sure. Ticket already made around: Serverlessness |
c5f658e to
e472680
Compare
pyproject.toml
Outdated
| numpy = ">=1.26.4" | ||
| opencv-python = ">=4.1" | ||
|
|
||
| PyMuPdf = ">1.25.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a big no no IMO. I agree with your comment that this should be made optional.
So you do pip install "encord-agents[pdf]" if you need it. encord-agents` has to be light weight in terms of dependencies. Cause what happens when we need to support DICOM or some exotic geospatial data type.
| import pymupdf | ||
|
|
||
|
|
||
| def extract_page( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should fix this straight away. The pumupdf.Pixmap has a pil_image method so we can make the optional dependencies [mymupdf, PILLOW] and then use that. Then, no need to store on disk.
Gives support for pdf package and provides method for interfacing with pdfs. Makes this package optional and slightly modify CI to ensure testing With pdf support
Feature, Tests, Docs not yet.
Currently adds dependency. Could technically become optional dependency