Bump pymupdf from 1.22.5 to 1.23.6 #136
Conversation
This PR was not deployed automatically as @dependabot[bot] does not have access to the Railway project. In order to get automatic PR deploys, please add @dependabot[bot] to the project inside the project settings page.
You need to set up a payment method to use Lintrule. You can fix that by putting in a card here.
26b8332 to d3460be (Compare)
Bumps [pymupdf](https://github.com/pymupdf/pymupdf) from 1.22.5 to 1.23.6.
- [Release notes](https://github.com/pymupdf/pymupdf/releases)
- [Changelog](https://github.com/pymupdf/PyMuPDF/blob/main/changes.txt)
- [Commits](pymupdf/PyMuPDF@1.22.5...1.23.6)

---
updated-dependencies:
- dependency-name: pymupdf
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
d3460be to a79114f (Compare)
Works on local, pushing to test railway build.
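For anyone repeating that local check, a minimal smoke test along these lines can confirm the bumped PyMuPDF imports and still extracts text. This is only a sketch: sample.pdf is an assumed local file, not part of the repository.

```python
# Minimal local smoke test for the PyMuPDF 1.23.6 bump.
# "sample.pdf" is an assumed test file; substitute any PDF you have on hand.
import fitz  # PyMuPDF still imports under the "fitz" name in the 1.23.x series

print(fitz.__doc__)                  # banner string reporting the PyMuPDF/MuPDF versions
assert fitz.VersionBind == "1.23.6"  # version of the Python bindings

with fitz.open("sample.pdf") as doc:
    text = doc[0].get_text()         # basic text extraction as a sanity check
    print(f"{doc.page_count} pages; first page has {len(text)} characters")
```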
* initial attempt
* add parallel calls to local LLM for filtering. It's fully working, but it's too slow
* add newrelic logging
* add langhub prompt stuffing, works great. prep newrelic logging
* optimize load time of hub.pull(prompt)
* Working filtering with time limit, but the time limit is not fully respected, it will only return the next one after your time limit expires
* Working stably, but it's too slow and under-utilizing the GPUs. Need VLLM or Ray Serve to increase GPU Util
* Adding replicate model run to our utils... but the concurrency is not good enough
* Initial commit for multi query retriever
* Integrating Multi query retriever with in context padding. Replaced LCEL with custom implementation for retrieval and reciprocal rank fusion. Added llm to Ingest()
* Bumping up langchain version for new imports
* Adding langchainhub to requirements
* Using gpt3.5 instead of llm server
* Updating python version in railway
* Updated Nomic in requirements.txt
* fix openai version to pre 1.0
* anyscale LLM inference is faster than replicate or kastan.ai, 10 seconds for 80 inference
* upgrade python from 3.8 to 3.10
* trying to fix tesseract // pdfminer requirements for image ingest
* adding strict versions to all requirements
* Bump pymupdf from 1.22.5 to 1.23.6 (#136) Bumps [pymupdf](https://github.com/pymupdf/pymupdf) from 1.22.5 to 1.23.6. - [Release notes](https://github.com/pymupdf/pymupdf/releases) - [Changelog](https://github.com/pymupdf/PyMuPDF/blob/main/changes.txt) - [Commits](pymupdf/PyMuPDF@1.22.5...1.23.6) --- updated-dependencies: - dependency-name: pymupdf dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* compatible wheel version
* upgrade pip during image startup
* properly upgrade pip
* Fully lock ALL requirements. Hopefully speed up build times, too
* Limit unstructured dependencies, image balloned from 700MB to 6GB. Hopefully resolved
* Lock version of pip
* Lock (correct) version of pip
* add libgl1 for cv2 in Docker (for unstructured)
* adding proper error logging to image ingest
* Installing unstructured requirements individually to hopefully redoce bundle size by 5GB
* Downgrading openai package version to pre-vision release
* Update requirements.txt to latest on main
* Add langchainhub to requirements
* Reduce use of unstructured, hopefully the install is much smaller now
* Guarantee Unique S3 Upload paths (#137)
* should be fully working, in final testing
* trying to fix double nested kwargs
* fixing readable_filename in pdf ingest
* apt install tesseract-ocr, LAME
* remove stupid typo
* minor bug
* Finally fix **kwargs passing
* minor fix
* guarding against webscrape kwargs in pdf
* guarding against webscrape kwargs in pdf
* guarding against webscrape kwargs in pdf
* adding better error messages
* revert req changes
* simplify prints
* Bump typing-extensions from 4.7.1 to 4.8.0 (#90) Bumps [typing-extensions](https://github.com/python/typing_extensions) from 4.7.1 to 4.8.0. - [Release notes](https://github.com/python/typing_extensions/releases) - [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md) - [Commits](python/typing_extensions@4.7.1...4.8.0) --- updated-dependencies: - dependency-name: typing-extensions dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Kastan Day <[email protected]>
* Bump flask from 2.3.3 to 3.0.0 (#101) Bumps [flask](https://github.com/pallets/flask) from 2.3.3 to 3.0.0. - [Release notes](https://github.com/pallets/flask/releases) - [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst) - [Commits](pallets/flask@2.3.3...3.0.0) --- updated-dependencies: - dependency-name: flask dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Kastan Day <[email protected]>
* modified context_padding function to handle edge case
* added print statements for testing
* added print statements for multi-query test
* added prints for valid docs
* print statements for valid docs metadata
* Guard against kwargs failures during webscrape
* added fix for url parameter
* HOTFIX: kwargs in html and pdf ingest for /webscrape
* fix for pagenumber error
* removed timestamp parameter
* url fix
* guard against missing URL metadata
* minor refactor & cleanup
* modified function to only pad first 5 docs
* modified context_padding with multi-threading
* modified context padding for only first 5 docs
* modified function for removing duplicates in padded contexts
* minor changes
* Export conversation history on /analysis page (#141)
* updated nomic version in requirements.txt
* initial commit to PR
* created API endpoint
* completed export function
* testing csv export on railway
* code to remove file from repo after download
* moved file storing out of docs folder
* created a separate endpoint for multi-query-retrieval
* added similar format_for_json for MQR
* added option for extending one URL our when on baseurl or to opt out of it
* merged context_filtering with MQR
* added replicate to requirements.txt
* added openai type to all openai functions
* added filtering to the retrieval pipeline
* moved filtering after context padding
* changed model in run_anyscale()
* minor string formatting in print statements
* added ray.init() before calling filter function
* added a wrapper function for run()
* modified the code to use thread pool processor
* fixed pool execution errors
* replaced threadpool with processpool
* testing multiprocessing with 10 contexts
* restored to using all contexts
* changed max_workers to 100
* changed max_workers to 100
* Guarentee unique s3 upload paths, support file updates (e.g. duplicate file guardfor Cron jobs) (#99)
* added the add_users() for Canvas
* added canvas course ingest
* updated requirements
* added .md ingest and fixed .py ingest
* deleted test ipynb file
* added nomic viz
* added canvas file update function
* completed update function
* updated course export to include all contents
* modified to handle diff file structures of downloaded content
* modified canvas update
* modified ingest function
* modified update_files() for file replacement
* removed the extra os.remove()
* fix underscore to dash in for pip
* removed json import and added abort to canvas functions
* created separate PR for file update
* added file-update logic in ingest, WIP
* removed irrelevant text files
* modified pdf ingest function
* fixed PDF duplicate issue
* removed unwanted files
* updated nomic version in requirements.txt
* modified s3_paths
* testing unique filenames in aws upload
* added missing library to requirements.txt
* finished check_for_duplicates()
* fixed filename errors
* minor corrections
* added a uuid check in check_for_duplicates()
* regex depends on this being a dash
* regex depends on this being a dash
* Fix bug when no duplicate exists.
* cleaning up prints, testing looks good. ready to merge
* Further print and logging refinement
* Remove s3 pased method for de-duplication, use Supabase only
* remove duplicate imports
* remove new requirement
* Final print cleanups
* remove pypdf import
--------- Co-authored-by: root <root@ASMITA> Co-authored-by: Kastan Day <[email protected]>
* changed workers to 30 in run.sh
* Add Trunk Superlinter on-commit hooks (#164)
* First attempt, should auto format on commit
* maybe fix my yapf github action? Just bad formatting.
* Finalized, excellent Trunk configs for my desired formatting
* Further fix yapf GH Action
* Full format of all files with Trunk
* Fix more linting errors
* Ignore .vscdoe folder
* Reduce max line size to 120 (from 140)
* Format code
* Delete GH Action & Revert formatting in favor of Trunk.
* Ignore the Readme
* Remove trufflehog -- failing too much, confusing to new devs
* Minor docstring update
* trivial commit for testing
* removing trivial commit for testing
* Merge main into branch, vector_database.py probably needs work
* Cleanup all Trunk lint errors that I can
--------- Co-authored-by: KastanDay <[email protected]> Co-authored-by: Rohan Marwaha <[email protected]>
* changed workers to 3
* logging time in API calling
* removed wait parameter from executor.shutdown()
* added timelog after openai completion
* set openai api type as global variable
* reduced max workers to 30
* moved filtering after MQR and modified th filtering code
* minor function name change
* minor changes
* minor changes to print statements
* Add example usage of our public API for chat calls
* Add timeout to request, best practice
* Add example usage notebook for our public API
* Improve usage example to return model's response for easy storage. Fix linter inf loop
* Final fix: Switch to https connections
* Enhance logging in getTopContexts(), improve usage exmaple
* Working implementation. Using ray, tested end to end locally
* cleanup imports and dependencies
* hard lock requirement versions
* fix requirements hard locks
* slim down reqs
* Merge main.. touching up lint errors
* Add pydantic req
* fix ray start syntax
* Improve prints logging
* Add posthot logging for filter_top_contexts
* Add course name to posthog logs
* Remove langsmith hub for prompts because too unstable, hardcoded instead
* remove osv-scanner from trunk linting runs
--------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Kastan Day <[email protected]> Co-authored-by: Asmita Dabholkar <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: jkmin3 <[email protected]> Co-authored-by: root <root@ASMITA> Co-authored-by: KastanDay <[email protected]>
* updated nomic version in requirements.txt
* Updated Nomic in requirements.txt
* fix openai version to pre 1.0
* upgrade python from 3.8 to 3.10
* trying to fix tesseract // pdfminer requirements for image ingest
* adding strict versions to all requirements
* Bump pymupdf from 1.22.5 to 1.23.6 (#136) Bumps [pymupdf](https://github.com/pymupdf/pymupdf) from 1.22.5 to 1.23.6. - [Release notes](https://github.com/pymupdf/pymupdf/releases) - [Changelog](https://github.com/pymupdf/PyMuPDF/blob/main/changes.txt) - [Commits](pymupdf/PyMuPDF@1.22.5...1.23.6) --- updated-dependencies: - dependency-name: pymupdf dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* compatible wheel version
* upgrade pip during image startup
* properly upgrade pip
* Fully lock ALL requirements. Hopefully speed up build times, too
* Limit unstructured dependencies, image balloned from 700MB to 6GB. Hopefully resolved
* Lock version of pip
* Lock (correct) version of pip
* add libgl1 for cv2 in Docker (for unstructured)
* adding proper error logging to image ingest
* Installing unstructured requirements individually to hopefully redoce bundle size by 5GB
* Reduce use of unstructured, hopefully the install is much smaller now
* Guarantee Unique S3 Upload paths (#137)
* should be fully working, in final testing
* trying to fix double nested kwargs
* fixing readable_filename in pdf ingest
* apt install tesseract-ocr, LAME
* remove stupid typo
* minor bug
* Finally fix **kwargs passing
* minor fix
* guarding against webscrape kwargs in pdf
* guarding against webscrape kwargs in pdf
* guarding against webscrape kwargs in pdf
* adding better error messages
* revert req changes
* simplify prints
* Bump typing-extensions from 4.7.1 to 4.8.0 (#90) Bumps [typing-extensions](https://github.com/python/typing_extensions) from 4.7.1 to 4.8.0. - [Release notes](https://github.com/python/typing_extensions/releases) - [Changelog](https://github.com/python/typing_extensions/blob/main/CHANGELOG.md) - [Commits](python/typing_extensions@4.7.1...4.8.0) --- updated-dependencies: - dependency-name: typing-extensions dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Kastan Day <[email protected]>
* Bump flask from 2.3.3 to 3.0.0 (#101) Bumps [flask](https://github.com/pallets/flask) from 2.3.3 to 3.0.0. - [Release notes](https://github.com/pallets/flask/releases) - [Changelog](https://github.com/pallets/flask/blob/main/CHANGES.rst) - [Commits](pallets/flask@2.3.3...3.0.0) --- updated-dependencies: - dependency-name: flask dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Kastan Day <[email protected]>
* Guard against kwargs failures during webscrape
* HOTFIX: kwargs in html and pdf ingest for /webscrape
* Export conversation history on /analysis page (#141)
* updated nomic version in requirements.txt
* initial commit to PR
* created API endpoint
* completed export function
* testing csv export on railway
* code to remove file from repo after download
* moved file storing out of docs folder
* added option for extending one URL our when on baseurl or to opt out of it
* Guarentee unique s3 upload paths, support file updates (e.g. duplicate file guardfor Cron jobs) (#99)
* added the add_users() for Canvas
* added canvas course ingest
* updated requirements
* added .md ingest and fixed .py ingest
* deleted test ipynb file
* added nomic viz
* added canvas file update function
* completed update function
* updated course export to include all contents
* modified to handle diff file structures of downloaded content
* modified canvas update
* modified ingest function
* modified update_files() for file replacement
* removed the extra os.remove()
* fix underscore to dash in for pip
* removed json import and added abort to canvas functions
* created separate PR for file update
* added file-update logic in ingest, WIP
* removed irrelevant text files
* modified pdf ingest function
* fixed PDF duplicate issue
* removed unwanted files
* updated nomic version in requirements.txt
* modified s3_paths
* testing unique filenames in aws upload
* added missing library to requirements.txt
* finished check_for_duplicates()
* fixed filename errors
* minor corrections
* added a uuid check in check_for_duplicates()
* regex depends on this being a dash
* regex depends on this being a dash
* Fix bug when no duplicate exists.
* cleaning up prints, testing looks good. ready to merge
* Further print and logging refinement
* Remove s3 pased method for de-duplication, use Supabase only
* remove duplicate imports
* remove new requirement
* Final print cleanups
* remove pypdf import
--------- Co-authored-by: root <root@ASMITA> Co-authored-by: Kastan Day <[email protected]>
* Add Trunk Superlinter on-commit hooks (#164)
* First attempt, should auto format on commit
* maybe fix my yapf github action? Just bad formatting.
* Finalized, excellent Trunk configs for my desired formatting
* Further fix yapf GH Action
* Full format of all files with Trunk
* Fix more linting errors
* Ignore .vscdoe folder
* Reduce max line size to 120 (from 140)
* Format code
* Delete GH Action & Revert formatting in favor of Trunk.
* Ignore the Readme
* Remove trufflehog -- failing too much, confusing to new devs
* Minor docstring update
* trivial commit for testing
* removing trivial commit for testing
* Merge main into branch, vector_database.py probably needs work
* Cleanup all Trunk lint errors that I can
--------- Co-authored-by: KastanDay <[email protected]> Co-authored-by: Rohan Marwaha <[email protected]>
* Add example usage of our public API for chat calls
* Add timeout to request, best practice
* Add example usage notebook for our public API
* Improve usage example to return model's response for easy storage. Fix linter inf loop
* Final fix: Switch to https connections
* Enhance logging in getTopContexts(), improve usage exmaple
* minor changes for postman testing
* minor changes for testing
* added print statements
* re-creating error
* added condition to check if content is a list
* added json handling needed to test with Postman
* exception handling for get-nomic-map
* json decoding for testing
* added prints for testing
* added prints for testing
* added prints for testing
* added prints for testing
* fix for string error in nomic log
* removed json debugging code
* Cleanup comments
* Enhance type checking, cleanup formatting
* formatting
* Fix type checks to isinstance()
* Revert vector_database.py to status on main
--------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Kastan Day <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: jkmin3 <[email protected]> Co-authored-by: root <root@ASMITA> Co-authored-by: KastanDay <[email protected]> Co-authored-by: Rohan Marwaha <[email protected]>
Bumps pymupdf from 1.22.5 to 1.23.6.
Release notes
Sourced from pymupdf's releases.
... (truncated)
Changelog
Sourced from pymupdf's changelog.
... (truncated)
Commits
fb67864 Update dates for release 1.23.6.
363bce0 tests/test_2548.py: fixed spurious test failure with mupdf release branch.
79b6248 tests/test_2548.py:test_2548(): fixed for mupdf-1.23.x and mupdf-master.
8922a21 tests/test_general.py:test_2710(): fix expectations for mupdf-1.23.x and mupd...
a969b1d setup.py: fixed handling of PYMUPDF_SETUP_REBUILD_GIT_DETAILS.
5e4056f pipcl.py: avoid unnecessarily updating .so's on macos.
2f4e2b7 tests/test_2548.py:test_2548(): update to match recent change to mupdf master.
7ffc37c Updates for release 1.23.6.
18ae8b9 docs/document.rst: added Document.resolve_names().
a4ccec5 tests/test_textextract.py: added test of docx device's 'space-guess'.

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- @dependabot rebase will rebase this PR
- @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
- @dependabot merge will merge this PR after your CI passes on it
- @dependabot squash and merge will squash and merge this PR after your CI passes on it
- @dependabot cancel merge will cancel a previously requested merge and block automerging
- @dependabot reopen will reopen this PR if it is closed
- @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
- @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
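After merging a bump like this, one quick sanity check is to compare the installed distribution version against the pin in the requirements file. The sketch below is illustrative only: the requirements.txt path, the pin format (pymupdf==1.23.6), and the helper name pinned_version are assumptions, not part of this PR.

```python
# Sketch: check that the installed PyMuPDF matches the pin in requirements.txt.
# Assumes a line of the form "pymupdf==1.23.6"; adjust the path or name if your layout differs.
import re
from importlib.metadata import version

def pinned_version(path: str = "requirements.txt", name: str = "pymupdf") -> str | None:
    pin = re.compile(rf"^{re.escape(name)}\s*==\s*([\w.]+)", re.IGNORECASE)
    with open(path) as fh:
        for line in fh:
            match = pin.match(line.strip())
            if match:
                return match.group(1)
    return None  # no exact pin found

installed = version("PyMuPDF")  # distribution name as published on PyPI
pinned = pinned_version()
print(f"installed={installed!r}, pinned={pinned!r}")
if pinned is not None and pinned != installed:
    raise SystemExit("requirements.txt pin does not match the installed PyMuPDF version")
```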