| 1 | +.. include:: header.rst |
| 2 | + |
| 3 | + |
| 4 | +Change Log |
| 5 | +=========================================================================== |
| 6 | + |
| 7 | +Changes in version 0.0.11 |
| 8 | +-------------------------- |
| 9 | + |
| 10 | +Fixes: |
| 11 | +~~~~~~~ |
| 12 | + |
| 13 | +* `90 <https://github.com/pymupdf/RAG/issues/90>`_ "'Quad' object has no attribute 'tl'" |
| 14 | +* `88 <https://github.com/pymupdf/RAG/issues/88>`_ "Bug in is_significant function" |
| 15 | + |
| 16 | + |
| 17 | +Improvements: |
| 18 | +~~~~~~~~~~~~~~ |
| 19 | +* Extended the list of known bullet point characters. |
| 20 | + |
| 21 | + |
| 22 | +Changes in version 0.0.10 |
| 23 | +-------------------------- |
| 24 | + |
| 25 | +Fixes: |
| 26 | +~~~~~~~ |
| 27 | + |
| 28 | +* `73 <https://github.com/pymupdf/RAG/issues/73>`_ "bug in to_markdown internal function" |
| 29 | +* `74 <https://github.com/pymupdf/RAG/issues/74>`_ "minimum area for images & vector graphics" |
| 30 | +* `75 <https://github.com/pymupdf/RAG/issues/75>`_ "Poor Markdown Generation for Particular PDF" |
| 31 | +* `76 <https://github.com/pymupdf/RAG/issues/76>`_ "suggestion on useful api parameters" |
| 32 | + |
| 33 | + |
| 34 | +Improvements: |
| 35 | +~~~~~~~~~~~~~~ |
| 36 | +* Improved recognition of "insignificant" vector graphics. Graphics like text highlights or borders will be ignored. |
| 37 | +* The format of saved images can now be controlled via the new parameter `image_format`. |
| 38 | +* Images can be stored in a specific folder via the new parameter `image_path`. |
| 39 | +* Images are **not stored if contained** in another image on the same page. |
| 40 | +* Images are **not stored if too small:** if width or height is less than 5% of the corresponding page dimension. |
| 41 | +* By default, all text is always written. If `write_images=True`, text on images / graphics can be suppressed by setting `force_text=False`. |
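
The image-related parameters listed above might be combined as in the following minimal sketch. The helper names `image_options` and `convert`, the folder name and the file name `input.pdf` are hypothetical and only serve as illustration; the keyword names are the ones mentioned in this change log.

```python
def image_options(folder="images", fmt="png", keep_text_on_images=True):
    """Collect the image-related keyword arguments named in this change log.

    The default values chosen here are illustrative assumptions.
    """
    return {
        "write_images": True,               # save images / graphics as files
        "image_path": folder,               # folder receiving the image files
        "image_format": fmt,                # file format of the saved images
        "force_text": keep_text_on_images,  # False: suppress text lying on images
    }


def convert(pdf_path, **options):
    """Convert a document to Markdown using the given options."""
    import pymupdf4llm  # imported lazily so image_options() works without it

    return pymupdf4llm.to_markdown(pdf_path, **options)


if __name__ == "__main__":
    opts = image_options(folder="out_images", fmt="jpg", keep_text_on_images=False)
    print(opts)
    # md_text = convert("input.pdf", **opts)  # hypothetical input file
```

Setting `force_text=False` only makes sense together with `write_images=True`, because otherwise text located on images would be missing from the output altogether.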
| 42 | + |
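
The two image filtering rules above (too small, or contained in another image) can be illustrated with plain rectangles. This is a sketch of the rules as worded in this change log, not the package's actual implementation; all function names are hypothetical, and rectangles are assumed to be `(x0, y0, x1, y1)` tuples in page coordinates.

```python
def is_too_small(img_w, img_h, page_w, page_h, limit=0.05):
    """True if width or height is below `limit` times the page dimension."""
    return img_w < limit * page_w or img_h < limit * page_h


def is_contained(inner, outer):
    """True if rectangle `inner` lies completely inside rectangle `outer`."""
    x0, y0, x1, y1 = inner
    ox0, oy0, ox1, oy1 = outer
    return ox0 <= x0 and oy0 <= y0 and x1 <= ox1 and y1 <= oy1


def is_shadowed(i, rects):
    """True if rects[i] is contained in some other rectangle of the list.

    Of several identical rectangles, only the first one survives.
    """
    r = rects[i]
    for j, other in enumerate(rects):
        if j == i or not is_contained(r, other):
            continue
        if r != other or j < i:
            return True
    return False


def images_to_store(rects, page_w, page_h):
    """Return the rectangles that pass both rules, in their original order."""
    kept = []
    for i, (x0, y0, x1, y1) in enumerate(rects):
        if is_too_small(x1 - x0, y1 - y0, page_w, page_h):
            continue
        if is_shadowed(i, rects):
            continue
        kept.append(rects[i])
    return kept


if __name__ == "__main__":
    page_rects = [(0, 0, 20, 100), (50, 50, 300, 300), (100, 100, 200, 200)]
    print(images_to_store(page_rects, 612, 792))
```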
| 43 | + |
| 44 | +Changes in version 0.0.9 |
| 45 | +-------------------------- |
| 46 | + |
| 47 | +Fixes: |
| 48 | +~~~~~~~ |
| 49 | + |
| 50 | +* `71 <https://github.com/pymupdf/RAG/issues/71>`_ "Unexpected results in pymupdf4llm but pymupdf works" |
| 51 | +* `68 <https://github.com/pymupdf/RAG/issues/68>`_ "Issue with text extraction near footer of page" |
| 52 | + |
| 53 | + |
| 54 | +Improvements: |
| 55 | +~~~~~~~~~~~~~~ |
| 56 | +* Improved identification of scattered text span particles. This should address most issues with out-of-sequence situations. |
| 57 | +* We now correctly process rotated pages (see issue #68). |
| 58 | + |
| 59 | + |
| 60 | +Changes in version 0.0.8 |
| 61 | +-------------------------- |
| 62 | + |
| 63 | +Fixes: |
| 64 | +~~~~~~~ |
| 65 | + |
| 66 | +* `65 <https://github.com/pymupdf/RAG/issues/65>`_ Fix typo in `pymupdf_rag.py`. |
| 67 | + |
| 68 | + |
| 69 | +Changes in version 0.0.7 |
| 70 | +-------------------------- |
| 71 | + |
| 72 | +Fixes: |
| 73 | +~~~~~~~ |
| 74 | + |
| 75 | +* `54 <https://github.com/pymupdf/RAG/issues/54>`_ "Mistakes in orchestrating sentences". Additional fix: text extraction no longer uses the `TEXT_DEHYPHENATE` flag bit. |
| 76 | + |
| 77 | +Improvements: |
| 78 | +~~~~~~~~~~~~~~~~ |
| 79 | + |
| 80 | +* Improved the algorithm dealing with vector graphics. Vector graphics are now more reliably classified as irrelevant: we now detect when "strokes" only exist in the neighborhood of the border of the graphic's bounding box. This is quite often the case for code snippets. |
| 81 | + |
| 82 | + |
| 83 | +Changes in version 0.0.6 |
| 84 | +-------------------------- |
| 85 | + |
| 86 | +Fixes: |
| 87 | +~~~~~~~ |
| 88 | + |
| 89 | +* `55 <https://github.com/pymupdf/RAG/issues/55>`_ "Bug in helpers/multi_column.py - IndexError: list index out of range" |
| 90 | +* `54 <https://github.com/pymupdf/RAG/issues/54>`_ "Mistakes in orchestrating sentences" |
| 91 | +* `52 <https://github.com/pymupdf/RAG/issues/52>`_ "Chunking of text files" |
| 92 | +* Partial fix for `41 <https://github.com/pymupdf/RAG/issues/41>`_ / `40 <https://github.com/pymupdf/RAG/issues/40>`_. Improved page column detection, but still no silver bullet for overly complex page layouts. |
| 93 | + |
| 94 | +Improvements: |
| 95 | +~~~~~~~~~~~~~~~~ |
| 96 | + |
| 97 | +* New parameter `dpi` to specify the resolution of images. |
| 98 | +* New parameters `page_width` / `page_height` for easily processing reflowable documents (Text, Office, e-books). |
| 99 | +* New parameter `graphics_limit` to avoid spending run time on value-less content. |
| 100 | +* New parameter `table_strategy` to directly control the table detection strategy. |
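
As a hedged illustration, the parameters introduced in this version might be passed as shown below. The helper names `layout_options` and `convert`, the file name `book.epub` and all numeric values are assumptions chosen for the example; `"lines"` is assumed to be one of the strategy names accepted by the table finder.

```python
def layout_options(dpi=None, page_width=None, page_height=None,
                   graphics_limit=None, table_strategy=None):
    """Return only the options that were explicitly set.

    Omitted options are left out so that the package defaults apply.
    """
    candidates = {
        "dpi": dpi,                        # resolution of generated images
        "page_width": page_width,          # page size for reflowable documents
        "page_height": page_height,
        "graphics_limit": graphics_limit,  # cap on vector graphics to process
        "table_strategy": table_strategy,  # table detection strategy
    }
    return {key: value for key, value in candidates.items() if value is not None}


def convert(path, **options):
    """Convert a document to Markdown using the given options."""
    import pymupdf4llm  # imported lazily so layout_options() works without it

    return pymupdf4llm.to_markdown(path, **options)


if __name__ == "__main__":
    opts = layout_options(dpi=200, page_width=500, page_height=700,
                          table_strategy="lines")
    print(opts)
    # md_text = convert("book.epub", **opts)  # hypothetical reflowable document
```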
| 101 | + |
| 102 | +.. include:: footer.rst |
| 103 | + |