CHANGES.md (18 additions, 0 deletions)
@@ -1,5 +1,23 @@
# Change Log
## Changes in version 0.0.20
### Fixes:
* [171](https://github.com/pymupdf/RAG/issues/171) - Text rects overlap with tables and images that should be excluded.
* [189](https://github.com/pymupdf/RAG/issues/189) - The position of the extracted image is incorrect.
* [238](https://github.com/pymupdf/RAG/issues/238) - When text is laid out around a picture, some of the text is missing from the extraction.
### Other Changes:
* Added **_new parameter_** `ignore_images`: (bool), optional. If `True`, images are not considered in any way. This may be useful for pages where a large number of images prevents meaningful layout analysis. Typical examples are PowerPoint slides and derived / similar pages.
* Added **_new parameter_** `ignore_graphics`: (bool), optional. If `True`, graphics are not considered except for table detection. This may be useful for pages where a large number of vector graphics prevents meaningful layout analysis. Typical examples are PowerPoint slides and derived / similar pages.
* Added **_new parameter_** to class `IdentifyHeaders`: Use `max_levels` (integer <= 6) to limit the generation of header tag levels. For example, `headers = pymupdf4llm.IdentifyHeaders(doc, max_levels=3)` ensures that at most 3 header levels will ever be generated. Any text with a font size smaller than that of the `###` level will be body text. In this case, the markdown generation itself would be coded as `md = pymupdf4llm.to_markdown(doc, hdr_info=headers, ...)`.
* Changed parameter `table_strategy`: When `None` is specified, no attempt to detect tables will be made. This can be useful when tables are of no interest or are known not to exist in a given file, and it speeds up processing significantly. Be prepared to see more changes and extensions here.
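
The parameters listed above could be combined roughly as in the following sketch. The helper names `markdown_options` and `convert`, as well as their arguments `slide_like` and `want_tables`, are hypothetical and exist only for this illustration; the keyword names `ignore_images`, `ignore_graphics`, `table_strategy`, `hdr_info` and `max_levels` are the ones named in this change log. It also assumes a PyMuPDF version that supports `import pymupdf`.

```python
def markdown_options(slide_like=False, want_tables=True):
    """Collect keyword arguments for pymupdf4llm.to_markdown.

    Hypothetical helper, not part of pymupdf4llm.
    """
    opts = {}
    if slide_like:
        # Slide-like pages: skip images and vector graphics so that
        # they do not disturb the layout analysis.
        opts["ignore_images"] = True
        opts["ignore_graphics"] = True
    if not want_tables:
        # None disables table detection altogether.
        opts["table_strategy"] = None
    return opts


def convert(path, slide_like=False, want_tables=True, max_levels=3):
    """Convert a PDF to Markdown with a limited number of header levels."""
    # Imported lazily so markdown_options stays usable without the libraries.
    import pymupdf
    import pymupdf4llm

    doc = pymupdf.open(path)
    # Limit the generated header tags to at most `max_levels` levels.
    headers = pymupdf4llm.IdentifyHeaders(doc, max_levels=max_levels)
    return pymupdf4llm.to_markdown(
        doc,
        hdr_info=headers,
        **markdown_options(slide_like=slide_like, want_tables=want_tables),
    )
```

For a slide deck exported to PDF where tables are of no interest, a call such as `convert("slides.pdf", slide_like=True, want_tables=False)` would pass all three options at once; the file name here is a placeholder.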