* New parameter `embed_images` (bool) **embeds** images and vector graphics in the markdown text as base64-encoded strings. When set, the `write_images` and `image_path` parameters are ignored.
* New parameter `image_size_limit` (float between 0 and 1, default 0.05, i.e. 5%). Images are ignored if their width or height is smaller than the corresponding fraction of the page's width or height.
* The algorithm that determines the sequence of the text rectangles on multi-column pages has been improved.
* Change of the header identification algorithm: If more than six header levels are required for a document, then all text with a font size larger than body text is assumed to be a header of level 6 (i.e. HTML "h6" = "###### ").
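
The rules in the items above can be illustrated with a small, self-contained sketch. This is not the library's code: the helper names `image_markdown`, `is_too_small` and `header_prefixes` are hypothetical, and the header mapping is only one plausible reading of the level-6 rule (font sizes above body text are ranked from largest to smallest, and every rank beyond six falls back to level 6).

```python
import base64


def image_markdown(image_bytes: bytes, mime: str = "image/png") -> str:
    """Return a markdown image reference with the data inlined as base64.

    Hypothetical helper: shows the general idea behind `embed_images`.
    """
    payload = base64.b64encode(image_bytes).decode("ascii")
    return f"![](data:{mime};base64,{payload})"


def is_too_small(img_w, img_h, page_w, page_h, image_size_limit=0.05):
    """True if the image width or height is below the given page fraction.

    Hypothetical helper: mirrors the rule described for `image_size_limit`.
    """
    return img_w < image_size_limit * page_w or img_h < image_size_limit * page_h


def header_prefixes(font_sizes, body_size):
    """Map each font size larger than body text to a markdown header prefix.

    Assumption: sizes are ranked from largest to smallest, and any rank
    beyond six is treated as a level 6 header ("###### ").
    """
    larger = sorted({s for s in font_sizes if s > body_size}, reverse=True)
    return {size: "#" * min(rank, 6) + " " for rank, size in enumerate(larger, start=1)}
```

With a 600 x 800 page and the default limit, an image 20 wide and 100 high would be skipped because its width is below 5% of the page width, while a 100 x 100 image would be kept.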
Changes in version 0.0.13
--------------------------
@@ -19,7 +40,6 @@ Improvements:
* New parameter `extract_words` enforces `page_chunks=True` and adds a "words" list to each page dictionary.
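
As a rough sketch of how such a "words" list might be consumed: the page dictionary below is made-up sample data, and the tuple layout `(x0, y0, x1, y1, text, block, line, word)` is an assumption borrowed from PyMuPDF's `page.get_text("words")`, not something this changelog states. The helper `words_in_rect` is hypothetical.

```python
# Hypothetical page dictionary, shaped like one item of a page_chunks result.
# Assumption: each word entry is (x0, y0, x1, y1, text, block_no, line_no, word_no).
page_chunk = {
    "text": "Hello world",
    "words": [
        (72.0, 72.0, 100.0, 84.0, "Hello", 0, 0, 0),
        (104.0, 72.0, 135.0, 84.0, "world", 0, 0, 1),
    ],
}


def words_in_rect(chunk, x0, y0, x1, y1):
    """Return the text of all words whose boxes lie fully inside the rectangle."""
    return [
        w[4]
        for w in chunk["words"]
        if w[0] >= x0 and w[1] >= y0 and w[2] <= x1 and w[3] <= y1
    ]
```

Filtering by rectangle is just one possible use; the same list could equally feed a highlighter or a search index.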