- This option is used to enable GOST (Russian government standard "ГОСТ Р 21.1101") frame recognition for PDF documents or images.
The content of each page of some technical documents is placed in special GOST frames. An example of GOST frames is shown in the example below (:ref:`example_gost_frame`).
-Such frames contain meta-information and are not part of the text content of the document. Based on this, we have implemented the functionality for ignoring GOST frames in documents, which works for:
+Such frames contain meta-information and are not part of the text content of the document. Based on this, we have implemented the functionality for ignoring GOST frames in documents, which works for:
-* Copyable and non-copyable PDF documents (:class:`dedoc.readers.PdfTxtlayerReader` and :class:`dedoc.readers.PdfTabbyReader`);
-* Images (:class:`dedoc.readers.PdfImageReader`).
+* Copyable PDF documents (:class:`dedoc.readers.PdfTxtlayerReader` and :class:`dedoc.readers.PdfTabbyReader`);
+* Non-copyable PDF documents and Images (:class:`dedoc.readers.PdfImageReader`).
If parameter ``need_gost_frame_analysis=True``, the GOST frame itself is ignored and only the contents inside the frame are extracted.
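A minimal sketch of how this parameter might be passed from Python. The helper names ``build_gost_parameters`` and ``parse_without_gost_frame`` are hypothetical and not part of dedoc. The sketch assumes that parameters are passed as a dictionary of strings to ``DedocManager.parse`` and that ``pdf_with_text_layer`` selects the reader (``"true"`` for ``PdfTxtlayerReader``, ``"tabby"`` for ``PdfTabbyReader``, ``"false"`` for ``PdfImageReader``); check the parameters description of your dedoc version before relying on these values.

```python
from typing import Dict


def build_gost_parameters(reader: str = "image") -> Dict[str, str]:
    """Hypothetical helper: build a parameters dict that enables GOST frame analysis.

    The mapping from reader name to ``pdf_with_text_layer`` value is an
    assumption based on the dedoc parameters description.
    """
    text_layer_modes = {
        "image": "false",    # assumed to select PdfImageReader
        "txtlayer": "true",  # assumed to select PdfTxtlayerReader
        "tabby": "tabby",    # assumed to select PdfTabbyReader
    }
    if reader not in text_layer_modes:
        raise ValueError(f"unknown reader: {reader!r}")
    return {
        "need_gost_frame_analysis": "true",
        "pdf_with_text_layer": text_layer_modes[reader],
    }


def parse_without_gost_frame(file_path: str, reader: str = "image"):
    """Hypothetical wrapper: parse a document while ignoring its GOST frame."""
    # Imported lazily so the helper above can be used without dedoc installed.
    from dedoc import DedocManager

    manager = DedocManager()
    return manager.parse(file_path, parameters=build_gost_parameters(reader))
```

Calling ``parse_without_gost_frame("document_with_gost_frame.pdf", reader="tabby")`` would then request parsing with the frame ignored, provided dedoc is installed and the file exists.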
.. _example_gost_frame:
Examples of GOST frame
----------------------
-For example your send PDF-document with two pages:
+For example, you send a :download:`PDF-document with two pages <../_static/gost_frame_data/document_with_gost_frame.pdf>`:
- This option is used to enable GOST (Russian government standard) frame recognition for PDF documents or images.
-The GOST frame recognizer is used in :meth:`dedoc.readers.PdfBaseReader.read`. Its main function is to recognize and
-ignore the GOST frame on the document. It allows :class:`dedoc.readers.PdfImageReader`, :class:`dedoc.readers.PdfTxtlayerReader`
-and :class:`dedoc.readers.PdfTabbyReader` to properly process the content of the document containing GOST frame, see :ref:`gost_frame_handling` for more details
+It allows :class:`dedoc.readers.PdfImageReader`, :class:`dedoc.readers.PdfTxtlayerReader` and :class:`dedoc.readers.PdfTabbyReader`
+to properly process the content of the document containing GOST frame, see :ref:`gost_frame_handling` for more details.