Releases: ispras/dedoc

v2.1.1

22 Mar 08:17

oksidgy

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Update README.md.
Update table and time benchmarks.
Re-label line-classifier datasets (law, diploma, paragraphs datasets).
Update tasker creators (for the labeling system).
Fix HTML table parsing.

v2.1

05 Mar 10:44

NastyBoget

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Removed custom loggers (a common logger is now used for all dedoc classes).
Leave the document image unchanged if its orientation is already correct (the orientation correction function was changed).
Use only PdfTabbyReader to detect a textual layer in PDF files.
Refactored the code related to the labeling mode and moved it out of the library package into a separate directory.
Added BoldAnnotation for words in PdfImageReader.
Added more benchmarks: parsing of table images and postprocessing of Tesseract OCR.
Fixed several issues in the Dedoc web form.
Added a tutorial on how to add a new structure type to Dedoc.
Fixed parsing of EML and HTML files.

v2.0

25 Dec 13:39

NastyBoget

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23

Fix table extraction from PDF using empty config (see issue)
Add more benchmarks for Tesseract
Fix extension extraction for file names with several dots
Change names of some methods and their parameters for all main classes (attachments extractors, converters, readers, metadata extractors, structure extractors, structure constructors).
See the Package reference section of the documentation for more details.
Add AttachAnnotation and TableAnnotation to PPTX (see discussion)
Fix bugs in DOCX handling (see issues 378, 379).
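
The note above about file names with several dots can be illustrated with a minimal sketch. This is not dedoc's code: `split_extension` and `COMPOUND_EXTENSIONS` are hypothetical names introduced here, and the only library call used is the standard `os.path.splitext`, which splits at the last dot.

```python
import os
from typing import Tuple

# Hypothetical list of multi-part extensions; not taken from dedoc.
COMPOUND_EXTENSIONS = (".tar.gz", ".tar.bz2")


def split_extension(file_name: str) -> Tuple[str, str]:
    """Split a file name into (stem, extension); the extension is lower-cased."""
    lowered = file_name.lower()
    for ext in COMPOUND_EXTENSIONS:
        # Treat a known multi-part extension as a single unit.
        if lowered.endswith(ext) and len(file_name) > len(ext):
            return file_name[: -len(ext)], ext
    # Otherwise only the part after the last dot counts as the extension.
    stem, ext = os.path.splitext(file_name)
    return stem, ext.lower()
```

With this sketch, a name such as `report.final.v2.docx` keeps its inner dots in the stem and yields `.docx` as the extension.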

v1.1.1

24 Nov 13:06

NastyBoget

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23

Use an older pydantic version to improve compatibility with other libraries.
Add support for RTF format.
Fix a bug in handling file names with dots and spaces.
Fix a bug with non-integer text formatting values in DocxReader.
Add support for the on_gpu parameter in config.
Add attached images extraction for PdfTabbyReader.
Fix partial file reading for PdfTabbyReader.
Add a tutorial on how to create dedoc's basic data structures.
Fix attachments_dir parameter for readers and attachments extractors.

v1.1.0

24 Oct 10:01

NastyBoget

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23

Add BBoxAnnotation to table cells for PdfTabbyReader.
Fix Swagger, add API schema classes, remove to_dict method from ParsedDocument.
Improve PDF parsing in PdfTxtlayerReader, add benchmarks.
Fix BBoxAnnotation extraction for tables in PdfImageReader using table_type=split_last_column parameter.
Change base method of metadata extractors, rename it to extract_metadata.
Unify BBoxAnnotation extraction for all PDF readers: return only word bboxes.
Increase timeout value for all converters.

v1.0

10 Oct 15:19

NastyBoget

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23

Remove is_one_column_document_list parameter.
Add a tutorial on adding support for a new document type to the documentation.
Improve textual layer correctness classifier.
Improve orientation and columns classifier.
Change the table output structure: use CellWithMeta instead of a textual string.
Add BBoxAnnotation to table cells for PdfTxtlayerReader and PdfImageReader.
Add ConfidenceAnnotation to table cells for PdfImageReader.
Remove insert_table parameter.
Add information about table and page rotation to the table and document metadata, respectively.
Use dedoc-utils library for document images preprocessing.
Change web interface, fix online examples of document processing.
Add comparison operator to LineWithMeta.

v0.11.2

06 Sep 15:25

dronperminov

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23

Remove plexus-utils-1.1.jar.
Update installation documentation.
Add documentation for Tesseract OCR installation.
Add documentation for annotations.
Add documentation for secure torch.
Fix examples.

v0.11.1

30 Aug 10:23

NastyBoget

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23

Add bbox annotations in PdfTabbyReader.
Add bbox annotations for words in PdfTxtlayerReader.
Add an option plain_text to the return_format parameter.
Reduce size of the dedoc base image, move dockerfiles to a separate repository.
Refactor script for tesseract benchmarking.
Specify dedoc dependencies as version ranges instead of fixed versions.
Add table cell properties in PdfTabbyReader.

v0.11.0

22 Aug 12:33

oksidgy

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23

v0.11.0 (2023-08-22)

Release note: https://github.com/ispras/dedoc/releases/tag/v0.11.0

Rename exceptions classes.
Update style tests.
Change ConfidenceAnnotation value range to [0, 1].
Add bbox annotations for words in PdfImageReader.
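
As an illustration of a confidence value in the [0, 1] range, here is a minimal sketch that rescales a percentage-style score. It is not dedoc's implementation: the function name `normalize_confidence` is hypothetical, and the assumption that the raw score arrives on a 0..100 scale is ours, not stated in the release notes.

```python
def normalize_confidence(raw_percent: float) -> float:
    """Map a percentage-style confidence (assumed 0..100) to the [0, 1] range."""
    # Clamp first so out-of-range inputs cannot leave the target interval.
    clamped = min(max(raw_percent, 0.0), 100.0)
    return clamped / 100.0
```

Clamping before dividing keeps the result inside [0, 1] even when an upstream component reports a sentinel or out-of-range value.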

v0.10.0

01 Aug 14:58

dronperminov

Add ConfidenceAnnotation for PdfImageReader.
Remove version parameter from metadata extractors, structure constructors and parsed document methods.
Add version file and version resolving for the library.
Add recursive handling of attachments.
Add parameter for saving attachments in a custom directory.
Remove dedoc threaded manager.
Improve PdfAutoReader.
Add temporary file name to DocumentMetadata.
