Tabby bbox annotations #309

Closed
wants to merge 22 commits into from

sunveil
Collaborator

@sunveil sunveil commented Aug 22, 2023

No description provided.

NastyBoget and others added 22 commits May 29, 2023 16:20
Added deleting '\n\t' and '\n ' while loading docx
Minor code reformat
Changed to regular expression
Minor code reformat
Minor code reformat: doc_str -> content

Added request test for not stripped xml example
* Move some readers, converters, metadata_extractors

* Fix txt law handling and add html2pdf reader

* Move auto_pdf_reader

* Add base64 metadata extraction

* Move tests

* fixed tests
* TLDR-340 renamed pdf folder; some refactoring

* TLDR-340 after review
* fixed files

* moved train scripts from docreader

* moved train scripts from docreader

* Move api_collect_train_dataset to fastapi

* fixed training scripts again

* fixed table tasker

* moved unit tests

* fixed unit tests

* final fix maybe hopefully ....

* final fix maybe hopefully

* fix small bug

* rename pdf reader in scripts

---------

Co-authored-by: Alexander Golodkov <[email protected]>
Co-authored-by: Nasty <[email protected]>
# Conflicts:
#	dedoc/api/api_args.py
#	dedoc/api/dedoc_api.py
#	dedoc/api/train_dataset/api_collect_train_dataset.py
#	dedoc/api/train_dataset/async_archive_handler.py
#	dedoc/attachments_extractors/concrete_attachments_extractors/json_attachment_extractor.py
#	dedoc/config.py
#	dedoc/converters/concrete_converters/binary_converter.py
#	dedoc/download_models.py
#	dedoc/main.py
#	dedoc/manager/dedoc_manager.py
#	dedoc/manager/dedoc_thread_manager.py
#	dedoc/metadata_extractors/concrete_metadata_extractors/base_metadata_extractor.py
#	dedoc/metadata_extractors/concrete_metadata_extractors/docx_metadata_extractor.py
#	dedoc/metadata_extractors/concrete_metadata_extractors/note_metadata_extarctor.py
#	dedoc/readers/__init__.py
#	dedoc/readers/email_reader/email_reader.py
#	dedoc/readers/html2pdf_reader/html2pdf_reader.py
#	dedoc/readers/note_reader/note_reader.py
#	dedoc/readers/pdf_reader/pdf_auto_reader/pdf_auto_reader.py
#	dedoc/readers/pdf_reader/pdf_base_reader.py
#	dedoc/readers/pdf_reader/pdf_image_reader/line_metadata_extractor/font_type_classifier.py
#	dedoc/readers/pdf_reader/pdf_image_reader/ocr/ocr_page/ocr_line.py
#	dedoc/readers/pdf_reader/pdf_image_reader/pdf_image_reader.py
#	dedoc/readers/pdf_reader/pdf_txtlayer_reader/extractor_pdf_textlayer.py
#	dedoc/readers/pdf_reader/pdf_txtlayer_reader/pdf_tabby_reader.py
#	dedoc/readers/pdf_reader/pdf_txtlayer_reader/pdf_txtlayer_reader.py
#	dedoc/structure_extractors/hierarchy_level_builders/diploma_builder/body_builder.py
#	dedoc/train_dataset/taskers/images_creators/concrete_creators/docx_images_creator.py
#	dedoc/utils/utils.py
#	docker/Dockerfile
#	resources/benchmarks/time_benchmark.json
#	tests/api_tests/test_api_archives.py
#	tests/api_tests/test_api_doctype_law.py
#	tests/api_tests/test_api_excel.py
#	tests/api_tests/test_api_format_docx.py
#	tests/api_tests/test_api_json.py
#	tests/api_tests/test_api_with_attachments.py
#	tests/data/with_attachments/minio.zip
#	tests/data/with_attachments/name_slash.zip
#	tests/unit_tests/test_module_attachment_extractor.py
* TLDR-429 flake8 style testing added

* Fix tests

* Docs fix

* Fix tests

* Review fixes
@sunveil sunveil closed this Aug 23, 2023
@sunveil sunveil deleted the tabby_bbox_annotations branch August 23, 2023 02:25
5 participants