Fixed image dpi from 96 to 124 for layout detection #278

Open
wants to merge 2 commits into
base: master
from

Conversation


@serg33v serg33v commented Jan 9, 2025

I have a few cases where tabled fails to detect a table when the whole page is a table.

Here is how I tested the DPI value. With the default of 96, these tables were not detected. With 100, the table was detected in one case but still missed in the other two.

After testing the other two cases, I found that 124 DPI is the sweet spot.
With 124 DPI, detection works in all cases.

I can't upload these cases to this PR for testing because of data protection.
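
Since the failing documents cannot be shared, the sweep described above can only be sketched with a stand-in detector. The following is a minimal illustration, not the procedure actually used for this PR: `lowest_passing_dpi`, `fake_detects_table`, the file names, and the thresholds in `FAKE_MIN_DPI` are all hypothetical and invented purely so the example runs. In a real run, the detector would render each page at the given DPI and call the layout model.

```python
from typing import Callable, Iterable, Optional, Sequence


def lowest_passing_dpi(
    cases: Sequence[str],
    candidate_dpis: Iterable[int],
    detects_table: Callable[[str, int], bool],
) -> Optional[int]:
    """Return the smallest candidate DPI at which every case is detected, else None."""
    for dpi in sorted(candidate_dpis):
        # Each candidate is checked independently, so the sweep does not
        # assume that detection improves monotonically with DPI.
        if all(detects_table(case, dpi) for case in cases):
            return dpi
    return None


# Made-up thresholds (NOT measured data): the minimum DPI at which the
# stand-in detector "finds" the table in each hypothetical document.
FAKE_MIN_DPI = {"case_a.pdf": 100, "case_b.pdf": 124, "case_c.pdf": 118}


def fake_detects_table(case: str, dpi: int) -> bool:
    """Stand-in for rendering the page at `dpi` and running layout detection."""
    return dpi >= FAKE_MIN_DPI[case]


if __name__ == "__main__":
    best = lowest_passing_dpi(list(FAKE_MIN_DPI), range(96, 151, 2), fake_detects_table)
    print(f"lowest DPI passing all cases: {best}")
```

Swapping `fake_detects_table` for a function that wraps the real loader and detector would turn this into a regression check that can run locally against the private documents.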

```python
def load_pdfs_images(input_path, max_pages=None, start_page=None):
    if os.path.isdir(input_path):
        # Layout-detection images: DPI hard-coded to 124 (previously the default of 96)
        images, _, _ = load_from_folder(input_path, max_pages, start_page=start_page, dpi=124)
        # High-resolution images and text lines keep the existing surya setting
        highres_images, names, text_lines = load_from_folder(input_path, max_pages, dpi=surya_settings.IMAGE_DPI_HIGHRES,
                                                             load_text_lines=True, start_page=start_page)
    else:
        # Same change for the single-file path
        images, _, _ = load_from_file(input_path, max_pages, start_page=start_page, dpi=124)
        highres_images, names, text_lines = load_from_file(input_path, max_pages, dpi=surya_settings.IMAGE_DPI_HIGHRES,
                                                           load_text_lines=True, start_page=start_page)

    return images, highres_images, names, text_lines
```
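
To make the size of this change concrete, here is a small sketch of how many pixels a page gets at each DPI value mentioned above. It relies on the fact that PDF page dimensions are expressed in points (72 per inch). The US Letter page size and the `rendered_size` helper are illustrative assumptions; the actual page sizes in the failing documents are not known, and a real renderer may round or truncate differently.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF user-space units per inch


def rendered_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Approximate pixel size of a page rendered at `dpi` (hypothetical helper)."""
    # Multiply before dividing so common page sizes come out exact.
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


if __name__ == "__main__":
    # US Letter (612 x 792 pt) is used only as an example page size.
    for dpi in (96, 100, 124):
        print(dpi, rendered_size(612, 792, dpi))
    # 96  -> (816, 1056)
    # 100 -> (850, 1100)
    # 124 -> (1054, 1364)
```

Going from 96 to 124 DPI scales each side by about 1.29, so the layout model receives roughly 1.67 times as many pixels per page.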


github-actions bot commented Jan 9, 2025

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request


serg33v commented Jan 9, 2025

I have read the CLA Document and I hereby sign the CLA


serg33v commented Jan 10, 2025

recheck


2 participants