Singular Visual Line Should Be Identified as a Single TextElement #78

Open
deenaawny-github-account opened this issue Jan 11, 2024 · 0 comments
Labels
for-internal-team (Intended for completion by the internal team), status:deferred (Deferred for future consideration)

Problem

For MSFT 0000950170-23-014423, the top section title "PART I. FINANCIAL INFORMATION" is split mid-word and identified as two semantic elements:
{
  "cls_name": "TopSectionTitle",
  "level": 0,
  "section_type": "part1",
  "text_content": "PART I. FINANCI"
},
{
  "cls_name": "TitleElement",
  "level": 0,
  "text_content": "AL INFORMATION"
}

This should be a single element:
{
  "cls_name": "TopSectionTitle",
  "level": 0,
  "section_type": "part1",
  "text_content": "PART I. FINANCIAL INFORMATION"
}

Ideas about a possible solution

Adjust the text element merger so that it keeps merging adjacent elements until it reaches a new visual line.
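
The idea could be sketched roughly as follows. This is a minimal illustration, not the project's actual merger: the `Fragment` class, its `starts_new_line` flag, and the `merge_until_new_line` function are all hypothetical names, and the sketch assumes the parser can already tell whether a fragment begins a new visual line. The third fragment in the sample input is made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    """Hypothetical stand-in for a parsed text element (not the project's real class)."""

    text: str
    # Assumption: the parser can report whether this fragment begins a new visual line.
    starts_new_line: bool


def merge_until_new_line(fragments: List[Fragment]) -> List[Fragment]:
    """Concatenate consecutive fragments that sit on the same visual line."""
    merged: List[Fragment] = []
    for fragment in fragments:
        if merged and not fragment.starts_new_line:
            # Same visual line as the previous fragment: append its text.
            previous = merged[-1]
            merged[-1] = Fragment(previous.text + fragment.text, previous.starts_new_line)
        else:
            # A new visual line (or the very first fragment) starts a new merged element.
            merged.append(Fragment(fragment.text, fragment.starts_new_line))
    return merged


fragments = [
    Fragment("PART I. FINANCI", True),
    Fragment("AL INFORMATION", False),
    Fragment("Item 1. Financial Statements", True),  # illustrative follow-up line
]
merged_texts = [fragment.text for fragment in merge_until_new_line(fragments)]
print(merged_texts)
```

With this approach the split title would be reassembled before classification, so it could be recognized as one TopSectionTitle. How a new visual line is actually detected in the HTML is left open here.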

@deenaawny-github-account changed the title from "Singular visual line should be identified as a single TextElement" to "Singular Visual Line Should Be Identified as a Single TextElement" on Jan 11, 2024
@Elijas added the status:deferred and for-internal-team labels on Feb 16, 2024