Singular Visual Line Should Be Identified as a Single TextElement #78
Labels
for-internal-team
Intended for completion by the internal team
status:deferred
Deferred for future consideration.
Problem
For the MSFT filing 0000950170-23-014423, the top section title "PART I. FINANCIAL INFORMATION" occupies a single visual line but is split into two semantic elements:
{
"cls_name": "TopSectionTitle",
"level": 0,
"section_type": "part1",
"text_content": "PART I. FINANCI"
},
{
"cls_name": "TitleElement",
"level": 0,
"text_content": "AL INFORMATION"
}
This should be:
{
"cls_name": "TopSectionTitle",
"level": 0,
"section_type": "part1",
"text_content": "PART I. FINANCIAL INFORMATION"
}
Ideas about a possible solution
Adjust the text element merger so that it keeps merging adjacent elements until a new visual line begins.
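A minimal sketch of that merging rule, not the project's actual merger: the `Fragment` class, its `starts_new_line` flag, and the `merge_by_visual_line` function are hypothetical names, and the sketch assumes an earlier step has already determined where each visual line begins.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical stand-in for a parsed text element."""
    text: str
    starts_new_line: bool  # assumed flag: True if this fragment begins a new visual line


def merge_by_visual_line(fragments: list[Fragment]) -> list[str]:
    """Concatenate consecutive fragments until the next visual line starts."""
    lines: list[str] = []
    for fragment in fragments:
        if fragment.starts_new_line or not lines:
            # A new visual line (or the very first fragment) opens a new entry.
            lines.append(fragment.text)
        else:
            # Same visual line: append to the entry currently being built.
            lines[-1] += fragment.text
    # Drop stray leading/trailing whitespace such as the trailing space in the title.
    return [line.strip() for line in lines]


fragments = [
    Fragment("PART I. FINANCI", starts_new_line=True),
    Fragment("AL INFORMATION ", starts_new_line=False),
    Fragment("Next line", starts_new_line=True),  # illustrative placeholder text
]
merged = merge_by_visual_line(fragments)
print(merged)
```

With this rule the two fragments from the report collapse into one string, which could then be classified once as a single TopSectionTitle instead of a title plus a stray TitleElement.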