Fix the TopSectionTitle being split in MSFT filing #63

Open
Elijas opened this issue Dec 27, 2023 · 0 comments
Labels
contributions-welcome (Intended for completion by you, the contributor)
feature:elements (Parsing all the other elements correctly)

Elijas commented Dec 27, 2023

Context

MSFT accuracy-test (permalink at the time of posting)

Problem

A single top-section title is parsed as two separate title elements:

        {
            "text_content": "PART I. FINANCI"
        },
        {
            "text_content": "AL INFORMATION"
        },

This happens because the MSFT filing splits its section titles into two pieces, for reasons that are unclear.

Ideas about a possible solution

One option is to use line information: if two elements of the same type (and level) are on the same line, they should probably be merged into a single element.
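
To make the idea concrete, here is a minimal sketch of such a merge step. It is not the project's actual API: the `Element` class, its `kind`, `level`, `line`, and `text` fields, and the `merge_same_line` function are all hypothetical names, and it assumes that a line index is available for every element (the value 12 below is made up for illustration).

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a parsed element; not the project's real class."""
    kind: str   # element type, e.g. "TopSectionTitle"
    level: int  # nesting level of the title
    line: int   # assumed: index of the line the element sits on
    text: str


def merge_same_line(elements: list[Element]) -> list[Element]:
    """Merge consecutive elements that share type, level, and line."""
    merged: list[Element] = []
    for el in elements:
        prev = merged[-1] if merged else None
        if prev is not None and (prev.kind, prev.level, prev.line) == (
            el.kind, el.level, el.line
        ):
            # Join without a separator, because the split can fall mid-word.
            merged[-1] = Element(prev.kind, prev.level, prev.line, prev.text + el.text)
        else:
            merged.append(el)
    return merged


parts = [
    Element("TopSectionTitle", 0, 12, "PART I. FINANCI"),
    Element("TopSectionTitle", 0, 12, "AL INFORMATION"),
]
print(merge_same_line(parts)[0].text)  # PART I. FINANCIAL INFORMATION
```

Joining without a separator fits the MSFT case, where the split falls inside a word. Filings that split a title between words would probably need whitespace preserved from the source instead.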

Elijas added the contributions-welcome and feature:elements labels on Dec 27, 2023