Skip to content

[IMP] account_statement_import_sheet_file: XLS[X] file can actually be HTML#763

Closed
Daemo00 wants to merge 1 commit intoOCA:17.0from
Daemo00:17.0-imp-account_statement_import_sheet_file-import_HTML
Closed

[IMP] account_statement_import_sheet_file: XLS[X] file can actually be HTML#763
Daemo00 wants to merge 1 commit intoOCA:17.0from
Daemo00:17.0-imp-account_statement_import_sheet_file-import_HTML

Conversation

@Daemo00
Copy link
Copy Markdown

@Daemo00 Daemo00 commented Jan 30, 2025

I recently downloaded an XLS file that was actually an HTML file (they exist, check the file in tests/fixtures!) and the module account_statement_import_sheet_file wasn't able to import it.
With this PR, it is possible to import such files.

Most of the README edits happened automatically running pre-commit.

@OCA-git-bot
Copy link
Copy Markdown
Contributor

Hi @alexey-pelykh,
some modules you are maintaining are being modified, check this out!

@Daemo00 Daemo00 marked this pull request as ready for review January 30, 2025 18:44
@Daemo00 Daemo00 force-pushed the 17.0-imp-account_statement_import_sheet_file-import_HTML branch from 316b3cd to eca64c3 Compare February 24, 2025 15:11
@Daemo00 Daemo00 force-pushed the 17.0-imp-account_statement_import_sheet_file-import_HTML branch from eca64c3 to 7a4d63f Compare March 4, 2025 09:30
@Daemo00 Daemo00 force-pushed the 17.0-imp-account_statement_import_sheet_file-import_HTML branch from 7a4d63f to c5f7aa5 Compare April 6, 2025 16:03
@Daemo00 Daemo00 force-pushed the 17.0-imp-account_statement_import_sheet_file-import_HTML branch from c5f7aa5 to 6f148f3 Compare July 27, 2025 14:49
@github-actions
Copy link
Copy Markdown

There hasn't been any activity on this pull request in the past 4 months, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 30 days.
If you want this PR to never become stale, please ask a PSC member to apply the "no stale" label.

@github-actions github-actions Bot added the stale PR/Issue without recent activity, it'll be soon closed automatically. label Nov 30, 2025
@Daemo00 Daemo00 force-pushed the 17.0-imp-account_statement_import_sheet_file-import_HTML branch 2 times, most recently from 276af41 to 6f148f3 Compare November 30, 2025 21:38
@Daemo00 Daemo00 force-pushed the 17.0-imp-account_statement_import_sheet_file-import_HTML branch from 6f148f3 to a914254 Compare November 30, 2025 21:56
@github-actions github-actions Bot removed the stale PR/Issue without recent activity, it'll be soon closed automatically. label Dec 7, 2025
Copy link
Copy Markdown
Contributor

@alexey-pelykh alexey-pelykh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this — banks exporting XLS files that are actually HTML is a common pain point. The detection via lstrip().startswith("<html>") and conversion through lxml is a pragmatic solution.

One minor thing: is_HTML doesn't follow Python naming — should be is_html. Not blocking.

Code review LGTM.

_("No valid encoding was found for the attached file")
) from None
decoded_file = data_file.decode(detected_encoding)
is_HTML = decoded_file.lower().lstrip().startswith("<html>")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: is_HTMLis_html per PEP 8 (snake_case for local variables).

@Daemo00
Copy link
Copy Markdown
Author

Daemo00 commented Mar 6, 2026

I'm no more working on this one because I have moved to 18.0 (#871), who wants can supersede.

@Daemo00 Daemo00 closed this Mar 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants