[Feature Request] RTF injector for encapsulated HTML. #353

Open
TheElementalOfDestruction opened this issue Mar 18, 2023 · 2 comments

@TheElementalOfDestruction
Collaborator

Currently, injecting into the RTF body when it has encapsulated HTML uses the following process:

  1. Use a simple detector that only works on very clean encapsulated HTML, and use its result to inject (very reliable).
  2. If the encapsulated HTML is "dirty" (generally because huge numbers of tags are shoved together into a single destination, making it impossible to use generic detection to insert right after the body tag), ignore the fact that it has encapsulated HTML and treat it as standard RTF, using the code added in 0.40.0.
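
As a rough illustration of the two-step process above, here is a minimal sketch. It is not the code extract_msg actually uses: the regular expression, the function names `inject_clean` and `inject`, and the `plain_injector` callback are all hypothetical, and "clean" is assumed to mean a `\*\htmltag` destination that contains nothing but the body tag.

```python
import re
from typing import Callable, Optional

# Hypothetical "clean" pattern: a \*\htmltag destination holding only <body ...>.
_CLEAN_BODY = re.compile(
    rb'\{\\\*\\htmltag\d*\s*<body[^<>{}]*>\s*\}', re.IGNORECASE
)


def inject_clean(rtf: bytes, payload: bytes) -> Optional[bytes]:
    """Insert payload right after a clean body destination.

    Returns None when no clean body destination is found.
    """
    match = _CLEAN_BODY.search(rtf)
    if match is None:
        return None
    return rtf[:match.end()] + payload + rtf[match.end():]


def inject(
    rtf: bytes,
    payload: bytes,
    plain_injector: Callable[[bytes, bytes], bytes],
) -> bytes:
    """Try the clean detector first, then fall back to plain RTF injection."""
    result = inject_clean(rtf, payload)
    if result is not None:
        return result
    # Dirty encapsulated HTML: treat the data as standard RTF instead.
    return plain_injector(rtf, payload)
```

A destination such as `{\*\htmltag50 <html><body><div>}` does not match the clean pattern, so in this sketch it would be handed to the plain RTF path.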

This is not urgent. I am just noting that a future version will (hopefully) include code that can clean up the data further, so that even some of the worst encapsulated HTML can be fixed up for RTF injection and still be successfully deencapsulated later.

Some things it will need to be able to do:

  • Recognize encapsulated HTML and its relevant RTF tags.
  • Understand how to modify \*\htmltag destinations to clean them up.
  • Be able to correctly adjust the order of tags in the event that HTML tags appear before the end of the RTF header, where putting text could cause corruption.
  • Add HTML tags for extremely poor quality RTF documents (some encapsulated HTML is completely missing tags like <html>, <body>, etc.).
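
To make the first and last points a little more concrete, here is a minimal sketch of what recognition could look like. It assumes the `\fromhtml1` control word marks encapsulated HTML and uses a deliberately naive regular expression that does not handle nested groups or escaped braces inside a destination. All function names are hypothetical and are not part of extract_msg.

```python
import re
from typing import List

# \fromhtml1 in the RTF header signals encapsulated HTML.
_FROMHTML = re.compile(rb'\\fromhtml1(?![0-9])')
# Naive match for a flat \*\htmltag destination (no nested or escaped braces).
_HTMLTAG = re.compile(rb'\{\\\*\\htmltag(\d*) ?([^{}]*)\}')


def is_encapsulated_html(rtf: bytes) -> bool:
    """Return True if the RTF data declares encapsulated HTML."""
    return _FROMHTML.search(rtf) is not None


def htmltag_contents(rtf: bytes) -> List[bytes]:
    """Return the raw content of each flat \\*\\htmltag destination, in order."""
    return [m.group(2) for m in _HTMLTAG.finditer(rtf)]


def missing_structural_tags(rtf: bytes) -> List[str]:
    """List which of <html> and <body> never appear in any htmltag destination."""
    joined = b''.join(htmltag_contents(rtf)).lower()
    return [
        name for name in ('html', 'body')
        if (b'<' + name.encode('ascii')) not in joined
    ]
```

A real implementation would need a proper RTF tokenizer rather than regular expressions, particularly for the reordering case where HTML tags appear before the end of the RTF header.
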
@AIDBEmon

AIDBEmon commented Nov 15, 2023

Hey @TheElementalOfDestruction, hope you are doing well. Is this the feature that will resolve this error?

Bug Metadata
Version of extract_msg: 0.46.2
Your python version: Python 3.8
How did you launch extract_msg?
I used the extract_msg package

[image attachment]

@TheElementalOfDestruction
Collaborator Author

@AIDBEmon please refrain from posting on an unrelated feature request.

It looks like you are having an issue with RTFDE which should be reported at https://github.com/seamustuohy/RTFDE

If you wish to temporarily silence this issue until a patch can be made for RTFDE, you can pass errorBehavior = extract_msg.enums.ErrorBehavior.RTFDE_UNKNOWN_ERROR as a keyword argument to openMsg.
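
For reference, the workaround described above might look roughly like this. The wrapper function name is hypothetical, and the import sits inside the function only so the snippet can be loaded without extract_msg installed.

```python
def open_suppressing_rtfde_errors(path):
    """Open a .msg file while suppressing unknown RTFDE errors (sketch)."""
    # Lazy import: extract_msg is only required when the function is called.
    import extract_msg

    return extract_msg.openMsg(
        path,
        errorBehavior=extract_msg.enums.ErrorBehavior.RTFDE_UNKNOWN_ERROR,
    )
```

This only hides the symptom in extract_msg; the underlying parsing problem would still need to be fixed in RTFDE.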
