interface to stdlib interfaces #56

Open
srkunze opened this issue Nov 30, 2018 · 9 comments

Comments

@srkunze

srkunze commented Nov 30, 2018

Referring to #55, it would be great if msg-extractor could use stdlib functionality:

from email.mime.text import MIMEText  # for body or bodyHtml
from rfc822 import AddressList  # for recipients and sender
from email.message import Message  # for attachments

I currently wrap msg-extractor myself to provide the stdlib interface.

@TheElementalOfDestruction
Collaborator

Can you give a detailed explanation on the benefit of adding these things?

@TheElementalOfDestruction
Collaborator

TheElementalOfDestruction commented Dec 3, 2018

Also, email.message.Message is already being used for email headers.

Well, sort of being used.

@srkunze
Author

srkunze commented Dec 5, 2018

Reading an RFC 2822 email message with the stdlib exposes certain interfaces, described here: https://docs.python.org/3/library/email.html (or the Python 2.7 equivalent).

  1. I think a well-known interface is always a convincing selling point for any library.

  2. The well-tested stdlib already covers a lot of corner cases and tries to be standards-compliant.

  3. Existing programs and my-real-world-is-messy-library-which-fixes-known-issues-in-stdlib just work.
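
To make the "well-known interface" point concrete, here is a minimal sketch of what the stdlib `email` package exposes once it has an RFC 2822 message. The sample message is made up for illustration and has nothing to do with how msg-extractor works internally.

```python
import email
from email import policy

# A made-up RFC 2822 message, used only for illustration.
RAW = (
    "From: Alice Example <alice@example.com>\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "\n"
    "Just a test.\n"
)

# policy.default makes the parser return an EmailMessage (Python 3.6+).
msg = email.message_from_string(RAW, policy=policy.default)

subject = msg["Subject"]               # dict-style header access
content_type = msg.get_content_type()  # MIME type of the message
body = msg.get_body(preferencelist=("plain",)).get_content()
```

Any code written against this interface (header lookup, `get_content_type()`, `get_body()`) would work unchanged with a message object coming from another source, which is the interoperability argument made in point 3.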

@punkrokk
Contributor

punkrokk commented Dec 5, 2018

@srkunze are you asking for our messages.py to pull headers/mime parts with stdlib? I'm having a hard time boiling down precisely what you are looking for? Can you clarify? Are you suggesting the extract_msg library uses stdlib to parse components once we "decode" the email?

@punkrokk
Contributor

@srkunze thoughts?

@srkunze
Author

srkunze commented Dec 19, 2018

> @srkunze are you asking for our messages.py to pull headers/mime parts with stdlib?

I think that would be a good idea. Right now, I'm wrapping msg-extractor with the usual MIME part classes from the stdlib:

if len(message.attachments) == 1 and not (
        message.attachments[0].longFilename or message.attachments[0].shortFilename):
    mime_message = email.message_from_string(message.attachments[0].data)
    body_text = emailutils.body_as_text(mime_message)
    body_html = None
else:
    mime_message = None
    body_text = try_unicode(message.body)
    body_html = try_unicode(message.htmlBody) if message.htmlBody else None

if mime_message:
    for part in emailutils.get_parts_list_from_msg_walk(mime_message):
        yield part
    return
if body_text:
    yield MIMEText(body_text, _charset=None)
elif body_html:
    # Fall back to the HTML body when there is no plain-text body.
    yield MIMEText(body_html, _subtype='html', _charset=None)

for index, att in enumerate(message.attachments, 1):
    name = att.longFilename or att.shortFilename
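    # guess_type() returns None for unknown extensions, so fall back to a generic type.
    mime_type = mimetypes.guess_type(name or '')[0] or 'application/octet-stream'
    part = MIMENonMultipart(*mime_type.split('/', 1))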
    part.set_payload(att.data)
    part.set_param('filename', name, 'content-disposition')

    yield part
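
The attachment loop above can also be sketched as a self-contained Python 3 function. `FakeAttachment` is a hypothetical stand-in that models only the three attributes used in the snippet (`longFilename`, `shortFilename`, `data`); it is not the library's class, and `attachment_to_mime` is an illustrative name.

```python
import mimetypes
from collections import namedtuple
from email import encoders
from email.mime.base import MIMEBase

# Hypothetical stand-in for a msg-extractor attachment; only the three
# attributes used in the snippet above are modelled.
FakeAttachment = namedtuple("FakeAttachment", "longFilename shortFilename data")


def attachment_to_mime(att):
    """Wrap one attachment-like object in a stdlib MIME part."""
    name = att.longFilename or att.shortFilename or "unnamed"
    # guess_type() returns (None, None) for unknown extensions.
    mime_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
    maintype, subtype = mime_type.split("/", 1)
    part = MIMEBase(maintype, subtype)
    part.set_payload(att.data)
    encoders.encode_base64(part)  # binary-safe transfer encoding
    part.add_header("Content-Disposition", "attachment", filename=name)
    return part


part = attachment_to_mime(FakeAttachment("report.pdf", "REPORT~1.PDF", b"%PDF-1.4"))
unnamed = attachment_to_mime(FakeAttachment(None, None, b"\x00\x01"))
```

This sketch uses `add_header("Content-Disposition", "attachment", ...)` so the header carries an explicit disposition type, and it falls back to `application/octet-stream` when the file name gives no hint about the type.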

And the same for the RFC 2822 headers:

froms = AddressList(message.header['from'])
tos = AddressList(message.header['to'])
ccs = AddressList(message.header['cc'])
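
Note that the `rfc822` module exists only in Python 2; it was removed in Python 3. A rough Python 3 counterpart, as a sketch, is `email.utils.getaddresses`. The header strings below are made-up placeholders for values such as `message.header['to']`, and `parse_addresses` is a hypothetical helper name.

```python
from email.utils import getaddresses

# Hypothetical header values, standing in for message.header['to'] etc.
to_header = "Alice Example <alice@example.com>, bob@example.com"
cc_header = None  # a header that is absent


def parse_addresses(value):
    """Return (display name, address) pairs; an absent header yields []."""
    return getaddresses([value]) if value else []


tos = parse_addresses(to_header)
ccs = parse_addresses(cc_header)
```

Each entry is a `(display name, address)` tuple, which carries roughly the same information as the entries of an `AddressList`.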

> I'm having a hard time boiling down precisely what you are looking for? Can you clarify? Are you suggesting the extract_msg library uses stdlib to parse components once we "decode" the email?

Basically, that's the essence, I think.

PS1: emailutils is some private lib which can work with regular stdlib email and rfc822 objects such as email.Message.
PS2: in case you wonder why there's a mime_message in the above source code: I actually have some S/MIME messages which have no body and only a single attachment without a name.

@srkunze
Author

srkunze commented Dec 19, 2018

I don't know how you feel about relying on rfc822, but in the case of #55 it solved my problem right away.

@TheElementalOfDestruction
Collaborator

It's been a long time since I've interacted with this. #248 was the issue specifically about returning an EmailMessage instance for a given MSG file, and that has now been completed. Let me know if you think more should reasonably be done to fulfill interfacing with the standard library.
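
For readers unfamiliar with `EmailMessage`, the sketch below shows the kind of stdlib interface such an instance offers. The message is built by hand purely for illustration; how the library itself produces the instance is not shown here.

```python
from email.message import EmailMessage

# Built by hand for illustration; it stands in for whatever EmailMessage
# the library returns for a given MSG file.
msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Quarterly report"
msg.set_content("Plain-text body")
msg.add_alternative("<p>HTML body</p>", subtype="html")
msg.add_attachment(b"%PDF-1.4", maintype="application", subtype="pdf",
                   filename="report.pdf")

# Body lookup and attachment iteration via the stdlib interface.
plain = msg.get_body(preferencelist=("plain",)).get_content()
html = msg.get_body(preferencelist=("html",)).get_content()
attachment_names = [part.get_filename() for part in msg.iter_attachments()]
```

With this interface, the hand-written wrapping of bodies and attachments shown earlier in the thread would no longer be needed on the caller's side.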

@srkunze
Author

srkunze commented Sep 6, 2023

To be fair, I no longer work for the team that's using this library. I still remember this case vaguely, but I'm not sure about it anymore, and I would not be able to test it either. :-/

@mpeuss Maybe you can have a look and answer @TheElementalOfDestruction's question on this. Not sure whether you have moved on from this or not, but I guess it's best if you decide how to proceed.

On the other hand, @TheElementalOfDestruction, you can still decide to go in that direction without any feedback from the folks at TBZ and try to expose a stdlib-compatible interface anyway.

It's up to you. :-)
