pyzmail crashes when parsing a mail with badly encoded UTF-8 header #12

Open
amikoren opened this issue May 19, 2016 · 3 comments

@amikoren

Hello aspineux,

pyzmail crashes on a rare, malformed mail file. It looks like pyzmail mishandles such files.

Details:
Python version: 3.4.2
pyzmail: 1.0.3
Linux: Debian 8

$ grep version /usr/lib/python3.4/email/__init__.py
__version__ = '5.1.0'

Crash reason:
If a header cannot be decoded (the UTF-8 is badly encoded), Compat32._sanitize_header() in _policy_base.py returns an instance of email.header.Header instead of a string.
pyzmail then crashes when it calls startswith() on that Header object.
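
For illustration, here is a minimal sketch of that behaviour using only the standard library. The message bytes and the `header_to_str` helper are made up for this example; this is not pyzmail's code, and not necessarily how a fix would look.

```python
import email
from email.header import Header

# Hypothetical message: a Content-ID header carrying raw non-ASCII bytes
# (assumption: similar in spirit to the attached mail, not copied from it).
raw = b"Content-ID: <\xc9\xcc\xb8\xbblog.jpg@example>\n\nbody\n"
msg = email.message_from_bytes(raw)

# With the default compat32 policy, a header containing undecodable bytes
# comes back as an email.header.Header object rather than a str.
content_id = msg.get("Content-Id")
is_header = isinstance(content_id, Header)
has_startswith = hasattr(content_id, "startswith")


def header_to_str(value):
    """Hypothetical helper: coerce a header value to str, keeping None as None."""
    return None if value is None else str(value)


# After coercion, the usual string methods are available again.
safe_id = header_to_str(content_id)
wrapped = safe_id.startswith("<") and safe_id.endswith(">")
```

Converting with str() loses the undecodable bytes (they are replaced), so it is only one possible way to avoid the AttributeError.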

Reproduce:
Take the attached file, and run from python3:
import pyzmail
pyzmail.message_from_bytes(open('/tmp/mail_utf8_error', 'rb').read())

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/pyzmail/parse.py", line 774, in message_from_bytes
    return PyzMessage(email.message_from_bytes(s, *args, **kws))
  File "/usr/local/lib/python3.4/dist-packages/pyzmail/parse.py", line 634, in __init__
    self.mailparts=get_mail_parts(self)
  File "/usr/local/lib/python3.4/dist-packages/pyzmail/parse.py", line 482, in get_mail_parts
    mailparts.append(MailPart(part, filename=filename, type=type, charset=charset, content_id=part.get('Content-Id'), description=part.get('Content-Description'), disposition=disposition, is_body=parts.get(part, False)))
  File "/usr/local/lib/python3.4/dist-packages/pyzmail/parse.py", line 98, in __init__
    if self.content_id.startswith('<') and self.content_id.endswith('>'):
AttributeError: 'Header' object has no attribute 'startswith'

mail_utf8_error.zip

@srault95

No problem for me if I use:

# pyzmail 1.0.3 - Python 3.4.3 - windows 7

>>> msg = pyzmail.message_from_file(open('mail_utf8_error'))

>>> msg.as_string()
'Return-Path: <[email protected]>\nReceived: from mp0-f70.google.com (mp0-f70.google.com [209.85.220.70])\n\tby i-sgcore01-poc-server.c.trusty-catbird-121621.internal (Postfix) with ESMTPS i
d 020B9408FA\n\tfor <[email protected]>; Fri, 13 May 2016 17:13:48 +0300 (IDT)\nReceived: by mp0-f70.google.com with SMTP id gw7so150349649pac.0\n        for <[email protected]>; Fri, 13 May 2016
07:13:47 -0700 (PDT)\nX-Original-Authentication-Results: mx.google.com;       spf=pass (google.com: domain of [email protected] designates 69.88.22.222 as permitted sender) smtp.mailfrom=a@s
.com\nX-Received: by 10.98.55.133 with SMTP id e127mr22852924pfa.81.1463148827242;\n        Fri, 13 May 2016 07:13:47 -0700 (PDT)\nX-Received: by 10.98.55.133 with SMTP id e127mr22
852810pfa.81.1463148826374;\n        Fri, 13 May 2016 07:13:46 -0700 (PDT)\nReceived: from gproxy10-pub.mail.unifiedlayer.com (gproxy10-pub.mail.unifiedlayer.com. [69.88.22.222])\n
        by mx.google.com with SMTP id i10si24956178paz.90.2016.05.13.07.13.45\n        for <[email protected]>;\n        Fri, 13 May 2016 07:13:46 -0700 (PDT)\nReceived-SPF: pass (google.c
om: domain of [email protected] designates 69.88.22.222 as permitted sender) client-ip=69.88.22.222;\nAuthentication-Results: mx.google.com;\n       spf=pass (google.com: domain of [email protected] d
esignates 69.88.22.222 as permitted sender) [email protected]\nReceived: (qmail 2115 invoked by uid 0); 13 May 2016 14:12:47 -0000\nReceived: from unknown (HELO cmgw4) (10.0.90
.85)\n  by g.m.u.com with SMTP; 13 May 2016 14:12:47 -0000\nReceived: from box1210.bluehost.com ([55.88.222.222])\n\tby cmgw4 with\n\tid tqCi1s00d4Z6XqA01qCly4; Fri, 13 May 2016 08
:12:47 -0600\nReceived: from [111.11.22.111] (port=61165 helo=LocalHost)\n\tby box1210.bluehost.com with esmtpsa (TLSv1:AES128-SHA:128)\n\t(Exim 4.86_2)\n\t(envelope-from <[email protected]>
)\n\tid 1b1DpX-0008LG-HL\n\tfor [email protected]; Fri, 13 May 2016 08:12:42 -0600\nMessage-ID: <[email protected]>\nFrom: "Ms.A" <[email protected]>\nReply-To:
<[email protected]>\nTo: "E" <[email protected]>\nSubject: Re:Hi E,Greetings from S A\nDate: Fri, 13 May 2016 22:22:24 +0800\nMIME-Version: 1.0\nX-Priority: 3\nX-Mailer: Joinf MailSystem 8.0\nConten
t-Type: multipart/related;\n\ttype="multipart/alternative";\n\tboundary="Mark=_217952388210897619413514"\nX-Identified-User: {1094:bb.com:s1:s.com} {sentby:smtp auth 111.11.22.111
authed with [email protected]}\n\n\n--Mark=_2179523882108976194183049--\n\n--Mark=_217952388210897619413514\nContent-Type: image/jpg;\n\tname="=?utf-8?Q?=E5=95=86=E5=AF=8Clog.jpg?="\nContent
-Transfer-Encoding: base64\nContent-ID: =?utf-8?b?PMOJw4zCuMK7bG9nLmpwZ0A0MjUwMy42NDA2NjA2NDgxLjY1?=\n\n\n--Mark=_217952388210897619413514--\n'

@amikoren
Author

Thanks for checking, srault95. Maybe it's a Windows/Linux difference? On my Debian machine it happens with Python 3.4.3 and pyzmail 1.0.3. Using pyzmail.message_from_file() on that file also raises an exception.

@aspineux
Owner

Thank you amikoren, I can reproduce the problem and I will provide a fix soon.

srault95, you are using the "old python2" interface (aka the text interface)

msg = pyzmail.message_from_file(open('mail_utf8_error'))
vs

msg = pyzmail.message_from_bytes(open('/tmp/mail_utf8_error', 'rb').read())

In open('mail_utf8_error'), the file is opened in text mode, so its content is decoded using your local encoding, and pyzmail and the email library then work on a different set of data.

What amikoren is doing is more like this:

msg = pyzmail.message_from_string(open(r'm:\tmp\mail_utf8_error', 'rb').read().decode('utf-8'))

And what you are doing is more like this:

msg = pyzmail.message_from_string(open(r'm:\tmp\mail_utf8_error', 'rb').read().decode('cp1252'))

Replace 'cp1252' with your local Windows encoding.
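
A small sketch of why the two decodings behave differently, using only the standard library. The header bytes below are an assumption chosen for illustration (they are not read from the attached file), and the variable names are made up.

```python
import email

# Hypothetical header bytes that are not valid UTF-8 (assumption for this sketch).
raw = b"Content-ID: <\xc9\xcc\xb8\xbblog.jpg@example>\n\nbody\n"

# cp1252 maps each of these bytes to a character, so decoding succeeds
# silently and produces ordinary text.
as_cp1252 = raw.decode("cp1252")

# The same bytes are rejected by a strict UTF-8 decoder.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Parsed from the cp1252 text, the header is a plain str, so string
# methods such as startswith() work and no crash is seen.
text_msg = email.message_from_string(as_cp1252)
text_header_is_str = isinstance(text_msg["Content-ID"], str)
```

This matches the explanation above: the text interface sees already-decoded characters, while the bytes interface sees the original, badly encoded data.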
