Python - parsing email_message['Subject'] results in 3 strings instead of 1
I'm parsing an email subject and getting multiple strings (depending on the subject length), each starting with =?utf-8?b?. Is this normal behavior? How can I join the strings into one string with one encoding?
    email_message = email.message_from_string(raw_email)
    print email_message['subject']
...
=?utf-8?b?15bxkneqiner15pxmden15qg15hxodez16hxmdeqiner15vxk9ezinec15txkdez158g?= =?utf-8?b?157xk9ev16ig15txp9ez15pxldetineu15bxlcdxnneqinei15xxkdetineq150g15dxonezineo15u=?= =?utf-8?b?16nxnsdxlneo15hxla==?=
Edit:

    import email.utils
    from email.header import decode_header

    # decode_header() returns a list of (decoded_bytes, charset) pairs; [0] keeps only the first pair
    subjectdecoded, encoding = decode_header(email.utils.parseaddr(email_message['subject'])[1])[0]
    if encoding is None:
        subjectdecodedparsed = email_message['subject']
        print 'I am not decoding the subject'
        print subjectdecodedparsed
    else:
        subjectdecodedparsed = subjectdecoded.decode(encoding)
        print 'I am decoding the subject'
        print subjectdecodedparsed.encode('utf8')  # <--- only the first line is presented here
Your string is encoded using the encoded-word format for MIME headers (Base64 here, as the ?b? marker shows, rather than quoted-printable). The email.header module handles this for you:
    >>> subject = '''\
    ... =?utf-8?b?15bxkneqiner15pxmden15qg15hxodez16hxmdeqiner15vxk9ezinec15txkdez158g?=
    ... =?utf-8?b?157xk9ev16ig15txp9ez15pxldetineu15bxlcdxnneqinei15xxkdetineq150g15dxonezineo15u=?=
    ... =?utf-8?b?16nxnsdxlneo15hxla==?='''
    >>> from email.header import decode_header
    >>> for line in subject.splitlines():
    ...     bytes, encoding = decode_header(line)[0]
    ...     print bytes.decode(encoding)
    ...
    זאת בדיקה בסיסית בכדי להבין
    מדוע הקידוד הזה לא עובד אם אני רו
    שם הרבה
The subject (which is one string with newlines and leading whitespace) spans multiple lines to fit the strict line length limitations set by the MIME standard.
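To get a single string rather than one string per line, the whole folded header can be decoded in one go. The sketch below is not from the original answer: it assumes Python 3 (the thread itself uses Python 2, where unicode() would take the place of str()), and the helper name decode_subject is hypothetical. It relies on email.header.make_header to join the (bytes, charset) pairs that decode_header returns. The folded header is generated with email.header.Header so that the example is self-contained.

```python
from email.header import Header, decode_header, make_header


def decode_subject(raw_value):
    """Decode all encoded-words in a header value and join them into one str.

    decode_subject is a hypothetical helper name used for illustration only.
    """
    # decode_header() yields (bytes_or_str, charset) pairs for the whole value,
    # including continuation lines; make_header() stitches them back together.
    return str(make_header(decode_header(raw_value)))


original = "זאת בדיקה בסיסית בכדי להבין מדוע הקידוד הזה לא עובד אם אני רו שם הרבה"

# Build a folded, encoded header like the one in the question.
folded = Header(original, charset="utf-8").encode()

decoded = decode_subject(folded)
print(folded)
print(decoded)
```

With a parsed message, the same helper would be called as decode_subject(email_message['Subject']). A header that contains no encoded-words passes through unchanged.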