
perl - Decode a UTF-8 email header

I have an email subject of the form:

=?utf-8?B?T3.....?=

The body of the email is UTF-8, Base64-encoded, and has decoded fine. I am currently using Perl's Email::MIME module to decode the email.

What is the meaning of the =?utf-8 delimiter and how do I extract information from this string?


1 Answer


This is an RFC 2047 encoded-word. Encoded-words can occur in the values of certain headers, such as Subject, and have this structure:

=?<charset>?<encoding>?<data>?=

Here the charset is UTF-8 and the encoding is B, meaning Base64. The other option is Q, an encoding similar to Quoted-Printable.

To read it, Base64-decode the data part, then decode the resulting bytes as UTF-8.
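Those two steps can be done by hand with the core MIME::Base64 and Encode modules. The sketch below is illustrative only: `decode_encoded_word` is a hypothetical helper, and the sample words are made up (the subject in the question is elided). It handles one encoded-word with B or Q encoding and ignores refinements such as RFC 2231 language tags; in practice Encode's MIME-Header encoding is the simpler route.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode);

# Hypothetical helper: decodes ONE RFC 2047 encoded-word by hand.
# Assumes the whole string is a single =?charset?encoding?data?= token.
sub decode_encoded_word {
    my ($word) = @_;
    my ($charset, $enc, $data) =
        $word =~ /\A=\?([^?]+)\?([BbQq])\?([^?]*)\?=\z/
        or die "not an encoded-word: $word\n";

    my $bytes;
    if (uc $enc eq 'B') {
        $bytes = decode_base64($data);                 # B: Base64 -> octets
    } else {
        ($bytes = $data) =~ tr/_/ /;                   # Q: '_' stands for a space
        $bytes =~ s/=([0-9A-Fa-f]{2})/chr hex $1/ge;   # Q: =XX -> octet 0xXX
    }
    return decode($charset, $bytes);                   # octets -> Perl character string
}

# Made-up sample words, both meaning "Hello, wörld".
binmode STDOUT, ':encoding(UTF-8)';
print decode_encoded_word('=?utf-8?B?SGVsbG8sIHfDtnJsZA==?='), "\n";
print decode_encoded_word('=?utf-8?Q?Hello,_w=C3=B6rld?='), "\n";
```

The charset is passed straight to `Encode::decode`, so any charset Encode knows about works, and an unknown one makes `decode` die.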

For more detail, see the Internet mail RFCs, mainly RFC 2047.

Since you are using Perl, Encode::MIME::Header could be of use:

SYNOPSIS

use Encode qw/encode decode/;
$utf8   = decode('MIME-Header', $header);
$header = encode('MIME-Header', $utf8);

ABSTRACT

This module implements RFC 2047 MIME header encoding. There are three variant encoding names: MIME-Header, MIME-B and MIME-Q. The differences are described below:

              decode()          encode()  
MIME-Header   Both B and Q      =?UTF-8?B?....?=  
MIME-B        B only; Q croaks  =?UTF-8?B?....?=  
MIME-Q        Q only; B croaks  =?UTF-8?Q?....?=
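A minimal sketch of the MIME-Header route applied to a whole header value. The Subject string here is a hypothetical example, not the elided one from the question; it mixes plain text with a B-encoded word, and a Q-encoded word for the same text is decoded alongside it.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical Subject value: plain text followed by one B-encoded word.
my $subject = 'Re: =?utf-8?B?SGVsbG8sIHfDtnJsZA==?=';

# MIME-Header decodes both B and Q encoded-words; plain text passes through.
my $text = decode('MIME-Header', $subject);

# The same text as a Q-encoded word: '_' is a space, =C3=B6 is UTF-8 for U+00F6.
my $q_text = decode('MIME-Header', '=?utf-8?Q?Hello,_w=C3=B6rld?=');

# Encoding goes the other way: Perl character string -> encoded-words.
my $encoded = encode('MIME-Header', $text);

binmode STDOUT, ':encoding(UTF-8)';
print "$text\n$q_text\n$encoded\n";
```

`decode` returns a Perl character string rather than UTF-8 bytes, so set an output encoding (as the `binmode` line does) before printing it.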

...