Encoded-word (RFC 2047)

RFC 2047

An encoding scheme, defined in RFC 2047, that lets non-ASCII characters appear in email header fields such as Subject and From by encoding them as "encoded words" of the form =?charset?encoding?encoded-text?=.

RFC 5322 restricts email headers to 7-bit US-ASCII characters. RFC 2047 works around this by representing non-ASCII text in a header as one or more "encoded words" of the form =?charset?B?...?= (Base64) or =?charset?Q?...?= (the "Q" encoding, a variant of quoted-printable in which "_" stands for a space). For example, a Japanese subject line might appear in the raw message as =?ISO-2022-JP?B?...?= and must be decoded before it can be displayed.
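To make the token format concrete, here is a minimal Python sketch that decodes a single, well-formed encoded word by hand. The names `decode_encoded_word` and `decode_q` are hypothetical. The sketch assumes the charset label is one Python recognises as a codec name, and it does not handle RFC 2231 language tags or malformed input that a real mail reader has to tolerate.

```python
import base64
import re

# One encoded word: =?charset?encoding?encoded-text?=  (encoding is B or Q, any case)
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def decode_q(text: str) -> bytes:
    """Decode RFC 2047 'Q' text: '_' is a space and '=XX' is a hex-escaped byte."""
    raw = text.replace("_", " ").encode("ascii")
    return re.sub(rb"=([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), raw)


def decode_encoded_word(word: str) -> str:
    """Decode one complete encoded word into a Unicode string (hypothetical helper)."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError(f"not an RFC 2047 encoded word: {word!r}")
    charset, encoding, text = m.groups()
    if encoding.upper() == "B":
        data = base64.b64decode(text)
    else:
        data = decode_q(text)
    # Assumption: the charset label is usable as a Python codec name
    # (e.g. "utf-8", "iso-8859-1"); an unknown label raises LookupError.
    return data.decode(charset)


print(decode_encoded_word("=?UTF-8?B?Q2Fmw6k=?="))            # Café
print(decode_encoded_word("=?ISO-8859-1?Q?Caf=E9_au_lait?="))  # Café au lait
```

The two encodings differ only in how the bytes are written out. After that step, the charset label alone determines how those bytes become characters.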

Without RFC 2047 decoding, subject lines and sender names containing accented characters, CJK characters, Arabic, or any other non-ASCII script appear as raw encoded strings, which are unreadable to the end user. A correct implementation must detect encoded-word tokens wherever they can legally appear in a header value and decode each one using its declared charset and encoding. When two encoded words are separated only by whitespace, RFC 2047 specifies that this whitespace is ignored, so the decoded pieces are joined directly.
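In practice a tested library is usually a safer choice than a hand-written parser. As one option, Python's standard library provides `email.header.decode_header` and `email.header.make_header`. Together they find every encoded word in a header value, decode it, and reassemble the text, dropping whitespace between adjacent encoded words. The wrapper name `decode_header_value` and its fall-back-to-raw behaviour are assumptions made for this sketch. They do not describe how Mbox Viewer itself is implemented.

```python
from email.errors import HeaderParseError
from email.header import decode_header, make_header


def decode_header_value(raw: str) -> str:
    """Decode all RFC 2047 encoded words in a raw header value (hypothetical helper).

    decode_header() splits the value into (bytes or str, charset) chunks and
    decodes the B/Q payloads; make_header() rebuilds a Header whose str() is
    the Unicode text.
    """
    try:
        return str(make_header(decode_header(raw)))
    except (HeaderParseError, LookupError, UnicodeDecodeError):
        # Unknown charset or corrupt payload: show the raw value rather than fail.
        return raw


print(decode_header_value("Re: =?UTF-8?Q?Caf=C3=A9?="))                   # Re: Café
print(decode_header_value("=?UTF-8?Q?Caf=C3=A9?= =?UTF-8?Q?_au_lait?="))  # Café au lait
print(decode_header_value("Plain ASCII subject"))                          # Plain ASCII subject
```

In the second example, the space between the two encoded words is discarded. The space in " au lait" comes from the encoded "_" inside the second word.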

Mbox Viewer decodes RFC 2047 encoded words in all header fields when building its message list and search index. This means that searching for a name written in its original script — for example, a Japanese sender name — will match correctly even though the underlying MBOX file stores the name in encoded form.
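The idea of indexing decoded text rather than raw header bytes can be sketched in Python. This is an illustration under stated assumptions, not Mbox Viewer's code. It parses each message with `email.policy.default`, which returns header values already decoded, and builds a simple case-folded substring index. The names `index_headers` and `search` are hypothetical. The sample sender name is encoded with `email.header.Header` to mimic how it would be stored in an MBOX file.

```python
import email
import email.policy
from email.header import Header


def index_headers(raw_messages, fields=("From", "To", "Subject")):
    """Return (message number, decoded header text) pairs (hypothetical helper).

    With policy=email.policy.default, the parser decodes RFC 2047 encoded
    words, so header values come back as ordinary Unicode strings.
    """
    index = []
    for n, raw in enumerate(raw_messages):
        msg = email.message_from_string(raw, policy=email.policy.default)
        text = " ".join(str(msg[f]) for f in fields if msg[f] is not None)
        index.append((n, text.casefold()))
    return index


def search(index, query):
    """Case-insensitive substring match against the decoded header text."""
    q = query.casefold()
    return [n for n, text in index if q in text]


# The stored form of the name is an ASCII-only encoded word, e.g. =?utf-8?b?...?=
encoded_name = Header("山田太郎", "utf-8").encode()
raw = f"From: {encoded_name} <taro@example.jp>\nSubject: Hello\n\nbody\n"
idx = index_headers([raw, "From: bob@example.com\nSubject: Hi\n\nbody\n"])
print(search(idx, "山田"))  # [0]
```

A query in the original script would never match the raw `=?utf-8?b?...?=` token, which is why the decoding step has to happen before indexing.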

Related terms

Charset

The character encoding that specifies how bytes in a text part are mapped to readable characters. Common charsets include UTF-8, ISO-8859-1, and Shift_JIS. Decoding bytes with the wrong charset produces garbled text known as mojibake.

Header

The structured metadata block at the beginning of an email message, containing fields like From, To, Subject, Date, and numerous technical fields that describe how the message was composed, routed, and encoded.
