Mailing List Archive

Re: What do these '=?utf-8?' sequences mean in python?
Chris Green <cl@isbd.net> wrote:
> I'm having a real hard time trying to do anything to a string (?)
> returned by mailbox.MaildirMessage.get().
>
What a twit I am :-)

Strings are immutable, I have to do:-

newstring = oldstring.replace("_", " ")

Job done!

--
Chris Green
--
https://mail.python.org/mailman/listinfo/python-list
Re: What do these '=?utf-8?' sequences mean in python?
Chris Green <cl@isbd.net> writes:
> Chris Green <cl@isbd.net> wrote:
>> I'm having a real hard time trying to do anything to a string (?)
>> returned by mailbox.MaildirMessage.get().
>>
> What a twit I am :-)
>
> Strings are immutable, I have to do:-
>
> newstring = oldstring.replace("_", " ")
>
> Job done!

Not necessarily.

The subject in the original article was:
=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?=

That's some kind of MIME encoding. Just replacing underscores by spaces
won't necessarily give you anything meaningful. (What if there are
actual underscores in the original subject line?)

You should probably apply some kind of MIME-specific decoding. (I don't
have a specific suggestion for how to do that.)

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */

Re: What do these '=?utf-8?' sequences mean in python?
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Chris Green <cl@isbd.net> writes:
> > Chris Green <cl@isbd.net> wrote:
> >> I'm having a real hard time trying to do anything to a string (?)
> >> returned by mailbox.MaildirMessage.get().
> >>
> > What a twit I am :-)
> >
> > Strings are immutable, I have to do:-
> >
> > newstring = oldstring.replace("_", " ")
> >
> > Job done!
>
> Not necessarily.
>
> The subject in the original article was:
> =?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?=
>
> That's some kind of MIME encoding. Just replacing underscores by spaces
> won't necessarily give you anything meaningful. (What if there are
> actual underscores in the original subject line?)
>
> You should probably apply some kind of MIME-specific decoding. (I don't
> have a specific suggestion for how to do that.)
>
Yes, OK, but my problem was that my filter looks for the string
"Waterways Continental Europe" in the message Subject: to route the
message to the appropriate mailbox. When the Subject: has accents the
string becomes "Waterways_Continental_Europe" and thus the match
fails. Simply changing all underscores back to spaces makes my test
for "Waterways Continental Europe" work. The changed Subject: line
gets thrown away after the test so I don't care about anything else
getting changed.

(When there are no accented characters in the Subject: the string is
"Waterways Continental Europe" so I can't easily change the search
text. I guess I could use an RE.)
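
A minimal sketch of the RE idea, assuming the test runs against the raw,
still-encoded header text. The pattern accepts either a space or an
underscore between the words, so it matches both the plain and the
Q-encoded form. The name ROUTE_PATTERN is made up for illustration, and
this approach would not match a Base64-encoded subject.

```python
import re

# Hypothetical name; accepts a space or an underscore between the words.
ROUTE_PATTERN = re.compile(r"Waterways[ _]Continental[ _]Europe")

plain = "aka Marne (Waterways Continental Europe)"
encoded = "=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?="

# Both the plain and the Q-encoded subject should match the pattern.
for subject in (plain, encoded):
    print(bool(ROUTE_PATTERN.search(subject)))
```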

--
Chris Green

Re: What do these '=?utf-8?' sequences mean in python?
Chris Green wrote:
> Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
>> Chris Green <cl@isbd.net> writes:
>>> Chris Green <cl@isbd.net> wrote:
>>>> I'm having a real hard time trying to do anything to a string (?)
>>>> returned by mailbox.MaildirMessage.get().
>>>>
>>> What a twit I am :-)
>>>
>>> Strings are immutable, I have to do:-
>>>
>>> newstring = oldstring.replace("_", " ")
>>>
>>> Job done!
>>
>> Not necessarily.
>>
>> The subject in the original article was:
>> =?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?=
>>
>> That's some kind of MIME encoding. Just replacing underscores by spaces
>> won't necessarily give you anything meaningful. (What if there are
>> actual underscores in the original subject line?)
>>
>> You should probably apply some kind of MIME-specific decoding. (I don't
>> have a specific suggestion for how to do that.)
>>
> Yes, OK, but my problem was that my filter looks for the string
> "Waterways Continental Europe" in the message Subject: to route the
> message to the appropriate mailbox. When the Subject: has accents the
> string becomes "Waterways_Continental_Europe" and thus the match
> fails. Simply changing all underscores back to spaces makes my test
> for "Waterways Continental Europe" work. The changed Subject: line
> gets thrown away after the test so I don't care about anything else
> getting changed.
>
> (When there are no accented characters in the Subject: the string is
> "Waterways Continental Europe" so I can't easily change the search
> text. I guess I could use an RE.)
>

In practice you should also take into account that if the prefix has a
'B' instead of a 'Q' as its penultimate character, then the rest of the
encoded word is Base64-encoded rather than quoted-printable:

"=?utf-8?Q?" --> "=?utf-8?B?"
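
To illustrate the difference, here is a small sketch that decodes the
same text from both a Q-encoded and a B-encoded word using
email.header.decode_header from the standard library. The helper name
decode_words and the sample text are assumptions for this example only.
Note that in Q encoding a literal underscore travels as =5F, which is
why blindly replacing "_" with " " can lose information.

```python
import base64
from email.header import decode_header


def decode_words(raw):
    """Decode RFC 2047 encoded words in a header value into a str."""
    parts = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            # Unencoded fragments come back as bytes with charset None.
            parts.append(data.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(data)
    return "".join(parts)


text = "Saône_river trip"  # sample text with a literal underscore

# Q form: "_" means space, so the literal underscore is written as =5F.
q_subject = "=?utf-8?Q?Sa=C3=B4ne=5Friver_trip?="

# B form: the same UTF-8 bytes, Base64-encoded.
b_payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
b_subject = "=?utf-8?B?" + b_payload + "?="

print(decode_words(q_subject))
print(decode_words(b_subject))
```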


Re: What do these '=?utf-8?' sequences mean in python?
Chris Green wrote at 2023-5-6 15:58 +0100:
>Chris Green <cl@isbd.net> wrote:
>> I'm having a real hard time trying to do anything to a string (?)
>> returned by mailbox.MaildirMessage.get().
>>
>What a twit I am :-)
>
>Strings are immutable, I have to do:-
>
> newstring = oldstring.replace("_", " ")

The solution based on `email.header` proposed by `jak` is better.

Re: What do these '=?utf-8?' sequences mean in python?
On 08May2023 12:19, jak <nospam@please.ty> wrote:
>In practice you should also take into account that if the prefix has a
>'B' instead of a 'Q' as its penultimate character, then the rest of the
>encoded word is Base64-encoded rather than quoted-printable:
>
>"=?utf-8?Q?" --> "=?utf-8?B?"

Aye. Specification:

https://datatracker.ietf.org/doc/html/rfc2047

You should reach for jak's email.header suggestion _before_
parsing the subject line. Details:

https://docs.python.org/3/library/email.header.html#module-email.header

Cheers,
Cameron Simpson <cs@cskk.id.au>
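
As a rough sketch of that advice, the filter test could decode the
Subject: with email.header first and only then look for the routing
string. The function name subject_matches is hypothetical, and the
sketch assumes the raw header value is available as a str.

```python
from email.header import decode_header, make_header


def subject_matches(raw_subject, needle):
    """Decode any RFC 2047 encoded words, then do a plain substring test."""
    decoded = str(make_header(decode_header(raw_subject)))
    return needle in decoded


# The subject quoted earlier in this thread.
raw = "=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?="

print(str(make_header(decode_header(raw))))
print(subject_matches(raw, "Waterways Continental Europe"))
```

This handles both the Q and the B form, and leaves subjects without
accented characters unchanged.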