Mailing List Archive: accent (special char) with python

"nom <d@d.dd>" wrote:
> i'm from france

hmm. last time I checked, france was ".fr",
not ".dd"...

> then i use accent : i try to convert a string :
> 'Soci\351t\351' to 'Société'
> how can do it?

print it?

>>> print 'Soci\351t\351'
Société

...

Python is mostly agnostic when it comes to
character sets -- strings just contain 8-bit
characters, and it's up to the programmer
to make sure they're interpreted in the right
way on input or output. in your case,
'Soci\351t\351' is an ISO Latin 1 string (also
known as ISO 8859-1). so if your environment
uses ISO Latin 1, it just works. on the other
hand, if your environment were to use, say,
IBM's old PC encoding (code page 850, like in
the MS-DOS window under Windows), it would
come out as:

>>> print 'Soci\351t\351'
SociÚtÚ

and in a UTF-8 environment, it's an
illegal string:

>>> print unicode('Soci\351t\351')
Traceback (innermost last):
  File "<stdin>", line 1, in ?
ValueError: invalid UTF-8 code
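
to make the three cases above concrete, here's a minimal
sketch in present-day Python 3 (an assumption; the sessions
in this post come from a much older Python, where print was
a statement and plain strings held raw bytes). it decodes
the same bytes under each encoding. the helper name
interpret() is made up for illustration, and 'cp850' is an
assumed stand-in for the "old PC encoding", picked because
it maps byte 0xE9 to Ú:

```python
# Python 3 sketch, not the Python used in the original post.
# Same octal escapes as in the post: \351 is the single byte 0xE9.
raw = b'Soci\351t\351'


def interpret(data, encoding):
    """Decode bytes under the given encoding (hypothetical helper).

    Returns the decoded text, or None if the bytes are not valid
    in that encoding.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


for name in ('latin-1', 'cp850', 'utf-8'):
    text = interpret(raw, name)
    # ascii() escapes non-ASCII characters, so what gets printed
    # does not depend on the encoding of your own terminal.
    print(name, '->', ascii(text))
```

the bytes never change; only the interpretation does. under
latin-1 the byte 0xE9 becomes U+00E9 (é), under cp850 it
becomes U+00DA (Ú), and under utf-8 the decode fails, so the
helper returns None.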

...

so I guess the answer to your question is
"depends on what you're doing..."

but before you try to explain that, please
take a look at Jukka Korpela's character
code tutorial, available from:

http://www.hut.fi/~jkorpela/chars.html

"This document in itself does not
contain solutions to practical problems
with character codes; rather, it gives
background information needed for
understanding what solutions there
might be, what the different solutions
do - and what's really the problem in
the first place."

</F>