Mailing List Archive

python3 question
Hello.  In python3, how do you do this?

tgt = 'gebuchte Umsätze;'

In python2, you could do this:

tgt = unicode ('gebuchte Umsätze;'.decode ('latin1'))

but in python3 that gives:

SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xe4 in
position 12: invalid continuation byte

In fact, any constant with ä in it will give you that.
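The numbers in that message are consistent with a source file that was saved as Latin-1 but decoded as UTF-8 (an assumption here, though the Latin-1 coding line that fixes things later in the thread points the same way). This minimal sketch reproduces the same error outside the parser:

```python
# The literal as it would sit on disk if the editor saved it as Latin-1:
# 'ä' becomes the single byte 0xe4.
raw = 'gebuchte Umsätze;'.encode('latin-1')
print(raw)  # b'gebuchte Ums\xe4tze;'

try:
    # Python 3 reads source as UTF-8 by default (PEP 3120), so the parser
    # effectively does this with the literal's bytes.
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    # 0xe4 announces a 3-byte UTF-8 sequence, but the next byte ('t')
    # is not a continuation byte.
    print(exc)  # 'utf-8' codec can't decode byte 0xe4 in position 12: invalid continuation byte
```

Position 12 is exactly where the 'ä' sits in 'gebuchte Umsätze;'.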
Re: python3 question
On 1/13/21 7:31 PM, n952162 wrote:
> Hello.  In python3, how do you do this?
>
> tgt = 'gebuchte Umsätze;'

Okay, I see that if your locale is not C, you can do:

tgt = 'gebuchte Umsätze;'
Re: python3 question
On 2021-01-13, n952162 <n952162@web.de> wrote:

> Hello. In python3, how do you do this?

Please explain what "this" is trying to accomplish, and we can tell
you how to do it in Python3. Are you trying to convert from Unicode to
Latin1 and back to Unicode?

Python 3.8.6 (default, Jan 2 2021, 20:25:58)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 'gebuchte Umsätze;'.encode('latin1').decode('latin1')
'gebuchte Umsätze;'
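For comparison, a sketch of the Python 3 counterpart of the Python 2 idiom `'...'.decode('latin1')`: in Python 3 only bytes objects have `.decode()`, and a str literal is already text. The byte string below is a hypothetical stand-in for data read from a Latin-1 file:

```python
# In Python 3, text is str and raw data is bytes; only bytes has .decode().
# Hypothetical input: Latin-1 bytes such as you might read from a file.
data = b'gebuchte Ums\xe4tze;'

tgt = data.decode('latin-1')        # bytes -> str
print(tgt)                          # gebuchte Umsätze;
print(tgt == 'gebuchte Umsätze;')   # True, provided the .py file itself decodes correctly

# A str literal needs no decoding at all; str has no .decode() method.
print(hasattr('gebuchte Umsätze;', 'decode'))  # False
```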


Re: Re: python3 question
On 1/13/21 7:59 PM, Grant Edwards wrote:
> On 2021-01-13, n952162 <n952162@web.de> wrote:
>
>> Hello. In python3, how do you do this?
> Please explain what "this" is trying to accomplish, and we can tell
> you how to do it in Python3. Are you trying to convert from Unicode to
> Latin1 and back to Unicode?
>
> Python 3.8.6 (default, Jan 2 2021, 20:25:58)
> [GCC 9.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> 'gebuchte Umsätze;'.encode('latin1').decode('latin1')
> 'gebuchte Umsätze;'
>
>
I'm trying to search for a string in a file.  I don't know why there
needs to be any conversion going on.
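Searching a file does involve a conversion: the file's bytes must be decoded into str before a str needle can match. A minimal sketch, assuming the data file is Latin-1 (substitute whatever encoding it really uses); `file_contains` is a hypothetical helper name:

```python
import os
import tempfile


def file_contains(path, needle, encoding):
    """Return True if `needle` occurs in the text file at `path`.

    The file's bytes are decoded with `encoding`; if that is the wrong
    codec, non-ASCII text such as 'ä' won't match (or decoding fails).
    """
    with open(path, encoding=encoding) as f:
        return any(needle in line for line in f)


# Demo with a throwaway file written in Latin-1 (assumed data format).
with tempfile.NamedTemporaryFile('w', encoding='latin-1', suffix='.csv',
                                 delete=False) as tmp:
    tmp.write('Datum;gebuchte Umsätze;Betrag\n')
    path = tmp.name

try:
    print(file_contains(path, 'gebuchte Umsätze;', 'latin-1'))  # True
finally:
    os.remove(path)
```

Opening the same file with `encoding='utf-8'` would raise UnicodeDecodeError at the 0xe4 byte, which is why the encoding argument is spelled out rather than left to the locale default.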

Just running python3 in interactive mode, I can input the literal when
the locale is right:

12/lcl/data/f/b>LC_ALL=de_DE python3
Python 3.7.9 (default, Nov 16 2020, 00:32:07)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Could not open PYTHONSTARTUP
FileNotFoundError: [Errno 2] No such file or directory:
'/home/mellman/lib/python/rpnrc'
>>> s = "gebuchte Umsätze"
>>> print (s)
gebuchte Umsätze
>>>

but it doesn't work from within my program...
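One likely reason the two cases differ: interactive input and a .py file are decoded by different rules. A small diagnostic sketch; what it prints depends on your locale and Python version (3.7+ treats the C locale as UTF-8, per PEPs 538 and 540):

```python
import locale
import sys

# Interactive input is decoded with the terminal/stdin encoding, which
# follows the locale (on glibc, LC_ALL=de_DE typically means ISO-8859-1).
print('stdin:', getattr(sys.stdin, 'encoding', None))

# Default used by open() when no encoding= argument is given.
print('open() default:', locale.getpreferredencoding(False))

# A .py file is decoded with neither of these: it is read as UTF-8
# unless a PEP 263 coding cookie on line 1 or 2 says otherwise.
```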

With python2, I presume there was conversion going on because a plain
string can't hold unicode chars, so it has to be decoded into a unicode
string:

tgt = unicode ('gebuchte Umsätze;'.decode ('latin1'))

But python3 is supposed to make all that superfluous. I thought that
was a major driving factor for python3: everything is unicode, so
conversion wouldn't be necessary.
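Python 3 did make str always Unicode, but conversion did not disappear; it moved to the boundary where bytes (files, sockets, and the .py file itself) become str. A short illustration of why the same 'ä' is a different byte sequence depending on the encoding:

```python
s = 'ä'
print(len(s))                 # 1  (one code point in memory)
print(s.encode('utf-8'))      # b'\xc3\xa4'  (two bytes on disk as UTF-8)
print(s.encode('latin-1'))    # b'\xe4'      (one byte on disk as Latin-1)

# Decoding bytes with the wrong codec silently produces mojibake:
print(s.encode('utf-8').decode('latin-1'))   # Ã¤
```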
Re: python3 question
On 1/13/21 7:57 PM, n952162 wrote:
> Okay, I see that if your locale is not C, you can do:
>
> tgt = 'gebuchte Umsätze;'

Okay, I see I had this bit of magic in line 2:

# -*- coding: utf-8 -*- [ this has to be in line1 or line2!!! ]

I've removed that and the error message is somewhat different:

SyntaxError: Non-UTF-8 code starting with '\xe4' in file test.py on line
89, but no encoding declared; see http://python.org/dev/peps/pep-0263/
for details

Note that line 89 is a *comment* (with a ä)

So, I'm looking into that...

Oh, I think that gave me a solution!

# -*- coding: latin1 -*- [ this has to be in line1 or line2!!! ]

seems to work.  At least, I got some other errors now ;-)
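The effect of the coding line can be demonstrated without touching the file, by handing the same Latin-1 source bytes to the built-in `compile()` with and without a PEP 263 cookie. A sketch; the exact SyntaxError wording varies between Python versions, so it is printed rather than asserted:

```python
# Source bytes as a Latin-1 editor would save them ('ä' -> 0xe4).
body = b"tgt = 'gebuchte Ums\xe4tze;'\n"

# Without a cookie, Python 3 assumes UTF-8 and rejects the source.
try:
    compile(body, 'test.py', 'exec')
except SyntaxError as exc:
    print('no cookie:', exc.msg)

# With a coding cookie on line 1 (or 2), the same bytes compile fine.
src = b"# -*- coding: latin1 -*-\n" + body
ns = {}
exec(compile(src, 'test.py', 'exec'), ns)
print(ns['tgt'] == 'gebuchte Umsätze;')  # True
```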
Re: python3 question [RESOLVED]
On 1/13/21 8:41 PM, n952162 wrote:
> Oh, I think that gave me a solution!
>
> # -*- coding: latin1 -*- [ this has to be in line1 or line2!!! ]
>
> seems to work.  At least, I got some other errors now ;-)

Yes, indeed, this works now, even without setting my locale:

    tgt = 'gebuchte Umsätze;'
Re: python3 question
On 2021-01-13, n952162 <n952162@web.de> wrote:

> # -*- coding: utf-8 -*- [ this has to be in line1 or line2!!! ]

If you have that line in your source code, make sure your editor is
saving the file in UTF-8 encoding.
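To check whether a file's bytes actually match its declared encoding, the standard library's `tokenize.detect_encoding` applies the same cookie rules as the interpreter. A sketch; `check_source` is a hypothetical helper, and for a real file you would pass it the result of `open(path, 'rb').read()`:

```python
import io
import tokenize


def check_source(raw):
    """Decode Python source bytes the way the interpreter would.

    Returns the encoding Python will use (from a PEP 263 cookie, else
    UTF-8) and raises UnicodeDecodeError if the bytes don't match it.
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    raw.decode(encoding)  # raises if the editor saved a different encoding
    return encoding


good = b"# -*- coding: utf-8 -*-\ntgt = 'gebuchte Ums\xc3\xa4tze;'\n"
bad = b"# -*- coding: utf-8 -*-\ntgt = 'gebuchte Ums\xe4tze;'\n"

print(check_source(good))  # utf-8
try:
    check_source(bad)
except UnicodeDecodeError as exc:
    print('cookie says utf-8, but:', exc.reason)  # invalid continuation byte
```

The `bad` case is the situation from the start of the thread: a UTF-8 cookie on top of Latin-1 bytes.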

> Oh, I think that gave me a solution!
>
> # -*- coding: latin1 -*- [ this has to be in line1 or line2!!! ]
>
> seems to work. At least, I got some other errors now ;-)

What encoding is your editor using?

--
Grant
Re: Re: python3 question
On 1/13/21 8:57 PM, Grant Edwards wrote:
> On 2021-01-13, n952162 <n952162@web.de> wrote:
>
>> # -*- coding: utf-8 -*- [ this has to be in line1 or line2!!! ]
> If you have that line in your source code, make sure your editor is
> saving the file in UTF-8 encoding.
>
>> Oh, I think that gave me a solution!
>>
>> # -*- coding: latin1 -*- [ this has to be in line1 or line2!!! ]
>>
>> seems to work. At least, I got some other errors now ;-)
> What encoding is your editor using?

vi?  How would I determine that?  My locale is C
Re: Re: python3 question
On 13/01/2021 20:06, n952162 wrote:
>> What encoding is your editor using?
>
> vi?  How would I determine that?  My locale is C
>

You could use:

:set fenc

to display the current encoding used for the file, or

:set fenc=utf8

to force UTF-8 or any other encoding of your choosing. You can also add a
magic line (a vim modeline) that sets fenc to the file to ensure that the
specified encoding is always used, assuming modelines are enabled in your vimrc.
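Outside the editor, the equivalent of `:set fenc=utf8` followed by `:w` can be sketched in a few lines of Python; `reencode` is a hypothetical helper, and the Latin-1 source encoding is an assumption:

```python
import os
import tempfile


def reencode(path, src='latin-1', dst='utf-8'):
    """Rewrite the file at `path` from encoding `src` to `dst` in place."""
    # newline='' keeps the original line endings untouched.
    with open(path, encoding=src, newline='') as f:
        text = f.read()
    with open(path, 'w', encoding=dst, newline='') as f:
        f.write(text)


# Demo on a temporary file holding Latin-1 bytes.
with tempfile.NamedTemporaryFile('wb', suffix='.py', delete=False) as tmp:
    tmp.write(b"tgt = 'gebuchte Ums\xe4tze;'\n")
    path = tmp.name

reencode(path)
with open(path, 'rb') as f:
    print(f.read())  # b"tgt = 'gebuchte Ums\xc3\xa4tze;'\n"
os.remove(path)
```

After converting, a `coding: latin1` cookie must be updated or removed; otherwise Python would decode the new UTF-8 bytes as Latin-1 and 'ä' would come out as 'Ã¤'.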

- Victor
Re: Re: python3 question
On 1/13/21 9:22 PM, Victor Ivanov wrote:
> On 13/01/2021 20:06, n952162 wrote:
>>> What encoding is your editor using?
>>
>> vi?  How would I determine that?  My locale is C
>>
>
> You could use:
>
>   :set fenc

  fileencoding= 195,3         55%


I suspect that's really useful for languages other than latin1.