[ python-Bugs-1767933 ] Badly formed XML using etree and utf-16
Bugs item #1767933, was opened at 2007-08-05 18:01
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1767933&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Windows
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: BugoK (bugok)
Assigned to: Nobody/Anonymous (nobody)
Summary: Badly formed XML using etree and utf-16

Initial Comment:
Hello,

The bug occurs when writing an XML file using the UTF-16 encoding.
The problem is that etree encodes every string to UTF-16 separately, which inserts the BOM (0xFF 0xFE) before every string (tag, text, attribute name, etc.) and produces badly formed UTF-16.
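
A minimal sketch of the failure mode described above. It assumes the serializer calls `str.encode` once per fragment; this is a simplification, not ElementTree's actual code, and the helper names are hypothetical:

```python
import codecs

def encode_piecewise(parts, encoding="utf-16"):
    # Hypothetical helper mimicking the reported behaviour: each fragment
    # (tag, text, attribute name, ...) is encoded on its own, so every
    # fragment gets its own byte order mark.
    return b"".join(part.encode(encoding) for part in parts)

def encode_once(parts, encoding="utf-16"):
    # Join the text first and encode a single time: one BOM, at the start.
    return "".join(parts).encode(encoding)

parts = ["<", "root", ">", "text", "</", "root", ">"]
bad = encode_piecewise(parts)
good = encode_once(parts)

print(bad.count(codecs.BOM_UTF16))   # 7 (one BOM per fragment)
print(good.count(codecs.BOM_UTF16))  # 1
# Decoding the piecewise output leaves stray U+FEFF characters in the text.
print("\ufeff" in bad.decode("utf-16"))  # True
```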

A possible solution, offered by a co-worker of mine, is to write the file through a UTF-16 writer (from codecs.getwriter('utf-16')).
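
A minimal sketch of the co-worker's suggestion. It is written for Python 3 rather than the Python 2.5 of this report, and the function name `write_utf16` and the in-memory `io.BytesIO` target are illustrative assumptions. Every write goes through one stateful `codecs` stream writer, so only the first write emits a BOM:

```python
import codecs
import io
import xml.etree.ElementTree as ET

def write_utf16(elem, stream):
    # Wrap the binary stream in a UTF-16 StreamWriter. Its encoder keeps state,
    # so the BOM is written once, on the first write, not before every string.
    writer = codecs.getwriter("utf-16")(stream)
    writer.write('<?xml version="1.0" encoding="utf-16"?>\n')
    # encoding="unicode" makes tostring() return str, leaving all byte-level
    # encoding to the writer above.
    writer.write(ET.tostring(elem, encoding="unicode"))

root = ET.Element("root", name="value")
root.text = "text"
buf = io.BytesIO()
write_utf16(root, buf)
data = buf.getvalue()
```

The XML declaration is written explicitly here because `tostring(..., encoding="unicode")` does not emit one.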

Best,

BugoK.


----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1767933&group_id=5470
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[ python-Bugs-1767933 ] Badly formed XML using etree and utf-16
Bugs item #1767933, was opened at 2007-08-05 18:01
Message generated for change (Settings changed) made by bugok
>Category: XML
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: BugoK (bugok)
Assigned to: Nobody/Anonymous (nobody)
Summary: Badly formed XML using etree and utf-16

[ python-Bugs-1767933 ] Badly formed XML using etree and utf-16
Bugs item #1767933, was opened at 2007-08-05 08:01
Message generated for change (Comment added) made by nnorwitz
Category: XML
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: BugoK (bugok)
>Assigned to: Fredrik Lundh (effbot)
Summary: Badly formed XML using etree and utf-16

>Comment By: Neal Norwitz (nnorwitz)
Date: 2007-08-06 22:54

Message:
Logged In: YES
user_id=33168
Originator: NO

Fredrik, could you take a look at this?

[ python-Bugs-1767933 ] Badly formed XML using etree and utf-16
Bugs item #1767933, was opened at 2007-08-05 17:01
Message generated for change (Comment added) made by effbot
Category: XML
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: BugoK (bugok)
Assigned to: Fredrik Lundh (effbot)
Summary: Badly formed XML using etree and utf-16

>Comment By: Fredrik Lundh (effbot)
Date: 2007-08-07 08:20

Message:
Logged In: YES
user_id=38376
Originator: NO

ET's standard serializer currently only supports ASCII-compatible
encodings. See e.g.

http://effbot.python-hosting.com/ticket/47
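
One rough way to tell whether an encoding is ASCII-compatible in this sense is to check that plain ASCII text encodes to exactly the same bytes. The heuristic and the helper name `is_ascii_compatible` are assumptions for illustration, not part of ElementTree:

```python
import string

def is_ascii_compatible(encoding):
    # Heuristic: an ASCII-compatible encoding maps every printable ASCII
    # character to the same single byte that ASCII itself uses.
    sample = string.printable
    return sample.encode(encoding) == sample.encode("ascii")

for enc in ("utf-8", "latin-1", "cp1252", "utf-16"):
    print(enc, is_ascii_compatible(enc))
```

UTF-16 fails the check because it emits a BOM and two bytes per character, which is why a serializer that assumes ASCII-compatible output breaks on it.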

The best workaround for ET 1.2 (Python 2.5) is probably to serialize as
"utf-8" and transcode:

out = unicode(ET.tostring(elem), "utf-8").encode(...)
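
The same serialize-then-transcode workaround, sketched for Python 3, where `unicode()` no longer exists and `bytes.decode()` takes its place. The helper name `tostring_transcoded` and its `target` parameter are assumptions; the target encoding is whatever the caller needs:

```python
import xml.etree.ElementTree as ET

def tostring_transcoded(elem, target):
    # Serialize with UTF-8, an ASCII-compatible encoding the serializer handles.
    # No XML declaration is emitted for utf-8, so none needs rewriting.
    utf8_bytes = ET.tostring(elem, encoding="utf-8")
    # Decode once and re-encode the whole document in a single call,
    # so a UTF-16 target gets exactly one BOM at the very start.
    return utf8_bytes.decode("utf-8").encode(target)

root = ET.Element("root", name="value")
root.text = "text"
out = tostring_transcoded(root, "utf-16")
```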
