[issue5100] ElementTree.iterparse and Element.tail confusion
New submission from Jeroen Dirks <jeroen.dirks@oracle.com>:

I am using cElementTree.iterparse in order to parse through a huge XML
document and filter out sections of interest.

The usage pattern is that I wait for an "end" event for an element of
interest and then, if it matches some criterion, I write it out using
cElementTree.tostring().
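
A minimal sketch of that pattern, assuming Python 3's
xml.etree.ElementTree in place of the cElementTree module from 2.6. The
names extract_matching and predicate are hypothetical and only serve the
illustration:

```python
import io
import xml.etree.ElementTree as ET


def extract_matching(source, tag, predicate):
    """Collect the serialized form of every `tag` element accepted by `predicate`.

    Hypothetical helper: it only mirrors the usage pattern described above.
    """
    results = []
    # Only "end" events are requested, so each element is fully parsed
    # (apart from its tail, which may or may not be attached yet).
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag and predicate(elem):
            # tostring() serializes elem.tail too, if it is already set.
            results.append(ET.tostring(elem, encoding="unicode"))
    return results


doc = b"<root><item id='1'>a</item>\n<item id='2'>b</item>\n</root>"
matches = extract_matching(io.BytesIO(doc), "item", lambda e: e.get("id") == "2")
```

Whether each string in matches ends with the trailing newline depends on
whether the tail had been attached when the end event was delivered.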

My code had a bug in it because the cElementTree.tostring method prints
the element including its tail. The element retrieved from the iterparse
iterator sometimes contains the tail by the time it emits the end event
but sometimes it does not.

In my document the tail just consisted of the newline '\n' character and
about 98% of the time it was attached to the element during its end event.
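
One possible way to make the output independent of whether the tail has
been attached is to serialize the element with its tail temporarily
cleared. This is only a sketch, again assuming Python 3's
xml.etree.ElementTree; tostring_without_tail is a hypothetical helper,
not part of the library:

```python
import xml.etree.ElementTree as ET


def tostring_without_tail(elem):
    """Serialize `elem` without its tail, leaving the element unchanged afterwards.

    Hypothetical helper, not an ElementTree API.
    """
    saved_tail = elem.tail
    elem.tail = None  # hide the tail from tostring()
    try:
        return ET.tostring(elem, encoding="unicode")
    finally:
        elem.tail = saved_tail  # restore whatever was there before


root = ET.fromstring("<root><item>a</item>\n</root>")
item = root[0]
with_tail = ET.tostring(item, encoding="unicode")
without_tail = tostring_without_tail(item)
```

Here with_tail carries the newline that follows the item in the document,
while without_tail does not, regardless of when the parser attached it.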

This is rather confusing behavior.

Could ElementTree/cElementTree.iterparse be changed so that if you
respond to the end event for an element its tail is never set?

----------
components: XML
messages: 80783
nosy: jeroen.dirks
severity: normal
status: open
title: ElementTree.iterparse and Element.tail confusion
type: behavior
versions: Python 2.6

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue5100>
_______________________________________