[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False.
New submission from Takeshi Matsuyama <tksmashiw@gmail.com>:

When I build a dictionary by parsing "legacy-icon-mapping.xml" (which is
part of icon-naming-utils [http://tango.freedesktop.org/Tango_Icon_Library])
with the following script, three of the dictionary's keys are truncated if
the "buffer_text" attribute is False.

=====================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import with_statement
import sys
from xml.parsers.expat import ParserCreate
import codecs

class Database:
    """Make a dictionary which is accessible by Database.dict"""
    def __init__(self, buffer_text):
        self.cnt = None
        self.name = None
        self.data = None
        self.dict = {}
        p = ParserCreate()
        p.buffer_text = buffer_text

        p.StartElementHandler = self.start_element
        p.EndElementHandler = self.end_element
        p.CharacterDataHandler = self.char_data

        with open("/usr/share/icon-naming-utils/legacy-icon-mapping.xml",
                  'r') as f:
            p.ParseFile(f)

    def start_element(self, name, attrs):
        if name == 'context':
            self.cnt = attrs["dir"]
        if name == 'icon':
            self.name = attrs["name"]

    def end_element(self, name):
        if name == 'link':
            self.dict[self.data] = (self.cnt, self.name)

    def char_data(self, data):
        self.data = data.strip()

def print_set(aset):
    for e in aset:
        print '\t' + e

if __name__ == '__main__':
    sys.stdout = codecs.getwriter('utf_8')(sys.stdout)
    map_false_dict = Database(False).dict
    map_true_dict = Database(True).dict
    print "The keys which exist if buffer_text=False but don't exist if buffer_text=True are"
    print_set(set(map_false_dict.keys()) - set(map_true_dict.keys()))
    print "The keys which exist if buffer_text=True but don't exist if buffer_text=False are"
    print_set(set(map_true_dict.keys()) - set(map_false_dict.keys()))
=====================

The result of running this script is
======================
The keys which exist if buffer_text=False but don't exist if buffer_text=True are
	rt-descending
	ock_text_right
	lc
The keys which exist if buffer_text=True but don't exist if buffer_text=False are
	stock_text_right
	gnome-mime-application-vnd.stardivision.calc
	gtk-sort-descending
======================
I confirmed this with Python 2.5.2 on Fedora 10.

----------
components: XML
messages: 80398
nosy: tksmashiw
severity: normal
status: open
title: xml.parsers.expat make a dictionary which keys are broken if buffer_text is False.
type: behavior
versions: Python 2.5

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue5036>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False. [ In reply to ]
Gabriel Genellina <gagsl-py2@yahoo.com.ar> added the comment:

If the xml file is small enough, could you attach it to the issue? Or
provide a download location? I could not find it myself (without
downloading the whole package).

(Note that Python 2.5 only gets security fixes now, so unless this
still fails with 2.6 or later, this issue is likely to be closed)

----------
nosy: +gagenellina

[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False. [ In reply to ]
Takeshi Matsuyama <tksmashiw@gmail.com> added the comment:

Thanks for the reply!

>If the xml file is small enough, could you attach it to the issue? Or
>provide a download location?
Sorry, I found it here:
http://webcvs.freedesktop.org/icon-theme/icon-naming-utils/legacy-icon-mapping.xml?revision=1.75&content-type=text%2Fplain&pathrev=1.75

>(Note that Python 2.5 only gets security fixes now, so unless this
>still fails with 2.6 or later, this issue is likely to be closed)
I roughly confirmed the same problem with Python 3.0 on MS Windows two weeks
ago, but I need to verify it more carefully.

[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False. [ In reply to ]
HiroakiKawai <kawai@apache.org> added the comment:

The sample code has a bug; expat is fine.

The char_data method must append the incoming characters, because the
parser may deliver the text of a single element in several chunks.
def char_data(self, data):
    self.data += data

You should reset it by self.data = '' at end_element().
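
A minimal sketch of the suggested fix, written for Python 3 rather than the 2.5 of the report. The inline SAMPLE document is hypothetical: it only reuses the element and attribute names that the original script looks for, and the keyword argument on Database is an illustration, not part of the reported code.

```python
from xml.parsers.expat import ParserCreate

# Hypothetical sample modelled on the element names used by the script.
SAMPLE = """<mapping>
  <context dir="actions">
    <icon name="view-sort-descending">
      <link>gtk-sort-descending</link>
      <link>stock_sort-descending</link>
    </icon>
  </context>
</mapping>"""


class Database:
    """Map each <link> text to its (context dir, icon name) pair."""

    def __init__(self, xml_text, buffer_text=False):
        self.cnt = None
        self.name = None
        self.data = ''
        self.dict = {}
        p = ParserCreate()
        p.buffer_text = buffer_text
        p.StartElementHandler = self.start_element
        p.EndElementHandler = self.end_element
        p.CharacterDataHandler = self.char_data
        p.Parse(xml_text, True)

    def start_element(self, name, attrs):
        if name == 'context':
            self.cnt = attrs['dir']
        if name == 'icon':
            self.name = attrs['name']

    def end_element(self, name):
        if name == 'link':
            # strip() drops the whitespace collected between elements.
            self.dict[self.data.strip()] = (self.cnt, self.name)
        # Reset the accumulator after every closing tag.
        self.data = ''

    def char_data(self, data):
        # Append: one run of text may arrive as several chunks.
        self.data += data


unbuffered = Database(SAMPLE, buffer_text=False).dict
buffered = Database(SAMPLE, buffer_text=True).dict
print(unbuffered == buffered)
```

With accumulation in place, the result no longer depends on how the parser happens to split the text, so both settings of buffer_text should produce the same dictionary.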

----------
nosy: +kawai

[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False. [ In reply to ]
Takeshi Matsuyama <tksmashiw@gmail.com> added the comment:

Hi kawai.
I got the correct output by modifying the code as you suggest, but I still
cannot understand why this happens.
Could you tell me more briefly, or point me to any documents about it?
I can't find any note in the official documentation saying that a
CharacterDataHandler should append the strings rather than assign them.
Does everyone already know this? Am I the only one who doesn't? (;;)
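
For reference, the Python documentation for the buffer_text attribute notes that Expat normally breaks character data into chunks at every line ending, and that buffering only avoids multiple CharacterDataHandler calls "whenever possible". The sketch below (Python 3) is one way to observe this by recording every string the handler receives. The collect_chunks helper and the XML input are hypothetical, chosen only for illustration.

```python
from xml.parsers.expat import ParserCreate


def collect_chunks(xml_text, buffer_text):
    """Return the list of strings passed to CharacterDataHandler."""
    chunks = []
    p = ParserCreate()
    p.buffer_text = buffer_text
    # Each callback appends whatever piece of text the parser hands over.
    p.CharacterDataHandler = chunks.append
    p.Parse(xml_text, True)
    return chunks


# Hypothetical input: one text node containing an entity and a newline.
XML = "<link>stock &amp; text\nright</link>"

unbuffered = collect_chunks(XML, buffer_text=False)
buffered = collect_chunks(XML, buffer_text=True)
print(unbuffered)
print(buffered)
```

The exact split is up to the parser, but joining the pieces always yields the full text, which is why a handler that keeps only the last piece can end up with a truncated key.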

[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False. [ In reply to ]
HiroakiKawai <kawai@apache.org> added the comment:

That's how the XML SAX interface is specified.

[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False. [ In reply to ]
HiroakiKawai <kawai@apache.org> added the comment:

Please read "The ContentHandler.characters() callback is missing data!"
http://www.saxproject.org/faq.html

and close this issue :)
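
The same rule applies to Python's own xml.sax module, where ContentHandler.characters() may be called more than once for a single run of text. Below is a small Python 3 sketch of the usual accumulate-then-join pattern. The LinkCollector class and the inline document are hypothetical and are not taken from the report.

```python
import xml.sax


class LinkCollector(xml.sax.ContentHandler):
    """Collect the text of every <link> element."""

    def __init__(self):
        super().__init__()
        self.parts = None   # list of chunks while inside <link>, else None
        self.links = []

    def startElement(self, name, attrs):
        if name == 'link':
            self.parts = []

    def characters(self, content):
        # May be called several times for one run of text.
        if self.parts is not None:
            self.parts.append(content)

    def endElement(self, name):
        if name == 'link':
            self.links.append(''.join(self.parts))
            self.parts = None


handler = LinkCollector()
xml.sax.parseString(
    b"<icon><link>stock_text_right</link>\n<link>a &amp; b</link></icon>",
    handler)
print(handler.links)
```

Collecting the pieces in a list and joining them once at the end tag avoids repeated string concatenation and ignores text outside the element of interest.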

[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False. [ In reply to ]
Takeshi Matsuyama <tksmashiw@gmail.com> added the comment:

A correction to my previous message: briefly -> in detail

>Please read "The ContentHandler.characters() callback is missing data!"
>http://www.saxproject.org/faq.html
I was just reading that page. It is now very clear to me.
Thanks, kawai, and sorry for taking up your time, gagenellina.

[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False. [ In reply to ]
Takeshi Matsuyama <tksmashiw@gmail.com> added the comment:

From msg80438:
>You should reset it by self.data = '' at end_element().

It seems that we should reset it in start_element() instead, like this,
============================
def start_element(self, name, attrs):
    ...abbr...
    if name == 'link':
        self.data = ''
=============================
otherwise unwanted \s, \t, and \n get mixed into "self.data".
That's all, thanks.
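
A short Python 3 sketch comparing the two reset points, without any strip() call so that the difference stays visible. The collect_links helper, its reset_at_start flag, and the two-link SAMPLE are hypothetical names introduced only for this illustration.

```python
from xml.parsers.expat import ParserCreate

# Hypothetical two-link sample with indentation between the elements.
SAMPLE = ("<icon>\n"
          "  <link>gtk-sort-descending</link>\n"
          "  <link>stock_text_right</link>\n"
          "</icon>")


def collect_links(xml_text, reset_at_start):
    """Return the accumulated text of each <link>, without stripping."""
    state = {'data': ''}
    links = []

    def start_element(name, attrs):
        if reset_at_start and name == 'link':
            state['data'] = ''

    def end_element(name):
        if name == 'link':
            links.append(state['data'])
            if not reset_at_start:
                state['data'] = ''

    def char_data(data):
        # Always append; the reset point decides what ends up in the key.
        state['data'] += data

    p = ParserCreate()
    p.StartElementHandler = start_element
    p.EndElementHandler = end_element
    p.CharacterDataHandler = char_data
    p.Parse(xml_text, True)
    return links


print(collect_links(SAMPLE, reset_at_start=True))
print(collect_links(SAMPLE, reset_at_start=False))
```

Resetting in start_element() discards the indentation that precedes each <link>, whereas resetting only in end_element() leaves that whitespace at the front of the next key unless it is stripped before use.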

[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False. [ In reply to ]
Takeshi Matsuyama <tksmashiw@gmail.com> added the comment:

Could someone close this?

[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False. [ In reply to ]
Changes by Benjamin Peterson <benjamin@python.org>:


----------
resolution: -> invalid
status: open -> closed
