Mailing List Archive

xmltv and German umlauts
Hi,

has anyone found a solution to get correct German umlauts in the show
titles and descriptions instead of HTML special characters? The grabber
by Ben Bucksch doesn't seem to work anymore.

Ralf
Re: xmltv and German umlauts
Ralf Haller wrote:

> has anyone found a solution to get correct German umlauts in the show
> titles and descriptions instead of HTML special characters?

I read the new version of xmltv has a hack to work around that in the
most common cases.

> The grabber by Ben Bucksch doesn't seem to work anymore.

Ah, you saw that, good :-). It works fine for me. What's your problem?

Ben
Re: xmltv and German umlauts
Hi Ben,

here's what I get:

[wei@Baden tvmoviefetch]$ python fetchdata.py
Faulty config
[wei@Baden tvmoviefetch]$ more config.py
download_days = 8;
# Get program for that many days, starting from (incl.) today.
# Don't use more days than the provider has online, or you'll produce errors.
# Can be overridden using --days param.
baseurl = "http://tvmovie.kunde.serverflex.info/onlinedata/xml-gz/";
# baseurl: where to get the data from. With trailing /, if necessary
datadir = "/var/lib/xmltv/";
# Where to put the downloaded and temporary files.
# Must only be writable by the account running this app and trusted users.
outfile = "all.xmltv";
# the resulting xmltv file with all days, relative to data dir.
# Can be overridden using --output param.
instdir = "/opt/tvmoviefetch/"; # where this file is; with trailing /
imagebaseurl = "http://localhost/"; # for channel icons. invalid for now
logfile = "fetchdata.log";
check_wellformed = 1;
# Set to 0, if you don't want to install qtxmlcheck or it is too slow for you.
# Note that later apps may then bail, if there are any errors in any file,
# and not process any of the output.

What's wrong?

Ralf

> _______________________________________________
> mythtv-users mailing list
> mythtv-users@snowman.net
> http://lists.snowman.net/cgi-bin/mailman/listinfo/mythtv-users
>
Re: xmltv and German umlauts
Ralf Haller wrote:

> here's what I get:
>
> [wei@Baden tvmoviefetch]$ python fetchdata.py
> Faulty config
> [wei@Baden tvmoviefetch]$ more config.py
> [...]

> What's wrong?

I don't know, with the info you gave me. Did you manually go through all
options and adjust them to your system? Do all the dirs exist? Are they
writable? Is the program installed where you claim it is? Is
fetchdata.log writable? Do you have qtxmlcheck installed?
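
The questions above can be turned into a quick script. This is a minimal sketch, assuming the option names from the config.py shown earlier in the thread (datadir, instdir, logfile, check_wellformed). The function name check_config is hypothetical, and this is not how fetchdata.py itself arrives at "Faulty config".

```python
import os
import shutil


def check_config(datadir, instdir, logfile, check_wellformed=1):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper, not part of tvmoviefetch.
    """
    problems = []
    for name, path in (("datadir", datadir), ("instdir", instdir)):
        if not os.path.isdir(path):
            problems.append(f"{name} does not exist: {path}")
    if os.path.isdir(datadir) and not os.access(datadir, os.W_OK):
        problems.append(f"datadir is not writable: {datadir}")
    # Assumption: a relative logfile name is resolved against datadir.
    log_path = os.path.join(datadir, logfile)
    if os.path.exists(log_path) and not os.access(log_path, os.W_OK):
        problems.append(f"logfile is not writable: {log_path}")
    # shutil.which returns None when the program is not found on PATH.
    if check_wellformed and shutil.which("qtxmlcheck") is None:
        problems.append("check_wellformed is set but qtxmlcheck is not on PATH")
    return problems
```

Calling check_config("/var/lib/xmltv/", "/opt/tvmoviefetch/", "fetchdata.log") with the values from the posted config would list whichever of these conditions fails on the local machine.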
Re: xmltv and German umlauts
Hi Ben,

I forgot to create the directories that are needed. Now it seems to
work, but I get many warnings:

[wei@Baden tvmoviefetch]$ python fetchdata.py --days 1
downloading max. 1 days, writing to all.xmltv
20030526
Warning: Source file 20030526_516.xml.xml had errors, I have to delete it
Warning: Source file 20030526_029.xml.xml had errors, I have to delete it
Warning: Source file 20030526_530.xml.xml had errors, I have to delete it
Warning: Source file 20030526_010.xml.xml had errors, I have to delete it
Warning: Source file 20030526_019.xml.xml had errors, I have to delete it
Warning: Source file 20030526_090.xml.xml had errors, I have to delete it
Warning: Source file 20030526_179.xml.xml had errors, I have to delete it
Warning: Source file 20030526_015.xml.xml had errors, I have to delete it
Warning: Source file 20030526_107.xml.xml had errors, I have to delete it
Warning: Source file 20030526_011.xml.xml had errors, I have to delete it
Warning: Source file 20030526_189.xml.xml had errors, I have to delete it
Warning: Source file 20030526_005.xml.xml had errors, I have to delete it
20030526_532.xml:261: error: Input is not proper UTF-8, indicate encoding !
ner, Ehefrau des Inhabers und Juden Lucas Steiner, hat die Leitung des Theaters
^
20030526_532.xml:261: error: Bytes: 0xFC 0x62 0x65 0x72
ner, Ehefrau des Inhabers und Juden Lucas Steiner, hat die Leitung des Theaters
^
unable to parse 20030526_532.xml
Warning: Source file 20030526_546.xml.xml had errors, I have to delete it
Warning: Source file 20030526_515.xml.xml had errors, I have to delete it
Warning: Source file 20030526_549.xml.xml had errors, I have to delete it
Warning: Source file 20030526_542.xml.xml had errors, I have to delete it
Warning: Source file 20030526_181.xml.xml had errors, I have to delete it
Warning: Source file 20030526_006.xml.xml had errors, I have to delete it
Warning: Source file 20030526_503.xml.xml had errors, I have to delete it
Warning: Source file 20030526_540.xml.xml had errors, I have to delete it
Warning: Source file 20030526_521.xml.xml had errors, I have to delete it
Warning: Source file 20030526_121.xml.xml had errors, I have to delete it
Warning: Source file 20030526_088.xml.xml had errors, I have to delete it
Warning: Source file 20030526_512.xml.xml had errors, I have to delete it
Warning: Source file 20030526_009.xml.xml had errors, I have to delete it
Warning: Source file 20030526_548.xml.xml had errors, I have to delete it
Warning: Source file 20030526_505.xml.xml had errors, I have to delete it
Warning: Source file 20030526_510.xml.xml had errors, I have to delete it
Warning: Source file 20030526_519.xml.xml had errors, I have to delete it
Warning: Source file 20030526_543.xml.xml had errors, I have to delete it
Warning: Source file 20030526_501.xml.xml had errors, I have to delete it
Warning: Source file 20030526_118.xml.xml had errors, I have to delete it
Warning: Source file 20030526_539.xml.xml had errors, I have to delete it
Warning: Source file 20030526_162.xml.xml had errors, I have to delete it
Warning: Source file 20030526_547.xml.xml had errors, I have to delete it
Warning: Source file 20030526_039.xml.xml had errors, I have to delete it
Warning: Source file 20030526_541.xml.xml had errors, I have to delete it
Warning: Source file 20030526_065.xml.xml had errors, I have to delete it
Warning: Source file 20030526_520.xml.xml had errors, I have to delete it
Warning: Source file 20030526_004.xml.xml had errors, I have to delete it
Warning: Source file 20030526_044.xml.xml had errors, I have to delete it
Warning: Source file 20030526_002.xml.xml had errors, I have to delete it
Warning: Source file 20030526_109.xml.xml had errors, I have to delete it
Warning: Source file 20030526_089.xml.xml had errors, I have to delete it
Warning: Source file 20030526_206.xml.xml had errors, I have to delete it
Warning: Source file 20030526_054.xml.xml had errors, I have to delete it
Warning: Source file 20030526_001.xml.xml had errors, I have to delete it
Warning: Source file 20030526_522.xml.xml had errors, I have to delete it
Warning: Source file 20030526_544.xml.xml had errors, I have to delete it
Warning: Source file 20030526_518.xml.xml had errors, I have to delete it
Warning: Source file 20030526_024.xml.xml had errors, I have to delete it
Warning: Source file 20030526_012.xml.xml had errors, I have to delete it
Warning: Source file 20030526_550.xml.xml had errors, I have to delete it
Warning: Source file 20030526_513.xml.xml had errors, I have to delete it
Warning: Source file 20030526_527.xml.xml had errors, I have to delete it
Warning: Source file 20030526_523.xml.xml had errors, I have to delete it
Warning: Source file 20030526_008.xml.xml had errors, I have to delete it
Warning: Source file 20030526_511.xml.xml had errors, I have to delete it
Warning: Source file 20030526_545.xml.xml had errors, I have to delete it
Warning: Source file 20030526_027.xml.xml had errors, I have to delete it
Warning: Source file 20030526_032.xml.xml had errors, I have to delete it
Warning: Source file 20030526_504.xml.xml had errors, I have to delete it
Warning: Source file 20030526_018.xml.xml had errors, I have to delete it
Warning: Source file 20030526_026.xml.xml had errors, I have to delete it
Warning: Source file 20030526_063.xml.xml had errors, I have to delete it
Warning: Source file 20030526_517.xml.xml had errors, I have to delete it
Warning: Source file 20030526_534.xml.xml had errors, I have to delete it
Warning: Source file 20030526_508.xml.xml had errors, I have to delete it
Warning: Source file 20030526_205.xml.xml had errors, I have to delete it
Warning: Source file 20030526_533.xml.xml had errors, I have to delete it
Warning: Source file 20030526_524.xml.xml had errors, I have to delete it

gunzip: 20030526_035.xml.gz: unexpected end of file
sed: kann 20030526_035.xml nicht lesen: Datei oder Verzeichnis nicht gefunden
Traceback (most recent call last):
File "fetchdata.py", line 66, in ?
File "functions.py", line 13, in replace_entities
IOError: [Errno 2] No such file or directory: '20030526_035.xml'

Maybe TVMovie changed their file format?

Is MythTV capable of displaying &auml; as ä? Using tv_grab_de shows me
these ugly HTML special characters (&auml;) in the schedule.

Ralf
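
The parser error in this message quotes the offending bytes, 0xFC 0x62 0x65 0x72. A small sketch of what they mean: 0xFC cannot start a UTF-8 sequence, but in ISO-8859-1 (Latin-1) it is ü, so the four bytes spell "über". That the source files are Latin-1 encoded is an inference from these four bytes, not something stated in the thread.

```python
raw = bytes([0xFC, 0x62, 0x65, 0x72])  # the bytes quoted in the parser error

try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    # 0xFC is not a legal lead byte in UTF-8, so decoding fails here.
    valid_utf8 = False

as_latin1 = raw.decode("iso-8859-1")   # 'über'
reencoded = as_latin1.encode("utf-8")  # b'\xc3\xbcber', what a UTF-8 parser expects
```

A file like this needs either an encoding declaration in its XML prolog or a re-encoding step before a UTF-8 parser will accept it.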

Re: xmltv and German umlauts
Ralf Haller wrote:

> Warning: Source file 20030526_516.xml.xml had errors, I have to delete it

I don't know why that is. (The additional ".xml" is a bug in the text
only, I think, ignore it.) Obviously, there's something wrong with the
file (it's empty or truncated or not converted properly or the like), but
I don't know why and what. Are you sure that you followed all the
instructions precisely, incl. requirements? In particular, does your sed
support the -i option?

> gunzip: 20030526_035.xml.gz: unexpected end of file
> sed: kann 20030526_035.xml nicht lesen: Datei oder Verzeichnis nicht
> gefunden

This happens to me sometimes as well. Usually, fetching again or a day
later fixes it. I should add a check for that (later).
I also get a lot of "duplicate entry ... for key ...", but that
shouldn't be harmful and that might be because some commandline options
are not implemented. I should add that as well.
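
A check for truncated downloads could look roughly like this. It is a minimal sketch, not the grabber's code: the helper name is_complete_gzip is hypothetical, and it uses Python's gzip module instead of the external gunzip seen in the log output.

```python
import gzip
import zlib


def is_complete_gzip(path, chunk_size=64 * 1024):
    """Return True if the file at `path` decompresses to the end without error."""
    try:
        with gzip.open(path, "rb") as fh:
            # Read and discard the data; we only care whether it all decodes.
            while fh.read(chunk_size):
                pass
        return True
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ended early ("unexpected end of file" in gunzip).
        # OSError: not a gzip file at all, or the file is missing.
        # zlib.error: the compressed data itself is corrupt.
        return False
```

A caller could retry the download when this returns False, and skip the file for that run if the retry fails too, instead of letting the later sed step fail on a missing file.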

> Maybe TVMovie changed their file format?

No. Again: I use it here and it works for me, so it can't be the format.

> Is MythTV capable of displaying &auml; as ä?

Yes.
Re: xmltv and German umlauts
Ben Bucksch wrote:

> Ralf Haller wrote:
>
>> Warning: Source file 20030526_516.xml.xml had errors, I have to
>> delete it
>
>
> I don't know why that is. (The additional ".xml" is a bug in the text
> only, I think, ignore it.) Obviously, there's something wrong with the
> file (it's empty or truncated or not converted properly or the like),
> but I don't know why and what. Are you sure that you followed all the
> instructions precisely, incl. requirements? In particular, does your
> sed support the -i option?

Yes, sed supports the -i option.

When viewing the file
http://tvmovie.kunde.serverflex.info/onlinedata/xml-gz/20030526_001.xml.gz
with IE (sorry, I'm at work right now) I get:

- <Sendung>
<SendungID>2798516</SendungID>
<Titel>Fliege - Die Talkshow</Titel>
<Datum>2003-05-26</Datum>
<Zeit>16:00:00</Zeit>
<Dauer>60</Dauer>
+ <Flags>
<stereo />
<telefon />
</Flags>
<Showview>79-317</Showview>
<DiffAktualisierung>2</DiffAktualisierung>
<DiffAenderung>-1</DiffAenderung>
The XML page cannot be displayed

Cannot view XML input using XSL style sheet. Please correct the error
and then click the Refresh button, or try again later.


--------------------------------------------------------------------------------

An invalid character was found in text content. Error processing
resource
'http://tvmovie.kunde.serverflex.info/onlinedata/xml-gz/20030526_001.xml.gz'.
Line 575, Position 26

<Text>Zum 102. Mal

- <!-- relations to Info
-->
- <!-- reference to Film
-->
- <TVShow>
<OriginalTitel>Albtraum Eigenheim</OriginalTitel>
<Jahr>0</Jahr>
<FSK>0</FSK>

There seems to be an error in the file. Can you reproduce that effect?

Ralf

Re: xmltv and German umlauts
Ralf Haller wrote:

> with IE (sorry, I'm at work right now) I get:
> An invalid character was found in text content. Error processing
> resource
> 'http://tvmovie.kunde.serverflex.info/onlinedata/xml-gz/20030526_001.xml.gz'.
> Line 575, Position 26
> <Text>Zum 102. Mal

The XML source file is not well-formed ("Live&amp;uuml;bertragung"). My
converter fixes that using sed before further processing the file. So,
checking the file on the server won't give useful results.
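
To illustrate the kind of fix described here: XML predefines only five entities (amp, lt, gt, quot, apos), so a bare HTML entity such as &uuml; in a file without a DTD makes the parser give up. The sketch below is a Python stand-in for that sort of substitution. It assumes the feed contains bare HTML named entities, which is one reading of Ben's example. The rules the converter's sed step actually applies are not shown in this thread, and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_html_entities(text):
    """Turn named HTML entities into numeric character references.

    Entities that XML defines itself, and names HTML does not know,
    are left untouched.
    """
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return ENTITY_RE.sub(substitute, text)


broken = "<Text>Live&uuml;bertragung &amp; mehr</Text>"

try:
    ET.fromstring(broken)
    parsed_before = True
except ET.ParseError:
    # expat reports an undefined entity for &uuml;
    parsed_before = False

fixed = replace_html_entities(broken)
element = ET.fromstring(fixed)  # parses; element.text contains the real ü
```

Numeric character references are used instead of raw characters so that the result does not depend on the encoding the file is later read with.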
Re: xmltv and German umlauts
Hi Ben,

it seems that sed doesn't make that change on my system. I am using sed
version 4.0.5:

[root@Baden etc]# sed --version
GNU sed Version 4.0.5
Copyright (C) 2002 Free Software Foundation, Inc.

Ralf

Re: xmltv and German umlauts
Hi Ben,

it works now. The problem was that I didn't install qtxmlcheck. I
disabled the xml check in the config file and the warning disappeared.

But how can I prevent all channels from being imported into the myth
database? I want only the channels I can receive.

BTW, thanks for your patience!

Ralf

Re: xmltv and German umlauts
Ralf Haller wrote:

> The problem was that I didn't install qtxmlcheck.

ahhhhhhhhhhhrg.

> I disabled the xml check in the config file and the warning disappeared.

I'd suggest using it. Otherwise, if there is a single problem in any of
the files, mythfilldatabase won't import anything at all, IIRC.
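
Where qtxmlcheck cannot be installed, a well-formedness check can be approximated in Python. This is a sketch of a substitute, not what the check_wellformed option does internally. The function name is hypothetical, and it relies on the expat parser from Python's standard library.

```python
import xml.parsers.expat


def is_wellformed(path):
    """Return (True, None) if the file parses, else (False, error message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as fh:
            parser.ParseFile(fh)
    except xml.parsers.expat.ExpatError as err:
        # The message includes the line and column of the first error.
        return False, str(err)
    return True, None
```

Running this over each per-channel file before merging would let a single bad file be dropped without losing the rest of the import.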

> But how can I prevent that all channels are imported in the myth
> database? I want only those channels I can receive.

I don't know, I have the same problem. It's a problem with
mythfilldatabase. The --update option is supposed to do that, I think,
but doesn't. At least for me, maybe I am just using |su| wrongly:
su mythtv -c "/usr/local/bin/mythfilldatabase --quiet --update"
Re: xmltv and German umlauts
Ben Bucksch wrote:

> Ralf Haller wrote:
>
>> The problem was that I didn't install qtxmlcheck.
>
>
> ahhhhhhhhhhhrg.

There's only one thing to say: RTFM! ;-) Now it works like it's supposed to.

>
>> I disabled the xml check in the config file and the warning disappeared.
>
>
> I'd suggest to use it. Otherwise, if there is a single problem in any
> of the files, mythfilldatabase won't import anything at all, IIRC.

I need Qt 3 in order to compile qtxmlcheck. Is there an RPM for Red Hat?

>
>> But how can I prevent that all channels are imported in the myth
>> database? I want only those channels I can receive.
>
>
> I don't know, I have the same problem. It's a problem with
> mythfilldatabase. The --update option is supposed to do that, I think,
> but doesn't. At least for me, maybe I am just using |su| wrongly:
> su mythtv -c "/usr/local/bin/mythfilldatabase --quiet --update"

I solved the problem by commenting out the channels in channel.py I
don't want. It's not the best solution but it works.
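
An alternative to editing channel.py would be to filter the generated XMLTV file before handing it to mythfilldatabase. This is a minimal sketch, assuming the usual XMLTV layout: a tv root element with channel elements carrying an id attribute and programme elements carrying a channel attribute. The function name is made up, and this has not been tried against tvmoviefetch output.

```python
import xml.etree.ElementTree as ET


def keep_channels(xmltv_in, xmltv_out, wanted_ids):
    """Copy an XMLTV file, dropping channels and programmes not in wanted_ids."""
    wanted = set(wanted_ids)
    tree = ET.parse(xmltv_in)
    root = tree.getroot()
    for element in list(root):  # iterate over a copy, since we remove items
        if element.tag == "channel" and element.get("id") not in wanted:
            root.remove(element)
        elif element.tag == "programme" and element.get("channel") not in wanted:
            root.remove(element)
    # Note: ElementTree does not preserve a DOCTYPE line from the input.
    tree.write(xmltv_out, encoding="utf-8", xml_declaration=True)
```

The ids passed in must match the channel ids in the file exactly, and those depend on how the grabber names its channels.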

Another thing I noticed is that ' is displayed as its HTML entity
instead of the character itself. The files from tvmovie already contain
these faults. Maybe you should add this to fix-sourcexml.sed?

Ralf
