Mailing List Archive

utf-8 fixes for metadata.xml
I've just gone through and fixed all of the broken utf-8 in metadata.xml
files. I think... There was rather a lot of it due to various editor
bugs which I'm hoping are no longer an issue. Requests:

- Could anyone who can read whatever language 'vi' is please check a few
of the category metadata.xml files?

- Could anyone editing metadata.xml files please do a 'cvs diff' and
check that whatever you did didn't screw up any encodings? Usually you
can tell by checking that cvs diff isn't giving you any unexpected lines
changed on things that contain strange characters. If things do seem to
be going screwy, please let me know what you were doing that did it so
that I can figure out whether we still have broken editors / tools out
there... If you're entering utf-8 I'd hope that you'd know how to check
already :)

- If anyone is still going to throw a hissy fit and refuse to play nice
with this utf-8 thing, please let me know now so that I don't waste more
time fixing things.

There are still broken ChangeLogs and one or two h0rked ebuilds in the
tree. I'll get to those once I'm sure that I'm not wasting my time.
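
For anyone who would rather script the encoding check than eyeball 'cvs diff', here is a minimal sketch. It is not the tool used for the fixes above; the function names and the idea of walking a checkout for files called metadata.xml are assumptions made for illustration. It only reports byte sequences that are not strictly valid UTF-8.

```python
from pathlib import Path


def find_invalid_utf8(data: bytes):
    """Return (byte_offset, reason) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding is the default
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None


def broken_files(root: str, name: str = "metadata.xml"):
    """Yield (path, offset, reason) for each file called `name` under `root`
    whose contents are not valid UTF-8. `root` and `name` are hypothetical
    parameters, not anything the thread prescribes."""
    for path in sorted(Path(root).rglob(name)):
        problem = find_invalid_utf8(path.read_bytes())
        if problem is not None:
            yield path, problem[0], problem[1]
```

A file that passes this check can still be wrong (for example, text that was converted twice), so it complements a diff review rather than replacing it.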

--
Ciaran McCreesh
--
gentoo-dev@gentoo.org mailing list
Re: utf-8 fixes for metadata.xml
On Fri, Aug 05, 2005 at 01:13:31AM +0100, Ciaran McCreesh wrote:
> I've just gone through and fixed all of the broken utf-8 in metadata.xml
> files. I think... There was rather a lot of it due to various editor
> bugs which I'm hoping are no longer an issue. Requests:
>
> - Could anyone who can read whatever language 'vi' is please check a few
> of the category metadata.xml files?
vim-6.3.084 still breaks them, unless you set encoding=utf-8 in your vim
settings. It's the same bug I spoke to you about earlier today.

Open the metadata.xml (see no '[converted]' text).
Save metadata.xml (see the '[converted]' text).
File is now broken.

As a vim workaround, maybe force encoding=utf-8 in the gentoo filetype
stuff?

--
Robin Hugh Johnson
E-Mail : robbat2@orbis-terrarum.net
Home Page : http://www.orbis-terrarum.net/?l=people.robbat2
ICQ# : 30269588 or 41961639
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
Re: utf-8 fixes for metadata.xml
maillog: 04/08/2005-19:15:15(-0700): Robin H. Johnson types
> On Fri, Aug 05, 2005 at 01:13:31AM +0100, Ciaran McCreesh wrote:
> > I've just gone through and fixed all of the broken utf-8 in metadata.xml
> > files. I think... There was rather a lot of it due to various editor
> > bugs which I'm hoping are no longer an issue. Requests:
> >
> > - Could anyone who can read whatever language 'vi' is please check a few
> > of the category metadata.xml files?
> vim-6.3.084 still breaks them, unless you set encoding=utf-8 in your vim
> settings. It's the same bug I spoke to you about earlier today.
>
> Open the metadata.xml (see no '[converted]' text).
> Save metadata.xml (see the '[converted]' text).
> File is now broken.
>
> As a vim workaround, maybe force encoding=utf-8 in the gentoo filetype
> stuff?

But utf-8 is supposed to be autodetected, since the default
fileencodings always contains utf-8, doesn't it? It is not
autodetected only if the file is not *strictly* utf-8.
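
The "try each candidate in order, take the first that decodes cleanly" idea can be sketched as below. This is a rough model of the behaviour described, not vim's actual code: it only recognises a UTF-8 byte order mark for the ucs-bom step, and it assumes latin-1 as the final fallback.

```python
import codecs


def detect_like_fileencodings(data: bytes, fallback: str = "latin-1"):
    """Return (label, text), trying ucs-bom, then utf-8, then a fallback.

    Simplified model: only the UTF-8 BOM is checked, and `fallback`
    (latin-1 here, an assumption) stands in for the 'default' entry.
    """
    candidates = []
    if data.startswith(codecs.BOM_UTF8):
        candidates.append(("utf-8-sig", "ucs-bom"))  # utf-8-sig strips the BOM
    candidates.append(("utf-8", "utf-8"))
    for codec, label in candidates:
        try:
            return label, data.decode(codec)  # strict: any bad byte rejects it
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so this step never fails
    return "default", data.decode(fallback)
```

Under this model a single stray latin-1 byte is enough to make the whole file fall through to the fallback, which matches the point about files that are not strictly utf-8.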

--
() Georgi Georgiev () C-3PO: We're doomed! ()
() chutz@gg3.net () ()
() +81(90)2877-8845 () ()
Re: utf-8 fixes for metadata.xml
On Fri, Aug 05, 2005 at 11:28:46AM +0900, Georgi Georgiev wrote:
> > As a vim workaround, maybe force encoding=utf-8 in the gentoo filetype
> > stuff?
> But utf-8 is supposed to be autodetected, since the default
> fileencodings always contains utf-8, doesn't it? It is not
> autodetected only if the file is not *strictly* utf-8.
It sets 'fileencoding' correctly, but 'encoding' is not set at all.

Here's the vim settings that I see after opening the file and running :set.

autoindent hlsearch tabstop=4 ttymouse=xterm
backspace=2 ruler textwidth=80 viminfo='20,"500
history=50 shiftwidth=4 ttyfast
commentstring=<!--%s-->
fileencoding=utf-8
fileencodings=ucs-bom,utf-8,default
filetype=gentoo-metadata
suffixes=.bak,~,.o,.h,.info,.swp,.obj,.info,.aux,.log,.dvi,.bbl,.out
syntax=gentoo-metadata

--
Robin Hugh Johnson
E-Mail : robbat2@orbis-terrarum.net
Home Page : http://www.orbis-terrarum.net/?l=people.robbat2
ICQ# : 30269588 or 41961639
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
Re: utf-8 fixes for metadata.xml
maillog: 04/08/2005-19:35:01(-0700): Robin H. Johnson types
> On Fri, Aug 05, 2005 at 11:28:46AM +0900, Georgi Georgiev wrote:
> > > As a vim workaround, maybe force encoding=utf-8 in the gentoo filetype
> > > stuff?
> > But utf-8 is supposed to be autodetected, since the default
> > fileencodings always contains utf-8, doesn't it? It is not
> > autodetected only if the file is not *strictly* utf-8.
> It sets 'fileencoding' correctly, but 'encoding' is not set at all.
>
> Here's the vim settings that I see after opening the file and running :set.
>
> autoindent hlsearch tabstop=4 ttymouse=xterm
> backspace=2 ruler textwidth=80 viminfo='20,"500
> history=50 shiftwidth=4 ttyfast
> commentstring=<!--%s-->
> fileencoding=utf-8
> fileencodings=ucs-bom,utf-8,default
> filetype=gentoo-metadata
> suffixes=.bak,~,.o,.h,.info,.swp,.obj,.info,.aux,.log,.dvi,.bbl,.out
> syntax=gentoo-metadata

What does "set enc?" say?

Anyway, setting enc=utf-8 when your terminal is using something else
makes the output look like shit. Furthermore, you wouldn't be able to
input any non-ascii characters anyway (or maybe you will, depending on
your locale, but they would appear as garbage on the screen). If you're
not going to go anywhere near non-ascii text it might be OK.

I guess you're better off using gvim if "locale -k charmap" doesn't say
UTF-8 in your term.

--
-* Georgi Georgiev -* Each honest calling, each walk of life, -*
*- chutz@gg3.net *- has its own elite, its own aristocracy *-
-* +81(90)2877-8845 -* based on excellence of performance. -- -*
*- ------------------- *- James Bryant Conant *-
Re: utf-8 fixes for metadata.xml
On Fri, Aug 05, 2005 at 12:46:13PM +0900, Georgi Georgiev wrote:
> What does "set enc?" say?
:set enc?
encoding=latin1
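
One plausible way an internal encoding of latin1 can damage a UTF-8 file is sketched below. The thread does not say exactly what vim wrote to disk, so treat this as an assumption: text is converted into the internal encoding on load, and any character that latin1 cannot represent is lost before the file is written back. The function name is hypothetical.

```python
def roundtrip_through(text: str, internal: str = "latin-1") -> str:
    """Simulate holding `text` in an editor whose internal encoding is
    `internal`. Characters that encoding cannot represent become '?'."""
    return text.encode(internal, errors="replace").decode(internal)


# Characters that exist in latin-1 survive the round trip unchanged.
czech = roundtrip_through("Kundr\u00e1t")

# Vietnamese letters such as U+1EBF and U+1EC7 are outside latin-1 and are lost.
vietnamese = roundtrip_through("Ti\u1ebfng Vi\u1ec7t")
```

This would also explain why only some files appear to break: a file whose non-ASCII characters all exist in latin1 comes through such a round trip intact.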

> Anyway, setting enc=utf-8 when your terminal is using something else
> makes the output look like shit. Furthermore, you wouldn't be able to
> input any non-ascii characters anyway (or maybe you will, depending on
> your locale, but they would appear as garbage on the screen). If you're
> not going to go anywhere near non-ascii text it might be OK.
I don't plan to edit any non-ascii text in my $EDITOR of choice. The
only place I ever write non-ASCII stuff is a word processor.

> I guess you're better off using gvim if "locale -k charmap" doesn't say
> UTF-8 in your term.
I'm using aterm, and I refuse to use a graphical vim.

$ locale -k charmap
charmap="ANSI_X3.4-1968"
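
ANSI_X3.4-1968 is the formal name for plain ASCII, so this terminal session is not in a UTF-8 locale. A small sketch for checking that from a script follows; the helper names are made up for illustration, and the parsing assumes the `charmap="..."` output format shown above.

```python
import re


def parse_charmap(output: str) -> str:
    """Extract the charmap name from `locale -k charmap` style output."""
    match = re.search(r'charmap="([^"]*)"', output)
    if match is None:
        raise ValueError("no charmap=\"...\" entry found")
    return match.group(1)


def is_utf8_charmap(name: str) -> bool:
    """True if the charmap name denotes UTF-8, ignoring case and '-'/'_'."""
    return name.replace("-", "").replace("_", "").lower() == "utf8"
```

The text to parse could come from `subprocess.run(["locale", "-k", "charmap"], capture_output=True, text=True).stdout` on systems that provide that command.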

--
Robin Hugh Johnson
E-Mail : robbat2@orbis-terrarum.net
Home Page : http://www.orbis-terrarum.net/?l=people.robbat2
ICQ# : 30269588 or 41961639
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
Re: utf-8 fixes for metadata.xml
On Thu, 4 Aug 2005 19:15:15 -0700 "Robin H. Johnson"
<robbat2@gentoo.org> wrote:
| vim-6.3.084 still breaks them, unless you set encoding=utf-8 in your
| vim settings. It's the same bug I spoke to you about earlier today.

Bleh, I just figured that one out. The best way around it is to export
LC_ALL=en_US.utf8 before editing UTF-8 files.
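
If the editor is launched from a wrapper script rather than an interactive shell, the same override can be applied to just that one process. A minimal sketch, assuming the en_US.utf8 locale is actually installed on the machine; the helper name is hypothetical.

```python
import os


def utf8_env(base=None, locale_name: str = "en_US.utf8") -> dict:
    """Return a copy of the environment with LC_ALL set to a UTF-8 locale.

    `base` defaults to the current process environment and is never
    modified; `locale_name` must name a locale that exists on the system.
    """
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = locale_name
    return env
```

For example, `subprocess.call(["vim", "metadata.xml"], env=utf8_env())` would start vim with the override while leaving the calling shell's locale alone.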

| As a vim workaround, maybe force encoding=utf-8 in the gentoo filetype
| stuff?

Hrm, we can't mess with encoding without breaking things for people
using silly terminals (like aterm, gnome-terminal and konsole) that lie
about their $TERM.

--
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail : ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm
Re: utf-8 fixes for metadata.xml
Ciaran McCreesh wrote:
> Bleh, I just figured that one out. The best way around it is to export
> LC_ALL=en_US.utf8 before editing UTF-8 files.

I've had some troubles (mainly `less` and some X apps working
incorrectly) with LC_CTYPE=cs_CZ.utf8 which were fixed by
LC_CTYPE=cs_CZ.UTF-8. It seems that the case doesn't matter, but the "-"
is important.
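
A sketch of normalising the spelling, for anyone who hits the same problem: it rewrites any utf8 / UTF8 / utf-8 codeset suffix to "UTF-8" and leaves everything else alone. Whether a particular program needs this is an assumption based only on the observation above, and the function name is made up.

```python
def canonical_locale(name: str) -> str:
    """Rewrite a UTF-8 codeset suffix to the 'UTF-8' spelling.

    Handles names of the form language_TERRITORY.codeset@modifier;
    names without a codeset, or with a non-UTF-8 codeset, are returned
    unchanged.
    """
    base, dot, rest = name.partition(".")
    if not dot:
        return name  # no codeset part, e.g. "C" or "cs_CZ"
    codeset, at, modifier = rest.partition("@")
    if codeset.replace("-", "").lower() == "utf8":
        codeset = "UTF-8"
    return base + "." + codeset + at + modifier
```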

> Hrm, we can't mess with encoding without breaking things for people
> using silly terminals (like aterm, gnome-terminal and konsole) that lie
> about their $TERM.

My konsole reports "xterm"; what's wrong with that?

-jkt


--
cd /local/pub && more beer > /dev/mouth
Re: utf-8 fixes for metadata.xml
On Sun, 07 Aug 2005 12:24:35 +0200 Jan Kundrát <jkt@gentoo.org> wrote:
| > Hrm, we can't mess with encoding without breaking things for people
| > using silly terminals (like aterm, gnome-terminal and konsole) that
| > lie about their $TERM.
|
| My konsole reports "xterm"; what's wrong with that?

konsole is not an xterm. xterm is capable of handling all kinds of
termcap sequences that konsole is not. By lying about its term name,
konsole is making things very hard for people who write applications
which can make use of these sequences -- should the app assume that
xterm means "I am a real xterm" or "I am some lame terminal pretending
to be xterm"? The former screws over non-xterm users, the latter screws
over xterm users.

--
Ciaran McCreesh : Gentoo Developer (Vim, Shell tools, Fluxbox, Cron)
Mail : ciaranm at gentoo.org
Web : http://dev.gentoo.org/~ciaranm