Mailing List Archive

A very simple category proposal
We already have almost everything we need for categorization of articles,
except for

1) A category namespace.

2) Some code to render links to this namespace differently than normal
links, and some code to render pages within the category namespace.

How it would work:

You add a link to

[[Category:stub]]
[[Category:movie]]
[[Category:painter]]

or whatever; on the page this is rendered a bit like the interlanguage
links, separate from the textual content:

This article belongs to the following categories:
[[Category:stub]] - [[Category:movie]]

etc.
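
A minimal sketch of the rendering step described above, assuming category tags are written exactly as [[Category:name]]. The function names and the regular expression are illustrative, not part of any existing code: the tags are pulled out of the article text and shown on a separate line, much like interlanguage links.

```python
import re

# Assumed tag syntax: [[Category:name]] with no extra parameters.
CATEGORY_LINK = re.compile(r"\[\[Category:([^\]|]+)\]\]")


def split_categories(wikitext):
    """Return (body, categories): the text without category tags, plus the
    category names in order of first appearance."""
    categories = []
    for match in CATEGORY_LINK.finditer(wikitext):
        name = match.group(1).strip()
        if name not in categories:
            categories.append(name)
    body = CATEGORY_LINK.sub("", wikitext).strip()
    return body, categories


def render_category_footer(categories):
    """Render the category line that is displayed apart from the article text."""
    if not categories:
        return ""
    links = " - ".join(f"[[Category:{name}]]" for name in categories)
    return "This article belongs to the following categories:\n" + links


if __name__ == "__main__":
    text = "Casablanca is a 1942 film.\n[[Category:stub]]\n[[Category:movie]]"
    body, cats = split_categories(text)
    print(body)
    print(render_category_footer(cats))
```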

When you visit a Category page, you then automatically get a two-part page
that consists of

1) Editable category description
2) Result of the "What links here" query

This would allow us to make many, many pages more effective:

- "Find or fix a stub" could be replaced with [[Category:Stub]] in stub
articles.

- "Links to disambiguation pages" could be replaced with
[[Category:disambiguation]].

- "Pages in need of attention" could be replaced with [[Category:Work
needed]].

All these pages currently work very ineffectively because links are often
not removed when they need to be, and the pages get long and unwieldy. I'm
hesitant regarding "Votes for deletion", but with some additional work
that could also be changed to use the new scheme.

Aside from that, we could start categorizing the Wikipedia content itself
more effectively. Many of our "lists of" could be changed into auto-
generated category pages. List gets too long? Then nest categories by
adding a category tag to a category page.
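
Nesting could be sketched roughly as follows. The `members` mapping (category title to the titles tagged with it) is a hypothetical stand-in for whatever the backlink query would return; a category page that carries a category tag simply shows up as a member whose title starts with "Category:".

```python
def pages_in_category(category, members, seen=None):
    """Collect article titles in a category and in all nested subcategories.

    `members` maps a category page title to the list of titles tagged with it.
    `seen` guards against categories that (directly or indirectly) contain
    each other.
    """
    if seen is None:
        seen = set()
    if category in seen:
        return set()
    seen.add(category)

    pages = set()
    for title in members.get(category, []):
        if title.startswith("Category:"):
            # A tagged category page is a subcategory: descend into it.
            pages |= pages_in_category(title, members, seen)
        else:
            pages.add(title)
    return pages


if __name__ == "__main__":
    members = {
        "Category:Painters": ["Category:Dutch painters", "Pablo Picasso"],
        "Category:Dutch painters": ["Rembrandt"],
    }
    print(sorted(pages_in_category("Category:Painters", members)))
```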

Thoughts? Objections? I would consider implementing this myself if nobody
else cares enough to do it.

Regards,

Erik
Re: A very simple category proposal
On Sat, 2003-02-15 at 17:53, Erik Moeller wrote:
> When you visit a Category page, you then automatically get a two-part page
> that consists of
>
> 1) Editable category description
> 2) Result of the "What links here" query

...which itself is *desperately* in need of being made to better handle
large numbers of links. Backlinks for articles linked to by 30,000 US
cities, for instance, are rather hard to navigate.

> Aside from that, we could start categorizing the Wikipedia content itself
> more effectively. Many of our "lists of" could be changed into auto-
> generated category pages. List gets too long? Then nest categories by
> adding a category tag to a category page.

Sorting is usually a significant issue in 'lists of X'. To replace
manually created lists, it would be necessary to be able to specify a
'sort by' string with the categorization, which could be a 'Last, First'
style name, a year, quantity or other numeric sort, etc.

<ugly idea>
[[Category:Novelist|Hemingway, Ernest]]
[[Category:Events in World War II|1941-12-07]]
[[Category:Largest cities by population|-3694820]]
(negative so the numeric sort begins with the largest pops)
</ugly>
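
A rough sketch of how such sort keys might be parsed and applied, assuming the [[Category:name|sort key]] form shown above. All names here are hypothetical. Keys that look like plain numbers are compared numerically (which is what makes the negative-population trick work), everything else as case-insensitive text, and a tag without a key falls back to the page title.

```python
import re

# Assumed syntax: [[Category:name]] or [[Category:name|sort key]]
CATEGORY_TAG = re.compile(r"\[\[Category:([^\]|]+)(?:\|([^\]]*))?\]\]")
NUMERIC_KEY = re.compile(r"-?\d+(?:\.\d+)?")


def parse_category_tags(title, wikitext):
    """Return a list of (category, sort_key) pairs found in the page text."""
    tags = []
    for match in CATEGORY_TAG.finditer(wikitext):
        category = match.group(1).strip()
        sort_key = (match.group(2) or title).strip()
        tags.append((category, sort_key))
    return tags


def ordering(sort_key):
    """Numeric keys sort by value and come first; other keys sort as text."""
    if NUMERIC_KEY.fullmatch(sort_key):
        return (0, float(sort_key), "")
    return (1, 0.0, sort_key.casefold())


def sorted_members(entries):
    """entries: list of (sort_key, title) pairs for one category."""
    return [title for sort_key, title in sorted(entries, key=lambda e: ordering(e[0]))]


if __name__ == "__main__":
    print(parse_category_tags("Ernest Hemingway",
                              "[[Category:Novelist|Hemingway, Ernest]]"))
    # Placeholder cities with illustrative negative populations.
    print(sorted_members([("-3694820", "City B"),
                          ("-8008278", "City A"),
                          ("-2896016", "City C")]))
```

ISO-style dates such as 1941-12-07 are not treated as numbers here, but they still sort chronologically as text.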

-- brion vibber (brion @ pobox.com)
Re: A very simple category proposal
> Sorting is usually a significant issue in 'lists of X'. To replace
> manually created lists, it would be necessary to be able to specify a
> 'sort by' string with the categorization, which could be a 'Last, First'
> style name, a year, quantity or other numeric sort, etc.

> <ugly idea>
> [[Category:Novelist|Hemingway, Ernest]]
> [[Category:Events in World War II|1941-12-07]]
> [[Category:Largest cities by population|-3694820]]
> (negative so the numeric sort begins with the largest pops)
> </ugly>

This would work; hard to implement, though. The "What links here" query
would need to grep every page for the [[Category:]] string. Of course, the
magic we could do with that might be worth it. This came up before
regarding the paper version, and we ultimately have to think about a
solution (preferably one which allows several sorting criteria). What I'm
worried about is mostly the UI: anything that's not part of the article
text adds UI complexity. (The other side of the coin: Any logic we add to
page contents adds code complexity.)

But this is advanced stuff; I think we should first implement a basic
version for the stubs, disambiguation pages etc. and then think about
issues like advanced sorting. Having names in person lists under the first
rather than the last letter would not be so terrible for the moment
either.

Regards,

Erik
Re: A very simple category proposal
On Sat, 2003-02-15 at 18:53, Erik Moeller wrote:
> > <ugly idea>
> > [[Category:Novelist|Hemingway, Ernest]]
> > [[Category:Events in World War II|1941-12-07]]
> > [[Category:Largest cities by population|-3694820]]
> > (negative so the numeric sort begins with the largest pops)
> > </ugly>
>
> This would work; hard to implement, though. The "What links here" query
> would need to grep every page for the [[Category:]] string.

Rather, on page save we'd add it to either yet another table or, more
likely, another column in 'links'.
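
Something along these lines, as a sketch only. It takes the "yet another table" option; the table name `category_links`, its columns, and the function names are made up for illustration, and SQLite stands in for the real database. The point is that the tags are extracted once at save time, so listing a category is an indexed lookup rather than a scan of every page.

```python
import re
import sqlite3

# Assumed syntax: [[Category:name]] or [[Category:name|sort key]]
CATEGORY_TAG = re.compile(r"\[\[Category:([^\]|]+)(?:\|([^\]]*))?\]\]")


def create_schema(conn):
    """Create the hypothetical category_links table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS category_links (
               page_title TEXT NOT NULL,
               category   TEXT NOT NULL,
               sort_key   TEXT NOT NULL,
               PRIMARY KEY (page_title, category)
           )"""
    )


def save_page(conn, title, wikitext):
    """On page save, replace the stored category rows for this page."""
    rows = {}
    for match in CATEGORY_TAG.finditer(wikitext):
        category = match.group(1).strip()
        rows[category] = (match.group(2) or title).strip()
    with conn:  # one transaction: drop the old rows, insert the current ones
        conn.execute("DELETE FROM category_links WHERE page_title = ?", (title,))
        conn.executemany(
            "INSERT INTO category_links (page_title, category, sort_key) "
            "VALUES (?, ?, ?)",
            [(title, category, key) for category, key in rows.items()],
        )


def category_members(conn, category):
    """List the pages in a category, ordered by their stored sort key."""
    cursor = conn.execute(
        "SELECT page_title FROM category_links WHERE category = ? ORDER BY sort_key",
        (category,),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    save_page(conn, "Virginia Woolf", "... [[Category:Novelist|Woolf, Virginia]]")
    save_page(conn, "Ernest Hemingway", "... [[Category:Novelist|Hemingway, Ernest]]")
    print(category_members(conn, "Novelist"))
```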

-- brion vibber (brion @ pobox.com)
Re: A very simple category proposal
Does *anyone* remember what happened when I introduced that category
link thingy (in Phase 2, IIRC)? I vaguely remember an outcry on
wikipedia-l...

I think it would be more reasonable to
1. Expand the Phase 3 software to be used at the sifter project (meaning
improved user rights management, new skin, and a special import page,
basically). I'll start on that soon at the test site.
2. Set up sifter project (my preference would be on the wikipedia server ;-)
3. Implement a category scheme there. It would use drop-down boxes or
the like rather than text edits since, well, no one can edit a sifter page...
(4. If a wikipedia majority likes it, we can use it on wikipedia as well...)

Magnus
Re: A very simple category proposal
On Sun, 2003-02-16 at 02:33, Magnus Manske wrote:
> Does *anyone* remember what happened when I introduced that category
> link thingy (in Phase 2, IIRC)? I vaguely remember an outcry on
> wikipedia-l...

Before our time, old man. :)

> I think it would be more reasonable to
> 1. Expand the Phase 3 software to be used at the sifter project (meaning
> improved user rights management, new skin, and a special import page,
> basically). I'll start on that soon at the test site.
> 2. Set up sifter project (my preference would be on the wikipedia server ;-)
> 3. Implement a category scheme there. It would use drop-down boxes or
> the like rather than text edits since, well, no one can edit a sifter page...

Bleah! No offense. :) However, a category system must IMHO be:

1) open-ended -- I need to be able to add [[Category:California cities]]
or [[Category:Esperanto literature]]. Maybe only two people are
interested in the category, but that should be enough.

2) **Integrated** -- filtering of Recentchanges by a selection of
categories would be more powerful than the current watchlist system (in
a sort of meta-sense, the watchlist establishes a category of 'pages
user X is interested in'). If it's sitting only on sifter, it's 100%
useless to me. With thousands of edits per day on the wiki, I as an
editor *need* to have a useful way to track things that will be of
interest to me, and that's where I need the categories.
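
To make the Recentchanges idea concrete, here is a small sketch. The data shapes are assumptions: `changes` is a list of edit records with a "title" field, and `page_categories` maps each page title to the set of categories it is tagged with. An edit is kept if its page belongs to at least one of the selected categories.

```python
def filter_recent_changes(changes, page_categories, watched_categories):
    """Keep only the changes whose page is in at least one watched category."""
    watched = set(watched_categories)
    return [
        change
        for change in changes
        if watched & set(page_categories.get(change["title"], ()))
    ]


if __name__ == "__main__":
    # Hypothetical recent edits and category assignments.
    changes = [
        {"title": "Sacramento", "user": "A"},
        {"title": "Casablanca", "user": "B"},
        {"title": "La Espero", "user": "C"},
    ]
    page_categories = {
        "Sacramento": {"California cities"},
        "Casablanca": {"movie", "stub"},
        "La Espero": {"Esperanto literature"},
    }
    selected = ["California cities", "Esperanto literature"]
    for change in filter_recent_changes(changes, page_categories, selected):
        print(change["title"])
```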

Whatlinkshere, recentchangeslinked, and watchlist are crude movements in
the direction of what is needed. But they are limited:

* Watchlist requires that you personally add things to it. New topics
that will be of interest to you won't show up until you find them in
some other way.

* What links here doesn't distinguish between essential links that
indicate a categorical relation and incidental links, and performs no
useful ordering or organization. And it does not handle large numbers of
links at all well.

* Recent changes linked plus a topic list page isn't half bad, but
centralized topic lists get neglected and fall out of sync. Pages are
renamed, deleted, or added and the list isn't updated. (On the up side,
potential pages can be listed in the topic list and will show up once
they're created.) For tracking edits in a category it's ok, but you
can't presently combine multiple categories in one display short of
physically combining the lists into one page (and again the sync
problems!)

Category tags with a better list/backlink system aren't perfect either.
They would handle some problems:
* A renamed page will be carried over automatically (vs rclinked)
* If another editor adds a page to a category you're watching, you see
it too (vs watchlist)

but have others:
* Nonexistent pages can't be categorized; a newly created page will
appear in no category until someone edits it and adds a link.

There are perhaps ways around this. Like the proposed new system for
language links, a secondary table could store category information even
for 'potential' pages. But that brings us back to extensibility and
editability problems.

A "shared watchlist" might be a direction to go in; category pages could
be created at will, there could be a table listing category connections
and some programmatic way of adding pages to them like adding to the
per-user watchlist (with an extra step of selecting a category). If
you're watching a category page, you get the pages in that category
highlighted in rc as well. That still leaves the issue of sorting and
presentation of category lists, but that may be easier in an add form
than in a funky link.

There's also the possibility of automatic categorization based on
checking the categorization of linking pages, but this way leads to
artificial intelligence and the end of life as we know it. :)

Anyway, just thoughts.

-- brion vibber (brion @ pobox.com)
Re: A very simple category proposal
> Does *anyone* remember what happened when I introduced that category
> link thingy (in Phase 2, IIRC)? I vaguely remember an outcry on
> wikipedia-l...

Nope. Got any links?
I don't care too much about sifter (I think it's a good idea, but the
outcome depends on the people involved), and I think a basic category
scheme is useful for Wikipedia proper. Note that first and foremost I want
to use it to fix the typically outdated "pages needing attention" pages.
Advanced stuff like filtering RC by category is nice, but not immediately
needed.

As for Brion's remark that non-existent pages would not be listed on a
category page, every category page would have an editable and an
autogenerated part; in the editable part it would be possible to add a
"list of pages that need to be written" for every category. Such a clear
visual separation might actually make sense and encourage people more to
write missing articles.

Regards,

Erik
Re: A very simple category proposal
On Sun, 2003-02-16 at 04:38, Erik Moeller wrote:
> > Does *anyone* remember what happened when I introduced that category
> > link thingy (in Phase 2, IIRC)? I vaguely remember an outcry on
> > wikipedia-l...
>
> Nope. Got any links?

(google google)
http://www.wikipedia.org/pipermail/wikipedia-l/2002-January/thread.html#1046

> As for Brion's remark that non-existent pages would not be listed on a
> category page, every category page would have an editable and an
> autogenerated part; in the editable part it would be possible to add a
> "list of pages that need to be written" for every category. Such a clear
> visual separation might actually make sense and encourage people more to
> write missing articles.

Visual separation is one thing, but functional separation is what I'm
hoping to avoid. :) Automatically categorizing new articles under
category(ies) whose pages link them might do it.
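
One possible reading of that, sketched under assumptions: when a new article is created, look for category pages whose editable part already links to its title, and file the article under those categories. The `category_pages` mapping and the function name are hypothetical.

```python
import re

# Ordinary wiki links: [[Title]] or [[Title|label]]
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def categories_linking_to(title, category_pages):
    """Return the categories whose description text links to `title`.

    `category_pages` maps a category name to the wikitext of its editable
    part, e.g. a "pages that need to be written" list.
    """
    matches = []
    for category, description in category_pages.items():
        targets = {m.group(1).strip() for m in WIKI_LINK.finditer(description)}
        if title in targets:
            matches.append(category)
    return sorted(matches)


if __name__ == "__main__":
    category_pages = {
        "Esperanto literature": "Still to be written: [[La Espero]], [[Gerda malaperis]]",
        "California cities": "Still to be written: [[Fresno]]",
    }
    # A newly created page would be filed under these categories.
    print(categories_linking_to("La Espero", category_pages))
```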

-- brion vibber (brion @ pobox.com)
Re: A very simple category proposal
Erik Moeller wrote:

>Thoughts? Objections? I would consider implementing this myself if nobody
>else cares enough to do it.

I think there are some problems with the idea of creating a separate
category namespace.

(1) It creates an ambiguity problem. For example, should an article
about "events in U.S. history" appear in the main Wikipedia namespace
or in the "category" namespace?

(2) It could fragment the Wikipedia. All of the other namespaces --
Talk, User, Wikipedia, etc. -- serve some sort of "meta" function
with respect to the actual "encyclopedia" part of Wikipedia, so it
makes sense to compartmentalize them in a separate namespace. This
isn't the case with "category" articles, which are themselves part of
the encyclopedia.

(3)

In addition to these problems, creating a category namespace only
provides limited utility. Thinking ahead a bit, I think something
more flexible and powerful would better serve the project as it grows
and matures. Rather than simply creating a category namespace, I
think it would be better to develop a system of object typing like
the one I described a while ago. The Wiki syntax could be expanded to
include ways of marking object types and properties, which could then
be used in turn to support data mining. For example, if we had a way
of marking certain articles as "people" objects with properties such
as first name, last name, date of birth, etc., the software could
autogenerate lists of people sorted either alphabetically or by date
of birth. Also, the autogenerated lists could include selected
properties extracted from each of the articles in the list. Here's
some pseudocode to give an idea of how someone might generate a list
of U.S. presidents:

<LIST ORDER=year of inauguration>*'''[U.S. President->name]''', [U.S.
President->year of inauguration], [U.S. President->party affiliation]
</LIST>

The idea is that this would expand to the equivalent of:

*'''James Earl Carter''', 1977, Democrat
*'''Ronald Wilson Reagan''', 1981, Republican
*'''George Herbert Walker Bush''', 1989, Republican
*'''William Jefferson Clinton''', 1993, Democrat
*'''George W. Bush''', 2001, Republican

This sort of thing could be coded into Wikipedia with an object
model, but I don't see how creating a category namespace would make
it possible.
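
As a rough illustration of the object-typing idea (not a proposal for the actual syntax), typed objects could be modelled as records with a "type" and named properties. The record layout, the underscore-style property names, and `render_list` are all assumptions made for this sketch; it selects objects of one type, orders them by a property, and fills a line template from each object's properties.

```python
def render_list(objects, object_type, order_by, template):
    """Render one wiki list line per object of the given type.

    objects:     list of dicts, each with a "type" key plus its properties
    order_by:    name of the property to sort on
    template:    str.format-style line template using property names
    """
    selected = [obj for obj in objects if obj.get("type") == object_type]
    selected.sort(key=lambda obj: obj[order_by])
    return "\n".join(template.format_map(obj) for obj in selected)


if __name__ == "__main__":
    # Illustrative records; in the proposal these properties would be
    # marked up inside each article.
    objects = [
        {"type": "U.S. President", "name": "William Jefferson Clinton",
         "year_of_inauguration": 1993, "party_affiliation": "Democrat"},
        {"type": "U.S. President", "name": "James Earl Carter",
         "year_of_inauguration": 1977, "party_affiliation": "Democrat"},
        {"type": "U.S. President", "name": "Ronald Wilson Reagan",
         "year_of_inauguration": 1981, "party_affiliation": "Republican"},
        {"type": "Novelist", "name": "Ernest Hemingway"},
    ]
    print(render_list(
        objects,
        "U.S. President",
        "year_of_inauguration",
        "*'''{name}''', {year_of_inauguration}, {party_affiliation}",
    ))
```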
--
--------------------------------
| Sheldon Rampton
| Editor, PR Watch (www.prwatch.org)
| Author of books including:
| Friends In Deed: The Story of US-Nicaragua Sister Cities
| Toxic Sludge Is Good For You
| Mad Cow USA
| Trust Us, We're Experts
--------------------------------