Mailing List Archive

Auto-harvesting of dates
It is very easy to write a regexp that recognizes and parses dates in
running text in formats like "July 14, 1789". That kind of automatic
harvesting can be applied when a Wiki article is saved, and the dates
found can be indexed and made easily searchable.
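
A minimal sketch of this kind of harvesting, written in Python purely for
illustration. The names harvest_dates, on_save and date_index are
hypothetical, the index is a plain in-memory dictionary rather than a real
database, and the pattern only covers the "July 14, 1789" format without
checking that the day is valid for the month:

```python
import re
from collections import defaultdict

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
MONTH_NUMBER = {name: i for i, name in enumerate(MONTHS, start=1)}

# Month name, day, optional comma, then a year of one to four digits.
DATE_RE = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2}),?\s+(\d{1,4})\b"
)


def harvest_dates(text):
    """Return a (year, month, day) tuple for every date found in text."""
    return [(int(year), MONTH_NUMBER[month], int(day))
            for month, day, year in DATE_RE.findall(text)]


# Hypothetical index: (year, month, day) -> set of page titles.
date_index = defaultdict(set)


def on_save(title, text):
    """Record every date in a saved article under the article's title."""
    for date in harvest_dates(text):
        date_index[date].add(title)


on_save("French Revolution", "The Bastille was stormed on July 14, 1789.")
print(sorted(date_index))  # [(1789, 7, 14)]
```

A real implementation would also have to drop a page's old index entries
when the page is saved again, which this sketch does not attempt.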

Was this alternative ever considered before the introduction of the
current Wikipedia custom of writing [[July 14]], [[1789]]? Many people
(I am one of them) have spent their time adding [[]] markup to years and
dates in Wikipedia articles, and maintaining the pages for dates and
years. That time could have been saved if automatic harvesting and
indexing had been used instead.

The current Wikipedia custom is an entrenched position that would take
more energy to get out of than I dare think about. However, it is just as
easy to write a regexp that recognizes dates in formats like "[[July 14]],
[[1789]]" instead.
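
As a rough illustration, the linked form can be matched with a pattern of
the same shape. This is only a sketch in Python: harvest_linked_dates is a
hypothetical name, and the pattern assumes the two links are separated by
an optional comma and a single space, as in the example above:

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
MONTH_NUMBER = {name: i for i, name in enumerate(MONTHS, start=1)}

# Matches "[[July 14]], [[1789]]"; the comma between the links is optional.
LINKED_DATE_RE = re.compile(
    r"\[\[(" + "|".join(MONTHS) + r") (\d{1,2})\]\],? \[\[(\d{1,4})\]\]"
)


def harvest_linked_dates(wikitext):
    """Return a (year, month, day) tuple for every linked date found."""
    return [(int(year), MONTH_NUMBER[month], int(day))
            for month, day, year in LINKED_DATE_RE.findall(wikitext)]


print(harvest_linked_dates("Stormed on [[July 14]], [[1789]]."))
# [(1789, 7, 14)]
```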

If you would like to test a function like this, visit
http://susning.nu/Carl_Wilhelm_Scheele
and click on this Swedish chemist's birth date "9 december 1742".

My regexps recognize "99 monthname 9999", "monthname 9999", "year 9999",
"born 9999", "died 9999", "9999-talet" (centuries and decades), which are
the most common ways to specify dates in Swedish text. Clicking on a date
runs a search for adjacent dates found in other pages, sorted
chronologically. Month names without a day number sort before the 1st of
that month. Years without a month sort before January 1 of that year.
Decades and centuries sort before the first year of the specified interval.
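
The sort order described above can be sketched with a tuple key. This is
an illustration in Python, not the code running on susning.nu: the
function name sort_key, the Swedish keywords "år", "född" and "död", and
the choice of 0 and -1 as placeholder values are all assumptions:

```python
import re

SWEDISH_MONTHS = ["januari", "februari", "mars", "april", "maj", "juni",
                  "juli", "augusti", "september", "oktober", "november",
                  "december"]
MONTH_NUMBER = {name: i for i, name in enumerate(SWEDISH_MONTHS, start=1)}
MONTH_ALT = "|".join(SWEDISH_MONTHS)

# Each entry pairs a pattern with a function that builds the sort key
# (year, month, day). A missing day is 0, a missing month is 0, and a
# decade or century uses month -1 so it sorts before its first year.
PATTERNS = [
    (re.compile(rf"(\d{{1,2}}) ({MONTH_ALT}) (\d{{4}})"),   # 9 december 1742
     lambda m: (int(m[3]), MONTH_NUMBER[m[2]], int(m[1]))),
    (re.compile(rf"({MONTH_ALT}) (\d{{4}})"),               # december 1742
     lambda m: (int(m[2]), MONTH_NUMBER[m[1]], 0)),
    (re.compile(r"(\d{4})-talet"),                          # 1740-talet
     lambda m: (int(m[1]), -1, 0)),
    (re.compile(r"(?:år|född|död) (\d{4})"),                # år 1742
     lambda m: (int(m[1]), 0, 0)),
]


def sort_key(date_text):
    """Map one harvested date string to a chronologically sortable tuple."""
    for pattern, build in PATTERNS:
        match = pattern.fullmatch(date_text.lower())
        if match:
            return build(match)
    raise ValueError(f"unrecognized date: {date_text!r}")


dates = ["9 december 1742", "år 1742", "1740-talet",
         "december 1742", "1700-talet", "död 1786"]
print(sorted(dates, key=sort_key))
# ['1700-talet', '1740-talet', 'år 1742', 'december 1742',
#  '9 december 1742', 'död 1786']
```

Because tuples compare element by element, the placeholder values are
enough to produce the ordering without any special-case comparisons.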


--
Lars Aronsson (lars@aronsson.se)
tel +46-70-7891609
http://aronsson.se/ http://elektrosmog.nu/ http://susning.nu/

RE: Auto-harvesting of dates
*** Sorry, that went to the wrong group. ***

Nice idea! Anyone remember the "auto-wikification" feature? This could be
one of its functions.

But Lee is very cautious when it comes to software changing articles, and
he *does* have a point.

Still, things like this could be on some kind of button, IMHO.

Magnus

> -----Original Message-----
> From: wikitech-l-admin@nupedia.com
> [mailto:wikitech-l-admin@nupedia.com]On Behalf Of Lars Aronsson
> Sent: Tuesday, July 16, 2002 10:59 PM
> To: wikitech-l@nupedia.com
> Subject: [Wikitech-l] Auto-harvesting of dates