Mailing List Archive

Citation of versions by timestamp?
A wikipedian has recently been trying to find a good way to cite
particular revisions of articles in the bibliography for a paper.

Currently we can give URLs for the _current_ version of an article
(current as of whenever it is visited), or of _previous_ versions (as of
when the citation was made):

current:
http://www.wikipedia.org/wiki/Foobar
old:
http://www.wikipedia.org/w/wiki.phtml?title=Foobar&oldid=12345

There are two main problems with this (aside from the ugliness of the
old-reference URLs):

* There is no way to reference the current version _as of the time of
citation_. Since that revision isn't in the old table, it has no oldid
assigned yet.

* oldid values sometimes can change, as when an article is deleted and
subsequently restored (done also when recombining histories of articles
that have been broken by crude renaming). Possible rearrangements of the
database (such as combining all languages into a single table) could
require reassigning oldids en masse. They are *not* reliable long-term
citations.

One possible solution would be to provide a way of citing articles as of
a particular timestamp, for instance:

http://www.wikipedia.org/wiki/Foobar?version=20030224161134

which would pull up either a cur or old version with that timestamp.
(It could also be prettified: version=2003-02-24-16:11:34 etc)
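
A minimal sketch of how such a version parameter might be read, accepting both the compact and the prettified spelling. The function names parse_version and version_from_url are hypothetical; nothing like them exists in the wiki code as described here.

```python
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

COMPACT = "%Y%m%d%H%M%S"          # 20030224161134
PRETTY = "%Y-%m-%d-%H:%M:%S"      # 2003-02-24-16:11:34


def parse_version(value):
    """Normalize a 'version' value to the compact 14-digit form."""
    for fmt in (COMPACT, PRETTY):
        try:
            return datetime.strptime(value, fmt).strftime(COMPACT)
        except ValueError:
            continue
    raise ValueError(f"unrecognized version timestamp: {value!r}")


def version_from_url(url):
    """Return the normalized timestamp from a citation URL, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("version")
    return parse_version(values[0]) if values else None
```

Normalizing to one canonical form up front means the database lookup only ever has to compare against a single timestamp format.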

Advantages:
* consistent, no fuss, no worries about rearrangement of db structure
* citation URL can be provided in a nice handy link at the bottom of
every page

Disadvantages:
* timestamp has 1-second resolution. Generally this is going to be
unique (at least per article), but it may occasionally not be,
particularly in cases of recombined histories. Some articles had
multiple revisions' timestamps set to the same time due to bugs in the
rename code and other db tweaks in early '02.
* for this reason it's not suitable as the mainline url for drawing up
old history revisions via the history list; so people have to remember
to find and use the citation url separately

Alternatively, we could supply _both_ timestamp and oldid in the URL,
and let timestamp have priority if an exact match on both is not found.
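
The combined timestamp-plus-oldid rule could look roughly like the sketch below. The Revision record and resolve function are hypothetical, and two behaviors are assumptions not stated above: when several revisions share a timestamp the first one is returned, and a request whose timestamp matches nothing returns None even if the oldid exists.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Revision:
    oldid: Optional[int]  # None for the current revision, which has no oldid yet
    timestamp: str        # compact form, e.g. "20030224161134"
    text: str


def resolve(revisions: Sequence[Revision], timestamp: str,
            oldid: Optional[int] = None) -> Optional[Revision]:
    """Prefer an exact (oldid, timestamp) match; otherwise trust the timestamp."""
    if oldid is not None:
        for rev in revisions:
            if rev.oldid == oldid and rev.timestamp == timestamp:
                return rev
    # No exact match: the oldid may have been reassigned, so the timestamp wins.
    for rev in revisions:
        if rev.timestamp == timestamp:
            return rev
    return None
```

With this rule the oldid only disambiguates revisions that share a timestamp; a stale oldid never blocks a lookup that the timestamp alone can satisfy.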

Thoughts?

-- brion vibber (brion @ pobox.com)
Re: Citation of versions by timestamp?
I need this too; indeed, anyone who uses a Wikipedia article as a source
needs it. All you can do now is cite the URL, the page title, and the date
you took it off Wikipedia, like this:

http://www.internet-encyclopedia.info/statistics.html -- Original source: <a
href="http://www.wikipedia.org/wiki/Statistics">Statistics - Wikipedia</a>,
February, 2003, Revised: February 23, 2003<BR>
Copyright &copy; 2003 under the terms of the <a
href="http://www.gnu.org/copyleft/fdl.html">GNU Free Documentation
License</a>
<BR>

By the way, do you think this copyright and link to the GNU license is
sufficient?

Fred Bauder

http://www.internet-encyclopedia.info

> From: Brion Vibber <brion@pobox.com>
> Reply-To: wikitech-l@wikipedia.org
> Date: 23 Feb 2003 21:41:13 -0800
> To: wikitech-l@wikipedia.org
> Subject: [Wikitech-l] Citation of versions by timestamp?
Re: Citation of versions by timestamp?
Brion Vibber wrote:

>>
>>One possible solution would be to provide a way of citing articles as of
>>a particular timestamp, for instance:
>>
>>http://www.wikipedia.org/wiki/Foobar?version=20030224161134
>>
>>which would pull up either a cur or old version with that timestamp.
>>(It could also be prettified: version=2003-02-24-16:11:34 etc)
>>
>>Advantages:
>>* consistent, no fuss, no worries about rearrangement of db structure
>>* citation URL can be provided in a nice handy link at the bottom of
>>every page
>>
>>Disadvantages:
>>* timestamp has 1-second resolution. Generally this is going to be
>>unique (at least per article), but it may occasionally not be,
>>particularly in cases of recombined histories. Some articles had
>>multiple revisions' timestamps set to the same time due to bugs in the
>>rename code and other db tweaks in early '02.
>>* for this reason it's not suitable as the mainline url for drawing up
>>old history revisions via the history list; so people have to remember
>>to find and use the citation url separately
>>
>>Alternatively, we could supply _both_ timestamp and oldid in the URL,
>>and let timestamp have priority if an exact match on both is not found.
>>
>>
Well, we could also have

http://www.wikipedia.org/wiki/Foobar?md5=1234f53fa34f253f3453abf00f549120

which would identify a unique version with high probability, and also provide a way of verifying the integrity of the old version (otherwise, you're just trusting the owner of the archive). For fanatical levels of caution, you could do:

http://www.wikipedia.org/wiki/Foobar?version=20030224161134&md5=1234f53fa34f253f3453abf00f549120

For the truly paranoid, you could substitute SHA-1 for MD5.
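
As a rough illustration of the integrity check, a reader holding a cited hash could recompute the digest of the text the archive returns and compare. The helper names are hypothetical, and hashing the UTF-8 bytes of the raw article text is an assumption; the thread does not say exactly which bytes would be hashed.

```python
import hashlib


def revision_digest(text, algorithm="md5"):
    """Hex digest of a revision's text; pass algorithm="sha1" for SHA-1."""
    # Assumption: the digest covers the UTF-8 encoding of the article source.
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


def matches_citation(text, cited_digest, algorithm="md5"):
    """True if the retrieved text hashes to the digest given in the citation."""
    return revision_digest(text, algorithm) == cited_digest.lower()
```

An MD5 digest is 32 hex characters and a SHA-1 digest is 40, so the two kinds of citation URL would be distinguishable by length alone.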

Perhaps we need a "permalink" at the bottom of this page marked
"permanent link to this version"?

-- Neil
Re: Citation of versions by timestamp?
On Mon, 24 Feb 2003, Neil Harris wrote:
> Well, we could also have
>
> http://www.wikipedia.org/wiki/Foobar?md5=1234f53fa34f253f3453abf00f549120
>
> which would identify a unique version with high probability, and also provide a way of verifying the integrity of the old version

Well heck, why use the name at all? (After all, pages can be renamed.) Add
a hash field to the table, index it, and presto! Look up an arbitrary
version of any page, no matter how it's been shuffled around:

http://www.wikipedia.org/cite/1234f53fa34f253f3453abf00f549120

Crazy perhaps, but a thought. ;)

I'm not much for ugly incomprehensible URLs though...
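
A toy sketch of the hash-indexed idea, with an in-memory dict standing in for the indexed hash field. The CiteIndex class and its methods are hypothetical, and MD5 over UTF-8 text is an assumption carried over from the earlier suggestion.

```python
import hashlib


class CiteIndex:
    """Maps content hashes to revisions, independent of the page title."""

    PREFIX = "/cite/"

    def __init__(self):
        self._by_hash = {}  # digest -> revision text
        self._titles = {}   # digest -> current title (may change on rename)

    def add(self, title, text):
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        self._by_hash[digest] = text
        self._titles[digest] = title
        return digest

    def rename(self, old_title, new_title):
        # Renaming touches only the title column; the hash keys are untouched.
        for digest, title in self._titles.items():
            if title == old_title:
                self._titles[digest] = new_title

    def lookup(self, path):
        """Resolve a path such as /cite/<digest> to (title, text), or None."""
        if not path.startswith(self.PREFIX):
            return None
        digest = path[len(self.PREFIX):]
        if digest not in self._by_hash:
            return None
        return self._titles[digest], self._by_hash[digest]
```

One thing the sketch makes visible: two revisions with identical text (a revert, say) hash to the same key, so a lookup by hash alone cannot say which of them was cited.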

-- brion vibber (brion @ pobox.com)
Re: Citation of versions by timestamp?
>> http://www.wikipedia.org/wiki/Foobar?md5=1234f53fa34f253f3453abf00f549120
>> which would identify a unique version with high probability, and also
>> provide a way of verifying the integrity of the old version

> Well heck, why use the name at all? (After all, pages can be renamed.) Add
> a hash field to the table, index it, and presto! Look up an arbitrary
> version of any page, no matter how it's been shuffled around:
>
> http://www.wikipedia.org/cite/1234f53fa34f253f3453abf00f549120

Yep, renaming is a sticky issue here, and the fact that "current"
articles and old versions are in different tables makes implementation
a pain. As an initial effort, I think a title/date link could be
implemented that will work most of the time and only fail on
renamed articles; at some later point in the evolution of the
software we can make it work correctly in all cases. I don't see
any way the hashes will help.
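
A sketch of what the initial title/date lookup might involve, checking the current table first and then the old one. The schema below is hypothetical and simplified (the thread does not give the real column layout), and SQLite stands in for the actual database purely for illustration.

```python
import sqlite3

# Hypothetical, simplified schema; column names are assumptions.
SCHEMA = """
CREATE TABLE cur (cur_title TEXT, cur_timestamp TEXT, cur_text TEXT);
CREATE TABLE old (old_id INTEGER PRIMARY KEY, old_title TEXT,
                  old_timestamp TEXT, old_text TEXT);
"""


def find_revision(conn, title, timestamp):
    """Return the text for (title, timestamp) from cur, else old, else None."""
    row = conn.execute(
        "SELECT cur_text FROM cur WHERE cur_title = ? AND cur_timestamp = ?",
        (title, timestamp),
    ).fetchone()
    if row is None:
        # Duplicate timestamps are possible; take the lowest old_id (assumption).
        row = conn.execute(
            "SELECT old_text FROM old WHERE old_title = ? AND old_timestamp = ? "
            "ORDER BY old_id LIMIT 1",
            (title, timestamp),
        ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO cur VALUES (?, ?, ?)",
             ("Foobar", "20030225120000", "newest text"))
conn.execute(
    "INSERT INTO old (old_title, old_timestamp, old_text) VALUES (?, ?, ?)",
    ("Foobar", "20030224161134", "earlier text"),
)
```

Because both queries key on the title, a citation made before a rename finds nothing under the old name, which is the failure case conceded above.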

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC