Mailing List Archive

Watch list and performance issues
Would it not be worthwhile to slightly change the way the watch list works?

It is a very important tool on the en.wiki:

1) to follow up on articles one is interested in

This can hardly be done with Recent Changes any more; too many articles are modified every day.

In the watch list, what is really interesting, imho, is seeing the most recently modified articles.

2) to a lesser extent, to find again articles one wants to go back to one day

but this can also be done with the search, or by adding links on one's personal pages.

My watch list is hideously big. Each time I click on watch list, my next move is to click on the *stop loading* button after a few seconds. Just display the last 3/7 days of watched articles. 99% of the time, that's *all* that is needed. Why should one wait for the whole watch list to be downloaded? Why should one impose an extra, useless burden on the server?

How much server time does the watch list query take anyway?

How much client time does it take to load 400 KB of data, when only 50 KB are really needed?

Why should a persistent and repetitive human habit not be automated?

I'd like to see at the top of the watch list something like

- display the articles I follow which were modified in the past 3 days

- display the articles I follow which were modified in the past 14 days

- display all the articles I follow

With a default of 3 days, for example.

It is not that this is *really* needed by humans, though it would be nice. But could it help performance?



RE: Watch list and performance issues
Anthere,

I believe your idea for the watch list could be programmed very easily. It would also improve performance, at least for you, especially if you are on a slow connection, say 5 KB/second.

Why should you have to wait 1 minute and 20 seconds for something that could reach you in 10 seconds?
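
Those two figures follow from the sizes quoted in the original message (about 400 KB for the full list, about 50 KB for the recent part) at the 5 KB/second rate assumed here. A quick check of the arithmetic, where `transfer_seconds` is only an illustrative helper and latency is ignored:

```python
def transfer_seconds(size_kb: float, rate_kb_per_s: float) -> float:
    """Time to download size_kb kilobytes at rate_kb_per_s, ignoring latency."""
    return size_kb / rate_kb_per_s


full_list = transfer_seconds(400, 5)    # the whole watch list
recent_only = transfer_seconds(50, 5)   # only the recent entries

print(full_list, recent_only)  # 80.0 10.0, i.e. 1 min 20 s versus 10 s
```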

Brion, could we add a clause like this?

WHERE (today - timestamp) < 3 days? /* I know this is not correct syntax */
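
One way the clause might be spelled out, as a rough sketch only. The snippet uses Python's built-in sqlite3 module rather than the site's real database, and the `watch_demo` table, its columns, the sample rows, and the `recent_watched` helper are all invented for illustration. It also assumes timestamps are stored as fixed-width `YYYYMMDDHHMMSS` strings, so that a plain string comparison orders them correctly. The cutoff is computed in the application and passed in as a parameter:

```python
import sqlite3
from datetime import datetime, timedelta


def recent_watched(conn, now, days=3):
    """Return titles whose last edit falls within the past `days` days."""
    # Assumption: timestamps are fixed-width YYYYMMDDHHMMSS strings.
    cutoff = (now - timedelta(days=days)).strftime("%Y%m%d%H%M%S")
    rows = conn.execute(
        "SELECT title FROM watch_demo WHERE timestamp >= ? "
        "ORDER BY timestamp DESC",
        (cutoff,),
    )
    return [title for (title,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watch_demo (title TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO watch_demo VALUES (?, ?)",
    [
        ("Recently edited", "20021125120000"),
        ("Edited last week", "20021118120000"),
        ("Edited long ago", "20020225120000"),
    ],
)

now = datetime(2002, 11, 26, 5, 48)
print(recent_watched(conn, now))           # ['Recently edited']
print(recent_watched(conn, now, days=14))  # ['Recently edited', 'Edited last week']
```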

Ed Poor

-----Original Message-----
From: Anthere [mailto:anthere5@yahoo.com]
Sent: Tuesday, November 26, 2002 5:48 AM
To: wikitech-l@wikipedia.org
Subject: [Wikitech-l] Watch list and performance issues


Re: Watch list and performance issues
Poor, Edmund W wrote:
(re Anthere's sensible suggestion for a date limit)
> Brion, could we add a clause like this?
>
> WHERE (today - timestamp) < 3 days? /* I know this is not correct
> syntax */

A few days ago I put in a simple number limit of 250 results. Still too
many, I guess. ;)

I've put in the same limits as Recentchanges. When I'm done messing with
other things, I'll add the links to increase the limits if you want to
see more.

Note that it's still pretty slow if you've got a long watchlist because
it's implemented rather inefficiently on the backend (a gazillion string
comparisons).

-- brion vibber (brion @ pobox.com)
Re: Watch list and performance issues
Brion VIBBER wrote:

> Poor, Edmund W wrote:
> (re Anthere's sensible suggestion for a date limit)
>
>> Brion, could we add a clause like this?
>>
>> WHERE (today - timestamp) < 3 days? /* I know this is not correct
>> syntax */
>
>
> A few days ago I put in a simple number limit of 250 results. Still
> too many, I guess. ;)
>
> I've put in the same limits as Recentchanges. When I'm done messing
> with other things, I'll add the links to increase the limits if you
> want to see more.
>
> Note that it's still pretty slow if you've got a long watchlist
> because it's implemented rather inefficiently on the backend (a
> gazillion string comparisons).

I want to object that this has resulted in my receiving only the last
three days of changes on my watch list, and there is nothing in my
preferences to turn this off. Ever since the recent changes became
fairly unusable, I've depended a lot more on the way that I manage my
watch list. Some things that are an ongoing interest are often left for
more than three days, especially when the mailing list is so busy that
it leaves me little time for the Wikipedia itself. Maybe people who
like this 3-day feature should take the time to cull obsolete material
from their watch lists; I do that from time to time.

If the limitations can't be treated as an option, they should be turned
off until they can.

Eclecticology
RE: Watch list and performance issues
If it's not too much trouble, could we do the watchlist without string comparisons? I bet a table like this would speed things up:

user_id  article_id
-------  ----------
    284         298
    284        1598
    284        6503
    284        1364
    284        3305

Then, we could link the tables, like:

SELECT * FROM cur, watch
WHERE cur_id = watch_article_id AND watch_user_id = 284

(Again, I'm not sure of the syntax: I made up a non-existent table!)
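
To make the idea concrete, here is a small self-contained sketch of that join. It runs against an in-memory SQLite database through Python's sqlite3 module, not the real schema: the `watch` table is the made-up one from this message, `cur` is reduced to two columns, and the article titles and the `watched_titles` helper are invented for the example. The point is only that matching is done on integer ids rather than on strings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_title TEXT);
    CREATE TABLE watch (
        watch_user_id    INTEGER,
        watch_article_id INTEGER,
        PRIMARY KEY (watch_user_id, watch_article_id)
    );
""")
conn.executemany(
    "INSERT INTO cur VALUES (?, ?)",
    [(298, "Article A"), (1598, "Article B"), (42, "Unwatched")],
)
conn.executemany(
    "INSERT INTO watch VALUES (?, ?)",
    [(284, 298), (284, 1598), (7, 42)],
)


def watched_titles(conn, user_id):
    """Titles of the articles watched by user_id, matched on integer ids."""
    rows = conn.execute(
        "SELECT cur_title FROM cur "
        "JOIN watch ON cur_id = watch_article_id "
        "WHERE watch_user_id = ? ORDER BY cur_id",
        (user_id,),
    )
    return [title for (title,) in rows]


print(watched_titles(conn, 284))  # ['Article A', 'Article B']
```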

Would this work?
How hard would it be to implement?
How much would it help?

Ed Poor
Re: Watch list and performance issues
Ray Saintonge wrote:
> Brion VIBBER wrote:
>> I've put in the same limits as Recentchanges. When I'm done messing
>> with other things, I'll add the links to increase the limits if you
>> want to see more.
>
> I want to object that this has resulted in my receiving only the last
> three days of changes on my watch list, and there is nothing in my
> preferences to turn this off.

The options are already available in the URL like so:
http://www.wikipedia.org/w/wiki.phtml?title=Special:Watchlist&days=6
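
For anyone wanting to bookmark several such links, a small sketch of building them. Only the base URL and the `title` and `days` parameters come from the example above; the `watchlist_url` helper is hypothetical:

```python
from urllib.parse import urlencode

BASE = "http://www.wikipedia.org/w/wiki.phtml"


def watchlist_url(days):
    """Build a watchlist URL showing the last `days` days of changes."""
    # safe=":" keeps the colon in "Special:Watchlist" unescaped.
    query = urlencode({"title": "Special:Watchlist", "days": days}, safe=":")
    return f"{BASE}?{query}"


print(watchlist_url(6))
# http://www.wikipedia.org/w/wiki.phtml?title=Special:Watchlist&days=6
```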

-- brion vibber (brion @ pobox.com)
Re: Watch list and performance issues
Poor, Edmund W wrote:
> If it's not too much trouble, could we do the watchlist without string comparisons? I bet a table like this would speed things up:
>
> user_id article_id
> ------- ----------
[..]
> Would this work?
> How hard would it be to implement?

For the most part it would be very straightforward; I didn't implement
it when it was first suggested because I wasn't sure how to get the
associated talk pages to come up in the list easily. This is a key
feature of the watchlist and I didn't want to break it without recourse.
We could add them to the watchlist table along with the main pages when
a user adds a page; but if the talk page is created later, we'd have to
go update everyone's watchlists on page creation. (Hmm, could work.)

Using namespace and title instead of cur_id would make it more
straightforward, but wouldn't give the advantage of automatically
preserving a page in the list after a page move. (Though, we could
easily issue an update on page move that way.)
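
A rough sketch of the namespace-and-title variant, to show both points: a talk page created later shows up without touching anyone's watchlist, and a page move is handled by one extra update. Everything here is an assumption made for illustration: the table and column names, the `watched_pages` and `move_page` helpers, and the numbering in which a talk namespace is the subject namespace plus one (0 for articles, 1 for their talk pages). SQLite via Python's sqlite3 stands in for the real database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cur (cur_namespace INTEGER, cur_title TEXT);
    CREATE TABLE watch (
        watch_user_id INTEGER, watch_namespace INTEGER, watch_title TEXT
    );
""")
# Assumed numbering: 0 = article namespace, 1 = its talk namespace.
conn.executemany("INSERT INTO cur VALUES (?, ?)",
                 [(0, "Old name"), (0, "Other page")])
conn.execute("INSERT INTO watch VALUES (284, 0, 'Old name')")


def watched_pages(conn, user_id):
    """Watched pages plus their talk pages, matched by namespace and title."""
    rows = conn.execute(
        "SELECT cur_namespace, cur_title FROM cur JOIN watch "
        "ON cur_title = watch_title "
        "AND cur_namespace IN (watch_namespace, watch_namespace + 1) "
        "WHERE watch_user_id = ? ORDER BY cur_title, cur_namespace",
        (user_id,),
    )
    return rows.fetchall()


def move_page(conn, namespace, old_title, new_title):
    """Rename a page and its talk page, carrying watchlist entries along."""
    conn.execute(
        "UPDATE cur SET cur_title = ? "
        "WHERE cur_title = ? AND cur_namespace IN (?, ?)",
        (new_title, old_title, namespace, namespace + 1),
    )
    conn.execute(
        "UPDATE watch SET watch_title = ? "
        "WHERE watch_title = ? AND watch_namespace = ?",
        (new_title, old_title, namespace),
    )


# A talk page created later needs no watchlist update: it matches by title.
conn.execute("INSERT INTO cur VALUES (1, 'Old name')")
print(watched_pages(conn, 284))  # [(0, 'Old name'), (1, 'Old name')]

move_page(conn, 0, "Old name", "New name")
print(watched_pages(conn, 284))  # [(0, 'New name'), (1, 'New name')]
```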

> How much would it help?

Somewhat, particularly for the small but promiscuous few who have very
very long watchlists.

-- brion vibber (brion @ pobox.com)
Re: Watch list and performance issues
Brion VIBBER wrote:

> Ray Saintonge wrote:
>
>> Brion VIBBER wrote:
>>
>>> I've put in the same limits as Recentchanges. When I'm done messing
>>> with other things, I'll add the links to increase the limits if you
>>> want to see more.
>>
>> I want to object that this has resulted in my receiving only the last
>> three days of changes on my watch list, and there is nothing in my
>> preferences to turn this off.
>
>
> The options are already available in the URL like so:
> http://www.wikipedia.org/w/wiki.phtml?title=Special:Watchlist&days=6


My apologies if I seem to jump in on this issue too fast.
Thanks for dealing with this quickly, and for putting the options on the
Watchlist page directly.
Now, at your convenience, it would be nice if the options could be put
on the user preference page. This would go a long way in accommodating
the different work habits of contributors. As things now stand, I have
131 items on my watch list (including talk pages) with the oldest one
being dated Feb. 25 - I would likely set my limits at 150 articles with
an indefinite number of days.

Another place where a longer number of days option on the page would be
convenient is the User contributions page. Often when looking at this
for an unregistered user with questionable edits the default options
give a very small number of edits with the last 24 hours. The question
I then want to ask the system is whether this guy has done any other
stupid edits in the last year.

Anyway, even as a non-techie I sense that completely reconstructing some
of these pages every time must put an enormous burden on the system. At
the risk of appearing naive, it seems that a Recent Changes File
containing the maximum number of entries reflecting the most that people
would normally expect to see on a regular basis would save a lot of
unneeded searches. Searches with a higher number of references to be
sought would still need to be done, but a judicious choice of the number
of items in the recent changes would make such searches relatively
infrequent.

A Recent Changes file of 1,000 items, where every edit adds the most
recent change and drops the 1,000th, could accomplish this.
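
In code, such a fixed-size list might look like the sketch below. It is an in-memory illustration only, not a proposal for the actual storage: the `RecentChangesLog` class and its method names are invented, and Python's `collections.deque` with `maxlen` does the dropping of the oldest entry:

```python
from collections import deque
from itertools import islice


class RecentChangesLog:
    """Fixed-size log: recording a change drops the oldest one when full."""

    def __init__(self, capacity=1000):
        self._entries = deque(maxlen=capacity)

    def record(self, title):
        # deque(maxlen=...) silently discards the oldest entry when full.
        self._entries.append(title)

    def latest(self, n):
        """Return up to n most recent titles, newest first."""
        return list(islice(reversed(self._entries), n))

    def __len__(self):
        return len(self._entries)


log = RecentChangesLog(capacity=1000)
for i in range(1, 1006):
    log.record(f"edit {i}")

print(len(log))       # 1000
print(log.latest(3))  # ['edit 1005', 'edit 1004', 'edit 1003']
```

Requests for more than the stored number of entries would still need a real search, as noted above.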

Eclecticology
Re: Watch list and performance issues
Ray Saintonge wrote:
> Another place where a longer number of days option on the page would be
> convenient is the User contributions page. Often when looking at this
> for an unregistered user with questionable edits the default options
> give a very small number of edits within the last 24 hours. The question
> I then want to ask the system is whether this guy has done any other
> stupid edits in the last year.

The default options on user contribs were set when it was really slow to
do the query, and can now be changed to something more reasonable.

I think there's no practical reason to keep a date limit around except
for the "gimme the last changes since I loaded this recentchanges page 5
minutes ago, I must stay up to date!!" case. Click "last 100 changes" or
"last 10000 changes" and you know what you're in for, whereas "last 30
days" could give you two changes or it could give you thirty thousand.
Having both interlocking selections is a confusing state that has to go.

> Anyway, even as a non-techie I sense that completely reconstructing some
> of these pages every time must put an enormous burden on the system. At
> the risk of appearing naive, it seems that a Recent Changes File
> containing the maximum number of entries reflecting the most that people
> would normally expect to see on a regular basis would save a lot of
> unneeded searches.

We have a recentchanges table which keeps the last 7 days' worth of
entries (~15000 at our present rate; older ones are cropped off the
bottom of the list every 1000 edits or so). (Currently you can't see
anything older than that through Recentchanges, but requests to do so
haven't come up yet...)
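
The cropping scheme just described (keep about 7 days, trim old rows every 1000 edits or so) could be sketched roughly as follows. This is not the actual code: SQLite via Python's sqlite3 stands in for the real database, and the `rc` table, its columns, the constants, and the `RecentChangesTable` class are made up for the example. Times are plain integer seconds to keep it simple:

```python
import sqlite3

RETENTION_SECONDS = 7 * 24 * 3600  # keep about 7 days of entries
PRUNE_EVERY = 1000                 # crop old rows every 1000 edits or so


class RecentChangesTable:
    """Append-only change log that is pruned periodically, not on every edit."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE rc (rc_title TEXT, rc_time INTEGER)")
        self.edits_since_prune = 0

    def record(self, title, now):
        self.conn.execute("INSERT INTO rc VALUES (?, ?)", (title, now))
        self.edits_since_prune += 1
        if self.edits_since_prune >= PRUNE_EVERY:
            # Drop everything older than the retention window.
            self.conn.execute(
                "DELETE FROM rc WHERE rc_time < ?",
                (now - RETENTION_SECONDS,),
            )
            self.edits_since_prune = 0

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM rc").fetchone()[0]


rc = RecentChangesTable()
# Simulate one edit per minute for 20000 minutes (a little under 14 days).
for minute in range(20000):
    rc.record(f"page {minute}", now=minute * 60)

print(rc.count())
```

Because pruning only happens every `PRUNE_EVERY` edits, the table can briefly hold somewhat more than seven days of rows between prunes, which matches the "or so" above.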

This keeps the burden off the cur and old tables, so a slow
Recentchanges won't affect other pages' response, and slow other pages
won't affect Recentchanges' speed. (At least in theory... actually,
Recentchanges checks for the existence of the pages linked in the header
and the user pages of each user who's edited something, so we still have
to wait on write locks on the cur table.)

-- brion vibber (brion @ pobox.com)