
Causes for Slowdown?
Does anybody have any guesses why the site has slowed down recently?
What did we change? Is it possible that the new diff engine eats too
many resources?

I think it would be good to have a user account on wikipedia.com so
that we can monitor the load status and the SQL queries in order to
diagnose these problems better.

Axel
Re: Causes for Slowdown?
Axel Boldt wrote:

>Does anybody have any guesses why the site has slowed down recently?
>What did we change? Is it possible that the new diff engine eats too
>many resources?
>
>I think it would be good to have a user account on wikipedia.com so
>that we can monitor the load status and the SQL queries in order to
>diagnose these problems better.
>
>Axel

I agree. What would be useful would be a server logfile with some
timing/load figures (a real logfile, not in the database).
This could be made available at a static URL, and we could run crunching
scripts on it to try to analyze what the problem might be related to.
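
As a sketch of the kind of crunching script I have in mind (the log
format here is invented -- one line per request with a timestamp, URL
and elapsed seconds -- since we have not settled on anything yet):

    #!/usr/bin/env python3
    # Crunch a hypothetical access/timing log and report the URLs that
    # consume the most total time. Format assumed per line:
    #   <timestamp> <url> <elapsed-seconds>
    import sys
    from collections import defaultdict

    totals = defaultdict(float)   # url -> total seconds spent serving it
    counts = defaultdict(int)     # url -> number of hits

    for line in sys.stdin:
        try:
            _timestamp, url, elapsed = line.split()
        except ValueError:
            continue              # skip malformed lines
        totals[url] += float(elapsed)
        counts[url] += 1

    # The 20 URLs that ate the most wall-clock time.
    for url in sorted(totals, key=totals.get, reverse=True)[:20]:
        print("%8.1fs %6d hits  %s" % (totals[url], counts[url], url))

Something that trivial, run over a day's log, would already tell us
whether the slowdown is a few pathological pages or load across the
board.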

-- Neil
Re: Causes for Slowdown?
At the end of this month, I am planning to buy a new server for some
of the other stuff that's on the wikipedia server
(jimmywales.com/timshell.com/kirawales.com/ and various other odds and
ends). Once I get that stuff off of this machine, I plan to isolate
this machine from the rest of my network (which sounds fancy but just
involves making sure there are no stray ssh keys around) and give
login accounts to active developers, including root access to anyone
who REALLY needs it.

This will help to remove *me* as the bottleneck to improvements, which I
am right now.

In the meantime, here's the latest log of slow queries -- this should be helpful
in diagnosing our current problems.

Should I install the latest from the cvs?

--Jimbo
Re: Causes for Slowdown?
On Tue, 2002-04-16 at 12:12, Jimmy Wales wrote:
> At the end of this month, I am planning to buy a new server for some
> of the other stuff that's on the wikipedia server
> (jimmywales.com/timshell.com/kirawales.com/ and various other odds and
> ends). Once I get that stuff off of this machine, I plan to isolate
> this machine from the rest of my network (which sounds fancy but just
> involves making sure there are no stray ssh keys around) and give
> login accounts to active developers, including root access to anyone
> who REALLY needs it.
>
> This will help to remove *me* as the bottleneck to improvements, which I
> am right now.
>
> In the meantime, here's the latest log of slow queries -- this should be helpful
> in diagnosing our current problems.
>
> Should I install the latest from the cvs?

Probably a good idea -- there are a number of bugs fixed there that
we've been getting multiple bug reports on (including the 'bad link
causes following text on same line to disappear' and 'edit links for
articles with non-ascii characters in title are mysteriously converted
to UTF-8 by certain versions of Internet Explorer, resulting in the
wrong page being edited' bugs).

-- brion vibber (brion @ pobox.com)
Re: Causes for Slowdown? (long, but hopefully useful)
With the recent increases in traffic (which fit well with my estimates
earlier) I think we should be giving some thought now to how the
Wikipedia can be made massively scalable. If we start now, we should
have a solution available before it is needed, rather than hitting a
crunch point and then fixing things in a hurry.

Some thoughts on this:

* The single image of the database is the ultimate bottleneck. This
should be held in RAM as far as possible. Thought should be given to
using PostgreSQL rather than MySQL, as PostgreSQL should scale better
under heavy load.

* Wikipedia is read an order of magnitude more often than it is written.
Caching can give a major performance boost, and will help palliate
underlying scaling problems.

* Wikipedia read accesses appear to obey [[Zipf's law]]: most pages are
read relatively infrequently, and a few very frequently. However, most
of the load comes from the aggregated accesses of the many low-traffic
pages. Therefore, a 'hot spot' cache will not work: the cache has to be
big enough to encompass the whole Wikipedia. Again, the cache will have
to be held in RAM to avoid the 1000-times slower performance of disk
accesses. (A rough calculation illustrating this follows the list.)

* Web serving consumes RAM: a slow modem transfer of a page will lock
down resources in RAM for the duration of a page load. At one end of the
scale, this is simply socket buffers. At the other end of the scale, it
can be a page-rendering thread and its stack, unable to progress to do
other work because it can't commit its output to the lower levels of the
web server. On low-traffic sites, this is insignificant. On high-traffic
sites, it can become a bottleneck. Machines with vast amounts of RAM are
very expensive: it is cheaper to buy a number of smaller machines, each
with a reasonable amount of RAM -- and you get the benefit of extra
CPUs for nothing.
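
To put rough numbers on the Zipf point above (the article count and
exponent below are guesses for illustration, not measurements from our
logs), a few lines of Python show how large the cache must be before it
covers most of the traffic:

    # Back-of-the-envelope check of the 'hot-spot cache is not enough'
    # claim. Assumes page popularity follows a Zipf distribution with
    # exponent s over `total` articles; both numbers are illustrative.

    def zipf_hit_rate(cache_size, total, s=1.0):
        """Fraction of requests served from a cache holding the
        cache_size most popular pages, if the page at rank i gets
        traffic proportional to 1 / i**s."""
        weights = [1.0 / i ** s for i in range(1, total + 1)]
        return sum(weights[:cache_size]) / sum(weights)

    if __name__ == "__main__":
        total = 30000  # assumed article count
        for cache in (100, 1000, 10000, total):
            print("cache of %5d pages -> ~%2d%% of requests"
                  % (cache, round(100 * zipf_hit_rate(cache, total))))

Under those assumed numbers, the hundred hottest pages cover only about
half the requests, and the cache has to hold roughly a third of all
articles before the hit rate passes 90% -- which is why a small 'hot
spot' cache will not do.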

=== The grand vision ===

For all these reasons, I propose a solution in which the database is
kept on a separate server from the user-facing servers. The extra
overhead involved in doing this should be repaid many times over in the
long run by the fact that the front-end servers will take more than 90%
of the load off the central server.

The front-end servers should be kept _dumb_, and all the control logic,
program code, and configuration files should reside on the master
server. This removes the need to keep the configuration of a large
number of machines in sync.

The front-end server code can also be kept small and tight, with most of
the work (page rendering, parsing, etc.) being done by the existing
code. (Later, rendering and parsing could be moved to the front-end
servers if needed).

=== How to get there easily ===

The first step is to implement a separate caching server.

The system can even be trialled now, by running the master and slave
servers on the same box! The existing Wikipedia just needs to have a
special skin that serves up 'raw' rendered pages, without any page
decorations, CSS, or whatever.

The caching server will deal with all user interaction: cookies, skins,
etc. It should serve up 'raw' pages decorated with user-specific content
and style sheets.
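
To make that concrete, here is a rough sketch of the front-end flow.
Everything in it is illustrative: the 'raw=true' parameter, the way
user preferences are looked up, even the choice of Python (the real
code could just as well be PHP or mod_perl).

    import urllib.request

    MASTER = "http://www.wikipedia.com"   # assumed address of the master
    raw_cache = {}                        # title -> raw rendered HTML

    def get_raw_page(title):
        """Fetch the undecorated ('raw' skin) rendering from the master
        and cache it locally. The ?raw=true parameter is hypothetical --
        it stands in for whatever the special skin ends up being."""
        if title not in raw_cache:
            url = "%s/wiki/%s?raw=true" % (MASTER, title)
            with urllib.request.urlopen(url) as resp:
                raw_cache[title] = resp.read().decode("utf-8")
        return raw_cache[title]

    def render_for_user(title, user_prefs):
        """Wrap the shared raw page in per-user decoration (skin,
        stylesheet, login links). Only this step looks at cookies and
        preferences; the cached raw page is identical for everyone."""
        raw = get_raw_page(title)
        stylesheet = user_prefs.get("stylesheet", "/style/default.css")
        return ("<html><head><link rel='stylesheet' href='%s'></head>"
                "<body>%s</body></html>" % (stylesheet, raw))

Cache invalidation is deliberately missing from this sketch; that is
what the 'last changed' timestamp described below is for.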

The caching system could be put on an experimental server, or later put
on the main site, as a beta test.

There is only one other thing needed to implement this. Each page in
the central DB needs to have a 'last changed' time, and a way to get
hold of it. Now, this is where things get slightly tricky. This 'last
changed' timestamp should be updated, not only when the page itself is
edited, but also when any page linked in the page is created or deleted.
That is to say: when creating or deleting a page, timestamps on all
pages that link to that page should be touched.

Note that there is no need to touch other pages' timestamps when a page
is edited. This is good, as otherwise the extra overhead of all that
marking would be unreasonably high.
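
In code, the touching might look something like the sketch below. The
table and column names (pages, links, last_changed) are invented for
illustration and would need adapting to the real schema; db is any
DB-API connection.

    def touch_page(db, title, timestamp):
        """Ordinary edit: only the edited page's own timestamp moves."""
        cur = db.cursor()
        cur.execute("UPDATE pages SET last_changed = %s WHERE title = %s",
                    (timestamp, title))
        db.commit()

    def touch_linking_pages(db, title, timestamp):
        """Create or delete of `title`: every page linking to it now
        renders differently (its link flips between 'missing' and
        'existing'), so those pages' timestamps must be bumped too."""
        cur = db.cursor()
        cur.execute(
            "UPDATE pages SET last_changed = %s "
            "WHERE title IN (SELECT from_title FROM links WHERE to_title = %s)",
            (timestamp, title))
        db.commit()
        # The %s placeholder style is the MySQLdb/psycopg one; other
        # drivers differ, and the subquery could equally be a join.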

Now, this last-modified timestamp can be exposed through a special
request format. HTTP's own Last-Modified/If-Modified-Since mechanism
could be used, but that is too awkward to fiddle with for experimental
software. Better to do something like:

http://www.wikipedia.com/wiki/last_changed?Example_article

(which even avoids invoking the main code at all)

result: a page with the content:

20020417120254
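
A minimal sketch of such a handler, reading the timestamp straight out
of the database without invoking the main wiki code at all (again, the
schema, the driver and the use of Python rather than PHP are all
assumptions):

    #!/usr/bin/env python3
    # Hypothetical /wiki/last_changed CGI handler: emits nothing but the
    # timestamp of the requested article, so the caching front end can
    # compare it with the copy it already holds.
    import os
    import MySQLdb   # assumes the MySQLdb driver; any DB-API module would do

    title = os.environ.get("QUERY_STRING", "")   # e.g. "Example_article"

    db = MySQLdb.connect(db="wikipedia")         # connection details assumed
    cur = db.cursor()
    cur.execute("SELECT last_changed FROM pages WHERE title = %s", (title,))
    row = cur.fetchone()

    print("Content-Type: text/plain")
    print()
    print(row[0] if row else "0")

The caching front end would then refetch and re-render a page only when
this value differs from the one stored alongside its cached copy.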


=== Criticisms ===

But, I hear you say -- surely you have added another bottleneck? For
every page hit, you still generate a hit on the central DB for the 'last
changed' timestamp. The answer is: yes, but -- the timestamp db is very
small, performs a very simple lightweight operation, has a single writer
and multiple readers, and can therefore later be made separate from the
main DB, replicated, etc.

So, there are some ideas.

* Client/server database split
* Front-end/back-end script split

Does anyone have any idea how reasonable this is, based on refactoring
and retrofitting the existing script? I think that if things were done
slowly, the front-end code could be kept to about 10% of the overall
complexity, mostly refactored from existing code.

-- Neil