Mailing List Archive

Long term plans for scalability
I believe Wikipedia is being held back by architectural constraints, both in
how many people can use it and in how it can grow.

The current architecture of one machine taking the entire burden of all
searches, updates and web page delivery inherently limits the rate at which
Wikipedia can grow.

In order for Wikipedia to grow, it needs an architecture which can easily
devolve work to other servers. A main database is still required to enforce
administrative policy and maintain database consistency.

Work to improve the speed of the database and reduce lag will, in the long
run, only be of very limited benefit and, perhaps, reduce the amount of lag
users experience for a few days or weeks.

A method of easily implementing mirror servers with live, real-time updates
is required. Each mirror server should cater for all the functionality
users expect from Wikipedia except for taking care of form submissions of
updates, which should be forwarded to the master wiki server.

The main database server should be released from the burden of serving web
pages so that it can concentrate on running administrative code and on
processing and posting database updates.

The update system can be achieved by either:
1) the main server creating SQL files of incremental changes to be
emailed to mirror servers, signed with a key pair and sequentially
numbered to ensure they are automatically processed in order. This
way, the server can run asynchronously with the mirrors, which is
better for the reliability of the server. The server will not need to
wait for connection responses from the mirrors, and updates will be
cached in the mail system if a mirror server is unavailable. (The
main server will then only need to create one email per update. The
mail system infrastructure will take care of sending the data to each
mirror. In fact, a system such as the pipermail archive used on this
list would solve the problem wonderfully. Mirror admins simply
subscribe to the list to get all updates sent to their machine and
can manually download any updates they are missing from the list! A
rough sketch of this flow follows the two options below.)

Or
2) the master server opening a connection directly to the SQL daemon
on each remote machine. In this case the server will need to track
which updates each mirror has and has not received, and will need to
wait for time-outs on non-operational mirrors. (This system may also
open exploits on the server via the SQL interface.)
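
As a purely illustrative sketch of option 1 (this is not code from the wiki
software; the paths, the key name, the list address and the example SQL
statement are all invented), the master side might look something like the
PHP below.

<?php
// Hypothetical master-side script for option 1: serialise one batch of
// changes as a numbered SQL file, clearsign it with the master's key, and
// mail it to a distribution list that every mirror subscribes to.
// All paths, the key name and the list address are illustrative assumptions.

// Example change batch; in practice this would be generated from the edit
// that was just committed to the master database.
$pendingSql = "UPDATE cur SET cur_text = 'new text' WHERE cur_title = 'Example';\n";

// Next sequence number, so mirrors can enforce in-order application.
$sequence = (int) trim(file_get_contents('/var/wikimirror/last-seq')) + 1;
$sqlFile  = sprintf('/var/wikimirror/update-%08d.sql', $sequence);
file_put_contents($sqlFile, $pendingSql);

// Clearsign with an unattended key named "wiki-master"; produces "$sqlFile.asc".
shell_exec('gpg --batch --yes --clearsign --local-user wiki-master '
           . escapeshellarg($sqlFile));

// One mail per update; the list server (e.g. pipermail/Mailman) fans it out
// to every subscribed mirror and archives it for catch-up downloads.
mail('wiki-updates@example.org',
     "[wiki-update] $sequence",
     file_get_contents($sqlFile . '.asc'));

file_put_contents('/var/wikimirror/last-seq', (string) $sequence);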
RE: Long term plans for scalability [ In reply to ]
Nick,

Your idea assumes that the "lag" problem is due to overloading a single machine, which plays double roles: database backend and web server. So, if we divide the work among 2 or more machines, you expect faster throughput. Right?

(I'm just repeating the obvious to make sure that what's obvious to me, is what you really meant!)

I guess if we all pitch in $50 each we can buy another machine. Where should I send my money?

Ed Poor
Re: Long term plans for scalability [ In reply to ]
here's my take

split webservers and db servers.

1 master db; only write queries come to this db.
as needed, add slave servers; these only do read queries.

add webservers as needed.

this is the easiest way to go about it using mysql's built-in replication
feature.. it makes the most sense too in my book....

the only thing needed to make wikipedia work like this is a db connection
library that looks at an SQL statement and routes it to where it's supposed
to go.. i wrote a db library for mysql in php once that did all this, it's
pretty cool if i may say so.. if you are interested i'll send you the code,
it's part of a much bigger project, but i figure any decent php programmer
should be able to grasp the concept of it... it might not be super efficient
because i programmed this when i didn't know many tricks and was kinda still
learning, but it works.. oh well.. if anyone is interested let me know
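
For what it's worth, a minimal sketch of that read/write-splitting idea in
PHP might look like the following. This is not Lightning's actual library;
the host names, credentials and the table used in the usage example are
placeholders.

<?php
// Minimal sketch of a read/write-splitting connection helper, in the spirit
// of the library described above. Host names and credentials are invented.

class SplitDB {
    private $master;
    private $slave;

    public function __construct() {
        // All writes go to the master; reads go to a randomly chosen slave.
        $this->master = mysqli_connect('db-master.example.org', 'wiki', 'secret', 'wikidb');
        $slaves = ['db-slave1.example.org', 'db-slave2.example.org'];
        $this->slave  = mysqli_connect($slaves[array_rand($slaves)], 'wiki', 'secret', 'wikidb');
    }

    public function query(string $sql) {
        // Route by statement type: SELECTs can be answered by a replica,
        // anything that modifies data must hit the master.
        $isRead = preg_match('/^\s*(SELECT|SHOW|EXPLAIN)\b/i', $sql) === 1;
        $conn = $isRead ? $this->slave : $this->master;
        return mysqli_query($conn, $sql);
    }
}

// Usage: the read is served by a replica, the write goes to the master.
$db = new SplitDB();
$rows = $db->query("SELECT cur_text FROM cur WHERE cur_title = 'Example'");
$db->query("UPDATE cur SET cur_counter = cur_counter + 1 WHERE cur_title = 'Example'");

The same routing layer would also be the natural place to fall back to the
master when no replica is reachable.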


Lightning


----- Original Message -----
From: "Nick Hill" <nick@nickhill.co.uk>
To: <wikitech-l@wikipedia.org>
Sent: Monday, November 25, 2002 4:53 PM
Subject: [Wikitech-l] Long term plans for scalability


> I believe Wikipedia is being held back in terms of how many people can use
> it and how it can grow, through architectural constraints.
>
> The current architecture of one machine taking the entire burden of all
> searches, updates and web page delivery inherently limits the rate at
> which Wikipedia can grow.
>
> In order for Wikipedia to grow, it needs an architecture which can easily
> devolve work to other servers. A main database is still required to
> enforce administrative policy and maintain database consistency.
>
> Work to improve the speed of the database and reduce lag will, in the long
> run, only be of very limited benefit and, perhaps, reduce the amount of
> lag users experience for a few days or weeks.
>
> A method of easily implementing mirror servers with live, real-time
> updates is required. Each mirror server should cater for all the functionality
> users expect from Wikipedia except for taking care of form submissions of
> updates, which should be forwarded to the master wiki server.
>
> The main database server should be released from the burden of serving web
> pages and concentrate on running administrative code, processing and
> posting database updates.
>
> The update system can be achieved by either:
> 1) the main server creating SQL files of incremental changes to
> be emailed to mirror servers, signed with a key pair, sequentially
> numbered to ensure they are automatically processed in order this
> way, the server can run asynchronously with the mirrors which is
> better for reliability of the server. The server will not need to
> wait for connection responses from the mirror and updates will be
> cached in the mail system in the event that the mirror server be
> unavailable. (The main server will then only need to create one
> email per update. The mail system infrastructure will take care of
> sending the data to each mirror. In fact, a system such as pipermail
> used on this list would solve the problem wonderfully. Mirror admins
> simply subscribe to the list to get all updates sent to their machine
> and can manually download updates they are missing from the list!)
>
> Or
> 2) by the master server opening a connection directly to the SQL daemon
> on each remote machine. In which case the server will need to track what
> the mirrors have and have not received updates and need to wait for
> time-out on non-operational mirrors)(this system may open exploits on the
> server via the sql interface).
>
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@wikipedia.org
> http://www.wikipedia.org/mailman/listinfo/wikitech-l
>
Re: Long term plans for scalability [ In reply to ]
On Mon, 25 Nov 2002 19:21:57 -0500
"Poor, Edmund W" <Edmund.W.Poor@abc.com> wrote:

> Nick,
>
> Your idea assumes that the "lag" problem is due to overloading a single
> machine, which plays double roles: database backeand and web server. So,
> if we divide the work amoung 2 or more machines, you expect faster
> throughput. Right?

My idea is not to divide the roles of web server and backend. It is to
divide the workload of the one server between many servers. This includes
search queries and other functionality.

I envisage many wikipedia servers around the world, supported by private
individuals, companies and universities. Much like the system of mirror FTP
and mirror web sites. All these servers are updated in real time from the
core wikipedia server. From the user's perspective, all are equivalent.
Each of these servers can do everything the current wikipedia server can do
except for accepting update submissions. Updates from users are accepted
only by the core wiki server.

Reasons for such an architecture:
1) Growth of bandwidth usage may put financial pressure on Wikipedia.
Usage may follow a non-linear growth curve.

2) The cost of implementing one very fast, reliable, redundant machine is
more than the cost of farming out work to many quite fast, unreliable
systems, none of which is mission critical. This is especially true where
there are people willing to donate part of their hard drive, CPU and net
connection (or even an entire system) to a good cause such as wikipedia.
(Overall system reliability can be guaranteed by using DNS tricks to ensure
users and queries are only directed to working machines; a rough
health-check sketch follows this list.)
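
As a hedged illustration of the DNS idea in point 2, a health-check script
could look like the sketch below; the mirror addresses, record name, TTL
and output path are all made up.

<?php
// Hypothetical health-check script for the "DNS tricks" idea: probe each
// mirror over HTTP and emit a zone-file fragment listing only the mirrors
// that respond, so a round-robin name resolves only to working machines.
// Mirror addresses, the record name, TTL and file path are all invented.

$mirrors = [
    'mirror1' => '192.0.2.10',
    'mirror2' => '192.0.2.20',
    'mirror3' => '192.0.2.30',
];

$records = '';
foreach ($mirrors as $name => $ip) {
    // Consider a mirror healthy if its front page answers within 5 seconds.
    $ctx  = stream_context_create(['http' => ['timeout' => 5]]);
    $page = @file_get_contents("http://$ip/wiki/Main_Page", false, $ctx);
    if ($page !== false) {
        $records .= "www 300 IN A $ip   ; $name\n";
    }
}

// The fragment would be included from the real zone file and the DNS
// server reloaded from cron; a short TTL drops dead mirrors from rotation.
file_put_contents('/etc/bind/wikipedia-mirrors.zone', $records);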

Given wikipedia's limited technical resources, we need to make a choice
between making small-scale changes to the system (which may yield a 30-50%
change in availability) and making architectural changes which can scale to
improvements of 100x or 1000x magnitude.

The core server takes the update submissions. These are integrated into the
core database. Changes to the core database are reflected in all mirror
servers in real time by using a system of pushing the database update to
the mirror servers.

The core server will implement access control, ip blocking and other such
administration and policy.

I accept that the technological suggestion I made is by no means the only
way to achieve this goal, although it may be a good one for its scalability
potential.

Before mirrors are implemented in the way I suggested, it would be wise to
introduce meta-fields and records into the database: fields which have no
current use but may be used in future wikipedia software releases. Future
wikipedia software releases for the mirror servers are guaranteed, and extra
database fields are almost certainly going to be required. Adding meta
fields can help the forward compatibility of databases. This would be
necessary in advance, as not all mirror servers will update their whole
database and software simultaneously.
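
One way to picture the meta-field idea is the sketch below; the table and
column names are illustrative only, not the actual wiki schema.

<?php
// Illustrative sketch of the "meta field" idea: reserve generic, nullable
// columns now so future software releases can start using them without
// every mirror having to change its schema at the same moment.
// Table and column names are hypothetical.

$db = mysqli_connect('localhost', 'wiki', 'secret', 'wikidb');

$statements = [
    // Old software simply ignores these columns; new software can adopt them.
    "ALTER TABLE cur ADD COLUMN cur_meta1 VARCHAR(255) DEFAULT NULL",
    "ALTER TABLE cur ADD COLUMN cur_meta2 TEXT",
    // A generic key/value side table is another way to stay forward compatible.
    "CREATE TABLE IF NOT EXISTS meta_records (
        mr_page  INT NOT NULL,
        mr_key   VARCHAR(64) NOT NULL,
        mr_value TEXT,
        PRIMARY KEY (mr_page, mr_key)
    )",
];

foreach ($statements as $sql) {
    if (!mysqli_query($db, $sql)) {
        echo "Failed: $sql -- " . mysqli_error($db) . "\n";
    }
}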
Re: Long term plans for scalability [ In reply to ]
Nick Hill wrote:

>I envisage many wikipedia servers around the world, supported by private
>individuals, companies and universities. Much like the system of mirror FTP
>and mirror web sites. All these servers are updated in real time from the
>core wikipedia server. From the user's perspective, all are equivalent.
>Each of these servers can do everything the current wikipedia server can do
>except for accepting update submissions. Updates from users are accepted
>only by the core wiki server.
>
>Reasons for such an architecture:
>1) Growth of bandwidth useage may put financial pressure on Wikipedia.
>Growth may follow a non-linear growth curve.
>
>2) The cost of implementing one very fast, reliable, redundant machine is
>more than the cost of farming out work to many quite fast, unreliable
>systems none of which are mission critical. Especially true where there are
>people willing to donate part of their hard drive, CPU and net connection
>(or even an entire system) to a good cause such as wikipedia. (Overall
>system reliability can be guaranteed by using DNS tricks to ensure users
>and queries are only directed to working machines).
>
>Given wikipedias limited technical resources, we need to make a choice
>between making small scale changes to the system (which may make 30-50%
>change in availability) or to make architectural changes which can scale to
>improvements of 100x or 1000x magnitude.
>
I confess to complete ignorance with the technical aspects of these
proposals. Be that as it may, there still needs to be a review of the
administrative structure of the project. There was some discussion a
couple months ago about forming a non-profit corporation, but that was
quickly forgotten.

A project that continues to depend on funding from a single source is
always at risk. Similarly, depending on ad hoc demands of $50 from
everybody may work once or twice, but it does not give any kind of
financial security. Aside from any moral issues about depending on
handouts from a single benefactor, there is the reality that no person's
pockets are bottomless and we have no idea where the bottom is. It is
irresponsible for the group as a whole to wait for a message from Jimbo
like "I want to contribute more, but I can't." Messages like that never
come at convenient times; they often coincide with major equipment
breakdowns or necessary technical expansion.

The brand new dedicated server is less than a year old, and it already
has difficulties coping with its volume of work. I think that Nick is
conscious of the potential costs that we could be facing. This is
evident from his willingness to look for lower-cost alternatives. I
would like to see the budgeting alternatives in terms of costs and in
terms of what will be needed from participants over an extended length
of time.

It is my experience that most volunteers find corporate boards,
budgeting and issues of long-term fiscal planning to be frightfully
boring, but facing these issues is a question of fiscal responsibility.

Eclecticology
Re: Long term plans for scalability [ In reply to ]
Nick Hill wrote:

>On Mon, 25 Nov 2002 19:21:57 -0500
>"Poor, Edmund W" <Edmund.W.Poor@abc.com> wrote:
>
>
>
>>Nick,
>>
>>Your idea assumes that the "lag" problem is due to overloading a single
>>machine, which plays double roles: database backeand and web server. So,
>>if we divide the work amoung 2 or more machines, you expect faster
>>throughput. Right?
>>
>>
>
>My idea is not to divide the roles of web server and backend. It is to
>divide the workload of the one server between many servers. This includes
>search queries and other functionality.
>
<snip brave and expensive vision>

While Nick's proposal will no doubt be the long-term future, separating
the database server and the webserver is both relatively cheap and
easy, and should increase performance. It is even recommended in either
the MySQL or the PHP online manual (or was it apache? Brain, where are
you?).

Even a smaller server might do, though I'm not sure whether to run apache
or MySQL on it. Probably the latter.

Oh, and we need a place for the sifter project, while we're at it ;-)

Magnus
Re: Long term plans for scalability [ In reply to ]
On Tue, Nov 26, 2002 at 08:12:05PM +0100, Magnus Manske wrote:
> Nick Hill wrote:
>
> >On Mon, 25 Nov 2002 19:21:57 -0500
> >"Poor, Edmund W" <Edmund.W.Poor@abc.com> wrote:
> >>Your idea assumes that the "lag" problem is due to overloading a single
> >>machine, which plays double roles: database backeand and web server. So,
> >>if we divide the work amoung 2 or more machines, you expect faster
> >>throughput. Right?
> >
> >My idea is not to divide the roles of web server and backend. It is to
> >divide the workload of the one server between many servers. This includes
> >search queries and other functionality.
> >
> <snip brave and expensive vision>
>
> While the proposal of Nick will no doubt be the long-term future,
> separating database server and webserver is both relatively cheap and
> easy, and should increase performance. It is even recommended in either
> the MySQL or the PHP online manual (or was it apache? Brain, where are
> you?).
>
> Even a smaller server might do, though I'm not sure wether to run apache
> or MySQL on it. Probably the latter.

Hi,

do we have any numbers related to the CPU usage of the machine? Throwing
hardware at a performance issue is a solution that's chosen very often,
but it's
a) most of the time expensive
b) too often not the solution

In case of a high lag situation, which process is blocking the CPU? Is it
apache (that is: PHP) or mysql? How is the memory utilization? What is
the I/O rate?

Do we have this kind of data already available?

Best regards,

jens frank
Re: Long term plans for scalability [ In reply to ]
Jens Frank wrote:
> do we have any numbers related to the CPU usage of the machine? Throwing
> hardware at a performance issue is a solution that's chosen very often,
> but it's
> a) most of the time expensive
> b) too often not the solution
>
> In case of a high lag situation, which process is blocking the CPU? Is it
> apache (that is: PHP) or mysql? How is the memory utilization? What is
> the I/O rate?

The CPUs seem to have a fair amount of idle time whenever I check... I'm
not sure how to measure CPU usage usefully (I can pop into 'top', but
that corrupts the data -- top itself is usually the biggest user of CPU
time!)

Load average when things are running smoothly runs between 1 and 2 (2 is
ideal usage for a 2-CPU system). Right now there's an old zombie process
that bumps up the load average by 1. During busier times during the day,
3-5 is not uncommon. The busiest periods can push us into the teens or
very occasionally more.

Very roughly from top:

Apache processes take about 15-25 MB, with 10-25 MB of shared memory.
MySQL has a resident memory size of around 234 MB, plus ~88MB of shared
memory. I think most of this is shared between processes.

And more generally:
~14 MB buffers
800-900 MB disk cache (dips to 100-300 during very high load times)
~150-200 MB free (dips much lower during very high load times)

That probably doesn't quite cover everything, as we've got 2GB to fill
up with those figures.

And:
120-180 MB of rarely used stuff sitting in swap


If you want to see load and memory stats from "uptime" and "free"
updated every 10 minutes starting mid-day yesterday, see
http://www.wikipedia.org/tools/uptimelog


How can we measure i/o rate?
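
One low-tech way to gather those numbers would be a cron-driven script
along the lines of the sketch below; the log path is a guess, and the
vmstat parsing is only approximate and assumes a Linux host.

<?php
// Rough sketch of a cron job (e.g. every 10 minutes) that appends load,
// memory and disk I/O figures to a log, in the spirit of the uptimelog
// mentioned above. Paths and commands are assumptions.

$entry  = date('Y-m-d H:i') . "\n";
$entry .= trim(shell_exec('uptime')) . "\n";
$entry .= shell_exec('free -m');

// "vmstat 1 2" prints two samples; the last line reflects the measured
// one-second interval, with bi/bo giving blocks read/written per second.
$lines  = explode("\n", trim(shell_exec('vmstat 1 2')));
$entry .= 'vmstat: ' . end($lines) . "\n\n";

file_put_contents('/var/www/tools/uptimelog', $entry, FILE_APPEND);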

-- brion vibber (brion @ pobox.com)
Re: Long term plans for scalability [ In reply to ]
Nick Hill wrote:
> I envisage many wikipedia servers around the world, supported by private
> individuals, companies and universities. Much like the system of mirror FTP
> and mirror web sites. All these servers are updated in real time from the
> core wikipedia server. From the user's perspective, all are equivalent.

My experience from situations like the one you describe tells me that
the designed system can easily get more complex and cause more
overhead than the needed performance gain, and that Moore's law will
give us the speed that we need in time, when we need it.

Do you have any experience from designing systems like this? Would you
write a prototype for this system that could be tested? The vision
sounds like science fiction to me, but a prototype that I can run is
not science fiction, so that would make all the difference.

Here is another vision: I envision a system where I can synchronize
my laptop or PDA with a wiki, then go offline and use it, update it,
and when I return to my office I can resynchronize the two again.
I have no idea on how to implement this vision. I think it would be a
lot of work. But I think the result could be really useful.

I also see there are similarities between your vision and mine. The
idea is to express the update activity as a series of transactions
(update submits) that can be transferred to another instance or
multiple instances and be applied there. In either case, one must
take care of the case that the transmission of updates gets
interrupted or delayed, and the potential "edit conflicts" that would
result. It doesn't seem trivial to me.


--
Lars Aronsson (lars@aronsson.se)
Aronsson Datateknik
Teknikringen 1e, SE-583 30 Linuxköping, Sweden
tel +46-70-7891609
http://aronsson.se/ http://elektrosmog.nu/ http://susning.nu/
Re: Long term plans for scalability [ In reply to ]
> Nick Hill wrote:
>>I envisage many wikipedia servers around the world, supported by private
>>individuals, companies and universities. Much like the system of mirror FTP
>>and mirror web sites. All these servers are updated in real time from the
>>core wikipedia server. From the user's perspective, all are equivalent.

Having read-only database copies with MySQL's replication sounds doable,
at least in theory. All edits would be directed to the mama website.
User accounts (i.e. for watchlists) are maybe a trickier matter, as we'd
rather not send passwords and e-mail addresses around willy-nilly.

Lars Aronsson wrote:
> Here is another vision: I envision a system where I can synchronize
> my laptop or PDA with a wiki, then go offline and use it, update it,
> and when I return to my office I can resynchronize the two again.
> I have no idea on how to implement this vision. I think it would be a
> lot of work. But I think the result could be really useful.
>
> I also see there are similarities between your vision and mine. The
> idea is to express the update activity as a series of transactions
> (update submits) that can be transfered to another instance or
> multiple instances and be applied there. In either case, one must
> take care of the case that the transmission of updates gets
> interrupted or delayed, and the potential "edit conflicts" that would
> result. It doesn't seem trivial to me.

Distributed editing is, indeed, rather trickier than distributed
reading, and not something I really want to touch with a ten-foot pole
right now. :)

See also discussion on a client-side Wikipedia reader/editor at:
http://meta.wikipedia.org/wiki/Dedicated_Wikipedia_editor

-- brion vibber (brion @ pobox.com)
Re: Long term plans for scalability [ In reply to ]
On Thu, 28 Nov 2002 03:24:43 +0100 (CET)
Lars Aronsson <lars@aronsson.se> wrote:

> Nick Hill wrote:
> > I envisage many wikipedia servers around the world, supported by
> > private individuals, companies and universities. Much like the system
> > of mirror FTP and mirror web sites. All these servers are updated in
> > real time from the core wikipedia server. From the user's perspective,
> > all are equivalent.
>
> My experience from situations like the one you describe tells me that
> the designed system can easily get more complex

Systems can always become complex in an unworkable sense. If the
implementation is carefully managed, the complexity can be kept under
control.

The system I suggested, in effect, distributes chunks of data. The issue is
whether these chunks of data will, at some point, become incompatible with
the systems which are supposed to read them. Is a degree of non-fatal
incompatibility allowed?

Example: as the definition of tags changes, the front end will interpret
them differently, making the pages look different between newer and older
implementations.

> and cause more
> overhead than the needed performance gain,
Technical or computing overhead? How do you convert technical overhead to
computing overhead? What is the needed performance gain?

> and that Moores law will
> give us the speed that we need in time when we need it.
Several variables in the wikipedia system self-multiply. As the size of the
database multiplies, the demands on the database system grow. As the size
of the database grows, the system becomes more attractive, bringing more
people to Wikipedia. We may currently be in a situation where we have
latent demand, which has been held back by system overload. Whilst the size
of wikipedia may approximate Moore's law, the demand will probably exceed
it. The demand x size product is likely to far exceed Moore's law.

We need more analysis on these issues. We need forecasts which the
architecture can be moulded around.


> Do you have any experience from designing systems like this? Would you
> write a prototype for this system that could be tested?

I have designed database systems. I have not designed a system of exactly
this sort. I don't think the system I proposed is complex. It uses known
database techniques combined with public-key cryptography and email, all of
which are mature, well-understood technologies with extensibility built
into the current free software code base. If it becomes clear to me that
no-one else is prepared to pick up the gauntlet, I will do so if I get
time. A lot of my time is being spent on the GNU project.

> The vision
> sounds like science fiction to me, but a prototype that I can run is
> not science fiction, so that would make all the difference.

'still haven't made the transporter! :-(

>
> Here is another vision: I envision a system where I can synchronize
> my laptop or PDA with a wiki, then go offline and use it, update it,
> and when I return to my office I can resynchronize the two again.
> I have no idea on how to implement this vision. I think it would be a
> lot of work. But I think the result could be really useful.

The system I mentioned would work for this purpose. The PDA could collect
the update emails then integrate them into the database. Porting Wiki and
the supporting technology to the PDA would be a lot of work.

>
> I also see there are similarities between your vision and mine. The
> idea is to express the update activity as a series of transactions
> (update submits) that can be transfered to another instance or
> multiple instances and be applied there. In either case, one must
> take care of the case that the transmission of updates gets
> interrupted or delayed, and the potential "edit conflicts" that would
> result. It doesn't seem trivial to me.

The solution I proposed is:
1) To have edits serialised. They can only be applied in the specific order
they were generated.
2) The pipermail mailing list will give sysadmins the facility of
downloading missed updates.
3) Spoof edits would be filtered: the attachments from the main wiki server
would be signed using a private/public key pair and verified at the
receiving end. (A receiving-side sketch follows.)
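
A receiving-side sketch, under the same assumptions as the master-side
sketch earlier in the thread (invented paths and file naming, a
pre-imported master public key, placeholder database credentials), might
look like this:

<?php
// Hypothetical mirror-side script: verify the master's signature on an
// incoming update, enforce the sequence number, then apply the SQL locally.
// Paths, credentials and file naming are assumptions carried over from the
// master-side sketch earlier in the thread.

$incoming = '/var/wikimirror/inbox/update-00001234.sql.asc';
$expected = (int) trim(file_get_contents('/var/wikimirror/last-applied')) + 1;

// 1) Verify the clearsigned file against the master's (pre-imported) public key.
exec('gpg --batch --verify ' . escapeshellarg($incoming), $ignored, $status);
if ($status !== 0) {
    exit("Signature verification failed, refusing update\n");
}

// 2) Enforce strict ordering: the sequence number is embedded in the filename.
preg_match('/update-(\d+)\.sql/', $incoming, $m);
$sequence = (int) $m[1];
if ($sequence !== $expected) {
    exit("Expected update $expected, got $sequence; fetch the gap from the list archive\n");
}

// 3) Recover the plain SQL text and apply it to the local read-only copy.
$sql = shell_exec('gpg --batch --decrypt ' . escapeshellarg($incoming));
$db  = mysqli_connect('localhost', 'wiki', 'secret', 'wikidb');
if (!mysqli_multi_query($db, $sql)) {
    exit('Apply failed: ' . mysqli_error($db) . "\n");
}
while (mysqli_more_results($db)) {
    mysqli_next_result($db);
}

file_put_contents('/var/wikimirror/last-applied', (string) $sequence);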
Re: Long term plans for scalability [ In reply to ]
Nick Hill wrote:
> I envisage many wikipedia servers around the world, supported by private
> individuals, companies and universities. Much like the system of mirror FTP
> and mirror web sites. All these servers are updated in real time from the
> core wikipedia server. From the user's perspective, all are equivalent.
> Each of these servers can do everything the current wikipedia server can do
> except for accepting update submissions. Updates from users are accepted
> only by the core wiki server.

Organizationally, this would be a nightmare. Having a central server
farm really eases everything, because the volunteer admins can be
given root passwords, etc., and the ability to make changes on all the
servers very quickly. It's not easy to get that kind of access if we
are co-ordinating across many servers around the world.

Co-ordinating across many servers around the world *is* a solution we can
ponder *if* it appears to be absolutely necessary for some reason. But it
strikes me as unlikely for this to be the case.

> Reasons for such an architecture:
> 1) Growth of bandwidth useage may put financial pressure on Wikipedia.
> Growth may follow a non-linear growth curve.

With the continued fall in bandwidth costs, this is very unlikely to
exceed my capacity to support Wikipedia in the short run. *Bandwidth*,
per se, is not a major issue.

> 2) The cost of implementing one very fast, reliable, redundant machine is
> more than the cost of farming out work to many quite fast, unreliable
> systems none of which are mission critical. Especially true where there are
> people willing to donate part of their hard drive, CPU and net connection
> (or even an entire system) to a good cause such as wikipedia. (Overall
> system reliability can be guaranteed by using DNS tricks to ensure users
> and queries are only directed to working machines).

This is easy to say, but harder to implement well. I don't know of
any really successful implementations of the kind you are discussing.

--Jimbo
Re: Long term plans for scalability [ In reply to ]
Ray Saintonge wrote:
> There was some discussion a
> couple months ago about forming a non-profit corporation, but that was
> quickly forgotten.

Not forgotten at all. In the works. Announcements forthcoming soon.

> A project that continues to depend on funding from a single source is
> always at risk. Similarly, depending on ad hoc demands of $50 from
> everybody may work once or twice, but it does not give any kind of
> financial security. Aside from any moral issues about depending on
> handouts from a single benefactor, there is the reality that no person's
> pockets are bottomless and we have no idea where the bottom is. It is
> irresponsible for the group as a whole to wait for a message from Jimbo
> like "I want to contribute more, but I can't." Messages like that never
> come at convenient times; they often coincide with major equipment
> breakdowns or necessary technical expansion.

That's all 100% correct. With the layoff of Larry Sanger and Toan Vo,
both excellent people working full-time on Nupedia, this has already
happened. I've gone from support at a level exceeding $100k per year,
to current levels, which are difficult to measure exactly. (My time,
Jason sometimes, the server, the bandwidth).

What I am currently planning is a nonprofit corporation with a separate
bank account, with the ability to accept credit cards, whereby people
can sign up for various levels of monthly or quarterly or annual
support. The contributors will be able, within some limits, to
earmark contributions for particular uses (server, bandwidth,
promotion, etc.).

I think that many regulars will be willing and able to sign up at the
$20 per month level. Just 100 people at that level would give us
$2000 a month, which would fairly quickly translate into a pretty
serious server farm.

--Jimbo