Mailing List Archive

A projection
Using the data set given, and assuming averaged daily growth between
given days, Wikipedia has had an overall average daily growth of 0.632%
since 2001-03-07.

The average growth rate (r) between two sampling days was calculated using

1+r=(d2/d1)^(1/n)

where d1 and d2 are the sample amounts on the first and second days and
n is the number of days between samplings.

The 0.632% amount is a weighted mean of these results over a period of
551 days. Applying the formula:

n=log(100,000/42021)/log(1+r)

gives 138 days when rounded up to the nearest whole number. Thus the
formula projects that article number 100,000 will be reached on
2003-01-24. Using the same techniques, growth in the last 30 days has
been at the more modest rate of 0.410% per day. Projecting this gives a
figure of 212 days, or 2003-04-08.
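
For anyone who wants to check the arithmetic, here is a minimal Python
sketch of the two formulas above. The function names are made up for
illustration; the inputs are the figures quoted in this message.

```python
import math

def daily_rate(d1, d2, n):
    """Average daily growth rate r, from 1 + r = (d2 / d1) ** (1 / n)."""
    return (d2 / d1) ** (1.0 / n) - 1.0

def days_to_target(current, target, r):
    """Days until `target` is reached at daily rate r, rounded up:
    n = log(target / current) / log(1 + r)."""
    return math.ceil(math.log(target / current) / math.log(1.0 + r))

# Figures from the message: 42,021 articles now, 100,000 as the target.
print(days_to_target(42021, 100000, 0.00632))  # 138
print(days_to_target(42021, 100000, 0.00410))  # 212
```

The base of the logarithm does not matter here, since it cancels in the
ratio.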
Re: A projection [ In reply to ]
At 2002-09-10 14:09 -0700, Ray Saintonge wrote:
>Using the data set given, and assuming averaged daily growth between given days, Wikipedia has since 2001-03-07 had an average over all daily growth of 0.632%
>
>The average growth rate (r) between two sampling days was calculated using
>
>1+r=(d2/d1)^(1/n)
>
>where d1 and d2 are the sample amounts on the first and second days and n is the number of days between samplings.
>
>The 0.632% amount is a weighted mean of these results over a period of 551 days. Applying the formula:
>
>n=log(100,000/42021)/log(1+r)
>
>gives 138 days when rounded up to the nearest whole number. Thus the formula projects that article number 100,000 will be reached on 2003-01-24 Using the same techniques, growth in the last 30 days has been at the more modest rate of 0.410% per day. Projecting this gives a figure of 212 days or 2003-04-08

And suppose I hadn't wasted 10 years of my life on a
technical university, how would you explain this to me?

What for example is the growth per year?

Greetings,
Jaap
Re: A projection [ In reply to ]
Jaap van Ganswijk wrote:

>At 2002-09-10 14:09 -0700, Ray Saintonge wrote:
>
>>Using the data set given, and assuming averaged daily growth between given days, Wikipedia has since 2001-03-07 had an average over all daily growth of 0.632%
>>
>>The average growth rate (r) between two sampling days was calculated using
>>
>>1+r=(d2/d1)^(1/n)
>>
>>where d1 and d2 are the sample amounts on the first and second days and n is the number of days between samplings.
>>
>>The 0.632% amount is a weighted mean of these results over a period of 551 days. Applying the formula:
>>
>>n=log(100,000/42021)/log(1+r)
>>
>>gives 138 days when rounded up to the nearest whole number. Thus the formula projects that article number 100,000 will be reached on 2003-01-24 Using the same techniques, growth in the last 30 days has been at the more modest rate of 0.410% per day. Projecting this gives a figure of 212 days or 2003-04-08
>>
>And suppose I hadn't wasted 10 years of my life on a
>technical university, how would you explain this to me?
>
>What for example is the growth per year?
>
The underlying premise is that growth is exponential. People more
commonly encounter this with compound interest calculations. Thus
$1,000 invested at 12% for one year will give $1,120 at the end of the
year. If it is compounded semi-annually it will give 1.06 * 1.06 * 1000
or $1,123.60 at the end of the year. If it is compounded monthly it
will give (1.01)^12 * 1000 = $1,126.83 at the end of the year. The
calculations that I made are similar, although I have not taken into
account any limitations that may exist upon Wikipedia's growth.

The annual growth rate based on 0.632% per day would be (1.00632)^365 -
1 = 896.861%.
Based on 0.410% per day it would be 345.239%.
These figures do seem quite high, but for a reality check Wikipedia's
size on September 9 of this year was 42,021 and on September 9, 2001 it
was 11,208. 42021/11208 is 3.74920, i.e. growth of 274.920%, but this
does include some periods when the growth was considerably lower than it
has been in the last 30 days.
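
The compound interest examples and the annualised rates can be reproduced
with a short Python sketch. The function names are illustrative only, and
the daily rates are the rounded figures quoted above, so the last decimals
may differ slightly from a calculation done with unrounded rates.

```python
def compound(principal, annual_rate, periods_per_year):
    """Value after one year when interest is compounded periods_per_year times."""
    return principal * (1.0 + annual_rate / periods_per_year) ** periods_per_year

def annual_growth(daily_rate, days=365):
    """Growth over `days` days of compounding at daily_rate, as a fraction."""
    return (1.0 + daily_rate) ** days - 1.0

print(f"{compound(1000, 0.12, 1):.2f}")   # 1120.00
print(f"{compound(1000, 0.12, 2):.2f}")   # 1123.60
print(f"{compound(1000, 0.12, 12):.2f}")  # 1126.83

# Roughly 897% and 345% per year for the two daily rates.
print(f"{annual_growth(0.00632):.1%}")
print(f"{annual_growth(0.00410):.1%}")

# Reality check: observed growth from 11,208 to 42,021 in one year.
print(f"{42021 / 11208 - 1:.1%}")
```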

Eclecticology
Re: A projection [ In reply to ]
Ray Saintonge wrote:

>>
> The underlying premise is that growth is exponential. People more
> commonly encounter this with compound interest calculations. Thus
> $1,000 invested at 12% for one year will give $1,120 at the end of the
> year. If it is compounded semi-annually it will give 1.06 * 1.06 *
> 1000 or $1,123.60 at the end of the year. If it is compounded monthly
> it will give (1.01)^12 * 1000 = $1,126.83 at the end of the year. The
> calculations that I made are similar, although I have not taken into
> account any limitations that may exist upon Wikipedia's growth.
>
> The annual growth rate based on 0.632% per day would be (1.00632)^365
> - 1 = 896.861%
> Based on 0.410% per day it would be 345.239%
> These figures do seem quite high, but for a reality check Wikipedia's
> size on September 9 of this year was 42,021 and on September 9, 2001
> it was 11,208. 42021/11208 is 3.74920, i.e. growth of 274.920%, but
> this does include some periods when the growth was considerably lower
> than it has been in the last 30 days.
>
> Eclecticology
>

The graph half way down at [[Wikipedia:Size of Wikipedia]] illustrates
this rather nicely.

It looks exponential to me, with a kink for the Great Slowdown of the
Phase II software. Recent growth is about 217 articles/day for a size
of about 42000 articles, and that's about 0.5% / day.

Extrapolated to 1 year, that's growth of about 500% per year (i.e. the
size multiplies by a factor of six).
The implications of this are huge, ''if'' this sort of growth rate keeps
up. In a year's time, we can expect not 100,000 articles, but over
250,000. Then -- almost unbelievably -- 3 million the next year.

This suggests that we will definitely need some more scaling features
in the software sooner rather than later.
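
As a rough check on these numbers, here is a small Python sketch that
turns the daily rate into a yearly factor and a doubling time. The
function names are invented for illustration, and it assumes the daily
rate stays constant, which is exactly the "if" above.

```python
import math

def yearly_factor(daily_rate, days=365):
    """Size multiplier after `days` days of compounding at daily_rate."""
    return (1.0 + daily_rate) ** days

def doubling_time(daily_rate):
    """Days needed for the size to double at a constant daily_rate."""
    return math.log(2.0) / math.log(1.0 + daily_rate)

observed = 217 / 42000            # recent articles per day over current size
print(f"{observed:.2%} per day")  # about half a percent

# Using the rounded 0.5% per day figure:
print(f"factor per year: {yearly_factor(0.005):.1f}")
print(f"doubling time:   {doubling_time(0.005):.0f} days")
```

At 0.5% per day the size doubles in a little under 140 days.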

Neil
Re: A projection [ In reply to ]
> Then -- almost unbelievably -- 3 million the next year.

Or, rather, 1.5 million. D'oh!
But it's still amazing.

Neil
Re: A projection [ In reply to ]
Jaap van Ganswijk wrote:

>Hi Neil and Ray,
>
>I know what exponential behaviour is, I was just hoping you'd
>give the figures in a clearer way instead of as a formula.
>It's usual to give the growth per year as a percentage and/or
>to give the amount of time in which the amount doubles.
>
I did use an annual growth rate in my previous response, and Neil's
comments seem to have answered the second approach. I'm sure that some
of our more mathematically challenged Wikipedians will run the other way
at the sight of any mathematical formula, but it was only fair for those
who might want to pursue the matter further to know how I arrived at my
view.

>>It looks exponential to me, with a kink for the Great Slowdown of the Phase II software. Recent growth is about 217 articles/day for a size of about 42000 articles, and that's about 0.5% / day.
>>
>Looks very linear to me.
>
>And I think anyway, that the process will be more linear than exponential.
>
My projection was a hypothesis that is as subject to the constraints of
the scientific method as any other. Choosing another data set could have
given different results.

>- When the number of people contributing stays fixed and
> they write a fixed number of articles per time unit
> the growth will be linear.
>
Yes, but is the number of people contributing really staying fixed?
People who only make a single contribution (including vandals) to
Wikipedia are also contributors. What is the relationship between the
number of such people in the last 30 days and the number of such
people in the preceding 30 days? Any growth there is a function of
people finding out that Wikipedia exists.

>- People may get bored or frustrated however and produce
> less articles. They may also lack the knowledge to write
> about other than their favorite subjects. Even if they
> would write about non-favorite subjects it would go
> slower because they would have to do more research.
>
I suspect that the proportion of people who have a 500 article
exhaustion level will be relatively constant.

>- People will also spend time on improving articles
> instead of writing new ones and this get worse the
> more articles there are.
>
Probably another relative constant. Improving articles includes
splitting off sections into "new" articles when they get too long.

>- However, new people will join the club and therefore
> super linear behaviour could occur, but I think that the
> new people will at most counteract the amount that the
> other start writing less articles.
>
Subject to verification. See my comments above re one-time contributors.

>- Even when people don't have to write articles themselves
> but can copy and edit them, the sources that they can
> easily copy them from may dry out over time.
>
This is one of our limits to growth in the long run, but I don't see it
as a factor in the near future.

>- And a major argument against super linear behaviour of
> the growth is, that the bigger the data base becomes,
> the more complicated and time consuming the
> interrelations will get. Which with a fixed staff would
> let the growth tend to logarithmic behaviour.
>
>Given all these factors and the current graph, I think that
>the growth is more likely to be linear (and we should be
>happy enough with that).
>
Indeed we should be happy with it.

In the spirit of compromise, perhaps the growth rate is now exponential
but in the long term the rate of growth will be asymptotic to a linear
function.

Eclecticology
Re: A projection [ In reply to ]
Hi Neil and Ray,

At 2002-09-11 15:55 +0100, Neil Harris wrote:
>Ray Saintonge wrote:
>>The underlying premise is that growth is exponential. People more commonly encounter this with compound interest calculations. Thus $1,000 invested at 12% for one year will give $1,120 at the end of the year. If it is compounded semi-annually it will give 1.06 * 1.06 * 1000 or $1,123.60 at the end of the year. If it is compounded monthly it will give (1.01)^12 * 1000 = $1,126.83 at the end of the year. The calculations that I made are similar, although I have not taken into account any limitations that may exist upon Wikipedia's growth.
>>
>>The annual growth rate based on 0.632% per day would be (1.00632)^365
>>- 1 = 896.861%
>>Based on 0.410% per day it would be 345.239%
>>These figures do seem quite high, but for a reality check Wikipedia's size on September 9 of this year was 42,021 and on September 9, 2001 it was 11,208. 42021/11208 is 3.74920, i.e. growth of 274.920%, but this does include some periods when the growth was considerably lower than it has been in the last 30 days.

I know what exponential behaviour is, I was just hoping you'd
give the figures in a clearer way instead of as a formula.
It's usual to give the growth per year as a percentage and/or
to give the amount of time in which the amount doubles.

>The graph half way down at [[Wikipedia:Size of Wikipedia]] illustrates this rather nicely.

>It looks exponential to me, with a kink for the Great Slowdown of the Phase II software. Recent growth is about 217 articles/day for a size of about 42000 articles, and that's about 0.5% / day.

Looks very linear to me.

And I think anyway, that the process will be more linear than exponential.

There are several aspects:
- When the number of people contributing stays fixed and
  they write a fixed number of articles per time unit
  the growth will be linear.
- People may get bored or frustrated however and produce
  fewer articles. They may also lack the knowledge to write
  about other than their favorite subjects. Even if they
  would write about non-favorite subjects it would go
  slower because they would have to do more research.
- People will also spend time on improving articles
  instead of writing new ones and this gets worse the
  more articles there are.
- However, new people will join the club and therefore
  super linear behaviour could occur, but I think that the
  new people will at most counteract the amount by which
  the others start writing fewer articles.
- Even when people don't have to write articles themselves
  but can copy and edit them, the sources that they can
  easily copy them from may dry up over time.
- And a major argument against super linear behaviour of
  the growth is that the bigger the data base becomes,
  the more complicated and time consuming the
  interrelations will get. Which with a fixed staff would
  let the growth tend to logarithmic behaviour.

Given all these factors and the current graph, I think that
the growth is more likely to be linear (and we should be
happy enough with that).
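
To put a number on the linear case, here is a minimal Python sketch. It
assumes, purely for illustration, that the recent rate of 217 articles
per day quoted above stays fixed; the function name is made up.

```python
import math

def days_linear(current, target, per_day):
    """Days until `target` is reached when a fixed number of articles is added per day."""
    return math.ceil((target - current) / per_day)

# From 42,021 articles to 100,000 at a constant 217 articles per day.
print(days_linear(42021, 100000, 217))  # 268
```

That is 268 days, compared with the 138 and 212 days given by the two
exponential projections earlier in the thread.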

Greetings,
Jaap
Re: A projection [ In reply to ]
On Thu, Sep 12, 2002 at 08:35:22PM +0200, Jaap van Ganswijk wrote:
> Given all these factors and the current graph, I think that
> the growth is more likely to be linear (and we should be
> happy enough with that).

I don't know about English Wikipedia, but all others certainly grow at
an exponential rate. If that isn't the case with English Wikipedia, then
maybe the software is the bottleneck and too much time is spent on
maintenance tasks.
Re: A projection [ In reply to ]
On Tue, 10 Sep 2002 14:09:33 -0700
Ray Saintonge <saintonge@telus.net> wrote:

<information suggesting a non-linear growth curve for Wikipedia>

I have seen messages talking about changing the database engine from MySQL
to PostgreSQL to fix table locking problems on a busy system.

I am concerned that this _type_ of engineering work may not be what is
really needed.

My contention is that Wikipedia load can grow at an exponential rate but
may be constrained by resource availability. There are many factors which
cause self-multiplication.

Decisions which need to be made:

1) Do we want Wikipedia to be _able_ to grow at an exponential rate?
   If yes:
   a) We need to consider a technical system which can be put in place to
      distribute load such that no one system needs to handle all the load.
   b) Consider whether the current social system of regulation can scale to
      meet demand, and monitor this.
   c) Keep a conscious review open to ensure the quality of Wikipedia
      with such an exponential growth and consider adding constraints
      to growth if such a growth rate starts causing undesirable effects.

   If no:
   a) Consider how availability of the system will be limited in order to
      prevent exponential growth, and at what rate, if any, availability is
      extended.
   b) What parts of the system are best rationed to limit growth rate, i.e.
      should searches, page views or edits be limited?

From my experience of using the system over the last few days, I perceive
there is currently a technical constraint limiting the rate of growth. This
may be desirable, this may be undesirable. Do we know which it is? Has an
explicit decision been made?

A scalable solution is to give nearly all responsibility for all wiki
functionality to mirror servers. Updates are posted directly to the main
Wiki server which in turn posts the database updates to registered first
tier mirrors which, in turn, can post database updates to second tier
mirrors registered with them and so on. This way, all mirrors can be kept
in sync in near real time with a minimum of CPU, memory and network load.
The main server then need do nothing other than maintain database
consistency, accept and post updates.
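
The tiered scheme can be sketched in a few lines of Python. This is only
an in-memory illustration of the fan-out idea; the class and method names
are invented, and a real system would send updates over the network
rather than call methods directly.

```python
class Mirror:
    """A node that applies updates and forwards them to mirrors registered with it."""

    def __init__(self, name):
        self.name = name
        self.children = []  # mirrors registered with this node
        self.log = []       # updates applied locally, in arrival order

    def register(self, child):
        self.children.append(child)
        return child

    def post(self, update):
        """Apply the update locally, forward it to each registered child,
        and return how many sends this node itself had to make."""
        self.log.append(update)
        for child in self.children:
            child.post(update)
        return len(self.children)

# One main server, two first tier mirrors, two second tier mirrors each.
main = Mirror("main")
tier1 = [main.register(Mirror(f"t1-{i}")) for i in range(2)]
tier2 = [t.register(Mirror(f"{t.name}-{j}")) for t in tier1 for j in range(2)]

sends_by_main = main.post("UPDATE 1")
print(sends_by_main)                                    # 2
print(all(m.log == ["UPDATE 1"] for m in tier1 + tier2))  # True
```

The main server makes only two sends, yet all six mirrors end up with
the update.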
Re: A projection [ In reply to ]
On Tue, 12 Nov 2002 11:56:31 +0000
Nick Hill <nick@nickhill.co.uk> wrote:

> A scalable solution is to give nearly all responsibility for all wiki
> functionality to mirror servers. Updates are posted directly to the main
> Wiki server which in turn posts the database updates to registered first
> tier mirrors which, in turn, can post database updates to second tier
> mirrors registered with them and so on.

With such a scheme, IP blocking and anti-vandalism features would still be
implemented in much the same way as they are now, on the main server, where
the master database is held. The master server would handle html form puts.

> This way, all mirrors can be kept
> in sync in near real time with a minimum of CPU, memory and network load.
> The main server then need do nothing other than maintain database
> consistency, accept and post updates.

The update system can be achieved by either:
1) the main server creating SQL files to be emailed to mirror servers,
signed with a key pair and sequentially numbered to ensure they are
automatically processed in order. This way, the server can run
asynchronously with the mirrors, which is better for reliability of the
server. The server will not need to wait for connection responses from the
mirror, and updates will be cached in the mail system should the mirror
be unavailable. The server will only need to create one email per update.
The mail system infrastructure will take care of sending the data to each
mirror. (In fact, a system such as pipermail used on this list would solve the
problem wonderfully. Mirror admins simply subscribe to the list to get all
updates sent to their machine and can manually download updates they are
missing from the list!)

Or
2) by the master server opening a connection directly to the SQL daemon on
each remote machine. (In which case the server will need to track which
updates the mirrors have and have not received, and will need to wait for
time-out on non-operational mirrors.) (This system may open exploits on
the server via the SQL interface.)
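
For option 1, the sequential numbering is what lets a mirror process
updates in order even when mail arrives out of order. A minimal Python
sketch of the receiving side follows. The class and method names are
invented for illustration, signature checking is left out, and "applying"
an update just appends it to a list instead of running SQL.

```python
class MirrorInbox:
    """Applies sequentially numbered updates strictly in order."""

    def __init__(self):
        self.next_seq = 1   # number of the next update we may apply
        self.pending = {}   # updates that arrived early, keyed by number
        self.applied = []   # updates applied so far, in order

    def receive(self, seq, sql):
        # Ignore anything already applied or already waiting.
        if seq < self.next_seq or seq in self.pending:
            return
        self.pending[seq] = sql
        # Apply as many consecutive updates as are now available.
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

    def missing(self):
        """Numbers the admin would need to fetch from the list archive."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]

inbox = MirrorInbox()
for seq in (2, 3, 1):              # mail delivered out of order
    inbox.receive(seq, f"update {seq}")
print(inbox.applied)               # ['update 1', 'update 2', 'update 3']

inbox.receive(5, "update 5")       # number 4 has not arrived yet
print(inbox.missing())             # [4]
```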