Mailing List Archive

1 2  View All
Re: wikipedia dead again [ In reply to ]
On Wed, 26 Mar 2003, Thomas Corell wrote:
> Brion Vibber wrote:
> > Note that if someone wants us to run PostgreSQL, they should put their
> > code where their mouth is and adapt the software to run with it. :)
> >
>
> Adapting the software IMHO is not the main problem, what about all those
> sysop's depending on e.g. mysql syntax? And I think there are more
> open questions.

Well, if we were to try to run the software on postgresql it would
obviously be very important to adapt the software to work on postgresql.
:)

However it's less clear that it would be particularly advantageous to use
postgresql.

That's why I keep hoping that the people who pop up and say "hey, this
would be a lot better with postgresql" can quantify the claim by making
the software work with postgresql and actually comparing performance with
the same operations on the same dataset on the same machine -- things we
can't do until the software is able to run with postgresql.

-- brion vibber (brion @ pobox.com)
Re: wikipedia dead again [ In reply to ]
Brion Vibber wrote:
> That's why I keep hoping that the people who pop up and say "hey, this
> would be a lot better with postgresql" can quantify the claim by making
> the software work with postgresql and actually comparing performance with
> the same operations on the same dataset on the same machine -- things we
> can't do until the software is able to run with postgresql.

Well, there are 2 ways, as usual. The fast way is to use the scripts
coming with pg to convert tables. There are even tools to convert
queries. But this will give a result you don't want to test performence
with. The second one is to set up tables and views and indices as
requiered by wikipedia, to get the best performance.

I think there have to be one step in the middle, the step Lee mentioned:
all sql - commands in the wiki php code should replaced by functions,
included from sql.inc or so. Then a change of database will affect one
file, and not nearly all. I looked a short look at the sql-tables last
weekend, just for fun, and at the different types mysql and pg using.
Unfortunaly it is very hard to get the structure of how tables are used,
if the queries are spreaded so much.

But, I will start porting the tables as soon I found some spare time.
Lee, is there a loctation where I can get these mysql statements you
isolated?

Smurf
--
------------------------- Anthill inside! ---------------------------
Re: wikipedia dead again [ In reply to ]
> (Thomas Corell <T.Corell@t-online.de>):
> But, I will start porting the tables as soon I found some spare time.
> Lee, is there a loctation where I can get these mysql statements you
> isolated?

All database access in the system should go through the functions
in DatabaseFunctions.php. The only one that might be tricky to
implement is wfInsertID(), which currently depends on a MySQL
extension for auto-increment fields. But most databases have some
similar feature.

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
Re: wikipedia dead again [ In reply to ]
Lee Daniel Crocker wrote:
> All database access in the system should go through the functions
> in DatabaseFunctions.php. The only one that might be tricky to
> implement is wfInsertID(), which currently depends on a MySQL
> extension for auto-increment fields. But most databases have some
> similar feature.

Ok, I will try to understand what is the extention they made, first look
at the function in the mysql docs did not give me a hint. Possibly I
miss at midnight the reason why I need to know this number and what
extention is handled with it. But currval(seq-name) must be similar.

Smurf
--
------------------------- Anthill inside! ---------------------------
Re: wikipedia dead again [ In reply to ]
> (Thomas Corell <T.Corell@t-online.de>):
>
> Ok, I will try to understand what is the extention they made, first look
> at the function in the mysql docs did not give me a hint. Possibly I
> miss at midnight the reason why I need to know this number and what
> extention is handled with it. But currval(seq-name) must be similar.

The MySQL extension is the availability of auto-increment integer
fields. They're handy when you need an integer key value that's
guaranteed to be unique, but you don't care what it is. Without the
extension, you can do that by keeping a "last used ID" filed value
in a separate table and updating it every time you insert. But with
MySQL, you just go ahead and do the insert, and then ask it "Oh,
by the way, what integer value did you just assign to the auto-
increment field of the record I just inserted?"

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
Re: wikipedia dead again [ In reply to ]
Lee Daniel Crocker wrote:
>>(Thomas Corell <T.Corell@t-online.de>):
>>
>>Ok, I will try to understand what is the extention they made, first look
>>at the function in the mysql docs did not give me a hint. Possibly I
>>miss at midnight the reason why I need to know this number and what
>>extention is handled with it. But currval(seq-name) must be similar.
>
> The MySQL extension is the availability of auto-increment integer
> fields. They're handy when you need an integer key value that's
> guaranteed to be unique, but you don't care what it is. Without the
> extension, you can do that by keeping a "last used ID" filed value
> in a separate table and updating it every time you insert. But with
> MySQL, you just go ahead and do the insert, and then ask it "Oh,
> by the way, what integer value did you just assign to the auto-
> increment field of the record I just inserted?"

Oh, they call that an extention. Well. There's a serial/serial8 type in
Postgres, which implicitly creates a sequence and you can use the
functions nextval() and currval(). So a definition of e.g. "cur_id
serial8 PRIMARY KEY" should handle this.

Smurf
--
------------------------- Anthill inside! ---------------------------
Re: wikipedia dead again [ In reply to ]
On Thu, Mar 27, 2003 at 06:36:29AM +0100, Thomas Corell wrote:
> Lee Daniel Crocker wrote:
> >>(Thomas Corell <T.Corell@t-online.de>):
> >>
> >>Ok, I will try to understand what is the extention they made, first look
> >>at the function in the mysql docs did not give me a hint. Possibly I
> >>miss at midnight the reason why I need to know this number and what
> >>extention is handled with it. But currval(seq-name) must be similar.
> >
> >The MySQL extension is the availability of auto-increment integer
> >fields. They're handy when you need an integer key value that's
> >guaranteed to be unique, but you don't care what it is. Without the
> >extension, you can do that by keeping a "last used ID" filed value
> >in a separate table and updating it every time you insert. But with
> >MySQL, you just go ahead and do the insert, and then ask it "Oh,
> >by the way, what integer value did you just assign to the auto-
> >increment field of the record I just inserted?"
>
> Oh, they call that an extention. Well. There's a serial/serial8 type in
> Postgres, which implicitly creates a sequence and you can use the
> functions nextval() and currval(). So a definition of e.g. "cur_id
> serial8 PRIMARY KEY" should handle this.

Or just serial, which would be considerably faster than serial8, unless
you think that you will end up with more than 2147483648 rows (2^31).

--
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN
Re: wikipedia dead again [ In reply to ]
Nick Reinking wrote:
> On Thu, Mar 27, 2003 at 06:36:29AM +0100, Thomas Corell wrote:
>
>>Oh, they call that an extention. Well. There's a serial/serial8 type in
>>Postgres, which implicitly creates a sequence and you can use the
>>functions nextval() and currval(). So a definition of e.g. "cur_id
>>serial8 PRIMARY KEY" should handle this.
> Or just serial, which would be considerably faster than serial8, unless
> you think that you will end up with more than 2147483648 rows (2^31).

Only to meassure my expectations for wikipedia ;) But keep in mind that
reusing serials is not part of the pg concept, but as far as I know
mysql does. E.g. deleting articles may spend a serial.

Smurf
--
------------------------- Anthill inside! ---------------------------
Re: wikipedia dead again [ In reply to ]
On Thu, Mar 27, 2003 at 06:07:04PM +0100, Thomas Corell wrote:
> Nick Reinking wrote:
> >On Thu, Mar 27, 2003 at 06:36:29AM +0100, Thomas Corell wrote:
> >
> >>Oh, they call that an extention. Well. There's a serial/serial8 type in
> >>Postgres, which implicitly creates a sequence and you can use the
> >>functions nextval() and currval(). So a definition of e.g. "cur_id
> >>serial8 PRIMARY KEY" should handle this.
> >Or just serial, which would be considerably faster than serial8, unless
> >you think that you will end up with more than 2147483648 rows (2^31).
>
> Only to meassure my expectations for wikipedia ;) But keep in mind that
> reusing serials is not part of the pg concept, but as far as I know
> mysql does. E.g. deleting articles may spend a serial.

Fair enough, I do believe you're right. Still, deleted articles are not
terribly common, and I think we should design this with maximum
performance in mind. If we hit some odd 1.8B articles in the future, we
can always figure out a way to change it to serial8. :)

Speaking of maximum performance... I have a question concerning our
implementation. What will be more important in the future, clean and
efficient code, or backwards compatibility? Especially when you
consider the reverse_timestamp hacks everywhere that won't be needed in
MySQL4 (or PostgreSQL), I would think that we should just drop MySQL3
support (especially considering that it is easy to upgrade, and nobody
will be using it in a year or two). But, maybe I'm crazy - anybody else
have any comments?

--
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN
Re: wikipedia dead again [ In reply to ]
> > Oh, they call that an extention. Well. There's a serial/serial8 type in
> > Postgres, which implicitly creates a sequence and you can use the
> > functions nextval() and currval(). So a definition of e.g. "cur_id
> > serial8 PRIMARY KEY" should handle this.
>
> Or just serial, which would be considerably faster than serial8, unless
> you think that you will end up with more than 2147483648 rows (2^31).

But the issue is not just the database--it's how to implement the
wfInsertID() function in the wiki code. PHP's Postgres functions don't
have anything like it, so you'll have to do something like get_last_oid()
and then retrieve the serial field value by OID.

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
Re: wikipedia dead again [ In reply to ]
> (Nick Reinking <nick@twoevils.org>):
>
> Speaking of maximum performance... I have a question concerning our
> implementation. What will be more important in the future, clean and
> efficient code, or backwards compatibility? Especially when you
> consider the reverse_timestamp hacks everywhere that won't be needed in
> MySQL4 (or PostgreSQL), I would think that we should just drop MySQL3
> support (especially considering that it is easy to upgrade, and nobody
> will be using it in a year or two). But, maybe I'm crazy - anybody else
> have any comments?

Depends on what you want compatibility with. The internal workings
of the code are completely fair game as far as I'm concerned. MySQL 4
will get rid of the reverse-timestamp hack, and if we want to use
Postgres or something else, we can. The "public interface" to the
wiki is the URLs of various pages. Those should change as little as
possible. But how they are implemented under the scenes is not a
"compatibility" issue at all--hell, the whole thing might be replaced
with Java servlets and an object database for all I care.

That's the whole point of client-server separation after all--how the
server does its job is none of the client's business, only the fact
that it does, in fact, meet its obligations. That's what I'm writing
the test suite for.

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
Re: wikipedia dead again [ In reply to ]
On Thu, Mar 27, 2003 at 01:07:12PM -0600, Lee Daniel Crocker wrote:
> > (Nick Reinking <nick@twoevils.org>):
> >
> > Speaking of maximum performance... I have a question concerning our
> > implementation. What will be more important in the future, clean and
> > efficient code, or backwards compatibility? Especially when you
> > consider the reverse_timestamp hacks everywhere that won't be needed in
> > MySQL4 (or PostgreSQL), I would think that we should just drop MySQL3
> > support (especially considering that it is easy to upgrade, and nobody
> > will be using it in a year or two). But, maybe I'm crazy - anybody else
> > have any comments?
>
> Depends on what you want compatibility with. The internal workings
> of the code are completely fair game as far as I'm concerned. MySQL 4
> will get rid of the reverse-timestamp hack, and if we want to use
> Postgres or something else, we can. The "public interface" to the
> wiki is the URLs of various pages. Those should change as little as
> possible. But how they are implemented under the scenes is not a
> "compatibility" issue at all--hell, the whole thing might be replaced
> with Java servlets and an object database for all I care.
>
> That's the whole point of client-server separation after all--how the
> server does its job is none of the client's business, only the fact
> that it does, in fact, meet its obligations. That's what I'm writing
> the test suite for.

Well, I was thinking about people running the Wikipedia code (other than
Wikipedia itself). They will have MySQL3 databases that require the
reverse_timestamp hack - will we continue to keep this cruft in the
Wikipedia code to support those people who don't want to migrate their
databases, or will just clean it up and tell them to migrate their
database, or stay with the old code (which is perfectly functionable,
mostly).

--
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN
Re: wikipedia dead again [ In reply to ]
> Well, I was thinking about people running the Wikipedia code (other than
> Wikipedia itself). They will have MySQL3 databases that require the
> reverse_timestamp hack - will we continue to keep this cruft in the
> Wikipedia code to support those people who don't want to migrate their
> databases, or will just clean it up and tell them to migrate their
> database, or stay with the old code (which is perfectly functionable,
> mostly).

I, personally, am not the least bit interested in making the wikipedia
code easy to use for others. It's GPL, and open, so others can do with
it what they like, but my efforts will be 100% toward making Wikipedia
work, and nothing else. If that inconveniences others who might have
chosen to use the code, well, that's was their choice.

I do, on the other hand, concede that the ability for others to
install and use the code does help the code evolve; the more eyeballs
on it, the better. So I am willing to make a few concessions in
that direction, but only insofar as they serve the primary goal,
which is Wikipedia.

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
Re: wikipedia dead again [ In reply to ]
On Thu, Mar 27, 2003 at 02:35:55PM -0600, Lee Daniel Crocker wrote:
> > Well, I was thinking about people running the Wikipedia code (other than
> > Wikipedia itself). They will have MySQL3 databases that require the
> > reverse_timestamp hack - will we continue to keep this cruft in the
> > Wikipedia code to support those people who don't want to migrate their
> > databases, or will just clean it up and tell them to migrate their
> > database, or stay with the old code (which is perfectly functionable,
> > mostly).
>
> I, personally, am not the least bit interested in making the wikipedia
> code easy to use for others. It's GPL, and open, so others can do with
> it what they like, but my efforts will be 100% toward making Wikipedia
> work, and nothing else. If that inconveniences others who might have
> chosen to use the code, well, that's was their choice.
>
> I do, on the other hand, concede that the ability for others to
> install and use the code does help the code evolve; the more eyeballs
> on it, the better. So I am willing to make a few concessions in
> that direction, but only insofar as they serve the primary goal,
> which is Wikipedia.

Fair enough. Big smileys from me. :)

--
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN
Re: wikipedia dead again [ In reply to ]
Lee Daniel Crocker wrote:

> But the issue is not just the database--it's how to implement the
> wfInsertID() function in the wiki code. PHP's Postgres functions don't
> have anything like it, so you'll have to do something like get_last_oid()
> and then retrieve the serial field value by OID.

Why I think I misunderstood something? Please tell me what's wrong.

The wfInsertID() funktion is used to get a unique autoincremented number
in mySQL. And PHP has a function to return this number to you.
I would now do the pg implementation by adding a 'serial' type to the
table. Usually I don't do anything with this value, because pg will
handle the autoincrement. Normaly mysql should do the same. If you for
some reasons will know which value currently is used, you can get this
in pg by the function currval(sequence_name) and the next value you can
get by nextval(sequence_name).

Current PHP:
function wfInsertId() {return mysql_insert_id( wfGetDB() );}

with pg:
function wfInsertId() {
return pq_query( wfGetDB(), "select currval('sequence_name')\;");
}

But usually if you e.g. fetching the data to display an articel you will
get the serial as well.

By the way: using OID's is bad, very bad. Thats an internal value,
noboby should use.

Smurf
--
------------------------- Anthill inside! ---------------------------

1 2  View All