Mailing List Archive

Kosovo database; Python speed
Suppose we were going to make a database to help Kosovars locate
their family members. This would probably result in hundreds of
thousands of records (say 1 record (file) per person).

Would Python be fast enough to manage this data, make queries on
the data, or should compiled programs be used?

Richard.
Kosovo database; Python speed [ In reply to ]
Richard van de Stadt wrote:
>
> Suppose we were going to make a database to help Kosovars locate
> their family members. This would probably result in hundreds of
> thousands of records (say 1 record (file) per person).
>
> Would Python be fast enough to manage this data, make queries on
> the data, or should compiled programs be used?

Depends on what queries you make, but if used smartly, Python can
probably be fast enough. From what I've heard, so is Gadfly (a
database implemented in Python).

Another alternative is to use Python in combination with an external
database, and communicate with the database using SQL. This is pretty
fast. For more info, see:

http://www.python.org/topics/database/
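(A minimal sketch of the Python-plus-SQL approach; sqlite3 stands in here for whichever external database module you pick, and the table layout and sample names are invented for illustration.)

```python
import sqlite3

# In-memory database for illustration; a real deployment would
# connect to an external database server instead.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE persons (
    id INTEGER PRIMARY KEY,
    last_name TEXT,
    first_name TEXT,
    last_seen TEXT)""")

# Sample records (names invented for the sketch).
records = [
    (1, "Berisha", "Agim", "Blace"),
    (2, "Krasniqi", "Vjosa", "Kukes"),
]
conn.executemany("INSERT INTO persons VALUES (?, ?, ?, ?)", records)

# The query work happens inside the database engine, which is why
# this stays fast even with hundreds of thousands of rows.
rows = conn.execute(
    "SELECT first_name, last_seen FROM persons WHERE last_name = ?",
    ("Berisha",)).fetchall()
print(rows)  # [('Agim', 'Blace')]
```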

Another thing you may want to look at is Zope -- they have an object
database implemented in Python. It's web oriented, but perhaps that is
what you want:

http://www.zope.org

Regards,

Martijn
Kosovo database; Python speed [ In reply to ]
Richard van de Stadt wrote:
>
> Suppose we were going to make a database to help Kosovars locate
> their family members. This would probably result in hundreds of
> thousands of records (say 1 record (file) per person).
>
> Would Python be fast enough to manage this data, make queries on
> the data, or should compiled programs be used?

Depending on the operating system, I'd suggest using a
database extension.
Controlling this database from Python will give you
enough speed. If I had to do this, my preferences would be

mySQL for Linux, with its interface,
MS-Access for Windows, with a COM interface.

The latter not because I like it so much, but because we have
used it before, and the interfaces are there.

Since the Kosovars need help quickly, I'd use this combination
instead of writing something special. Python alone will not
be too easy, since your data will probably not fit into memory.
You will also have lots of edits, so I think using a true
database is the better choice here. (Not saying that Access is
a true database, but it works fine with several 100000 records).

But two columns alone, with a name and a record ID, will fit,
so your code might extract this info as a whole, map it to a dict
and search it in some sophisticated manner. This can be even faster
than the database.
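(The extract-names-into-a-dict idea above might look like this; the record layout and sample names are made up for the sketch.)

```python
# Each record on disk is assumed to be keyed by an integer ID;
# only the (id, name) pairs are pulled into memory, which fits
# easily even for hundreds of thousands of records.
records = [
    (1, "Berisha"),
    (2, "Krasniqi"),
    (3, "Berisha"),
]

# Map each name to the list of record IDs carrying it.
index = {}
for rec_id, name in records:
    index.setdefault(name.lower(), []).append(rec_id)

def lookup(name):
    """Exact-match search; returns the record IDs to fetch from disk."""
    return index.get(name.lower(), [])

print(lookup("berisha"))  # [1, 3]
```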
Do you have more info on the amount of data, fields per record,
and what search capabilities are needed? Is it designed as a web
based application? Are there on-line updates and such?

ciao - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Kosovo database; Python speed [ In reply to ]

Richard van de Stadt wrote:
>
> Suppose we were going to make a database to help Kosovars locate
> their family members. This would probably result in hundreds of
> thousands of records (say 1 record (file) per person).
>
> Would Python be fast enough to manage this data, make queries on
> the data, or should compiled programs be used?

Given the purpose I would suggest the following:

1. Design an XML document which represents the entry form
refugees would fill in. Make it as complete as possible,
since you don't know what kind of statistics you will
have to produce from those forms.

2. Make a small web application that collects those documents
and automatically stores them on a web server as XML and HTML.
Basically it would consist of a CGI upload form.

3. Use a web search engine to index the HTML web.

What you end up with is an effective way to collect data entered
off-line, and to publish and search it worldwide, with no need to
distribute software (everything happens on the web server).

Later you may build your own database(s) from the web, once
there is enough data and you know which statistics you want ;-)

Then mySQL should prove to be good enough.
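(One way step 1's XML entry form could be sketched in Python; the field names are invented, and xml.etree is used only because it keeps the example self-contained.)

```python
import xml.etree.ElementTree as ET

def make_entry(fields):
    """Build one refugee entry as an XML document string.

    `fields` maps invented field names (last_name, camp, ...) to the
    values filled in on the form.
    """
    root = ET.Element("entry")
    for tag, value in fields.items():
        child = ET.SubElement(root, tag)
        child.text = value
    return ET.tostring(root, encoding="unicode")

doc = make_entry({"last_name": "Berisha", "first_name": "Agim",
                  "camp": "Stenkovec"})
print(doc)  # <entry><last_name>Berisha</last_name>...</entry>
```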


Laurent
Kosovo database; Python speed [ In reply to ]
I presume that a SQL engine would be the fastest solution, and all the
programming, in my opinion, should be done in Python using a database
connector. It depends on the uses you will give this DB, but MySQL should
be the better choice. If you need data consistency (via FK) and a
transactional database, you should perhaps be looking at a commercial SQL
engine like Oracle or MS-SQL ... But if it's mainly for queries, MySQL is
damn fast ... :-)

Hope this helps... and especially I hope it helps a little bit for the
people of Kosovo...

/B

Bruno Mattarollo <bruno@gaiasur.com.ar>
... proud to be a PSA member <http://www.python.org/psa>

> -----Original Message-----
> From: python-list-request@cwi.nl [mailto:python-list-request@cwi.nl]On
> Behalf Of Richard van de Stadt
> Sent: Wednesday, April 21, 1999 7:39 AM
> To: python-list@cwi.nl
> Subject: Kosovo database; Python speed
>
>
> Suppose we were going to make a database to help Kosovars locate
> their family members. This would probably result in hundreds of
> thousands of records (say 1 record (file) per person).
>
> Would Python be fast enough to manage this data, make queries on
> the data, or should compiled programs be used?
>
> Richard.
>
Kosovo database; Python speed [ In reply to ]
In article <371DB466.32097FE5@pop.vet.uu.nl>,
M.Faassen@vet.uu.nl wrote:
> Richard van de Stadt wrote:
> >
> > Suppose we were going to make a database to help Kosovars locate
> > their family members. This would probably result in hundreds of
> > thousands of records (say 1 record (file) per person).
> >
> > Would Python be fast enough to manage this data, make queries on
> > the data, or should compiled programs be used?
>
> Depends on what queries you make, but if used smartly, Python can
> probably be fast enough. From what I've heard, so is Gadfly (a
> database implemented in Python).

It also depends on what you expect the queries to be. For this
kind of problem "grep" might work pretty well, actually.

Gadfly is best at the moment when you are doing a lot of exact matches,
so I'd expect that if you were doing matches on last/first name by exact
spelling, Gadfly would be okay on a sufficiently large machine.
However, for inexact matches I'd recommend other methods, like grep
for example. Generally, if all you have is one big table, something
like Gadfly is less compelling than if you have many interrelated
structures to manage and query. Also look at dbm, gdbm, bplustree,
and similar.

http://www.chordate.com/gadfly.html
http://starship.skyport.net/crew/aaron_watters/bplustree/
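(A sketch of the dbm suggestion: disk-backed exact-key lookups without loading everything into memory. The key/value layout is invented; in modern Python the module is spelled `dbm`, while the 1.5-era equivalent was `anydbm`.)

```python
import dbm
import os
import tempfile

# Open (or create) a disk-backed key/value store.
path = os.path.join(tempfile.mkdtemp(), "names")
db = dbm.open(path, "c")

# Keys and values are bytes; here a lowercased surname maps to a
# comma-separated list of record IDs (layout invented for the sketch).
db[b"berisha"] = b"1,3"
db[b"krasniqi"] = b"2"

# Exact-spelling lookup: only the matching entry is read from disk.
ids = db[b"berisha"].split(b",")
print(ids)  # [b'1', b'3']
db.close()
```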

-- Aaron Watters

===
% ping elvis
elvis is alive
% _

Kosovo database; Python speed [ In reply to ]
aaron_watters@my-dejanews.com wrote:
>
> It also depends on what you expect the queries to be. For this
> kind of problem "grep" might work pretty well, actually.

Seconded. Next step being 'glimpse'.

-Alex
Kosovo database; Python speed [ In reply to ]
Christian Tismer wrote:
>
> Richard van de Stadt wrote:
> >
> > Suppose we were going to make a database to help Kosovars locate
> > their family members. This would probably result in hundreds of
> > thousands of records (say 1 record (file) per person).
> >
> > Would Python be fast enough to manage this data, make queries on
> > the data, or should compiled programs be used?
>
> Depending on the operating system, I'd suggest using a
> database extension.
> Controlling this database from Python will give you
> enough speed. If I had to do this, my preferences would be
>
> mySQL for Linux, with its interface,
> MS-Access for Windows, with a COM interface.
>
> The latter not because I like it so much, but because we have
> used it before, and the interfaces are there.
>
> Since the Kosovars need help quickly, I'd use this combination
> instead of writing something special.

I developed a system over the last few years which allows online
paper submission and retrieval, and we expect it can quite easily
be transformed and reused to create a first prototype. On an old
system (SS10, 128MB RAM), Python is able to copy a test file
about 25000 times per minute, so I expect Python to be fast
enough, but I wondered whether other projects exist which also use
several 100,000s of records.

> Python alone will not
> be too easy, since your data will probably not fit into memory.

We were donated a system that is, I think, used for videoconferencing.
This probably is a Sun system, running Solaris, with Python 1.5.1
available. I expect at least 0.5 GB of RAM.

> You will also have lots of edits, so I think using a true
> database is the better choice here. (Not saying that Access is
> a true database, but it works fine with several 100000 records).
>
> But two columns alone, with a name and a record ID, will fit,
> so your code might extract this info as a whole, map it to a dict
> and search it in some sophisticated manner. This can be even faster
> than the database.
> Do you have more info on the amount of data, fields per record,
> and what search capabilities are needed? Is it designed as a web
> based application? Are there on-line updates and such?

We intend to store any data that might be helpful, which includes photos.
Online submissions may not always be possible from within the camps,
but as refugees are being spread all over Europe, we think that it
could be used more often.

We'd like to collect existing databases, merge them, and provide all
kinds of name-matching possibilities. Offline consulting and submission
should also be available, so Access might be used there.
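(One "name matching possibility" sketched with the standard library's difflib; the surnames and the cutoff value are invented for illustration.)

```python
import difflib

# Invented sample of recorded surnames.
recorded = ["Berisha", "Krasniqi", "Gashi", "Shala"]

def close_names(query, names, cutoff=0.6):
    """Return recorded names that approximately match the query,
    tolerating spelling variants and transliteration differences."""
    return difflib.get_close_matches(query, names, n=5, cutoff=cutoff)

print(close_names("Berischa", recorded))  # ['Berisha']
```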

> ciao - chris
>
> --
> Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
[...]

Richard.