Mailing List Archive

Crawling Python
Hello everyone,

While playing with anydbm module using IDLE 0.4, I have made Python to do the
following:

Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import anydbm
>>> d=anydbm.open('c:\\any.db','n')
>>> for i in range(1000):
d[`i`]=`range(i,i+10)`


>>> d.close()
>>>
>>> d=anydbm.open('c:\\any.db')
>>> s=''
>>> for i in d.keys():
s=s+d[i]


>>>
>>> len(s)
49035
>>>


Then I tried was typing s only to see the string. Shell printed it after some 20
seconds, and then came to a crawl. It responded to a command once and then it
hang indefinitely. Techically it does not crash, but it displays a prompt,
accepts command and processes it for minutes for some unfathomable reason.
Curioser and curioser, I have said to myself. Can't I have ~50K strings without
Python crawling? I thought that unlike in other cases crawling was not max
performance for _this_ kind of animal?



v----e----r----y------s---l---o----w-----l----y------y'rs


PS. I have tried the same thing in DOS executable -- above does not happen.
Looks like only IDLE is affected. This is second feature, err, bug I have
found in IDLE. Maybe somebody is willing to give me a job? ;-)))








MK


--------------------------------------------------
Reality is something that does not disappear after
you cease believing in it - VALIS, Philip K. Dick
--------------------------------------------------

Delete _spamspamlovelyspam_ from address to email me

postmaster@127.0.0.1
root@127.0.0.1
webmaster@127.0.0.1
Crawling Python [ In reply to ]
[MK]
> While playing with anydbm module using IDLE 0.4, I have made
> Python to do the following:
>
> Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> [... a bunch of code building a long string ...]
> >>> len(s)
> 49035
> >>>
>
> Then I tried was typing s only to see the string. Shell printed
> it after some 20 seconds, and then came to a crawl.

I've noticed that too. Don't know whether it's the IDLE code or the Tk Text
widget that's unhappy. Probably both; e.g., IDLE spends a lot of time
syntax-coloring the junk, and the Tk Text docs warn that long lines can be
expensive.

Everything speeds up back to normal if you simply delete the giant string
from the window.

> ...
> PS. I have tried the same thing in DOS executable -- above does
> not happen.

Sure, and in return none of IDLE's history, undo/redo, searching, replacing,
scrolling or editing cmds work in a DOS box either <wink>.

adaptively y'rs - tim
Crawling Python [ In reply to ]
On Thu, 17 Jun 1999 20:02:39 -0400, "Tim Peters" <tim_one@email.msn.com> wrote:

>> PS. I have tried the same thing in DOS executable -- above does
>> not happen.

>Sure, and in return none of IDLE's history, undo/redo, searching, replacing,
>scrolling or editing cmds work in a DOS box either <wink>.

True, that's not what I meant, however -- whether it is Python or IDLE that
is affected by the problem. Since Python itself is OK, no problem.

Strangling IDLE continued:

in DOS box I tried

>>> s=''
>>> for i in range(10000):
... s=s+`range(i,i+20)`
...

>>> len(s)
1178515

and everything is OK.

While the same loop in IDLE crashed it (I mean it hangs for some period, then it
stops responding and has to be killed).







MK


--------------------------------------------------
Reality is something that does not disappear after
you cease believing in it - VALIS, Philip K. Dick
--------------------------------------------------

Delete _spamspamlovelyspam_ from address to email me

postmaster@127.0.0.1
root@127.0.0.1
webmaster@127.0.0.1
Crawling Python [ In reply to ]
In article <376a1ac1.10261180@news.tpnet.pl>,
mark@_spamspamlovelyspam_btweng.krakow.pl (MK) wrote:
> On Thu, 17 Jun 1999 20:02:39 -0400, "Tim Peters"
<tim_one@email.msn.com> wrote:
> -- clip clip --
> Strangling IDLE continued:
>
> in DOS box I tried
>
> >>> s=''
> >>> for i in range(10000):
> ... s=s+`range(i,i+20)`
> ...
>
> >>> len(s)
> 1178515
>
> and everything is OK.
>
> While the same loop in IDLE crashed it (I mean it hangs for some
period, then it
> stops responding and has to be killed).
>
> MK
> -- clip clip --

I just tried both environments on a WindowsNT system and obtained
130000 for len(s) under IDLE and Console mode python.
IDLE took approx. 25 seconds and Console took approx 20 seconds.


Sam Schulenburg


Sent via Deja.com http://www.deja.com/
Share what you know. Learn what you don't.
Crawling Python [ In reply to ]
mark@_spamspamlovelyspam_btweng.krakow.pl (MK) writes:

> >>> s=''
> >>> for i in d.keys():
> s=s+d[i]

Please, before you complain about Python's speed, think twice about
the algorithm that you are using. The above algorithm is O(n^2),
which means that if you have n keys the time it takes is proportional
to n squared. That's because each time you add to the end of the
string, the whole string has to be copied. (Unless there is a
nontrivial optimization built into Python for this special case, but I
don't think so.)

But if instead you do

>>> import string
>>> s = []
>>> for i in d.keys():
>>> s.append(d[i])
>>> s = string.join(s, '')

then you get exactly the same result in O(n) time, because s.append()
doesn't have to copy each time. For large n the second algorithm will
beat the first one every time, by an increasingly huge margin.

It may be that idle has problems too, but at least part of this
problem can be blamed on the algorithm.

Michael

--
Michael Haggerty
mhagger@blizzard.harvard.edu
Crawling Python [ In reply to ]
On Fri, 18 Jun 1999 19:34:03 GMT, Sam Schulenburg <sams@quinta.com> wrote:

>> >>> len(s)
>> 1178515

>I just tried both environments on a WindowsNT system and obtained
>130000 for len(s) under IDLE and Console mode python.

Win98. I tried the same several more times in DOS box and got different results
every time:

>>> for i in range(10000):
... s=s+`range(i,i+20)`
...
>>>
>>>
>>> len(s)
1645360
>>>
>>>
>>> s=''
>>>
>>> for i in range(10000):
... s=s+`range(i,i+20)`
...
>>> len(s)
1178515
>>>
>>> for i in range(10000):
... s=s+`range(i,i+20)`
...
>>> len(s)
2357030
>>>

Now _that_ is weird.





MK


--------------------------------------------------
Reality is something that does not disappear after
you cease believing in it - VALIS, Philip K. Dick
--------------------------------------------------

Delete _spamspamlovelyspam_ from address to email me

postmaster@127.0.0.1
root@127.0.0.1
webmaster@127.0.0.1
Crawling Python [ In reply to ]
On 18 Jun 1999 16:12:26 -0400, Michael Haggerty <mhagger@blizzard.harvard.edu>
wrote:

>mark@_spamspamlovelyspam_btweng.krakow.pl (MK) writes:
>
>> >>> s=''
>> >>> for i in d.keys():
>> s=s+d[i]
>
>Please, before you complain about Python's speed, think twice about
>the algorithm that you are using.

I don't complain about Python's speed. The speed of Python is fine with me.
What I rather meant to test was performance of anydbm. Now I do know
that daemon of speed it is not. I only wanted to get a grasp how _actually_
it perfoms.


>The above algorithm is O(n^2),
>which means that if you have n keys the time it takes is proportional
>to n squared.

Now I am not _that_ lame not to know what O(n^2) means.

>That's because each time you add to the end of the
>string, the whole string has to be copied.

Oh shit, you're right, only know I realized that s=s+d[i] actually does that
which results in given class of algorithm.

>(Unless there is a
>nontrivial optimization built into Python for this special case, but I
>don't think so.)

>But if instead you do
>
>>>> import string
>>>> s = []
>>>> for i in d.keys():
>>>> s.append(d[i])
>>>> s = string.join(s, '')

>then you get exactly the same result in O(n) time, because s.append()
>doesn't have to copy each time.

Thanks. I suspected that .append does not have to copy but was not sure.

>For large n the second algorithm will
>beat the first one every time, by an increasingly huge margin.

Well, good performance of this thing was not important to
me in first place, I actually tried to do some 'crash tests' in
order to find out limits of the system. If I meant performance,
I would write this thing in a different way definitely.

>It may be that idle has problems too, but at least part of this
>problem can be blamed on the algorithm.

Version of IDLE is still 0.4. It works surprisingly good for such an early
version. I would like a few things here and there (for example, the only thing I
sorely miss in IDLE are breakpoints), but overally it's already fine.





MK


--------------------------------------------------
Reality is something that does not disappear after
you cease believing in it - VALIS, Philip K. Dick
--------------------------------------------------

Delete _spamspamlovelyspam_ from address to email me

postmaster@127.0.0.1
root@127.0.0.1
webmaster@127.0.0.1
Crawling Python [ In reply to ]
[MK]
> ...
> Win98. I tried the same several more times in DOS box and got
> different results every time:
>
> >>> for i in range(10000):
> ... s=s+`range(i,i+20)`
> ...
> >>>
> >>>
> >>> len(s)
> 1645360
> >>>

This didn't show len(s) *before* the loop began, so all that can be
concluded is that the loop added no more than 1645360 bytes to s.

> >>> s=''
> >>>
> >>> for i in range(10000):
> ... s=s+`range(i,i+20)`
> ...
> >>> len(s)
> 1178515
> >>>

This one shows the loop adding exactly 1178515 bytes -- which it does for me
too.

> >>> for i in range(10000):
> ... s=s+`range(i,i+20)`
> ...
> >>> len(s)
> 2357030
> >>>

s wasn't reset before this loop, so the loop adds an additional 1178515
bytes on top of the 1178515 s started with, and 1178515 + 1178515 = 2357030.

> Now _that_ is weird.

not-really<wink>-ly y'rs - tim
Crawling Python [ In reply to ]
Michael Haggerty wrote:
> >>> import string
> >>> s = []
> >>> for i in d.keys():
> >>> s.append(d[i])
> >>> s = string.join(s, '')
>
> then you get exactly the same result in O(n) time, because s.append()
> doesn't have to copy each time. For large n the second algorithm will
> beat the first one every time, by an increasingly huge margin.

How about...

from array import array
s = array('c')
for e in d.values():
s.fromstring(e)
s = s.tostring()

...?