Mailing List Archive

Time complexity of dictionary insertions
As I understood from the Python documentation, dictionaries are
implemented as extensible hash tables. What I didn't find, either in the
references or in the FAQ, is: what is the actual time complexity for an
insertion into a dictionary?

Do the old contents (probably references) have to be copied when
extending (doubling?) the dictionary?

I guess updates and deletions have constant complexity, right?

If the complexity of insertion is something like n*log(n), does anyone
know measurements of "how linear" the real measured times are?


Thanks in advance
--Oliver
Time complexity of dictionary insertions
In article <371F2125.BEC5F892@fzi.de>,
Oliver Ciupke <ciupke@fzi.de> wrote:
> As I understood from the Python documentation, dictionaries are
> implemented as extensible hash tables.

Yes.

> What I didn't find either in the references or in the FAQ is: what is
> the actual time complexity for an insertion into a dictionary?

Min O(1), Max O(N), Ave O(1). If the hash function is doing a terrible job
(e.g. maps every key to the same hash value), make those all O(N).

> Do the old contents (probably references)

Yes.

> have to be copied when extending (doubling?)

Approximately doubling, yes.

> the dictionary?

Right.

> I guess updates and deletions have constant complexity, right?

No; see above for insertion. Deletion is O(1) always, because Python doesn't
try to shrink the table by magic (if you stick in a million keys and then
delete them, the table will still contain a million "this entry isn't being
used" markers).

> If the complexity of insertion is something like n*log(n), does anyone
> know measurements of "how linear" the real measured times are?

In practice, with a non-pathological hash function + key distribution combo,
insertion & deletion act like O(1) on average.

for-more-details-see-the-source-code<wink>ly y'rs - tim

Time complexity of dictionary insertions
tim_one@email.msn.com wrote:
>
> In article <371F2125.BEC5F892@fzi.de>,
> Oliver Ciupke <ciupke@fzi.de> wrote:
> > As I understood from the Python documentation, dictionaries are
> > implemented as extensible hash tables.
>
> Yes.
>
> > What I didn't find either in the references or in the FAQ is: what is
> > the actual time complexity for an insertion into a dictionary?
>
> Min O(1), Max O(N), Ave O(1). If the hash function is doing a terrible job
> (e.g. maps every key to the same hash value), make those all O(N).

C++ STL junkies know this as "amortized constant time".
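For anyone who hasn't seen the doubling argument, here's a minimal sketch
(a toy model of table growth, not Python's actual resizing policy, which
resizes at a load-factor threshold rather than when completely full): each
element is copied at most once per doubling, so the total number of copies
over n inserts stays under 2*n, and the average cost per insert is bounded
by a constant:

def count_copies(n):
    # Simulate a table that doubles when full, counting how many
    # element copies the resizes perform over n insertions.
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:       # full: double and re-insert everything
            copies = copies + size
            capacity = capacity * 2
        size = size + 1
    return copies

for n in (10, 100, 1000, 10000):
    print(n, count_copies(n), count_copies(n) / float(n))   # ratio < 2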

=========================================================
Tres Seaver tseaver@palladion.com 713-523-6582
Palladion Software http://www.palladion.com
Time complexity of dictionary insertions
Tres Seaver <tseaver@palladion.com> writes:

> > Min O(1), Max O(N), Ave O(1). If the hash function is doing a terrible job
> > (e.g. maps every key to the same hash value), make those all O(N).
>
> C++ STL junkies know this as "amortized constant time".

So does anyone who has ever studied much at all about algorithms, data
structures, and optimization.

It's not a C++ thing. It's a computer science thing.

-Justin
Time complexity of dictionary insertions
[someone asks about the time complexity of Python dict insertions]

[Tim replies]
> Min O(1), Max O(N), Ave O(1). If the hash function is doing
> a terrible job (e.g. maps every key to the same hash value), make
> those all O(N).

[one person confuses the issue]
> C++ STL junkies know this as "amortized constant time".

[and another compounds it]
> So does anyone who has ever studied much at all about algorithms, data
> structures, and optimization.
>
> It's not a C++ thing. It's a computer science thing.

This one-ups-man-ship would be a lot cuter if Python's dict insertion were
in fact amortized constant time <0.9 wink>. It's not, and the answer I gave
doesn't imply that it is. Insertion in STL hashed associative containers
isn't ACT either.

reassuringly y'rs - tim
Time complexity of dictionary insertions
[posted and mailed]

Tim Peters <tim_one@email.msn.com> wrote in
<000901be8e11$b4842560$f09e2299@tim>:

>[someone asks about the time complexity of Python dict insertions]
>
>[Tim replies]
>[one person confuses the issue]
>[and another compounds it]
>This one-ups-man-ship would be a lot cuter if Python's dict insertion
>were in fact amortized constant time <0.9 wink>. It's not, and the
>answer I gave doesn't imply that it is. Insertion in STL hashed
>associative containers isn't ACT either.
>
This attempt at one-ups-man-ship would be a lot cuter if the STL had any
kind of hashed containers, associative or otherwise, whose performance
could be quoted! <tentative wink>

>reassuringly y'rs - tim

Argumentatively y'rs - dan.

--
Dan Smart. C++ Programming and Mentoring.
cppdan@dansmart.com
Time complexity of dictionary insertions
> This attempt at one-ups-man-ship would be a lot cuter if the STL had any
> kind of hashed containers, associative or otherwise, whose performance
> could be quoted! <tentative wink>
>
> ...
>
> Argumentatively y'rs - dan.

I specifically had in mind

http://www.sgi.com/Technology/STL/HashedAssociativeContainer.html

and its derivatives. I understand that this goes beyond the officially
blessed C++ STL, even to the point of being useful <0.9 wink>.

no-guts-no-glory-ly y'rs - tim
Time complexity of dictionary insertions
On Sat, 24 Apr 1999, Tim Peters wrote:

> [someone asks about the time complexity of Python dict insertions]
>
> [Tim replies]
> > Min O(1), Max O(N), Ave O(1). If the hash function is doing
> > a terrible job (e.g. maps every key to the same hash value), make
> > those all O(N).

<snipped discussion of whether Amortized Constant Time is a C++/STL/CS
thing>

> This one-ups-man-ship would be a lot cuter if Python's dict insertion were
> in fact amortized constant time <0.9 wink>. It's not, and the answer I gave
> doesn't imply that it is. Insertion in STL hashed associative containers
> isn't ACT either.

This is interesting. What is the worst-case behaviour of Python dicts?

but-it-doesn't-really-matter-'cause-it-takes-finite-time-anyway-ly y'rs,
Z.

--
Moshe Zadka <mzadka@geocities.com>.
QOTD: What fun to me! I'm not signing permanent.
Time complexity of dictionary insertions
[someone asks about the time complexity of Python dict insertions]
[Tim]
> Min O(1), Max O(N), Ave O(1). If the hash function is doing
> a terrible job (e.g. maps every key to the same hash value), make
> those all O(N).

[Moshe Zadka, getting back to the topic <wink>]
> This is interesting. What is the worst-case behaviour of Python dicts?

"If the hash function is doing a terrible job (e.g. maps every key to the
same hash value), make those all O(N)." That implies O(N**2) for a sequence
of N inserts.

> but-it-doesn't-really-matter-'cause-it-takes-finite-time-anyway-ly y'rs,
> Z.

Oh, it matters a lot! I'll attach a test case, showing how to consume 4
minutes with about 2000 measly inserts. Python's tuple hash function once
did a very poor job on int 2-tuples of the form (i, i+-1), which made
marching over diagonals of sparse 2D arrays (represented as dicts) a major
exercise in faith <wink>.

if-it-weren't-for-probability-we'd-all-be-dead-ly y'rs - tim

Output from a careless (non-quiet) run, 1.5.2, Win95, P5-166:

Timings for horrid = 0
total # dict entries; elapsed time; ratio to last elapsed
      1     0.00
      2     0.00  1.17
      4     0.00  1.23
      8     0.00  1.36
     16     0.00  2.46
     32     0.00  1.29
     64     0.01  1.46
    128     0.01  1.58
    256     0.02  1.76
    512     0.03  1.94
   1024     0.06  1.91
   2048     0.11  1.96

Timings for horrid = 1
total # dict entries; elapsed time; ratio to last elapsed
      1     0.00
      2     0.00  2.91
      4     0.00  3.35
      8     0.00  3.81
     16     0.02  4.14
     32     0.06  3.90
     64     0.28  4.76
    128     1.20  4.25
    256     3.91  3.26
    512    14.66  3.75
   1024    57.85  3.95
   2048   231.16  4.00

Code:

import time

class Horrid:
    N = 0

    def __init__(self, horrid=1):
        self.value = Horrid.N
        Horrid.N = Horrid.N + 1
        self.horrid = horrid

    def __hash__(self):
        if self.horrid:
            # horrid mode: every instance hashes to the same value,
            # so every insert collides with every existing entry
            return 42
        else:
            return hash(self.value)

    def __cmp__(self, other):
        return cmp(self.value, other.value)

MAX = 2**11

def timeit(horrid):
    clock = time.clock
    d = {}
    elapsed = 0.0
    numtoadd = 1
    while 1:
        if numtoadd + len(d) > MAX:
            break
        stufftoadd = map(Horrid, [horrid] * numtoadd)
        start = clock()
        for thing in stufftoadd:
            d[thing] = 1
        finish = clock()
        lastelapsed = elapsed
        elapsed = elapsed + (finish - start)
        print "%7d %8.2f" % (len(d), elapsed),
        if lastelapsed:
            print "%5.2f" % (elapsed / lastelapsed)
        else:
            print
        numtoadd = len(d)   # double the batch size each pass

for horrid in 0, 1:
    print "\nTimings for horrid =", horrid
    print "total # dict entries; elapsed time; ratio to last elapsed"
    timeit(horrid)
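
For anyone who wants to rerun the experiment on a modern Python, here's a
rough port of the test above (assuming time.perf_counter in place of
time.clock and __eq__ in place of __cmp__; absolute times on current
hardware are far smaller than the 1999 figures, but the horrid = 1 ratios
should still creep toward 4.0, i.e. quadratic growth):

import time

class Horrid:
    N = 0

    def __init__(self, horrid=1):
        self.value = Horrid.N
        Horrid.N += 1
        self.horrid = horrid

    def __hash__(self):
        # horrid mode: every instance collides on the same hash value
        return 42 if self.horrid else hash(self.value)

    def __eq__(self, other):
        return self.value == other.value

MAX = 2 ** 11

def timeit(horrid):
    d = {}
    elapsed = lastelapsed = 0.0
    numtoadd = 1
    while numtoadd + len(d) <= MAX:
        stufftoadd = [Horrid(horrid) for _ in range(numtoadd)]
        start = time.perf_counter()
        for thing in stufftoadd:
            d[thing] = 1
        elapsed += time.perf_counter() - start
        row = "%7d %8.2f" % (len(d), elapsed)
        if lastelapsed:
            row += " %5.2f" % (elapsed / lastelapsed)
        print(row)
        lastelapsed = elapsed
        numtoadd = len(d)   # double the batch size each pass

for horrid in 0, 1:
    print("\nTimings for horrid =", horrid)
    print("total # dict entries; elapsed time; ratio to last elapsed")
    timeit(horrid)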