Mailing List Archive

KeyError from Shared.DC.DA.DA
Hi,

Under load, we're seeing KeyErrors from the _cached_result method of the DA
class in the code used to clear out cached results. We're using Python
2.3.5, Zope 2.8.5, mxODBCZopeDA-1.0.8 running on RHEL3 connecting to a
remote Microsoft SQL Server 2000 database.

Note this is a different issue to Collector #1745.

Here's the salient part of the traceback:

File "/opt/intranet/cbfmnet/zope_home/Products/Membership/Member.py", line
118, in _getDBValue
values = self._getDBValues(attributes=[attribute])
File "/opt/intranet/cbfmnet/zope_home/Products/Membership/Member.py", line
96, in _getDBValues
, userid=login_elements[1]
File "/opt/intranet/cbfmnet/zope/lib/python/Shared/DC/ZRDB/DA.py", line
453, in __call__
result=self._cached_result(DB__, (query, self.max_rows_))
File "/opt/intranet/cbfmnet/zope/lib/python/Shared/DC/ZRDB/DA.py", line
375, in _cached_result
if int(cache[q][0]) == key:
KeyError: ("execute dbo.EmployeeByNT 'DOMAIN', 'user'\n", 1000, '\nDBConnId:
None')

The ZSQL Method EmployeeByNT looks like this (it's actually an FSZSQL method
in a skin):

<dtml-comment>
connection_id: DB_Connection
arguments: domain userid
cache_time: 60
max_cache: 50000
</dtml-comment>
execute dbo.EmployeeByNT <dtml-sqlvar domain type="string">, <dtml-sqlvar userid type="string">

We do have other ZSQL methods, but this one is by far the most heavily used.


Here's (most of) the offending method, _cached_result. I've snipped some
code which just appends the DB connection id to the query; note also that
Bucket is defined as lambda: {} at the top of DA.py:

# Try to fetch from cache
if hasattr(self,'_v_cache'): cache=self._v_cache
else: cache=self._v_cache={}, Bucket()
cache, tcache = cache
max_cache=self.max_cache_
now=time()
t=now-self.cache_time_
if len(cache) > max_cache / 2:
    keys=tcache.keys()
    keys.reverse()
    while keys and (len(keys) > max_cache or keys[-1] < t):
        key=keys[-1]
        q=tcache[key]
        del tcache[key]
        if int(cache[q][0]) == key:
            del cache[q]
        del keys[-1]

if cache.has_key(query):
    k, r = cache[query]
    if k > t: return r

# call the pure query
result=apply(DB__.query, pure_query)
if self.cache_time_ > 0:
    tcache[int(now)]=query
    cache[query]= now, result

return result

In this code, 'tcache' is a dict mapping integers representing the time
(the result of calling int(t), where t is the floating-point result of
time.time()) to queries, and 'cache' maps queries to a 2-tuple of the
floating-point timestamp of when the method was called and the result of
the query.

I've been trying to figure out a circumstance which could cause the KeyError
we're seeing in production, without success.

Here are some observations which may give some clues:

The result of time.time() is converted to an integer and used as a key
into tcache. AFAICS, this means that only one entry per second will be
stored in tcache, even though every query will be stored in cache. It also
looks like cache could be the source of a memory leak.

For example:
- 'SELECT a FROM b' and 'SELECT c FROM b' run in the same one-second window.
- Only 'SELECT c FROM b' ends up stored in tcache (it overwrites the tcache
entry for 'SELECT a FROM b').
- Both queries end up in cache.
- 'SELECT a FROM b' will only ever be removed from cache if it is executed
once more later that second - and then 'SELECT c FROM b' will remain.

We only see the KeyError after 'some time' in the application. Presumably
this is the time it takes for the caches to fill and start actively removing
entries as per the while loop. (We restarted the cluster last night, and
have yet to see the errors return).

The SQL query we are running maps an NT credential (domain and userid) to
an internal user id from our corporate database. To give an idea of the
usage pattern of this ZSQL method, it gets run at least once per request,
with the application serving approximately 150,000 - 180,000 requests per
day from 12,000 - 14,000 distinct users (and hence that many distinct sets
of arguments to the ZSQL method).

I'm going to put some logging code around the line that causes the KeyError
next time we do a release. However - has anyone seen this before? Can anyone
see what might be going wrong here?

Cheers,
Dan

--
dan.fairs@rbos.com | GBMT | Royal Bank of Scotland plc


_______________________________________________
Zope-DB mailing list
Zope-DB@zope.org
http://mail.zope.org/mailman/listinfo/zope-db
Re: KeyError from Shared.DC.DA.DA
FAIRS, Dan, GBM wrote at 2006-9-25 08:58 +0100:
> ...
>Here's the salient part of the traceback:
>
> File "/opt/intranet/cbfmnet/zope_home/Products/Membership/Member.py", line
>118, in _getDBValue
> values = self._getDBValues(attributes=[attribute])
> File "/opt/intranet/cbfmnet/zope_home/Products/Membership/Member.py", line
>96, in _getDBValues
> , userid=login_elements[1]
> File "/opt/intranet/cbfmnet/zope/lib/python/Shared/DC/ZRDB/DA.py", line
>453, in __call__
> result=self._cached_result(DB__, (query, self.max_rows_))
> File "/opt/intranet/cbfmnet/zope/lib/python/Shared/DC/ZRDB/DA.py", line
>375, in _cached_result
> if int(cache[q][0]) == key:
>KeyError: ("execute dbo.EmployeeByNT 'DOMAIN', 'user'\n", 1000, '\nDBConnId:
>None')
>

The "DBConnId" is very strange ("None" rather than "DB_Connection").

Somehow, the "tcache" and "cache" variables are inconsistent.
I do not know, however, how this can happen.

A theoretical possibility would be two different "tcache" entries
tcache[i1] = tcache[i2] = query.
Then "cache[query]" would be deleted for the first entry
and the second entry would result in a "KeyError".

However, such a case should be impossible...
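For what it's worth, that "impossible" case can be forced in a standalone
model of the purge loop if one assumes tcache.keys() does not come back
sorted ascending; that seems plausible here, since Bucket is just
lambda: {} and plain-dict key order carries no guarantee in Python 2.3.
A simplified sketch, not the real DA.py code:

```python
Q = "execute dbo.EmployeeByNT 'DOMAIN', 'user'\n"

def purge(cache, tcache, max_cache, t):
    # Simplified copy of the purge loop in _cached_result.  The real
    # code appears to assume keys come back sorted ascending (as from
    # a BTree Bucket); with a plain dict that is not guaranteed.
    keys = list(tcache.keys())
    keys.reverse()
    while keys and (len(keys) > max_cache or keys[-1] < t):
        key = keys[-1]
        q = tcache[key]
        del tcache[key]
        if int(cache[q][0]) == key:  # KeyError if cache[q] was already purged
            del cache[q]
        del keys[-1]

# The same query cached in two different seconds leaves two tcache
# entries (i1=100, i2=101) pointing at one cache entry:
cache = {Q: (101.0, 'result')}
tcache = {101: Q, 100: Q}  # iteration order yields 101 first, i.e. unsorted

try:
    purge(cache, tcache, max_cache=0, t=200.0)  # everything is stale
except KeyError as e:
    # 101 is processed first and deletes cache[Q]; 100 then raises.
    print('KeyError:', e)
```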


You might want to use my "CCSQLMethods", which is more efficient
anyway (it uses a module-level cache rather than a ZODB cache)
and will definitely avoid the problem above.

--
Dieter
Re: KeyError from Shared.DC.DA.DA
>The "DBConnId" is very strange ("None" rather than "DB_Connection").
>
>Somehow, the "tcache" and "cache" variables are inconsistent.
>I do not know however, how this can happen.
>
>A theoretical possibility would be two different "tcache" entries
>tcache[i1] = tcache[i2] = query.
>Then "cache[query]" would be deleted for the first entry
>and the second entry would result in a "KeyError".
>
>However, such a case should be impossible...
>
>
>
Interesting issue. I think the simplest thing to do is to add some
logging inside _cached_result. It should let us track the cache state,
and the operations performed on it, just before the KeyError.
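A sketch of what that instrumentation might look like, using the standard
logging module and a hypothetical purge_entry() helper; the real change
would be pasted inline into _cached_result (zLOG would work equally well
on Zope 2.8):

```python
import logging

logging.basicConfig(level=logging.ERROR)
logger = logging.getLogger('Shared.DC.ZRDB.DA')

def purge_entry(cache, tcache, key):
    # Hypothetical instrumented version of one purge-loop iteration.
    q = tcache.pop(key)
    try:
        if int(cache[q][0]) == key:
            del cache[q]
    except KeyError:
        # Log everything needed to reconstruct the inconsistency,
        # then re-raise so behaviour is otherwise unchanged.
        logger.error('purge inconsistency: key=%r q=%r cache keys=%r '
                     'remaining tcache=%r', key, q, list(cache), tcache)
        raise
```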

--
Maciej Wisniowski


