Mailing List Archive

Implementation of an lru_cache() decorator that ignores the first argument
Hi all,

In a (Flask) web application I often find that many identical (SQLAlchemy) queries
are executed across subsequent requests. So I tried to cache the results of
those queries at the module level like this:

@lru_cache()
def query_db(db, args):
    # do the "expensive" query
    return result

This obviously doesn't work because each request uses a new database session,
so the db argument always changes from one call to the next, triggering a new
query against the database. But even if that weren't so, the function would
keep returning the same value forever (unless it's kicked out of the cache) and
not reflect the (infrequent) changes in the database. So what I need is some
decorator that can be used like this:

@lru_ignore_first(timeout=10)
def query_db(db, args):
    # do the "expensive" query
    return result

This is what I came up with. I'm quite happy with it so far. Question: Am I
being too clever? Is it too complicated? Am I overlooking something that will
come back and bite me later? Thanks for any comments!

from functools import wraps, lru_cache
from time import time, sleep


def lru_ignore_first(timeout=0, **lru_args):
    # timeout=0 means cached entries never expire.

    class TimeCloak():
        '''All instances compare equal until timeout expires'''
        __slots__ = ('x', 't', 'timeout')

        def __init__(self, timeout):
            self.timeout = timeout
            self.t = 0
            self.x = None

        def __hash__(self):
            # Hash and equality depend only on the coarse timestamp t,
            # never on the wrapped value x, so lru_cache ignores x.
            return self.t

        def __eq__(self, other):
            return self.t == other.t

        def update(self, x):
            self.x = x
            if self.timeout:
                t = int(time())
                if t >= self.t + self.timeout:
                    # Advance the timestamp; the cloak no longer matches
                    # the cached key, forcing a fresh call to func.
                    self.t = t

    cloak = TimeCloak(timeout)

    def decorator(func):

        @lru_cache(**lru_args)
        def worker(cloak, *a, **kw):
            return func(cloak.x, *a, **kw)

        @wraps(func)
        def wrapped(first, *a, **kw):
            # Smuggle the first argument past lru_cache inside the cloak.
            cloak.update(first)
            return worker(cloak, *a, **kw)

        return wrapped

    return decorator


@lru_ignore_first(3)
def expensive(first, par):
    '''This takes a long time'''
    print('Expensive:', first, par)
    return par * 2


for i in range(10):
    r = expensive(i, 100)
    sleep(1)
    print(r)
Re: Implementation of an lru_cache() decorator that ignores the first argument
On Thu, 29 Sept 2022 at 05:36, Robert Latest via Python-list
<python-list@python.org> wrote:
> In a (Flask) web application I often find that many identical (SQLAlchemy) queries
> are executed across subsequent requests. So I tried to cache the results of
> those queries at the module level like this:
>
> @lru_cache()
> def query_db(db, args):
>     # do the "expensive" query
>     return result
>
> ...
> This is what I came up with. I'm quite happy with it so far. Question: Am I
> being too clever? Is it too complicated? Am I overlooking something that will
> come back and bite me later? Thanks for any comments!
>
> def lru_ignore_first(timeout=0, **lru_args):
> ...

I think this code is fairly specific to what you're doing, which means
the decorator won't be as reusable (first hint of that is the entire
"timeout" feature, which isn't mentioned at all in the function's
name). So it's probably not worth trying to do this multi-layered
approach, and it would be as effective, and a lot simpler, to just
have code at the top of the query_db function to do the cache lookup.
But you may find that your database is *itself* able to do this
caching for you, and it will know when to evict from cache. If you
really have to do it yourself, keep it really really simple, but have
an easy way *in your own code* to do the cache purge; that way, you
guarantee correctness, even at the expense of some performance.
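
Something dead simple along these lines would do (an illustrative sketch
only; do_expensive_query and the cache-key layout are placeholders, not
your actual code):

_query_cache = {}

def query_db(db, args):
    # The db session is deliberately NOT part of the cache key.
    if args not in _query_cache:
        _query_cache[args] = do_expensive_query(db, args)  # placeholder
    return _query_cache[args]

def purge_query_cache():
    # Call this from whatever code path modifies the underlying data,
    # so correctness never depends on guessing a timeout.
    _query_cache.clear()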

In terms of overall database performance, though: are you using
transactions correctly? With PostgreSQL, especially, the cost of doing
a series of queries in one transaction is barely higher than doing a
single query in a transaction; or, putting it the other way around,
doing several sequential transactions costs several times as much as
doing one combined transaction. Check to see that you aren't
accidentally running in autocommit mode or anything. It could save you
a lot of hassle!
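
For illustration, with SQLAlchemy Core the combined-transaction version
looks roughly like this (a sketch; the DSN and table names are made up):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql:///example")  # made-up DSN

# Several queries inside ONE transaction: the per-transaction cost
# (BEGIN/COMMIT and the commit's fsync) is paid only once.
with engine.begin() as conn:
    users = conn.execute(text("SELECT * FROM users")).fetchall()
    posts = conn.execute(text("SELECT * FROM posts")).fetchall()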

ChrisA
Re: Implementation of an lru_cache() decorator that ignores the first argument
On 29/09/2022 07.22, Robert Latest via Python-list wrote:
...

> This is what I came up with. I'm quite happy with it so far. Question: Am I
> being too clever? is it too complicated? Am I overlooking something that will
> come back and bite me later? Thanks for any comments!

Thank you for the chuckle: "Yes", you are clever; and "yes", this is
likely a bit too clever (IMHO).

The impression is that LRU will put something more concrete 'in front
of' SQLAlchemy - which is more abstract, and which is in turn 'in front
of' the RDBMS (which is concrete...). Is this the code-smell making
one's nose suspicious?

The criticism is of SQLAlchemy. If the problem can't be solved with that
tool, perhaps it is not the right-tool-for-the-job...

Bias: With decades of SQL/RDBMS experience, it is easy to say, "drop the
tool".

+1 @Chris: depending upon how many transactions come between, it seems
likely you'll find that the RDBMS will cache sufficiently, as SOP.

YMMV, ie there's only one way to find-out!
--
Regards,
=dn
Re: Implementation of an lru_cache() decorator that ignores the first argument
Hi Chris and dn,

thanks for your --as usual-- thoughtful and interesting answers. Indeed, when
doing these web applications I find that there are several layers of useful,
maybe less useful, and unknown caching. Many of my requests rely on a
notoriously unreliable read-only database outside of my control, so I cache the
required data into a local DB on my server, then I do some in-memory caching of
expensive data plots because I haven't figured out how to reliably exploit the
client-side caching ... then every middleware on that path may or may not
implement its own version of clever or not-so-clever caching. Probably not a
good idea to try and outsmart that by adding yet another thing that may break
or not be up-to-date at the wrong moment.

That said, the only caching that SQLAlchemy does (to my knowledge) is that it
stores retrieved DB items by their primary keys in the session. Not worth much
since the session gets created and dumped on each request by SQA's unit of work
paradigm. But the DB backend itself may be caching repeated queries.
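
For example, within one session (illustrative only; this assumes a mapped
User model and an engine, neither of which is my real code):

from sqlalchemy.orm import Session

with Session(engine) as session:
    u1 = session.get(User, 1)   # emits a SELECT
    u2 = session.get(User, 1)   # no SQL: served from the identity map
    assert u1 is u2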

Back to Python-theory: The "Cloak" object is the only way I could think of to
sneak changing data past lru_cache's key lookup mechanism. Is there some other
method? Just curious.

Re: Implementation of an lru_cache() decorator that ignores the first argument
On Thursday, September 29th, 2022 at 07:18, Robert Latest via Python-list <python-list@python.org> wrote:


> Hi Chris and dn,
>
> thanks for your --as usual-- thoughtful and interesting answers. Indeed, when
> doing these web applications I find that there are several layers of useful,
> maybe less useful, and unknown caching. Many of my requests rely on a
> notoriously unreliable read-only database outside of my control, so I cache the
> required data into a local DB on my server, then I do some in-memory caching of
> expensive data plots because I haven't figured out how to reliably exploit the
> client-side caching ... then every middleware on that path may or may not
> implement its own version of clever or not-so-clever caching. Probably not a
> good idea to try and outsmart that by adding yet another thing that may break
> or not be up-to-date at the wrong moment.
>
> That said, the only caching that SQLAlchemy does (to my knowledge) is that it
> stores retrieved DB items by their primary keys in the session. Not worth much
> since the session gets created and dumped on each request by SQA's unit of work
> paradigm. But the DB backend itself may be caching repeated queries.
>
> Back to Python-theory: The "Cloak" object is the only way I could think of to
> sneak changing data past lru_cache's key lookup mechanism. Is there some other
> method? Just curious.
>

You could use closures. For example, something like this:

import functools
import time


def my_cache(timeout):
    start = time.monotonic()

    def cache_decorator(func):
        wrapper = _my_cache_wrapper(func, timeout, start)
        return functools.update_wrapper(wrapper, func)

    return cache_decorator


def _my_cache_wrapper(func, timeout, start):
    first = None

    @functools.cache
    def _cached(timeout_factor, *args):
        print("In the _cached function")
        return func(first, *args)

    def wrapper(*args):
        print("In the wrapper")
        nonlocal first
        first, *rest = args

        elapsed = time.monotonic() - start
        timeout_factor = elapsed // timeout

        return _cached(timeout_factor, *rest)

    return wrapper


@my_cache(3)
def expensive(first, second, third):
    print("In the expensive function")
    return (first, second, third)


if __name__ == "__main__":
    print(expensive(1, 2, 3))
    print()
    time.sleep(2)
    print(expensive(2, 2, 3))
    print()
    time.sleep(2)
    print(expensive(3, 2, 3))

This should output the following:

In the wrapper
In the _cached function
In the expensive function
(1, 2, 3)

In the wrapper
(1, 2, 3)

In the wrapper
In the _cached function
In the expensive function
(3, 2, 3)


It's not necessarily better than your version though. :D


Kind regards,
Heinrich Kruger