Mailing List Archive

Caching and tied hashes
In perl version 4, dbmopen cached 64 entries. I've looked at the source
of perl 5, and can see no evidence of a cache in the interface to "tie".

This is of direct interest, since I have a program which needs to update
the value in a GDBM file entry, by appending to it repeatedly. In the
absence of a memory cache, this causes GDBM to blow up - each addition
adds non-recoverable space into the disk file, which increases in size
quadratically. It hit 800 MB (for approx. 3 MB of information) before
failing for lack of disk space.
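
A rough model of that growth, as a sketch only. It assumes (this is not verified against GDBM itself) that every append writes a fresh copy of the whole value and that the space held by the previous copy is never reused. The function name and the figures of 1000 appends of 30 bytes are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical model: each append of $chunk bytes rewrites the whole value
# into new space, and the old copy's space is never recovered.
sub space_used {
    my ($appends, $chunk) = @_;
    my ($value_len, $file_bytes) = (0, 0);
    for (1 .. $appends) {
        $value_len  += $chunk;        # the live value grows linearly
        $file_bytes += $value_len;    # each rewrite costs its full length
    }
    return ($value_len, $file_bytes);
}

my ($live, $total) = space_used(1000, 30);
printf "live data: %d bytes, file space: %d bytes (ratio %.1f)\n",
    $live, $total, $total / $live;
```

Under that assumption the file space is chunk * n * (n + 1) / 2 for n appends, so the ratio of file size to live data is about n / 2.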

I wrote a "shim", which does some crude caching, but this requires the
alteration of "tie" calls:
    tie %v,MyCache,GDBM_File,$file...
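
A minimal sketch of what such a shim might look like, not the actual module. MyCache takes the backend class as its first tie argument, keeps values in memory, and writes dirty entries to the backend only when flushed. CountingBackend is a made-up stand-in for GDBM_File (built on the core Tie::StdHash) so the example runs anywhere; the cache is unbounded, which a real one would not be.

```perl
use strict;
use warnings;

package MyCache;    # name borrowed from the tie call above; body is a guess

sub TIEHASH {
    my ($class, $backend, @args) = @_;
    my $inner = $backend->TIEHASH(@args)
        or die "cannot create $backend object";
    return bless { inner => $inner, cache => {}, dirty => {} }, $class;
}

sub FETCH {
    my ($self, $key) = @_;
    return $self->{cache}{$key} if exists $self->{cache}{$key};
    my $value = $self->{inner}->FETCH($key);
    $self->{cache}{$key} = $value if defined $value;    # do not cache misses
    return $value;
}

sub STORE {
    my ($self, $key, $value) = @_;
    $self->{cache}{$key} = $value;
    $self->{dirty}{$key} = 1;          # defer the backend write
}

sub flush {
    my ($self) = @_;
    $self->{inner}->STORE($_, $self->{cache}{$_}) for keys %{ $self->{dirty} };
    $self->{dirty} = {};
}

sub EXISTS {
    my ($self, $key) = @_;
    return exists $self->{cache}{$key} || $self->{inner}->EXISTS($key);
}

sub DELETE {
    my ($self, $key) = @_;
    my $old = $self->FETCH($key);
    delete $self->{cache}{$key};
    delete $self->{dirty}{$key};
    $self->{inner}->DELETE($key);
    return $old;
}

sub CLEAR {
    my ($self) = @_;
    $self->{cache} = {};
    $self->{dirty} = {};
    $self->{inner}->CLEAR;
}

# Iteration is delegated, so pending writes go out first.
sub FIRSTKEY { my ($self) = @_; $self->flush; $self->{inner}->FIRSTKEY }
sub NEXTKEY  { my ($self, $last) = @_; $self->{inner}->NEXTKEY($last) }
sub DESTROY  { $_[0]->flush }

package CountingBackend;    # hypothetical stand-in for GDBM_File
use Tie::Hash;
use parent -norequire, 'Tie::StdHash';
our $writes = 0;
sub STORE { my ($self, $k, $v) = @_; $writes++; $self->SUPER::STORE($k, $v) }

package main;

tie my %v, 'MyCache', 'CountingBackend';
$v{log} .= "entry $_\n" for 1 .. 1000;    # 1000 appends absorbed by the cache
untie %v;                                 # DESTROY flushes the one dirty key
print "backend writes: $CountingBackend::writes\n";
```

The cost is the one noted above: every tie call has to name the shim explicitly.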

So, to my questions:

1: is the restoration of low-level caching of tied values on a
wish-list (it's not in the level 4 list on perl.com), or is it even
possible any more, given that the FETCH method has unlimited
permissible side-effects?

2: can anyone think of a way of adding caching to existing tieable
modules without changing the modules themselves or the calling code?

Failing (2):

3: Is it worth adopting a convention to allow a generalised caching
scheme to be added into existing tieable interfaces? I.e. modifying
each of GDBM, NDBM etc?

The only way that springs to my mind is to make, say, GDBM_File a
shadow, so that we have:
    package GDBM_File;
    @ISA=qw/Real_GDBM_File/;
It should be possible to construct a caching system which slots
itself in by unshifting @GDBM_File::ISA.
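
One way the shadow idea could be sketched in present-day Perl, offered as an illustration rather than a tested design. All package names are hypothetical: Real_Backend stands in for Real_GDBM_File (built on the core Tie::StdHash), Shadow_Backend for the shadow GDBM_File, and ReadCache for the caching layer. It relies on next::method from the mro module, which postdates this thread, and keeps the cache outside the object because the object is created by the backend. It caches reads only and writes through, so it shows the slotting-in mechanism, not a fix for repeated appends.

```perl
use strict;
use warnings;
use Tie::Hash;    # core; provides Tie::StdHash

package Real_Backend;      # stand-in for "Real_GDBM_File"
use parent -norequire, 'Tie::StdHash';
our $reads = 0;
sub FETCH { my ($self, $key) = @_; $reads++; return $self->{$key} }

package Shadow_Backend;    # stand-in for the shadow "GDBM_File"
our @ISA = ('Real_Backend');

package ReadCache;         # hypothetical caching layer
use mro;                   # makes next::method available
use Scalar::Util 'refaddr';
my %cache;                 # per-object caches, keyed by object address

sub FETCH {
    my ($self, $key) = @_;
    my $c = $cache{ refaddr $self } ||= {};
    $c->{$key} = $self->next::method($key) unless exists $c->{$key};
    return $c->{$key};
}

sub STORE {
    my ($self, $key, $value) = @_;
    $cache{ refaddr $self }{$key} = $value;    # remember, then write through
    return $self->next::method($key, $value);
}

sub DELETE {
    my ($self, $key) = @_;
    delete $cache{ refaddr $self }{$key};
    return $self->next::method($key);
}

sub CLEAR {
    my ($self) = @_;
    delete $cache{ refaddr $self };
    return $self->next::method;
}

sub DESTROY { delete $cache{ refaddr $_[0] } }

package main;

unshift @Shadow_Backend::ISA, 'ReadCache';    # slot the cache in at run time

tie my %v, 'Shadow_Backend';
$v{k} .= 'x' for 1 .. 100;                    # each append is FETCH then STORE
my $len = length $v{k};
print "length $len, backend reads: $Real_Backend::reads\n";
untie %v;
```

The calling code still says `tie %v, 'Shadow_Backend'`; only the one unshift line differs from an uncached program.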

Ian
Re: Caching and tied hashes
> From: Ian Phillipps <ian@pipex.net>
>
> In perl version 4, dbmopen cached 64 entries. I've looked at the source
> of perl 5, and can see no evidence of a cache in the interface to "tie".
>
> This is of direct interest, since I have a program which needs to update
> the value in a GDBM file entry, by appending to it repeatedly. In the
> absence of a memory cache, this causes GDBM to blow up - each addition
> adds non-recoverable space into the disk file, which increases in size
> quadratically. It hit 800 MB (for approx. 3 MB of information) before
> failing for lack of disk space.

That growth looks like a gdbm bug or mis-feature.

> 1: is the restoration of low-level caching of tied values on a
> wish-list (it's not in the level 4 list on perl.com), or is it even
> possible any more, given that the FETCH method has unlimited
> permissible side-effects?

I don't think so, for either question.

> 2: can anyone think of a way of adding caching to existing tieable
> modules without changing the modules themselves or the calling code?

Personally I think the low-level interface should look after this.

Where possible I recommend switching to the excellent DB_File.

> Failing (2):
>
> 3: Is it worth adopting a convention to allow a generalised caching
> scheme to be added into existing tieable interfaces? I.e. modifying
> each of GDBM, NDBM etc?
>
> The only way that springs to my mind is to make, say, GDBM_File a
> shadow, so that we have:
>     package GDBM_File;
>     @ISA=qw/Real_GDBM_File/;
> It should be possible to construct a caching system which slots
> itself in by unshifting @GDBM_File::ISA.

That looks possible but I'd put it into AnyDBM_File.pm (and in gv.c).
So AnyDBM_File ISA CacheDBM_File and CacheDBM_File ISA (... *DBM_File ...).

Tim.
Re: Caching and tied hashes
: In perl version 4, dbmopen cached 64 entries. I've looked at the source
: of perl 5, and can see no evidence of a cache in the interface to "tie".

There is no built-in cache anymore.

: This is of direct interest, since I have a program which needs to update
: the value in a GDBM file entry, by appending to it repeatedly. In the
: absence of a memory cache, this causes GDBM to blow up - each addition
: adds non-recoverable space into the disk file, which increases in size
: quadratically. It hit 800 MB (for approx. 3 MB of information) before
: failing for lack of disk space.

That sounds like a bug in GDBM to me.

: I wrote a "shim", which does some crude caching, but this requires the
: alteration of "tie" calls:
:     tie %v,MyCache,GDBM_File,$file...
:
: So, to my questions:
:
: 1: is the restoration of low-level caching of tied values on a
: wish-list (it's not in the level 4 list on perl.com), or is it even
: possible any more, given that the FETCH method has unlimited
: permissible side-effects?

Caching must now be done by the tied package, if by anyone. I have no
plans to put in a cache. Basically, you can't cache magic.

: 2: can anyone think of a way of adding caching to existing tieable
: modules without changing the modules themselves or the calling code?

If someone can, it isn't me.

: Failing (2):
:
: 3: Is it worth adopting a convention to allow a generalised caching
: scheme to be added into existing tieable interfaces? I.e. modifying
: each of GDBM, NDBM etc?
:
: The only way that springs to my mind is to make, say, GDBM_File a
: shadow, so that we have:
:     package GDBM_File;
:     @ISA=qw/Real_GDBM_File/;
: It should be possible to construct a caching system which slots
: itself in by unshifting @GDBM_File::ISA.

I think the basic packages should retain a simple one-to-one mapping,
and you should change your tie if you want additional behavior like
caching.

Larry
Re: Caching and tied hashes; and DB_File interface
Tim Bunce <Tim.Bunce@ig.co.uk> wrote:

> > the value in a GDBM file entry, by appending to it repeatedly. In the
> > absence of a memory cache, this causes GDBM to blow up - each addition

Tim> That growth looks like a gdbm bug or mis-feature.
Larry> That sounds like a bug in GDBM to me.

Yep, sure does - but I'd be more charitable, since I was pushing it
outside a plausible design envelope by appending to the same small set
of keys around 10000 times. It's just the way this particular data set
is :-( I was explaining why I needed a cache, and quickly.

> > 1: is the restoration of low-level caching of tied values on a

Larry> Basically, you can't cache magic.
Motto! Motto!

> > 2: can anyone think of a way of adding caching to existing tieable
> > modules without changing the modules themselves or the calling code?

Larry> If someone can, it isn't me.
That sorts that one out, then :-)

Tim> Personally I think the low-level interface should look after this.
Tim> Where possible I recommend switching to the excellent DB_File.

I did think about it, and intend moving that direction sometime,
especially as I'd like a B tree rather than a hash, but DB's way of file
locking doesn't mesh nicely with Perl and the DB_File interface.
It makes available a file-descriptor number, which isn't a Perl
file-handle. The DB_File interface would benefit from a bit of work in other
directions, too, notably in the handling of the R_SETCURSOR/R_CURSOR/R_NEXT
flags, which allow access to a group of values in the B-tree.
This, in turn, would benefit from an improvement in the tie interface to
Perl, which currently doesn't provide a meaningful parameter for NEXTKEY
methods.
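
For the locking half of this, a bare descriptor number can be turned into a Perl filehandle with the "&=" form of open and then locked with flock. The sketch below shows only that step. It takes the descriptor from an ordinary temporary file as a stand-in, so it runs without DB_File; with DB_File the number would presumably come from the object's fd method. Whether a lock taken this way is adequate for a given DBM library is a separate question that this example does not settle.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Stand-in for the descriptor number handed back by a DBM library.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
my $fd = fileno $tmp_fh;

# "&=" creates a Perl filehandle on the existing descriptor (an fdopen,
# not a dup), so the lock applies to the same open file.
open(my $lock_fh, '+<&=', $fd) or die "cannot fdopen descriptor $fd: $!";

flock($lock_fh, LOCK_EX) or die "cannot lock: $!";
# ... update the database here ...
flock($lock_fh, LOCK_UN) or die "cannot unlock: $!";

print "descriptor $fd reused as filehandle: ",
    (fileno($lock_fh) == $fd ? "yes" : "no"), "\n";
```

Because the two handles share one descriptor, closing either of them closes the file for both.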

Who holds the patch-pumpkin for DB_File? (Why a pumpkin, BTW?)
If no-one shouts up, I'll look at it.

> > 3: Is it worth adopting a convention to allow a generalised caching
> > scheme to be added into existing tieable interfaces? I.e. modifying
> > each of GDBM, NDBM etc?

Tim> That looks possible but I'd put it into AnyDBM_File.pm (and in gv.c).
Tim> So AnyDBM_File ISA CacheDBM_File and CacheDBM_File ISA (... *DBM_File ...).

My guess is that the relationship between CacheDBM_File and the
lower-level ones isn't usefully an "ISA", since the object needs to
store extra info, and every one of the calls needs some tweak or other.
My dud-cache module started out with an ISA, but lost it when I'd written
the last routine in the TIEHASH set.

I have to go along with the view that the availability of general magic
outweighs the efficiency gains to be had from caching. A cache provided
by the bottom level routines is probably going to be more efficient
overall than introducing another layer of perl-written processing.
In particular, my re-reading the btree(3) manual page from DB led me to
spot this:

    In addition, physical writes are delayed as long as
    possible, so a moderate cache can reduce the number of I/O
    operations significantly.

Next stop, an improved DB_File module, I think. What should I call the
lock/unlock methods? It would be nice if these could be made general,
and part of AnyDBM_File (at least as a fallback providing a warning).
Both flock and lockf have pre-defined, non-portable, meanings; and it
would be nice to allow the possibility of locking individual records if
the underlying engine provided this.

Ian
Re: Caching and tied hashes; and DB_File interface