Mailing List Archive: New module "Set

Re: New module "Set_Calc" in the works [ In reply to ]

Dec 8, 1995, 12:42 PM

Post #26 of 29 (215 views)

I've kept perl5-porters only because of the first item -
otherwise we've gone far enough off topic that Steffen and I can
go to private email.

Steffen Beyer wrote :

|| The issue of this discussion was mainly how to name the module I'm currently
|| writing.

Set::PosInteger might work well. I'm not sure whether we need
to use Fast in the name - I think that an implementation could
be done that was capable of handling sets of integers with
unlimited sparsity, bounded only by machine memory (and machine
integer size unless it went to BigInts), that did not sacrifice
speed for small sets - such an implementation could just
supercede this one, rather than needing a new name. If there is
enough penalty with such an implementation that both would be
retained, then the second could be called Slow (or Unbounded) as
a way of distinguishing them.

The rest of this is basically to Steffen.

|| > || > Someone (Mark Biggar?) mentioned using perl bitvectors and
|| > || > getting an all-perl implementation - I suspect that it could be
|| > || > done fairly easily.
|| > ||
|| > || How do you calculate the minimum, maximum and norm of a set in Perl
|| > || efficiently?
||
|| > That's up to the implementation. If the interface provides a
|| > means of asking for these (as yours does) and the implementor
|| > believes that it is important for them to be implemented
|| > efficiently, then he can internally maintain their values.
||
|| Well, you have a lot of overhead THERE! :-)

Only if you always track it perfectly. I'd keep a flag for
whether the limit was known to be currently in the set. When
adding a new element to the set, if it is less than the lower
limit, then remember it as a known lower limit. When deleting
an element, if it is currently the lower limit, flag the lower
limit as unknown. When a request for the lowest element is
processed, the value is either known, or else a bound on it is
known and you can search up the bitvector from there (possibly
deleting the leading chunk of the bitvector if there is a big
gap to the actual first element).

|| > || > If you kept the bitvector, a start index
|| > || > and an end index, and a positive/negative flag, then you could
|| > || > deal with any origin and any finite set of integers (not just
|| > || > based at zero). The pos/neg flag denotes whether values outside
|| > || > the start/end range are considered to be included or not. Using
|| > || > the indices means that you don't have a lot of wasted space and
|| > || > computational overhead if you are dealing in small sets of
|| > || > integers that are not close to zero, e.g. numbers between 1900
|| > || > and 2000 (year numbers).
|| > ||
|| > || You would have the computational overhead of subtracting/adding 1900
|| > || (in your example) in either case - either YOU do it in your application,
|| > || or the set MODULE does it internally. So I don't see the point for
|| > || making the module's interface more complicated.
||
|| > This is not interface, it is implemetation. The user would say
|| > insert(1950) and the implementation would check whether it was
|| > within the range of the current bitvector, and if so would
|| > subtract the base of the bitvector and set the bit; if not, it
|| > would extend the appropriate end of the bitvector as needed and
|| > then set the bit.
||
|| Oh, now I understand! I thought you meant to create the set with some
|| arbitrary range of values (i.e., Create(1900,2000)) and then to have
|| the module handle the rest (I didn't quite understand what you wanted
|| to use the "-1" flag for, but you would have to pass that to the module
|| as well and so enlarge (= complicate) the interface!)

The pos/neg flag allows you to have infinitely large sets. (All
integers *except* the ones in the bitvector.) To fully use
this, you'd want to have lowest_included and lowest_excluded
(one of which would always be -inf) rather than just lowest.

|| > One implementation could choose to always use
|| > origin 0 bitvectors (as you do), while another could save space
|| > and allow negative elements by keeping a start index for the
|| > bitvector. (I'd keep the bitvector origin and length separate
|| > from any internal cache of lowest and highest elements, so that
|| > the bitvector could be kept aligned on a nice boundary (multiple
|| > of 64 perhaps) so that bitvector operations on two sets would
|| > not require any shifting.)
||
|| > The computational overhead of manipulating long strings of zero
|| > bits can be significant. (In fact, it might be nice to have an
|| > implementation that uses lists of integers for sparse sets, and
|| > then flows to lists of bitvectors for sparse sets that have some
|| > clustering, and concatenates the bitvectors as the set becomes
|| > denser. The interface would still be the same, but it would be
|| > much faster for sets involving perverse quantities.)
||
|| Yes, this all would be very nice to have (just as it's very convenient
|| that Perl grows strings, arrays and hashes automatically), but at the
|| moment, this is far beyond what I intended to write.
||
|| But it's an excellent idea, I'll keep it in mind!

A barrel of rount tuits would be nice...

|| Any suggestions about that?
||
|| Kind regards,

--
Maybe we can fix reality one of these days. | John Macdonald
<- Larry Wall Carl Dichter -> | jmm@Elegant.COM
I'd be happy if we could just index it. |

Re: New module "Set_Calc" in the works [ In reply to ]

sb at sdm

Dec 8, 1995, 5:03 PM

Post #27 of 29 (218 views)

Permalink

In article <9512081549.AA07789@toad.ig.co.uk>, Tim Bunce (Tim.Bunce@ig.co.uk) wrote:

> > From: sb@sdm.de (Steffen Beyer)
> >
> > Thanks for the clarification. So it's a "set of integer"s after all?
> > I'm sorry I "contaminated" others with my own confusion!
> >
> > But my other naming proposition still holds: What about calling Jarkko's
> > module "Set::Symbolic" or "Set::Named"

> Jarkko's Set::Scalar implements a "set of perl scalar values". It is
> perfectly named. Note that scalar values (and hence these sets) can
> include numbers and references, neither of which are symbolic or named
> in the general sense.

Thank you for spelling it out for me!

This really helps me to see it clearer. Thanks again!

You are right, concerning the naming of Jarkko's module.

BTW, I never really intended to propose to rename his module (apologies if
anyone thought so!), it was rather "oil to the fire" of our discussion here.

> > and mine "Set::Abstract"

> Eh? Jarkko's Set::Scalar is _much_ more 'abstract' (read 'can store anything')
> than your module which only stores non-negative numbers.

Granted. :-)

I thought mine was more "abstract" because you have to abstract more from
what your set actually represents ;-) , and convert ("abstract") it all to
numbers!

> > or "Set::Numbered"?

> Sigh.

I'm really sorry I made you sigh so often. Apologies!!!!

> You're elements are not 'numbered' they are numbers (although this
> is a subtle distinction). You're letting yourself be influenced by your
> implementation, thinking in terms of 'setting the numbered bit'.

Indeed!

> I think this is where you're misleading yourself.

Oooooops! Right. :-)

> I still believe that Set::IntegerFast is probably the best name for it.

Ok, let's settle for it, then. I'll use it.

> One day it might support negative numbers (by storing an 'offset' value)

An idea to keep in mind, yes, definitely.

> and I think Set::Rabbinical is rather obscure.

;-) <BIG smile, almost a laugh>

> I also believe that as far as is possible your module (in fact _all_ set
> modules) should share the same interface as Set::Scalar.

Why?

I mean, I see your point in making it possible to just change

use Set::Scalar;

$set = new Set::Scalar;

to whatever the name of another Set package is without need for any further
changes in your code, but who says that the interface in Set::Scalar is
universal in the sense that it's appropriate for every application?

My routines were (admittedly) written with a special purpose in mind, that is,
calculating character sets for parsers.

(Initially, they were written in Pascal to circumvent Turbo Pascal's (on an
Apple ][+ with Z80-card!) restriction which limited sets to 512 elements or so)

And for that purpose (and probably also for similar ones) the current interface
is most appropriate.

The ability to use Perl scalars would be a superfluous overhead killing
efficiency, wouldn't it?

> > Thank you all for this very interesting discussion!

> It shows that we really need a comp.lang.perl.modules for this kind of thing.
> It doesn't really belong in perl5-porters.

Yes, that's true. How can we initiate a request to vote on behalf of this?
I think that's what we should do. :-)
If I can help, let me know!

> Tim.

Thanks again for your clarifications, ideas and suggestions!

Yours,
--
Steffen Beyer
mailto:sb@sdm.de |s |d &|m | software design & management GmbH&Co.KG
phone: +49 89 63812-244 | | | | Thomas-Dehler-Str. 27
fax: +49 89 63812-150 | | | | 81737 Munich, Germany.

Re: New module "Set_Calc" in the works [ In reply to ]

sb at sdm

Dec 9, 1995, 11:36 PM

Post #28 of 29 (212 views)

Permalink

In article <m0tO8gY-0003E1C@elegant.com>, John Macdonald (jmm@elegant.com) wrote:
> I've kept perl5-porters only because of the first item -
> otherwise we've gone far enough off topic that Steffen and I can
> go to private email.

> Steffen Beyer wrote :

> || The issue of this discussion was mainly how to name the module I'm currently
> || writing.

> Set::PosInteger might work well.

The restriction to positive integers can be removed rather easily, so I
think it's better not to use a name referring to "Pos" explicitly since
that would have to go then, and I prefer not to have two modules with
different names that are basically the same.

> I'm not sure whether we need
> to use Fast in the name - I think that an implementation could
> be done that was capable of handling sets of integers with
> unlimited sparsity, bounded only by machine memory (and machine
> integer size unless it went to BigInts

Is there anyone out there who ever needed sets or can think of an application
that would need sets able to store such BIG integers?

), that did not sacrifice
> speed for small sets - such an implementation could just
> supercede this one, rather than needing a new name.

That's the same argument as mine from above, isn't it?

You don't need several modules with different names if you can rather just
increment the version number.

> If there is
> enough penalty with such an implementation that both would be
> retained, then the second could be called Slow (or Unbounded) as
> a way of distinguishing them.

Yes, I agree.

> The rest of this is basically to Steffen.

[answered by mail]

Yours,
--
Steffen Beyer
mailto:sb@sdm.de |s |d &|m | software design & management GmbH&Co.KG
phone: +49 89 63812-244 | | | | Thomas-Dehler-Str. 27
fax: +49 89 63812-150 | | | | 81737 Munich, Germany.

Re: New module "Set_Calc" in the works [ In reply to ]

Tim.Bunce at ig

Dec 11, 1995, 7:58 PM

Post #29 of 29 (219 views)

Permalink

> From: sb@sdm.de (Steffen Beyer)
>
> In article <9512081549.AA07789@toad.ig.co.uk>, Tim Bunce (Tim.Bunce@ig.co.uk) wrote:
>
> > I still believe that Set::IntegerFast is probably the best name for it.
>
> Ok, let's settle for it, then. I'll use it.

Set::IntegerC would also be okay by me.

> > I also believe that as far as is possible your module (in fact _all_ set
> > modules) should share the same interface as Set::Scalar.
>
> Why?
>
> I mean, I see your point in making it possible to just change
>
> use Set::Scalar;
>
> $set = new Set::Scalar;
>
> to whatever the name of another Set package is without need for any further
> changes in your code, but who says that the interface in Set::Scalar is
> universal in the sense that it's appropriate for every application?

Possibly not, in which case we should probably change/extend it.

> The ability to use Perl scalars would be a superfluous overhead killing
> efficiency, wouldn't it?

I'm not suggesting that you store scalars. I'm simply suggesting is that
in order to add an integer to a set or to create a new set from the union
of two other sets (etc etc) the code that the application uses should be
the same regardless of which particular implementation of sets is being
used.

The basic core methods for using a generic 'set' abstract data type
should be shared (as far as possible) by all implementations of that
abstract data type.

(Obviously no one will expect your Set::Integer... sets to be able to store
strings or refs so limit your thoughts to an application that only needs
to manipulate sets of integers.)

> > > Thank you all for this very interesting discussion!
>
> > It shows that we really need a comp.lang.perl.modules for this kind of thing.
> > It doesn't really belong in perl5-porters.
>
> Yes, that's true. How can we initiate a request to vote on behalf of this?

I don't know. Who did the comp.lang.perl.tk procedure?

Tim.