Mailing List Archive

New module "Set_Calc" in the works
Re: New module "Set_Calc" in the works [ In reply to ]
> From: sb@sdm.de (Steffen Beyer)
>
> > I am currently working on a new module for all kinds of set operations.
> >
> > (To dynamically create sets of arbitrary size, to set a set to {} (empty set)
> > or Omega (set containing all possible elements), to insert/delete elements,
> > to test for element inclusion, to calculate the complement/union/intersection/
> > difference sets, to calculate the norm (# of elements) and the minimum and
> > maximum of a set, and to test for set inclusion, equality, and lexical order.)
> >
> > Sets are implemented as bit vectors and created or destroyed at runtime via
> > malloc() and free() (internally).
> >
> > (As a user, you don't need to worry about the details - you just say that you
> > want a set for a certain maximum number of elements, and you get it.)
> >
> > The module is written in C and linked to Perl via the XS interface.
> >
> > It automatically configures itself for the size of a machine word as its
> > basic storage unit, assuming that a machine word is the most efficiently
> > handled storage unit size by the system.
> >
> > Could you add it to your "The Perl 5 Module List", please?
> >
> > 6) Data Types and Data Type Utilities (see also Database Interfaces)
> >
> > Name DSLI Description Info
> > ----------- ---- -------------------------------------------- -----
> > Set::
> > ::Set_Calc adcO Dynamic set creation and all set operations STBEY +

> Any comments/suggestions welcome!

Does the _interface_ differ from Set::Scalar ? If so why?

Set::Set_Calc seems to be a rather poor name. See the Module List for
some background on module naming (and many other issues).

Since it only stores integers then I think the module should be called
Set::Integer (or perhaps Set::IntegerFast since it's written in C and a
Set::Integer could be implemented in perl).

Tim.
Re: New module "Set_Calc" in the works [ In reply to ]
I have discussed about the naming issues with Steffen. How would
Set::BitVector fit? That is what I vaguely remembered from the
discussions back when my Set::Scalar was discussed.

++jhi;
Re: New module "Set_Calc" in the works [ In reply to ]
> From: Jarkko Hietaniemi <jhi@snakemail.hut.fi>
>
> I have discussed about the naming issues with Steffen. How would
> Set::BitVector fit? That is what I vaguely remembered from the
> discussions back when my Set::Scalar was discussed.

But it's not a 'set of bit-vectors', it's a 'set of non-negative integers'
which happens to be implemented by bit vectors.

The module list notes that it's best to avoid naming modules based on
their implementation. That might not be practical where you have multiple
modules with the same _functionality and interface_ (which is not the case
here) but even then you could probably append a 'Fast' to the name (or
something else which reflects the reason for the different implementation).

I'm open to suggestions.

Tim.

p.s. What's the 'proper' name for 'non-negative integers'? I forget.
Is it Natural, Cardinal, Ordinal or something else? (Once upon a time
I wrote a Pascal compiler but I've forgoten all these terms :-)
Re: New module "Set_Calc" in the works [ In reply to ]
In article <9512061403.AA03915@toad.ig.co.uk>, Tim Bunce (Tim.Bunce@ig.co.uk) wrote:
> > > 6) Data Types and Data Type Utilities (see also Database Interfaces)
> > >
> > > Name DSLI Description Info
> > > ----------- ---- -------------------------------------------- -----
> > > Set::
> > > ::Set_Calc adcO Dynamic set creation and all set operations STBEY +

> > Any comments/suggestions welcome!

> Does the _interface_ differ from Set::Scalar ? If so why?

Yes, it does, because the whole purpose is different. It's mainly intended
for mathematical or algorithmical computations, using abstract (numbered)
elements and not symbolic (named) elements like Set::Scalar does.

(The C routines were originally used to calculate first, follow and lookahead
character sets and kernels in LL, SLR, LR and LALR parsers)

The whole interface (as well as the routines themselves) is designed for
speed of execution (index range checking is very efficient: only one compa-
rison in C!), not comfort.

> Set::Set_Calc seems to be a rather poor name.

Yes, Jarkko already convinced me of that!

> See the Module List for
> some background on module naming (and many other issues).

Yes, I had your suggestions in mind when I chose the name above, but still
figured that it would be nice to have a series: Set_Calc, DateCalc, and maybe
TimeCalc, one day.

> Since it only stores integers then I think the module should be called
> Set::Integer (or perhaps Set::IntegerFast since it's written in C and a
> Set::Integer could be implemented in perl).
>
> Tim.

Well, it doesn't store integers. (Ok, of course it does, internally, but
you're not supposed to know that! (Data and Method Encapsulation))

The user only sees a set able to hold a certain maximum number of elements
and he only knows that it's a bit vector, internally.

So I think that the name suggested by Jarkko (Set::BitVector) describes it
best.

If you (or others) don't have any objections, I'd like to use that one.

Any comments/suggestions welcome!

Kind regards,
--
Steffen Beyer
mailto:sb@sdm.de |s |d &|m | software design & management GmbH&Co.KG
phone: +49 89 63812-244 | | | | Thomas-Dehler-Str. 27
fax: +49 89 63812-150 | | | | 81737 Munich, Germany.
Re: New module "Set_Calc" in the works [ In reply to ]
> From: sb@sdm.de (Steffen Beyer)
>
> In article <9512061403.AA03915@toad.ig.co.uk>, Tim Bunce (Tim.Bunce@ig.co.uk) wrote:
> > > > Set::
> > > > ::Set_Calc adcO Dynamic set creation and all set operations STBEY +
>
> > > Any comments/suggestions welcome!
>
> > Does the _interface_ differ from Set::Scalar ? If so why?
>
> Yes, it does, because the whole purpose is different. It's mainly intended
> for mathematical or algorithmical computations, using abstract (numbered)
> elements and not symbolic (named) elements like Set::Scalar does.
>
> (The C routines were originally used to calculate first, follow and lookahead
> character sets and kernels in LL, SLR, LR and LALR parsers)
>
> The whole interface (as well as the routines themselves) is designed for
> speed of execution (index range checking is very efficient: only one compa-
> rison in C!), not comfort.

You've not really talked about the _interface_ here.

Imagine a user who has a large application written using Set::Scalar and
which only stores integers.

You come along offering your module and say "here's a faster Set module to
use in your application".

The question is, can the user _just_ change

use Set::Scalar;
$set = new Set::Scalar;
into
use Set::Set_Calc;
$set = new Set::Set_Calc;

(or similar) and _not_ have to change the many pages of core application code
which manipulates the set objects (using the intersection, union, difference,
inverse, ... methods)?


> > Set::Set_Calc seems to be a rather poor name.
>
> Yes, Jarkko already convinced me of that!
>
> > See the Module List for
> > some background on module naming (and many other issues).
>
> Yes, I had your suggestions in mind when I chose the name above, but still
> figured that it would be nice to have a series: Set_Calc, DateCalc, and maybe
> TimeCalc, one day.

DateCalc makes more sense since it's a library of assorted date calculation
functions. Set::Set_Calc is implementing an abstract data type - a very
different thing.

(The underscore is also inconsistent - but that's a petty nit-pick :-)

> > Since it only stores integers then I think the module should be called
> > Set::Integer (or perhaps Set::IntegerFast since it's written in C and a
> > Set::Integer could be implemented in perl).
>
> Well, it doesn't store integers. (Ok, of course it does, internally, but
> you're not supposed to know that! (Data and Method Encapsulation))
>
> The user only sees a set able to hold a certain maximum number of elements
> and he only knows that it's a bit vector, internally.

Correct me if I'm wrong, but doesn't Set::Set_Calc implement a
"set of integer" abstract data type?

> So I think that the name suggested by Jarkko (Set::BitVector) describes
> it best.

It describes how it's implemented and not what it's used for.
Set::Set_Calc does not implement a "set of bitvector" data type. Or does it?

> If you (or others) don't have any objections, I'd like to use that one.
>
> Any comments/suggestions welcome!

Sorry, but I still have concerns. You could always just ignore me :-)

Tim.
Re: New module "Set_Calc" in the works [ In reply to ]
> But it's not a 'set of bit-vectors', it's a 'set of non-negative integers'
> which happens to be implemented by bit vectors.

Ouch. Sorry Steffen, renaming ahead.

> p.s. What's the 'proper' name for 'non-negative integers'? I forget.
> Is it Natural, Cardinal, Ordinal or something else? (Once upon a time
> I wrote a Pascal compiler but I've forgoten all these terms :-)

If Natural means N...N does not (normally) contain zero, only the
positive number. (N_0 is a notation I have sometimes seen for the one
containing zero)

Cardinal fits the bill, cardinal = used for counting.

Set::Cardinal. Ahem. Do we need Set::Pope, Set::Archbishop, and
Set::Bishop also?

Ordinal does not because zero is, once again, excluded. ordinal = ordering.
The computerology 'zeroth' does not count.

++jhi;
Re: New module "Set_Calc" in the works [ In reply to ]
>From: sb@sdm.de (Steffen Beyer)
>Well, it doesn't store integers. (Ok, of course it does, internally, but
>you're not supposed to know that! (Data and Method Encapsulation))
>The user only sees a set able to hold a certain maximum number of elements
>and he only knows that it's a bit vector, internally.
>So I think that the name suggested by Jarkko (Set::BitVector) describes it
>best.
>If you (or others) don't have any objections, I'd like to use that one.
>Any comments/suggestions welcome!


My only comment is how does it compare speed wise, to a pure perl version
with the same interface written using vec() and the builting perl bit ops
on strings?

--
Mark Biggar
mab@wdl.lroal.com
Re: New module "Set_Calc" in the works [ In reply to ]
Re: New module "Set_Calc" in the works [ In reply to ]
> From: Daffy <m4dodge@srv.pacbell.com>
>
> > p.s. What's the 'proper' name for 'non-negative integers'? I forget.
> > Is it Natural, Cardinal, Ordinal or something else? (Once upon a time
> > I wrote a Pascal compiler but I've forgoten all these terms :-)
>
> I think 'Natural' is the mathematical term for non-negative
> integers.
>
> Mike

Thanks.

Tim.
Re: New module "Set_Calc" in the works [ In reply to ]
In article <9512062150.AA07842@dst17.wdl.loral.com>, Mark A Biggar (mab@wdl.loral.com) wrote:
> >From: sb@sdm.de (Steffen Beyer)
> >Well, it doesn't store integers. (Ok, of course it does, internally, but
> >you're not supposed to know that! (Data and Method Encapsulation))
> >The user only sees a set able to hold a certain maximum number of elements
> >and he only knows that it's a bit vector, internally.
> >So I think that the name suggested by Jarkko (Set::BitVector) describes it
> >best.
> >If you (or others) don't have any objections, I'd like to use that one.
> >Any comments/suggestions welcome!


> My only comment is how does it compare speed wise, to a pure perl version
> with the same interface written using vec() and the builting perl bit ops
> on strings?

> --
> Mark Biggar
> mab@wdl.lroal.com

Some of the functions (min, max, norm) the package offers aren't
implemented in Perl.

(And impossible to implement efficiently in Perl using vec().)

The interface is much safer, since all indices are checked (very
efficiently!).

Moreover, the interface is more user-friendly than Perl's vec().

I haven't evaluated the speed differences yet, though.

More questions?

I'll respond to the other follow-ups later, I have to interrupt myself now ;-) .
--
Steffen Beyer
mailto:sb@sdm.de |s |d &|m | software design & management GmbH&Co.KG
phone: +49 89 63812-244 | | | | Thomas-Dehler-Str. 27
fax: +49 89 63812-150 | | | | 81737 Munich, Germany.
Re: New module "Set_Calc" in the works [ In reply to ]
In article <199512062152.PAA00469@coriolan.amicus.com>, Felix Gallo (fsg@coriolan.amicus.com) wrote:
> Mr. Bunce haec fecit:
> > p.s. What's the 'proper' name for 'non-negative integers'? I forget.
> > Is it Natural, Cardinal, Ordinal or something else? (Once upon a time
> > I wrote a Pascal compiler but I've forgoten all these terms :-)

> Rabbinical.

;-)

--
Steffen Beyer
mailto:sb@sdm.de |s |d &|m | software design & management GmbH&Co.KG
phone: +49 89 63812-244 | | | | Thomas-Dehler-Str. 27
fax: +49 89 63812-150 | | | | 81737 Munich, Germany.
Re: New module "Set_Calc" in the works [ In reply to ]
In article <199512062138.XAA18823@beta.hut.fi>, Jarkko Hietaniemi (jhi@snakemail.hut.fi) wrote:

> > But it's not a 'set of bit-vectors', it's a 'set of non-negative integers'
> > which happens to be implemented by bit vectors.

> Ouch. Sorry Steffen, renaming ahead.

Well, that's not tragic. In the meanwhile, I wrote a "modify" script which
is an analogue of the "rename" tool of the camel book, however, it modifies
file CONTENTS, not file NAMES.

So the changes involved are a snap now.

> > p.s. What's the 'proper' name for 'non-negative integers'? I forget.
> > Is it Natural, Cardinal, Ordinal or something else? (Once upon a time
> > I wrote a Pascal compiler but I've forgoten all these terms :-)

> If Natural means N...N does not (normally) contain zero, only the
> positive number. (N_0 is a notation I have sometimes seen for the one
> containing zero)

> Cardinal fits the bill, cardinal = used for counting.

Yes, ok, but the module isn't for counting...

> Set::Cardinal. Ahem. Do we need Set::Pope, Set::Archbishop, and
> Set::Bishop also?

;-)

> Ordinal does not because zero is, once again, excluded. ordinal = ordering.
> The computerology 'zeroth' does not count.

> ++jhi;

At the university we always used "N" and meant N_0 - we called it the computer
scientist's priviledge...

BTW: How does a computer scientist count his fingers? 0, 1, 2, 3, ... 9 - ah!
9 - 0 + 1 (calculating the number of numbers in this range) = 10!
Ah! 10 fingers!

;-)

Yours,
--
Steffen Beyer
mailto:sb@sdm.de |s |d &|m | software design & management GmbH&Co.KG
phone: +49 89 63812-244 | | | | Thomas-Dehler-Str. 27
fax: +49 89 63812-150 | | | | 81737 Munich, Germany.
Re: New module "Set_Calc" in the works [ In reply to ]
In article <9512062120.AA06154@toad.ig.co.uk>, Tim Bunce (Tim.Bunce@ig.co.uk) wrote:

> > The whole interface (as well as the routines themselves) is designed for
> > speed of execution (index range checking is very efficient: only one compa-
> > rison in C!), not comfort.

> You've not really talked about the _interface_ here.

Ok, you're right.

Here it is:

-------------------- cut here -------------------- cut here --------------------

SET::BITVECTOR(3) PERL EXTENSIONS SET::BITVECTOR(3)


NAME
Set::BitVector - Dynamic set creation and efficient set operations

SYNOPSIS
use Set::BitVector;

Available methods (functions) and how to invoke (call) them:

Fully qualified as functions:

$ver = Set::BitVector::Version();
$set = Set::BitVector::Create($elements);

Set::BitVector::Destroy($set);
Set::BitVector::Empty($set);
Set::BitVector::Fill($set);
Set::BitVector::Insert($set,$i);
Set::BitVector::Delete($set,$i);
Set::BitVector::in($set,$i);
Set::BitVector::Union($set1,$set2,$set3);
Set::BitVector::Intersection($set1,$set2,$set3);
Set::BitVector::Difference($set1,$set2,$set3);
Set::BitVector::Complement($set1,$set2);
Set::BitVector::equal($set1,$set2);
Set::BitVector::inclusion($set1,$set2);
Set::BitVector::lexorder($set1,$set2);
Set::BitVector::Compare($set1,$set2);
Set::BitVector::Norm($set);
Set::BitVector::Min($set);
Set::BitVector::Max($set);
Set::BitVector::Copy($set1,$set2);

Object oriented:

$set->Destroy;
$set->Empty;
$set->Fill;
$set->Insert($i);
$set->Delete($i);
$set->in($i);
$set1->Union($set2,$set3);
$set1->Intersection($set2,$set3);
$set1->Difference($set2,$set3);
$set1->Complement($set2);
$set1->equal($set2);
$set1->inclusion($set2);
$set1->lexorder($set2);
$set1->Compare($set2);
$set->Norm;
$set->Min;
$set->Max;
$set1->Copy($set2);

Brief description:

Version returns a version string
Create the set object constructor
Destroy free() the memory occupied by a set
Empty delete all elements in the set
Fill insert all possible elements into the set
Insert insert a given element
Delete delete a given element
in test the presence of a given element
Union calculate the union of two sets (in-place is possible)
Intersection calculate the intersection of two sets (idem)
Difference calculate the difference of two sets ("A\B") (idem)
Complement calculate the complement of a set (in-place is possible)
equal test two sets for equality relation
inclusion test two sets for inclusion relation
lexorder test two sets for lexical order relation
Compare compare two sets - yields -1, 0 or 1 for <, = and >
Norm calculate the norm (number of elements) of the set
Min return the minimum of the set ( min{} = +infinity )
Max return the maximum of the set ( max{} = -infinity )
Copy copy one set to another

Hint: method names all in lower case indicate a boolean return value!

DESCRIPTION

(under construction)

EXAMPLE
#!perl -w

use strict;
no strict "vars";

use Set::BitVector;

print "\n***** Calculating Prime Numbers - The Seave Of Erathostenes *****\n";

$limit = 0;

if (-t STDIN)
{
while ($limit < 16)
{
print "\nPlease enter an upper limit (>15): ";
$limit = <STDIN>;
chop($limit) while ($limit =~ /\n$/);
$limit = 0 if (($limit eq "") || ($limit =~ /\D/));
}
print "\n";
}
else
{
$limit = 100;
print "\nRunning in batch mode - using $limit as upper limit.\n\n";
}

$set = Set::BitVector::Create($limit+1);

$set->Fill;

$set->Delete(0);
$set->Delete(1);

print "Calculating the prime numbers in the range [2..$limit]...\n\n";

$start = time;

for ( $j = 4; $j <= $limit; $j += 2 )
{
$set->Delete($j);
}

for ( $i = 3; ($j = $i * $i) <= $limit; $i += 2 )
{
for ( ; $j <= $limit; $j += $i )
{
$set->Delete($j);
}
}

$stop = time;

&print_elapsed_time;

$min = $set->Min;
$max = $set->Max;
$norm = $set->Norm;

print "Found $norm prime numbers in the range [2..$limit]:\n\n";

for ( $i = $min, $j = 0; $i <= $max; $i++ )
{
if ($set->in($i))
{
print "prime number #", ++$j, " = $i\n";
}
}

print "\n";

$set->Destroy;

exit;

sub print_elapsed_time
{
($sec,$min,$hour,$year,$yday) = (gmtime($stop - $start))[0,1,2,5,7];
$year -= 70;
$flag = 0;
print "Elapsed time: ";
if ($year > 0)
{
printf("%d year%s ", $year, ($year!=1)?"s":"");
$flag = 1;
}
if (($yday > 0) || $flag)
{
printf("%d day%s ", $yday, ($yday!=1)?"s":"");
$flag = 1;
}
if (($hour > 0) || $flag)
{
printf("%d hour%s ", $hour, ($hour!=1)?"s":"");
$flag = 1;
}
if (($min > 0) || $flag)
{
printf("%d minute%s ", $min, ($min!=1)?"s":"");
}
printf("%d second%s.\n\n", $sec, ($sec!=1)?"s":"");
}

__END__

SEE ALSO
perl(1), perlsub(1), perlmod(1), perlref(1), perlobj(1), perlbot(1),
perlapi(1), perlguts(1).

VERSION
This man page documents Set::BitVector, version 1.0.

AUTHOR
Steffen Beyer <sb@sdm.de> (sd&m GmbH&Co.KG, Munich, Germany)

COPYRIGHT
Copyright (c) 1995 by Steffen Beyer. All rights reserved.

LICENSE AGREEMENT
This package is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.

-------------------- cut here -------------------- cut here --------------------

> Imagine a user who has a large application written using Set::Scalar and
> which only stores integers.

> You come along offering your module and say "here's a faster Set module to
> use in your application".

> The question is, can the user _just_ change

> use Set::Scalar;
> $set = new Set::Scalar;
> into
> use Set::Set_Calc;
> $set = new Set::Set_Calc;

> (or similar) and _not_ have to change the many pages of core application code
> which manipulates the set objects (using the intersection, union, difference,
> inverse, ... methods)?

No. (See interface and example above)

This is because Jarkko's module uses symbolic (named) elements and I am using
abstract (numbered) elements.

> > > Set::Set_Calc seems to be a rather poor name.
> >
> > Yes, Jarkko already convinced me of that!
> >
> > > See the Module List for
> > > some background on module naming (and many other issues).
> >
> > Yes, I had your suggestions in mind when I chose the name above, but still
> > figured that it would be nice to have a series: Set_Calc, DateCalc, and maybe
> > TimeCalc, one day.

> DateCalc makes more sense since it's a library of assorted date calculation
> functions. Set::Set_Calc is implementing an abstract data type - a very
> different thing.

Ok. True.

> (The underscore is also inconsistent - but that's a petty nit-pick :-)

Well, actually it's not totally inconsistent, since I tried to maintain a
DOS-compatible 8.3 name (of constant length), because the C library grew up
under MS C... ;-)

> > > Since it only stores integers then I think the module should be called
> > > Set::Integer (or perhaps Set::IntegerFast since it's written in C and a
> > > Set::Integer could be implemented in perl).
> >
> > Well, it doesn't store integers. (Ok, of course it does, internally, but
> > you're not supposed to know that! (Data and Method Encapsulation))
> >
> > The user only sees a set able to hold a certain maximum number of elements
> > and he only knows that it's a bit vector, internally.

> Correct me if I'm wrong, but doesn't Set::Set_Calc implement a
> "set of integer" abstract data type?

Wrong.

Since you seem to know Pascal very well (you mentioned it somewhere) and since
you're asking in terms of Pascal syntax, I'll reply in terms of Pascal syntax:

The module DOESN'T implement a "set of integer" data type, but a
"set of boolean" data type!

So what would you (all) say if I named it "Set::Boolean"?

> > So I think that the name suggested by Jarkko (Set::BitVector) describes
> > it best.

> It describes how it's implemented and not what it's used for.

Yes, but isn't this true for "Set::Scalar" as well?

Maybe we should call Jarkko's module "Set::Symbolic" and mine "Set::Abstract"
or "Set::Named" and "Set::Numbered", respectively?

> Set::Set_Calc does not implement a "set of bitvector" data type. Or does it?

No, it doesn't. It implements a "set of boolean" data type (see above).

> > If you (or others) don't have any objections, I'd like to use that one.
> >
> > Any comments/suggestions welcome!

> Sorry, but I still have concerns.

Don't hold them back! :-)

> You could always just ignore me :-)

Why should I?

> Tim.

Best regards,
--
Steffen Beyer
mailto:sb@sdm.de |s |d &|m | software design & management GmbH&Co.KG
phone: +49 89 63812-244 | | | | Thomas-Dehler-Str. 27
fax: +49 89 63812-150 | | | | 81737 Munich, Germany.
Re: New module "Set_Calc" in the works [ In reply to ]
> From: sb@sdm.de (Steffen Beyer)
>
> In article <9512062120.AA06154@toad.ig.co.uk>, Tim Bunce (Tim.Bunce@ig.co.uk) wrote:
>
> > Correct me if I'm wrong, but doesn't Set::Set_Calc implement a
> > "set of integer" abstract data type?
>
> Wrong.
>
> Since you seem to know Pascal very well (you mentioned it somewhere) and since
> you're asking in terms of Pascal syntax, I'll reply in terms of Pascal syntax:
>
> The module DOESN'T implement a "set of integer" data type, but a
> "set of boolean" data type!

> > Set::Set_Calc does not implement a "set of bitvector" data type. Or does it?
>
> No, it doesn't. It implements a "set of boolean" data type (see above).


Sigh. Is it just me or is anyone else confused by this?

Tim.
Re: New module "Set_Calc" in the works [ In reply to ]
> From: sb@sdm.de (Steffen Beyer)
>
> In article <9512062120.AA06154@toad.ig.co.uk>, Tim Bunce (Tim.Bunce@ig.co.uk) wrote:
>
> > Correct me if I'm wrong, but doesn't Set::Set_Calc implement a
> > "set of integer" abstract data type?
>
> Wrong.
>
> Since you seem to know Pascal very well (you mentioned it somewhere) and since
> you're asking in terms of Pascal syntax, I'll reply in terms of Pascal syntax:
>
> The module DOESN'T implement a "set of integer" data type, but a
> "set of boolean" data type!

> > Set::Set_Calc does not implement a "set of bitvector" data type. Or does it?
>
> No, it doesn't. It implements a "set of boolean" data type (see above).


Sigh. Is it just me or is anyone else confused by this?

Tim.
Re: New module "Set_Calc" in the works [ In reply to ]
Tim Bunce wrote :
|| > From: sb@sdm.de (Steffen Beyer)
|| >
|| > In article <9512062120.AA06154@toad.ig.co.uk>, Tim Bunce (Tim.Bunce@ig.co.uk) wrote:
|| >
|| > > Correct me if I'm wrong, but doesn't Set::Set_Calc implement a
|| > > "set of integer" abstract data type?
|| >
|| > Wrong.
|| >
|| > Since you seem to know Pascal very well (you mentioned it somewhere) and since
|| > you're asking in terms of Pascal syntax, I'll reply in terms of Pascal syntax:
|| >
|| > The module DOESN'T implement a "set of integer" data type, but a
|| > "set of boolean" data type!
||
|| > > Set::Set_Calc does not implement a "set of bitvector" data type. Or does it?
|| >
|| > No, it doesn't. It implements a "set of boolean" data type (see above).
||
||
|| Sigh. Is it just me or is anyone else confused by this?

It's not you that is confused, I think it is Steffen.

The "boolean" involved is a property of the concept of "set" and
how it differes from other mathematical collections. A set
either contains an element or not, without any means for
counting the number of times. But the boolean value that is
true/false to denote whether a particular object is an element
of a set does *not* have any relationship to the type of objects
that can be contained within the set. It is the type of object
that can be an element that is of significance and Steffen's
module is capable of inserting and deleting (and otherwise
dealing with) positive integers up to a maximum value. So it is
a "set of finite rabbinicals".

Someone (Mark Biggar?) mentioned using perl bitvectors and
getting an all-perl implementation - I suspect that it could be
done fairly easily. If you kept the bitvector, a start index
and an end index, and a positive/negative flag, then you could
deal with any origin and any finite set of integers (not just
based at zero). The pos/neg flag denotes whether values outside
the start/end range are considered to be included or not. Using
the indices means that you don't have a lot of wasted space and
computational overhead if you are dealing in small sets of
integers that are not close to zero, e.g. numbers between 1900
and 2000 (year numbers).

--
Maybe we can fix reality one of these days. | John Macdonald
<- Larry Wall Carl Dichter -> | jmm@Elegant.COM
I'd be happy if we could just index it. |
Re: New module "Set_Calc" in the works [ In reply to ]
In <9512071500.AA01871@toad.ig.co.uk>
On Thu, 7 Dec 1995 15:00:03 +0000
Tim Bunce <Tim.Bunce@ig.co.uk> writes:
>>
>> The module DOESN'T implement a "set of integer" data type, but a
>> "set of boolean" data type!
>
>> > Set::Set_Calc does not implement a "set of bitvector" data type. Or does it?
>>
>> No, it doesn't. It implements a "set of boolean" data type (see above).
>
>
>Sigh. Is it just me or is anyone else confused by this?
>
I for one would be very surprised if it was not a 'set of integer'.

Set of boolean would only have the possible
sets [] [false] [true] [false,true] which is silly.

And set of 'bitvector' is just as silly in the other direction.
Re: New module "Set_Calc" in the works [ In reply to ]
In article <m0tNljA-0003DtC@elegant.com>,
John Macdonald (jmm@elegant.com) wrote:

> Tim Bunce wrote :
> > Sigh. Is it just me or is anyone else confused by this?

> It's not you that is confused, I think it is Steffen.

Maybe I really am. In that case, my apologies to everyone, and please
enlighten me!

> The "boolean" involved is a property of the concept of "set" and
> how it differes from other mathematical collections. A set
> either contains an element or not, without any means for
> counting the number of times. But the boolean value that is
> true/false to denote whether a particular object is an element
> of a set does *not* have any relationship to the type of objects
> that can be contained within the set. It is the type of object
> that can be an element that is of significance and Steffen's
> module is capable of inserting and deleting (and otherwise
> dealing with) positive integers up to a maximum value. So it is
> a "set of finite rabbinicals".

Thanks for the clarification. So it's a "set of integer"s after all?
I'm sorry I "contaminated" others with my own confusion!

But my other naming proposition still holds: What about calling Jarkko's
module "Set::Symbolic" or "Set::Named", and mine "Set::Abstract" or
"Set::Numbered"? I think that's the fundamental difference it boils
down to.

> Someone (Mark Biggar?) mentioned using perl bitvectors and
> getting an all-perl implementation - I suspect that it could be
> done fairly easily.

How do you calculate the minimum, maximum and norm of a set in Perl
efficiently?

> If you kept the bitvector, a start index
> and an end index, and a positive/negative flag, then you could
> deal with any origin and any finite set of integers (not just
> based at zero). The pos/neg flag denotes whether values outside
> the start/end range are considered to be included or not. Using
> the indices means that you don't have a lot of wasted space and
> computational overhead if you are dealing in small sets of
> integers that are not close to zero, e.g. numbers between 1900
> and 2000 (year numbers).

You would have the computational overhead of subtracting/adding 1900
(in your example) in either case - either YOU do it in your application,
or the set MODULE does it internally. So I don't see the point for
making the module's interface more complicated.

Thank you all for this very interesting discussion!

Yours,
--
Steffen Beyer
mailto:sb@sdm.de |s |d &|m | software design & management GmbH&Co.KG
phone: +49 89 63812-244 | | | | Thomas-Dehler-Str. 27
fax: +49 89 63812-150 | | | | 81737 Munich, Germany.
Re: New module "Set_Calc" in the works [ In reply to ]
> How do you calculate the minimum, maximum and norm of a set in Perl
> efficiently?

This one is easy. You don't. :-) Please do not talk about 'sets'
because the concept is by no means well-defined (as this very
discussion serves to show). "The minima, maxima and norms" of a "set"
even less so.

++jhi;
Re: New module "Set_Calc" in the works [ In reply to ]
> From: Jarkko Hietaniemi <jhi@snakemail.hut.fi>
>
> > How do you calculate the minimum, maximum and norm of a set in Perl
> > efficiently?
>
> This one is easy. You don't. :-) Please do not talk about 'sets'
> because the concept is by no means well-defined (as this very
> discussion serves to show). "The minima, maxima and norms" of a "set"
> even less so.

Indeed. They are statistical functions on a list of numbers that, in
this case, are the contents of a set.

Ideally those methods belong in a separate stats module (that has nothing
to do with sets) and you would say something like

use Stats qw(max);
...
$max = max($set->members);

Tim.
Re: New module "Set_Calc" in the works [ In reply to ]
> From: sb@sdm.de (Steffen Beyer)
>
> Thanks for the clarification. So it's a "set of integer"s after all?
> I'm sorry I "contaminated" others with my own confusion!
>
> But my other naming proposition still holds: What about calling Jarkko's
> module "Set::Symbolic" or "Set::Named"

Jarkko's Set::Scalar implements a "set or perl scalar values". It is
perfectly named. Note that scalar values (and hence these sets) can
include numbers and references, neither of which are symbolic or named
in the general sense.

> and mine "Set::Abstract"

Eh? Jarkko's Set::Scalar is _much_ more 'abstract' (read 'can store anything')
than your module which only stores non-negative numbers.

> or "Set::Numbered"?

Sigh. You're elements are not 'numbered' they are numbers (although this
is a subtle distinction). You're letting yourself be influenced by your
implementation, thinking in terms of 'setting the numbered bit'. I think
this is where you're misleading yourself.

I still believe that Set::IntegerFast is probably the best name for it.
One day it might support negative numbers (by storing an 'offset' value)
and I think Set::Rabbinical is rather obscure.

I also believe that as far as is possible your module (in fact _all_ set
modules) should share the same interface as Set::Scalar.

> Thank you all for this very interesting discussion!

It shows that we really need a comp.lang.perl.modules for this kind of thing.
It doesn't really belong in perl5-porters.

Tim.
Re: New module "Set_Calc" in the works [ In reply to ]
Steffen Beyer wrote :
|| In article <m0tNljA-0003DtC@elegant.com>,
|| John Macdonald (jmm@elegant.com) wrote:
|| > Someone (Mark Biggar?) mentioned using perl bitvectors and
|| > getting an all-perl implementation - I suspect that it could be
|| > done fairly easily.
||
|| How do you calculate the minimum, maximum and norm of a set in Perl
|| efficiently?

That's up to the implementation. If the interface provides a
means of asking for these (as yours does) and the implementor
believes that it is important for them to be implemented
efficiently, then he can internally maintain their values.

|| > If you kept the bitvector, a start index
|| > and an end index, and a positive/negative flag, then you could
|| > deal with any origin and any finite set of integers (not just
|| > based at zero). The pos/neg flag denotes whether values outside
|| > the start/end range are considered to be included or not. Using
|| > the indices means that you don't have a lot of wasted space and
|| > computational overhead if you are dealing in small sets of
|| > integers that are not close to zero, e.g. numbers between 1900
|| > and 2000 (year numbers).
||
|| You would have the computational overhead of subtracting/adding 1900
|| (in your example) in either case - either YOU do it in your application,
|| or the set MODULE does it internally. So I don't see the point for
|| making the module's interface more complicated.

This is not interface, it is implemetation. The user would say
insert(1950) and the implementation would check whether it was
within the range of the current bitvector, and if so would
subtract the base of the bitvector and set the bit; if not, it
would extend the appropriate end of the bitvector as needed and
then set the bit. One implementation could choose to always use
origin 0 bitvectors (as you do), while another could save space
and allow negative elements by keeping a start index for the
bitvector. (I'd keep the bitvector origin and length separate
from any internal cache of lowest and highest elements, so that
the bitvector could be kept aligned on a nice boundary (multiple
of 64 perhaps) so that bitvector operations on two sets would
not require any shifting.)

The computational overhead of manipulating long strings of zero
bits can be significant. (In fact, it might be nice to have an
implementation that uses lists of integers for sparse sets, and
then flows to lists of bitvectors for sparse sets that have some
clustering, and concatenates the bitvectors as the set becomes
denser. The interface would still be the same, but it would be
much faster for sets involving perverse quantities.)

--
Maybe we can fix reality one of these days. | John Macdonald
<- Larry Wall Carl Dichter -> | jmm@Elegant.COM
I'd be happy if we could just index it. |
Re: New module "Set_Calc" in the works [ In reply to ]
In article <199512081407.QAA05660@lk-hp-2.hut.fi>, Jarkko Hietaniemi (jhi@snakemail.hut.fi) wrote:

> > How do you calculate the minimum, maximum and norm of a set in Perl
> > efficiently?

> This one is easy. You don't. :-) Please do not talk about 'sets'
> because the concept is by no means well-defined

Cantor is rotating in his tomb... ;-)

> (as this very
> discussion serves to show). "The minima, maxima and norms" of a "set"
> even less so.

This is not true for Analysis, where infimum, supremum, minimum and maximum
are well-defined for sets... (of reals)

And in computer science the norm of a set (of integers) is also well-defined...

;-)

> ++jhi;

Sorry! You don't win a point this time! ;-) ;-) ;-)

Well, but you're right: The concept of sets is not well-defined in the sense
that there are many possible ways of REALIZING sets.

For instance, have you already seen Kruskal's algorithm in "Efficient
Algorithms"?

It uses sets that are stored as TREES in an array!

I show you the Perl program for it:

-------------------- cut here -------------------- cut here --------------------
#!perl -w

# Copyright (c) 1995 by Steffen Beyer. All rights reserved.
# This package is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.

################################################################################
# Module implementing Kruskal's algorithm for minimal spanning trees in graphs #
################################################################################

# "require" this module in order to use it.

$number_of_edges = 0;
$number_of_vortices = 0;

# Input: A set of vortices which constitute a graph (some cities on a map,
# for example), a set of edges (i.e., roads) between the vortices of the
# (non-directed and connected) graph (i.e., the edges can be traveled in
# either direction, and a path must exist between any two vortices), and
# the cost of each edge (for instance, the geographical distance).

# Output: A set of edges forming a spanning tree (i.e., a set of edges linking
# all vortices, so that a path exists between any two vortices) which is free
# of circles (because it's a tree) and which is minimal in terms of the cost
# function defined on the set of edges.

# See Aho, Hopcroft, Ullman, "The Design and Analysis of Computer Algorithms"
# for more details on the algorithm.

# Remove the '#'s from the beginning of the following two lines and run
# this script as a stand-alone program (or call &example; after requiring
# this module) to see an example:

#&example;

#exit;

sub example
{
my($costs) = 0;
my($k);

print "\n";
print "+++ Kruskal's Algorithm for Minimal Spanning Trees in Graphs +++";
print "\n";

&define_vortices(2,3,5,7,11,13,17,19,23,29,31);

print "\nVortices:\n\n";

for ( $k = 1; $k <= $#VS; ++$k )
{
if (defined $VS[$k]) { print "$k\n"; }
}

&define_edges( 2,13,3, 3,13,2, 5,13,1, 3,5,2, 3,29,21, 23,29,3,
23,31,2, 5,31,15, 5,7,10, 2,11,2, 7,11,2, 7,19,5, 11,19,2,
7,31,4, 3,17,3, 17,23,3, 7,17,3 );

print "\nEdges:\n\n";

for ( $k = 1; $k <= $#E; ++$k )
{
print ${$E[$k]}{'from'}, " <-> ", ${$E[$k]}{'to'}, " = ",
${$E[$k]}{'cost'}, "\n";
}

&kruskal;

print "\nEdges in minimal spanning tree:\n\n";

for ( $k = 1; $k <= $#T; ++$k )
{
print ${$T[$k]}{'from'}, " <-> ", ${$T[$k]}{'to'}, " = ",
${$T[$k]}{'cost'}, "\n";
$costs += ${$T[$k]}{'cost'};
}

print "\nTotal costs: $costs\n\n";
}

sub define_vortices
{
undef @VS;
$number_of_vortices = 0;
foreach (@_)
{
($_ > 0) || die "vortex number is not positive!\n";
$VS[$_] = -1;
++$number_of_vortices;
}
}

sub define_edges
{
undef @E;
$number_of_edges = 0;
while (@_)
{
$from = shift || die "missing 'from' vortex number!\n";
$to = shift || die "missing 'to' vortex number!\n";
$cost = shift || die "missing edge 'cost' value!\n";
defined $VS[$from] || die "vortex '$from' not previously defined!\n";
defined $VS[$to] || die "vortex '$to' not previously defined!\n";
($from != $to) || die "vortices 'from' and 'to' are the same!\n";
$E[++$number_of_edges] =
{ 'from' => $from, 'to' => $to, 'cost' => $cost };
}
}

sub heapify # complexity: O(ld n)
{
my($i,$n) = @_;
my($i2,$i21,$j,$swap);

while ($i < $n)
{
$j = $i;
$i2 = $i * 2;
$i21 = $i2 + 1;
if ($i2 <= $n)
{
if (${$E[$i]}{'cost'} > ${$E[$i2]}{'cost'})
{
$j = $i2;
if ($i21 <= $n)
{
if (${$E[$i2]}{'cost'} > ${$E[$i21]}{'cost'}) { $j = $i21; }
}
}
else
{
if ($i21 <= $n)
{
if (${$E[$i]}{'cost'} > ${$E[$i21]}{'cost'}) { $j = $i21; }
}
}
}
if ($i != $j)
{
$swap = $E[$i];
$E[$i] = $E[$j];
$E[$j] = $swap;
$i = $j;
}
else { $i = $n; }
}
}

sub makeheap # complexity: O(n ld n)
{
my($n) = @_;
my($k);

for ( $k = $n - 1; $k > 0; --$k ) { &heapify($k, $n); }
}

# The following subroutine isn't used by this algorithm, it is only included
# here for the sake of completeness:

sub heapsort # complexity: O(n ld n)
{
my($n) = @_;
my($k,$swap);

for ( $k = $n - 1; $k > 0; --$k ) { &heapify($k, $n); }

for ( $k = $n; $k > 1; --$k )
{
$swap = $E[1];
$E[1] = $E[$k];
$E[$k] = $swap;
&heapify(1, $k - 1);
}
}

# (Disjoint!) sets are stored as trees in an array in this algorithm. Each
# element of some set (a cell in the array) contains a pointer to (the number
# of) another element, up to the root element that does not point anywhere,
# but contains the (negative) number of elements the set contains. The number
# of the root element is also used as an identifier for the set.
#
# Example:
#
# i : | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
# -------------+-----+-----+-----+-----+-----+-----+-----+-----+
# parent[i] : | -4 | -3 | 1 | 2 | 1 | -1 | 3 | 4 |
#
# This array contains the three sets S1, S2 and S6:
#
# 1 2 6
# / \ |
# 3 5 4
# / |
# 7 8
#
# "find" returns the number of the root element (= the identifier of the set)
# of the tree in which the given element is contained:
#
# find(a) := i so that a in Si
#
# It also reduces the height of that tree by changing all the pointers from
# the given element up to the root element to point DIRECTLY to the root
# element.
#
# Example:
#
# find(7) returns "1" and modifies the array as follows:
#
# i : | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
# -------------+-----+-----+-----+-----+-----+-----+-----+-----+
# parent[i] : | -4 | -3 | 1 | 2 | 1 | -1 | 1 | 4 |
#
# 1 2 6
# /|\ |
# 3 5 7 4
# |
# 8
#
# "union" takes the identifiers of two sets (= the numbers of their respective
# root elements) and merges the two sets by appending one of the two trees to
# the other. It always appends the SMALLER set to the LARGER one (to keep the
# height of the resulting tree as small as possible) and updates the number of
# elements contained in the resulting set which is stored in the root element's
# cell of the array.
#
# Example:
#
# union(2,6) does the following:
#
# i : | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
# -------------+-----+-----+-----+-----+-----+-----+-----+-----+
# parent[i] : | -4 | -4 | 1 | 2 | 1 | 2 | 1 | 4 |
#
# 1 2
# /|\ / \
# 3 5 7 4 6
# |
# 8

# complexity for O(n) "find" operations: O( G(n) n )
#
# complexity for one "union" operation: O(1)
#
# complexity for O(n) ( "find" + "union" ) operations: O( G(n) n )
#
# where G(n) := min{ j | F(j) >= n }
#
# and F(j) := 1 for j = 0
# F(j) := 2 ^ F(j-1) for j > 0
#
# also, G(n) <= ld n for all n

sub find
{
my($i) = @_;
my($j,$k,$t);

$j = $i;
while ($VS[$j] > 0) { $j = $VS[$j]; } # find root element (= set identifier)
$k = $i;
while ($k != $j) # height compression of the tree
{
$t = $VS[$k];
$VS[$k] = $j;
$k = $t;
}
return($j);
}

sub union
{
my($i,$j) = @_;
my($x);

$x = $VS[$i] + $VS[$j]; # calculate number of elements in resulting set
if ($VS[$i] > $VS[$j]) # which of the two sets contains more elements?
{
$VS[$i] = $j; # merge them
$VS[$j] = $x; # update number of elements
}
else
{
$VS[$j] = $i; # merge them
$VS[$i] = $x; # update number of elements
}
}

sub kruskal # complexity: O(n ld n) ( where n := |{ Edges }| )
{
my($n) = $number_of_edges;
my($v) = $number_of_vortices;
my($i,$j,$swap);
my($t) = 0;

undef @T;
&makeheap($number_of_edges); # complexity: O(n ld n)
while (($v > 1) && ($n > 0))
{
$swap = $E[1];
$E[1] = $E[$n];
$E[$n] = $swap;
&heapify(1, $n - 1); # complexity: n O(ld n) = O(n ld n)
$i = find(${$E[$n]}{'from'}); # complexity: n ( 2 find + 1 union ) =
$j = find(${$E[$n]}{'to'}); # O( G(n) n ) <= O(n ld n)
if ($i != $j)
{
union($i,$j);
$T[++$t] = $E[$n];
--$v;
}
--$n;
}
return(@T);
}

1;
__END__

-------------------- cut here -------------------- cut here --------------------

Yours,
--
Steffen Beyer
mailto:sb@sdm.de |s |d &|m | software design & management GmbH&Co.KG
phone: +49 89 63812-244 | | | | Thomas-Dehler-Str. 27
fax: +49 89 63812-150 | | | | 81737 Munich, Germany.
Re: New module "Set_Calc" in the works [ In reply to ]
In article <m0tO53m-0003E1C@elegant.com>, John Macdonald (jmm@elegant.com) wrote:
> Steffen Beyer wrote :
> || In article <m0tNljA-0003DtC@elegant.com>,
> || John Macdonald (jmm@elegant.com) wrote:
> || > Someone (Mark Biggar?) mentioned using perl bitvectors and
> || > getting an all-perl implementation - I suspect that it could be
> || > done fairly easily.
> ||
> || How do you calculate the minimum, maximum and norm of a set in Perl
> || efficiently?

> That's up to the implementation. If the interface provides a
> means of asking for these (as yours does) and the implementor
> believes that it is important for them to be implemented
> efficiently, then he can internally maintain their values.

Well, you have a lot of overhead THERE! :-)

> || > If you kept the bitvector, a start index
> || > and an end index, and a positive/negative flag, then you could
> || > deal with any origin and any finite set of integers (not just
> || > based at zero). The pos/neg flag denotes whether values outside
> || > the start/end range are considered to be included or not. Using
> || > the indices means that you don't have a lot of wasted space and
> || > computational overhead if you are dealing in small sets of
> || > integers that are not close to zero, e.g. numbers between 1900
> || > and 2000 (year numbers).
> ||
> || You would have the computational overhead of subtracting/adding 1900
> || (in your example) in either case - either YOU do it in your application,
> || or the set MODULE does it internally. So I don't see the point for
> || making the module's interface more complicated.

> This is not interface, it is implemetation. The user would say
> insert(1950) and the implementation would check whether it was
> within the range of the current bitvector, and if so would
> subtract the base of the bitvector and set the bit; if not, it
> would extend the appropriate end of the bitvector as needed and
> then set the bit.

Oh, now I understand! I thought you meant to create the set with some
arbitrary range of values (i.e., Create(1900,2000)) and then to have
the module handle the rest (I didn't quite understand what you wanted
to use the "-1" flag for, but you would have to pass that to the module
as well and so enlarge (= complicate) the interface!)

> One implementation could choose to always use
> origin 0 bitvectors (as you do), while another could save space
> and allow negative elements by keeping a start index for the
> bitvector. (I'd keep the bitvector origin and length separate
> from any internal cache of lowest and highest elements, so that
> the bitvector could be kept aligned on a nice boundary (multiple
> of 64 perhaps) so that bitvector operations on two sets would
> not require any shifting.)

> The computational overhead of manipulating long strings of zero
> bits can be significant. (In fact, it might be nice to have an
> implementation that uses lists of integers for sparse sets, and
> then flows to lists of bitvectors for sparse sets that have some
> clustering, and concatenates the bitvectors as the set becomes
> denser. The interface would still be the same, but it would be
> much faster for sets involving perverse quantities.)

Yes, this all would be very nice to have (just as it's very convenient
that Perl grows strings, arrays and hashes automatically), but at the
moment, this is far beyond what I intended to write.

But it's an excellent idea, I'll keep it in mind!

> --
> Maybe we can fix reality one of these days. | John Macdonald
> <- Larry Wall Carl Dichter -> | jmm@Elegant.COM
> I'd be happy if we could just index it. |

The issue of this discussion was mainly how to name the module I'm currently
writing.

Any suggestions about that?

Kind regards,
--
Steffen Beyer
mailto:sb@sdm.de |s |d &|m | software design & management GmbH&Co.KG
phone: +49 89 63812-244 | | | | Thomas-Dehler-Str. 27
fax: +49 89 63812-150 | | | | 81737 Munich, Germany.

1 2  View All