I've kept perl5-porters only because of the first item -
otherwise we've gone far enough off topic that Steffen and I can
go to private email.
Steffen Beyer wrote :
|| The issue of this discussion was mainly how to name the module I'm currently
|| writing.
Set::PosInteger might work well. I'm not sure whether we need
to use Fast in the name - I think that an implementation could
be done that was capable of handling sets of integers with
unlimited sparsity, bounded only by machine memory (and machine
integer size unless it went to BigInts), that did not sacrifice
speed for small sets - such an implementation could just
supercede this one, rather than needing a new name. If there is
enough penalty with such an implementation that both would be
retained, then the second could be called Slow (or Unbounded) as
a way of distinguishing them.
The rest of this is basically to Steffen.
|| > || > Someone (Mark Biggar?) mentioned using perl bitvectors and
|| > || > getting an all-perl implementation - I suspect that it could be
|| > || > done fairly easily.
|| > ||
|| > || How do you calculate the minimum, maximum and norm of a set in Perl
|| > || efficiently?
||
|| > That's up to the implementation. If the interface provides a
|| > means of asking for these (as yours does) and the implementor
|| > believes that it is important for them to be implemented
|| > efficiently, then he can internally maintain their values.
||
|| Well, you have a lot of overhead THERE! :-)
Only if you always track it perfectly. I'd keep a flag for
whether the limit was known to be currently in the set. When
adding a new element to the set, if it is less than the lower
limit, then remember it as a known lower limit. When deleting
an element, if it is currently the lower limit, flag the lower
limit as unknown. When a request for the lowest element is
processed, the value is either known, or else a bound on it is
known and you can search up the bitvector from there (possibly
deleting the leading chunk of the bitvector if there is a big
gap to the actual first element).
|| > || > If you kept the bitvector, a start index
|| > || > and an end index, and a positive/negative flag, then you could
|| > || > deal with any origin and any finite set of integers (not just
|| > || > based at zero). The pos/neg flag denotes whether values outside
|| > || > the start/end range are considered to be included or not. Using
|| > || > the indices means that you don't have a lot of wasted space and
|| > || > computational overhead if you are dealing in small sets of
|| > || > integers that are not close to zero, e.g. numbers between 1900
|| > || > and 2000 (year numbers).
|| > ||
|| > || You would have the computational overhead of subtracting/adding 1900
|| > || (in your example) in either case - either YOU do it in your application,
|| > || or the set MODULE does it internally. So I don't see the point for
|| > || making the module's interface more complicated.
||
|| > This is not interface, it is implemetation. The user would say
|| > insert(1950) and the implementation would check whether it was
|| > within the range of the current bitvector, and if so would
|| > subtract the base of the bitvector and set the bit; if not, it
|| > would extend the appropriate end of the bitvector as needed and
|| > then set the bit.
||
|| Oh, now I understand! I thought you meant to create the set with some
|| arbitrary range of values (i.e., Create(1900,2000)) and then to have
|| the module handle the rest (I didn't quite understand what you wanted
|| to use the "-1" flag for, but you would have to pass that to the module
|| as well and so enlarge (= complicate) the interface!)
The pos/neg flag allows you to have infinitely large sets. (All
integers *except* the ones in the bitvector.) To fully use
this, you'd want to have lowest_included and lowest_excluded
(one of which would always be -inf) rather than just lowest.
|| > One implementation could choose to always use
|| > origin 0 bitvectors (as you do), while another could save space
|| > and allow negative elements by keeping a start index for the
|| > bitvector. (I'd keep the bitvector origin and length separate
|| > from any internal cache of lowest and highest elements, so that
|| > the bitvector could be kept aligned on a nice boundary (multiple
|| > of 64 perhaps) so that bitvector operations on two sets would
|| > not require any shifting.)
||
|| > The computational overhead of manipulating long strings of zero
|| > bits can be significant. (In fact, it might be nice to have an
|| > implementation that uses lists of integers for sparse sets, and
|| > then flows to lists of bitvectors for sparse sets that have some
|| > clustering, and concatenates the bitvectors as the set becomes
|| > denser. The interface would still be the same, but it would be
|| > much faster for sets involving perverse quantities.)
||
|| Yes, this all would be very nice to have (just as it's very convenient
|| that Perl grows strings, arrays and hashes automatically), but at the
|| moment, this is far beyond what I intended to write.
||
|| But it's an excellent idea, I'll keep it in mind!
A barrel of rount tuits would be nice...
|| Any suggestions about that?
||
|| Kind regards,
--
Maybe we can fix reality one of these days. | John Macdonald
<- Larry Wall Carl Dichter -> | jmm@Elegant.COM
I'd be happy if we could just index it. |
otherwise we've gone far enough off topic that Steffen and I can
go to private email.
Steffen Beyer wrote :
|| The issue of this discussion was mainly how to name the module I'm currently
|| writing.
Set::PosInteger might work well. I'm not sure whether we need
to use Fast in the name - I think that an implementation could
be done that was capable of handling sets of integers with
unlimited sparsity, bounded only by machine memory (and machine
integer size unless it went to BigInts), that did not sacrifice
speed for small sets - such an implementation could just
supercede this one, rather than needing a new name. If there is
enough penalty with such an implementation that both would be
retained, then the second could be called Slow (or Unbounded) as
a way of distinguishing them.
The rest of this is basically to Steffen.
|| > || > Someone (Mark Biggar?) mentioned using perl bitvectors and
|| > || > getting an all-perl implementation - I suspect that it could be
|| > || > done fairly easily.
|| > ||
|| > || How do you calculate the minimum, maximum and norm of a set in Perl
|| > || efficiently?
||
|| > That's up to the implementation. If the interface provides a
|| > means of asking for these (as yours does) and the implementor
|| > believes that it is important for them to be implemented
|| > efficiently, then he can internally maintain their values.
||
|| Well, you have a lot of overhead THERE! :-)
Only if you always track it perfectly. I'd keep a flag for
whether the limit was known to be currently in the set. When
adding a new element to the set, if it is less than the lower
limit, then remember it as a known lower limit. When deleting
an element, if it is currently the lower limit, flag the lower
limit as unknown. When a request for the lowest element is
processed, the value is either known, or else a bound on it is
known and you can search up the bitvector from there (possibly
deleting the leading chunk of the bitvector if there is a big
gap to the actual first element).
|| > || > If you kept the bitvector, a start index
|| > || > and an end index, and a positive/negative flag, then you could
|| > || > deal with any origin and any finite set of integers (not just
|| > || > based at zero). The pos/neg flag denotes whether values outside
|| > || > the start/end range are considered to be included or not. Using
|| > || > the indices means that you don't have a lot of wasted space and
|| > || > computational overhead if you are dealing in small sets of
|| > || > integers that are not close to zero, e.g. numbers between 1900
|| > || > and 2000 (year numbers).
|| > ||
|| > || You would have the computational overhead of subtracting/adding 1900
|| > || (in your example) in either case - either YOU do it in your application,
|| > || or the set MODULE does it internally. So I don't see the point for
|| > || making the module's interface more complicated.
||
|| > This is not interface, it is implemetation. The user would say
|| > insert(1950) and the implementation would check whether it was
|| > within the range of the current bitvector, and if so would
|| > subtract the base of the bitvector and set the bit; if not, it
|| > would extend the appropriate end of the bitvector as needed and
|| > then set the bit.
||
|| Oh, now I understand! I thought you meant to create the set with some
|| arbitrary range of values (i.e., Create(1900,2000)) and then to have
|| the module handle the rest (I didn't quite understand what you wanted
|| to use the "-1" flag for, but you would have to pass that to the module
|| as well and so enlarge (= complicate) the interface!)
The pos/neg flag allows you to have infinitely large sets. (All
integers *except* the ones in the bitvector.) To fully use
this, you'd want to have lowest_included and lowest_excluded
(one of which would always be -inf) rather than just lowest.
|| > One implementation could choose to always use
|| > origin 0 bitvectors (as you do), while another could save space
|| > and allow negative elements by keeping a start index for the
|| > bitvector. (I'd keep the bitvector origin and length separate
|| > from any internal cache of lowest and highest elements, so that
|| > the bitvector could be kept aligned on a nice boundary (multiple
|| > of 64 perhaps) so that bitvector operations on two sets would
|| > not require any shifting.)
||
|| > The computational overhead of manipulating long strings of zero
|| > bits can be significant. (In fact, it might be nice to have an
|| > implementation that uses lists of integers for sparse sets, and
|| > then flows to lists of bitvectors for sparse sets that have some
|| > clustering, and concatenates the bitvectors as the set becomes
|| > denser. The interface would still be the same, but it would be
|| > much faster for sets involving perverse quantities.)
||
|| Yes, this all would be very nice to have (just as it's very convenient
|| that Perl grows strings, arrays and hashes automatically), but at the
|| moment, this is far beyond what I intended to write.
||
|| But it's an excellent idea, I'll keep it in mind!
A barrel of rount tuits would be nice...
|| Any suggestions about that?
||
|| Kind regards,
--
Maybe we can fix reality one of these days. | John Macdonald
<- Larry Wall Carl Dichter -> | jmm@Elegant.COM
I'd be happy if we could just index it. |