Mailing List Archive

stackable ints [stupid idea (ignore) :v]
While we're talking about stacks...

I've always considered it a major shame that Python ints and floats
and chars and stuff have anything to do with dynamic allocation, and I
always suspected it might be a major performance boost if there was
some way they could be manipulated without the need for dynamic
memory management.

One conceivable alternative approach would change the basic
manipulation of objects: instead of representing objects via pyobject
pointers everywhere, represent them using two "slots" in a structure
for each object, one of which is a type descriptor pointer and the
other a (void *) which could contain the data directly for small
objects such as ints, floats, chars. In this case, for example,
integer addition would never require any memory management, as it
shouldn't, I think, in a perfect world.

I.e. instead of

    C-stack or static:         Heap:
    (pyobject *) ------------> (refcount, typedescr, data ...)

in general you get

    (typedescr
     repr* ----------------------> (refcount, data, ...)
    )

or for small objects like ints and floats and chars simply

    (typedescr,
     value)

with no dereferencing or memory management required. My feeling is
that common things like arithmetic and indexing lists of integers and
stuff could be much faster under this approach since it reduces memory
management overhead and fragmentation, dereferencing, etc...
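
To make that concrete, here's a minimal C sketch of the layout I have
in mind (hypothetical names; assumes an int fits in a void *, and that
the long <-> void * round trip is safe, which holds on common 32-bit
platforms but isn't guaranteed by the standard):

    typedef struct typedescr typedescr;  /* shared type descriptor */

    typedef struct {
        typedescr *type;  /* always a pointer to the type descriptor */
        void      *repr;  /* heap pointer, OR the value itself for
                             small types like ints and chars (whether
                             a float fits is another question) */
    } descriptor;

    /* Integer addition with no memory management at all: */
    static descriptor int_add(descriptor a, descriptor b)
    {
        descriptor r;
        r.type = a.type;  /* both operands are ints */
        r.repr = (void *)((long)a.repr + (long)b.repr);
        return r;
    }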

One bad thing, of course, is that this might be a drastic assault on
the way existing code works... unless I'm just not being creative
enough with my thinking. Is this a good idea? If so, is there any way
to add it to the interpreter without breaking extension modules and
everything else? If Python 2.0 will break stuff anyway, would this be
a good change to the internals?

Curious... -- Aaron Watters

ps: I suppose another gotcha is "when do you do increfs/decrefs?"
because they no longer make sense for ints in this case... maybe add a
flag to the type descriptor, "increfable", and assume that the
typedescriptors are always in the CPU cache (?). This would slow down
increfs by a couple cycles... Would it be worth it? Only the benchmark
knows... Another fix would be to put the refcount in the static side
with no speed penalty

    (typedescr
     repr* ----------------------> data
     refcount
    )

but would that be wasteful of space?
Re: stackable ints [stupid idea (ignore) :v]
[Aaron]
> I've always considered it a major shame that Python ints and floats
> and chars and stuff have anything to do with dynamic allocation, and
> I always suspected it might be a major speed performance boost if
> there was some way they could be manipulated without the need for
> dynamic memory management.

What you're describing is very close to what I recall I once read
about the runtime organization of Icon. Perl may also use a variant
on this (it has fixed-length object headers). On the other hand, I
believe Smalltalks typically use something like the following ABC
trick:

In ABC, we used a variation: objects were represented by pointers as
in Python, except when the low bit was 1, in which case the remaining
31 bits were a "small int". My experience with this approach was that
it probably saved some memory, but perhaps not time (since almost all
operations on objects were slowed down by the check "is it an int?"
before the pointer could be accessed); and that because of this it was
a major hassle in keeping the implementation code correct. There was
always the temptation to make a check early in a piece of code and
then skip the check later on, which sometimes didn't work when objects
switched places. Plus in general the checks made the code less
readable, and it was just one more thing to remember to do.
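
In C the trick looks roughly like this (hypothetical macro names, not
the actual ABC source; assumes 32-bit words and that real object
pointers are at least 2-byte aligned, so their low bit is always 0):

    #define IS_SMALL_INT(w)    ((unsigned long)(w) & 1)
    #define MK_SMALL_INT(n)    (((unsigned long)(n) << 1) | 1)
    #define SMALL_INT_VALUE(w) ((long)(w) >> 1)  /* assumes arithmetic shift */

    /* Every operation pays for the check before touching the pointer:
       if (IS_SMALL_INT(w)) ...use SMALL_INT_VALUE(w)...
       else                 ...follow the pointer... */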

The Icon approach (i.e. yours) seems to require a complete rethinking
of all object implementations and all APIs at the C level -- perhaps
we could think about it for Python 2.0. Some ramifications:

- Uses more memory for highly shared objects (there are as many copies
of the type pointer as there are references).

- Thus, lists take double the memory assuming they reference objects
that also exist elsewhere. This affects the performance of slices
etc.

- On the other hand, a list of ints takes half the memory (given that
most of those ints are not shared).

- *Homogeneous* lists (where all elements have the same type --
i.e. arrays) can be represented more efficiently by having only one
copy of the type pointer. This was an idea for ABC (whose type system
required all container types to be homogeneous) that was never
implemented (because in practice the type check wasn't always applied,
and the top-level namespace used by the interactive command
interpreter violated all the rules).

- Reference count manipulations could be done by a macro (or C++
behind-the-scenes magic using copy constructors and destructors) that
calls a function in the type object -- i.e. each object could decide
on its own reference counting implementation :-)
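
E.g. (a hypothetical sketch, not a worked-out API):

    typedef struct object object;
    typedef struct {
        void (*tp_incref)(object *);
        void (*tp_decref)(object *);
        /* ...rest of the type object... */
    } typeobject;
    struct object {
        typeobject *ob_type;  /* refcount, if any, is the type's business */
    };

    #define INCREF(op) ((op)->ob_type->tp_incref(op))
    #define DECREF(op) ((op)->ob_type->tp_decref(op))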

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: stackable ints [stupid idea (ignore) :v]
>>>>> "Guido" == Guido van Rossum <guido@cnri.reston.va.us> writes:

Guido> In ABC, we used a variation: objects were represented by
Guido> pointers as in Python, except when the low bit was 1, in
Guido> which case the remaining 31 bits were a "small int".

Very similar to how Emacs Lisp manages its type system, which XEmacs
extended. The following is from the XEmacs Internals
documentation[1]. XEmacs' object representation (on a 32 bit machine)
uses the top bit as a GC mark bit, followed by three type tag bits,
followed by a pointer or an integer:

    [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
    [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]

      ^ <---> <----------------------------------------------------->
      |  tag        a pointer to a structure, or an integer
      |
      `---> mark bit

Of the 8 possible types representable by the tag bits, one is a
"record" type, which essentially allows an unlimited (well, 2^32)
number of data types.
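
In C, pulling a word apart looks something like this (hypothetical
macros mirroring the diagram above, not the real XEmacs source):

    #define MARKBIT(w) (((unsigned long)(w) >> 31) & 0x1)
    #define TAG(w)     (((unsigned long)(w) >> 28) & 0x7)
    #define PAYLOAD(w) ((unsigned long)(w) & 0x0fffffffUL)  /* low 28 bits */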

As you might guess there are lots of interesting details and
limitations to this scheme, with lots of interesting macros in the C
code :). Reading and debugging the C implementation gets fun too
(we'll ignore for the moment all the GCPRO'ing going on -- if you
think INCREF/DECREF is trouble prone, hah!).

Whether or not this is at all relevant for Python 2.0, it all seems to
work pretty well in (X)Emacs.

>>>>> "AW" == Aaron Watters <arw@ifu.net> writes:

AW> ps: I suppose another gotcha is "when do you do
AW> increfs/decrefs?" because they no longer make sense for ints
AW> in this case... maybe add a flag to the type descriptor
AW> "increfable" and assume that the typedescriptors are always in
AW> the CPU cache (?). This would slow down increfs by a couple
AW> cycles... Would it be worth it? Only the benchmark knows...
AW> Another fix would be to put the refcount in the static side
AW> with no speed penalty

| (typedescr
| repr* ----------------------> data
| refcount
| )

AW> but would that be wasteful of space?

Once again, you can move the refcount out of the objects, a la
NextStep. Could save space and improve LOC for read-only objects.

-Barry

[1] The Internals documentation comes with XEmacs's Info
documentation. Hit:

C-h i m Internals RET m How RET
RE: stackable ints [stupid idea (ignore) :v]
Jumping in to opine that mixing tag/type bits with native pointers is a
Really Bad Idea. Put the bits on the low end and word-addressed machines
are screwed. Put the bits on the high end and you've made severe
assumptions about how the platform parcels out address space. In any case
you're stuck with ugly macros everywhere.

This technique was pioneered by Lisps, and was beautifully exploited by the
Symbolics Lisp Machine and TI Lisp Explorer hardware. Lisp people don't
want to admit those failed, so continue simulating the HW design by hand at
comparatively sluggish C speed <0.6 wink>.

BTW, I've never heard this approach argued as a speed optimization (except
in the HW implementations): software mask-test-branch around every
inc/dec-ref to exempt ints is a nasty new repeated expense. The original
motivation was to save space, and that back in the days when a 128Mb RAM
chip wasn't even conceivable, let alone under $100 <wink>.

once-wrote-a-functional-language-interpreter-in-8085-assembler-that-ran-
in-24Kb-cuz-that's-all-there-was-but-don't-feel-i-need-to-repeat-the-
experience-today-wink>-ly y'rs - tim
RE: stackable ints [stupid idea (ignore) :v]
>>>>> "TP" == Tim Peters <tim_one@email.msn.com> writes:

TP> Jumping in to opine that mixing tag/type bits with native
TP> pointers is a Really Bad Idea. Put the bits on the low end
TP> and word-addressed machines are screwed. Put the bits on the
TP> high end and you've made severe assumptions about how the
TP> platform parcels out address space. In any case you're stuck
TP> with ugly macros everywhere.

Ah, so you /have/ read the Emacs source code! I'll agree that it's
just an RBI for Emacs, but for Python, it'd be an RFSI.

TP> This technique was pioneered by Lisps, and was beautifully
TP> exploited by the Symbolics Lisp Machine and TI Lisp Explorer
TP> hardware. Lisp people don't want to admit those failed, so
TP> continue simulating the HW design by hand at comparatively
TP> sluggish C speed <0.6 wink>.

But of course, the ghosts live on at the FSF and xemacs.org (couldn't
tell ya much about how modren <sic> Lisps do it).

-Barry
RE: stackable ints [stupid idea (ignore) :v]
From: "Tim Peters" <tim_one@email.msn.com>
>Jumping in to opine that mixing tag/type bits with native pointers is a
>Really Bad Idea. Put the bits on the low end and word-addressed machines
>are screwed. Put the bits on the high end and you've made severe
>assumptions about how the platform parcels out address space. In any case
>you're stuck with ugly macros everywhere.

Agreed. Never ever mess with pointers. This mistake has been made over
and over again by each new generation of computer hardware and software
and it's still a mistake.

I thought it would be good to be able to do the following loop with
Numeric arrays

    for x in array1:
        array2[x] = array3[x] + array4[x]

without any memory management being involved. Right now, I think the
for loop has to continually dynamically allocate each new x and
intermediate sum (and immediately deallocate them), and that makes the
loop piteously slow. The idea of replacing pyobject *'s with a struct
[typedescr *, data *] was a space/time tradeoff to speed up operations
like the above by eliminating any need for mallocs or other memory
management. I really can't say whether it'd be worth it or not without
some sort of real testing. Just a thought.

-- Aaron Watters
RE: stackable ints [stupid idea (ignore) :v]
On Fri, 11 Jun 1999, Aaron Watters wrote:

> I thought it would be good to be able to do the following loop with Numeric
> arrays
>
>     for x in array1:
>         array2[x] = array3[x] + array4[x]
>
> without any memory management being involved. Right now, I think the

FYI, I think it should be done by writing:

    array2[array1] = array3[array1] + array4[array1]

and doing "the right thing" in NumPy. In other words, I don't think the
core needs to be involved.

--david

PS: I'm in the process of making the NumPy array objects ExtensionClasses,
which will make the above much easier to do.
Re: stackable ints [stupid idea (ignore) :v]
David Ascher wrote:
>
> On Fri, 11 Jun 1999, Aaron Watters wrote:
>
> > I thought it would be good to be able to do the following loop with Numeric
> > arrays
> >
> >     for x in array1:
> >         array2[x] = array3[x] + array4[x]
> >
> > without any memory management being involved. Right now, I think the
>
> FYI, I think it should be done by writing:
>
>     array2[array1] = array3[array1] + array4[array1]
>
> and doing "the right thing" in NumPy. In other words, I don't think the
> core needs to be involved.

For NumPy, this is very OK, dealing with arrays in an array world.

Without trying to repeat myself, I'd like to say that I still consider
it an unsolved problem, one which is worth solving or proving
unsolvable:

How to do simple things in an efficient way with many tiny Python
objects, without writing an extension, without rethinking a problem
into APL-like style, and without changing the language.

ciao - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
RE: stackable ints [stupid idea (ignore) :v]
[Aaron describes a scheme where objects are represented by a
fixed-size (typecode, variant) pair, where if the typecode is
e.g. INT or FLOAT the variant is the value directly instead of a
pointer to the value]

[Guido]
> What you're describing is very close to what I recall I once read
> about the runtime organization of Icon.

At the lowest level it's exactly what Icon does. It does *not* exempt ints
from Icon's flavor of dynamic memory management, but Icon doesn't use
refcounting -- it uses compacting mark-&-sweep across some 5 distinct
regions each with their own finer-grained policies (e.g., strings are
central to Icon and so it manages the string region a little differently;
and Icon coroutines save away pieces of the platform's C stack so need
*very* special treatment).

So:

1) There are no incref/decref expenses anywhere in Icon.

2) Because of compaction, all allocations cost the same and are dirt
cheap (see the sketch after this list):
just increment the appropriate region's "avail" pointer by the number of
bytes you need. If there aren't enough bytes, run GC and try again. If
there still aren't enough bytes, Icon usually shuts down (it's not good at
asking the OS for more memory! it carves up its initial memory in pretty
rigid ways, and relies on tricks like comparing storage addresses to speed
M&S and compaction -- those "regions" are in a fixed order relative to each
other, so new memory can't be tacked on to a region except at the low and
high ends).

3) All the expense is in finding and compacting live objects, so in an odd
literal sense cleaning up trash comes for free.

4) Icon has no finalizers, so it doesn't need to identify or preserve
trash -- compaction simply overwrites "the holes" where the trash used to
be.
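
Here's point 2) as a C sketch (hypothetical names; Icon's real
allocator is more involved, but the fast path really is just a bump
and a compare):

    #include <stddef.h>

    typedef struct {
        char *avail;  /* next free byte in this region */
        char *limit;  /* one past the end of the region */
    } region;

    extern void collect_and_compact(region *r);  /* assumed GC entry point */

    static void *region_alloc(region *r, size_t nbytes)
    {
        if ((size_t)(r->limit - r->avail) < nbytes) {
            collect_and_compact(r);  /* compaction slides avail back down */
            if ((size_t)(r->limit - r->avail) < nbytes)
                return NULL;         /* Icon usually just shuts down here */
        }
        void *p = r->avail;
        r->avail += nbytes;
        return p;
    }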

Icon is nicely implemented, but it's a "self-contained universe" view of the
world and its memory approach makes life hard for the tiny handful of folks
who have *tried* to make it extendable via C. Icon is also purely
procedural -- no OO, no destructors, no resurrection.

Irony: one reason I picked up Python in '91 is that my int-fiddling code
was too slow in Icon! Even Python 0.9.0 ran int algorithms significantly
faster than the 10-years-refined Icon implementation of that time. Never
looked into why, but now that Aaron brought up the issue I find it very
surprising! Those algorithms had a huge rate of int trash creation, but
very few persistent objects, so Icon's M&S should have run like the wind.
And Icon's allocation is dirt-cheap (at least as fast as Python's fastest
special-purpose allocators), and didn't have any refcounting expenses
either.

There's an important lesson *somewhere* in that <wink>. Maybe it was the
fault of Icon's "goal-directed" expression evaluation, constantly asking
"did this int succeed or fail?", "did that add suceed or fail?", etc.

> ...
> The Icon approach (i.e. yours) seems to require a complete rethinking
> of all object implementations and all APIs at the C level -- perhaps
> we could think about it for Python 2.0. Some ramifications:
>
> - Uses more memory for highly shared objects (there are as many copies
> of the type pointer as there are references).

Actually more than that in Icon: if the "variant" part is a pointer, the
first word of the block it points to is also a copy of the typecode (turns
out the redundancy speeds the GC).

> - Thus, lists take double the memory assuming they reference objects
> that also exist elsewhere. This affects the performance of slices
> etc.
>
> - On the other hand, a list of ints takes half the memory (given that
> most of those ints are not shared).

Isn't this 2/3 rather than 1/2? I'm picturing a list element today as
essentially a pointer to a type object pointer + int (3 units in all), and a
type object pointer + int (2 units in all) "tomorrow". Throw in refcounts
too and the ratio likely gets closer to 1.

> - *Homogeneous* lists (where all elements have the same type --
> i.e. arrays) can be represented more efficiently by having only one
> copy of the type pointer. This was an idea for ABC (whose type system
> required all container types to be homogeneous) that was never
> implemented (because in practice the type check wasn't always applied,
> and the top-level namespace used by the interactive command
> interpreter violated all the rules).

Well, Python already has homogeneous int lists (array.array), and while they
save space they suffer in speed due to needing to wrap raw ints "in an
object" upon reference and unwrap them upon storage.

> - Reference count manipulations could be done by a macro (or C++
> behind-the-scenes magic using copy constructors and destructors) that
> calls a function in the type object -- i.e. each object could decide
> on its own reference counting implementation :-)

You don't need to switch representations to get that, though, right? That
is, I don't see anything stopping today's type objects from growing
__incref__ and __decref__ slots -- except for common sense <wink>.


An apparent ramification I don't see above that may actually be worth
something <wink>:

- In "i = j + k", the eval stack could contain the ints directly, instead of
pointers to the ints. So fetching the value of i takes two loads (get the
type pointer + the variant) from adjacent stack locations, instead of
today's load-the-pointer + follow-the-pointer (to some other part of
memory); similarly for fetching the value of j. Then the sum can be stored
*directly* into the stack too, without today's need for allocating and
wrapping it in "an int object" first.

Possibly happy variant: on top of the above, *don't* exempt ints from
refcounting. Let 'em incref and decref like everything else. Give them an
initial refcount of max_count/2, and in the exceedingly unlikely event a
decref on an int ever sees zero, the int "destructor" simply resets the
refcount to max_count/2 and is otherwise a nop.

semi-thinking-semi-aloud-ly y'rs - tim
RE: stackable ints [stupid idea (ignore) :v]
[Aaron Watters]
> ...
> I thought it would be good to be able to do the following loop
> with Numeric arrays
>
>     for x in array1:
>         array2[x] = array3[x] + array4[x]
>
> without any memory management being involved. Right now, I think the
> for loop has to continually dynamically allocate each new x

Actually not, it just binds x to the sequence of PyObject*'s already in
array1, one at a time. It does bump & drop the refcount on that object a
lot. Also irksome is that it keeps allocating/deallocating a little integer
on each trip, for the under-the-covers loop index! Marc-Andre (I think)
had/has a patch to worm around that, but IIRC it didn't make much difference
(wouldn't expect it to, though -- not if the loop body does any real work).

One thing a smarter Python compiler could do is notice the obvious <snort>:
the *internal* incref/decref operations on the object denoted by x in the
loop above must cancel out, so there's no need to do any of them.
"internal" == those due to the routine actions of the PVM itself, while
pushing and popping the eval stack. Exploiting that is tedious; e.g.,
inventing a pile of opcode variants that do the same thing as today's except
skip an incref here and a decref there.

> and intermediate sum (and immediately deallocate them)

The intermediate sum is allocated each time, but not deallocated (the
pre-existing object at array2[x] *may* be deallocated, though).

> and that makes the loop piteously slow.

A lot of things conspire to make it slow. David is certainly right that, in
this particular case, array2[array1] = array3[array1] + etc worms around the
worst of them.

> The idea of replacing pyobject *'s with a struct [typedescr *, data *]
> was a space/time tradeoff to speed up operations like the above
> by eliminating any need for mallocs or other memory management.

Fleshing out details may make it look less attractive. For machines where
ints are no wider than pointers, the "data *" can be replaced with the int
directly and then there's real potential. If for a float the "data*" really
*is* a pointer, though, what does it point *at*? Some dynamically allocated
memory to hold the float appears to be the only answer, and you're right
back at the problem you were hoping to avoid.

Make the "data*" field big enough to hold a Python float directly, and the
descriptor likely zooms to 128 bits (assuming float is IEEE double and the
machine requires natural alignment).

Let's say we do that. Where does the "+" implementation get the 16 bytes it
needs to store its result? The space presumably already exists in the slot
indexed by array2[x], but the "+" implementation has no way to *know* that.
Figuring it out requires non-local analysis, which is quite a few steps
beyond what Python's compiler can do today. Easiest: internal functions
all grow a new PyDescriptor* argument into which they are to write their
result's descriptor. The PVM passes "+" the address of the slot indexed by
array2[x] if it's smart enough; or, if it's not, the address of the stack
slot descriptor into which today's PVM *would* push the result. In the
latter case the PVM would need to copy those 16 bytes into the slot indexed
by array2[x] later.

Neither of those is as simple as it sounds, though, at least because if
array2[x] holds a descriptor with a real pointer in its variant half, the
thing to which it points needs to get decref'ed iff the add succeeds. It
can get very messy!
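
For concreteness, the out-parameter flavor might look like this
(hypothetical types and names; the descriptor here is the 16-byte
variety that holds a double directly):

    typedef struct {
        void *type;       /* type descriptor pointer */
        union {
            double f;     /* floats stored directly... */
            void  *ptr;   /* ...a real pointer for everything else */
        } variant;
    } PyDescriptor;

    /* Caller supplies the result slot; success allocates nothing.  If
       *result held a real pointer beforehand, somebody still has to
       decref the old referent iff the add succeeds -- the messy part. */
    static int float_add(const PyDescriptor *a, const PyDescriptor *b,
                         PyDescriptor *result)
    {
        result->type = a->type;
        result->variant.f = a->variant.f + b->variant.f;
        return 0;  /* success */
    }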

> I really can't say whether it'd be worth it or not without some sort of
> real testing. Just a thought.

It's a good thought! Just hard to make real.

but-if-michael-hudson-keeps-hacking-at-bytecodes-and-christian-
keeps-trying-to-prove-he's-crazier-than-michael-by-2001-
we'll-be-able-to-generate-optimized-vector-assembler-for-
it<wink>-ly y'rs - tim
RE: stackable ints [stupid idea (ignore) :v]
[Aaron Watters]
> ...
> Another fix would be to put the refcount in the static side with
> no speed penalty
>
> (typedescr
> repr* ----------------------> data
> refcount
> )
>
> but would that be wasteful of space?

The killer is for types where repr* is a real pointer:

    x = [Whatever()]
    y = x[:]

Now we have two physically distinct descriptors pointing at the same thing,
and so also two distinct refcounts for that thing -- impossible to keep them
in synch efficiently; "del y" has no efficient way to find the refcount
hiding in x.

things-and-their-refcounts-are-monogamous-ly y'rs - tim
Re: stackable ints [stupid idea (ignore) :v]
[me]
> > - Thus, lists take double the memory assuming they reference objects
> > that also exist elsewhere. This affects the performance of slices
> > etc.
> >
> > - On the other hand, a list of ints takes half the memory (given that
> > most of those ints are not shared).

[Tim]
> Isn't this 2/3 rather than 1/2? I'm picturing a list element today as
> essentially a pointer to a type object pointer + int (3 units in all), and a
> type object pointer + int (2 units in all) "tomorrow". Throw in refcounts
> too and the ratio likely gets closer to 1.

An int is currently 3 units: type, refcnt, value. (The special int
allocator means that there's no malloc overhead.) A list item is one
unit. So a list of N ints is 4N units (+ overhead). In the proposed
scheme, there would be 2 units. That makes a factor of 1/2 for me...

> Well, Python already has homogeneous int lists (array.array), and while they
> save space they suffer in speed due to needing to wrap raw ints "in an
> object" upon reference and unwrap them upon storage.

Which would become faster with the proposed scheme since it would not
require any heap allocation (presuming 2-unit structs can be passed
around as function results).

> > - Reference count manipulations could be done by a macro (or C++
> > behind-the-scenes magic using copy constructors and destructors) that
> > calls a function in the type object -- i.e. each object could decide
> > on its own reference counting implementation :-)
>
> You don't need to switch representations to get that, though, right? That
> is, I don't see anything stopping today's type objects from growing
> __incref__ and __decref__ slots -- except for common sense <wink>.

Eh, indeed <blush>.

> An apparent ramification I don't see above that may actually be worth
> something <wink>:
>
> - In "i = j + k", the eval stack could contain the ints directly, instead of
> pointers to the ints. So fetching the value of j takes two loads (get the
> type pointer + the variant) from adjacent stack locations, instead of
> today's load-the-pointer + follow-the-pointer (to some other part of
> memory); similarly for fetching the value of k. Then the sum can be stored
> *directly* into the stack too, without today's need for allocating and
> wrapping it in "an int object" first.

I thought this was assumed all the time? I mentioned "no heap
allocation" above before I read this. I think this is the reason why
it was proposed at all: things for which the value fits in a unit
don't live on the heap at all, *without* playing tricks with pointer
representations.

> Possibly happy variant: on top of the above, *don't* exempt ints from
> refcounting. Let 'em incref and decref like everything else. Give them an
> initial refcount of max_count/2, and in the exceedingly unlikely event a
> decref on an int ever sees zero, the int "destructor" simply resets the
> refcount to max_count/2 and is otherwise a nop.

Don't get this -- there's no object on the heap to hold the refcnt.

--Guido van Rossum (home page: http://www.python.org/~guido/)
RE: [Python-Dev] stackable ints [stupid idea (ignore) :v]
[Guido]
>>> - On the other hand, a list of ints takes half the memory (given that
>>> most of those ints are not shared).

[Tim]
>> Isn't this 2/3 rather than 1/2? [yadda yadda]

[Guido]
> An int is currently 3 units: type, refcnt, value. (The special int
> allocator means that there's no malloc overhead.) A list item is one
> unit. So a list of N ints is 4N units (+ overhead). In the proposed
> scheme, there would be 2 units. That makes a factor of 1/2 for me...

Well, if you count the refcount, sure <wink>.

Moving on, this implies you're not contemplating making the descriptor big enough
to hold a float (else it would still be 4 units assuming natural alignment),
in turn implying that *only* ints would get the space advantage in
lists/tuples? Plus maybe special-casing the snot out of short strings?

>> Well, Python already has homogeneous int lists (array.array),
>> and while they save space they suffer in speed ...

> Which would become faster with the proposed scheme since it would not
> require any heap allocation (presuming 2-unit structs can be passed
> around as function results).

They can be in any std (even reasonable) C (or C++). If this gets serious,
though, strongly suggest timing it on important compiler + platform combos,
especially RISC. You can probably *count* on a PyObject* result getting
returned in a register, but depressed C++ compiler jockeys have been known
to treat struct/class returns via an unoptimized chain of copy constructors.
Probably better to allocate "result space" in the caller and pass that via
reference to the callee. With care, you can get the result written into its
final resting place efficiently then, more efficiently than even a gonzo
globally optimizing compiler could figure out (A calls B call C calls D, and
A can tell D exactly where to store the result if it's explicit).

>> [other ramifications for
>> "i = j + k"
>> ]

> I thought this was assumed all the time?

Apparently it was! At least by you <wink>. Now by me too; no problem.

>> [refcount-on-int drivel]

> Don't get this -- there's no object on the heap to hold the refcnt.

I don't get it either. Desperation? The idea that incref/decref may need
to be treated as virtual methods (in order to exempt ints or other possible
direct values) really disturbs me -- incref/decref happen *all* the time,
explicit integer ops only some of the time. Turning incref/decref into
indirected function calls doesn't sound promising at all. Injecting a
test-branch guard via macro sounds faster but still icky, and especially if
the set of exempt types isn't a singleton.

no-positive-suggestions-just-grousing-ly y'rs - tim
RE: stackable ints [stupid idea (ignore) :v]
> no-positive-suggestions-just-grousing-ly y'rs - tim

On the contrary. I think this is definitively a bad idea. Retracted.
A double negative is a positive.
-- Aaron Watters

===

"Criticism serves the same purpose as pain. It's not pleasant
but it suggests that something is wrong."
-- Churchill (paraphrased from memory)
RE: stackable ints [stupid idea (ignore) :v]
Backtracking:

[Aaron]
> I've always considered it a major shame that Python ints and floats
> and chars and stuff have anything to do with dynamic allocation ...

[Guido]
> What you're describing is very close to what I recall I once read
> about the runtime organization of Icon. Perl may also use a variant
> on this (it has fixed-length object headers). ...

I've rarely been able to make sense of Perl's source code, but gave it
another try anyway. An hour later I gave up unenlightened, so cruised the
web. Turns out there's a *terrific* writeup of Perl's type representation
at:

http://home.sol.no/~aas/perl/guts/

Pictures and everything <wink>. Header is 3 words: An 8-bit "type" field,
24 baffling flag bits (e.g., flag #14 is "BREAK -- refcnt is artificially
low"(!)), 32 refcount bits, and a 32-bit pointer field.

Appears that the pointer field is always a real (although possibly NULL)
pointer. Plain ints have type code SvIV, and the pointer then points to a
bogus address, but where that address + 3 words points to the actual integer
value. Why? Because then they can use the same offset to get to the int as
when the type is SvPVIV, which is the combined string/integer type, and
needs three words (to point to the string start address, current len and
allocated len) in addition to the integer value at the end. So why is the
integer value at the end? So the same offsets work for the SvPV type, which
is solely a string descriptor. So why is it important that SvPVIV, SvPV and
SvIV all have the same layout? So that either of the latter types can be
dynamically "upgraded" to SvPVIV (when a string is converted to int or vice
versa; Perl then holds on to both representations internally) by plugging in
a new type code and fiddling some of the baffling flag bits.

Brr. I have no idea how they manage to keep Perl running!

and-not-entirely-sure-that-they-do-ly y'rs - tim