Mailing List Archive

about line numbers
Just curious:

Is python with vs. without "-O" equivalent today regarding line numbers?
Are SET_LINENO opcodes a plus in some situations or not?

Next, I see quite often several SET_LINENO in a row in the beginning
of code objects due to doc strings, etc. Since I don't think that
folding them into one SET_LINENO would be an optimisation (it would
rather be avoiding the redundancy), is it possible and/or reasonable
to do something in this direction?

A trivial example:

>>> def f():
...     "This is a comment about f"
...     a = 1
...
>>> import dis
>>> dis.dis(f)
0 SET_LINENO 1

3 SET_LINENO 2

6 SET_LINENO 3
9 LOAD_CONST 1 (1)
12 STORE_FAST 0 (a)
15 LOAD_CONST 2 (None)
18 RETURN_VALUE
>>>

Can the above become something like this instead:

0 SET_LINENO 3
3 LOAD_CONST 1 (1)
6 STORE_FAST 0 (a)
9 LOAD_CONST 2 (None)
12 RETURN_VALUE


--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Re: about line numbers [ In reply to ]
The only possible problem I can see with folding linenumbers is if
someone sets a breakpoint on such a line. And I think it'll be
difficult to explain the missing line numbers to pdb, so there isn't
an easy workaround (at least, it takes more than my 30 seconds of
brainpower to come up with one:-).
--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
Re: about line numbers [ In reply to ]
Jack Jansen wrote:
>
>
> The only possible problem I can see with folding linenumbers is if
> someone sets a breakpoint on such a line. And I think it'll be
> difficult to explain the missing line numbers to pdb, so there isn't
> an easy workaround (at least, it takes more than my 30 seconds of
> brainpower to come up with one:-).
>

Eek! We can set a breakpoint on a doc string? :-) There's no code
in there. It should be treated as a comment by pdb. I can't set a
breakpoint on a comment line even in C ;-) There must be something
deeper about it...

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
RE: about line numbers [ In reply to ]
[Vladimir Marangozov]
> Is python with vs. without "-O" equivalent today regarding
> line numbers?
>
> Are SET_LINENO opcodes a plus in some situations or not?

In theory it should make no difference, except that the trace mechanism
makes a callback on each SET_LINENO, and that's how the debugger implements
line-number breakpoints. Under -O, there are no SET_LINENOs, so debugger
line-number breakpoints don't work under -O.
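
The per-line callback described above can still be observed from Python through sys.settrace. What follows is only a minimal sketch, assuming a current CPython in which line events are generated without any SET_LINENO opcode; the function f, the tracer and the events list are illustrative names, not part of any debugger.

```python
import sys

def f():
    a = 1
    b = a + 1
    return b

events = []

def tracer(frame, event, arg):
    # The interpreter calls this once per new frame ('call' event).
    # Returning the tracer itself asks for 'line' events in that frame.
    if frame.f_code is f.__code__ and event == "line":
        # Record the line as an offset from the 'def' line of f.
        events.append(frame.f_lineno - f.__code__.co_firstlineno)
    return tracer

sys.settrace(tracer)
try:
    f()
finally:
    # Always switch tracing off again: every traced line costs a Python call.
    sys.settrace(None)

print(events)
```

Each recorded entry corresponds to one full Python-level call of the tracer, which is the cost being discussed here.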

I think there's also a sporadic buglet, which I've never bothered to track
down: sometimes a line number reported in a traceback under -O (&, IIRC,
it's always the topmost line number) comes out as a senseless negative
value.

> Next, I see quite often several SET_LINENO in a row in the beginning
> of code objects due to doc strings, etc. Since I don't think that
> folding them into one SET_LINENO would be an optimisation (it would
> rather be avoiding the redundancy), is it possible and/or reasonable
> to do something in this direction?

All opcodes consume time, although a wasted trip or two around the eval loop
at the start of a function isn't worth much effort to avoid. Still, it's a
legitimate opportunity for provable speedup, even if unmeasurable speedup
<wink>.

Would be more valuable to rethink the debugger's breakpoint approach so that
SET_LINENO is never needed (line-triggered callbacks are expensive because
called so frequently, turning each dynamic SET_LINENO into a full-blown
Python call; if I used the debugger often enough to care <wink>, I'd think
about munging in a new opcode to make breakpoint sites explicit).

immutability-is-made-to-be-violated-ly y'rs - tim
Re: about line numbers [ In reply to ]
Tim Peters wrote:
>
> [.Vladimir Marangozov, *almost* seems ready to give up on a counter-
> productive dict pessimization <wink>]
>

Of course I will! Now everything is perfectly clear. Thanks.

> ...
> So, that's why <wink>.
>

Now, *this* one explanation of yours should go into a HowTo/BecauseOf
for developers. I timed your scripts and a couple of mine which attest
(again) the validity of the current implementation. My patch is out of
bounds. It even disturbs from time to time the existing harmony in the
results ;-) because of early resizing.

All in all, for performance reasons, dicts remain an exception to the
rule of releasing memory ASAP. They have been designed to tolerate caching
because of their dynamics, which is the main reason for the rare case
addressed by my patch.

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Re: about line numbers [ In reply to ]
Tim Peters wrote:
>
> Would be more valuable to rethink the debugger's breakpoint approach so that
> SET_LINENO is never needed (line-triggered callbacks are expensive because
> called so frequently, turning each dynamic SET_LINENO into a full-blown
> Python call; if I used the debugger often enough to care <wink>, I'd think
> about munging in a new opcode to make breakpoint sites explicit).
>
> immutability-is-made-to-be-violated-ly y'rs - tim
>

Could you elaborate a bit more on this? Do you mean setting breakpoints
on a per opcode basis (for example by exchanging the original opcode
with a new BREAKPOINT opcode in the code object) and use the lineno tab
for breakpoints based on the source listing?

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
RE: about line numbers [ In reply to ]
[Vladimir Marangozov]
> Could you elaborate a bit more on this?

No time for this now -- sorry.

> Do you mean setting breakpoints on a per opcode basis (for example
> by exchanging the original opcode with a new BREAKPOINT opcode in
> the code object) and use the lineno tab for breakpoints based on
> the source listing?

Something like that. The classic way to implement positional breakpoints is
to perturb the code; the classic problem is how to get back the effect of
the code that was overwritten.
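
A minimal sketch of that classic scheme, assuming a toy stack machine rather than the real interpreter: the opcode names and numbers, the run loop and the `saved` side table are all invented for illustration. The breakpoint overwrites an instruction, and the overwritten instruction is recovered from the side table when the breakpoint fires.

```python
# Toy stack machine; opcode names and numbers are invented for this sketch.
PUSH, ADD, HALT, BREAKPOINT = 0, 1, 2, 255

def set_breakpoint(code, saved, pc):
    """Overwrite the instruction at pc, remembering the original in saved."""
    saved[pc] = code[pc]
    code[pc] = (BREAKPOINT, None)

def run(code, saved, on_break):
    """Execute code, a list of (opcode, arg) pairs; return the top of stack."""
    stack, pc = [], 0
    while True:
        op, arg = code[pc]
        if op == BREAKPOINT:
            on_break(pc)
            # Get back the effect of the code that was overwritten.
            op, arg = saved[pc]
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == HALT:
            return stack[-1]
        pc += 1

program = [(PUSH, 2), (PUSH, 3), (ADD, None), (HALT, None)]
saved = {}
hits = []
set_breakpoint(program, saved, 2)
result = run(program, saved, hits.append)
print(result, hits)
```

Code without breakpoints pays nothing extra in this design; only the patched addresses trigger the callback.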
about line numbers [ In reply to ]
[Tim, in an earlier msg]
>
> Would be more valuable to rethink the debugger's breakpoint approach so that
> SET_LINENO is never needed (line-triggered callbacks are expensive because
> called so frequently, turning each dynamic SET_LINENO into a full-blown
> Python call;

Ok. In the meantime I think that folding the redundant SET_LINENO doesn't
hurt. I ended up with a patchlet that seems to have no side effects, that
updates the lnotab as it should and that even makes pdb a bit more clever,
IMHO.

Consider an extreme case for the function f (listed below). Currently,
we get the following:

-------------------------------------------
>>> from test import f
>>> import dis, pdb
>>> dis.dis(f)
0 SET_LINENO 1

3 SET_LINENO 2

6 SET_LINENO 3

9 SET_LINENO 4

12 SET_LINENO 5
15 LOAD_CONST 1 (1)
18 STORE_FAST 0 (a)

21 SET_LINENO 6

24 SET_LINENO 7

27 SET_LINENO 8
30 LOAD_CONST 2 (None)
33 RETURN_VALUE
>>> pdb.runcall(f)
> test.py(1)f()
-> def f():
(Pdb) list 1, 20
1 -> def f():
2 """Comment about f"""
3 """Another one"""
4 """A third one"""
5 a = 1
6 """Forth"""
7 "and pdb can set a breakpoint on this one (simple quotes)"
8 """but it's intelligent about triple quotes..."""
[EOF]
(Pdb) step
> test.py(2)f()
-> """Comment about f"""
(Pdb) step
> test.py(3)f()
-> """Another one"""
(Pdb) step
> test.py(4)f()
-> """A third one"""
(Pdb) step
> test.py(5)f()
-> a = 1
(Pdb) step
> test.py(6)f()
-> """Forth"""
(Pdb) step
> test.py(7)f()
-> "and pdb can set a breakpoint on this one (simple quotes)"
(Pdb) step
> test.py(8)f()
-> """but it's intelligent about triple quotes..."""
(Pdb) step
--Return--
> test.py(8)f()->None
-> """but it's intelligent about triple quotes..."""
(Pdb)
>>>
-------------------------------------------

With folded SET_LINENO, we have this:

-------------------------------------------
>>> from test import f
>>> import dis, pdb
>>> dis.dis(f)
0 SET_LINENO 5
3 LOAD_CONST 1 (1)
6 STORE_FAST 0 (a)

9 SET_LINENO 8
12 LOAD_CONST 2 (None)
15 RETURN_VALUE
>>> pdb.runcall(f)
> test.py(5)f()
-> a = 1
(Pdb) list 1, 20
1 def f():
2 """Comment about f"""
3 """Another one"""
4 """A third one"""
5 -> a = 1
6 """Forth"""
7 "and pdb can set a breakpoint on this one (simple quotes)"
8 """but it's intelligent about triple quotes..."""
[EOF]
(Pdb) break 7
Breakpoint 1 at test.py:7
(Pdb) break 8
*** Blank or comment
(Pdb) step
> test.py(8)f()
-> """but it's intelligent about triple quotes..."""
(Pdb) step
--Return--
> test.py(8)f()->None
-> """but it's intelligent about triple quotes..."""
(Pdb)
>>>
-------------------------------------------

i.e., pdb stops at (points to) the first real instruction and doesn't step
through the doc strings.

Or is there something I'm missing here?

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

-------------------------------[ cut here ]---------------------------
*** compile.c-orig Thu Aug 19 19:27:13 1999
--- compile.c Thu Aug 19 19:00:31 1999
***************
*** 615,620 ****
--- 615,623 ----
int arg;
{
if (op == SET_LINENO) {
+ if (!Py_OptimizeFlag && c->c_last_addr == c->c_nexti - 3)
+ /* Hack for folding several SET_LINENO in a row. */
+ c->c_nexti -= 3;
com_set_lineno(c, arg);
if (Py_OptimizeFlag)
return;
Re: about line numbers [ In reply to ]
Earlier, you argued that this is "not an optimization," but rather
avoiding redundancy. I should have responded right then that I
disagree, or at least I'm lukewarm about your patch. Either you're
not using -O, and then you don't care much about this; or you care,
and then you should be using -O.

Rather than encrusting the code with more and more ad-hoc micro
optimizations, I'd prefer to have someone look into Tim's suggestion
of supporting more efficient breakpoints...

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: about line numbers [ In reply to ]
Guido van Rossum wrote:
>
> Earlier, you argued that this is "not an optimization," but rather
> avoiding redundancy.

I haven't argued so much; I asked whether this would be reasonable.

Probably I should have said that I don't see the purpose of emitting
SET_LINENO instructions for those nodes for which the compiler
generates no code, mainly because (as I learned subsequently) SET_LINENO
serve no other purpose but debugging. As I haven't paid much attention to
this aspect of the code, I thought that they might still be used for
tracebacks. But I couldn't have said that because I didn't know it.

> I should have responded right then that I disagree, ...

Although I agree this is a minor issue, I'm interested in your argument
here, if it's something else than the dialectic: "we're more interested
in long term improvements" which is also my opinion.

> ... or at least I'm lukewarm about your patch.

No surprise here :-) But I haven't found another way of not generating
SET_LINENO for doc strings other than backpatching.

> Either you're
> not using -O, and then you don't care much about this; or you care,
> and then you should be using -O.

Neither of those. I don't really care, frankly. I was just intrigued by
the consecutive SET_LINENO in my disassemblies, so I started to think
and ask questions about it.

>
> Rather than encrusting the code with more and more ad-hoc micro
> optimizations, I'd prefer to have someone look into Tim's suggestion
> of supporting more efficient breakpoints...

This is *the* real issue with the real potential solution. I'm willing
to have a look at this (although I don't know pdb/bdb in its finest
details). All suggestions and thoughts are welcome.

We would probably leave the SET_LINENO opcode as is and (eventually)
introduce a new opcode (instead of transforming/renaming it) for
compatibility reasons, methinks.

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Re: about line numbers [ In reply to ]
Guido van Rossum wrote:
>
> Earlier, you argued that this is "not an optimization," but rather
> avoiding redundancy. I should have responded right then that I
> disagree, or at least I'm lukewarm about your patch. Either you're
> not using -O, and then you don't care much about this; or you care,
> and then you should be using -O.
>
> Rather than encrusting the code with more and more ad-hoc micro
> optimizations, I'd prefer to have someone look into Tim's suggestion
> of supporting more efficient breakpoints...

I didn't think of this before, but I just realized that
I have something like that already in Stackless Python.
It is possible to set a breakpoint at every opcode, for every
frame. Adding an extra opcode for breakpoints is a good thing
as well. The former are good for tracing, conditional breakpoints
and such, and cost a little more time since there is always one extra
function call. The latter would be a quick, less versatile thing.

The implementation of inserting extra breakpoint opcodes for
running code turns out to be easy to implement, if the running
frame gets a local extra copy of its code object, with the
breakpoints replacing the original opcodes. The breakpoint handler
would then simply look into the original code object.

Inserting breakpoints on the source level gives us breakpoints
per procedure. Doing it in a running frame gives "instance" level
debugging of code. Checking a monitor function on every opcode
is slightly more expensive but most general.
We can have it all, what do you think.
I'm going to finish and publish the stackless/continous package
and submit a paper by end of September. Should I include
this debugging feature?

ciao - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101 : *Starship* http://starship.python.net
10553 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
we're tired of banana software - shipped green, ripens at home
Re: about line numbers [ In reply to ]
> I'll try to sketch here the scheme I'm thinking of for the
> callback/breakpoint issue (without SET_LINENO), although some
> technical details are still missing.
>
> I'm assuming the following, in this order:
>
> 1) No radical changes in the current behavior, i.e. preserve the
> current architecture / strategy as much as possible.
>
> 2) We don't have breakpoints per opcode, but per source line. For that
> matter, we have sys.settrace (and for now, we don't aim to have
> sys.settracei that would be called on every opcode, although we might
> want this in the future)
>
> 3) SET_LINENO disappear. Actually, SET_LINENO are conditional breakpoints,
> used for callbacks from C to Python. So the basic problem is to generate
> these callbacks.

They used to be the only mechanism by which the traceback code knew
the current line number (long before the debugger hooks existed), but
with the lnotab, that's no longer necessary.

> If any of the above is not an appropriate assumption and we want a radical
> change in the strategy of setting breakpoints/ generating callbacks, then
> this post is invalid.

Sounds reasonable.

> The solution I'm thinking of:
>
> a) Currently, we have a function PyCode_Addr2Line which computes the source
> line from the opcode's address. I hereby assume that we can write the
> reverse function PyCode_Line2Addr that returns the address from a given
> source line number. I don't have the implementation, but it should be
> doable. Furthermore, we can compute, having the co_lnotab table and
> co_firstlineno, the source line range for a code object.
>
> As a consequence, even with the dumbest of all algorithms, by looping
> through this source line range, we can enumerate with PyCode_Line2Addr
> the sequence of addresses for the source lines of this code object.
>
> b) As Chris pointed out, in case sys.settrace is defined, we can allocate
> and keep a copy of the original code string per frame. We can further
> dynamically overwrite the original code string with a new (internal,
> one byte) CALL_TRACE opcode at the addresses we have enumerated in a).
>
> The CALL_TRACE opcodes will trigger the callbacks from C to Python,
> just as the current SET_LINENO does.
>
> c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger
> the callback and if it returns successfully, we'll fetch the original
> opcode for the current location from the copy of the original co_code.
> Then we directly jump to the arg fetch code (or in case we fetch the
> entire original opcode in CALL_TRACE - we jump to the dispatch code).

Tricky, but doable.

> Hmm. I think that's all.
>
> At the heart of this scheme is the PyCode_Line2Addr function, which is
> the only blob in my head, for now.

I'm pretty sure that this would be straightforward.

I'm a little anxious about modifying the code, and was thinking myself
of a way to specify a bitvector of addresses where to break. But that
would still cause some overhead for code without breakpoints, so I
guess you're right (and it's certainly a long-standing tradition in
breakpoint-setting!)

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: about line numbers [ In reply to ]
I'll try to sketch here the scheme I'm thinking of for the
callback/breakpoint issue (without SET_LINENO), although some
technical details are still missing.

I'm assuming the following, in this order:

1) No radical changes in the current behavior, i.e. preserve the
current architecture / strategy as much as possible.

2) We don't have breakpoints per opcode, but per source line. For that
matter, we have sys.settrace (and for now, we don't aim to have
sys.settracei that would be called on every opcode, although we might
want this in the future)

3) SET_LINENO disappear. Actually, SET_LINENO are conditional breakpoints,
used for callbacks from C to Python. So the basic problem is to generate
these callbacks.

If any of the above is not an appropriate assumption and we want a radical
change in the strategy of setting breakpoints/ generating callbacks, then
this post is invalid.

The solution I'm thinking of:

a) Currently, we have a function PyCode_Addr2Line which computes the source
line from the opcode's address. I hereby assume that we can write the
reverse function PyCode_Line2Addr that returns the address from a given
source line number. I don't have the implementation, but it should be
doable. Furthermore, we can compute, having the co_lnotab table and
co_firstlineno, the source line range for a code object.

As a consequence, even with the dumbest of all algorithms, by looping
through this source line range, we can enumerate with PyCode_Line2Addr
the sequence of addresses for the source lines of this code object.

b) As Chris pointed out, in case sys.settrace is defined, we can allocate
and keep a copy of the original code string per frame. We can further
dynamically overwrite the original code string with a new (internal,
one byte) CALL_TRACE opcode at the addresses we have enumerated in a).

The CALL_TRACE opcodes will trigger the callbacks from C to Python,
just as the current SET_LINENO does.

c) At execution time, whenever a CALL_TRACE opcode is reached, we trigger
the callback and if it returns successfully, we'll fetch the original
opcode for the current location from the copy of the original co_code.
Then we directly jump to the arg fetch code (or in case we fetch the
entire original opcode in CALL_TRACE - we jump to the dispatch code).


Hmm. I think that's all.

At the heart of this scheme is the PyCode_Line2Addr function, which is
the only blob in my head, for now.
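
A rough sketch of how the two lookups in a) could relate, written in Python rather than C. It assumes a simplified line table made of (address increment, line increment) byte pairs in which every increment fits in one byte; the table below is built by hand for the three-line function from the start of this thread, and `line2addr` and `line_range` are hypothetical helpers, not existing functions.

```python
def addr2line(firstlineno, lnotab, addr):
    """Source line for a bytecode address (the PyCode_Addr2Line idea)."""
    line, cur = firstlineno, 0
    for addr_incr, line_incr in zip(lnotab[0::2], lnotab[1::2]):
        cur += addr_incr
        if cur > addr:
            break
        line += line_incr
    return line

def line2addr(firstlineno, lnotab, lineno):
    """Hypothetical reverse lookup: first address of lineno, or -1."""
    line, addr = firstlineno, 0
    if line == lineno:
        return addr
    for addr_incr, line_incr in zip(lnotab[0::2], lnotab[1::2]):
        addr += addr_incr
        line += line_incr
        if line == lineno:
            return addr
    return -1

def line_range(firstlineno, lnotab):
    """First and last source line covered by the table."""
    return firstlineno, firstlineno + sum(lnotab[1::2])

# Hand-built table: line 1 at address 0, line 2 at 3, line 3 at 6.
FIRST = 1
TABLE = bytes([3, 1, 3, 1])

first, last = line_range(FIRST, TABLE)
# The "dumbest" enumeration: try every line in the range.
starts = [(n, line2addr(FIRST, TABLE, n)) for n in range(first, last + 1)]
print(starts)
```

Lines for which `line2addr` returns -1 have no code of their own, so no CALL_TRACE would be planted for them.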

Christian Tismer wrote:
>
> I didn't think of this before, but I just realized that
> I have something like that already in Stackless Python.
> It is possible to set a breakpoint at every opcode, for every
> frame. Adding an extra opcode for breakpoints is a good thing
> as well. The former are good for tracing, conditional breakpoints
> and such, and cost a little more time since there is always one extra
> function call. The latter would be a quick, less versatile thing.

I don't think I understand clearly the difference you're talking about,
and why the one thing is better than the other, probably because I'm a
bit far from stackless python.

> I'm going to finish and publish the stackless/continous package
> and submit a paper by end of September. Should I include this debugging
> feature?

Write the paper first, you have more than enough material to talk about
already ;-). Then if you have time to implement some debugging support,
you could always add another section, but it won't be a central point
of your paper.

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Re: about line numbers [ In reply to ]
Guido van Rossum wrote:
>
>
> I'm a little anxious about modifying the code, and was thinking myself
> of a way to specify a bitvector of addresses where to break. But that
> would still cause some overhead for code without breakpoints, so I
> guess you're right (and it's certainly a long-standing tradition in
> breakpoint-setting!)
>

Hm. You're probably right, especially if someone wants to inspect
a code object from the debugger or something. But I believe that
we can manage to redirect the instruction pointer in the beginning
of eval_code2 to the *copy* of co_code, and modify the copy with
CALL_TRACE, preserving the original intact.

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Re: about line numbers [ In reply to ]
[me]
>
> Guido van Rossum wrote:
> >
> >
> > I'm a little anxious about modifying the code, and was thinking myself
> > of a way to specify a bitvector of addresses where to break. But that
> > would still cause some overhead for code without breakpoints, so I
> > guess you're right (and it's certainly a long-standing tradition in
> > breakpoint-setting!)
> >
>
> Hm. You're probably right, especially if someone wants to inspect
> a code object from the debugger or something. But I believe that
> we can manage to redirect the instruction pointer in the beginning
> of eval_code2 to the *copy* of co_code, and modify the copy with
> CALL_TRACE, preserving the original intact.
>

I wrote a very rough first implementation of this idea. The files are at:

http://sirac.inrialpes.fr/~marangoz/python/lineno/


Basically, what I did is:

1) what I said :-)
2) No more SET_LINENO
3) In tracing mode, a copy of the original code is put in an additional
slot (co_tracecode) of the code object. Then it's overwritten with
CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr.

The VM is routed to execute this code, and not the original one.

4) When tracing is off (i.e. sys_tracefunc is NULL) the VM falls back to
normal execution of the original code.


A couple of things that need finalization:

a) how to deallocate the modified code string when tracing is off
b) the value of CALL_TRACE (I almost randomly picked 76)
c) I don't handle the cases where sys_tracefunc is enabled or disabled
within the same code object. Tracing or not is determined before
the main loop.
d) update pdb, so that it does not allow setting breakpoints on lines with
no code. To achieve this, I think that python versions of PyCode_Addr2Line
& PyCode_Line2Addr have to be integrated into pdb as helper functions.
e) correct bugs and design flaws
f) something else?
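
For point d), a sketch of the kind of helper a debugger could use to refuse breakpoints on lines without code. It assumes a current CPython and relies on dis.findlinestarts from the standard library instead of hand-written ports of the C functions; `lines_with_code` and `can_break` are hypothetical names, and which lines count as having code varies between Python versions.

```python
import dis

SOURCE = '''def f():
    """Comment about f"""
    a = 1
    return a
'''

# Compile from a string so the line numbers are known: 'a = 1' is line 3.
namespace = {}
exec(compile(SOURCE, "<sketch>", "exec"), namespace)
f = namespace["f"]

def lines_with_code(func):
    """Sorted source lines at which some bytecode of func starts."""
    starts = dis.findlinestarts(func.__code__)
    # Some versions may report entries without a line number; skip them.
    return sorted({line for _, line in starts if line is not None})

def can_break(func, lineno):
    """True if a line breakpoint at lineno could ever be hit."""
    return lineno in lines_with_code(func)

print(lines_with_code(f))
print(can_break(f, 3), can_break(f, 99))
```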


And here's the sample session of my lousy function f with this
'proof of concept' code:

>>> from test import f
>>> import dis, pdb
>>> dis.dis(f)
0 LOAD_CONST 1 (1)
3 STORE_FAST 0 (a)
6 LOAD_CONST 2 (None)
9 RETURN_VALUE
>>> pdb.runcall(f)
> test.py(5)f()
-> a = 1
(Pdb) list 1, 10
1 def f():
2 """Comment about f"""
3 """Another one"""
4 """A third one"""
5 -> a = 1
6 """Forth"""
7 "and pdb can set a breakpoint on this one (simple quotes)"
8 """but it's intelligent about triple quotes..."""
[EOF]
(Pdb) step
> test.py(8)f()
-> """but it's intelligent about triple quotes..."""
(Pdb) step
--Return--
> test.py(8)f()->None
-> """but it's intelligent about triple quotes..."""
(Pdb)
>>>

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Re: about line numbers [ In reply to ]
Vladimir Marangozov wrote:
...
> I wrote a very rough first implementation of this idea. The files are at:
>
> http://sirac.inrialpes.fr/~marangoz/python/lineno/
>
> Basically, what I did is:
>
> 1) what I said :-)
> 2) No more SET_LINENO
> 3) In tracing mode, a copy of the original code is put in an additional
> slot (co_tracecode) of the code object. Then it's overwritten with
> CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr.

I'd rather keep the original code object as it is, create a copy
with inserted breakpoints and put that into the frame slot.
Pointing back to the original from there.

Then I'd redirect the code from the CALL_TRACE opcode completely
to a user-defined function.
Getting rid of the extra code object would be done by this function
when tracing is off. It also vanishes automatically when the frame
is released.

> a) how to deallocate the modified code string when tracing is off

By making the copy a frame property which is temporary, I think.
Or, if tracing should work for all frames, by pushing the original
in the back of the modified. Both works.

ciao - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Re: about line numbers [ In reply to ]
Vladimir Marangozov wrote:
>
> Chris, could you please repeat that step by step in more detail?
> I'm not sure I understand your suggestions.

I think I was too quick. I thought of copying the whole
code object, of course.

...
> > I'd rather keep the original code object as it is, create a copy
> > with inserted breakpoints and put that into the frame slot.
>
> You seem to suggest to duplicate the entire code object, right?
> > And reference the modified duplicate from the current frame?

Yes.

> I actually duplicate only the opcode string (that is, the co_code string
> object) and I don't see the point of duplicating the entire code object.
>
> Keeping a reference from the current frame makes sense, but won't it
> deallocate the modified version on every frame release (then redo all the
> code duplication work for every frame) ?

You get two options by that.
1) permanently modifying one code object to be traceable is
pushing a copy of the original "behind" by means of some
co_back pointer. This keeps the patched one where the
original was, and makes a global debugging version.

2) Creating a copy for one frame, and putting the original
in to an co_back pointer. This gives debugging just
for this one frame.

...
> > Then I'd redirect the code from the CALL_TRACE opcode completely
> > to a user-defined function.
>
> What user-defined function? I don't understand that either...
> Except the sys_tracefunc, what other (user-defined) function do we have here?
> Is it a Python or a C function?

I would suggest a Python function, of course.

> > Getting rid of the extra code object would be done by this function
> > when tracing is off.
>
> How exactly? This seems to be obvious for you, but obviously, not for me ;-)

If the permanent tracing "1)" is used, just restore the code object's
contents from the original in co_back, and drop co_back.
In the "2)" version, just pull the co_back into the frame's code pointer
and lose the reference to the copy. This occurs automatically on frame
release.

> > It also vanishes automatically when the frame is released.
>
> The function or the extra code object?

The extra code object.

...
> I'm confused. I didn't understand your idea.

Forget it, it isn't more than another brain fart :-)

ciao - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Re: about line numbers [ In reply to ]
Chris, could you please repeat that step by step in more detail?
I'm not sure I understand your suggestions.

Christian Tismer wrote:
>
> Vladimir Marangozov wrote:
> ...
> > I wrote a very rough first implementation of this idea. The files are at:
> >
> > http://sirac.inrialpes.fr/~marangoz/python/lineno/
> >
> > Basically, what I did is:
> >
> > 1) what I said :-)
> > 2) No more SET_LINENO
> > 3) In tracing mode, a copy of the original code is put in an additional
> > slot (co_tracecode) of the code object. Then it's overwritten with
> > CALL_TRACE opcodes at the locations returned by PyCode_Line2Addr.
>
> I'd rather keep the original code object as it is, create a copy
> with inserted breakpoints and put that into the frame slot.

You seem to suggest to duplicate the entire code object, right?
And reference the modified duplicate from the current frame?

I actually duplicate only the opcode string (that is, the co_code string
object) and I don't see the point of duplicating the entire code object.

Keeping a reference from the current frame makes sense, but won't it
deallocate the modified version on every frame release (then redo all the
code duplication work for every frame) ?

> Pointing back to the original from there.

I don't understand this. What points back where?

>
> Then I'd redirect the code from the CALL_TRACE opcode completely
> to a user-defined function.

What user-defined function? I don't understand that either...
Except the sys_tracefunc, what other (user-defined) function do we have here?
Is it a Python or a C function?

> Getting rid of the extra code object would be done by this function
> when tracing is off.

How exactly? This seems to be obvious for you, but obviously, not for me ;-)

> It also vanishes automatically when the frame is released.

The function or the extra code object?

>
> > a) how to deallocate the modified code string when tracing is off
>
> By making the copy a frame property which is temporary, I think.

I understood that the frame lifetime could be exploited "somehow"...

> Or, if tracing should work for all frames, by pushing the original
> in the back of the modified. Both works.

Tracing is done for all frames, if sys_tracefunc is not NULL, which
is a function that usually ends up in the f_trace slot.

>
> ciao - chris

I'm confused. I didn't understand your idea.

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
RE: about line numbers [ In reply to ]
[going back a week here, to dict resizing ...]

[Vladimir Marangozov]
> ...
> All in all, for performance reasons, dicts remain an exception
> to the rule of releasing memory ASAP.

Yes, except I don't think there is such a rule! The actual rule is a
balancing act between the cost of keeping memory around "just in case", and
the expense of getting rid of it.

Resizing a dict is extraordinarily expensive because the entire table needs
to be rearranged, but lists make this tradeoff too (when you del a list
element or list slice, it still goes thru NRESIZE, which still keeps space
for as many as 100 "extra" elements around).
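
A toy model of this kind of round-up policy, written in Python rather than C. The block size of 100 is taken from the figure quoted above; the helper names roundup and slack are hypothetical, and the real NRESIZE macro may differ in detail.

```python
def roundup(n, block):
    """Round n up to the next multiple of block."""
    return ((n + block - 1) // block) * block


def slack(n, block=100):
    """Unused slots kept around for a list holding n items."""
    return roundup(n, block) - n


for n in (1, 899, 901, 1000):
    print(n, roundup(n, 100), slack(n))
```

Under this model a list that shrinks from 1000 to 901 items keeps its 1000-slot allocation, so the spare space is only given back once the length crosses a block boundary.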

The various internal caches for int and frame objects (etc) also play this
sort of game; e.g., if I happen to have a million ints sitting around at
some time, Python effectively assumes I'll never want to reuse that int
storage for anything other than ints again.

python-rarely-releases-memory-asap-ly y'rs - tim
Re: Memory (was: about line numbers, which was shrinking dicts) [ In reply to ]
Tim Peters wrote:
>
> [going back a week here, to dict resizing ...]

Yes, and the subject line does not match the contents because, at the
moment I sent this message, I ran out of disk space and the mailer
picked a random header after destroying half of the messages in this
mailbox.

>
> [Vladimir Marangozov]
> > ...
> > All in all, for performance reasons, dicts remain an exception
> > to the rule of releasing memory ASAP.
>
> Yes, except I don't think there is such a rule! The actual rule is a
> balancing act between the cost of keeping memory around "just in case", and
> the expense of getting rid of it.

Good point.

>
> Resizing a dict is extraordinarily expensive because the entire table needs
> to be rearranged, but lists make this tradeoff too (when you del a list
> element or list slice, it still goes thru NRESIZE, which still keeps space
> for as many as 100 "extra" elements around).
>
> The various internal caches for int and frame objects (etc) also play this
> sort of game; e.g., if I happen to have a million ints sitting around at
> some time, Python effectively assumes I'll never want to reuse that int
> storage for anything other than ints again.
>
> python-rarely-releases-memory-asap-ly y'rs - tim

Yes, and I'm somewhat sensitive to this issue after spending 6 years
in a team that deals a lot with memory management (mainly DSM).

In other words, you say that Python tolerates *virtual* memory fragmentation
(a funny term :-). In the case of dicts and strings, we tolerate "internal
fragmentation" (a contiguous chunk is allocated, then partially used).
In the case of ints, floats or frames, we tolerate "external fragmentation".
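
A minimal sketch of the "external" case, assuming a simplified per-type free list. The class TypedFreeList and its counters are hypothetical and only model the bookkeeping, not the real allocator: slots released by one type stay reserved for that type, so another type still has to ask the system for fresh memory.

```python
class TypedFreeList:
    """Toy per-type cache: released slots are reused by the same type only."""

    def __init__(self, name):
        self.name = name
        self.free_slots = 0          # idle slots kept "just in case"
        self.slots_from_system = 0   # slots obtained from the system

    def allocate(self):
        if self.free_slots:
            self.free_slots -= 1     # reuse a cached slot
        else:
            self.slots_from_system += 1

    def release(self):
        self.free_slots += 1         # never handed back to the system


ints = TypedFreeList("int")
floats = TypedFreeList("float")

for _ in range(1000):
    ints.allocate()
for _ in range(1000):
    ints.release()

# The 1000 idle int slots cannot serve these requests.
for _ in range(10):
    floats.allocate()

print(ints.free_slots, floats.slots_from_system)
```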

And as you said, Python tolerates this because of the speed/space tradeoff.
Fortunately, all we deal with at this level is virtual memory, so even if you
have zillions of ints, it's the OS VMM that will help you more with its
long-term scheduling than Python's wild guesses about a hypothetical usage
of zillions of ints later.

I think that some OS concepts can give us hints on how to reduce our
virtual fragmentation (which, as we all know, is not a very good thing).
A few keywords: compaction, segmentation, paging, sharing.

We can't do much about our internal fragmentation, except changing the
algorithms of dicts & strings (which is not appealing anyways). But it
would be nice to think about the external fragmentation of Python's caches.
Or even try to reduce the internal fragmentation in combination with the
internal caches...

BTW, this is the whole point of PyMalloc: in a virtual memory world, try
to reduce the distance between the user view and the OS view on memory.
PyMalloc addresses the fragmentation problem at a lower level of granularity
than an OS (that is, *within* a page), because most of Python's objects are
very small. However, it can't efficiently handle large chunks like the
int/float caches. Basically what it does is: segmentation of the virtual
space and sharing of the cached free space. I think that Python could
improve on sharing its internal caches, without significant slowdowns...
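
To make "segmentation plus sharing" concrete, here is a toy sketch, not PyMalloc itself. The 8-byte ALIGNMENT, the class SizeClassAllocator and its counters are assumptions made for illustration. Requests are rounded up to a size class, and a freed block can be reused by any later request of the same class, whatever kind of object it is for.

```python
ALIGNMENT = 8  # assumed size-class granularity, for illustration only


def size_class(nbytes):
    """Round a request up to the next multiple of ALIGNMENT."""
    return (nbytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT


class SizeClassAllocator:
    """Toy allocator: free blocks are shared per size class, not per type."""

    def __init__(self):
        self.free_blocks = {}        # size class -> number of free blocks
        self.blocks_from_system = 0  # blocks carved out of fresh memory

    def allocate(self, nbytes):
        cls = size_class(nbytes)
        if self.free_blocks.get(cls, 0) > 0:
            self.free_blocks[cls] -= 1   # reuse a cached block
        else:
            self.blocks_from_system += 1
        return cls

    def release(self, cls):
        self.free_blocks[cls] = self.free_blocks.get(cls, 0) + 1


alloc = SizeClassAllocator()
first = alloc.allocate(12)   # rounded up to the 16-byte class
alloc.release(first)
second = alloc.allocate(14)  # different request, same class: block is reused

print(first, second, alloc.blocks_from_system)
```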

The bottom line is that there's still plenty of room for exploring alternate
memory management strategies that better fit Python's memory needs as a whole.

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Re: about line numbers [ In reply to ]
[me, dropping SET_LINENO]
>
> I wrote a very rough first implementation of this idea. The files are at:
>
> http://sirac.inrialpes.fr/~marangoz/python/lineno/
>
> ...
>
> A couple of things that need finalization:
>
> ...

An updated version is available at the same location.
I think that this one does The Right Thing (tm).

a) Everything is internal to the VM and totally hidden, as it should be.
b) No modifications of the code and frame objects (no additional slots)
c) The modified code string (used for tracing) is allocated dynamically
when the 1st frame pointing to its original switches into trace mode,
and is deallocated automatically when the last frame pointing to its
original dies.
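
The lifetime rule in c) can be modelled roughly as follows. This is a Python sketch of the bookkeeping only, not the C code at the URL above. The class TraceCopyTable and its methods are hypothetical, and for simplicity it counts only the frames that entered trace mode.

```python
class TraceCopyTable:
    """Toy model: one shared traced copy per original code string."""

    def __init__(self):
        # original code string -> [traced copy, number of frames using it]
        self._entries = {}

    def enter_trace_mode(self, original):
        entry = self._entries.get(original)
        if entry is None:
            # The first frame to switch into trace mode allocates the copy.
            entry = [bytearray(original), 0]
            self._entries[original] = entry
        entry[1] += 1
        return entry[0]

    def frame_died(self, original):
        entry = self._entries[original]
        entry[1] -= 1
        if entry[1] == 0:
            # The last frame is gone, so the copy is dropped.
            del self._entries[original]

    def is_allocated(self, original):
        return original in self._entries


table = TraceCopyTable()
code = b"\x00\x01\x02"            # stand-in for a real code string

copy_a = table.enter_trace_mode(code)
copy_b = table.enter_trace_mode(code)
print(copy_a is copy_b)           # both frames share a single copy

table.frame_died(code)
table.frame_died(code)
print(table.is_allocated(code))   # copy released with the last frame
```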

I feel better with this code so I can stop thinking about it and move on :-)
(leaving it to your judgment).

What's next? File attributes? ;-)

It's not easy to weigh what kind of common interface would be easy to grasp,
intuitive and unambiguous for the average user. I think that the people on
this list (being core developers) are more or less biased here (I'd say more
than less). Perhaps some input from the community (c.l.py) would help?

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252