Mailing List Archive

Python 3.11 bytecode and exception table
Hi all,

I am the current maintainer of bytecode
(https://github.com/MatthieuDartiailh/bytecode), a library to
perform assembly and disassembly of Python bytecode. The library was
created by V. Stinner.

I started looking into Python 3.11 support in bytecode. I read
Objects/exception_handling_notes.txt and I have a couple of questions
regarding the exception table.

Currently bytecode exposes three levels of abstraction:
  - the concrete level, in which one deals with instruction offsets for
jumps and explicit indices into the known constants and names
  - the bytecode level, which uses labels for jumps and allows non-integer
arguments to instructions
  - the cfg level, which provides basic-block delineation over the
bytecode level
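As an illustration of the difference between the first two levels, here is a minimal sketch of turning offset-based jumps into labels. The class names echo the ones bytecode uses, but `ConcreteInstr`, `Instr`, `Label` and `to_labels` below are toy stand-ins written for this example, not the library's API. Jump targets are assumed to be already resolved to absolute byte offsets.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class ConcreteInstr:
    """Hypothetical concrete-level instruction: jumps use byte offsets."""
    offset: int
    name: str
    target: Optional[int] = None  # absolute byte offset of the jump target


@dataclass(frozen=True)
class Label:
    ident: int


@dataclass
class Instr:
    """Hypothetical bytecode-level instruction: jumps point at a Label."""
    name: str
    target: Optional[Label] = None


def to_labels(concrete: List[ConcreteInstr]) -> List[Union[Label, Instr]]:
    """Replace offset-based jump targets by Label markers placed in the stream."""
    targets = sorted({i.target for i in concrete if i.target is not None})
    labels = {offset: Label(n) for n, offset in enumerate(targets)}
    out: List[Union[Label, Instr]] = []
    for ins in concrete:
        if ins.offset in labels:
            out.append(labels[ins.offset])  # the label marks a jump destination
        target = labels[ins.target] if ins.target is not None else None
        out.append(Instr(ins.name, target))
    return out


# The 3.10 listing of f() from this message; SETUP_FINALLY's relative
# argument (5) is already resolved to the absolute offset 12 that dis shows.
listing = [
    ConcreteInstr(0, "SETUP_FINALLY", 12),
    ConcreteInstr(2, "LOAD_CONST"),
    ConcreteInstr(4, "STORE_FAST"),
    ConcreteInstr(6, "POP_BLOCK"),
    ConcreteInstr(8, "LOAD_CONST"),
    ConcreteInstr(10, "RETURN_VALUE"),
    ConcreteInstr(12, "POP_TOP"),
]

for item in to_labels(listing):
    print(item)
```

The same offset-to-label mapping is what would let exception-table targets become labels at the bytecode level.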

So my first idea was to directly expose the unpacked exception table
(start, stop, target, stack_depth, last_i) at the concrete level and use
pseudo-instructions and labels at the bytecode level. At this point in my
reflections, I saw
https://github.com/python/cpython/commit/c57aad777afc6c0b382981ee9e4bc94c03bf5f68
about adding pseudo-instructions to the dis output in 3.12 and thought it
would line up quite nicely. Reading through it, I got curious about how
SETUP_WITH handled popping one extra item from the stack, so I went to
look at dis results on a couple of small examples. I tried 3.10 and
3.11b3 (for some reason I cannot compile main at a391b74d on Windows).
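A minimal sketch of decoding the raw table (`co_exceptiontable`, 3.11+) into such (start, stop, target, stack_depth, last_i) tuples, assuming the layout described in Objects/exception_handling_notes.txt: each row holds four values (start, length, target, depth/lasti) counted in 2-byte code units, each written as 6-bit chunks, most significant first, with 0x40 as the continuation bit and 0x80 marking the first byte of a row; depth and lasti are packed as `(depth << 1) | lasti`. All helper names are hypothetical. In the 3.11 output below, dis prints the offset of the last covered instruction, so "4 to 6" is the half-open byte range [4, 8).

```python
import sys
from typing import List, NamedTuple, Tuple


class ExceptionTableEntry(NamedTuple):
    start: int        # first covered byte offset (inclusive)
    stop: int         # end of the covered range (exclusive), in bytes
    target: int       # byte offset of the handler
    stack_depth: int  # stack depth the handler expects
    last_i: bool      # whether the raising instruction's position is pushed


def _read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read one value stored as 6-bit chunks, most significant first.

    Bit 6 (0x40) means "another chunk follows"; bit 7 (0x80) only flags the
    first byte of a row and is masked out here.
    """
    byte = data[pos]
    value = byte & 0x3F
    pos += 1
    while byte & 0x40:
        byte = data[pos]
        value = (value << 6) | (byte & 0x3F)
        pos += 1
    return value, pos


def _write_varint(value: int, entry_start: bool) -> bytes:
    chunks = []
    while True:
        chunks.append(value & 0x3F)
        value >>= 6
        if not value:
            break
    chunks.reverse()  # most significant chunk first
    out = bytearray(c | 0x40 for c in chunks[:-1])
    out.append(chunks[-1])
    if entry_start:
        out[0] |= 0x80
    return bytes(out)


def decode_exception_table(data: bytes) -> List[ExceptionTableEntry]:
    entries = []
    pos = 0
    while pos < len(data):
        start, pos = _read_varint(data, pos)
        length, pos = _read_varint(data, pos)
        target, pos = _read_varint(data, pos)
        depth_lasti, pos = _read_varint(data, pos)
        # The table counts 2-byte code units; convert to byte offsets like dis.
        entries.append(ExceptionTableEntry(
            start * 2, (start + length) * 2, target * 2,
            depth_lasti >> 1, bool(depth_lasti & 1)))
    return entries


def encode_exception_table(entries: List[ExceptionTableEntry]) -> bytes:
    out = bytearray()
    for e in entries:
        out += _write_varint(e.start // 2, entry_start=True)
        out += _write_varint((e.stop - e.start) // 2, entry_start=False)
        out += _write_varint(e.target // 2, entry_start=False)
        out += _write_varint((e.stack_depth << 1) | int(e.last_i), entry_start=False)
    return bytes(out)


# The two rows dis prints for f() on 3.11 ("4 to 6" covers offsets 4 and 6).
table = [
    ExceptionTableEntry(4, 8, 12, 0, False),
    ExceptionTableEntry(12, 18, 18, 1, True),
]
raw = encode_exception_table(table)
assert decode_exception_table(raw) == table

if sys.version_info >= (3, 11):
    def f():
        try:
            a = 1
        except:
            raise

    for entry in decode_exception_table(f.__code__.co_exceptiontable):
        print(entry)
```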

I looked at simple things and got a bit surprised:

Disassembling:
def f():
    try:
        a = 1
    except:
        raise

I get on 3.11:
  1           0 RESUME                   0

  2           2 NOP

  3           4 LOAD_CONST               1 (1)
              6 STORE_FAST               0 (a)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
        >>   12 PUSH_EXC_INFO

  4          14 POP_TOP

  5          16 RAISE_VARARGS            0
        >>   18 COPY                     3
             20 POP_EXCEPT
             22 RERAISE                  1
ExceptionTable:
  4 to 6 -> 12 [0]
  12 to 16 -> 18 [1] lasti

On 3.10:
  2           0 SETUP_FINALLY            5 (to 12)

  3           2 LOAD_CONST               1 (1)
              4 STORE_FAST               0 (a)
              6 POP_BLOCK
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

  4     >>   12 POP_TOP
             14 POP_TOP
             16 POP_TOP

  5          18 RAISE_VARARGS            0

This surprised me on two levels:
- first I have never seen the RESUME opcode and it is currently not
documented
- my second surprise comes from the second entry in the exception table.
At first I failed to see why it was needed, but while writing this I realized
it corresponds to the explicit handling of exception propagation to the
caller. Since I cannot compile 3.12 at the moment, I am wondering how this
plays with pseudo-instructions: in particular, are pseudo-instructions
generated for all entries in the exception table?
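The role of the second row can be made more tangible with a toy model of what the interpreter does when an exception is raised, following the description in Objects/exception_handling_notes.txt: find the row covering the raising instruction, pop the stack down to that row's depth, push lasti if the row asks for it, push the exception, and jump to the target; if no row covers the instruction, the exception leaves the frame. `Entry`, `find_handler` and `unwind` are hypothetical names, stack items are placeholder strings, and the pushed lasti value is simplified to the byte offset.

```python
from bisect import bisect_right
from typing import Any, List, NamedTuple, Optional


class Entry(NamedTuple):
    start: int   # inclusive byte offset
    stop: int    # exclusive byte offset
    target: int  # handler byte offset
    depth: int   # stack depth to keep
    lasti: bool  # push the raising instruction's position?


def find_handler(table: List[Entry], offset: int) -> Optional[Entry]:
    """Return the row covering `offset`.

    Assumes rows are sorted by start and do not overlap, so the only
    candidate is the last row starting at or before `offset`.
    """
    starts = [e.start for e in table]
    i = bisect_right(starts, offset) - 1
    if i >= 0 and offset < table[i].stop:
        return table[i]
    return None


def unwind(table: List[Entry], offset: int, stack: List[Any], exc: Any) -> Optional[int]:
    """Toy model of raising `exc` at `offset`; mutates `stack`.

    Returns the handler offset to jump to, or None when no row covers the
    offset and the exception propagates to the caller.
    """
    entry = find_handler(table, offset)
    if entry is None:
        return None
    del stack[entry.depth:]      # drop everything above the recorded depth
    if entry.lasti:
        stack.append(offset)     # stand-in for the lasti value CPython pushes
    stack.append(exc)
    return entry.target


table = [Entry(4, 8, 12, 0, False), Entry(12, 18, 18, 1, True)]

# At offset 16 (RAISE_VARARGS inside the except body) one item sits below the
# exception: the previously handled exception saved by PUSH_EXC_INFO.
stack = ["prev_exc"]
print(unwind(table, 16, stack, "exc"), stack)
```

With depth 1 plus lasti, the handler at 18 finds the saved exception three slots down, which is what `COPY 3` picks up before `POP_EXCEPT` restores it and `RERAISE 1` propagates the exception further.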

My initial idea was to have a SETUP_FINALLY/SETUP_CLEANUP - POP_BLOCK
pair for each line in the exception table and a label for the jump target.
But I realize this means we will have many more such pairs than in 3.10.
That is fine by me, but I wondered what choice was made in 3.12 dis and
whether this approach made sense.
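The per-row idea can be sketched as follows, purely as an illustration: `naive_blocks` is a hypothetical helper, the offsets are the 3.11 ones above, every row gets its own SETUP_*/POP_BLOCK pair, and SETUP_CLEANUP is chosen when the row has lasti set. That last choice is an assumption based on my reading of the 3.11 compiler, where SETUP_FINALLY handlers do not push lasti; SETUP_WITH cannot be told apart from SETUP_CLEANUP using the table alone.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int, int, int, bool]  # (start, stop, target, depth, lasti), bytes


def naive_blocks(instrs: List[Tuple[int, str]], table: List[Row]) -> List[str]:
    """Wrap every exception-table row in its own SETUP_*/POP_BLOCK pair.

    Assumes handler targets are offsets present in `instrs` and that the
    last instruction is 2 bytes wide.
    """
    labels = {t: f"L{n}" for n, t in enumerate(sorted({row[2] for row in table}))}
    opens: Dict[int, List[str]] = {}
    closes: Dict[int, List[str]] = {}
    for start, stop, target, _depth, lasti in table:
        setup = "SETUP_CLEANUP" if lasti else "SETUP_FINALLY"
        opens.setdefault(start, []).append(f"{setup} {labels[target]}")
        closes.setdefault(stop, []).append("POP_BLOCK")

    out: List[str] = []
    for offset, name in instrs:
        out.extend(closes.get(offset, []))   # rows ending just before this instr
        if offset in labels:
            out.append(f"{labels[offset]}:")  # handler entry point
        out.extend(opens.get(offset, []))    # rows starting at this instr
        out.append(name)
    out.extend(closes.get(instrs[-1][0] + 2, []))  # rows ending at the very end
    return out


instrs = [(0, "RESUME"), (2, "NOP"), (4, "LOAD_CONST"), (6, "STORE_FAST"),
          (8, "LOAD_CONST"), (10, "RETURN_VALUE"), (12, "PUSH_EXC_INFO"),
          (14, "POP_TOP"), (16, "RAISE_VARARGS"), (18, "COPY"),
          (20, "POP_EXCEPT"), (22, "RERAISE")]
table = [(4, 8, 12, 0, False), (12, 18, 18, 1, True)]
print("\n".join(naive_blocks(instrs, table)))
```

For a single try/except this gives back something close to the 3.10 shape, with an extra pair around the except body. With a nested try, the outer block's range is split into several rows around the inner one, so the outer handler would get several pairs, and the table alone does not say that they came from one block.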

Best regards

Matthieu
Re: Python 3.11 bytecode and exception table
On 05/07/2022 09:22, Matthieu Dartiailh wrote:
> This surprised me on two levels:
> - first I have never seen the RESUME opcode and it is currently not
> documented
RESUME occurs at the start of every function (and some other places),
and is only used for some internal interpreter bookkeeping. It is
documented at https://docs.python.org/3.11/library/dis.html#opcode-RESUME
Re: Python 3.11 bytecode and exception table
Hi Matthieu,

The dis output for this function in 3.12 is the same as it is in 3.11.

The pseudo-instructions are emitted by the compiler's codegen stage, but
never make it to compiled bytecode. They are removed or replaced by real
opcodes before the code object is created.
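This can be checked from Python, assuming 3.11 or later (on 3.10, SETUP_FINALLY, SETUP_WITH and POP_BLOCK were still real opcodes, as the 3.10 listing in the original message shows): list the opcode names with `dis.get_instructions` and confirm that none of the block-related pseudo-instructions survive into the code object. `BLOCK_PSEUDO_OPS` and `opnames` are names chosen for this example, not CPython utilities.

```python
import dis
import sys
from typing import Callable, List

# Block-related pseudo-instructions of the 3.11+ compiler. SETUP_FINALLY,
# SETUP_WITH and POP_BLOCK were real opcodes up to 3.10; SETUP_CLEANUP was not.
BLOCK_PSEUDO_OPS = {"SETUP_FINALLY", "SETUP_CLEANUP", "SETUP_WITH", "POP_BLOCK"}


def opnames(func: Callable) -> List[str]:
    """Opcode names of `func`, in order, as dis reports them."""
    return [ins.opname for ins in dis.get_instructions(func)]


def f():
    try:
        a = 1
    except:
        raise


names = opnames(f)
print(names)
if sys.version_info >= (3, 11):
    # The protected ranges are recorded in co_exceptiontable instead.
    assert not (BLOCK_PSEUDO_OPS & set(names))
```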

The recent change to the dis module that you mentioned did not change how
the disassembly of bytecode gets displayed. Rather, it added the
pseudo-instructions to the opcodes list so that we have access to their
mnemonics from Python. This is a step towards exposing intermediate
compilation steps to Python (for unit tests, etc.). BTW, part of this will
require writing some test utilities for CPython that let us specify and
compare opcode sequences, similar to what you have in bytecode.

As for deconstructing the exception table and planting the
pseudo-instructions back into the code, it would be nice if dis could do that,
but we may need to settle for an approximation because I'm not sure the
exact block structure can be reliably reconstructed from the exception
table at the moment. I may be wrong.

Having a SETUP_*/POP_BLOCK for each line in the exception table is not
going to be correct - there can be nested try-except blocks, for instance,
and even without them the compiler can emit the code of an except block in
non-contiguous order (in https://github.com/python/cpython/pull/93622 I
fixed one of those cases to reduce the size of the exception table, but it
wasn't a correctness bug).
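The flattening that makes nested blocks hard to recover can be illustrated with a toy version of the approach CPython's assembler takes, as I understand it: walk the instructions in order, note each one's innermost handler, and start a new row whenever that handler changes. `flatten` and the handler names are hypothetical, and every instruction is assumed to be 2 bytes.

```python
from typing import List, Optional, Sequence, Tuple


def flatten(handlers: Sequence[Optional[str]], size: int = 2) -> List[Tuple[int, int, str]]:
    """Turn per-instruction innermost handlers into (start, stop, handler) rows.

    handlers[i] is the innermost handler protecting instruction i, or None.
    Offsets are in bytes with `size` bytes per instruction; stop is exclusive.
    """
    rows: List[Tuple[int, int, str]] = []
    current: Optional[str] = None
    start = 0
    for i, handler in enumerate(handlers):
        if handler != current:
            if current is not None:
                rows.append((start, i * size, current))  # close the previous run
            start, current = i * size, handler
    if current is not None:
        rows.append((start, len(handlers) * size, current))
    return rows


# An outer try whose body is: two instructions, an inner try covering two
# instructions, then two more instructions of the outer body.
handlers = ["OUTER", "OUTER", "INNER", "INNER", "OUTER", "OUTER"]
for row in flatten(handlers):
    print(row)
```

Two nested try blocks come out as three non-overlapping rows, with the outer handler appearing twice, so one SETUP_*/POP_BLOCK pair per row would not match the source structure.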

Irit

On Tue, Jul 5, 2022 at 9:27 AM Matthieu Dartiailh <m.dartiailh@gmail.com>
wrote:

> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/XZ7KDCI3TXEUERU3YIFKC543GAGIYG6Q/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Re: Python 3.11 bytecode and exception table
Hi Irit, hi Patrick,

Thanks for your quick answers.

First, thanks Patrick: it seems I switched back to the stable docs at some
point without noticing, and hence I missed the new opcodes.

Thanks Irit for the clarification regarding the pseudo-instructions use
in dis.

Regarding nested try/except blocks, I believe we could have two
SETUP_* followed by two POP_BLOCK, so I am not sure what issue you see
there. However, if we can have exception tables with two rows such as (1,
3, ...) and (2, 4, ...), then yes, I will have an issue. I guess I will
have to try implementing something and try to round-trip on as many
examples as possible. Would you be interested in being kept posted about
my progress?
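For that round-trip testing, a cheap sanity check is to look for overlapping rows like the (1, 3)/(2, 4) case. A minimal sketch with a hypothetical `overlapping_pairs` helper over (start, stop) pairs with exclusive stops; I would expect it to return nothing for compiler-produced tables, but that is an assumption worth checking on real code objects.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]  # (start, stop), stop exclusive


def overlapping_pairs(rows: Sequence[Range]) -> List[Tuple[Range, Range]]:
    """Every pair of rows whose half-open ranges overlap."""
    ordered = sorted(rows)
    pairs: List[Tuple[Range, Range]] = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b[0] >= a[1]:
                break  # rows are sorted by start, so later ones cannot overlap a
            pairs.append((a, b))
    return pairs


print(overlapping_pairs([(1, 3), (2, 4)]))           # the case from this message
print(overlapping_pairs([(0, 4), (4, 8), (8, 12)]))  # adjacent, non-overlapping rows
```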

Best

Matthieu

On 7/5/2022 at 11:01 AM, Irit Katriel wrote:
Re: Python 3.11 bytecode and exception table
Hi Matthieu,

Yes I am interested. Please ping me for PR reviews or any progress updates.

Thanks
Irit

> On 5 Jul 2022, at 20:27, Matthieu Dartiailh <m.dartiailh@gmail.com> wrote:
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/B2G4LQ3WPAYEUEZX6LURWO336EMZTPC2/
Code of Conduct: http://python.org/psf/codeofconduct/