Mailing List Archive

Decreasing refcount for locals before popping frame
Consider this example code:

def test():
    a = A()

test()

Currently, the locals (i.e. `a`) are cleared only after the function
has returned.

If we attach a finalizer to `a` immediately after the declaration then
the frame stack available via `sys._getframe()` inside the finalizer
function does not include the frame used to evaluate the function
(i.e. with the code object of the `test` function).

The nearest frame is that of the top-level module (where we make the
call to the function).

This is in practical terms no different than:

def test():
    return A()

test()

There's no way to distinguish between the two cases even though in the
second example, the object is dropped only after the frame (used to
evaluate the function) has been cleared.

The effect I am trying to achieve is:

def test():
    a = A()
    del a
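
A minimal sketch of the difference, assuming CPython's reference counting. The class `A`, its `seen` list, and the function names `implicit_drop` and `explicit_drop` are made up for illustration. The finalizer records the code-object names of the frames it can reach through `sys._getframe()`:

```python
import sys


class A:
    """Hypothetical class whose finalizer records the visible call stack."""

    seen = []  # one list of frame names per finalized instance

    def __del__(self):
        names = []
        frame = sys._getframe(1)  # start at the caller, skipping __del__ itself
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        A.seen.append(names)


def implicit_drop():
    a = A()  # dropped when the frame's locals are cleared


def explicit_drop():
    a = A()
    del a  # dropped while the function's frame is still executing


implicit_drop()
explicit_drop()
print(A.seen)
```

With the explicit `del`, the second record contains `explicit_drop`. According to the description above, the first record does not contain `implicit_drop`, which is why the two cases cannot be told apart from inside the finalizer.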

Here's a use-case to motivate this need:

In Airflow, we're considering introducing some "magic" to help users write:

with DAG(...):
    # some code here

That is, without declaring a top-level variable such as `dag`.

However, we can't detect the following situation:

def create():
    with DAG(...) as dag:
        # some code here

create()

The DAG is not returned from the function but nevertheless, we can't
distinguish between this code and the correct version:

def create():
    with DAG(...) as dag:
        # some code here
    return dag

In this case, calling `create` will then "return" the DAG and of
course, without a variable assignment, the finalizer will be called –
but now we can detect this.

I'm thinking that it ought to be possible to clear out
`frame->localsplus` before leaving the function frame.

I played around with "ceval.c" and only got segfaults. It's
complicated machinery :-)

Thoughts?
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/D5HCLMN42SIRRUHWPU566R7YYAVLCAEN/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Decreasing refcount for locals before popping frame
I don't know if there's anything specifically stopping this, but from what I understand, the precise moment that a finalizer gets called is unspecified, so relying on any sort of behavior there is undefined and non-portable. Implementations like PyPy don't always use reference counting, so their garbage collection might get called some unspecified amount of time later.

I'm not familiar with Airflow, but would you be able to decorate the create() function to check for good return values? Something like

import functools

def dag_initializer(func):
    @functools.wraps(func)
    def wrapper():
        with DAG(...) as dag:
            result = func(dag)
        del dag
        if not isinstance(result, DAG):
            raise ValueError(f"{func.__name__} did not return a dag")
        return result
    return wrapper

@dag_initializer
def create(dag):
    "some code here"
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EBCLFYZLCTANUYSPZ55GFHG5I7DDTR76/
Re: Decreasing refcount for locals before popping frame
As has already been mentioned, there is no guarantee that your variable will
be finalized (or even destroyed) when the frame finishes. For example, if
your variable ends up in a reference cycle for whatever reason, it may not be
cleared until a GC run happens (and in some situations it may never be
cleared at all). The language gives you no guarantees about when or
how objects will be finalized or destroyed, and any attempt to rely on
specific behaviour is doomed to fail because it can change between versions
and implementations.
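
To illustrate the reference-cycle case, here is a small sketch for CPython. The `Node` class and the `make` function are hypothetical, and automatic collection is switched off only to make the timing visible:

```python
import gc

finalized = []


class Node:
    def __init__(self, name):
        self.name = name
        self.me = self  # self-reference creates a cycle

    def __del__(self):
        finalized.append(self.name)


def make():
    n = Node("cyclic")  # the local goes away on return, the cycle does not


was_enabled = gc.isenabled()
gc.disable()  # keep the cyclic collector from running on its own
try:
    make()
    after_return = list(finalized)   # snapshot right after the frame is gone
    gc.collect()
    after_collect = list(finalized)  # snapshot after an explicit collection
finally:
    if was_enabled:
        gc.enable()

print(after_return, after_collect)
```

On CPython the first snapshot is empty and the object is only finalized by the explicit `gc.collect()`. Other implementations may behave differently again.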



On Thu, 28 Apr 2022, 14:14 Malthe, <mborch@gmail.com> wrote:

> [...]
Re: Decreasing refcount for locals before popping frame
Dennis Sweeney wrote:
> I don't know if there's anything specifically stopping this, but from what I understand, the precise moment that a finalizer gets called is unspecified, so relying on any sort of behavior there is undefined and non-portable. Implementations like PyPy don't always use reference counting, so their garbage collection might get called some unspecified amount of time later.

It's unspecified of course for the language as such, but in the specific case of CPython (which we're targeting), I think the refcounting logic is here to stay and generally speaking, can be relied on. Of course some version may come along to break expectations and I suppose we might cross that bridge when we get to it.

> I'm not familiar with Airflow, but would you be able to decorate the create() function to check for good return values?

We could, but for the most part people don't define DAGs inside functions. It happens, but it is not the simplest usage pattern. It's not so much about the function itself as about being able to determine whether a DAG was dropped at the top level of the module.

If the frame-clearing behavior were changed so that locals were reclaimed before popping the frame, I think the line number (i.e. `f_lineno`) would have to be that of the function definition, i.e. `def test():` in the examples above.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FWRP3RPCGXXDQT2IVO7HQBCUQFHGTCRM/
Re: Decreasing refcount for locals before popping frame
Pablo Galindo Salgado wrote:
> As it has been mentioned there is no guarantee that your variable will even
> be finalized (or even destroyed) after the frame finishes. For example, if
> your variable goes into a reference cycle for whatever reason it may not be
> cleared until a GC run happens (and in some situations it may not even be
> cleared at any point).

I think there is a reasonable guarantee in CPython that it will happen exactly when you leave the frame, assuming there are no cycles or other references to the object. There's always the future, but I don't see a very near future where this will change fundamentally.

Relying too much on CPython's behavior is a bad thing, but I think there are cases where it makes sense and can be a pragmatic choice. Certainly lots of programs have successfully relied on `sys._getframe` over the years.
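
A minimal sketch of the CPython behaviour being relied on here. The names `Resource`, `use`, and `events` are just for illustration, and other implementations such as PyPy may order the events differently:

```python
events = []


class Resource:
    def __del__(self):
        events.append("finalized")


def use():
    r = Resource()  # no cycles, no other references
    events.append("in function")


use()
events.append("after call")
print(events)
```

With reference counting and no cycles, the finalizer runs as part of leaving `use()`, before the statement after the call executes.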
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/BVO7RMMZ2LJFEG4GRNNTYZU3Q4P3DHV3/
Re: Decreasing refcount for locals before popping frame
Can you show a runnable example of the successful and unsuccessful usage of
`with DAG(): ... `?

On Fri, Apr 29, 2022, 6:31 AM Malthe <mborch@gmail.com> wrote:

> [...]
Re: Decreasing refcount for locals before popping frame
On Fri, 29 Apr 2022 at 06:38, Thomas Grainger <tagrain@gmail.com> wrote:
> Can you show a run-able example of the successful and unsuccessful usage of `with DAG(): ... `?

from airflow import DAG

# correct:
dag = DAG("my_dag")

# incorrect:
DAG("my_dag")

The with construct really has nothing to do with it, but it is a
common source of confusion:

# incorrect
with DAG("my_dag"):
    ...

It is less obvious (to some) in this way that the entire DAG will not
be picked up. You will in fact have to write:

# correct
with DAG("my_dag") as dag:
    ...

This way, you're capturing the DAG in the top-level scope which is the
requirement.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HREOTTGPB5JMLGYMIQL4VR2DFI6GBG5J/
Re: Decreasing refcount for locals before popping frame
Does this only apply to DAG files? E.g.
https://airflow.apache.org/docs/apache-airflow/1.10.12/concepts.html#scope

You can use a `__del__` method that warns on collection, like an unawaited
coroutine does.
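
A rough sketch of that idea, assuming CPython's reference counting. The `DAG` class here is a stand-in, not Airflow's, and the `registered` flag is a hypothetical attribute that a loader would set once it has picked the DAG up:

```python
import warnings


class DAG:
    """Stand-in for airflow.DAG (hypothetical, for illustration only)."""

    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.registered = False  # hypothetical: set by whatever loads the DAG

    def __del__(self):
        if not self.registered:
            warnings.warn(
                f"DAG {self.dag_id!r} was never registered", RuntimeWarning
            )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    DAG("dropped")           # never bound to a name, finalized right away
    kept = DAG("kept")
    kept.registered = True   # pretend the loader picked this one up
    del kept

messages = [str(w.message) for w in caught]
print(messages)
```

Only the DAG that was never bound triggers the warning here. The timing still depends on when the object is finalized, so this is a best-effort diagnostic.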

Also, if you're in control of importing the dagfile, you can record all
created DAGs and report any that are missing from the globals of the module.
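
Something along these lines, as a sketch only. The `DAG` class is a stand-in for the real one, `find_unexposed` and `DAGFILE` are hypothetical names, and `exec` stands in for however the dagfile is actually imported:

```python
class DAG:
    """Stand-in for airflow.DAG (hypothetical, for illustration only)."""

    created = []  # strong references keep every instance alive for the check

    def __init__(self, dag_id):
        self.dag_id = dag_id
        DAG.created.append(self)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


DAGFILE = '''
with DAG("kept") as dag:
    pass

with DAG("dropped"):
    pass
'''


def find_unexposed(source):
    """Run a dagfile and list DAG ids that are not bound in its globals."""
    DAG.created.clear()
    namespace = {"DAG": DAG}
    exec(source, namespace)
    exposed = {id(v) for v in namespace.values() if isinstance(v, DAG)}
    return [d.dag_id for d in DAG.created if id(d) not in exposed]


missing = find_unexposed(DAGFILE)
print(missing)
```

Because the registry holds strong references, this check does not depend on finalizer timing at all.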


On Fri, Apr 29, 2022, 7:45 AM Malthe <mborch@gmail.com> wrote:

> [...]
Re: Decreasing refcount for locals before popping frame
On Fri, 29 Apr 2022 at 06:50, Thomas Grainger <tagrain@gmail.com> wrote:
> You can use a `__del__` method that warns on collection - like an unawaited coroutine
>
> Also if you're in control of importing the dagfile you can record all created dags and report any that are missing from the globals of the module

Yes, and I think this is the best we can do given how frames are being cleared.

We can notify the user that a DAG was instantiated and not exposed at
the top-level which is almost guaranteed to be a mistake. There's
probably no good way currently to do better (for some value of
"better").

Thanks
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/E4IU26RL4I72FMACQLNTIPT5DN5XTE3S/
Re: Decreasing refcount for locals before popping frame
Can you ping me on the airflow PR for this change? (@graingert)

On Fri, Apr 29, 2022, 7:54 AM Malthe <mborch@gmail.com> wrote:

> [...]