Mailing List Archive: Re: A technique from a chatbot

Re: A technique from a chatbot

Apr 2, 2024, 10:47 AM

Post #1 of 12 (122 views)

On 02/04/2024 19.18, Stefan Ram wrote:
> Some people can't believe it when I say that chatbots improve
> my programming productivity. So, here's a technique I learned
> from a chatbot!
>
> It is a structured "break". "Break" still is a kind of jump,
> you know?
>
> So, what's a function to return the first word beginning with
> an "e" in a given list, like for example
>
> [ 'delta', 'epsilon', 'zeta', 'eta', 'theta' ]
>
> ? Well it's
>
> def first_word_beginning_with_e( list_ ):
>     for word in list_:
>         if word[ 0 ]== 'e': return word
>
> . "return" still can be considered a kind of "goto" statement.
> It can lead to errors:
>
> def first_word_beginning_with_e( list_ ):
>     for word in list_:
>         if word[ 0 ]== 'e': return word
>     something_to_be_done_at_the_end_of_this_function()
>
> The call sometimes will not be executed here!
> So, "return" is similar to "break" in that regard.
>
> But in Python we can write:
>
> def first_word_beginning_with_e( list_ ):
>     return next( ( word for word in list_ if word[ 0 ]== 'e' ), None )

Doesn't look a smart advice.

> . No jumps anymore, yet the loop is aborted on the first hit

First of all, I fail to understand why there
should be no jumps any more.
It depends on how "return" and "if" are handled,
I guess, in different context.
Maybe they're just "masked".
In any case, the "compiler" should have just
done the same.
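
One rough way to check whether the jumps are merely "masked" is to look at the bytecode. The sketch below assumes CPython; the helper `jump_opnames` is a hypothetical name introduced here, and the exact instruction names differ between Python versions, so no particular output is claimed.

```python
import dis

def first_word_beginning_with_e(list_):
    return next((word for word in list_ if word[0] == 'e'), None)

def jump_opnames(code):
    """Collect the names of jump instructions in a code object and any nested ones."""
    names = {ins.opname for ins in dis.get_instructions(code) if "JUMP" in ins.opname}
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # nested code object, e.g. the generator expression
            names |= jump_opnames(const)
    return names

# The generator expression is compiled to its own code object, and its
# loop and filter still use jump instructions under the hood.
print(sorted(jump_opnames(first_word_beginning_with_e.__code__)))
```

On the CPython versions this was written against, the printed set is not empty, which suggests the jumps have moved into the generator expression rather than disappeared.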

> (if I guess correctly how its working).

Second, it is difficult to read, which is bad.
The "guess" above is just evidence of that.

My personal opinion about these "chatbots", is
that, while they might deliver clever solutions,
they are not explaining *why* these solutions
should be considered "clever".
Which is the most important thing (the solution
itself is _not_).

bye,

--

piergiorgio

--
https://mail.python.org/mailman/listinfo/python-list

Re: A technique from a chatbot [ In reply to ]

python-list at python

Apr 2, 2024, 12:31 PM

Post #2 of 12 (120 views)

On 4/2/2024 1:47 PM, Piergiorgio Sartor via Python-list wrote:
> On 02/04/2024 19.18, Stefan Ram wrote:
>>    Some people can't believe it when I say that chatbots improve
>>    my programming productivity. So, here's a technique I learned
>>    from a chatbot!
>>    It is a structured "break". "Break" still is a kind of jump,
>>    you know?
>>    So, what's a function to return the first word beginning with
>>    an "e" in a given list, like for example
>> [ 'delta', 'epsilon', 'zeta', 'eta', 'theta' ]
>>
>>    ? Well it's
>> def first_word_beginning_with_e( list_ ):
>>      for word in list_:
>>          if word[ 0 ]== 'e': return word
>>
>>    . "return" still can be considered a kind of "goto" statement.
>>    It can lead to errors:
>>
>> def first_word_beginning_with_e( list_ ):
>>      for word in list_:
>>          if word[ 0 ]== 'e': return word
>>      something_to_be_done_at_the_end_of_this_function()
>>    The call sometimes will not be executed here!
>>    So, "return" is similar to "break" in that regard.
>>    But in Python we can write:
>> def first_word_beginning_with_e( list_ ):
>>      return next( ( word for word in list_ if word[ 0 ]== 'e' ), None )
>
> Doesn't look a smart advice.
>
>>    . No jumps anymore, yet the loop is aborted on the first hit

It's worse than "not a smart advice". This code constructs an
unnecessary tuple, then picks out its first element and returns that.
The something_to_be_done() function may or may not be called. And it's
harder to read and understand than necessary. Compare, for example,
with this version:

def first_word_beginning_with_e(target, wordlist):
    result = ''
    for w in wordlist:
        if w.startswith(target):
            result = w
            break
    do_something_else()
    return result

If do_something_else() is supposed to fire only if the target is not
found, then this slight modification will do:

def first_word_beginning_with_e(target, wordlist):
    result = ''
    for w in wordlist:
        if w.startswith(target):
            result = w
            break
    else:
        do_something_else()
    return result

[Using the "target" argument instead of "target[0]" will let you match
an initial string instead of just the first character].
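
A minimal sketch of the for/else variant in action. The function name `first_match` and the `on_miss` callback are hypothetical stand-ins for the function and `do_something_else()` above, introduced only so the behaviour can be observed.

```python
def first_match(target, wordlist, on_miss):
    """Return the first word starting with `target`; call `on_miss` only if none does."""
    result = ''
    for w in wordlist:
        if w.startswith(target):
            result = w
            break
    else:
        # Runs only when the loop finished without hitting `break`.
        on_miss()
    return result

misses = []
words = ['delta', 'epsilon', 'zeta', 'eta', 'theta']
print(first_match('e', words, lambda: misses.append('e')))   # epsilon
print(first_match('x', words, lambda: misses.append('x')))   # prints an empty line (no match)
print(misses)                                                # ['x']
```

The `else` clause belongs to the `for` loop, not to the `if`, which is why it fires only when no word matched.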

> First of all, I fail to understand why there
> should be no jumps any more.
> It depends on how "return" and "if" are handled,
> I guess, in different context.
> Maybe they're just "masked".
> In any case, the "compiler" should have just
> done the same.
>
>>    (if I guess correctly how its working).
>
> Second, it is difficult to read, which is bad.
> The "guess" above is just evidence of that.
>
> My personal opinion about these "chatbots", is
> that, while they might deliver clever solutions,
> they are not explaining *why* these solutions
> should be considered "clever".
> Which is the most important thing (the solution
> itself is _not_).
>
> bye,
>

RE: A technique from a chatbot [ In reply to ]

python-list at python

Apr 2, 2024, 10:27 PM

Post #3 of 12 (119 views)

I am a tad confused by a suggestion that any kind of GOTO variant is bad. The suggestion runs counter to the reality that underneath it all, compiled programs are chock full of GOTO variants even for simple things like IF-ELSE.

Consider the code here:

>> def first_word_beginning_with_e( list_ ):
>>     for word in list_:
>>         if word[ 0 ]== 'e': return word
>>     something_to_be_done_at_the_end_of_this_function()

If instead the function initialized a variable to nothing useful and in the loop if it found a word beginning with e and it still contained nothing useful, copied it into the variable and then allowed the code to complete the loop and finally returned the variable, that would simply be a much less efficient solution to the problem and gain NOTHING. There are many variants you can come up with and when the conditions are complex and many points of immediate return, fine, then it may be dangerous. But a single return is fine.

The function does have a flaw as it is not clear what it should do if nothing is found. Calling a silly long name does not necessarily return anything.

Others, like Thomas, have shown other variants including some longer and more complex ways.

A fairly simple one-liner version, not necessarily efficient, would be to just use a list comprehension that makes a new list of just the ones matching the pattern of starting with an 'e' and then returns the first entry or None. This shows the code and tests it:

text = ["eastern", "Western", "easter"]

NorEaster = ["North", "West", "orient"]

def first_word_beginning_with_e( list_ ):
    return(result[0] if (result := [word for word in list_ if word[0].lower() == 'e']) else None)

print(first_word_beginning_with_e( text ))
print(first_word_beginning_with_e( NorEaster ))

Result of running it on a version of Python at least 3.8, so that it supports the walrus operator:

eastern
None

Re: A technique from a chatbot [ In reply to ]

python-list at python

Apr 3, 2024, 4:50 AM

Post #4 of 12 (117 views)

On 4/3/2024 1:27 AM, AVI GROSS via Python-list wrote:
> I am a tad confused by a suggestion that any kind of GOTO variant is bad. The suggestion runs counter to the reality that underneath it all, compiled programs are chock full of GOTO variants even for simple things like IF-ELSE.
>
> Consider the code here:
>
>>> def first_word_beginning_with_e( list_ ):
>>>     for word in list_:
>>>         if word[ 0 ]== 'e': return word
>>>     something_to_be_done_at_the_end_of_this_function()
>
> If instead the function initialized a variable to nothing useful and in the loop if it found a word beginning with e and it still contained nothing useful, copied it into the variable and then allowed the code to complete the loop and finally returned the variable, that would simply be a much less efficient solution to the problem and gain NOTHING. There are many variants you can come up with and when the conditions are complex and many points of immediate return, fine, then it may be dangerous. But a single return is fine.
>
> The function does have a flaw as it is not clear what it should do if nothing is found. Calling a silly long name does not necessarily return anything.
>
> Others, like Thomas, have shown other variants including some longer and more complex ways.
>
> A fairly simple one-liner version, not necessarily efficient, would be to just use a list comprehension that makes a new list of just the ones matching the pattern of starting with an 'e' and then returns the first entry or None. This shows the code and tests it:
>
> text = ["eastern", "Western", "easter"]
>
> NorEaster = ["North", "West", "orient"]
>
> def first_word_beginning_with_e( list_ ):
>     return(result[0] if (result := [word for word in list_ if word[0].lower() == 'e']) else None)
>
> print(first_word_beginning_with_e( text ))
> print(first_word_beginning_with_e( NorEaster ))
>
> Result of running it on a version of Python at least 3.8, so that it supports the walrus operator:
>
> eastern
> None

The OP seems to want to return None if a match is not found. If a
Python function ends without a return statement, it automatically
returns None. So nothing special needs to be done. True, that is
probably a special case, but it suggests that the problem posed to the
chatbot was not posed well. A truly useful chatbot could have discussed
many of the points we've been discussing. That would have made for a
good learning experience. Instead the chatbot produced poorly
constructed code that caused a bad learning experience.
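
The implicit `None` is easy to confirm with the original loop version. This is just a small sketch of that one point, with the word lists made up for the demonstration.

```python
def first_word_beginning_with_e(list_):
    for word in list_:
        if word[0] == 'e':
            return word
    # No explicit return here: falling off the end of a function returns None.

print(first_word_beginning_with_e(['delta', 'epsilon', 'zeta']))  # epsilon
print(first_word_beginning_with_e(['alpha', 'beta']))             # None
```

Note that `word[0]` would raise `IndexError` for an empty string, which `word.startswith('e')` avoids.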

> [snip...]

RE: A technique from a chatbot [ In reply to ]

python-list at python

Apr 3, 2024, 8:32 AM

Post #5 of 12 (117 views)

Sadly, Thomas, this is not even all that new.

I have seen people do searches on the internet for how to do one thing at a
time and then cobble together some code that does something but perhaps not
quite what they intended. Some things are just inefficient such as reading
data from a file, doing some calculations, writing the results to another
file, reading them back in and doing more calculations and writing them out
again and so on. Yes, there can be value in storing intermediate results but
why read it in again when it is already in memory? And, in some cases, why
not do multiple steps instead of one at a time and so on.

How many people ask how to TEST the code they get, especially from an
AI-like ...?

Re: A technique from a chatbot [ In reply to ]

python-list at python

Apr 3, 2024, 2:15 PM

Post #6 of 12 (116 views)

ram@zedat.fu-berlin.de (Stefan Ram) writes:

> It can lead to errors:
>
> def first_word_beginning_with_e( list_ ):
>     for word in list_:
>         if word[ 0 ]== 'e': return word
>     something_to_be_done_at_the_end_of_this_function()
>
> The call sometimes will not be executed here!
> So, "return" is similar to "break" in that regard.

That can be solved with finally:

def first_word_beginning_with_e( list_ ):
    try:
        for word in list_:
            if word[ 0 ]== 'e': return word
    finally:
        print("something_to_be_done_at_the_end_of_this_function()")
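
A small sketch to confirm that the `finally` clause runs on both paths. The `calls` list is a hypothetical stand-in for the end-of-function work, used here only so the effect can be inspected.

```python
calls = []

def first_word_beginning_with_e(list_):
    try:
        for word in list_:
            if word[0] == 'e':
                return word
    finally:
        # Runs on early return, on normal fall-through, and on exceptions.
        calls.append('cleanup')

print(first_word_beginning_with_e(['delta', 'epsilon']))  # epsilon
print(first_word_beginning_with_e(['delta']))             # None
print(calls)                                              # ['cleanup', 'cleanup']
```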

--
Pieter van Oostrum <pieter@vanoostrum.org>
www: http://pieter.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
Re: A technique from a chatbot [ In reply to ]

python-list at python

Apr 3, 2024, 2:36 PM

Post #7 of 12 (116 views)

On 03/04/2024 13.45, Gilmeh Serda wrote:
> On 2 Apr 2024 17:18:16 GMT, Stefan Ram wrote:
>
>> first_word_beginning_with_e
>
> Here's another one:
>
>>>> def ret_first_eword():
> ...     return [w for w in ['delta', 'epsilon', 'zeta', 'eta', 'theta'] if w.startswith('e')][0]
> ...
>>>> ret_first_eword()
> 'epsilon'

Doesn't work in the case where there isn't a word starting with 'e':

>>> def find_e( l ):
...     return [w for w in l if w.startswith('e')][0]
...
>>> l = ['delta', 'epsilon', 'zeta', 'eta', 'theta']
>>> find_e(l)
'epsilon'
>>> l = ['The','fan-jet','airline']
>>> find_e(l)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in find_e
IndexError: list index out of range
>>>
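
For comparison, a sketch of the same `find_e` written with `next` and a default value, which is one way to avoid the `IndexError` when nothing matches. Returning `None` for "not found" is an assumption about what the caller wants.

```python
def find_e(words):
    """Return the first word starting with 'e', or None if there is none."""
    return next((w for w in words if w.startswith('e')), None)

print(find_e(['delta', 'epsilon', 'zeta', 'eta', 'theta']))  # epsilon
print(find_e(['The', 'fan-jet', 'airline']))                 # None
```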

--
Michael F. Stemper
If it isn't running programs and it isn't fusing atoms, it's just bending space.

Re: A technique from a chatbot [ In reply to ]

python-list at python

Apr 4, 2024, 12:03 PM

Post #8 of 12 (115 views)

Thomas Passin wrote:
> On 4/2/2024 1:47 PM, Piergiorgio Sartor via Python-list wrote:
>> On 02/04/2024 19.18, Stefan Ram wrote:
>>>    Some people can't believe it when I say that chatbots improve
>>>    my programming productivity. So, here's a technique I learned
>>>    from a chatbot!
>>>    It is a structured "break". "Break" still is a kind of jump,
>>>    you know?
>>>    So, what's a function to return the first word beginning with
>>>    an "e" in a given list, like for example
>>> [ 'delta', 'epsilon', 'zeta', 'eta', 'theta' ]
>>>
>>>    ? Well it's
>>> def first_word_beginning_with_e( list_ ):
>>>      for word in list_:
>>>          if word[ 0 ]== 'e': return word
>>>
>>>    . "return" still can be considered a kind of "goto" statement.
>>>    It can lead to errors:
>>>
>>> def first_word_beginning_with_e( list_ ):
>>>      for word in list_:
>>>          if word[ 0 ]== 'e': return word
>>>      something_to_be_done_at_the_end_of_this_function()
>>>    The call sometimes will not be executed here!
>>>    So, "return" is similar to "break" in that regard.
>>>    But in Python we can write:
>>> def first_word_beginning_with_e( list_ ):
>>>      return next( ( word for word in list_ if word[ 0 ]== 'e' ), None )
>>
>> Doesn't look a smart advice.
>>
>>>    . No jumps anymore, yet the loop is aborted on the first hit
>
> It's worse than "not a smart advice". This code constructs an
> unnecessary tuple, then picks out its first element and returns that.

I don't think there's a tuple being created. If you mean:
    ( word for word in list_ if word[ 0 ]== 'e' )

...that's not creating a tuple. It's a generator expression, which
generates the next value each time it's called for. If you only ever
ask for the first item, it only generates that one.
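
The laziness can be made visible with a predicate that records which words it was asked about. This is only a sketch: `starts_with_e` and the `examined` list are hypothetical helpers added for the demonstration.

```python
examined = []

def starts_with_e(word):
    examined.append(word)      # record every word the predicate is asked about
    return word.startswith('e')

words = ['delta', 'epsilon', 'zeta', 'eta', 'theta']

gen = (w for w in words if starts_with_e(w))
print(type(gen).__name__)      # generator
print(next(gen, None))         # epsilon
print(examined)                # ['delta', 'epsilon']

examined.clear()
print([w for w in words if starts_with_e(w)][0])   # epsilon
print(len(examined))           # 5
```

The generator expression stops after the first hit, while the list comprehension tests every word before the first element is picked out.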

When I first came across them, I did find it a bit odd that generator
expressions look like the tuple equivalent of list/dictionary
comprehensions.

FWIW, if you actually wanted a tuple from that expression, you'd need to
pass the generator to tuple's constructor:
    tuple(word for word in list_ if word[0] == 'e')
(You don't need to include an extra set of brackets when passing a
generator as the only argument to a function).

--
Mark.
RE: A technique from a chatbot [ In reply to ]

python-list at python

Apr 4, 2024, 1:33 PM

Post #9 of 12 (115 views)

That is an excellent point, Mark. Some of the proposed variants to the requested problem, including mine, do indeed find all instances only to return the first. This can use additional time and space but when done, some of the overhead is also gone. What I mean is that a generator you create and invoke once, generally sits around indefinitely in your session unless it leaves your current range or something. It does only a part of the work and must remain suspended and ready to be called again to do more.

If you create a generator inside a function and the function returns, presumably it can be garbage-collected.

But if it is in the main body, I have to wonder what happens.

There seem to be several related scenarios to consider.

- You may want to find, in our example, a first instance. Right afterwards, you want the generator to disassemble anything in use.
- You may want the generator to stick around and later be able to return the next instance. The generator can only really go away when another call has been made after the last available instance and it cannot look for more beyond some end.
- Finally, you can call a generator with the goal of getting all instances such as by asking it to populate a list. In such a case, you may not necessarily want or need to use a generator expression and can use something straightforward and possibly cheaper.

What confuses the issue, for me, is that you can make fairly complex calculations in python using various forms of generators that implement a sort of just-in-time approach as generators call other generators which call yet others and so on. Imagine having folders full of files that each contain a data structure such as a dictionary or set and writing functionality that searches for the first match for a key in any of the dictionaries (or sets or whatever) along the way? Now imagine that dictionary items can be a key value pair that can include the value being a deeper dictionary, perhaps down multiple levels.

You could get one generator that generates folder names or opens them and another that generates file names and reads in the data structure such as a dictionary and yet another that searches each dictionary and also any internally embedded dictionaries by calling another instance of the same generator as much as needed.

You can see how this creates and often consumes generators along the way as needed and in a sense does the minimum amount of work needed to find a first instance. But what might it leave open and taking up resources if not finished in a way that dismantles it?

Perhaps worse, imagine doing the search in parallel and as soon as it is found anywhere, ...

Re: A technique from a chatbot [ In reply to ]

python-list at python

Apr 4, 2024, 2:10 PM

Post #10 of 12 (114 views)

On 4/4/2024 3:03 PM, Mark Bourne via Python-list wrote:
> Thomas Passin wrote:
>> On 4/2/2024 1:47 PM, Piergiorgio Sartor via Python-list wrote:
>>> On 02/04/2024 19.18, Stefan Ram wrote:
>>>>    Some people can't believe it when I say that chatbots improve
>>>>    my programming productivity. So, here's a technique I learned
>>>>    from a chatbot!
>>>>    It is a structured "break". "Break" still is a kind of jump,
>>>>    you know?
>>>>    So, what's a function to return the first word beginning with
>>>>    an "e" in a given list, like for example
>>>> [ 'delta', 'epsilon', 'zeta', 'eta', 'theta' ]
>>>>
>>>>    ? Well it's
>>>> def first_word_beginning_with_e( list_ ):
>>>>      for word in list_:
>>>>          if word[ 0 ]== 'e': return word
>>>>
>>>>    . "return" still can be considered a kind of "goto" statement.
>>>>    It can lead to errors:
>>>>
>>>> def first_word_beginning_with_e( list_ ):
>>>>      for word in list_:
>>>>          if word[ 0 ]== 'e': return word
>>>>      something_to_be_done_at_the_end_of_this_function()
>>>>    The call sometimes will not be executed here!
>>>>    So, "return" is similar to "break" in that regard.
>>>>    But in Python we can write:
>>>> def first_word_beginning_with_e( list_ ):
>>>>      return next( ( word for word in list_ if word[ 0 ]== 'e' ), None )
>>>
>>> Doesn't look a smart advice.
>>>
>>>>    . No jumps anymore, yet the loop is aborted on the first hit
>>
>> It's worse than "not a smart advice". This code constructs an
>> unnecessary tuple, then picks out its first element and returns that.
>
> I don't think there's a tuple being created. If you mean:
>     ( word for word in list_ if word[ 0 ]== 'e' )
>
> ...that's not creating a tuple. It's a generator expression, which
> generates the next value each time it's called for. If you only ever
> ask for the first item, it only generates that one.

Yes, I was careless when I wrote that. Still, the tuple machinery has to
be created and that's not necessary here. My point was that you are
asking the Python machinery to do extra work for no benefit in
performance or readability.

> When I first came across them, I did find it a bit odd that generator
> expressions look like the tuple equivalent of list/dictionary
> comprehensions.
>
> FWIW, if you actually wanted a tuple from that expression, you'd need to
> pass the generator to tuple's constructor:
>     tuple(word for word in list_ if word[0] == 'e')
> (You don't need to include an extra set of brackets when passing a
> generator as the only argument to a function).
>

Re: A technique from a chatbot [ In reply to ]

python-list at python

Apr 5, 2024, 12:42 PM

Post #11 of 12 (101 views)

avi.e.gross@gmail.com wrote:
> That is an excellent point, Mark. Some of the proposed variants to the requested problem, including mine, do indeed find all instances only to return the first. This can use additional time and space but when done, some of the overhead is also gone. What I mean is that a generator you create and invoke once, generally sits around indefinitely in your session unless it leaves your current range or something. It does only a part of the work and must remain suspended and ready to be called again to do more.

It goes out of scope at the end of the function. Unless you return it
or store a reference to it elsewhere, it will then be deleted.

Or in this case, since the `first_word_beginning_with_e` function
doesn't even have a local reference to the generator (it is just created
and immediately passed as an argument to `next`), it goes out of scope
once the `next` function returns.

> If you create a generator inside a function and the function returns, presumably it can be garbage-collected.

Exactly. It probably doesn't even need to wait for garbage collection -
once the reference count is zero, it can be destroyed.

> But if it is in the main body, I have to wonder what happens.

If you mean in the top-level module scope outside of any
function/method, then it would remain in memory until the process exits.

> There seem to be several related scenarios to consider.
>
> - You may want to find, in our example, a first instance. Right afterwards, you want the generator to disassemble anything in use.
> - You may want the generator to stick around and later be able to return the next instance. The generator can only really go away when another call has been made after the last available instance and it cannot look for more beyond some end.
> - Finally, you can call a generator with the goal of getting all instances such as by asking it to populate a list. In such a case, you may not necessarily want or need to use a generator expression and can use something straightforward and possibly cheaper.

Yes, so you create and assign it at an appropriate scope. In the
example here, it's just passed to `next` and then destroyed. Passing a
generator to the `list` constructor (or the `tuple` constructor in my
"FWIW") would behave similarly - you'd get the final list/tuple back,
but the generator would be destroyed once that call is done. If you
assigned it to a function-local variable, it would exist until the end
of that function.

> What confuses the issue, for me, is that you can make fairly complex calculations in python using various forms of generators that implement a sort of just-in-time approach as generators call other generators which call yet others and so on.

Yes, you can. It can be quite useful when used appropriately.

> Imagine having folders full of files that each contain a data structure such as a dictionary or set and writing functionality that searches for the first match for a key in any of the dictionaries (or sets or whatever) along the way? Now imagine that dictionary items can be a key value pair that can include the value being a deeper dictionary, perhaps down multiple levels.
>
> You could get one generator that generates folder names or opens them and another that generates file names and reads in the data structure such as a dictionary and yet another that searches each dictionary and also any internally embedded dictionaries by calling another instance of the same generator as much as needed.

You could do that. Personally, I probably wouldn't use generators for
that, or at least not custom ones - if you're talking about iterating
over directories and files on disk, I'd just use `os.walk` (which is
itself a generator) and iterate over that, opening each file and doing
whatever you want with the contents.
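
A minimal sketch of that approach, assuming (the thread doesn't say) that each file is a JSON document holding a dictionary. The names `find_in_nested` and `find_first_value` are made up for this example:
```python
import json
import os

_MISSING = object()  # sentinel, so that a stored value of None still counts as a match

def find_in_nested(data, key):
    """Yield every value stored under `key` in `data` or in any nested dict.

    Only dictionaries are searched; lists and other containers are skipped.
    """
    if isinstance(data, dict):
        if key in data:
            yield data[key]
        for value in data.values():
            yield from find_in_nested(value, key)

def find_first_value(root, key, default=None):
    """Return the first value found for `key` in any *.json file under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith('.json'):
                continue
            # The file is closed as soon as the `with` block is left,
            # before the search of its contents even starts.
            with open(os.path.join(dirpath, filename)) as f:
                data = json.load(f)
            match = next(find_in_nested(data, key), _MISSING)
            if match is not _MISSING:
                return match
    return default
```
Because each file is fully read and closed before it is searched, returning early on the first match leaves no file open. The only custom generator left is the small recursive one for nested dictionaries.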

> You can see how this creates and often consumes generators along the way as needed and in a sense does the minimum amount of work needed to find a first instance. But what might it leave open and taking up resources if not finished in a way that dismantles it?

You'd need to make sure any files are closed (`with open(...)` helps
with that). If you're opening files within a generator, I'm pretty sure
you can do something like:
```
def iter_files(directory):
    for filename in directory:
        with open(filename) as f:
            yield f
```

Then the file will be closed when the iterator leaves the `with` block
and moves on to the next item (presumably there's some mechanism for the
context manager's `__exit__` to be called if the generator is destroyed
without having iterated over the items - the whole point of using `with`
is that `__exit__` is guaranteed to be called whatever happens).
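
That mechanism does exist: closing a suspended generator raises `GeneratorExit` at the `yield`, which unwinds the `with` block like any other exception. Here is a small sketch to show it, using a hypothetical `Tracked` class as a stand-in for a file so that the calls can be logged:
```python
class Tracked:
    """Hypothetical stand-in for a file; logs when it is entered and exited."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append(('enter', self.name))
        return self

    def __exit__(self, exc_type, exc, tb):
        self.log.append(('exit', self.name))
        return False  # don't swallow exceptions; GeneratorExit must propagate

def iter_tracked(names, log):
    for name in names:
        with Tracked(name, log) as resource:
            yield resource

log = []
gen = iter_tracked(['a', 'b', 'c'], log)
first = next(gen)  # the generator is now suspended inside the `with` block
gen.close()        # raises GeneratorExit at the `yield`, so __exit__ runs
print(log)         # [('enter', 'a'), ('exit', 'a')]
```
In CPython the same cleanup happens when the last reference to the generator goes away, since the generator is then closed as part of being destroyed. On implementations without reference counting the timing depends on the garbage collector, so calling `close()` explicitly (or wrapping the generator in `contextlib.closing`) is the safer option if the timing matters.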

Other than that, the generators themselves would be destroyed once they
go out of scope. If there are no references to a generator left,
nothing is going to be able to call `next` (nor anything else) on it, so
no need for it to be kept hanging around in memory.

> Perhaps worse, imagine doing the search in parallel and as soon as it is found anywhere, ...
>
>
>
> -----Original Message-----
> From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Mark Bourne via Python-list
> Sent: Thursday, April 4, 2024 3:04 PM
> To: python-list@python.org
> Subject: Re: A technique from a chatbot
>
> Thomas Passin wrote:
>> On 4/2/2024 1:47 PM, Piergiorgio Sartor via Python-list wrote:
>>> On 02/04/2024 19.18, Stefan Ram wrote:
>>>> Some people can't believe it when I say that chatbots improve
>>>> my programming productivity. So, here's a technique I learned
>>>> from a chatbot!
>>>> It is a structured "break". "Break" still is a kind of jump,
>>>> you know?
>>>> So, what's a function to return the first word beginning with
>>>> an "e" in a given list, like for example
>>>> [ 'delta', 'epsilon', 'zeta', 'eta', 'theta' ]
>>>>
>>>> ? Well it's
>>>> def first_word_beginning_with_e( list_ ):
>>>>     for word in list_:
>>>>         if word[ 0 ]== 'e': return word
>>>>
>>>> . "return" still can be considered a kind of "goto" statement.
>>>> It can lead to errors:
>>>>
>>>> def first_word_beginning_with_e( list_ ):
>>>>     for word in list_:
>>>>         if word[ 0 ]== 'e': return word
>>>>     something_to_be_done_at_the_end_of_this_function()
>>>> The call sometimes will not be executed here!
>>>> So, "return" is similar to "break" in that regard.
>>>> But in Python we can write:
>>>> def first_word_beginning_with_e( list_ ):
>>>>     return next( ( word for word in list_ if word[ 0 ]== 'e' ), None )
>>>
>>> Doesn't look a smart advice.
>>>
>>>> . No jumps anymore, yet the loop is aborted on the first hit
>>
>> It's worse than "not a smart advice". This code constructs an
>> unnecessary tuple, then picks out its first element and returns that.
>
> I don't think there's a tuple being created. If you mean:
> ( word for word in list_ if word[ 0 ]== 'e' )
>
> ...that's not creating a tuple. It's a generator expression, which
> generates the next value each time it's called for. If you only ever
> ask for the first item, it only generates that one.
>
> When I first came across them, I did find it a bit odd that generator
> expressions look like the tuple equivalent of list/dictionary
> comprehensions.
>
> FWIW, if you actually wanted a tuple from that expression, you'd need to
> pass the generator to tuple's constructor:
> tuple(word for word in list_ if word[0] == 'e')
> (You don't need to include an extra set of brackets when passing a
> generator as the only argument to a function).
>

Re: A technique from a chatbot [ In reply to ]

python-list at python

Apr 5, 2024, 12:59 PM

Post #12 of 12 (101 views)

Stefan Ram wrote:
> Mark Bourne <nntp.mbourne@spamgourmet.com> wrote or quoted:
>> I don't think there's a tuple being created. If you mean:
>> ( word for word in list_ if word[ 0 ]== 'e' )
>> ...that's not creating a tuple. It's a generator expression, which
>> generates the next value each time it's called for. If you only ever
>> ask for the first item, it only generates that one.
>
> Yes, that's also how I understand it!
>
> In the meantime, I wrote code for a microbenchmark, shown below.
>
> This code, when executed on my computer, shows that the
> next+generator approach is a bit faster when compared with
> the procedural break approach. But when the order of the two
> approaches is being swapped in the loop, then it is shown to
> be a bit slower. So let's say, it takes about the same time.

There could be some caching going on, meaning whichever is done second
comes out a bit faster.
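
One way to make such order effects matter less is to time each approach separately over the same fixed data and compare the best of several runs. This is only a sketch: the function names are made up, and wrapping each approach in a function adds call overhead that the inline benchmark quoted below doesn't have. The timings will differ between machines and Python versions, so no expected numbers are given.
```python
import random
import string
import timeit

def first_e_break(words):
    """Procedural version using break and for/else."""
    for word in words:
        if word[0] == 'e':
            result = word
            break
    else:
        result = ''
    return result

def first_e_next(words):
    """Version using next() on a generator expression."""
    return next((word for word in words if word[0] == 'e'), '')

random.seed(0)  # fixed seed, so both approaches see the same data on every run
words = [''.join(random.choices(string.ascii_lowercase, k=random.randint(1, 30)))
         for _ in range(50)]

# Sanity check: both approaches must agree before their timings are compared.
assert first_e_break(words) == first_e_next(words)

for func in (first_e_break, first_e_next):
    # Take the minimum of several repeats; it is the run least disturbed
    # by whatever else the machine was doing.
    best = min(timeit.repeat(lambda: func(words), number=10_000, repeat=5))
    print(f'{func.__name__}: {best:.4f} s per 10,000 calls')
```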

> However, I also tested code with an early return (not shown below),
> and this was shown to be faster than both code using break and
> code using next+generator by a factor of about 1.6, even though
> the code with return has the "function call overhead"!

To be honest, that's how I'd probably write it - not because of any
thought that it might be faster, but just because it's clearer. And if
there's a `do_something_else()` that needs to be called regardless of
whether a word was found, split it into two functions:
```
def first_word_beginning_with_e(target, wordlist):
    for w in wordlist:
        if w.startswith(target):
            return w
    return ''

def find_word_and_do_something_else(target, wordlist):
    result = first_word_beginning_with_e(target, wordlist)
    do_something_else()
    return result
```

> But please be aware that such results depend on the implementation
> and version of the Python implementation being used for the benchmark
> and also of the details of how exactly the benchmark is written.
>
> import random
> import string
> import timeit
>
> print( 'The following loop may need a few seconds or minutes, '
>        'so please bear with me.' )
>
> time_using_break = 0
> time_using_next = 0
>
> for repetition in range( 100 ):
>     for i in range( 100 ): # Yes, this nesting is redundant!
>
>         list_ = \
>         [ ''.join \
>           ( random.choices \
>             ( string.ascii_lowercase, k=random.randint( 1, 30 )))
>           for i in range( random.randint( 0, 50 ))]
>
>         start_time = timeit.default_timer()
>         for word in list_:
>             if word[ 0 ]== 'e':
>                 word_using_break = word
>                 break
>         else:
>             word_using_break = ''
>         time_using_break += timeit.default_timer() - start_time
>
>         start_time = timeit.default_timer()
>         word_using_next = \
>         next( ( word for word in list_ if word[ 0 ]== 'e' ), '' )
>         time_using_next += timeit.default_timer() - start_time
>
>         if word_using_next != word_using_break:
>             raise Exception( 'word_using_next != word_using_break' )
>
> print( f'{time_using_break = }' )
> print( f'{time_using_next = }' )
> print( f'{time_using_next / time_using_break = }' )
>