Mailing List Archive

Regular expression bug?
I'm trying to split a CamelCase string into its constituent components.
This kind of works:

>>> re.split('[a-z][A-Z]', 'fooBarBaz')
['fo', 'a', 'az']

but it consumes the boundary characters. To fix this I tried using
lookahead and lookbehind patterns instead, but it doesn't work:

>>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
['fooBarBaz']

However, it does seem to work with findall:

>>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
['', '']

So the regular expression seems to be doing the Right Thing. Is this a
bug in re.split, or am I missing something?

(BTW, I tried looking at the source code for the re module, but I could
not find the relevant code. re.split calls sre_compile.compile().split,
but the string 'split' does not appear in sre_compile.py. So where does
this method come from?)

I'm using Python 2.5.

Thanks,
rg
--
http://mail.python.org/mailman/listinfo/python-list
Re: Regular expression bug?
On Thu, 2009-02-19 at 10:55 -0800, Ron Garret wrote:
> I'm trying to split a CamelCase string into its constituent components.
> This kind of works:
>
> >>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
>
> but it consumes the boundary characters. To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:

That's how re.split works, same as str.split...

> >>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> ['fooBarBaz']
>
> However, it does seem to work with findall:
>
> >>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']


Wow!

To tell you the truth, I can't even read that... but one wonders why
you don't just do

def ccsplit(s):
    cclist = []
    current_word = ''
    for char in s:
        if char.isupper():
            # An uppercase letter starts a new word; save the previous one.
            if current_word:
                cclist.append(current_word)
            current_word = char
        else:
            current_word += char
    if current_word:
        cclist.append(current_word)  # don't forget the last word
    return cclist

>>> ccsplit('fooBarBaz')
['foo', 'Bar', 'Baz']

This is arguably *much* easier to read than the re example, and it
doesn't require looking ahead in the string.

-a


Re: Regular expression bug?
On Thu, Feb 19, 2009 at 12:55 PM, Ron Garret <rNOSPAMon@flownet.com> wrote:
> I'm trying to split a CamelCase string into its constituent components.
> This kind of works:
>
>>>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
>
> but it consumes the boundary characters. To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:
>
>>>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> ['fooBarBaz']
>
> However, it does seem to work with findall:
>
>>>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']
>
> So the regular expression seems to be doing the Right Thing. Is this a
> bug in re.split, or am I missing something?

From what I can tell, re.split can't split on zero-length boundaries.
It needs something to split on, like str.split. Is this a bug?
Possibly. The docs for re.split say:

Split the source string by the occurrences of the pattern,
returning a list containing the resulting substrings.

Note that it does not say that zero-length matches won't work.

I can work around the problem thusly:

re.sub(r'(?<=[a-z])(?=[A-Z])', '_', 'fooBarBaz').split('_')

Which is ugly. I reckon you can use re.findall with a pattern that
matches the components and not the boundaries, but you have to take
care of the beginning and end as special cases.
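For readers on a newer Python: since Python 3.7, re.split also splits on matches that can be empty, so the lookaround pattern from the original post works directly there (it does not on 2.5, which is what this thread is about). A minimal sketch, assuming Python 3.7 or later; `camel_split` and `CAMEL_BOUNDARY` are hypothetical names used only for illustration:

```python
import re

# Zero-width boundary between a lowercase letter and an uppercase letter.
# Splitting on it requires Python 3.7+, where re.split honours empty matches.
CAMEL_BOUNDARY = re.compile(r'(?<=[a-z])(?=[A-Z])')

def camel_split(s):
    """Split a CamelCase string at each lower-to-upper transition."""
    return CAMEL_BOUNDARY.split(s)

print(camel_split('fooBarBaz'))  # ['foo', 'Bar', 'Baz']

# With a capturing group, as in the original post, each (empty) separator
# is also returned, interleaved with the pieces.
print(re.split(r'((?<=[a-z])(?=[A-Z]))', 'fooBarBaz'))  # ['foo', '', 'Bar', '', 'Baz']
```

No sentinel character is needed, so this also avoids the problem of the input already containing whatever character re.sub would insert.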

Kurt
Re: Regular expression bug?
i wonder what fraction of people posting with "bug?" in their titles here
actually find bugs?

anyway, how about:

re.findall('[A-Z]?[a-z]*', 'fooBarBaz')

or

re.findall('([A-Z][a-z]*|[a-z]+)', 'fooBarBaz')

(you have to specify what you're matching and lookahead/back doesn't do
that).
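One detail worth checking with the first pattern: because `[A-Z]?[a-z]*` can match the empty string, findall also reports an empty match at the very end of the input. A small sketch comparing the two suggestions on the same string:

```python
import re

s = 'fooBarBaz'

# Both parts of this pattern are optional, so an empty match is also
# found at the end of the string.
print(re.findall('[A-Z]?[a-z]*', s))          # ['foo', 'Bar', 'Baz', '']

# Each alternative here consumes at least one character, so no empty
# matches appear.
print(re.findall('([A-Z][a-z]*|[a-z]+)', s))  # ['foo', 'Bar', 'Baz']
```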

andrew


Ron Garret wrote:
> I'm trying to split a CamelCase string into its constituent components.
> This kind of works:
>
>>>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
>
> but it consumes the boundary characters. To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:
>
>>>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> ['fooBarBaz']
>
> However, it does seem to work with findall:
>
>>>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']
>
> So the regular expression seems to be doing the Right Thing. Is this a
> bug in re.split, or am I missing something?
>
> (BTW, I tried looking at the source code for the re module, but I could
> not find the relevant code. re.split calls sre_compile.compile().split,
> but the string 'split' does not appear in sre_compile.py. So where does
> this method come from?)
>
> I'm using Python2.5.
>
> Thanks,
> rg
Re: Regular expression bug?
Ron Garret wrote:

> I'm trying to split a CamelCase string into its constituent components.

How about

>>> re.compile("[A-Za-z][a-z]*").findall("fooBarBaz")
['foo', 'Bar', 'Baz']

> This kind of works:
>
>>>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
>
> but it consumes the boundary characters. To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:
>
>>>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> ['fooBarBaz']
>
> However, it does seem to work with findall:
>
>>>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']
>
> So the regular expression seems to be doing the Right Thing. Is this a
> bug in re.split, or am I missing something?

IIRC the split pattern must consume at least one character, but I can't find
the reference.

> (BTW, I tried looking at the source code for the re module, but I could
> not find the relevant code. re.split calls sre_compile.compile().split,
> but the string 'split' does not appear in sre_compile.py. So where does
> this method come from?)

It's coded in C. The source is Modules/_sre.c.
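A quick way to confirm this from the interpreter, as a sketch: methods implemented in C are reported as built-ins by the `inspect` module, and `inspect.getsource()` has no Python source to return for them.

```python
import inspect
import re

split_method = re.compile('x').split

# Bound methods of C-implemented types are reported as built-ins.
print(inspect.isbuiltin(split_method))  # True

# There is no Python source to show, so inspect.getsource() raises TypeError.
try:
    inspect.getsource(split_method)
except TypeError as exc:
    print('no Python source:', exc)
```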

Peter
Re: Regular expression bug?
Ron Garret wrote:
> I'm trying to split a CamelCase string into its constituent components.
> This kind of works:
>
>>>> re.split('[a-z][A-Z]', 'fooBarBaz')
> ['fo', 'a', 'az']
>
> but it consumes the boundary characters. To fix this I tried using
> lookahead and lookbehind patterns instead, but it doesn't work:
>
>>>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> ['fooBarBaz']
>
> However, it does seem to work with findall:
>
>>>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> ['', '']
>
> So the regular expression seems to be doing the Right Thing. Is this a
> bug in re.split, or am I missing something?
>
> (BTW, I tried looking at the source code for the re module, but I could
> not find the relevant code. re.split calls sre_compile.compile().split,
> but the string 'split' does not appear in sre_compile.py. So where does
> this method come from?)
>
> I'm using Python2.5.
>
I, amongst others, think it's a bug (or 'misfeature'); Guido thinks it
might be intentional, but changing it could break some existing code.
You could do this instead:

>>> re.sub('(?<=[a-z])(?=[A-Z])', '@', 'fooBarBaz').split('@')
['foo', 'Bar', 'Baz']
Re: Regular expression bug?
In article <mailman.281.1235073821.11746.python-list@python.org>,
MRAB <google@mrabarnett.plus.com> wrote:

> Ron Garret wrote:
> > I'm trying to split a CamelCase string into its constituent components.
> > This kind of works:
> >
> >>>> re.split('[a-z][A-Z]', 'fooBarBaz')
> > ['fo', 'a', 'az']
> >
> > but it consumes the boundary characters. To fix this I tried using
> > lookahead and lookbehind patterns instead, but it doesn't work:
> >
> >>>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> > ['fooBarBaz']
> >
> > However, it does seem to work with findall:
> >
> >>>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> > ['', '']
> >
> > So the regular expression seems to be doing the Right Thing. Is this a
> > bug in re.split, or am I missing something?
> >
> > (BTW, I tried looking at the source code for the re module, but I could
> > not find the relevant code. re.split calls sre_compile.compile().split,
> > but the string 'split' does not appear in sre_compile.py. So where does
> > this method come from?)
> >
> > I'm using Python2.5.
> >
> I, amongst others, think it's a bug (or 'misfeature'); Guido thinks it
> might be intentional, but changing it could break some existing code.

That seems unlikely. It would only break where people had code invoking
re.split on empty matches, which at the moment is essentially a no-op.
It's hard to imagine there's a lot of code like that around. What would
be the point?

> You could do this instead:
>
> >>> re.sub('(?<=[a-z])(?=[A-Z])', '@', 'fooBarBaz').split('@')
> ['foo', 'Bar', 'Baz']

Blech! ;-) But thanks for the suggestion.

rg
Re: Regular expression bug?
In article <gnkdal$bcq$01$1@news.t-online.com>,
Peter Otten <__peter__@web.de> wrote:

> Ron Garret wrote:
>
> > I'm trying to split a CamelCase string into its constituent components.
>
> How about
>
> >>> re.compile("[A-Za-z][a-z]*").findall("fooBarBaz")
> ['foo', 'Bar', 'Baz']

That's very clever. Thanks!

> > (BTW, I tried looking at the source code for the re module, but I could
> > not find the relevant code. re.split calls sre_compile.compile().split,
> > but the string 'split' does not appear in sre_compile.py. So where does
> > this method come from?)
>
> It's coded in C. The source is Modules/_sre.c.

Ah. Thanks!

rg
Re: Regular expression bug?
In article <mailman.277.1235073073.11746.python-list@python.org>,
"andrew cooke" <andrew@acooke.org> wrote:

> i wonder what fraction of people posting with "bug?" in their titles here
> actually find bugs?

IMHO it ought to be an invariant that len(r.split(s)) should always be
one more than len(r.findall(s)).

> anyway, how about:
>
> re.findall('[A-Z]?[a-z]*', 'fooBarBaz')
>
> or
>
> re.findall('([A-Z][a-z]*|[a-z]+)', 'fooBarBaz')

That will do it. Thanks!

rg
Re: Regular expression bug?
In article <mailman.273.1235071607.11746.python-list@python.org>,
Albert Hopkins <marduk@letterboxes.org> wrote:

> On Thu, 2009-02-19 at 10:55 -0800, Ron Garret wrote:
> > I'm trying to split a CamelCase string into its constituent components.
> > This kind of works:
> >
> > >>> re.split('[a-z][A-Z]', 'fooBarBaz')
> > ['fo', 'a', 'az']
> >
> > but it consumes the boundary characters. To fix this I tried using
> > lookahead and lookbehind patterns instead, but it doesn't work:
>
> That's how re.split works, same as str.split...

I think one could make the argument that 'foo'.split('') ought to return
['f','o','o']
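For comparison, a sketch of what current Python actually does here: str.split rejects an empty separator outright, list() already gives the per-character split, and (assuming Python 3.7 or later) re.split with an empty pattern splits between every character, including empty strings at both ends.

```python
import re

# str.split does not accept an empty separator.
try:
    'foo'.split('')
except ValueError as exc:
    print('ValueError:', exc)      # ValueError: empty separator

# list() gives the character-by-character result.
print(list('foo'))                 # ['f', 'o', 'o']

# Python 3.7+: re.split splits at every empty match, including both ends.
print(re.split('', 'foo'))         # ['', 'f', 'o', 'o', '']
```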

>
> > >>> re.split('((?<=[a-z])(?=[A-Z]))', 'fooBarBaz')
> > ['fooBarBaz']
> >
> > However, it does seem to work with findall:
> >
> > >>> re.findall('(?<=[a-z])(?=[A-Z])', 'fooBarBaz')
> > ['', '']
>
>
> Wow!
>
> To tell you the truth, I can't even read that...

It's a regexp. Of course you can't read it. ;-)

rg
Re: Regular expression bug?
andrew cooke wrote:

>
> i wonder what fraction of people posting with "bug?" in their titles here
> actually find bugs?

About 99.99%.

Unfortunately, 99.98% have found bugs in their code, not in Python.


--
Steven

Re: Regular expression bug?
On Thu, 19 Feb 2009 13:03:59 -0800, Ron Garret wrote:

> In article <gnkdal$bcq$01$1@news.t-online.com>,
> Peter Otten <__peter__@web.de> wrote:
>
>> Ron Garret wrote:
>>
>> > I'm trying to split a CamelCase string into its constituent
>> > components.
>>
>> How about
>>
>> >>> re.compile("[A-Za-z][a-z]*").findall("fooBarBaz")
>> ['foo', 'Bar', 'Baz']
>
> That's very clever. Thanks!
>
>> > (BTW, I tried looking at the source code for the re module, but I
>> > could not find the relevant code. re.split calls
>> > sre_compile.compile().split, but the string 'split' does not appear
>> > in sre_compile.py. So where does this method come from?)
>>
>> It's coded in C. The source is Modules/_sre.c.
>
> Ah. Thanks!
>
> rg

This re.split() doesn't consume characters:

>>> re.split('([A-Z][a-z]*)', 'fooBarBaz')
['foo', 'Bar', '', 'Baz', '']

It does what the OP wants, albeit with extra blank strings.

Re: Regular expression bug?
A more elegant way:

>>> [x for x in re.split('([A-Z]+[a-z]+)', 'fooBarBaz') if x]
['foo', 'Bar', 'Baz']

R.

On Feb 20, 2:03 pm, Lie Ryan <lie.1...@gmail.com> wrote:
> On Thu, 19 Feb 2009 13:03:59 -0800, Ron Garret wrote:
> > In article <gnkdal$bcq$0...@news.t-online.com>,
> >  Peter Otten <__pete...@web.de> wrote:
>
> >> Ron Garret wrote:
>
> >> > I'm trying to split a CamelCase string into its constituent
> >> > components.
>
> >> How about
>
> >> >>> re.compile("[A-Za-z][a-z]*").findall("fooBarBaz")
> >> ['foo', 'Bar', 'Baz']
>
> > That's very clever.  Thanks!
>
> >> > (BTW, I tried looking at the source code for the re module, but I
> >> > could not find the relevant code.  re.split calls
> >> > sre_compile.compile().split, but the string 'split' does not appear
> >> > in sre_compile.py.  So where does this method come from?)
>
> >> It's coded in C. The source is Modules/_sre.c.
>
> > Ah.  Thanks!
>
> > rg
>
> This re.split() doesn't consume character:
>
> >>> re.split('([A-Z][a-z]*)', 'fooBarBaz')
>
> ['foo', 'Bar', '', 'Baz', '']
>
> it does what the OP wants, albeit with extra blank strings.

Re: Regular Expression bug?
On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <jicman@gmail.com> wrote:
>
> Greetings.
>
> For the RegExp Gurus, consider the following python3 code:
> <code>
> import re
> s = "pn=align upgrade sd=2023-02-"
> ro = re.compile(r"pn=(.+) ")
> r0=ro.match(s)
> >>> print(r0.group(1))
> align upgrade
> </code>
>
> This is wrong. It should be 'align' because the group only goes up-to
> the space. Thoughts? Thanks.
>

Not a bug. Find the longest possible match that fits this; as long as
you can find a space immediately after it, everything in between goes
into the .+ part.

If you want to exclude spaces, either use [^ ]+ or .+?.
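A small sketch of the three variants side by side, using the string from the original post:

```python
import re

s = "pn=align upgrade sd=2023-02-"

# Greedy: .+ takes as much as it can, then backs off only until a space
# can follow, so the group ends at the LAST space.
greedy = re.match(r"pn=(.+) ", s)

# Non-greedy: .+? takes as little as it can, so the group ends at the FIRST space.
lazy = re.match(r"pn=(.+?) ", s)

# Negated class: [^ ]+ can never cross a space in the first place.
negated = re.match(r"pn=([^ ]+)", s)

print(greedy.group(1))   # align upgrade
print(lazy.group(1))     # align
print(negated.group(1))  # align
```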

ChrisA
Re: Regular Expression bug?
On 2023-03-02 at 14:22:41 -0500,
jose isaias cabrera <jicman@gmail.com> wrote:

> For the RegExp Gurus, consider the following python3 code:
> <code>
> import re
> s = "pn=align upgrade sd=2023-02-"
> ro = re.compile(r"pn=(.+) ")
> r0=ro.match(s)
> >>> print(r0.group(1))
> align upgrade
> </code>
>
> This is wrong. It should be 'align' because the group only goes up-to
> the space. Thoughts? Thanks.

The bug is in your regular expression; the plus modifier is greedy.

If you want to match up to the first space, then you'll need something
like [^ ] (i.e., everything that isn't a space) instead of that dot.
Re: Regular Expression bug?
On 3/2/23 12:28, Chris Angelico wrote:
> On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <jicman@gmail.com> wrote:
>>
>> Greetings.
>>
>> For the RegExp Gurus, consider the following python3 code:
>> <code>
>> import re
>> s = "pn=align upgrade sd=2023-02-"
>> ro = re.compile(r"pn=(.+) ")
>> r0=ro.match(s)
>>>>> print(r0.group(1))
>> align upgrade
>> </code>
>>
>> This is wrong. It should be 'align' because the group only goes up-to
>> the space. Thoughts? Thanks.
>>
>
> Not a bug. Find the longest possible match that fits this; as long as
> you can find a space immediately after it, everything in between goes
> into the .+ part.
>
> If you want to exclude spaces, either use [^ ]+ or .+?.


https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy

Re: Regular Expression bug?
On Thu, Mar 2, 2023 at 2:32 PM <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
>
> On 2023-03-02 at 14:22:41 -0500,
> jose isaias cabrera <jicman@gmail.com> wrote:
>
> > For the RegExp Gurus, consider the following python3 code:
> > <code>
> > import re
> > s = "pn=align upgrade sd=2023-02-"
> > ro = re.compile(r"pn=(.+) ")
> > r0=ro.match(s)
> > >>> print(r0.group(1))
> > align upgrade
> > </code>
> >
> > This is wrong. It should be 'align' because the group only goes up-to
> > the space. Thoughts? Thanks.
>
> The bug is in your regular expression; the plus modifier is greedy.
>
> If you want to match up to the first space, then you'll need something
> like [^ ] (i.e., everything that isn't a space) instead of that dot.

Thanks. I appreciate your wisdom.

josé
--

What if eternity is real? Where will you spend it? Hmmmm...
RE: Regular Expression bug?
José,

Matching can be greedy. Did it match to the last space?

What you want is a pattern that matches anything except a space (or whitespace) followed by matching a space or something similar.

Or use a construct that makes matching non-greedy.

Avi

-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of jose isaias cabrera
Sent: Thursday, March 2, 2023 2:23 PM
To: python-list@python.org
Subject: Regular Expression bug?

Greetings.

For the RegExp Gurus, consider the following python3 code:
<code>
import re
s = "pn=align upgrade sd=2023-02-"
ro = re.compile(r"pn=(.+) ")
r0=ro.match(s)
>>> print(r0.group(1))
align upgrade
</code>

This is wrong. It should be 'align' because the group only goes up-to the space. Thoughts? Thanks.

josé

--

What if eternity is real? Where will you spend it? Hmmmm...
Re: Regular Expression bug?
On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann <mats@wichmann.us> wrote:
>
> On 3/2/23 12:28, Chris Angelico wrote:
> > On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <jicman@gmail.com> wrote:
> >>
> >> Greetings.
> >>
> >> For the RegExp Gurus, consider the following python3 code:
> >> <code>
> >> import re
> >> s = "pn=align upgrade sd=2023-02-"
> >> ro = re.compile(r"pn=(.+) ")
> >> r0=ro.match(s)
> >>>>> print(r0.group(1))
> >> align upgrade
> >> </code>
> >>
> >> This is wrong. It should be 'align' because the group only goes up-to
> >> the space. Thoughts? Thanks.
> >>
> >
> > Not a bug. Find the longest possible match that fits this; as long as
> > you can find a space immediately after it, everything in between goes
> > into the .+ part.
> >
> > If you want to exclude spaces, either use [^ ]+ or .+?.
>
> https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy

This re is a bit different from the one I am used to. So, I am trying to match
everything after 'pn=':

import re
s = "pm=jose pn=2017"
m0 = r"pn=(.+)"
r0 = re.compile(m0)
s0 = r0.match(s)
>>> print(s0)
None

Any help is appreciated.
Re: Regular Expression bug?
On 02Mar2023 20:06, jose isaias cabrera <jicman@gmail.com> wrote:
>This re is a bit different than the one I am used. So, I am trying to
>match
>everything after 'pn=':
>
>import re
>s = "pm=jose pn=2017"
>m0 = r"pn=(.+)"
>r0 = re.compile(m0)
>s0 = r0.match(s)

`match()` matches at the start of the string. You want r0.search(s).
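A minimal sketch of the difference, using the string from the question:

```python
import re

s = "pm=jose pn=2017"
pattern = re.compile(r"pn=(.+)")

# match() only tries at position 0, where the string starts with "pm=".
print(pattern.match(s))        # None

# search() scans forward and finds "pn=" further along.
m = pattern.search(s)
print(m.group(1))              # 2017
print(m.span())                # (8, 15)
```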
- Cameron Simpson <cs@cskk.id.au>
RE: Regular Expression bug?
It is a well-known fact, Jose, that GIGO.

The letters "n" and "m" are not interchangeable. Your pattern fails because you have "pn" in one place and "pm" in the other.


>>> s = "pn=jose pn=2017"
...
>>> s0 = r0.match(s)
>>> s0
<re.Match object; span=(0, 15), match='pn=jose pn=2017'>



-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of jose isaias cabrera
Sent: Thursday, March 2, 2023 8:07 PM
To: Mats Wichmann <mats@wichmann.us>
Cc: python-list@python.org
Subject: Re: Regular Expression bug?

On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann <mats@wichmann.us> wrote:
>
> On 3/2/23 12:28, Chris Angelico wrote:
> > On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <jicman@gmail.com> wrote:
> >>
> >> Greetings.
> >>
> >> For the RegExp Gurus, consider the following python3 code:
> >> <code>
> >> import re
> >> s = "pn=align upgrade sd=2023-02-"
> >> ro = re.compile(r"pn=(.+) ")
> >> r0=ro.match(s)
> >>>>> print(r0.group(1))
> >> align upgrade
> >> </code>
> >>
> >> This is wrong. It should be 'align' because the group only goes up-to
> >> the space. Thoughts? Thanks.
> >>
> >
> > Not a bug. Find the longest possible match that fits this; as long as
> > you can find a space immediately after it, everything in between goes
> > into the .+ part.
> >
> > If you want to exclude spaces, either use [^ ]+ or .+?.
>
> https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy

This re is a bit different than the one I am used. So, I am trying to match
everything after 'pn=':

import re
s = "pm=jose pn=2017"
m0 = r"pn=(.+)"
r0 = re.compile(m0)
s0 = r0.match(s)
>>> print(s0)
None

Any help is appreciated.
Re: Regular Expression bug?
jose isaias cabrera <jicman@gmail.com> writes:

> On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann <mats@wichmann.us> wrote:
>
> This re is a bit different than the one I am used. So, I am trying to match
> everything after 'pn=':
>
> import re
> s = "pm=jose pn=2017"
> m0 = r"pn=(.+)"
> r0 = re.compile(m0)
> s0 = r0.match(s)
> >>> print(s0)
> None

Assuming that you were expecting to match "pn=2017", then you probably
don't want the 'match' method. Read its documentation. Then read the
documentation for the _other_ methods that a Pattern supports. Then you
will be enlightened.

- Alan
Re: Regular Expression bug?
On Thu, Mar 2, 2023 at 8:30 PM Cameron Simpson <cs@cskk.id.au> wrote:
>
> On 02Mar2023 20:06, jose isaias cabrera <jicman@gmail.com> wrote:
> >This re is a bit different than the one I am used. So, I am trying to
> >match
> >everything after 'pn=':
> >
> >import re
> >s = "pm=jose pn=2017"
> >m0 = r"pn=(.+)"
> >r0 = re.compile(m0)
> >s0 = r0.match(s)
>
> `match()` matches at the start of the string. You want r0.search(s).
> - Cameron Simpson <cs@cskk.id.au>

Thanks. Darn it! I knew it was something simple.


--

What if eternity is real? Where will you spend it? Hmmmm...
Re: Regular Expression bug?
On Thu, Mar 2, 2023 at 8:35 PM <avi.e.gross@gmail.com> wrote:
>
> It is a well-known fact, Jose, that GIGO.
>
> The letters "n" and "m" are not interchangeable. Your pattern fails because you have "pn" in one place and "pm" in the other.

It is not GIGO. pm=project manager. pn=project name. I needed search()
rather than match().

>
> >>> s = "pn=jose pn=2017"
> ...
> >>> s0 = r0.match(s)
> >>> s0
> <re.Match object; span=(0, 15), match='pn=jose pn=2017'>
>
>
>
> -----Original Message-----
> From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of jose isaias cabrera
> Sent: Thursday, March 2, 2023 8:07 PM
> To: Mats Wichmann <mats@wichmann.us>
> Cc: python-list@python.org
> Subject: Re: Regular Expression bug?
>
> On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann <mats@wichmann.us> wrote:
> >
> > On 3/2/23 12:28, Chris Angelico wrote:
> > > On Fri, 3 Mar 2023 at 06:24, jose isaias cabrera <jicman@gmail.com> wrote:
> > >>
> > >> Greetings.
> > >>
> > >> For the RegExp Gurus, consider the following python3 code:
> > >> <code>
> > >> import re
> > >> s = "pn=align upgrade sd=2023-02-"
> > >> ro = re.compile(r"pn=(.+) ")
> > >> r0=ro.match(s)
> > >>>>> print(r0.group(1))
> > >> align upgrade
> > >> </code>
> > >>
> > >> This is wrong. It should be 'align' because the group only goes up-to
> > >> the space. Thoughts? Thanks.
> > >>
> > >
> > > Not a bug. Find the longest possible match that fits this; as long as
> > > you can find a space immediately after it, everything in between goes
> > > into the .+ part.
> > >
> > > If you want to exclude spaces, either use [^ ]+ or .+?.
> >
> > https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy
>
> This re is a bit different than the one I am used. So, I am trying to match
> everything after 'pn=':
>
> import re
> s = "pm=jose pn=2017"
> m0 = r"pn=(.+)"
> r0 = re.compile(m0)
> s0 = r0.match(s)
> >>> print(s0)
> None
>
> Any help is appreciated.


--

What if eternity is real? Where will you spend it? Hmmmm...
Re: Regular Expression bug?
On Thu, Mar 2, 2023 at 9:56 PM Alan Bawden <alan@csail.mit.edu> wrote:
>
> jose isaias cabrera <jicman@gmail.com> writes:
>
> On Thu, Mar 2, 2023 at 2:38 PM Mats Wichmann <mats@wichmann.us> wrote:
>
> This re is a bit different than the one I am used. So, I am trying to match
> everything after 'pn=':
>
> import re
> s = "pm=jose pn=2017"
> m0 = r"pn=(.+)"
> r0 = re.compile(m0)
> s0 = r0.match(s)
> >>> print(s0)
> None
>
> Assuming that you were expecting to match "pn=2017", then you probably
> don't want the 'match' method. Read its documentation. Then read the
> documentation for the _other_ methods that a Pattern supports. Then you
> will be enlightened.

Yes. I need search. Thanks.

--

What if eternity is real? Where will you spend it? Hmmmm...