Mailing List Archive

matching any number of any character
To do it, I want to use the regex module, heretical though that may
be. I think my brain is missing a piece, because I can't seem to
accomplish such a task. I thought this would work:

>>> import regex
>>> rx = regex.compile('\(.\|\n\)+')
>>> rx.match('abc')
3
>>> rx.group(1)
'c'

but I wanted 'abc'!

little help?

-Lyn
matching any number of any character
[Lyn A Headley]
> To do it, I want to use the regex module, heretical though that may
> be.

Not heretical so much as ornery and self-destructive <wink>.

> I think my brain is missing a piece, because I can't seem to
> accomplish such a task. I thought this would work:
>
> >>> import regex
> >>> rx = regex.compile('\(.\|\n\)+')
> >>> rx.match('abc')
> 3
> >>> rx.group(1)
> 'c'
>
> but I wanted 'abc'!

Persist:

>>> rx.group(0)
'abc'
>>>

The thing inside your parens only matches one character; repeating it doesn't
change that; you get back the character (c) it matched last. Life is easier
with re:

>>> import re
>>> rx = re.compile("(.+)", re.DOTALL)
>>> m = rx.match("abc")
>>> m.group(1)
'abc'
>>>

this-is-your-brain-that-was-your-brain-on-regex-ly y'rs - tim
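
For anyone trying this on a Python that no longer ships the old regex module, the behaviour described above can be reproduced with re alone. What follows is only a minimal sketch: the variable names are made up for illustration, and the first pattern is a hand translation of the original into re syntax, not something posted in the thread.

```python
import re

# The original idea in re syntax: a one-character group, repeated.
# (.|\n) matches any single character, newline included.
repeated = re.compile(r"(.|\n)+")
m = repeated.match("abc")

whole = m.group(0)   # the entire match
last = m.group(1)    # only the final repetition of the group

# With re.DOTALL, "." also matches a newline, so the repetition can
# move inside the group and the group captures the whole run.
dotall = re.compile(r"(.+)", re.DOTALL)
everything = dotall.match("ab\nc").group(1)

print(repr(whole), repr(last), repr(everything))
```

Group 0 is always the whole match, whatever the numbered groups happen to hold.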
matching any number of any character
Lyn A Headley <laheadle@boguscs.uchicago.edu> wrote:

> accomplish such a task. I thought this would work:

it does...

>>>> import regex
>>>> rx = regex.compile('\(.\|\n\)+')
>>>> rx.match('abc')
> 3

you see, it just matched 3 characters, but...

>>>> rx.group(1)
> 'c'

here you are tricked by the semantics of a quantified grouping operator.
A quantified group only remembers the last match, and as your group only
matches a single character, 'c' it is :)
(( shouldn't this be documented in the re module? it has stung more people
recently ))

> but I wanted 'abc'!

so you have to refer to the whole match (rx.group(0)) or get the
quantification inside a group. e.g.

>>> rx = regex.compile('\(\(.\|\n|)+\)')
>>> rx.group(1)
'abc'
>>> rx.group(2)
'c'

--
groetjes, carel
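
The nested-group suggestion carries over to re as well. This is a sketch under two assumptions: the pattern is written in re syntax rather than the regex module's, and the inner group is properly closed. The names nested and non_capturing are illustrative only.

```python
import re

text = "ab\nc"

# Outer group wraps the whole repetition; inner group is one character.
nested = re.compile(r"((.|\n)+)")
m = nested.match(text)
outer = m.group(1)   # the whole run
inner = m.group(2)   # last character matched by the inner group

# (?:...) groups without capturing, so only the outer group is numbered.
non_capturing = re.compile(r"((?:.|\n)+)")
only = non_capturing.match(text).groups()

print(repr(outer), repr(inner), only)
```

The non-capturing form is handy when the inner group exists only to hold the alternation and its last match is of no interest.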
matching any number of any character
Carel Fellinger <cfelling@iae.nl> wrote:

>>>> rx = regex.compile('\(\(.\|\n|)+\)')

aiaia, typo, the |) after \n should be \)
better use re as Tim said, saves a lot of typos, eh, backslashes

--
groetjes, carel
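
On the backslash point: the regex patterns in this thread use \( \) and \| as operators, while re uses the bare characters and reserves the backslash for the literal ones. A small sketch of the re side only; the patterns and names here are invented for illustration.

```python
import re

# In re, bare ( ) and | are operators; a backslash makes them literal.
grouping = re.compile(r"(a|b)+")
literal = re.compile(r"\(a\|b\)")

run = grouping.match("abba").group(0)        # repeated alternation
parens = literal.match("(a|b)").group(0)     # the literal text "(a|b)"
no_match = grouping.match("(a|b)")           # "(" is neither a nor b

print(repr(run), repr(parens), no_match)
```

Writing re patterns as raw strings (r"...") keeps Python's own string escapes out of the way as well.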