Mailing List Archive: Newbie question about Re

Aug 24, 1999, 6:05 AM

Post #1 of 5 (466 views)

Hi

I just want to split a string of n x (p characters) into a list of
p_length strings using the Re module
Example: Assume p=5 the string '12345ABCDE12345IJKLM' must be split in
['12345','ABCDE','12345','IJKLM']
The way I find is to make a replace before splitting:

import re
p=re.compile('(.{5,5})',re.DOTALL)
m=p.sub(r'\g<0>###','12345ABCDE12345IJKLM')
# I use ### as separator
re.split('###',m)
['12345', 'ABCDE', '12345', 'IJKLM', '']
# And I have to remove the trailing ''

But I don't like this too much.
Is there a better way?

Thanks

Newbie question about Re [ In reply to ]

ionescu at pik-potsdam

Aug 24, 1999, 7:09 AM

Post #2 of 5 (463 views)

MB wrote:

> I just want to split a string of n x (p characters) into a list of
> p_length strings using the Re module
> Example: Assume p=5 the string '12345ABCDE12345IJKLM' must be split in
> ['12345','ABCDE','12345','IJKLM']
> The way I find is to make a replace before splitting:
>

I'm not sure I understand. Why use re? Why not:

str = '12345ABCDE12345IJKLM'
l = []
i = 0

while i < len(str):
    l.append(str[i:i+5])
    i = i+5

This gives the list you wanted.

All the best,
Cezar.

Newbie question about Re [ In reply to ]

fredrik at pythonware

Aug 24, 1999, 7:43 AM

Post #3 of 5 (461 views)

MB <ritm@gnet.tn> wrote:
> I just want to split a string of n x (p characters) into a list of
> p_length strings using the Re module.

umm. since you don't really care what's IN the string,
how about using string slicing instead? it's not only
more pythonish, it's also faster (at least in this case):

def mysplit(s, p):
    out = []
    for i in range(0, len(s), p):
        out.append(s[i:i+p])
    return out

or, if you prefer one-liners:

list = map(lambda i, s=s, p=p: s[i:i+p], range(0, len(s), p))
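
On Python 3, map() returns an iterator rather than a list, so the one-liner above would need a list() around it. A list comprehension is probably the closest modern equivalent. This is only a sketch and not part of the original post; the name chunks is made up for illustration.

```python
def chunks(s, p):
    # Slice s at offsets 0, p, 2p, ...; the last piece may be shorter than p.
    # p must be non-zero, because range() rejects a step of 0.
    return [s[i:i+p] for i in range(0, len(s), p)]

print(chunks('12345ABCDE12345IJKLM', 5))
```

Unlike the regex approach, this keeps a short trailing piece when len(s) is not a multiple of p.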

</F>

"Some people, when confronted with a problem,
think 'I know, I'll use regular expressions.' Now
they have two problems." -- jwz

Newbie question about Re [ In reply to ]

mcfletch at vrtelecom

Aug 24, 1999, 7:58 AM

Post #4 of 5 (466 views)

Well, here's my take on a "better way"... I don't see why you would want to
use the RE module when you aren't doing any pattern matching...

def splitlen( instring, desiredlength):
    temp = []
    current = 0
    lenstr = len( instring)
    while current < lenstr:
        newcurrent = current+desiredlength
        temp.append( instring[current:newcurrent] )
        current = newcurrent
    return temp

I haven't even tested this for correctness, but I think it's close to
correct.
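
Since the function above is posted untested, here is a quick check one might run against it. The function body is copied from the post (whitespace tidied); the test inputs other than the original example are my own assumptions about what is worth checking.

```python
def splitlen(instring, desiredlength):
    temp = []
    current = 0
    lenstr = len(instring)
    while current < lenstr:
        newcurrent = current + desiredlength
        temp.append(instring[current:newcurrent])
        current = newcurrent
    return temp

# The example from the original question: length is an exact multiple of 5.
assert splitlen('12345ABCDE12345IJKLM', 5) == ['12345', 'ABCDE', '12345', 'IJKLM']
# Ragged end: the last piece is simply shorter.
assert splitlen('12345ABCDE12', 5) == ['12345', 'ABCDE', '12']
# Empty input gives an empty list.
assert splitlen('', 5) == []
# Caution: desiredlength must be positive. With 0 and a non-empty string,
# current never advances and the while loop does not terminate.
print('splitlen checks passed')
```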

Hope this helps,
Mike

Newbie question about Re [ In reply to ]

tim_one at email

Aug 24, 1999, 5:53 PM

Post #5 of 5 (460 views)

[MB]
> I just want to split a string of n x (p characters) into a list of
> p_length strings using the Re module
> Example: Assume p=5 the string '12345ABCDE12345IJKLM' must be split in
> ['12345','ABCDE','12345','IJKLM']
> ...

You're much better off (somewhat clearer in the normal case, much clearer in
end cases, and much faster regardless) using string-slicing in a loop, as
others have suggested.

If you've been cursed by a powerful wizard such that you must use re for this
or be turned into a toad,

>>> import re
>>> def chunk(s, p):
...     return re.findall("." * p, s)
...
>>> chunk("12345ABCDE12345IJKLM", 5)
['12345', 'ABCDE', '12345', 'IJKLM']
>>> chunk("12345ABCDE12345IJKLM", 4)
['1234', '5ABC', 'DE12', '345I', 'JKLM']
>>> chunk("12345ABCDE12345IJKLM", 1)
['1', '2', '3', '4', '5', 'A', 'B', 'C', 'D', 'E',
 '1', '2', '3', '4', '5', 'I', 'J', 'K', 'L', 'M']
>>> chunk("12345ABCDE12345IJKLM", 0)
['', '', '', '', '', '', '', '', '', '', '',
 '', '', '', '', '', '', '', '', '', '']
>>> chunk("12345ABCDE12345IJKLM", 17)
['12345ABCDE12345IJ']
>>> chunk("12345ABCDE12345IJKLM", 10)
['12345ABCDE', '12345IJKLM']
>>> chunk("12345ABCDE12345IJKLM", 20)
['12345ABCDE12345IJKLM']
>>> chunk("12345ABCDE12345IJKLM", 21)
[]
>>>
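
The end cases in the session above are where the two approaches differ, and a side-by-side comparison may make that concrete. This is a sketch: chunk_re mirrors the chunk function from the session, while chunk_slice and the "ab\ncd" input are illustrative additions, not from the thread.

```python
import re

def chunk_re(s, p):
    # Same technique as chunk() above: match exactly p non-newline characters.
    return re.findall("." * p, s)

def chunk_slice(s, p):
    # Plain slicing; requires p != 0 because range() rejects a zero step.
    return [s[i:i+p] for i in range(0, len(s), p)]

s = "12345ABCDE12345IJKLM"

# The regex only returns full-length matches, so the leftover 'KLM' is lost.
print(chunk_re(s, 17))     # ['12345ABCDE12345IJ']
# Slicing keeps the short final piece.
print(chunk_slice(s, 17))  # ['12345ABCDE12345IJ', 'KLM']

# Without re.DOTALL, "." does not match a newline, so the regex skips over it.
print(chunk_re("ab\ncd", 2))     # ['ab', 'cd']
print(chunk_slice("ab\ncd", 2))  # ['ab', '\nc', 'd']
```

Which behaviour is "right" depends on what should happen to a ragged tail; the point is only that the regex version silently drops it.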

being-a-toad-ain't-so-bad-ly y'rs - tim