
Newbie question about Re
Hi

I just want to split a string of n x p characters into a list of
p-length strings using the re module.
Example: assuming p=5, the string '12345ABCDE12345IJKLM' must be split into
['12345','ABCDE','12345','IJKLM']
The way I found is to do a replace before splitting:

import re

p = re.compile('(.{5,5})', re.DOTALL)
m = p.sub(r'\g<0>###', '12345ABCDE12345IJKLM')
# I use ### as separator
re.split('###', m)
# ['12345', 'ABCDE', '12345', 'IJKLM', '']
# And I have to remove the trailing ''

But I don't like this much.
Is there a better way?

Thanks
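If the marker approach above is kept, the trailing empty string can be dropped in the same step. This is a minimal sketch wrapping it in a hypothetical helper, `split_with_marker`; it assumes the marker never occurs in the input, since a real `###` in the data would cause extra splits.

```python
import re

def split_with_marker(s, p, marker='###'):
    """Hypothetical helper: split s into p-character pieces via a marker.

    Assumes `marker` never appears in `s`.
    """
    # Append the marker after every run of exactly p characters.
    # A function replacement avoids any backslash handling in the template.
    marked = re.sub('(.{%d})' % p, lambda m: m.group(0) + marker, s,
                    flags=re.DOTALL)
    # Split on the marker; re.escape guards against regex metacharacters.
    pieces = re.split(re.escape(marker), marked)
    # The final marker leaves an empty string at the end; drop empties.
    return [piece for piece in pieces if piece]
```

A short final piece (for example 'FG' from 'ABCDEFG' with p=5) never gets a marker after it, so it survives as its own item.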
Newbie question about Re [ In reply to ]
MB wrote:

> I just want to split a string of n x p characters into a list of
> p-length strings using the re module.
> Example: assuming p=5, the string '12345ABCDE12345IJKLM' must be split into
> ['12345','ABCDE','12345','IJKLM']
> The way I found is to do a replace before splitting:
>

I'm not sure I understand. Why use re? Why not:

s = '12345ABCDE12345IJKLM'   # named s to avoid shadowing the built-in str
l = []
i = 0

while i < len(s):
    l.append(s[i:i+5])   # take the next 5 characters
    i = i + 5

This gives the list you wanted.
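One caveat with the `while` version once the chunk size becomes a parameter: a step of 0 never advances the index, so the loop never ends. A minimal sketch generalizing the loop to any size `p` (the name `chunks_while` is hypothetical) that rejects non-positive sizes:

```python
def chunks_while(s, p):
    """Split s into p-character pieces with an explicit index loop.

    A non-positive p would make the loop spin forever, so reject it.
    """
    if p <= 0:
        raise ValueError("chunk size must be positive")
    out = []
    i = 0
    while i < len(s):
        out.append(s[i:i + p])  # slicing past the end just truncates
        i = i + p
    return out
```

Because slicing past the end is not an error, a string whose length is not a multiple of `p` simply yields a shorter last piece.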

All the best,
Cezar.
Newbie question about Re [ In reply to ]
MB <ritm@gnet.tn> wrote:
> I just want to split a string of n x p characters into a list of
> p-length strings using the re module.

Umm. Since you don't really care what's IN the string,
how about using string slicing instead? It's not only
more pythonish, it's also faster (at least in this case):

def mysplit(s, p):
    out = []
    # step through the string p characters at a time
    for i in range(0, len(s), p):
        out.append(s[i:i+p])
    return out

or, if you prefer one-liners:

chunks = list(map(lambda i, s=s, p=p: s[i:i+p], range(0, len(s), p)))
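The `s=s, p=p` default arguments in the one-liner were needed in Python releases of that era because a lambda could not see the enclosing function's local variables; nested scopes (standard from Python 2.2) removed that need. In current Python, where `map()` returns an iterator rather than a list, a list comprehension is the usual spelling. A sketch (the name `mysplit_comp` is hypothetical):

```python
def mysplit_comp(s, p):
    # List-comprehension form of the map/lambda one-liner; no default-argument
    # trick is needed because the comprehension can see s and p directly.
    return [s[i:i + p] for i in range(0, len(s), p)]

print(mysplit_comp('12345ABCDE12345IJKLM', 5))
# ['12345', 'ABCDE', '12345', 'IJKLM']
```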

</F>

"Some people, when confronted with a problem,
think 'I know, I'll use regular expressions.' Now
they have two problems." -- jwz
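The `range(0, len(s), p)` stepping used in `mysplit` can also drive a generator, which helps when the string is long and the pieces are only needed one at a time. A sketch; the name `iter_chunks` is an assumption, not something from the thread:

```python
def iter_chunks(s, p):
    """Yield successive p-character slices of s (the last may be shorter)."""
    # range() with a step of p gives the start index of each chunk.
    for i in range(0, len(s), p):
        yield s[i:i + p]

# Consume lazily, or wrap in list() to get the same result as mysplit.
for piece in iter_chunks('12345ABCDE12345IJKLM', 5):
    print(piece)
```

Nothing is sliced until the caller asks for the next piece, so stopping early costs nothing extra.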
Newbie question about Re [ In reply to ]
Well, here's my take on a "better way"... I don't see why you would want to
use the re module when you aren't doing any pattern matching.

def splitlen(instring, desiredlength):
    temp = []
    current = 0
    lenstr = len(instring)
    # walk the string, slicing off desiredlength characters each time
    while current < lenstr:
        newcurrent = current + desiredlength
        temp.append(instring[current:newcurrent])
        current = newcurrent
    return temp

I haven't even tested this for correctness, but I think it's close to
correct.
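Since the function was posted untested, a few quick checks are cheap. This sketch repeats `splitlen` so it runs on its own and compares it with a slicing reference on some end cases; treating "correct" as "same pieces as the slicing version, with a shorter final piece kept" is an assumption.

```python
def splitlen(instring, desiredlength):
    temp = []
    current = 0
    lenstr = len(instring)
    while current < lenstr:
        newcurrent = current + desiredlength
        temp.append(instring[current:newcurrent])
        current = newcurrent
    return temp

def reference(s, p):
    # Reference result built with range() stepping and slicing.
    return [s[i:i + p] for i in range(0, len(s), p)]

# End cases: exact multiple, leftover tail, chunk longer than the string,
# empty input, single-character chunks.
cases = [('12345ABCDE12345IJKLM', 5), ('12345ABCDE12345IJKLM', 17),
         ('12345ABCDE12345IJKLM', 21), ('', 5), ('abc', 1)]
for s, p in cases:
    assert splitlen(s, p) == reference(s, p), (s, p)
print('all cases agree')
```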

Hope this helps,
Mike

-----Original Message-----
From: python-list-request@cwi.nl [mailto:python-list-request@cwi.nl]On
Behalf Of MB
Sent: August 24, 1999 9:05 AM
To: python-list@cwi.nl
Subject: Newbie question about Re


Newbie question about Re [ In reply to ]
[MB]
> I just want to split a string of n x p characters into a list of
> p-length strings using the re module.
> Example: assuming p=5, the string '12345ABCDE12345IJKLM' must be split into
> ['12345','ABCDE','12345','IJKLM']
> ...

You're much better off (somewhat clearer in the normal case, much clearer in
end cases, and much faster regardless) using string-slicing in a loop, as
others have suggested.

If you've been cursed by a powerful wizard such that you must use re for this
or be turned into a toad,

>>> import re
>>> def chunk(s, p):
...     return re.findall("." * p, s)
...
>>> chunk("12345ABCDE12345IJKLM", 5)
['12345', 'ABCDE', '12345', 'IJKLM']
>>> chunk("12345ABCDE12345IJKLM", 4)
['1234', '5ABC', 'DE12', '345I', 'JKLM']
>>> chunk("12345ABCDE12345IJKLM", 1)
['1', '2', '3', '4', '5', 'A', 'B', 'C', 'D', 'E',
 '1', '2', '3', '4', '5', 'I', 'J', 'K', 'L', 'M']
>>> chunk("12345ABCDE12345IJKLM", 0)
['', '', '', '', '', '', '', '', '', '', '',
 '', '', '', '', '', '', '', '', '', '']
>>> chunk("12345ABCDE12345IJKLM", 17)
['12345ABCDE12345IJ']
>>> chunk("12345ABCDE12345IJKLM", 10)
['12345ABCDE', '12345IJKLM']
>>> chunk("12345ABCDE12345IJKLM", 20)
['12345ABCDE12345IJKLM']
>>> chunk("12345ABCDE12345IJKLM", 21)
[]
>>>
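Two properties of `chunk` are visible in the session above: '.' does not match a newline unless `re.DOTALL` is given, and a trailing piece shorter than `p` is silently dropped (the `17` case loses 'KLM'). If the tail should be kept, a bounded repeat `{1,p}` does it. A sketch with a hypothetical name, assuming `p >= 1` (re rejects a `{1,0}` repeat as an invalid range):

```python
import re

def chunk_keep_tail(s, p):
    """Like chunk(), but keeps a short final piece and matches newlines.

    Assumes p >= 1; re rejects a {1,0} repeat as an invalid range.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    # Greedy {1,p} grabs p characters while it can, then whatever is left.
    return re.findall('.{1,%d}' % p, s, re.DOTALL)

print(chunk_keep_tail("12345ABCDE12345IJKLM", 17))
# ['12345ABCDE12345IJ', 'KLM']
```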

being-a-toad-ain't-so-bad-ly y'rs - tim
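The claim in this reply that slicing is much faster than re is easy to check on your own machine with the standard `timeit` module. A rough sketch; the function names and the test string are assumptions, and no timings are quoted because they vary with machine and Python version:

```python
import re
import timeit

def by_slicing(s, p):
    return [s[i:i + p] for i in range(0, len(s), p)]

def by_findall(s, p):
    # Unlike slicing, this drops a trailing piece shorter than p.
    return re.findall('.' * p, s, re.DOTALL)

data = '12345ABCDE12345IJKLM' * 1000   # length is a multiple of 5

# Both approaches agree when len(data) is a multiple of p.
assert by_slicing(data, 5) == by_findall(data, 5)

for func in (by_slicing, by_findall):
    seconds = timeit.timeit(lambda: func(data, 5), number=200)
    print('%s: %.4f s for 200 runs' % (func.__name__, seconds))
```

Checking that both functions return the same list first keeps the comparison honest: a faster function that gives a different answer is not a win.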