Mailing List Archive

Re: [Python-checkins] CVS: distutils/distutils util.py,1.36,1.37
Would the "shlex" module be helpful here? It is in the standard library and
is (well?) maintained by ESR. It could help reduce the code inside
distutils.
[. I've always questioned the need for distutils' own "copy file" functions
and whatnot... seems there is a bit of duplication occurring... ]

Cheers,
-g

On Sat, Jun 24, 2000 at 01:40:05PM -0700, Greg Ward wrote:
> Update of /cvsroot/python/distutils/distutils
> In directory slayer.i.sourceforge.net:/tmp/cvs-serv28287
>
> Modified Files:
> util.py
> Log Message:
> Added 'split_quoted()' function to deal with strings that are quoted in
> Unix shell-like syntax (e.g. in Python's Makefile, for one thing). Now that
> I have this function, I'll probably allow quoted strings in config files too.
>
> Index: util.py
> ===================================================================
> RCS file: /cvsroot/python/distutils/distutils/util.py,v
> retrieving revision 1.36
> retrieving revision 1.37
> diff -C2 -r1.36 -r1.37
> *** util.py 2000/06/18 15:45:55 1.36
> --- util.py 2000/06/24 20:40:02 1.37
> ***************
> *** 167,168 ****
> --- 167,235 ----
>
> return error
> +
> +
> + # Needed by 'split_quoted()'
> + _wordchars_re = re.compile(r'[^\\\'\"\ ]*')
> + _squote_re = re.compile(r"'(?:[^'\\]|\\.)*'")
> + _dquote_re = re.compile(r'"(?:[^"\\]|\\.)*"')
> +
> + def split_quoted (s):
> +     """Split a string up according to Unix shell-like rules for quotes and
> +     backslashes. In short: words are delimited by spaces, as long as those
> +     spaces are not escaped by a backslash, or inside a quoted string.
> +     Single and double quotes are equivalent, and the quote characters can
> +     be backslash-escaped. The backslash is stripped from any two-character
> +     escape sequence, leaving only the escaped character. The quote
> +     characters are stripped from any quoted string. Returns a list of
> +     words.
> +     """
> +
> +     # This is a nice algorithm for splitting up a single string, since it
> +     # doesn't require character-by-character examination. It was a little
> +     # bit of a brain-bender to get it working right, though...
> +
> +     s = string.strip(s)
> +     words = []
> +     pos = 0
> +
> +     while s:
> +         m = _wordchars_re.match(s, pos)
> +         end = m.end()
> +         if end == len(s):
> +             words.append(s[:end])
> +             break
> +
> +         if s[end] == ' ':               # unescaped, unquoted space: now
> +             words.append(s[:end])       # we definitely have a word delimiter
> +             s = string.lstrip(s[end:])
> +             pos = 0
> +
> +         elif s[end] == '\\':            # preserve whatever is being escaped;
> +                                         # will become part of the current word
> +             s = s[:end] + s[end+1:]
> +             pos = end+1
> +
> +         else:
> +             if s[end] == "'":           # slurp singly-quoted string
> +                 m = _squote_re.match(s, end)
> +             elif s[end] == '"':         # slurp doubly-quoted string
> +                 m = _dquote_re.match(s, end)
> +             else:
> +                 raise RuntimeError, \
> +                       "this can't happen (bad char '%c')" % s[end]
> +
> +             if m is None:
> +                 raise ValueError, \
> +                       "bad string (mismatched %s quotes?)" % s[end]
> +
> +             (beg, end) = m.span()
> +             s = s[:beg] + s[beg+1:end-1] + s[end:]
> +             pos = m.end() - 2
> +
> +         if pos >= len(s):
> +             words.append(s)
> +             break
> +
> +     return words
> +
> + # split_quoted ()
>
>
> _______________________________________________
> Python-checkins mailing list
> Python-checkins@python.org
> http://www.python.org/mailman/listinfo/python-checkins

--
Greg Stein, http://www.lyra.org/
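
For anyone who wants to try the algorithm in the diff above, here is a rough
Python 3 rendering of split_quoted(). It is a sketch, not the checked-in
code: it assumes string methods in place of the old 'string' module functions
and the call form of 'raise', and otherwise follows the posted logic and
regexes as closely as possible.

```python
import re

# Same three patterns as in the diff: a run of "ordinary" characters,
# a complete single-quoted string, and a complete double-quoted string.
_wordchars_re = re.compile(r'[^\\\'\"\ ]*')
_squote_re = re.compile(r"'(?:[^'\\]|\\.)*'")
_dquote_re = re.compile(r'"(?:[^"\\]|\\.)*"')


def split_quoted(s):
    """Split s into words using the shell-like rules from the diff."""
    s = s.strip()
    words = []
    pos = 0

    while s:
        # Skip ahead over characters that need no special handling.
        m = _wordchars_re.match(s, pos)
        end = m.end()
        if end == len(s):
            words.append(s[:end])
            break

        if s[end] == ' ':
            # Unescaped, unquoted space: the current word is finished.
            words.append(s[:end])
            s = s[end:].lstrip()
            pos = 0
        elif s[end] == '\\':
            # Drop the backslash and keep the character that follows it.
            s = s[:end] + s[end + 1:]
            pos = end + 1
        else:
            if s[end] == "'":
                m = _squote_re.match(s, end)
            elif s[end] == '"':
                m = _dquote_re.match(s, end)
            else:
                raise RuntimeError(
                    "this can't happen (bad char %r)" % s[end])

            if m is None:
                raise ValueError(
                    "bad string (mismatched %s quotes?)" % s[end])

            # Strip the surrounding quotes, keep the quoted text in place.
            beg, end = m.span()
            s = s[:beg] + s[beg + 1:end - 1] + s[end:]
            pos = m.end() - 2

        if pos >= len(s):
            words.append(s)
            break

    return words


print(split_quoted('this is "a test"'))   # ['this', 'is', 'a test']
print(split_quoted(r'one\ two three'))    # ['one two', 'three']
```

Note that the word-character pattern in this revision treats only the space
character as a delimiter, so tabs are not split on.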
Re: Re: [Python-checkins] CVS: distutils/distutils util.py,1.36,1.37
On 24 June 2000, Greg Stein said:
> Would the "shlex" module be helpful here? It is in the standard library and
> is (well?) maintained by ESR. It could help reduce the code inside
> distutils.

I looked at "shlex", but didn't like the fact that it 1) does
character-by-character analysis of input, and 2) requires a file-like
object. Just a performance concern, really.
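
A minimal sketch of what the shlex route looks like, assuming a modern
Python 3. The shlex.split() convenience function used here postdates this
thread; the second helper shows the file-like-object form mentioned above.
The helper names split_with_shlex and split_via_file_object are made up for
illustration.

```python
import io
import shlex


def split_with_shlex(s):
    """Split a string with the standard library's POSIX-mode lexer."""
    return shlex.split(s)


def split_via_file_object(s):
    """Same result, but feeding the lexer a file-like object explicitly."""
    lexer = shlex.shlex(io.StringIO(s), posix=True)
    lexer.whitespace_split = True   # split on whitespace only
    lexer.commenters = ''           # do not treat '#' as a comment start
    return list(lexer)


print(split_with_shlex('this is "a test"'))       # ['this', 'is', 'a test']
print(split_via_file_object(r'one\ two three'))   # ['one two', 'three']
```

One behavioural difference to keep in mind: in POSIX mode shlex does not
process backslash escapes inside single quotes, whereas split_quoted()
treats single and double quotes the same way.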

> [. I've always questioned the need for distutils' own "copy file" functions
> and whatnot... seems there is a bit of duplication occurring... ]

Two reasons for that: bugs in the standard library versions, and missing
features in the standard library versions. I think the first argument
goes away now that I've given up on 1.5.1 compatibility (shutil.py was
really broken in 1.5.1), but the fact remains that the copy functions in
shutil.py don't have a dry_run option, don't have a verbose option,
don't have a preserve_times option, don't have a preserve_symlinks
option, etc. All of these things are somewhere between useful and
necessary.
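
To make the missing options concrete, here is a rough sketch of a thin
wrapper over shutil that adds dry_run, verbose and preserve_times. It is
written for a modern Python 3 and is not the Distutils implementation; the
function name and signature are assumptions for illustration, and
preserve_symlinks is left out to keep it short.

```python
import os
import shutil


def copy_file(src, dst, preserve_times=True, verbose=False, dry_run=False):
    """Copy src to dst and return the destination path (hypothetical helper).

    If dst is a directory, the file is copied into it under its own name.
    With dry_run set, nothing is written; the copy is only reported.
    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))

    if verbose:
        print("copying %s -> %s" % (src, dst))

    if dry_run:
        return dst

    shutil.copyfile(src, dst)

    if preserve_times:
        # Carry the access and modification times over to the copy.
        st = os.stat(src)
        os.utime(dst, (st.st_atime, st.st_mtime))

    return dst
```

A caller would use it as copy_file('setup.py', 'build', verbose=True,
dry_run=True) to see what would be copied without touching the disk.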

I'm always open for ideas on reducing the amount of code in the
Distutils; it really is getting ridiculous. It cracked 10k lines of
code+comments+doc this weekend -- about 5300 lines of straight code, I
think. Anyways, the basic required functionality is now in place, so
I'm open to clever refactoring/reduction/simplification patches.

Greg