Mailing List Archive

String methods... finally
I've finally checked my string methods changes into the source tree,
albeit on a CVS branch (see below). These changes are outgrowths of
discussions we've had on the string-sig, with I think Greg Stein
giving lots of very useful early feedback. I'll call these changes
controversial (hence the branch) because Guido hasn't had much
opportunity to play with them. Now that he -- and you -- can check
them out, I'm sure I'll get lots more feedback!

First, to check them out you need to switch to the string_methods CVS
branch. On Un*x:

cvs update -r string_methods

You might want to do this in a separate tree because this will sticky
tag your tree to this branch. If so, try

cvs checkout -r string_methods python

Here's a brief summary of the changes (as best I can restore the state
-- it's been a while since I actually made all these changes ;)

Strings now have as methods most of the functions that were previously
only in the string module. If you've played with JPython, you've
already had this feature for a while. So you can do:

Python 1.5.2+ (#1, Jun 10 1999, 18:22:14) [GCC 2.8.1] on sunos5
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> s = 'Hello There Devheads'
>>> s.lower()
'hello there devheads'
>>> s.upper()
'HELLO THERE DEVHEADS'
>>> s.split()
['Hello', 'There', 'Devheads']
>>> 'hello'.upper()
'HELLO'

that sort of thing. Some of the string module functions don't make
sense as string methods, like join, and others never had a C
implementation so weren't added, like center.

Two new methods startswith and endswith act like their Java cousins.

The string module has been rewritten to be completely (I hope)
backwards compatible. No code should break, though it could be
slower. Guido and I decided that was acceptable.

What else? Some cleaning up of the internals based on Greg's
suggestions. A couple of new C API additions. Builtin int(), long(),
and float() have grown a few new features. I believe they are
essentially interchangeable with string.atoi(), string.atol(), and
string.atof() now.
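
As a rough illustration of the built-in conversions standing in for the string module functions, here is a minimal sketch. It assumes a present-day Python 3 interpreter, where long() and the string.ato*() functions no longer exist, so it only exercises int() and float(). The sample values are made up for the example.

```python
# Assumes Python 3: int() covers what string.atoi()/string.atol() did,
# and float() covers what string.atof() did.
decimal_value = int("42")        # plain base-10 parse
hex_value = int("ff", 16)        # explicit base, as atoi(s, base) allowed
inferred_value = int("0x1f", 0)  # base 0 infers the base from the prefix
real_value = float("1.5")        # floating-point parse

print(decimal_value, hex_value, inferred_value, real_value)
```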

After you guys get to toast me (in either sense of the word) for a
while and these changes settle down, I'll make a wider announcement.

Enjoy,
-Barry
Re: String methods... finally
On Thu, 10 Jun 1999, Barry A. Warsaw wrote:

> I've finally checked my string methods changes into the source tree,

Great!

> ... others never had a C implementation so weren't added, like center.

I assume that's not a design decision but a "haven't gotten around to it
yet" statement, right?

> Two new methods startswith and endswith act like their Java cousins.

aaaah... <sigh of relief>.

--david
RE: String methods... finally
> I've finally checked my string methods changes into the source tree,
> albeit on a CVS branch (see below). These changes are outgrowths of

Yay!

Would this also be a good opportunity to dust-off the Unicode
implementation the string-sig recently came up with (as implemented by
Fredrik) and get this in as a type?

Although we still have the unresolved issue of how to use PyArg_ParseTuple
etc to convert to/from Unicode and 8bit, it would still be nice to have
Unicode and String objects capable of being used interchangeably at the
Python level.

Of course, the big problem with attempting to test out these sorts of
changes is that you must do so in code that will never see the public for a
good 12 months. I suppose a 1.5.25 is out of the question ;-)

Mark.
Re: String methods... finally
> Would this also be a good opportunity to dust-off the Unicode
> implementation the string-sig recently came up with (as implemented by
> Fredrik) and get this in as a type?
>
> Although we still have the unresolved issue of how to use PyArg_ParseTuple
> etc to convert to/from Unicode and 8bit, it would still be nice to have
> Unicode and String objects capable of being used interchangeably at the
> Python level.

Yes, yes, yes! Even if it's not supported everywhere, at least having
the Unicode type in the source tree would definitely help!

> Of course, the big problem with attempting to test out these sorts of
> changes is that you must do so in code that will never see the public for a
> good 12 months. I suppose a 1.5.25 is out of the question ;-)

We'll see about that...

(I sometimes wished I wasn't in the business of making releases. I've
asked for help with making essential patches to 1.5.2 available but
nobody volunteered... :-( )

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: String methods... finally
>>>>> "DA" == David Ascher <da@ski.org> writes:

>> ... others never had a C implementation so weren't added, like
>> center.

DA> I assume that's not a design decision but a "haven't gotten
DA> around to it yet" statement, right?

I think we decided that they weren't used enough to implement in
C.

>> Two new methods startswith and endswith act like their Java
>> cousins.

DA> aaaah... <sigh of relief>.

Tell me about it!
-Barry
RE: String methods... finally
> Two new methods startswith and endswith act like their Java cousins.

Barry, suggest that both of these grow optional start and end slice indices.
Why? It's Pythonic <wink>. Really, I'm forever marching over huge strings
a slice-pair at a time, and it's important that searches and matches never
give me false hits due to slobbering over the current slice bounds. regexp
objects in general, and string.find/.rfind in particular, support this
beautifully. Java feels less need since sub-stringing is via cheap
descriptor there. The optional indices wouldn't hurt Java, but would help
Python.
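
A minimal sketch of the bounded matching described above, assuming a current Python in which startswith and endswith accept optional start and end indices with slice semantics. The buffer contents and variable names are invented for the example.

```python
buffer = "HEADERbodyFOOTER"
start, end = 6, 10  # the current slice bounds: buffer[6:10] == "body"

# Matching is confined to buffer[start:end], without copying the slice.
prefix_hit = buffer.startswith("bo", start, end)
suffix_hit = buffer.endswith("dy", start, end)

# A candidate that would run past the slice bounds is not reported as a hit.
prefix_overrun = buffer.startswith("bodyF", start, end)
suffix_overrun = buffer.endswith("FOOTER", start, end)

print(prefix_hit, suffix_hit, prefix_overrun, suffix_overrun)
```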

then-again-if-strings-were-so-great-i'd-switch-to-tcl<wink>-ly y'rs - tim
Re: String methods... finally
Barry> Some of the string module functions don't make sense as string
Barry> methods, like join, and others never had a C implementation so
Barry> weren't added, like center.

I take it string.capwords falls into that category. It's one of those
things that's so easy to write in Python and there's no real speed gain in
going to C, that it didn't make much sense to add it to the strop module,
right?

I see the following functions in string.py that could reasonably be
methodized:

ljust, rjust, center, expandtabs, capwords

That's not very many, and it would appear that this stuff won't see
widespread use for quite some time. I think for completeness' sake we should
bite the bullet on them.
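
For reference, a small sketch of how these five look in a present-day Python 3, where the first four are available as str methods and capwords is a function in the string module. The sample strings are made up for the example.

```python
import string

padded_left = "ab".ljust(4)      # pad on the right to width 4
padded_right = "ab".rjust(4)     # pad on the left to width 4
centered = "ab".center(4)        # pad on both sides to width 4
expanded = "a\tb".expandtabs(4)  # tab stops every 4 columns
capitalized = string.capwords("hello there devheads")

print(repr(padded_left), repr(padded_right), repr(centered))
print(repr(expanded), repr(capitalized))
```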

BTW, I built it and think it is very cool. Tipping my virtual hat to Barry,
I am...

Skip Montanaro | Mojam: "Uniting the World of Music" http://www.mojam.com/
skip@mojam.com | Musi-Cal: http://www.musi-cal.com/
518-372-5583
Re: String methods... finally
Skip> I see the following functions in string.py that could reasonably be
Skip> methodized:

Skip> ljust, rjust, center, expandtabs, capwords

It occurred to me just a few minutes after sending my previous message that
it might make sense to make string.join a method for lists and tuples.
They'd obviously have to make the same type checks that string.join does.

That would leave the string/strop modules implementing just a couple
functions.

Skip
Re: String methods... finally
On Fri, 11 Jun 1999, Skip Montanaro wrote:

> It occurred to me just a few minutes after sending my previous message that
> it might make sense to make string.join a method for lists and tuples.
> They'd obviously have to make the same type checks that string.join does.

as in:

>>> ['spam!', 'eggs!'].join()
'spam! eggs!'

?

I like the notion, but I think it would naturally migrate towards
genericity, at which point it might be called "reduce", so that:

>>> ['spam!', 'eggs!'].reduce()
'spam!eggs!'
>>> ['spam!', 'eggs!'].reduce(' ')
'spam! eggs!'
>>> [1,2,3].reduce()
6 # 1 + 2 + 3
>>> [1,2,3].reduce(10)
26 # 1 + 10 + 2 + 10 + 3

note that string.join(foo) == foo.reduce(' ')
and string.join(foo, '') == foo.reduce()

--david
Re: String methods... finally
> On Fri, 11 Jun 1999, Skip Montanaro wrote:
>
> > It occurred to me just a few minutes after sending my previous message that
> > it might make sense to make string.join a method for lists and tuples.
> > They'd obviously have to make the same type checks that string.join does.
>
> as in:
>
> >>> ['spam!', 'eggs!'].join()
> 'spam! eggs!'

Note that this is not as powerful as string.join(); the latter works
on any sequence, not just on lists and tuples. (Though that may not
be a big deal.)

I also find it slightly objectionable that this is a general list
method but only works if the list contains only strings; Dave Ascher's
generalization to reduce() is cute but strikes me as more general
than useful, and the name will forever present a mystery to most
newcomers.

Perhaps join() ought to be a built-in function?

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: String methods... finally
On Fri, 11 Jun 1999, Guido van Rossum wrote:

> Perhaps join() ought to be a built-in function?

Would it do the moral equivalent of a reduce(operator.add, ...) or of a
string.join?

I think it should do the former (otherwise something about 'string' should
be in the name), and as a consequence I think it shouldn't have the
default whitespace spacer.

cute-but-general'ly y'rs, david
Re: String methods... finally
Barry wrote:
> Some of the string module functions don't make sense as
> string methods, like join, and others never had a C
> implementation so weren't added, like center.

fwiw, the Unicode module available from pythonware.com
implements them all, and more importantly, it can be
compiled for either 8-bit or 16-bit characters...

join is a special problem; IIRC, Guido came up with what
I at that time thought was an excellent solution, but I
don't recall what it was right now ;-)

anyway, maybe we should start by figuring out what methods
we really want in there, and then figure out whether we
should have one or two independent string implementations
in the core...

</F>
Re: String methods... finally
Guido wrote:
> Note that this is not as powerful as string.join(); the latter works
> on any sequence, not just on lists and tuples. (Though that may not
> be a big deal.)
>
> I also find it slightly objectionable that this is a general list
> method but only works if the list contains only strings; Dave Ascher's
> generalization to reduce() is cute but strikes me as more general
> than useful, and the name will forever present a mystery to most
> newcomers.
>
> Perhaps join() ought to be a built-in function?

come to think of it, the last design I came up with (inspired
by a mail from you which I cannot find right now), was this:

def join(sequence, sep=None):
    # built-in
    if not sequence:
        return ""
    return sequence[0].__join__(sequence, sep)

string.join => join

and __join__ methods in the unicode and string classes.
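
A minimal sketch of how that dispatch could behave, purely for illustration. The Text class is hypothetical and stands in for a string type that provides __join__; no such hook exists on Python's real string types, and the single-space default separator is an assumption borrowed from string.join.

```python
class Text:
    """Hypothetical string-like type that provides a __join__ hook."""

    def __init__(self, value):
        self.value = value

    def __join__(self, sequence, sep=None):
        # Assumption: default to a single space, as string.join does.
        sep = " " if sep is None else sep
        return Text(sep.join(item.value for item in sequence))


def join(sequence, sep=None):
    # Dispatch on the first element so each string type joins its own kind.
    if not sequence:
        return ""
    return sequence[0].__join__(sequence, sep)


words = [Text("spam!"), Text("eggs!")]
print(join(words).value)
print(join(words, "-").value)
```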

Guido?

</F>
Re: String methods... finally
David> I think it should do the former (otherwise something about
David> 'string' should be in the name), and as a consequence I think it
David> shouldn't have the default whitespace spacer.

Perhaps "joinstrings" would be an appropriate name (though it seems
gratuitously long) or join should call str() on non-string elements.

My thought here is that we have left in the string module a couple functions
that ought to be string object methods but aren't yet mostly for convenience
or time constraints, and one (join) that is 99.9% of the time used on lists
or tuples of strings. That leaves a very small handful of methods that
don't naturally fit somewhere else. You can, of course, complete the
picture and add a join method to string objects, which would be useful to
explode them into individual characters. That would complete the
join-as-a-sequence-method picture I think. If you don't, somebody else (and
not me, cuz I'll know why already!) is bound to ask why capwords, join,
ljust, etc got left behind in the string module while all the other
functions got promotions to object methods.

Oh, one other thing I forgot. Split (join) and splitfields (joinfields)
used to be different. They've been the same for a long time now, long
enough that I no longer recall how they used to differ. In making the leap
from string module to string methods, I suggest dropping the long names
altogether. There's no particular compatibility reason to keep them and
they're not really any more descriptive than their shorter siblings. It's
not like you'll be preserving backward compatibility for anyone's code by
having them. However, if you release this code to the larger public, then
you'll be stuck with both in perpetuity.

Skip
Re: String methods... finally
David Ascher wrote:
>
> On Fri, 11 Jun 1999, Guido van Rossum wrote:
>
> > Perhaps join() ought to be a built-in function?
>
> Would it do the moral equivalent of a reduce(operator.add, ...) or of a
> string.join?
>
> I think it should do the former (otherwise something about 'string' should
> be in the name), and as a consequence I think it shouldn't have the
> default whitespace spacer.

AFAIK, Guido himself proposed something like this on c.l.p a
few months ago. I think something like the following written
in C and optimized for lists of strings might be useful:

def join(sequence,sep=None):

    x = sequence[0]
    if sep:
        for y in sequence[1:]:
            x = x + sep + y
    else:
        for y in sequence[1:]:
            x = x + y
    return x

>>> join(('a','b'))
'ab'
>>> join(('a','b'),' ')
'a b'
>>> join((1,2,3),3)
12
>>> join(((1,2),(3,)))
(1, 2, 3)

Also, while we're at string functions/methods. Some of the stuff
in mxTextTools (see Python Pages link below) might be of general
use as well, e.g. splitat(), splitlines() and charsplit().

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 203 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
Re: String methods... finally
> Two new methods startswith and endswith act like their Java cousins.

is it just me, or do those method names suck?

begin? starts_with? startsWith? (ouch)
has_prefix?

</F>
Re: String methods... finally
Fredrik Lundh wrote:
>
> > Two new methods startswith and endswith act like their Java cousins.
>
> is it just me, or do those method names suck?
>
> begin? starts_with? startsWith? (ouch)
> has_prefix?

In mxTextTools I used the names prefix() and suffix() for much
the same thing except that those functions accept a list of
strings and return the (first) matching string instead of
just 1 or 0. Details are available at:

http://starship.skyport.net/~lemburg/mxTextTools.html
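
A rough pure-Python sketch of the behaviour described above: given a list of candidate strings, return the first one that the text starts or ends with. The names first_prefix and first_suffix are hypothetical and are not the mxTextTools API; returning None when nothing matches is also an assumption.

```python
def first_prefix(text, candidates):
    """Return the first candidate that text starts with, else None."""
    for candidate in candidates:
        if text.startswith(candidate):
            return candidate
    return None  # assumption: no match is signalled with None


def first_suffix(text, candidates):
    """Return the first candidate that text ends with, else None."""
    for candidate in candidates:
        if text.endswith(candidate):
            return candidate
    return None  # assumption: no match is signalled with None


print(first_prefix("http://www.python.org/", ["ftp:", "http:"]))
print(first_suffix("archive.tar.gz", [".zip", ".gz"]))
```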

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 203 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
Re: String methods... finally
> > > Two new methods startswith and endswith act like their Java cousins.
> >
> > is it just me, or do those method names suck?

It's just you.

> > begin? starts_with? startsWith? (ouch)
> > has_prefix?

Those are all painful to type, except "begin", which isn't expressive.

> In mxTextTools I used the names prefix() and suffix() for much

The problem with those is that it's arbitrary (==> harder to remember)
whether A.prefix(B) means that A is a prefix of B or that A has B for
a prefix.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: String methods... finally
Guido van Rossum wrote:
>
> > > > Two new methods startswith and endswith act like their Java cousins.
> > >
> > > is it just me, or do those method names suck?
>
> It's just you.
>
> > > begin? starts_with? startsWith? (ouch)
> > > has_prefix?
>
> Those are all painful to type, except "begin", which isn't expressive.
>
> > In mxTextTools I used the names prefix() and suffix() for much
>
> The problem with those is that it's arbitrary (==> harder to remember)
> whether A.prefix(B) means that A is a prefix of B or that A has B for
> a prefix.

True. These are functions in mxTextTools and take a sequence
as second argument, so the order is clear there... has_prefix()
has_suffix() would probably be appropriate as methods (you don't
type them that often ;-)

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 203 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
RE: String methods... finally
>>>>> "TP" == Tim Peters <tim_one@email.msn.com> writes:

>> Two new methods startswith and endswith act like their Java
>> cousins.

TP> Barry, suggest that both of these grow optional start and end
TP> slice indices.

'Course it'll make the Java implementations of these extra args a
little more work. Right now they just forward off to the underlying
String methods. No biggie though.

I've got new implementations to check in -- let me add a few new tests
to cover 'em and watch your checkin emails.

-Barry
Re: String methods... finally
> From: "Barry A. Warsaw" <bwarsaw@cnri.reston.va.us>
>
> 'Course it'll make the Java implementations of these extra args a
> little more work. Right now they just forward off to the underlying
> String methods. No biggie though.

Which reminds me -- are you tracking this in JPython too?

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: String methods... finally
>>>>> "Guido" == Guido van Rossum <guido@cnri.reston.va.us> writes:

Guido> Which reminds me -- are you tracking this in JPython too?

That's definitely my plan.
Re: String methods... finally
>>>>> "SM" == Skip Montanaro <skip@mojam.com> writes:

SM> Oh, one other thing I forgot. Split (join) and splitfields
SM> (joinfields) used to be different. They've been the same for
SM> a long time now, long enough that I no longer recall how they
SM> used to differ.

I think it was only in the number of arguments they'd accept (at least
that's what's implied by the module docos).

SM> In making the leap from string module to
SM> string methods, I suggest dropping the long names altogether.

I agree. Thinking about it, I'm also inclined to not include
startswith and endswith in the string module.

-Barry
Re: String methods... finally
>>>>> "FL" == Fredrik Lundh <fredrik@pythonware.com> writes:

>> Two new methods startswith and endswith act like their Java
>> cousins.

FL> is it just me, or do those method names suck?

FL> begin? starts_with? startsWith? (ouch)
FL> has_prefix?

The inspiration was Java string objects, while trying to remain as
Pythonic as possible (no mixed case). startswith and endswith don't
seem as bad as issubclass to me :)

-Barry
Re: String methods... finally
>>>>> "FL" == Fredrik Lundh <fredrik@pythonware.com> writes:

FL> fwiw, the Unicode module available from pythonware.com
FL> implements them all, and more importantly, it can be
FL> compiled for either 8-bit or 16-bit characters...

Are these separately available? I don't see them under downloads.
Send me a URL, and if I can figure out how to get CVS to add files to
the branch :/, maybe I can check this in so people can play with it.

-Barry
