Mailing List Archive

How do I match the last occurrence of a character?
I've been messing with this for a little while now, but I can't get this
to work. How do I match the last occurrence of '\' in a string? I want to
remove the file name from a URL (e.g. news/breakingnews/981001b.html -->
news/breakingnews/ ).

Here is the simplest version (that doesn't work). It matches the _first_
backslash.

import regex

file_in = open( 'dir.txt', 'r+' )
for line in file_in.readlines():
    print regex.search( '/', line )



Thanks for any help!

Jeff P.
How do I match the last occurrence of a character? [ In reply to ]
On Thu, 10 Jun 1999, Jeff P. <jeffrey.polaski@west.boeing.com> wrote:
>How do I match the last occurrence of '\' in a string?
>import regex

This is your first problem. Why use a tool more powerful than you
really want? Check out the string module.

>>> import string
>>> name = "test\\a\\b\\c.html"
>>> string.rfind(name,"\\")
8
>>> name2 = name[0:string.rfind(name,"\\" )]
>>> name2
'test\\a\\b'

Note that I have to use "\\" to match a single "\".
Oh, and to get output similar to the one in your example, you would have
to use something like

baseurl = url[ 0 : string.rfind( url, "\\" )+1 ]
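
On a current Python 3 interpreter the string module functions used above are gone, but the same operations exist as methods on str. The following is only a minimal sketch under that assumption; the helper name dir_part is made up for illustration and is not part of any library.

```python
def dir_part(path, sep="/"):
    """Return everything up to and including the last `sep` in `path`."""
    # str.rfind returns -1 when sep is absent, so the slice below becomes
    # path[:0], i.e. an empty string, rather than chopping a character.
    return path[: path.rfind(sep) + 1]


print(dir_part("news/breakingnews/981001b.html"))  # news/breakingnews/
print(dir_part("index.html") == "")                # True

# str.rpartition does the same split in one call and also hands back the tail.
head, sep, tail = "news/breakingnews/981001b.html".rpartition("/")
print(head + sep, tail)                            # news/breakingnews/ 981001b.html
```

Passing sep="\\" gives the backslash variant shown in the interpreter session above.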

Hope this helps,
Blake.

--
One Will. One Dream. One Truth. One Destiny. One Love.
How do I match the last occurrence of a character? [ In reply to ]
Why not just use the os module:

>>> import os
>>> name = 'news/breakingnews/981001b.html'
>>> base = os.path.split(name)[0]
>>> base
'news/breakingnews'
>>> file = os.path.basename(name)
>>> file
'981001b.html'


Steve H.

How do I match the last occurrence of a character? [ In reply to ]
On Wed, 09 Jun 1999, Steve Halliburton <shallib@netscape.com> wrote:
>Why not just use the os module:
>>>> import os
>>>> name = 'news/breakingnews/981001b.html'
>>>> base = os.path.split(name)[0]
>>>> base
>'news/breakingnews'
>>>> file = os.path.basename(name)
>>>> file
>'981001b.html'

Would that work on NT, where the path separator is "\", but the
separator for the URLs he wants to split is "/"? Does it put the
drive in [0], and then the directory in [1]? Unfortunately, I left my
NT box at the office, so I can't check and tell you. And how would that
handle a full URL (e.g. "http://tor.dhs.org/~bwinton/index.html")? What
about the Mac, where the separator is ":"? Couldn't you have a basename
of "http", and a filename of "//tor.dhs.org/~bwinton/index.html"? For
that matter, how does that handle something like "home/bwinton/"? I'm
pretty sure my code can deal with both in a reasonable way, but I have
no idea what the os.path.split code would do, or what it would do on
different machines.
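
One way to take the host platform out of the question, sketched here under the assumption of a current Python 3: os.path is an alias for whichever of posixpath or ntpath matches the machine, and both modules can be imported by name on any system. posixpath always splits on "/", which is the URL separator, so its result does not depend on where the script runs.

```python
import ntpath
import posixpath

url_path = "news/breakingnews/981001b.html"

# posixpath splits on "/" no matter which OS runs the code.
print(posixpath.split(url_path))         # ('news/breakingnews', '981001b.html')
print(posixpath.split("home/bwinton/"))  # ('home/bwinton', '')

# ntpath applies the Windows rules, again on any OS, so a backslash
# separated path can be tried out without an NT box at hand.
nt_head, nt_tail = ntpath.split("news\\breakingnews\\981001b.html")
print(nt_tail)                           # 981001b.html
```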

Later,
Blake.

P.S. Sorry about mailing this to you as well as replying in the group...
I didn't realize that you were posting as well, and perhaps it's a
point worth sharing with the group.

How do I match the last occurrence of a character? [ In reply to ]
I haven't used it on NT, but I know on unix it handles
everything gracefully:

>>> import os
>>> t1 = 'http://tor.dhs.org/~bwinton/index.html'
>>> t2 = 'home/bwinton/'
>>> os.path.split(t1)
('http://tor.dhs.org/~bwinton', 'index.html')
>>> os.path.split(t2)
('home/bwinton', '')

It just splits off the last part and calls that the filename and the
rest is the basename.

Steve H

How do I match the last occurrence of a character? [ In reply to ]
> I'm pretty sure that it does basically the same thing your code did...
> just under the covers. I haven't used it on NT, but I know on unix it
> handles everything gracefully:

One of the nice things about Unices is that a whole lot of specs are
written to accommodate them. Like using "/" for the separator in URLs.
I hypothesize that if Mr. Berners-Lee were a Mac addict, the Web
would look a lot different today.

> It just splits off the last part and calls that the filename and the
> rest is the basename.

The Macintosh version of os.path.split (which is accessed as
os.path.split, and can be grabbed from macpath.py), looks like:

# Split a pathname in two parts: the directory leading up to the final
# bit, and the basename (the filename, without colons, in that
# directory). The result (s, t) is such that join(s, t) yields the
# original argument.
def split(s):
    if ':' not in s: return '', s
    colon = 0
    for i in range(len(s)):
        if s[i] == ':': colon = i+1
    path, file = s[:colon-1], s[colon:]
    if path and not ':' in path:
        path = path + ':'
    return path, file

And when we use it to split, we get results like

>>> name = "http://www.foo.com/~bwinton/test.html"
>>> name2 = "/home/bwinton/test.html"
>>> split( name )
('http:', '//www.foo.com/~bwinton/test.html')
>>> split( name2 )
('', '/home/bwinton/test.html')

Which I think you will agree is not the desired behaviour.
(ntpath.split seems to do the right thing, though.)
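
Since the strings here are URLs rather than file paths, another option is to let a URL parser find the path first and split only that part. This is a rough sketch assuming a current Python 3 and its standard urllib.parse module; strip_filename is a hypothetical helper name, and dropping the query string and fragment is a choice made for this example, not something the thread asked for.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def strip_filename(url):
    """Drop the last path component of `url`, keeping the trailing "/"."""
    parts = urlsplit(url)
    directory = posixpath.dirname(parts.path)
    if directory and not directory.endswith("/"):
        directory += "/"
    # Query string and fragment are deliberately discarded here.
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


print(strip_filename("http://www.foo.com/~bwinton/test.html"))
# http://www.foo.com/~bwinton/
print(strip_filename("/home/bwinton/test.html"))
# /home/bwinton/
```

Because the scheme and host are parsed out before any splitting happens, the ":" in "http:" never gets a chance to be mistaken for a separator.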

Later,
Blake.
How do I match the last occurrence of a character? [ In reply to ]
This is, essentially, what worked.



import string

file_in = open( 'c:/temp/dd.txt', 'r+' )
for line in file_in.readlines():
    name2 = line[0:string.rfind(line,"/" )]
    print name2
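
One caveat with the loop above: when a line contains no "/", rfind returns -1, and line[0:-1] then quietly drops the last character instead of signalling a miss. A hedged Python 3 sketch of the same loop follows. It keeps the trailing slash asked for in the original question and yields an empty string for lines without a slash. The name directory_parts and the in-memory sample data are assumptions made for illustration.

```python
import io


def directory_parts(lines):
    """Yield each line cut just after its last "/" ("" if there is none)."""
    for line in lines:
        line = line.strip()  # drop the trailing newline first
        yield line[: line.rfind("/") + 1]


# io.StringIO stands in for the opened file; any iterable of lines works.
sample = io.StringIO("news/breakingnews/981001b.html\nindex.html\n")
results = list(directory_parts(sample))
print(results)  # ['news/breakingnews/', '']
```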



Thanks, everyone!

Jeff P.


How do I match the last occurrence of a character? [ In reply to ]
Jeff P. writes:
>I've been messing with this for a little while now, but I can't get this
>to work. How do I match the last occurrence of '\' in a string? I want to
>remove the file name from a url (e.g. news/breakingnews/981001b.html -->
>news/breakingnews/ ).

Rather than using all the machinery of regular expressions, use
the string module: string.rfind(S, Ssub) returns the index of the last
occurrence of Ssub in S (or -1 if there is none):

>>> import string
>>> s='a/b/c/d'
>>> string.rfind(s, '/')
5
>>> s[5:]
'/d'
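
For completeness, the question as originally asked can also be answered with a regular expression. The sketch below assumes a current Python 3, where the standard re module takes the place of the old regex module. It shows two patterns, neither of which is claimed to be better than rfind for this job.

```python
import re

s = "news/breakingnews/981001b.html"

# Greedy ".*" runs to the end of the string and backtracks to the last "/".
m = re.match(r".*/", s)
print(m.group(0))                    # news/breakingnews/

# A negative lookahead pins the match to a "/" that has no later "/".
last = re.search(r"/(?!.*/)", s)
print(last.start() == s.rfind("/"))  # True
```

Both patterns return None from re.match or re.search when the string has no "/" at all, so real code would need to check for that before calling group() or start().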

--
A.M. Kuchling http://starship.python.net/crew/amk/
The world is full of people whose notion of a satisfactory future is, in fact,
a return to an idealised past.
-- Robertson Davies, _A Voice from the Attic_