webchecker on Windows
Hi,
I have just recently downloaded 1.5.2(final) and tried to use webchecker
on a local file tree.

I am unable to get it to work - as it did in 1.5.2b2 (with a patch
applied to urllib)

The arguments to webchecker that I use are:

-x file:///D|/test1/test2/index.htm

this does not find the file!

-x file:/D|/test1/test2/index.htm

this is able to read the file but not able to process any links
contained in the file (all local links are internally created like
file:xxx.htm)

What am I doing wrong?
--
Des Barry
webchecker on Windows
Des Barry <desb@desb.demon.co.uk> writes:

> I have just recently downloaded 1.5.2(final) and tried to use webchecker
> on a local file tree.
>
> I am unable to get it to work - as it did in 1.5.2b2 (with a patch
> applied to urllib)
>
> The arguments to webchecker that I use are:
>
> -x file:///D|/test1/test2/index.htm
>
> this does not find the file!
>
> -x file:/D|/test1/test2/index.htm
>
> this is able to read the file but not able to process any links
> contained in the file (all local links are internally created like
> file:xxx.htm)

Unfortunately, you're right. I think that the change has to do with
the changes in urllib.py regarding when to use url2pathname() and
pathname2url() -- the new policy is much more useful, but webchecker
was counting on the old policy. (The policy change is that the url
argument to open(), open_file(), open_local_file() and the like must
always be in url format.)

Below is a patch that I think makes it work, but it still requires
that you use forward slashes in the file: URL you give it. It
supports drive letters but only if you use the form "file:/D|/path";
the form "file:///D|/path" doesn't seem to work due to the way
urlparse works.

I hope someone else can continue the analysis from here...
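
As a starting point for that analysis, here is a minimal sketch that shows how the two URL forms from the original report are split into components. It assumes a modern Python, where the parser lives in urllib.parse; the urlparse module shipped with 1.5.2 may well behave differently, so treat this as an illustration rather than a reproduction of the bug. The helper name split_file_url is hypothetical.

```python
from urllib.parse import urlparse


def split_file_url(url):
    """Return (scheme, netloc, path) for a URL (hypothetical helper)."""
    parts = urlparse(url)
    return parts.scheme, parts.netloc, parts.path


# The two forms from the report: three slashes (empty host) and one slash.
for url in ("file:///D|/test1/test2/index.htm",
            "file:/D|/test1/test2/index.htm"):
    print(url, "->", split_file_url(url))
```

In this modern parser both forms yield an empty host and the same path, so any difference in how they are handled would have to come from later steps, such as rebuilding the URL or converting it to a pathname.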

--Guido van Rossum (home page: http://www.python.org/~guido/)
webchecker on Windows
In article <5logkbrndp.fsf@eric.cnri.reston.va.us>, Guido van Rossum
<guido@eric.cnri.reston.va.us> writes
>Des Barry <desb@desb.demon.co.uk> writes:
>
>> I have just recently downloaded 1.5.2(final) and tried to use webchecker
>> on a local file tree.
>>
>> I am unable to get it to work - as it did in 1.5.2b2 (with a patch
>> applied to urllib)
>>
>
>Unfortunately, you're right. I think that the change has to do with
>the changes in urllib.py regarding when to use url2pathname() and
>pathname2url() -- the new policy is much more useful, but webchecker
>was counting on the old policy. (The policy change is that the url
>argument to open(), open_file(), open_local_file() and the like must
>always be in url format.)
>
>Below is a patch that I think makes it work, but it still requires
>that you use forward slashes in the file: URL you give it. It
>supports drive letters but only if you use the form "file:/D|/path";
>the form "file:///D|/path" doesn't seem to work due to the way
>urlparse works.
>
>I hope someone else can continue the analysis from here...
>
>--Guido van Rossum (home page: http://www.python.org/~guido/)

On further investigation I find that urlparse.py and nturl2path.py have
also been touched.

As a simple test I have tried urlparse.urlparse and urlparse.urlunparse
on file: URLs and found that they are not symmetric (same in, same out).
This is also the case for nturl2path.url2pathname and
nturl2path.pathname2url.
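
A symmetry check of this kind can be sketched as follows. This is only an illustration under stated assumptions: it uses the modern urllib.parse names rather than the 1.5.2 urlparse module, and the helper name round_trips is hypothetical, so its results need not match what was observed with 1.5.2.

```python
from urllib.parse import urlparse, urlunparse


def round_trips(url):
    """True if parsing and then unparsing gives back the same string.

    Hypothetical helper; the name is not part of any library.
    """
    return urlunparse(urlparse(url)) == url


# Compare the three-slash and one-slash forms of the same file: URL.
for url in ("file:///D|/test1/test2/index.htm",
            "file:/D|/test1/test2/index.htm"):
    print(url, "round-trips:", round_trips(url))
```

The same pattern applies to a pathname2url/url2pathname pair, although those results depend on the platform the code runs on.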

Before going any further: was it intended to break this symmetry? If
so, what is the reasoning behind these changes?

According to RFC 1808 (in my interpretation), all references to local
files should be of the form:
file:///user/etc/xxx.htm - for Unix and
file:///C|/dir1/dir2/test.htm - for Windows

That is, I believe that
file:/C|/dir1/dir2/test.htm is illegal.

--
Des Barry