Mailing List Archive

[issue43883] Making urlparse WHATWG conformant
New submission from Senthil Kumaran <senthil@uthcode.com>:

Mike Lissner reported that a set of test suites that exercise extreme conditions with URLs, in conformance with url.spec.whatwg.org, is maintained here:

https://github.com/web-platform-tests/wpt/tree/77da471a234e03e65a22ee6df8ceff7aaba391f8/url

These test cases were run against the urlparse and urljoin functions.

https://gist.github.com/mlissner/4d2110d7083d74cff3893e261a801515


Quoting verbatim


```
The basic idea is to iterate over the test cases and try joining and parsing them. The script wound up messier than I wanted b/c there's a fair bit of normalization you have to do (e.g., the test cases expect blank paths to be '/', while urlparse returns an empty string), but you'll get the idea.

The bad news is that of the roughly 600 test cases fewer than half pass. Some more normalization would fix some more of this, and I don't imagine all of these have security concerns (I haven't thought through it, honestly, but there are issues with domain parsing too that look meddlesome). For now I've taken it as far as I can, and it should be a good start, I think.

The final numbers the script cranks out are:

Done. 231/586 successes. 1 skipped.
```
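For readers who want a feel for the approach described in the quote, here is a minimal sketch of such a harness. It is not the script from the gist: the three sample cases are hypothetical, only loosely modeled on the "input"/"base"/"pathname" fields used by the WPT URL test data, and the helper names (`normalized_path`, `run_case`, `run_suite`) are made up for illustration. It compares only the path component and applies the one normalization the quote mentions (a blank path is treated as '/').

```python
from urllib.parse import urljoin, urlparse

# Hypothetical sample cases, loosely shaped like WPT URL test entries.
# They are illustrative only and were not copied from the test suite.
CASES = [
    {"input": "http://example.com", "base": None, "pathname": "/"},
    {"input": "/a/b?x=1", "base": "http://example.com/c/d", "pathname": "/a/b"},
    # WHATWG treats a backslash like a slash for http URLs; urlparse does not.
    {"input": "http://example.com\\foo", "base": None, "pathname": "/foo"},
]


def normalized_path(path):
    """Map urlparse's empty path to '/', as the test expectations do."""
    return path or "/"


def run_case(case):
    """Join the input against its base (if any), parse it, compare paths."""
    if case["base"]:
        joined = urljoin(case["base"], case["input"])
    else:
        joined = case["input"]
    parts = urlparse(joined)
    return normalized_path(parts.path) == case["pathname"]


def run_suite(cases):
    """Return (number of passing cases, total number of cases)."""
    passed = sum(1 for case in cases if run_case(case))
    return passed, len(cases)


if __name__ == "__main__":
    passed, total = run_suite(CASES)
    print(f"Done. {passed}/{total} successes.")
```

A fuller harness would also compare the scheme, host, query and fragment, and would need to handle test entries that are expected to fail parsing outright.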

----------
assignee: orsenthil
messages: 391344
nosy: orsenthil
priority: normal
severity: normal
stage: needs patch
status: open
title: Making urlparse WHATWG conformant
type: behavior

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue43883>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue43883] Making urlparse WHATWG conformant
Serhiy Storchaka <storchaka+cpython@gmail.com> added the comment:

It would be interesting to test also with the yarl module. It is based on urlparse and urljoin, but does extra normalization of %-encoding.
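To illustrate the kind of difference in question: urlparse returns percent-encoded sequences in the path exactly as given. The sketch below shows that, plus one possible normalization in the style of RFC 3986 (decode escapes of unreserved characters, uppercase the remaining hex digits). The function `normalize_percent_encoding` is a hypothetical helper written for this example; it is an assumption about what "extra normalization" could look like, not a description of what yarl actually does.

```python
import re
import string
from urllib.parse import urlparse

# Characters that RFC 3986 calls "unreserved"; escaping them is never required.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def normalize_percent_encoding(path):
    """Decode escapes of unreserved characters; uppercase the other escapes."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", fix, path)


url = "http://example.com/%7Euser/a%2fb"
raw_path = urlparse(url).path  # urlparse leaves the escapes untouched
clean_path = normalize_percent_encoding(raw_path)
```

Here `raw_path` keeps `%7E` and `%2f` as written, while `clean_path` turns `%7E` into `~` and rewrites `%2f` as `%2F` without decoding it, since a decoded slash would change the path structure.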

----------
nosy: +serhiy.storchaka

[issue43883] Making urlparse WHATWG conformant
Change by Karthikeyan Singaravelan <tir.karthi@gmail.com>:


----------
nosy: +xtreak

[issue43883] Making urlparse WHATWG conformant
Change by Mike Lissner <mlissner@michaeljaylissner.com>:


----------
nosy: +Mike.Lissner

[issue43883] Making urlparse WHATWG conformant
STINNER Victor <vstinner@python.org> added the comment:

See also bpo-43882.

----------
nosy: +vstinner

[issue43883] Making urlparse WHATWG conformant
Change by Gregory P. Smith <greg@krypto.org>:


----------
nosy: +gregory.p.smith

[issue43883] Making urlparse WHATWG conformant
Gregory P. Smith <greg@krypto.org> added the comment:

FWIW rather than implementing our own URL parsing at all... wrapping a library extracted from a compatible-license major browser (Chromium or Firefox) and keeping it updated would avoid disparities.

Unfortunately, I'm not sure how feasible this really is. Do all of the API surfaces we must support in the stdlib for compatibility's sake with urllib line up with such a browser core URL parsing library?

Something to ponder. Unlikely something we'll actually do.

----------
