Mailing List Archive

[issue43882] urllib.parse should sanitize urls containing ASCII newline and tabs.
New submission from Senthil Kumaran <senthil@uthcode.com>:

A security issue was reported by Mike Lissner wherein an attacker was able to use `\r\n` in the url path; the urlparse method didn't sanitize the input and allowed those characters to be present in the request.

> In [9]: from urllib.parse import urlsplit
> In [10]: urlsplit("java\nscript:alert('bad')")
> Out[10]: SplitResult(scheme='', netloc='', path="java\nscript:alert('bad')", query='', fragment='')



Firefox and other browsers ignore newlines in the scheme. From
the browser console:

>> new URL("java\nscript:alert(bad)")
<< URL { href: "javascript:alert(bad)", origin: "null", protocol:
"javascript:", username: "", password: "", host: "", hostname: "", port: "", pathname: "alert(bad)", search: ""

Mozilla developers informed us that the controlling specification for URLs
is in fact the "URL Spec" from WHATWG, which updates RFC 3986 and specifies
that tabs and newlines should be stripped from the scheme.

See: https://url.spec.whatwg.org/#concept-basic-url-parser

That link defines an automaton for URL parsing. From that link, steps 2 and 3 of scheme parsing read:

2. If input contains any ASCII tab or newline, validation error.
3. Remove all ASCII tab or newline from input.
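
For illustration only, here is a minimal sketch of what steps 2 and 3 could look like when applied in front of urlsplit. The helper names strip_tab_and_newline and sanitized_urlsplit are hypothetical and not part of urllib.parse, and this is not the proposed patch; it only assumes that "ASCII tab or newline" means the characters \t, \n and \r.

```python
from urllib.parse import urlsplit

# Assumption: "ASCII tab or newline" covers U+0009, U+000A and U+000D.
_TAB_OR_NEWLINE = ("\t", "\n", "\r")


def strip_tab_and_newline(url: str) -> str:
    """Hypothetical helper: drop every ASCII tab, LF and CR from url."""
    for ch in _TAB_OR_NEWLINE:
        url = url.replace(ch, "")
    return url


def sanitized_urlsplit(url: str):
    """Hypothetical wrapper: sanitize first, then defer to urlsplit."""
    return urlsplit(strip_tab_and_newline(url))


result = sanitized_urlsplit("java\nscript:alert('bad')")
print(result.scheme)  # javascript
print(result.path)    # alert('bad')
```

With the characters removed up front, the scheme is recognised as "javascript", matching what the browser console reports, so a caller that checks the scheme against a blocklist is no longer bypassed.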

----


The urlparse module's behavior should be updated so that ASCII tabs and newlines are removed from the url (sanitized) before it is sent in a request, as the WHATWG spec requires.

----------
assignee: orsenthil
messages: 391343
nosy: orsenthil
priority: normal
severity: normal
stage: needs patch
status: open
title: urllib.parse should sanitize urls containing ASCII newline and tabs.
type: security
versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue43882>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com
[issue43882] urllib.parse should sanitize urls containing ASCII newline and tabs.
Karthikeyan Singaravelan <tir.karthi@gmail.com> added the comment:

See also a related issue about sanitising newlines in other helper functions: https://bugs.python.org/issue30713

See also the discussion of compatibility concerns around disallowing control characters: https://bugs.python.org/issue30458

----------
nosy: +gregory.p.smith, vstinner, xtreak

[issue43882] urllib.parse should sanitize urls containing ASCII newline and tabs.
Change by Mike Lissner <mlissner@michaeljaylissner.com>:


----------
nosy: +Mike.Lissner
