Mailing List Archive: [Fwd: CGI.pm and the untrusted-URL problem]

It wasn't clear whether they made Lincoln aware of this problem, and
I didn't see it come up on our list, so I thought I'd forward it
just in case. Not a modperl problem
in particular, but I know that embperl uses CGI.pm for a lot
of its processing, and I am sure some other modperl modules
do as well.

cliff rayman
genwax.com

Kragen Sitaker wrote:

> Description of the Problem
> --------------------------
>
> CGI.pm contains a method self_url which returns the URL with which the
> script was called, including all of the data fields submitted ---
> except for the .submit= field added by CGI.pm.
>
> Normally, this is used something like this:
>
>     my $self = self_url;
>     print qq(<a href="$self#Section2">Section 2</a>\n);
>
> If CGI.pm is running on Apache 1.3.6, probably other versions of
> Apache, and possibly other Web servers, it is possible for a client to
> cause self_url to include arbitrary sequences of characters at its
> beginning, such as
>
> "><script language="JavaScript">evil_code()</script><a href="
>
> which, if used in the manner described above, leads to the problem
> described in CERT Advisory CA-2000-02, "Malicious HTML Tags Embedded in
> Client Web Requests".
>
> Apparently, anything following an unencoded space in the URL used to
> invoke the script ends up being inserted, unencoded but converted to
> lower case, at the beginning of self_url's return value.
>
> Unencoded spaces are, of course, illegal in URLs. Most web browsers
> accept them anyway in HREF attributes, and don't bother to %-encode
> them when they send them in a GET request.
>
> Netscape 4.6, MSIE 3.0, Mozilla M12, and Lynx 2.8.1rel.2, at least,
> allow HREF attribute values to be delimited by ' single-quotes instead
> of " double-quotes, which allows insertion of unencoded " double-quotes
> into the URL --- which is crucial to exploiting this problem. Lynx
> 2.8.1rel.2, however, strips the spaces from the URL found in HTML,
> preventing it from being exploited via <A HREF=''>.
>
> Diagnosis
> ---------
>
> It appears that this happens because the unencoded space is interpreted
> by the HTTP server (Apache 1.3.6 in my tests) as separating the URL
> from the protocol name. So the environment variable SERVER_PROTOCOL
> gets set to everything following the space, followed by a space and the
> actual protocol, such as "HTTP/1.0".
>
> Three of the four tested browsers (Netscape 4.6, MSIE 3.0, and Mozilla
> M12) send the unencoded space in the request URL, which generates an
> illegal HTTP Request-Line.
>
> CGI.pm simply takes that environment variable, chops off everything
> from the slash onwards, lowercases it, and returns the result as the
> URL scheme.
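
A minimal sketch of the behavior diagnosed above, assuming the server has already placed the malformed value in SERVER_PROTOCOL. The helper name unpatched_scheme is hypothetical; its body copies the pre-patch lines quoted in the diff further down, and the sample value is the injected string from the description followed by a space and the real protocol.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the pre-patch logic: split SERVER_PROTOCOL
# on "/" and return the lowercased first field as the URL scheme.
sub unpatched_scheme {
    my ($server_protocol) = @_;
    my ($protocol, $version) = split('/', $server_protocol);
    return "\L$protocol\E";
}

# A well-formed request yields the expected scheme.
my $normal = unpatched_scheme('HTTP/1.0');

# Assumed SERVER_PROTOCOL after a request whose URL held an unencoded space.
my $tainted = unpatched_scheme(
    q{"><script language="JavaScript">evil_code()</script><a href=" HTTP/1.0}
);

print "normal:  $normal\n";
print "tainted: $tainted\n";
```

In this sketch the split cuts the value at its first slash, so whatever precedes that slash comes back lowercased and otherwise untouched as the "scheme".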
>
> Suggested fixes
> ---------------
>
> RFC 1738 and RFC 2068 say that only a-z, 0-9, "+", ".",
> and "-" are allowed in scheme names. Accordingly, I suggest the
> following change to CGI.pm:
>
> *** /usr/local/lib/perl5/5.00503/CGI.pm Tue May 18 00:04:20 1999
> --- /home/kragen/lib/perl5/site_perl/5.005//CGI.pm Mon Feb 14 12:07:37 2000
> ***************
> *** 2594,2600 ****
>       return 'https' if $self->server_port == 443;
>       my $prot = $self->server_protocol;
>       my($protocol,$version) = split('/',$prot);
> !     return "\L$protocol\E";
>   }
>   END_OF_FUNC
>
> --- 2594,2602 ----
>       return 'https' if $self->server_port == 443;
>       my $prot = $self->server_protocol;
>       my($protocol,$version) = split('/',$prot);
> !     $protocol = lc $protocol;
> !     $protocol =~ tr/-+.a-z0-9//cd;
> !     return $protocol;
>   }
>   END_OF_FUNC
>
> (Sorry --- I'm using Solaris diff, which doesn't have unified diff
> capability.)
>
> This prevents the exploit, but of course the resulting URL is
> incorrect. It won't affect responses to well-formed HTTP requests,
> which should never have anything other than HTTP for the $protocol to
> begin with.
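
As a rough illustration of what the three patched lines do, here is a standalone sketch. The helper name patched_scheme and the sample input are assumptions made for the example; only the lc and tr lines come from the patch itself.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the patched lines.
sub patched_scheme {
    my ($server_protocol) = @_;
    my ($protocol, $version) = split('/', $server_protocol);
    $protocol = lc $protocol;
    # /c complements the listed set and /d deletes what matches, so every
    # character other than "-", "+", ".", a-z and 0-9 is removed.
    $protocol =~ tr/-+.a-z0-9//cd;
    return $protocol;
}

my $clean   = patched_scheme('HTTP/1.0');
my $tainted = patched_scheme(
    q{"><script language="JavaScript">evil_code()</script><a href=" HTTP/1.0}
);

print "clean:   $clean\n";
print "tainted: $tainted\n";
```

The filtered value no longer contains quotes or angle brackets, so it cannot close the HREF attribute, although it is still not a meaningful scheme.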
>
> It might be smarter to always return 'http' when not returning 'https';
> I'm not presently aware of any protocols other than HTTP and SSL HTTP used with
> CGI. The current draft CGI spec says:
>
>     Note that the scheme and the protocol are not identical; for
>     instance, a resource accessed via an SSL mechanism may have a
>     Client-URI with a scheme of "https" rather than "http".
>     CGI/1.1 provides no means for the script to reconstruct this,
>     and therefore the Script-URI includes the base protocol used.
>
> . . . in other words, implementing self_url in a way that is guaranteed
> to be correct for future non-HTTP CGI implementations is not possible.
>
> The successful exploit requires a remarkable chain of extreme forgiveness:
> 1- The web browser must accept an illegal URL from (possibly valid,
>    although very unusual) HTML.
> 2- The web browser must send an illegal HTTP request with the illegal
>    URL, without %-encoding the URL to make it legal.
> 3- The HTTP server must accept the illegal HTTP request.
> 4- The HTTP server must invoke the CGI script with a nonsensical
>    SERVER_PROTOCOL.
> 5- The CGI script must accept the nonsensical SERVER_PROTOCOL and use it to
>    produce an illegal URL, which it must then embed in HTML it outputs.
> 6- The web browser must then trust the output of the CGI script in some
>    fashion inappropriate to the supplier of the original URL.
>
> Netscape 4.6, MSIE 3.0, and Mozilla M12 (and, I would guess, most Web
> browsers) will happily perform steps 1 and 2; Apache 1.3.6 (and, I
> would guess, most Web servers) will happily perform steps 3 and 4; any
> program using CGI.pm and embedding self_url's return value in its
> output will perform step 5; and as CERT advisory CA-2000-02 documents,
> there are a wide variety of situations that can cause step 6 to
> happen.
>
> My patch above breaks the chain at step 5. It would be nice to break
> it at other steps as well.
>
> The HTTP requests used in this exploit are broken: their Request-Line
> carries a protocol name that not only fails to be "HTTP", but fails to
> be a valid protocol name at all. Perhaps Apache and other web servers
> should respond to such egregious protocol violations with error
> messages, rather than passing the bogus data on to CGI scripts.
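
One way a server-side check of this kind might look, as a sketch only and not Apache's actual code: accept a Request-Line only when it has the shape Method SP Request-URI SP HTTP-Version. The function name request_line_ok and the script path in the examples are hypothetical, and the check deliberately ignores HTTP/0.9 simple requests, which carry no version field.

```perl
use strict;
use warnings;

# Hypothetical validator: exactly three fields separated by single spaces,
# with the last field of the form HTTP/<digits>.<digits>.
sub request_line_ok {
    my ($line) = @_;
    return $line =~ m{\A \S+ [ ] \S+ [ ] HTTP/\d+\.\d+ \z}x ? 1 : 0;
}

# Well-formed request: two spaces, valid HTTP-Version.
my $good = request_line_ok('GET /cgi-bin/test.cgi HTTP/1.0');

# Request whose URL contained an unencoded space: three spaces, so rejected.
my $bad = request_line_ok('GET /cgi-bin/test.cgi "><script> HTTP/1.0');

print "good: $good\n";
print "bad:  $bad\n";
```

A server applying such a test would answer the malformed request with an error instead of handing the extra text to the script in SERVER_PROTOCOL.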
>
> I have not sent copies of this mail to other web-server teams, because
> I do not have the facilities or inclination to properly verify that
> they are equally lenient. Preliminary testing suggests that they are
> not:
>
> - IIS 5.0 responds, "The parameter is incorrect".
> - Netscape-Enterprise/3.6 responds, "Your browser sent a
> message this server could not understand."
> - Zeus 3.3 responds with a 400 Bad Request error.
> - thttpd 2.15 responds with a 400 Bad Request error.
>
> I also believe that Web browsers should take some steps to avoid
> sending illegal HTTP requests; since the problem here happens only when
> both the server and browser are trusted --- perhaps due to some earlier
> authentication exchange between them --- while the URL is untrusted,
> the browser should validate the URL, at least to the point of not
> sending illegal requests to the server.
>
> References
> ----------
>
> http://www.w3.org/CGI/ --- information about CGI
> http://Web.Golux.Com/coar/cgi/draft-coar-cgi-v11-03-clean.html --- current
> draft specification for CGI
> http://www.cert.org/advisories/CA-2000-02.html --- CERT advisory CA-2000-02,
> "Malicious HTML Tags Embedded in Client Web Requests"
> RFC 1738, http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1738.txt ---
> "Uniform Resource Locators (URL)" --- in particular, section 2.1,
> which defines the syntax of scheme names
> RFC 2068, http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2068.txt ---
> "Hypertext Transfer Protocol -- HTTP/1.1"
> --- in particular, section 3.2.1, which defines the syntax of
> URI scheme names identically to RFC 1738, but including
> uppercase US-ASCII letters.
> --- and section 5.1, which defines the syntax of HTTP Request-Lines,
> indicating (together with the sections defining URI syntax and
> section 3.1, defining HTTP-Version syntax) that they must
> contain exactly two spaces.
> http://stein.cshl.org/WWW/CGI/ --- documentation for CGI.pm
> http://www.apache.org/info/css-security/apache_specific.html --- changes made
> to Apache in response to CA-2000-02
> http://www.netcraft.co.uk/survey/ --- Netcraft Web Server Survey,
> which lists the most popular web server software
>
> --
> <kragen@pobox.com> Kragen Sitaker <http://www.pobox.com/~kragen/>
> The Internet stock bubble didn't burst on 1999-11-08. Hurrah!
> <URL:http://www.pobox.com/~kragen/bubble.html>
> The power didn't go out on 2000-01-01 either. :)