Mailing List Archive

Re: svn commit: r1908301 - in /httpd/httpd/trunk: changes-entries/rewrite-bctls docs/manual/rewrite/flags.xml modules/mappers/mod_rewrite.c
On Sat, Mar 11, 2023 at 11:10?PM <covener@apache.org> wrote:
>
> Author: covener
> Date: Sat Mar 11 22:10:09 2023
> New Revision: 1908301
>
> URL: http://svn.apache.org/viewvc?rev=1908301&view=rev
> Log:
> add [BCTLS] alternative to [B] for 2.4.56 problems

Nice, that was the missing bit I think. Would have been great if
[BCTLS] was the default but I guess it's not possible with the
redirect case which %-encodes by default (chicken and egg problem to
avoid double encoding..).

>
> +++ httpd/httpd/trunk/modules/mappers/mod_rewrite.c Sat Mar 11 22:10:09 2023
> @@ -684,14 +685,27 @@ static APR_INLINE unsigned char *c2x(uns
> * Escapes a backreference in a similar way as php's urlencode does.
> * Based on ap_os_escape_path in server/util.c
> */
> -static char *escape_backref(apr_pool_t *p, const char *path, const char *escapeme, int noplus) {
> +static char *escape_backref(apr_pool_t *p, const char *path, const char *escapeme, int flags) {
> char *copy = apr_palloc(p, 3 * strlen(path) + 3);
> const unsigned char *s = (const unsigned char *)path;
> unsigned char *d = (unsigned char *)copy;
> unsigned c;
> + int noplus = flags & RULEFLAG_ESCAPENOPLUS;
> + int ctls = flags & RULEFLAG_ESCAPECTLS;
>
> while ((c = *s)) {
> - if (!escapeme) {
> + if (ctls) {
> + if (c == ' ' && !noplus) {
> + *d++ = '+';
> + }
> + else if (apr_iscntrl(c)) {

Seems we should encode space here too => (apr_iscntrl(c) || c == ' ') ?

> + d = c2x(c, '%', d);
> + }
> + else {
> + *d++ = c;
> + }
> + }
> + else if (!escapeme) {
> if (apr_isalnum(c) || c == '_') {
> *d++ = c;
> }
> @@ -702,9 +716,9 @@ static char *escape_backref(apr_pool_t *
> d = c2x(c, '%', d);
> }
> }
> - else {
> + else {

Maybe we could make [B=] and [BTCLS] not mutually exclusive (encode
both controls and whatever in B=).
I was thinking of a [BNEG] flag too (encode everything but what's in
B=), and never encode alnum or '_', so all in all some further patch
like the attached one. WDYT?

Regards;
Yann.
Re: svn commit: r1908301 - in /httpd/httpd/trunk: changes-entries/rewrite-bctls docs/manual/rewrite/flags.xml modules/mappers/mod_rewrite.c [ In reply to ]
On Mon, Mar 13, 2023 at 8:31?AM Yann Ylavic <ylavic.dev@gmail.com> wrote:
>
> On Sat, Mar 11, 2023 at 11:10?PM <covener@apache.org> wrote:
> >
> > Author: covener
> > Date: Sat Mar 11 22:10:09 2023
> > New Revision: 1908301
> >
> > URL: http://svn.apache.org/viewvc?rev=1908301&view=rev
> > Log:
> > add [BCTLS] alternative to [B] for 2.4.56 problems
>
> Nice, that was the missing bit I think. Would have been great if
> [BCTLS] was the default but I guess it's not possible with the
> redirect case which %-encodes by default (chicken and egg problem to
> avoid double encoding..).
>
> >
> > +++ httpd/httpd/trunk/modules/mappers/mod_rewrite.c Sat Mar 11 22:10:09 2023
> > @@ -684,14 +685,27 @@ static APR_INLINE unsigned char *c2x(uns
> > * Escapes a backreference in a similar way as php's urlencode does.
> > * Based on ap_os_escape_path in server/util.c
> > */
> > -static char *escape_backref(apr_pool_t *p, const char *path, const char *escapeme, int noplus) {
> > +static char *escape_backref(apr_pool_t *p, const char *path, const char *escapeme, int flags) {
> > char *copy = apr_palloc(p, 3 * strlen(path) + 3);
> > const unsigned char *s = (const unsigned char *)path;
> > unsigned char *d = (unsigned char *)copy;
> > unsigned c;
> > + int noplus = flags & RULEFLAG_ESCAPENOPLUS;
> > + int ctls = flags & RULEFLAG_ESCAPECTLS;
> >
> > while ((c = *s)) {
> > - if (!escapeme) {
> > + if (ctls) {
> > + if (c == ' ' && !noplus) {
> > + *d++ = '+';
> > + }
> > + else if (apr_iscntrl(c)) {
>
> Seems we should encode space here too => (apr_iscntrl(c) || c == ' ') ?

yeah looks wrong, thanks, but let's proceed with your flavor with the
extra features.

>
> > + d = c2x(c, '%', d);
> > + }
> > + else {
> > + *d++ = c;
> > + }
> > + }
> > + else if (!escapeme) {
> > if (apr_isalnum(c) || c == '_') {
> > *d++ = c;
> > }
> > @@ -702,9 +716,9 @@ static char *escape_backref(apr_pool_t *
> > d = c2x(c, '%', d);
> > }
> > }
> > - else {
> > + else {
>
> Maybe we could make [B=] and [BTCLS] not mutually exclusive (encode
> both controls and whatever in B=).
> I was thinking of a [BNEG] flag too (encode everything but what's in
> B=), and never encode alnum or '_', so all in all some further patch
> like the attached one. WDYT?

Looks good to me, I have an ancient patch where I did something very
similar to a copy of int:escape where you could set exceptions in
subprocess_env. The config is ugly so I never upstreamed it.

if you make the change I can add some tests/doc. Should caution
against plain [B] and refer to the others in the doc?
Re: svn commit: r1908301 - in /httpd/httpd/trunk: changes-entries/rewrite-bctls docs/manual/rewrite/flags.xml modules/mappers/mod_rewrite.c [ In reply to ]
On Mon, Mar 13, 2023 at 2:32?PM Eric Covener <covener@gmail.com> wrote:
>
> On Mon, Mar 13, 2023 at 8:31?AM Yann Ylavic <ylavic.dev@gmail.com> wrote:
> >
> > Maybe we could make [B=] and [BTCLS] not mutually exclusive (encode
> > both controls and whatever in B=).
> > I was thinking of a [BNEG] flag too (encode everything but what's in
> > B=), and never encode alnum or '_', so all in all some further patch
> > like the attached one. WDYT?
>
> Looks good to me, I have an ancient patch where I did something very
> similar to a copy of int:escape where you could set exceptions in
> subprocess_env. The config is ugly so I never upstreamed it.
>
> if you make the change I can add some tests/doc. Should caution
> against plain [B] and refer to the others in the doc?

r1902323, thanks for the tests/doc.
IIUC, when the query-string is rewritten, I think we should caution
against using [B] with a redirect (double encoding) and not using
[BCTLS] (or some careful flavor of [B]) for a non-redirect..
Re: svn commit: r1908301 - in /httpd/httpd/trunk: changes-entries/rewrite-bctls docs/manual/rewrite/flags.xml modules/mappers/mod_rewrite.c [ In reply to ]
On Mon, Mar 13, 2023 at 2:57?PM Yann Ylavic <ylavic.dev@gmail.com> wrote:
>
> On Mon, Mar 13, 2023 at 2:32?PM Eric Covener <covener@gmail.com> wrote:
> >
> > On Mon, Mar 13, 2023 at 8:31?AM Yann Ylavic <ylavic.dev@gmail.com> wrote:
> > >
> > > Maybe we could make [B=] and [BTCLS] not mutually exclusive (encode
> > > both controls and whatever in B=).
> > > I was thinking of a [BNEG] flag too (encode everything but what's in
> > > B=), and never encode alnum or '_', so all in all some further patch
> > > like the attached one. WDYT?
> >
> > Looks good to me, I have an ancient patch where I did something very
> > similar to a copy of int:escape where you could set exceptions in
> > subprocess_env. The config is ugly so I never upstreamed it.
> >
> > if you make the change I can add some tests/doc. Should caution
> > against plain [B] and refer to the others in the doc?
>
> r1902323, thanks for the tests/doc.
> IIUC, when the query-string is rewritten, I think we should caution
> against using [B] with a redirect (double encoding)

[NE,BCTLS] might be a good option too for redirects that rewrite the
query-string, provided the capture comes from r->uri and not r->args
already.

> and not using
> [BCTLS] (or some careful flavor of [B]) for a non-redirect..
Re: svn commit: r1908301 - in /httpd/httpd/trunk: changes-entries/rewrite-bctls docs/manual/rewrite/flags.xml modules/mappers/mod_rewrite.c [ In reply to ]
On 3/13/23 2:57 PM, Yann Ylavic wrote:
> On Mon, Mar 13, 2023 at 2:32?PM Eric Covener <covener@gmail.com> wrote:
>>
>> On Mon, Mar 13, 2023 at 8:31?AM Yann Ylavic <ylavic.dev@gmail.com> wrote:
>>>
>>> Maybe we could make [B=] and [BTCLS] not mutually exclusive (encode
>>> both controls and whatever in B=).
>>> I was thinking of a [BNEG] flag too (encode everything but what's in
>>> B=), and never encode alnum or '_', so all in all some further patch
>>> like the attached one. WDYT?
>>
>> Looks good to me, I have an ancient patch where I did something very
>> similar to a copy of int:escape where you could set exceptions in
>> subprocess_env. The config is ugly so I never upstreamed it.
>>
>> if you make the change I can add some tests/doc. Should caution
>> against plain [B] and refer to the others in the doc?
>
> r1902323, thanks for the tests/doc.
> IIUC, when the query-string is rewritten, I think we should caution
> against using [B] with a redirect (double encoding) and not using
> [BCTLS] (or some careful flavor of [B]) for a non-redirect..
>

I think the biggest issue with [B] is that when you capture data from the path which is decoded
and try to use this data in the new URL in path and query string at the same time. For putting
it in the path B does the wrong thing, for putting it in the query string it does the correct thing.
If you do not use B then the substitution in the path will be correct, but not in the query string.
Apart from the additional B variants now available I guess a further internal rewritemap can help
that unlike 'int:escape' would consider using '+' instead of %20 to encode spaces.
Of course if you capture data from the query string you have to fight with the other way round and
need to ensure that you decode the data before putting it in the path as otherwise you probably have
an issue with double encoding.

BTW: I guess if time is found we should reorganize our escaping / unescaping code. It seems to be
widespread across the codebase to handle various special cases / edge cases and it seems to duplicate
code in a lot of locations. I think it would be better if it could be concentrated in server/util.c
and if needed add some helper functions that can be reused if we cannot move all special cases / edge cases
there such that their handling uses a more common base.
Another thought that I had from the performance point of view is if we could create an API that would allow
to build tables similar to the test_char_table on the "fly" e.g. during init and let functions use these
tables for effective character classification. I think this would be more effective for longer lists of characters
then using apr_isalnum or strchr in loops again and again.

Regards

RĂ¼diger