Mailing List Archive

Stripping query string except from specific URL
Hi,

We have a situation where we need to strip a query string from all URLs
except ones matching a particular pattern. However, when I try the rules
below, it redirects to the homepage for some reason.

In this example, I'd like to strip off the query string from all URLs
except those involving /resources/blog:

RewriteCond %{REQUEST_URI} !/resources/blog
RewriteCond %{QUERY_STRING} ^start=
RewriteRule (.*) https://guardiandigital.com$1    [L,QSD]

What am I missing?

Thanks,
Dave
Re: Stripping query string except from specific URL
On Fri, Apr 19, 2024 at 11:16 AM Dave Wreski
<dwreski@guardiandigital.com.invalid> wrote:

> Hi,
>
> We have a situation where we need to strip a query string from all URLs
> except ones matching a particular pattern. However, when I try the rules
> below, it redirects to the homepage for some reason.
>
> In this example, I'd like to strip off the query string from all URLs
> except those involving /resources/blog:
>
> RewriteCond %{REQUEST_URI} !/resources/blog
> RewriteCond %{QUERY_STRING} ^start=
> RewriteRule (.*) https://guardiandigital.com$1 [L,QSD]
>
> What am I missing?
>
> Thanks,
> Dave
>
>
>
To remove the query string, see the QSD flag, or append a ? at the end of
the target.
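
For reference, the two approaches mentioned above might look like the sketch below. It reuses the host name from the question; the `^/?(.*)$` pattern and the explicit `R=302` are assumptions added so the snippet behaves the same in server and .htaccess context, and are not taken from the original rules.

```apache
RewriteEngine On

# Option 1: the QSD flag (Apache 2.4+) drops the incoming query string.
RewriteCond %{QUERY_STRING} ^start=
RewriteRule ^/?(.*)$ https://guardiandigital.com/$1 [R=302,L,QSD]

# Option 2: a trailing "?" on the target yields an empty query string.
# Shown commented out, since only one of the two variants is needed.
# RewriteCond %{QUERY_STRING} ^start=
# RewriteRule ^/?(.*)$ https://guardiandigital.com/$1? [R=302,L]
```

Either variant should be enough on its own; combining QSD with a trailing "?" is harmless but redundant.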
Re: Stripping query string except from specific URL
Hi,

> We have a situation where we need to strip a query string from all
> URLs except ones matching a particular pattern. However, when I
> try the rules below, it redirects to the homepage for some reason.
>
> In this example, I'd like to strip off the query string from all
> URLs except those involving /resources/blog:
>
> RewriteCond %{REQUEST_URI} !/resources/blog
> RewriteCond %{QUERY_STRING} ^start=
> RewriteRule (.*) https://guardiandigital.com$1 [L,QSD]
>
> What am I missing?
>
> Thanks,
> Dave
>
>
>
> To remove the query string, see the QSD flag, or append a ? at the end
> of the target.

That's what I'm doing, I think. What am I missing? It just redirects to
the homepage somehow.

Shouldn't I be able to stack RewriteConds in this way, followed by a
RewriteRule?

I have no idea what could be wrong.
Re: Stripping query string except from specific URL
On Wed, Apr 24, 2024 at 12:43 PM Dave Wreski
<dwreski@guardiandigital.com.invalid> wrote:

> Hi,
>
>> We have a situation where we need to strip a query string from all URLs
>> except ones matching a particular pattern. However, when I try the rules
>> below, it redirects to the homepage for some reason.
>>
>> In this example, I'd like to strip off the query string from all URLs
>> except those involving /resources/blog:
>>
>> RewriteCond %{REQUEST_URI} !/resources/blog
>> RewriteCond %{QUERY_STRING} ^start=
>> RewriteRule (.*) https://guardiandigital.com$1 [L,QSD]
>>
>> What am I missing?
>>
>> Thanks,
>> Dave
>>
>>
>>
> To remove the query string, see the QSD flag, or append a ? at the end of
> the target.
>
> That's what I'm doing, I think. What am I missing? It just redirects to
> the homepage somehow.
>
> Shouldn't I be able to stack RewriteConds in this way, followed by a
> RewriteRule?
>
> I have no idea what could be wrong.
>
>
>
Test with curl, and see if you get redirected after the fact.
Re: Stripping query string except from specific URL
Hi,

>> We have a situation where we need to strip a query string
>> from all URLs except ones matching a particular pattern.
>> However, when I try the rules below, it redirects to the
>> homepage for some reason.
>>
>> In this example, I'd like to strip off the query string from
>> all URLs except those involving /resources/blog:
>>
>> RewriteCond %{REQUEST_URI} !/resources/blog
>> RewriteCond %{QUERY_STRING} ^start=
>> RewriteRule (.*) https://guardiandigital.com$1 [L,QSD]
>>
>> What am I missing?
>>
>> Thanks,
>> Dave
>>
>>
>>
>> To remove the query string, see the QSD flag, or append a ? at
>> the end of the target.
>
> That's what I'm doing, I think. What am I missing? It just
> redirects to the homepage somehow.
>
> Shouldn't I be able to stack RewriteConds in this way, followed by
> a RewriteRule?
>
> I have no idea what could be wrong.
>
>
> Test with curl, and see if you get redirected after the fact.

I've enabled trace3 to try and figure this out. But line 8 says
"discarding query string, no parse from substitution" and I don't know
why or what really that means.

1 [Wed Apr 24 15:19:36.440500 2024] [rewrite:trace2] [pid 748062:tid
748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
[guardiandigital.com/sid#55743f0bbb58][rid#7f9cd4016af0/initial] init
rewrite engine with requested uri /resources/blog

2 [Wed Apr 24 15:19:36.445306 2024] [rewrite:trace1] [pid 748062:tid
748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
[guardiandigital.com/sid#55743f0bbb58][rid#7f9cd4016af0/initial] pass
through /resources/blog

3 [Wed Apr 24 15:19:36.449369 2024] [rewrite:trace3] [pid 748062:tid
748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
[guardiandigital.com/sid#55743f0bbb58][rid#7f9cd4016af0/initial] [perdir
/home/docroot/] applying pattern '.*' to uri 'resources/blog'

4 [Wed Apr 24 15:19:36.449413 2024] [rewrite:trace2] [pid 748062:tid
748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
[guardiandigital.com/sid#55743f0bbb58][rid#7f9cd4016af0/initial] [perdir
/home/docroot/] rewrite 'resources/blog' -> 'index.php'

5 [Wed Apr 24 15:19:36.449453 2024] [rewrite:trace1] [pid 748062:tid
748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
[guardiandigital.com/sid#55743f0bbb58][rid#7f9cd4016af0/initial] [perdir
/home/docroot/] internal redirect with /index.php [INTERNAL REDIRECT]

6 [Wed Apr 24 15:19:36.449830 2024] [rewrite:trace3] [pid 748062:tid
748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
[guardiandigital.com/sid#55743f0bbb58][rid#7f9ccc0e6000/initial/redir#1]
applying pattern '(.*)' to uri '/index.php'

7 [Wed Apr 24 15:19:36.449848 2024] [rewrite:trace2] [pid 748062:tid
748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
[guardiandigital.com/sid#55743f0bbb58][rid#7f9ccc0e6000/initial/redir#1]
rewrite '/index.php' -> 'https://guardiandigital.com/index.php'

8 [Wed Apr 24 15:19:36.449857 2024] [rewrite:trace2] [pid 748062:tid
748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
[guardiandigital.com/sid#55743f0bbb58][rid#7f9ccc0e6000/initial/redir#1]
discarding query string, no parse from substitution

9 [Wed Apr 24 15:19:36.449864 2024] [rewrite:trace2] [pid 748062:tid
748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
[guardiandigital.com/sid#55743f0bbb58][rid#7f9ccc0e6000/initial/redir#1]
explicitly forcing redirect with https://guardiandigital.com/index.php

10 [Wed Apr 24 15:19:36.449871 2024] [rewrite:trace1] [pid 748062:tid
748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
[guardiandigital.com/sid#55743f0bbb58][rid#7f9ccc0e6000/initial/redir#1]
escaping https://guardiandigital.com/index.php for redirect

11 [Wed Apr 24 15:19:36.449880 2024] [rewrite:trace1] [pid 748062:tid
748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
[guardiandigital.com/sid#55743f0bbb58][rid#7f9ccc0e6000/initial/redir#1]
redirect to https://guardiandigital.com/index.php [REDIRECT/301]

12 62.111.193.42 - - [24/Apr/2024:15:19:36 -0400] "GET
/resources/blog?start=48 HTTP/1.1" 301 245 r:"-" "Wget/1.21.4"
X:"SAMEORIGIN" 0/9647 1183/6254/245 H:HTTP/1.1 U:/resources/blog gd443 s:301

... more checks against our rewrites ...

13 62.111.193.42 - - [24/Apr/2024:15:19:36 -0400] "GET /index.php
HTTP/1.1" 200 33921 r:"-" "Wget/1.21.4" X:"SAMEORIGIN" 0/129431
573/35481/33921 H:HTTP/1.1 U:/index.php gd443 s:200
Re: Stripping query string except from specific URL
On Wed, Apr 24, 2024 at 4:58 PM Dave Wreski
<dwreski@guardiandigital.com.invalid> wrote:

> Hi,
>
>>> We have a situation where we need to strip a query string from all URLs
>>> except ones matching a particular pattern. However, when I try the rules
>>> below, it redirects to the homepage for some reason.
>>>
>>> In this example, I'd like to strip off the query string from all URLs
>>> except those involving /resources/blog:
>>>
>>> RewriteCond %{REQUEST_URI} !/resources/blog
>>> RewriteCond %{QUERY_STRING} ^start=
>>> RewriteRule (.*) https://guardiandigital.com$1 [L,QSD]
>>>
>>> What am I missing?
>>>
>>> Thanks,
>>> Dave
>>>
>>>
>>>
>> To remove the query string, see the QSD flag, or append a ? at the end of
>> the target.
>>
>> That's what I'm doing, I think. What am I missing? It just redirects to
>> the homepage somehow.
>>
>> Shouldn't I be able to stack RewriteConds in this way, followed by a
>> RewriteRule?
>>
>> I have no idea what could be wrong.
>>
>
> Test with curl, and see if you get redirected after the fact.
>
> I've enabled trace3 to try and figure this out. But line 8 says
> "discarding query string, no parse from substitution" and I don't know why
> or what really that means.
>
> 1 [Wed Apr 24 15:19:36.440500 2024] [rewrite:trace2] [pid 748062:tid
> 748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
> [guardiandigital.com/sid#55743f0bbb58][rid#7f9cd4016af0/initial] init
> rewrite engine with requested uri /resources/blog
>
> 2 [Wed Apr 24 15:19:36.445306 2024] [rewrite:trace1] [pid 748062:tid
> 748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
> [guardiandigital.com/sid#55743f0bbb58][rid#7f9cd4016af0/initial] pass
> through /resources/blog
>
> 3 [Wed Apr 24 15:19:36.449369 2024] [rewrite:trace3] [pid 748062:tid
> 748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
> [guardiandigital.com/sid#55743f0bbb58][rid#7f9cd4016af0/initial] [perdir
> /home/docroot/] applying pattern '.*' to uri 'resources/blog'
>
> 4 [Wed Apr 24 15:19:36.449413 2024] [rewrite:trace2] [pid 748062:tid
> 748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
> [guardiandigital.com/sid#55743f0bbb58][rid#7f9cd4016af0/initial] [perdir
> /home/docroot/] rewrite 'resources/blog' -> 'index.php'
>
> 5 [Wed Apr 24 15:19:36.449453 2024] [rewrite:trace1] [pid 748062:tid
> 748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
> [guardiandigital.com/sid#55743f0bbb58][rid#7f9cd4016af0/initial] [perdir
> /home/docroot/] internal redirect with /index.php [INTERNAL REDIRECT]
>
> 6 [Wed Apr 24 15:19:36.449830 2024] [rewrite:trace3] [pid 748062:tid
> 748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
> [guardiandigital.com/sid#55743f0bbb58][rid#7f9ccc0e6000/initial/redir#1]
> applying pattern '(.*)' to uri '/index.php'
>
> 7 [Wed Apr 24 15:19:36.449848 2024] [rewrite:trace2] [pid 748062:tid
> 748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
> [guardiandigital.com/sid#55743f0bbb58][rid#7f9ccc0e6000/initial/redir#1]
> rewrite '/index.php' -> 'https://guardiandigital.com/index.php'
>
> 8 [Wed Apr 24 15:19:36.449857 2024] [rewrite:trace2] [pid 748062:tid
> 748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
> [guardiandigital.com/sid#55743f0bbb58][rid#7f9ccc0e6000/initial/redir#1]
> discarding query string, no parse from substitution
>
> 9 [Wed Apr 24 15:19:36.449864 2024] [rewrite:trace2] [pid 748062:tid
> 748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
> [guardiandigital.com/sid#55743f0bbb58][rid#7f9ccc0e6000/initial/redir#1]
> explicitly forcing redirect with https://guardiandigital.com/index.php
>
> 10 [Wed Apr 24 15:19:36.449871 2024] [rewrite:trace1] [pid 748062:tid
> 748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
> [guardiandigital.com/sid#55743f0bbb58][rid#7f9ccc0e6000/initial/redir#1]
> escaping https://guardiandigital.com/index.php for redirect
>
> 11 [Wed Apr 24 15:19:36.449880 2024] [rewrite:trace1] [pid 748062:tid
> 748212] mod_rewrite.c(493): [client 62.111.193.42:0] 62.111.193.42 - -
> [guardiandigital.com/sid#55743f0bbb58][rid#7f9ccc0e6000/initial/redir#1]
> redirect to https://guardiandigital.com/index.php [REDIRECT/301]
>
> 12 62.111.193.42 - - [24/Apr/2024:15:19:36 -0400] "GET
> /resources/blog?start=48 HTTP/1.1" 301 245 r:"-" "Wget/1.21.4"
> X:"SAMEORIGIN" 0/9647 1183/6254/245 H:HTTP/1.1 U:/resources/blog gd443 s:301
>
> ... more checks against our rewrites ...
>
> 13 62.111.193.42 - - [24/Apr/2024:15:19:36 -0400] "GET /index.php
> HTTP/1.1" 200 33921 r:"-" "Wget/1.21.4" X:"SAMEORIGIN" 0/129431
> 573/35481/33921 H:HTTP/1.1 U:/index.php gd443 s:200
>
>
>
>
>
>
It did exactly what you asked, yes.

Further, I asked you to use curl to see if you get redirected from
https://guardiandigital.com/index.php to another URL, but you seem to have
ignored that part of the answer.
Re: Stripping query string except from specific URL
> 13 62.111.193.42 - - [24/Apr/2024:15:19:36 -0400] "GET /index.php
> HTTP/1.1" 200 33921 r:"-" "Wget/1.21.4" X:"SAMEORIGIN" 0/129431
> 573/35481/33921 H:HTTP/1.1 U:/index.php gd443 s:200
>
>
> It did exactly what you asked, yes.
>
> Further, I asked you to use curl to see if you get redirected from
> https://guardiandigital.com/index.php to another URL, but you seem to
> have ignored that part of the answer.

My apologies - the output was from wget, as that's what I typically use.

$ curl 'https://guardiandigital.com/resources/blog?start=48'
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a
href="https://guardiandigital.com/index.php">here</a>.</p>
</body></html>
Re: Stripping query string except from specific URL
On Wed, Apr 24, 2024 at 7:05 PM Dave Wreski
<dwreski@guardiandigital.com.invalid> wrote:

>
>> 13 62.111.193.42 - - [24/Apr/2024:15:19:36 -0400] "GET /index.php
>> HTTP/1.1" 200 33921 r:"-" "Wget/1.21.4" X:"SAMEORIGIN" 0/129431
>> 573/35481/33921 H:HTTP/1.1 U:/index.php gd443 s:200
>>
>
> It did exactly what you asked, yes.
>
> Further, I asked you to use curl to see if you get redirected from
> https://guardiandigital.com/index.php to another URL, but you seem to
> have ignored that part of the answer.
>
> My apologies - the output was from wget, as that's what I typically use.
>
> $ curl 'https://guardiandigital.com/resources/blog?start=48'
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>301 Moved Permanently</title>
> </head><body>
> <h1>Moved Permanently</h1>
> <p>The document has moved <a
> href="https://guardiandigital.com/index.php">here</a>.</p>
> </body></html>
>
>
>
The next step is to find out where the 301 is coming from - your rules will
generate a 302.
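
As a point of reference on status codes: the R flag defaults to 302 when no code is given, so a 301 has to come from an explicit `R=301` or from some other rule or application. The trace earlier in the thread logs "explicitly forcing redirect" followed by `[REDIRECT/301]`, which suggests the rule in effect at that time carried an explicit R=301. A minimal sketch of the difference, using hypothetical /old-a, /old-b, /new-a and /new-b paths that do not come from the thread:

```apache
RewriteEngine On

# No status given: mod_rewrite answers with 302 Found.
RewriteRule ^/?old-a$ /new-a [R,L]

# Explicit status: mod_rewrite answers with 301 Moved Permanently.
RewriteRule ^/?old-b$ /new-b [R=301,L]
```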
Re: Stripping query string except from specific URL
Hi,

I'm really quite stuck and hoped you could help.

> My apologies - the output was from wget, as that's what I
> typically use.
>
> $ curl 'https://guardiandigital.com/resources/blog?start=48'
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>301 Moved Permanently</title>
> </head><body>
> <h1>Moved Permanently</h1>
> <p>The document has moved <a
> href="https://guardiandigital.com/index.php">here</a>.</p>
> </body></html>
>
>
>
> The next step is to find out where the 301 is coming from - your rules
> will generate a 302.

That may have been the result of me trying many different things and
getting a bit confused (again). Here's what I know - when I insert the
following code into my virtual host config, it strips the query string
off the pages that don't involve /resources/blog, but redirects to a 404
when attempting to access a page involving "/resources/blog" and the
"?start=" query string.

RewriteCond %{REQUEST_URI} !/resources/blog
RewriteCond %{QUERY_STRING} ^start=\d+$
RewriteRule (.*)       /$1?    [L,R=301,QSD]

[Sun Apr 28 15:40:02.614893 2024] ... rewrite 'resources/blog' ->
'index.php'
[Sun Apr 28 15:40:02.614921 2024] ... internal redirect with /index.php
[INTERNAL REDIRECT]

If I don't involve the first RewriteCond, it successfully strips off the
start= from every URL I tried.

What does "INTERNAL REDIRECT" mean? Is that something done outside of
apache? Perhaps by joomla? I believe there are other relevant redirects
after these, but it's very difficult to isolate what's relevant.
Re: Stripping query string except from specific URL
On Sun, Apr 28, 2024 at 4:05 PM Dave Wreski
<dwreski@guardiandigital.com.invalid> wrote:

> Hi,
>
> I'm really quite stuck and hoped you could help.
>
>> My apologies - the output was from wget, as that's what I typically use.
>>
>> $ curl 'https://guardiandigital.com/resources/blog?start=48'
>> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
>> <html><head>
>> <title>301 Moved Permanently</title>
>> </head><body>
>> <h1>Moved Permanently</h1>
>> <p>The document has moved <a
>> href="https://guardiandigital.com/index.php">here</a>.</p>
>> </body></html>
>>
>>
>>
> The next step is to find out where the 301 is coming from - your rules
> will generate a 302.
>
> That may have been the result of me trying many different things and
> getting a bit confused (again). Here's what I know - when I insert the
> following code into my virtual host config, it strips the query string off
> the pages that don't involve /resources/blog, but redirects to a 404 when
> attempting to access a page involving "/resources/blog" and the "?start="
> query string.
>
> RewriteCond %{REQUEST_URI} !/resources/blog
> RewriteCond %{QUERY_STRING} ^start=\d+$
> RewriteRule (.*) /$1? [L,R=301,QSD]
>
> [Sun Apr 28 15:40:02.614893 2024] ... rewrite 'resources/blog' ->
> 'index.php'
> [Sun Apr 28 15:40:02.614921 2024] ... internal redirect with /index.php
> [INTERNAL REDIRECT]
>
> If I don't involve the first RewriteCond, it successfully strips off the
> start= from every URL I tried.
>
> What does "INTERNAL REDIRECT" mean? Is that something done outside of
> apache? Perhaps by joomla? I believe there are other relevant redirects
> after these, but it's very difficult to isolate what's relevant.
>
>
>
The internal redirect is the result of your rewrite rule, without a fully
qualified URL as a target.

Side note: the "rewrite 'resources/blog' -> 'index.php'" line seems to
contradict your RewriteCond logic, so increasing the verbosity of the
logging and looking at the previous lines will help fix that.
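
The `[perdir /home/docroot/]` prefix in the earlier trace suggests that the rewrite to index.php comes from a per-directory (.htaccess) rule rather than from the three lines quoted above. A front-controller rule of the following kind would produce that log line. This is a hypothetical sketch of what such a rule commonly looks like, not the actual contents of the site's .htaccess:

```apache
# Hypothetical .htaccess in the document root (assumed, not from the thread).
RewriteEngine On

# Send anything that is not an existing file or directory to index.php.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* index.php [L]
```

In per-directory context a rewrite like this is carried out as an internal redirect: the request is re-injected as /index.php and the rule sets run again, this time with %{REQUEST_URI} equal to /index.php while %{QUERY_STRING} is still start=48. That would explain why a condition such as `!/resources/blog` can pass on the second pass even though the browser asked for /resources/blog.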
Re: Stripping query string except from specific URL
> RewriteCond %{REQUEST_URI} !/resources/blog
> RewriteCond %{QUERY_STRING} ^start=\d+$
> RewriteRule (.*)       /$1?    [L,R=301,QSD]
>
> [Sun Apr 28 15:40:02.614893 2024] ... rewrite 'resources/blog' ->
> 'index.php'
> [Sun Apr 28 15:40:02.614921 2024] ... internal redirect with
> /index.php [INTERNAL REDIRECT]
>
> If I don't involve the first RewriteCond, it successfully strips
> off the start= from every URL I tried.
>
> What does "INTERNAL REDIRECT" mean? Is that something done outside
> of apache? Perhaps by joomla? I believe there are other relevant
> redirects after these, but it's very difficult to isolate what's
> relevant.
>
> The internal redirect is the result of your rewrite rule, without a
> fully qualified URL as a target.
>
> Side note: the "rewrite 'resources/blog' -> 'index.php'" line seems to
> contradict your RewriteCond logic, so increasing the verbosity of the
> logging and looking at the previous lines will help fix that.

I increased it to trace5, and it did reveal more useful info.

[Sun Apr 28 21:55:36.542349 2024] ...  RewriteCond:
input='/resources/blog' pattern='!/resources/blog' => not-matched

It looks like after this it just moved on to the next RewriteRule, not
the next RewriteCond in this block. I was assuming it was more of an AND
statement, like "if the URI is NOT /resources/blog AND the query string
contains start=..., then apply the following rewrite rule", but that's
apparently not how it works.

I only want the rewrite rule above to apply to URLs that don't involve
our blog.

And because the first RewriteCond isn't matched, it doesn't check the
second RewriteCond, and therefore seems to treat the RewriteRule as
standalone rather than as part of the preceding RewriteCond block, so it
then just redirects to the root, apparently still with the start= query
string attached.

How do I write the logic such that it applies to every URL EXCEPT those
I specify?
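
Consecutive RewriteCond lines are ANDed by default (the [OR] flag changes that), so a negated condition is already the usual way to say "every URL except these". Going by the trace earlier in the thread, the unwanted redirect appears to fire on the second pass, after /resources/blog has been internally rewritten to /index.php, when %{REQUEST_URI} no longer contains the blog path. One possible way around that, sketched below under the assumption that the rules live in the virtual host config, is to test %{THE_REQUEST}, which holds the original request line and is not changed by internal rewrites:

```apache
RewriteEngine On

# THE_REQUEST looks like: GET /resources/blog?start=48 HTTP/1.1
# Skip the rule when the originally requested path starts with /resources/blog.
RewriteCond %{THE_REQUEST} !^[A-Z]+\s/resources/blog
# Only act when the query string is exactly start=<digits>.
RewriteCond %{QUERY_STRING} ^start=\d+$
# Redirect to the same path; QSD drops the query string, so no trailing "?" is needed.
RewriteRule ^/?(.*)$ /$1 [R=301,L,QSD]
```

The `^/?(.*)$` pattern is used instead of `(.*)` so that `/$1` does not end up with a doubled leading slash in server context, where the matched path already begins with `/`. An alternative guard that is often used is `RewriteCond %{ENV:REDIRECT_STATUS} ^$`, which skips the rule on internally redirected requests. Either variant would need testing against the site's other rules.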