Mailing List Archive

URL rewriting / ignoring query parameters?
Hello all:

I'm using varnish to act as a front-end cache for yet another social
networking widget site. Our SWFs are presented with a tracking ID in
the URI request which allows us to know which instance of the same
widget in the wild is being viewed the most. The requests look like:

GET http://content.razz.com/mixer/playerdefaulta.swf?u=2743&s=28&o=181618
GET http://content.razz.com/mixer/playerdefaulta.swf?u=2743&s=25&o=186119
GET http://content.razz.com/mixer/playerdefaulta.swf?u=178170&s=1

All 3 of these requests serve the same SWF (playerdefaulta.swf). At
the moment, we use squid for our reverse-proxy solution. I'm using
squirm to provide regexp-rewriting logic within squid, since otherwise
squid would cache 3 separate objects to serve these requests,
when in reality the backend makes no distinction between them: the
same object (playerdefaulta.swf) is served each time.

I've poked through the varnish documentation I could find, and
after reviewing the mailing list and the man page on vcl I'm not
certain whether it's possible to perform the following tasks in
varnish with respect to these requests:

1) log the request exactly as it came from the client. We use these
logs to track which distinct widget in the wild was viewed.
2) instruct varnish to ignore the query parameters and only cache one
instance of the swf for all of these requests.

I hope I've explained things adequately. Thanks in advance for your
consideration, and I'm really glad someone's tackling reverse-proxy
with a project specifically designed to address it!

Best,
-t
URL rewriting / ignoring query parameters? [ In reply to ]
In message <DB38C99E-0E5F-4FD0-8505-3ED870F758CB at razz.com>, Tom Pepper writes:

>1) log the request exactly as it came from the client. We use these
>logs to track which distinct widget in the wild was viewed.

Varnish will always record the request exactly as received.

>2) instruct varnish to ignore the query parameters and only cache one
>instance of the swf for all of these requests.

sub vcl_recv {
set req.url = regsub(req.url, "?.*", "");
}

should do it.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
URL rewriting / ignoring query parameters? [ In reply to ]
Hi Poul:

Thanks for what should have probably been a glaringly obvious
answer. I now have a somewhat more strange problem.

My disturbingly simple vcl_recv routine is presently:

backend default {
    set backend.host = "nyp-web-3.corp.razz.com";
    set backend.port = "80";
}

sub vcl_recv {

    set req.backend = default;
    set req.http.host = "www.razz.com";

    // change all /static/ requests into /mixer/ requests
    if (req.url ~ "/static/") {
        set req.url = regsub(req.url, "/static/", "/mixer/");
    }

    // strip query parameters from all swf requests (so they cache
    // as a single object)
    if (req.url ~ "\.swf?.*") {
        set req.url = regsub(req.url, "\.swf?.*", "\.swf");
    }

    if (req.request != "GET" && req.request != "HEAD") {
        pipe;
    }

    if (req.http.Expect) {
        pipe;
    }

    if (req.http.Authenticate || req.http.Cookie) {
        pass;
    }

    lookup;

}

This would (to my eyes) appear to closely mirror the default example
in the vcl(7) manpage. However, in practice, when running under this
configuration, many distinct requests seem to retrieve the same
document from the cache. For example:

1) client requests http://varnish:10080/ -- varnishd returns / off
the backend correctly.
2) client requests /css/banner.css - varnishd returns correct file
/css/banner.css off of backend.
3) client requests /css/default.css - varnishd returns banner.css
4) client requests /images/blank.gif - varnishd returns banner.css

and so on. Commenting out the entire routine seems to get things
functioning, but has the caveat (according to the log) that there's a
fetch for every request, probably because the browser presents a
cookie used site-wide, which per the above config would force a pass
on every request. Am I doing something that confuses the hash
algorithm? I'm currently invoking varnishd as:

varnishd -a 192.168.10.100:10080 -T 192.168.10.100:10088 \
    -f /etc/varnish/main.vcl -s file,/cache/varnish_storage.bin,1G \
    -g nobody -u nobody

I issue a url.purge .* before each test run.

Thanks again,
-Tom
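
The site-wide cookie mentioned above is one reason every request can end up as a pass. A minimal sketch of one way around it, written in current VCL 4.0 syntax (which differs from the 2007-era syntax quoted in this thread): drop the Cookie header for file types that are assumed not to vary per user. The backend address and the list of extensions are illustrative assumptions, not taken from the thread.

```vcl
vcl 4.0;

# Hypothetical backend; substitute the real origin server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Assumption: these static file types never depend on the cookie,
    # so it can be removed before the cache lookup decision is made.
    if (req.url ~ "\.(swf|css|gif)(\?.*)?$") {
        unset req.http.Cookie;
    }
    # No return here: the built-in vcl_recv runs next and, with the
    # cookie gone, proceeds to a normal cache lookup.
}
```

Requests for other URLs keep their cookie and are still passed to the backend by the built-in logic.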


(In reply to Poul-Henning Kamp's message of Sep 10, 2007, 11:35 AM.)
URL rewriting / ignoring query parameters? [ In reply to ]
In message <78234BEF-BBF3-489E-A8A7-BD177EE00FDC at razz.com>, Tom Pepper writes:
>HI Poul:
>
>Thanks for what should have probably been a glaringly obvious
>answer. I now have a somewhat more strange problem.

>1) client requests http://varnish:10080/ -- varnishd returns / off
>the backend correctly.
>2) client requests /css/banner.css - varnishd returns correct file
>/css/banner.css off of backend.
>3) client requests /css/default.css - varnishd returns banner.css
>4) client requests /images/blank.gif - varnishd returns banner.css

Sounds like one of the rewrites is going haywire.

My best suggestion is to study the varnishlog output and try to
find out what exactly is going on.

You may want to enable the "vcl.trace" parameter, that will give
you log records of each file.line.pos of VCL code executed.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
URL rewriting / ignoring query parameters? [ In reply to ]
"Poul-Henning Kamp" <phk at phk.freebsd.dk> writes:
> sub vcl_recv {
> set req.url = regsub(req.url, "?.*", "");
> }

You need to quote the ?...

DES
--
Dag-Erling Smørgrav
Senior Software Developer
Linpro AS - www.linpro.no
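
To make the point about quoting concrete: an unescaped "?" is a regex quantifier, so "?.*" has nothing to apply it to, and in a pattern like "\.swf?.*" it makes the "f" optional rather than matching a literal question mark. A minimal sketch of the escaped form, in current VCL 4.0 syntax rather than the 2007-era syntax of this thread; the backend address is a placeholder assumption.

```vcl
vcl 4.0;

# Hypothetical backend; substitute the real origin server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # "\?" matches a literal question mark.
    if (req.url ~ "\.swf\?") {
        # Remove everything from the first "?" to the end of the URL so
        # that all tracking variants of one SWF share a cache object.
        set req.url = regsub(req.url, "\?.*$", "");
    }
}
```

With this in place, /mixer/playerdefaulta.swf?u=2743&s=28&o=181618 and /mixer/playerdefaulta.swf?u=178170&s=1 are both rewritten to /mixer/playerdefaulta.swf before lookup.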
URL rewriting / ignoring query parameters? [ In reply to ]
Hi Poul:

Thanks for the suggestions. Removing this routine seems to have
fixed my issues:

sub vcl_hash {
hash;
}

It appears, contrary to the vcl(7) manpage, to be different than the
functional default. Is there an updated routine which would better
match? From my experience, cutting and pasting the example out of
the man page into the first working example seems to have issues (all
of my hits came back in the logs with the same hash id when I had
that routine as part of my vcl.)

Best,
-t


(In reply to Poul-Henning Kamp's message of Sep 10, 2007, 3:34 PM.)
URL rewriting / ignoring query parameters? [ In reply to ]
In message <51DCE61D-8289-4244-9ED7-8C58FB99A310 at razz.com>, Tom Pepper writes:
>Hi Poul:
>
>Thanks for the suggestions. Removing this routine seems to have
>fixed my issues:
>
> sub vcl_hash {
> hash;
> }
>

Ahh yes, that wouldn't work, I overlooked that.

I think the vcl(7) man page may be a bit behind.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
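
A vcl_hash that only finishes hashing, without first adding anything to the hash, presumably gives every request the same key, which would match the identical hash ids reported above. For illustration only, here is a sketch of a vcl_hash that feeds the URL and host into the hash, written in current VCL 4.0 syntax (not the 2007-era syntax of this thread). It mirrors what the built-in VCL of recent Varnish releases does, so there is normally no need to define it at all; the backend address is a placeholder assumption.

```vcl
vcl 4.0;

# Hypothetical backend; substitute the real origin server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_hash {
    # The URL distinguishes objects on the same site.
    hash_data(req.url);
    # The Host header (or, failing that, the server address)
    # distinguishes sites served by the same cache.
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```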
URL rewriting / ignoring query parameters? [ In reply to ]
Tom Pepper <tom at razz.com> writes:
> From my experience, cutting and pasting the example out of the man
> page into the first working example seems to have issues (all of my
> hits came back in the logs with the same hash id when I had that
> routine as part of my vcl.)

Why did you even cut and paste that into your config? It is not an
example, but a listing of the built-in configuration. Copying it into
your config is the best way to ensure that you will be royally screwed
every time you upgrade to a new version.

DES
--
Dag-Erling Smørgrav
Senior Software Developer
Linpro AS - www.linpro.no
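
One way to follow this advice is to write only the site-specific logic and let the built-in configuration handle the rest. A minimal sketch in current VCL 4.0 syntax (not the 2007-era syntax of this thread): the rewrite is the /static/ to /mixer/ rule from earlier in the thread, anchored to the start of the path here as an added assumption, and the backend address is a placeholder.

```vcl
vcl 4.0;

# Hypothetical backend; substitute the real origin server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Site-specific rewrite: /static/ is served from /mixer/.
    if (req.url ~ "^/static/") {
        set req.url = regsub(req.url, "^/static/", "/mixer/");
    }
    # No return statement: execution continues into the built-in
    # vcl_recv, which decides between pipe, pass, and lookup.
}
```

Because nothing from the built-in configuration is copied, an upgrade that changes the built-in logic does not leave a stale copy behind in the site's own file.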
URL rewriting / ignoring query parameters? [ In reply to ]
um, because i am teh crazy?

Kidding aside, I was attempting to establish a baseline config from
which I could get a grasp of how varnish works through each
particular routine. I'll be the first to agree that it's not a
brilliant idea for production, but I was rather hoping to have a solid
grasp of what each routine within VCL was doing, exactly. Call it
occupational curiosity, I suppose, and a small plea from your end
users to give the documentation a cursory glance when upgrades take
place.

Thanks!
-t

(In reply to Dag-Erling Smørgrav's message of Sep 11, 2007, 9:31 AM.)