Mailing List Archive

Ignore utm_* values with varnish?
Hi,

We are using Varnish 4.1.2 for our website caching. We use a bunch of
standard query parameters (like utm_*) to track the channels for our
website visits; this is quite standard in the web world.

Can I 'ignore' query string variables before pulling matching objects
from the cache, but not actually remove them from the URL to the end-user?

For example, all the marketing utm_source, utm_campaign, utm_* values
don't change the content of the page, they just vary a lot from campaign
to campaign and are used by all of our client-side tracking.

So this also means that the URL can't change on the client side, but it
should somehow be 'normalized' in the cache.

Essentially I want all of these...

http://example.com/page/?utm_source=google

http://example.com/page/?utm_source=facebook&utm_content=123

http://example.com/page/?utm_campaign=usa

... to all HIT the cache for http://example.com/page/

However, this URL should not use that cache entry, because variation is
not a utm_* param:

http://example.com/page/?utm_source=google&variation=5

It should instead use the cache entry for:

http://example.com/page/?variation=5

Also, keeping in mind that the URL the user sees must remain the same, I
can't redirect to something without params or any kind of solution like
that.

Would appreciate if you could help me with the above to increase the
performance of our site.

Thanks,

Pinakee
Re: Ignore utm_* values with varnish?
On Thu, Oct 12, 2017 at 1:56 PM, Pinakee BIswas <pinakee@waltzz.com> wrote:
> Hi,
>
> We are using varnish 4.1.2 for our website caching. We use bunch of standard
> query parameters (like utm*) to track the channels for our website visits -
> this is quite standard in the web world.

You should upgrade right away to 4.1.8:

https://varnish-cache.org/security/VSV00001.html

> Can I 'ignore' query string variables before pulling matching objects from
> the cache, but not actually remove them from the URL to the end-user?
>
> For example, all the marketing utm_source, utm_campaign, utm_* values don't
> change the content of the page, they just vary a lot from campaign to
> campaign and are used by all of our client-side tracking.
>
> So this also means that the URL can't change on the client side, but it
> should somehow be 'normalized' in the cache.
>
> Essentially I want all of these...
>
> http://example.com/page/?utm_source=google
>
> http://example.com/page/?utm_source=facebook&utm_content=123
>
> http://example.com/page/?utm_campaign=usa
>
> ... to all access HIT the cache for http://example.com/page/
>
> However, this URL would cause a MISS (because the param is not a utm_*
> param)
>
> http://example.com/page/?utm_source=google&variation=5
>
> Would trigger the cache for
>
> http://example.com/page/?variation=5
>
> Also, keeping in mind that the URL the user sees must remain the same, I
> can't redirect to something without params or any kind of solution like
> that.
>
> Would appreciate if you could help me with the above to increase the
> performance of our site.

May I suggest vmod-querystring?

https://github.com/Dridi/libvmod-querystring#vmod-querystring
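
For readers who want to see what this could look like, here is a minimal
sketch. It assumes the object-based interface described in the
vmod-querystring README (querystring.filter(), add_glob(), apply()); older
releases of the module exposed plain functions instead, so please check the
version you install. The filter name utm_filter is made up for illustration.
The filtered URL is only fed to hash_data(), so req.url is left alone: all
utm_* variants share one cache object while the backend still receives the
original query string. The address in the visitor's browser is never
affected, since Varnish does not redirect.

```vcl
# Fragment to merge into an existing VCL; assumes vmod-querystring is installed.
import querystring;

sub vcl_init {
    # Hypothetical filter name. By default the filter drops what it matches.
    new utm_filter = querystring.filter();
    utm_filter.add_glob("utm_*");
}

sub vcl_hash {
    if (req.method == "GET" || req.method == "HEAD") {
        # Hash the URL without utm_* parameters; req.url itself is unchanged.
        hash_data(utm_filter.apply(req.url));
    } else {
        hash_data(req.url);
    }
    # Same host handling as the built-in vcl_hash.
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    return (lookup);
}
```

With this in place, /page/?utm_source=google and /page/?utm_campaign=usa
should hash like /page/, while /page/?utm_source=google&variation=5 should
hash like /page/?variation=5.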

Dridi
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Ignore utm_* values with varnish?
> Can I 'ignore' query string variables before pulling matching objects from the cache, but not actually remove them from the URL to the end-user?

The quickest ‘hack’ is to strip those parameters from req.url. For a copy/paste-able example, please see here: https://github.com/mattiasgeniar/varnish-4.0-configuration-templates/blob/master/default.vcl#L111-L115
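
A minimal pure-VCL sketch of this approach, written for illustration and not
copied from the linked template. It assumes the tracking parameter names
consist of utm_ followed by lowercase letters or underscores. Rewriting
req.url changes what Varnish hashes and what it forwards to the backend; the
URL in the visitor's browser is untouched, so client-side tracking keeps
working.

```vcl
sub vcl_recv {
    # Only touch URLs that carry at least one utm_* parameter.
    if (req.url ~ "[?&]utm_[a-z_]+=") {
        # Drop each utm_*=value pair but keep the separator in front of it.
        set req.url = regsuball(req.url, "([?&])utm_[a-z_]+=[^&]*", "\1");
        # Collapse runs of "&" left behind by removed pairs.
        set req.url = regsuball(req.url, "&&+", "&");
        # "?&rest" becomes "?rest".
        set req.url = regsub(req.url, "\?&", "?");
        # Remove a dangling "?" or "&" at the end of the URL.
        set req.url = regsub(req.url, "[?&]$", "");
    }
}
```

If the backend must still see the utm_* values, the same regsuball/regsub
chain can be applied to a copy of the URL inside vcl_hash and passed to
hash_data() instead of being assigned to req.url.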

Mattias

Re: Ignore utm_* values with varnish?
On Thu, Oct 12, 2017 at 2:10 PM, Mattias Geniar <mattias@nucleus.be> wrote:
>> Can I 'ignore' query string variables before pulling matching objects from the cache, but not actually remove them from the URL to the end-user?
>
> The quickest ‘hack’ is to strip those parameters from the req.url, for a copy/paste’able example, please see here: https://github.com/mattiasgeniar/varnish-4.0-configuration-templates/blob/master/default.vcl#L111-L115

You can indeed do it in pure VCL, but for long URLs it also means a
lot more workspace consumption. If you want to increase your
performance even further, vmod-querystring can sort too (if
appropriate). The difference with std.querysort is that a
vmod-querystring filter will both sanitize your URL and do the sorting
with the same memory footprint: no extra cost (except CPU time
obviously) comes from the sort operation.

Another interesting feature is the ability to whitelist query parameters
instead. This way you retain only what your application needs, and you
need not care when the next campaign doesn't use Google Analytics' utm_*
parameters: they will be filtered out already.
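
A sketch of the whitelist variant, assuming apply() takes a second argument
that switches the filter from dropping matches to keeping them, as described
in the vmod-querystring README (please verify against your installed
version). The filter name app_params and the parameter names variation and
page are hypothetical; substitute whatever your application actually reads.

```vcl
# Fragment to merge into an existing VCL; assumes vmod-querystring is installed.
import querystring;

sub vcl_init {
    # Hypothetical filter listing the only parameters the application needs.
    new app_params = querystring.filter(sort = true);
    app_params.add_string("variation");
    app_params.add_string("page");
}

sub vcl_recv {
    # keep: retain the listed parameters and discard everything else,
    # including utm_* and any future tracking parameters.
    set req.url = app_params.apply(req.url, keep);
}
```

Here req.url is rewritten, so the backend also receives only the whitelisted
parameters. That fits the idea of keeping just what the application needs;
the URL in the visitor's browser still carries the tracking parameters.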

Cheers
Re: Ignore utm_* values with varnish?
> You can indeed do it in pure VCL, but for long URLs it also means a
> lot more workspace consumption.

Oh absolutely, in the long term vmods are the way to go, but depending
on the server setup, they can be cumbersome to install and get going
since they have to be compiled from source. That is not always
convenient on servers.

If Pinakee is looking for a stable, supported solution, vmods should
definitely be at the top of his list.

Mattias

Re: Ignore utm_* values with varnish?
Thanks for all the insights.

I am not familiar with the complexity of setting up a vmod. I would have
to look into it and into the features it provides, and then take a call
as part of our project planning, weighing the effort against what we
could get out of the vmod.

But certainly a stable and supported solution would be the way to go for
the long term.

Thanks,

Pinakee


On 12/10/17 6:01 pm, Mattias Geniar wrote:
>> You can indeed do it in pure VCL, but for long URLs it also means a
>> lot more workspace consumption.
> Oh absolutely, long-term vmod’s are the way to go, but depending
> on the server setup, those can be cumbersome to install & get going
> since they get compiled from source. Not always convenient on
> servers.
>
> If Pinakee is looking for a stable, supported solution, vmod’s should
> definitely be on top of his list.
>
> Mattias
>

Re: Ignore utm_* values with varnish?
On Thu, Oct 12, 2017 at 2:31 PM, Mattias Geniar <mattias@nucleus.be> wrote:
>> You can indeed do it in pure VCL, but for long URLs it also means a
>> lot more workspace consumption.
>
> Oh absolutely, long-term vmod’s are the way to go, but depending
> on the server setup, those can be cumbersome to install & get going
> since they get compiled from source. Not always convenient on
> servers.

Yeah, if it's not already available in repositories, some will give
up. I added rpm and dpkg (experimental) packaging to vmod-querystring
to help a bit, but you still have to build it from source... I'm not
hosting apt or yum repositories myself (and we still have an open
question regarding vmod packaging and varnish upgrades).

> If Pinakee is looking for a stable, supported solution, vmod’s should
> definitely be on top of his list.
>
> Mattias

Cheers