Mailing List Archive

Value of builtin.vcl - vcl_recv on modern websites
Hi There,

This may be a religious / philosophical question, but when teaching people
Varnish I've been struggling to justify leveraging builtin.vcl - mainly
vcl_recv in builtin.vcl.

Since Varnish 1.0 the builtin (or default) vcl_recv has had this statement:
    if (req.http.Authorization || req.http.Cookie) {
        /* Not cacheable by default */
        return (pass);
    }

My issue with this is the req.http.Cookie check: on any modern website,
cookies are always present.

This means that out of the box Varnish doesn't cache any content, which may
be an OK, safe starting point, but without the cookie check the default
behaviour of most web servers/applications would be to cache static content
and not cache dynamic content (HTML) - via Cache-Control / Set-Cookie
response headers etc. - a good outcome.

Aside from the above poor caching starting point (which could be OK from a
low-risk kickoff perspective), the biggest issue with the cookie check is
that it forces the user into a cookie management strategy.

The most common scenario we hit is that users try to "do the right thing"
and remove individual cookies so that they can fall through to the
underlying vcl_recv cookie check.
This ends in disaster when marketing departments, other parts of the IT
department, or anyone else involved in the website (such as SEO agencies)
adds an additional cookie to the site. A classic example of this is someone
adding JavaScript via Google Tag Manager, which then sets a cookie.

The outcome of the above scenario is that the cache suddenly stops doing
anything, because there is a new cookie that is not "managed" and hence all
requests, both static and dynamic, "pass" via the underlying builtin.vcl
logic.

Do you still recommend configurations fall through to the underlying
vcl_recv logic?

Options I can think of:
1) Build lots of cookie whitelist/blacklist functionality above builtin.vcl
so the underlying logic doesn't break things (a rough sketch of what I mean
follows the list)

2) Remove cookies entirely for some request types (such as static objects)
so the underlying logic always works for those content types - my experience
is that this causes issues on some customers' sites, as they have static
handlers at the origin that look for a cookie and then do redirects or
change the response content if no cookies are present.

3) Explicitly handle all scenarios and return(pass) or return(hash) to
avoid vcl_recv in builtin.vcl (and lift up the good bits of vcl_recv into
the main config)
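
To illustrate option 1, this is the kind of boilerplate I mean - a rough
sketch only, with an imaginary "sessionid" as the single allowed cookie:

    sub vcl_recv {
        # Keep only whitelisted cookies so a stray marketing cookie
        # doesn't defeat the builtin.vcl cookie check.
        if (req.http.Cookie) {
            set req.http.Cookie = ";" + req.http.Cookie;
            set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
            set req.http.Cookie = regsuball(req.http.Cookie, ";(sessionid)=", "; \1=");
            set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
            set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");
            if (req.http.Cookie == "") {
                unset req.http.Cookie;
            }
        }
        # No return here: fall through to vcl_recv in builtin.vcl.
    }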

Interested in your views. As I work on internet-facing websites, I would
have thought this was the most common scenario, but maybe there are more
users doing other things with Varnish, or I'm missing something simple in
terms of handling cookies?

Matt
Re: Value of builtin.vcl - vcl_recv on modern websites
On Wed, Jul 12, 2017 at 9:14 AM, Matthew Johnson <matt@section.io> wrote:
> Hi There,
>
> This may be a biblical / philosophical question but when teaching people
> Varnish I've been struggling to justify leveraging the builtin.vcl - mainly
> vcl_recv in builtin.vcl
>
> Since Varnish 1.0 the builtin (or default) vcl_recv has had this statement:
>     if (req.http.Authorization || req.http.Cookie) {
>         /* Not cacheable by default */
>         return (pass);
>     }

I can see a couple reasons:

- HTTP (in general) is poorly designed
- Cookies don't integrate well with other HTTP mechanisms
- Web applications are often poorly implemented

Not trying to be offensive, just factual. I could pinpoint the problems
in HTTP, but I already have a blog post [1] and an unfinished draft
for that. Not in the mood to duplicate that effort here, I don't really
enjoy digging in that area.

> My issue with this is the req.http.Cookie check: on any modern website
> cookies are always present.
>
> This means that out of the box Varnish doesn't cache any content, which may
> be an ok safe starting point but without the cookie check the default
> implementation of most webservers/applications would cache static content
> and not cache on dynamic content (HTML) - via cache-control / set-cookie
> response headers etc - A good outcome.

The problem is that we've often seen inconsistent cache directives and
untrustworthy backends. You could very well cache dynamic content; there's
no point in using Varnish if you only cache static resources. The problem
is that when a backend sends user-specific content and doesn't say so (via
the Vary header), you risk an information leak.

Varnish doesn't cache by default for this reason.
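
For example, a backend that really does personalize on cookies could at
least say so (illustrative headers only; note that "Vary: Cookie"
effectively means one cached variant per distinct Cookie value):

    HTTP/1.1 200 OK
    Cache-Control: max-age=60
    Vary: Cookie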

> Aside from the above poor caching start point (which could be ok from a low
> risk kickoff perspective), biggest issue with the cookie check is that it
> forces the user into a cookie management strategy.

If your backend speaks HTTP fluently and provides correct support for
cookies and caching, the aforementioned blog post [1] gives you a
simple solution to deal with that in pure VCL.
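
The gist of it, roughly - this is a sketch of the idea rather than the
post verbatim, and the X-Cookie name and restore point are just for
illustration:

    sub vcl_recv {
        # Stash the cookies so the built-in vcl_recv doesn't see them
        # and cacheable requests still get looked up...
        if (req.http.Cookie) {
            set req.http.X-Cookie = req.http.Cookie;
            unset req.http.Cookie;
        }
        # ...and fall through to the built-in logic (no return here).
    }

    sub vcl_backend_fetch {
        # ...then hand the cookies back to the backend on misses and passes.
        if (bereq.http.X-Cookie) {
            set bereq.http.Cookie = bereq.http.X-Cookie;
            unset bereq.http.X-Cookie;
        }
    }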

<snip>

> Do you still recommend configurations fall through to the underlying
> vcl_recv logic?
>
> Options I can think of:
> 1) Build lots of cookie whitelist/blacklist functionality above builtin.vcl
> so the underlying logic doesn't break things

This can be avoided with the blog post [1] trick if you are confident
your backend won't make Varnish leak information.

> 2) Remove cookies entirely for some request types (such as static objects)
> so the underlying logic always works for some content types - My experience
> is that this generates issues on some customers' sites as they have static
> handlers that are looking for a cookie in the origin and then do
> redirects/change response content if no cookies are present.

So now you need to make assumptions about resources not managed by
Varnish. A common pattern is to remove cookies when the path
terminates with a "static resource" file extension. And then you run
into applications that generate images on the fly and need cookies.
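
Typically something along these lines (extension list invented for the
example):

    sub vcl_recv {
        # Assume anything that looks like a static asset is cookie-free;
        # this is exactly the assumption that breaks when images are
        # generated on the fly based on cookies.
        if (req.url ~ "\.(css|js|png|jpe?g|gif|ico|svg|woff2?)(\?.*)?$") {
            unset req.http.Cookie;
        }
    }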

> 3) Explicitly handle all scenarios and return(pass) or return(hash) to avoid
> vcl_recv in builtin.vcl (and lift up the good bits of vcl_recv into the
> main config)

While I lean towards composing on top of the built-in, I don't mind
this approach.
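
For what it's worth, a bare-bones sketch of that style - the conditions
are placeholders, the point being that every path ends in an explicit
return so the built-in cookie check never runs:

    sub vcl_recv {
        # Only GET and HEAD are candidates for caching (the built-in
        # does more here, e.g. piping unknown methods).
        if (req.method != "GET" && req.method != "HEAD") {
            return (pass);
        }
        if (req.http.Authorization) {
            return (pass);
        }
        # Explicit, site-specific rules instead of the generic cookie check.
        if (req.url ~ "^/(admin|checkout|account)") {
            return (pass);
        }
        return (hash);
    }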

> Interested in your views. As I work on internet-facing websites - I would
> have thought this was the most common scenario but maybe there are more
> users doing other things with Varnish or I'm missing something simple in
> terms of handling cookies?

It can be simple only if your backend is reliable when it comes to
cookie handling and caching. Otherwise you're in for a VCL soup of
cookie filtering...

Dridi

[1] https://info.varnish-software.com/blog/yet-another-post-on-caching-vs-cookies

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Value of builtin.vcl - vcl_recv on modern websites

On 07/12/2017 09:14 AM, Matthew Johnson wrote:
>
> Since Varnish 1.0 the builtin (or default) vcl_recv has had this
> statement:
>
>     if (req.http.Authorization || req.http.Cookie) {
>         /* Not cacheable by default */
>         return (pass);
>     }
>
> My issue with this is the req.http.Cookie check: on any modern
> website cookies are always present.

While I'm sympathetic to all the things you said and don't mean to
disregard them by cutting things short, the answer here is simple: there
is no other choice for the default configuration of a caching proxy,
because those two headers mean that the response may be personalized.

Many uses of cookies don't have that effect, of course, but Varnish
has no way of knowing that. As bad as the effect of the default config
may seem on sites that use cookies on every request -- Varnish doesn't
cache anything -- it would be much worse if someone sets up Varnish,
doesn't think of the consequences of not changing the default
configuration, and you end up seeing someone else's personal data in
cached responses.

This problem is not specific to Varnish; it applies to any server that
tries to do what Varnish does. I know from experience that it's
generally futile to say so, but this situation really ought to lead to
some widespread re-thinking throughout the industry. Forgive me for
shouting, soapbox-style, but this gives me an opportunity to sound off
on a pet peeve:

==> Maybe modern web sites SHOULD NOT use cookies on every request!

Because of the way cookies interfere with downstream caching.

I have come to the conviction that many uses of cookies are a result
of lazy thinking in app development. Many PHP devs, for example, are
in the habit of saying session_start(); at the beginning of every
script, without thinking twice about whether they really need it. I
have seen uses of cookies where "just toss that thing into a cookie"
was evidently the easy decision to make. I have seen cookies with
values that are 3KB long.

(Sometime over beer I'll tell you about that little database that
someone wanted to transport over a cookie, a base64-encoded CSV file
whose data was *also* base64-encoded, leading to a doubly
base64-encoded cookie value, in every request.)

This is an instance of an issue that you encounter a lot with the use
of Varnish in practice: app development that doesn't think outside of
its own box in terms of functionality and performance, rather than
thinking about the benefits of handing off some of the work by letting
someone else serve your cached responses for you.

HTTP was conceived from the beginning to enable caching as a means of
solving performance problems in slow networks. A well-configured
deployment of Varnish shows how beneficial that can be. But the
universal and unreflective use of cookies is one of the forces
presently at work that actively undermine that part of the equation.

> A classic example of this is someone adding javascript via Google
> Tag Manager which then sets a cookie.

One might have hoped that the Googlers, of all people, would have more
awareness of the trouble that they could cause by doing that.

> Do you still recommend configurations fall through to the
> underlying vcl_recv logic?
>
> Options I can think of:

In a project where I am able to work with the app devs, I have had
good experience with working out a policy with them: if you MUST have
cookies in every request (although I WISH YOU WOULD RECONSIDER THAT),
then the caching proxy cannot make caching decisions on your behalf.
Only you can know that your response is cacheable despite the presence
of cookie foo or bar, but not cacheable if the cookie is baz.

So if you want your response to be cached, you MUST say so in a
Cache-Control header. The proxy will not cache any other responses.

Then we write VCL to bypass builtin's vcl_recv, and start Varnish with
-t 0 (default TTL is 0s). Responses are then cached only if they
announce that they are cacheable.
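
In its simplest form that might look like the following - a sketch only,
ignoring method handling and everything else we normally do:

    varnishd -f /etc/varnish/default.vcl -t 0

    sub vcl_recv {
        # Always look up the cache, cookies or not; the backend's
        # Cache-Control header decides whether anything is stored.
        return (hash);
    }

With the default TTL at zero, a response without explicit freshness
information gets a TTL of 0s, while something like
"Cache-Control: s-maxage=300" is cached for five minutes.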

Of course, this has the effect that you're lamenting -- Varnish
doesn't cache anything by default -- but in my experience, the result
is that devs have become very good at thinking about the cacheability
of their responses.

That boils down to answering your question by saying no, you can't use
builtin vcl_recv in a situation like that. When the cookies, like the
Evil, are always and everywhere (to paraphrase a saying in Germany),
and some cookies lead to cacheable responses while others don't, then
there's no other option for a caching proxy.


Best,
Geoff
--
** * * UPLEX - Nils Goroll Systemoptimierung

Scheffelstraße 32
22301 Hamburg

Tel +49 40 2880 5731
Mob +49 176 636 90917
Fax +49 40 42949753

http://uplex.de

Re: Value of builtin.vcl - vcl_recv on modern websites
On Wed, Jul 12, 2017 at 7:01 PM, Dridi Boukelmoune <dridi@varni.sh> wrote:
> On Wed, Jul 12, 2017 at 9:14 AM, Matthew Johnson <matt@section.io> wrote:
>> Hi There,
>>
>> This may be a biblical / philosophical question but when teaching people
>> Varnish I've been struggling to justify leveraging the builtin.vcl - mainly
>> vcl_recv in builtin.vcl
>>
>> Since Varnish 1.0 the builtin (or default) vcl_recv has had this statement:
>>     if (req.http.Authorization || req.http.Cookie) {
>>         /* Not cacheable by default */
>>         return (pass);
>>     }
>
> I can see a couple reasons:
>
> - HTTP (in general) is poorly designed
> - Cookies don't integrate well with other HTTP mechanisms
> - Web applications are often poorly implemented
>
> Not trying to be offensive, just factual. I could pinpoint the problems
> in HTTP, but I already have a blog post [1] and an unfinished draft
> for that. Not in the mood to duplicate that effort here, I don't really
> enjoy digging in that area.
>

Great blog post (that I hadn't seen before), thanks for sharing - it's
right on topic. Looking forward to part 2.

>> My issue with this is the req.http.Cookie check: on any modern website
>> cookies are always present.
>>
>> This means that out of the box Varnish doesn't cache any content, which may
>> be an ok safe starting point but without the cookie check the default
>> implementation of most webservers/applications would cache static content
>> and not cache on dynamic content (HTML) - via cache-control / set-cookie
>> response headers etc - A good outcome.
>
> The problem is that we've often seen inconsistent cache directives,
> untrustworthy backends. You could very well cache dynamic contents,
> there's no point in using Varnish if you only cache static resources.
> The problem is when a backend sends user-specific contents and
> doesn't say so (Vary header) then you risk an information leak.
>
> Varnish doesn't cache by default for this reason.
>
>> Aside from the above poor caching start point (which could be ok from a low
>> risk kickoff perspective), biggest issue with the cookie check is that it
>> forces the user into a cookie management strategy.
>
> If your backend speaks HTTP fluently and provides correct support for
> cookies and caching, the aforementioned blog post [1] gives you a
> simple solution to deal with that in pure VCL.

It feels like one way or another a solution is going to be needed
before vcl_recv in builtin.vcl to make the logic work on almost any
web application.

I'm wondering if it's more logical for new users to override (return)
based on explicit conditions that they define, rather than move the
cookie in and out of scope for different scenarios to achieve the same
outcome (and force them to understand how we are being sneaky with the
cookie to avoid the underlying cookie check).

That said, the "cookie shuffle" does keep the rest of the good logic in
the builtin vcl_recv in scope and saves lifting it up.

>
> <snip>
>
>> Do you still recommend configurations fall through to the underlying
>> vcl_recv logic?
>>
>> Options I can think of:
>> 1) Build lots of cookie whitelist/blacklist functionality above builtin.vcl
>> so the underlying logic doesn't break things
>
> This can be avoided with the blog post [1] trick if you are confident
> your backend won't make Varnish leak information.
>
I like the trick personally; it avoids a lot of muckiness in cookie management.

I still come back to whether it's easier to teach than option 3, where
there are explicit rules and the good code from builtin.vcl is now
visible to the user in their default.vcl.


>> 2) Remove cookies entirely for some request types (such as static objects)
>> so the underlying logic always works for some content types - My experience
>> is that this generates issues on some customers' sites as they have static
>> handlers that are looking for a cookie in the origin and then do
>> redirects/change response content if no cookies are present.
>
> So now you need to make assumptions about resources not managed by
> Varnish. A common pattern is to remove cookies when the path
> terminates with a "static resource" file extension. And then you run
> into applications that generate images on the fly and need cookies.
>
>> 3) Explicitly handle all scenarios and return(pass) or return(hash) to avoid
>> vcl_recv in builtin.vcl (and lift up the good bits of vcl_recv into the
>> main config)
>
> While I lean towards composing on top of the built-in, I don't mind
> this approach.

Depending on the customer's skills and application complexity, I have
been recommending this approach. It seems more logical to new users of
Varnish.

>
>> Interested in your views. As I work on internet-facing websites - I would
>> have thought this was the most common scenario but maybe there are more
>> users doing other things with Varnish or I'm missing something simple in
>> terms of handling cookies?
>
> It can be simple only if your backend is reliable when it comes to
> cookie handling and caching. Otherwise you're in for a VCL soup of
> cookie filtering...

Agreed, though nothing is simple! A tricky topic. Thanks for your views
and for letting me know this is an area you have crossed recently!

>
> Dridi
>
> [1] https://info.varnish-software.com/blog/yet-another-post-on-caching-vs-cookies

Re: Value of builtin.vcl - vcl_recv on modern websites
> ==> Maybe modern web sites SHOULD NOT use cookies on every request!
> Because of the way cookies interfere with downstream caching.

Does it really matter? Cookies or not, sites should *always* include at
least a Cache-Control header in their responses.

> HTTP was conceived from the beginning to enable caching as a means of
> solving performance problems in slow networks. A well-configured

I must politely disagree here. HTTP/0.9 had no such thing, 1.0
introduced limited caching support and 1.1 got its act together but
ended up with broken semantics. Spoiler alert: I'm covering this in the
part 2 draft.

>> A classic example of this is someone adding javascript via Google
>> Tag Manager which then sets a cookie.
>
> One might have hoped that the Googlers, of all people, would have more
> awareness of the trouble that they could cause by doing that.

I don't see this as a bad thing from a technical point of view. If you
need to carry state in requests, you need cookies.

> In a project where I am able to work with the app devs, I have had
> good experience with working out a policy with them: if you MUST have
> cookies in every request (although I WISH YOU WOULD RECONSIDER THAT),
> then the caching proxy cannot make caching decisions on your behalf.
> Only you can know if your response is cacheable, despite the presence
> of cookie foo or bar, but is not cacheable if the cookie is baz.

Here I disagree...

> So if you want your response to be cached, you MUST say so in a
> Cache-Control header. The proxy will not cache any other responses.

...but here I agree. If you need cookies, use cookies, but if you serve
responses in the first place, make sure to let downstreams know what to
do with said responses. For example, a response could be cacheable by
the client but not by proxies in between.
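
For example (values picked arbitrarily):

    Cache-Control: private, max-age=300

lets the browser reuse the response for five minutes while telling
shared caches such as Varnish to keep their hands off.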

> Then we write VCL to bypass builtin's vcl_recv, and start Varnish with
> -t 0 (default TTL is 0s). Responses are then cached only if they
> announce that they are cacheable.

+1

Ideally Varnish shouldn't make decisions in the absence of information.
So the default TTL and grace periods should ideally be zero, relying
instead on Cache-Control entries (max-age, s-maxage,
stale-while-revalidate...).
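
For illustration, zeroing out those defaults could look like this
(other varnishd options omitted):

    varnishd -t 0 -p default_grace=0 -p default_keep=0 [other options]

and a backend that wants caching then has to opt in explicitly, e.g.:

    Cache-Control: s-maxage=300, stale-while-revalidate=30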

> That boils down to answering your question by saying no, you can't use
> builtin vcl_recv in a situation like that. When the cookies, like the
> Evil, are always and everywhere (to paraphrase a saying in Germany),
> and some cookies lead to cacheable responses while others don't, then
> there's no other option for a caching proxy.

Well, the proxy could always believe that the web application did its
homework. That's what nginx does (or so I'm led to believe) but
experience shows that homework is often skipped for non-business
topics like HTTP that are handed over to the underlying framework or
CMS. I wish more webdevs would realize that HTTP is an application
protocol, not a transport (for about five minutes, when HTTP APIs became
a thing, I thought it was going to happen).

Dridi

Re: Value of builtin.vcl - vcl_recv on modern websites
> Great blog post (that I hadn't seen before), thanks for sharing - it's
> right on topic. Looking forward to part 2.

Thank you very much, really appreciated. The current draft for part 2
is complete but not really a good read. I'm having a hard time going
over arcane HTTP contradictions and I don't have time these days to
sit and write.

> It feels like one way or another a solution is going to be needed
> before vcl_recv in builtin.vcl to make the logic work on almost any
> web application.
>
> I'm wondering if it's more logical for new users to override (return)
> based on explicit conditions that they define, rather than move
> the cookie in and out of scope for different scenarios to achieve the
> same outcome (and have them forced to understand how we are being
> sneaky with the cookie to avoid the underlying cookie check).

It really depends. By default Varnish will pipe requests with unknown
methods because part of the poor design of HTTP is the lack of
semantics of the methods: if you don't already know a method, you
can't anticipate how to handle it. For example, with a HEAD request you
may get a positive Content-Length but nevertheless no body. What if an
unknown method has some similar behavior? (This is also true for
response statuses like 204.)

So let's imagine you use Varnish in front of a WebDAV application: you'd
get a lot of legitimate methods that wouldn't go through the state
machine (they'd go straight to vcl_pipe) if you fall through to the
built-in vcl_recv{}.
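
If you know your application you can of course whitelist those methods
yourself before the built-in gets a chance to pipe them - a sketch with
an arbitrary method list:

    sub vcl_recv {
        # WebDAV methods the backend understands: don't try to cache
        # them, but keep them in the normal state machine instead of
        # letting the built-in vcl_recv pipe them as "unknown".
        if (req.method ~ "^(PROPFIND|PROPPATCH|MKCOL|COPY|MOVE|LOCK|UNLOCK)$") {
            return (pass);
        }
    }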

> That said the "cookie shuffle" does keep the rest of the good logic in
> builtin vcl_recv in scope and saves lifting it up.
<snip>
> I like the trick personally; it avoids a lot of muckiness in cookie management.

Yes, simplest solution I found in the "composing with the built-in" case.

> I still come back to whether it's easier to teach than option 3 where
> there are explicit rules and the good code from builtin.vcl is now
> visible to the user in their default.vcl
<snip>
> Depending on the customer's skills and application complexity I have
> been recommending this approach. It seems to feel more logical to new
> users of Varnish.

Correct, the built-in vcl_recv{} would still show up with the
`vcl.show -v` command but it would indeed be defeated.

> Agreed, though nothing is simple! a tricky topic, Thanks for your
> views and letting me know this is an area you have crossed recently!

Too frequently I'm afraid ;-)
