Varnish
Hi,

We would like to try Varnish with an eye towards putting it in our
production environment; we've pushed Squid about as far as it can go
performance-wise, and we'd have better luck waiting for Godot than
proper HTTP 1.1 support.

We currently use Squid to reverse-proxy for a large number of (mostly)
small sites. I've been working with Varnish 1.1.1 and it's passed my
basic "does it work?" tests, but I've come up with a list of how-to
questions that are between us and a full-scale trial deployment.

1) The "Guru Meditation" error messages, while very Amiga-nostalgic,
aren't customer suitable, but appear to be hard-coded in
cache_synthetic.c. If we want nice, pretty error messages, are we
basically on our own, or is there an imminent plan for this?

2) Since we have an extremely large hostname->backend map, we need to
choose the right one efficiently and dynamically on a per connection
basis; we cannot statically configure every possibility into a VCL file.
How can we tie some sort of external lookup (pretty much any sort will
do) into VCL?

3) The log files look like they are great for debugging obscure
complicated problems, but for day-to-day usage, we need something
similar to the squid_access format (timestamp, client IP, URL, status
code, fetch/cache status, bytes). How would we approach this?

4) We would like to limit the number of simultaneous open connections
from a single client IP to 10-16 or so to thwart certain types of
malicious crawlers that open them by the dozens, and kick back a 403
error to extra ones. Is this possible with Varnish?

5) We need to If-Modified-Since: revalidate back to the origin server on
every request, even if 99% of the time it gets a 304 response, in order
to get log files on the back end that awstats can parse. However, we
want to preserve Expires: and max-age values to pass along to the
client, so something as heavy-handed as setting the max TTL to 0
probably would not work. I think this can be done in VCL, I just can't
seem to wrap my head around it. What would be the best way for us to
handle that?

I've been looking at the documentation and source, and will continue to
do so, but if anyone can point me in the right direction on any of these
issues, it would be very much appreciated. Varnish is incredibly cool,
and it's designed right; I would love to see it working on our network.

Thanks for any advice!

Jeff
Varnish [ In reply to ]
In message <46DDFECE.3080004 at wheelhouse.org>, Jeff writes:

>1) The "Guru Meditation" error messages, while very Amiga-nostalgic,
>aren't customer suitable, but appear to be hard-coded in
>cache_synthetic.c. If we want nice, pretty error messages, are we
>basically on our own, or is there an imminent plan for this?

I believe it's on our list somewhere.

>2) Since we have an extremely large hostname->backend map, we need to
>choose the right one efficiently and dynamically on a per connection
>basis; we cannot statically configure every possibility into a VCL file.
> How can we tie some sort of external lookup (pretty much any sort will
>do) into VCL?

First of all, VCL is very efficient, so even very large maps in VCL
code will do well.

VCL also has an "include" facility, so you could machine-generate
that part of your VCL program from your database.
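
A minimal sketch of that machine-generation step, in Python. The function
name render_host_map and the backend names are hypothetical, and the emitted
"set req.backend = ...;" statement is an assumption about the VCL syntax of
the installed Varnish version; check it against your own VCL before use.

```python
import re

# Conservative patterns so odd database rows cannot break the generated VCL.
_HOSTNAME = re.compile(r"[A-Za-z0-9.-]+")
_BACKEND = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_host_map(host_to_backend):
    """Render a {hostname: backend_name} dict as a VCL fragment (a string)."""
    out = []
    for host, backend in sorted(host_to_backend.items()):
        if not _HOSTNAME.fullmatch(host) or not _BACKEND.fullmatch(backend):
            raise ValueError("refusing to emit VCL for %r -> %r" % (host, backend))
        # One independent "if" per hostname; assumed VCL syntax, see lead-in.
        out.append('if (req.http.host == "%s") {' % host.lower())
        out.append("    set req.backend = %s;" % backend)
        out.append("}")
    return "\n".join(out) + "\n"
```

The returned text would be written to a file and pulled into the main VCL
program with the "include" facility; the backends themselves still have to
be declared elsewhere.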

Anyhow, what exactly is "extremely large" in this context?

>3) The log files look like they are great for debugging obscure
>complicated problems, but for day-to-day usage, we need something
>similar to the squid_access format (timestamp, client IP, URL, status
>code, fetch/cache status, bytes). How would we approach this?

Did you miss the NCSA format writer?

>4) We would like to limit the number of simultaneous open connections
>from a single client IP to 10-16 or so to thwart certain types of
>malicious crawlers that open them by the dozens, and kick back a 403
>error to extra ones. Is this possible with Varnish?

We have what it takes to implement this; what's missing is VCL
access to the data.
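
The bookkeeping involved can be sketched in a few lines of Python. This is
not Varnish code and does not reflect how Varnish tracks sessions; the class
ConnectionLimiter and its method names are hypothetical, and the limit of 16
is taken from the question above.

```python
from collections import defaultdict


class ConnectionLimiter:
    """Track open connections per client IP and reject the extras."""

    def __init__(self, max_per_ip=16):
        self.max_per_ip = max_per_ip
        self.open_count = defaultdict(int)

    def on_open(self, client_ip):
        """Return 200 to accept the connection, 403 if the IP is over its limit."""
        if self.open_count[client_ip] >= self.max_per_ip:
            return 403
        self.open_count[client_ip] += 1
        return 200

    def on_close(self, client_ip):
        """Call once for every connection that on_open() accepted."""
        if self.open_count[client_ip] > 0:
            self.open_count[client_ip] -= 1
        if self.open_count[client_ip] == 0:
            # Drop idle entries so the table does not grow without bound.
            del self.open_count[client_ip]
```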

>5) We need to If-Modified-Since: revalidate back to the origin server on
>every request, even if 99% of the time it gets a 304 response, in order
>to get log files on the back end that awstats can parse. However, we
>want to preserve Expires: and max-age values to pass along to the
>client, so something as heavy-handed as setting the max TTL to 0
>probably would not work. I think this can be done in VCL, I just can't
>seem to wrap my head around it. What would be the best way for us to
>handle that?

This one is tricky.

Our design assumption was that you would want to keep your backend
out of the loop as much as possible and use the Varnish log files
for your traffic analysis.

Therefore, Varnish will either request the object body unconditionally
or not bother the backend at all.

We have some room in the code where we could make this stuff more
flexible, but right now it is not on the todo list.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Varnish [ In reply to ]
"Poul-Henning Kamp" <phk at phk.freebsd.dk> writes:
> In message <46DDFECE.3080004 at wheelhouse.org>, Jeff writes:
> > 5) We need to If-Modified-Since: revalidate back to the origin
> > server on every request, even if 99% of the time it gets a 304
> > response, in order to get log files on the back end that awstats
> > can parse.
> This one is tricky.

Not at all, they should simply use varnishncsa to generate the logs
they need.

DES
--
Dag-Erling Smørgrav
Senior Software Developer
Linpro AS - www.linpro.no
Varnish [ In reply to ]
Poul-Henning Kamp wrote:
>> If we want nice, pretty error messages, are we
>> basically on our own, or is there an imminent plan for this?
>
> I believe it's on our list somewhere.

Would it be possible to have an "error server" and then hack up some VCL
to "pass through" various errors to that server? I.e., if the cache is
about to throw back a 502 Bad Gateway, have some VCL that sets the
backend to pass from:

http://errors.example.com/err502?h=www.example.org&ip=10.11.12.13

(Where errors.example.com is the "error server," www.example.org is the
site the person was trying to reach, and 10.11.12.13 is the client IP?)
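
For illustration, building that URL could look like the Python sketch below.
The function name error_url is hypothetical; the /errNNN path and the h and
ip parameters simply follow the example URL above.

```python
from urllib.parse import urlencode


def error_url(status, requested_host, client_ip, error_server="errors.example.com"):
    """Build the error-server URL for one failed request."""
    # urlencode() escapes anything unusual in the Host value or the IP.
    query = urlencode({"h": requested_host, "ip": client_ip})
    return "http://%s/err%d?%s" % (error_server, status, query)
```

Escaping the Host value matters here because it comes straight from the
client and ends up inside a query string.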

> First of all, VCL is very efficient, so even very large maps in VCL
> code will do well.
>
> VCL also has an "include" facility, so you could machine-generate
> that part of your VCL program from your database.

The list of names is not static; dozens of names are added or removed
throughout a given day, and the overhead of generating/loading a massive
static configuration file on every change would probably be prohibitive.

> Anyhow, what exactly is "extremely large" in this context?

Between 10^4 and 10^5 active hostname->backend map entries.

Calling out to an external routine for the map lookup sounds inefficient
but with the necessary hashing and the ability to do nonintrusive
dynamic updates, it's an overall win.

Since Varnish is already generating and loading dynamic objects, would it
be possible to add support for dynamically extending VCL at runtime?

For example, we could do something like:

req.backend.host = my_map_lookup(http.header.host)

And then provide a C function that implements my_map_lookup() to do what
we need? (Including providing an "error host" for invalid host values.)
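
The lookup semantics I have in mind, sketched in Python rather than the C
that would actually be needed. The class HostMap, its method names, and the
default error host are all hypothetical; the point is only a hashed lookup,
a fallback for unknown names, and updates that do not require a reload.

```python
import threading


class HostMap:
    """Hashed hostname -> backend map with a fallback "error host"."""

    def __init__(self, error_backend="errors.example.com"):
        self._map = {}
        self._lock = threading.Lock()
        self.error_backend = error_backend

    @staticmethod
    def _normalize(host_header):
        # Lowercase and strip an optional ":port" suffix and trailing dot.
        # Bracketed IPv6 literals are not handled in this sketch.
        host = (host_header or "").strip().lower()
        return host.split(":", 1)[0].rstrip(".")

    def update(self, host, backend):
        with self._lock:
            self._map[self._normalize(host)] = backend

    def remove(self, host):
        with self._lock:
            self._map.pop(self._normalize(host), None)

    def lookup(self, host_header):
        """Return the backend for a Host header, or the error backend."""
        with self._lock:
            return self._map.get(self._normalize(host_header), self.error_backend)
```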

That would be pretty cool, and the ability to write arbitrary extension
functions in C would most likely lend itself to all sorts of other
creative uses.

Unfortunately I highly doubt I'm qualified to do that. When it comes to
extending scripting languages, I either reach for Swig or I give up. :)

> Did you miss the NCSA format writer ?

Apparently I missed it completely. The "combined" format does toss some
highly useful info when used with caches, particularly whether an object
was served from cache or origin, but I presume the varnishncsa util
could fairly readily serve as the template for us to whip up a similar
util that outputs in the format we need. Thanks for the pointer.
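
Roughly the kind of line we are after, as a Python sketch. The layout is
only loosely modeled on Squid's native access.log, the function name
format_access_line is hypothetical, and the inputs are assumed to have been
extracted from the Varnish log already (this does not read the log itself).

```python
def format_access_line(timestamp, client_ip, cache_status, http_status,
                       nbytes, method, url):
    """Format one request as: time client cache/status bytes method URL."""
    return "%.3f %s %s/%03d %d %s %s" % (
        timestamp,      # seconds since the epoch, millisecond precision
        client_ip,
        cache_status,   # e.g. "TCP_HIT" or "TCP_MISS", as Squid labels them
        http_status,
        nbytes,
        method,
        url,
    )
```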

(Squid gets a lot of things wrong, but fair's fair, they get a lot of
things right and their log format is one of the latter.)

>> 5) We need to If-Modified-Since: revalidate back to the origin server on
>> every request, [...]
>
> Our design assumption was that you would want to keep your backend
> as much out of the loop as possible and use the varnish logfiles
> for your traffic analysis.

I have no doubt that works very well when frontending one site.
However, it does not scale well in our environment.

Even so, parsing one aggregate stream of Varnish log data into N
individual streams (one for each backend) would probably be doable.
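
A sketch of that demultiplexing step in Python, assuming each log record
has already been parsed into a hypothetical (epoch_seconds, host, line)
tuple; the function name split_by_host is likewise made up for illustration.

```python
from collections import defaultdict


def split_by_host(records):
    """Group (epoch_seconds, host, line) records into one list per host."""
    per_host = defaultdict(list)
    for rec in records:
        # Hostnames are case-insensitive, so fold case before grouping.
        per_host[rec[1].lower()].append(rec)
    return dict(per_host)
```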

Parsing M streams from a load-balanced cluster of Varnish servers into N
time-collated streams in something approaching real time, however,
sounds like a hard problem. :-)
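
The collation itself is the easy part. Here is a Python sketch using the
standard library's heapq.merge, assuming each server's stream is already
sorted and each record is a hypothetical (epoch_seconds, host, line) tuple.
Doing this in near real time, with servers that lag or stall, is where it
gets hard, and this sketch does not address that.

```python
import heapq


def merge_streams(*streams):
    """Lazily merge already-sorted record streams into one time-ordered stream."""
    # Epoch seconds are timezone-independent, so servers in different
    # cities can be merged without any conversion.
    return heapq.merge(*streams, key=lambda rec: rec[0])
```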

But if you can think of a viable way to do that, it would no doubt be
much faster than (and therefore superior to) the If-Modified-Since:
approach.

Thanks for the response! Sorry if my questions seem naive; I don't
really understand VCL yet, so I'm making the classic assumption that
everything I don't understand is easy. :-)

Jeff
Varnish [ In reply to ]
In message <46DE6D6A.90009 at wheelhouse.org>, Jeff writes:
>Poul-Henning Kamp wrote:

>>> If we want nice, pretty error messages, are we
>>> basically on our own, or is there an imminent plan for this?
>>
>> I believe it's on our list somewhere.
>
>Would it be possible to have an "error server" and then hack up some VCL
>to "pass through" various errors to that server? I.e., if the cache is
>about to throw back a 502 Bad Gateway, have some VCL that sets the
>backend to pass from:

I'm not sure we know exactly how we want to handle it.

One option is to let people rewrite in vcl_error() and restart the
transaction with the (presumably) new url.

>> First of all, VCL is very efficient, so even very large maps in VCL
>> code will do well.
>>
>> VCL also has an "include" facility, so you could machine-generate
>> that part of your VCL program from your database.
>
>The list of names is not static; dozens of names are added or removed
>throughout a given day, and the overhead of generating/loading a massive
>static configuration file on every change would probably be prohibitive.

Ok, in that case, putting some relevant in-line C code in your VCL
file is probably your best bet.

This is intentionally not well documented, but cases like this are
what it is intended for.

The syntax is:

C{
whatever you want
}C

and it gets copied through exactly at that place.

Use the varnishd "-C" option to see the result before compilation
and good luck :-)


>That would be pretty cool, and the ability to write arbitrary extension
>functions in C would most likely lend itself to all sorts of other
>creative uses.

See above :-)

>>> 5) We need to If-Modified-Since: revalidate back to the origin server on
>>> every request, [...]
>>

>Parsing M streams from a load-balanced cluster of Varnish servers into N
>time-collated streams in something approaching real time, however,
>sounds like a hard problem. :-)

Yes, that is probably true, although not impossible.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Varnish [ In reply to ]
Poul-Henning Kamp wrote:
> I'm not sure we know exactly how we want to handle it.
>
> One option is to let people rewrite in vcl_error() and restart the
> transaction with the (presumably) new url.

Would that be difficult? In such a case, I gather the guru would hang
around to meditate on double-faults, thus avoiding loops?

> putting some relevant in-line C code in your VCL
> file is probably your best bet.
>
> This is intentionally not well documented, but cases like this are
> what it is intended for.
>
> The syntax is:
>
> C{
> whatever you want
> }C

I will take a look at this, but since it is intentionally not well
documented, I am likely to have follow-up questions. :-)

>> Parsing M streams from a load-balanced cluster of Varnish servers into N
>> time-collated streams in something approaching real time, however,
>> sounds like a hard problem. :-)
>
> Yes, that is probably true, although not impossible.

Now, put multiple clusters in multiple cities (in multiple time zones)
and you will begin to understand why I think IMS: and letting each
individual backend do its own logging is such a good idea. :-)

Thanks,
Jeff
Varnish [ In reply to ]
In message <46DF128E.9040002 at wheelhouse.org>, Jeff writes:
>Poul-Henning Kamp wrote:
>> I'm not sure we know exactly how we want to handle it.
>>
>> One option is to let people rewrite in vcl_error() and restart the
>> transaction with the (presumably) new url.
>
>Would that be difficult? In such a case, I gather the guru would hang
>around to meditate on double-faults, thus avoiding loops?

The plan is to have a "max restarts per request" parameter to break
such loops.
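
The idea, sketched in Python rather than Varnish internals. The function
run_with_restarts, the handler callback, and the default cap of 4 are all
hypothetical; no such parameter exists yet.

```python
def run_with_restarts(request, handler, max_restarts=4):
    """Call handler until it stops asking for a restart or the cap is hit.

    handler(request) returns (action, request); the action "restart" means
    "run again with the returned, possibly rewritten, request".
    Returns (final_action, final_request, restarts_used).
    """
    restarts = 0
    while True:
        action, request = handler(request)
        if action != "restart":
            return action, request, restarts
        if restarts >= max_restarts:
            # Cap reached: break the loop and fall back to the built-in error.
            return "error", request, restarts
        restarts += 1
```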

>> putting some relevant in-line C code in your VCL
>> file is probably your best bet.
>>
>> This is intentionally not well documented, but cases like this are
>> what it is intended for.
>>
>> The syntax is:
>>
>> C{
>> whatever you want
>> }C
>
>I will take a look at this, but since it is intentionally not well
>documented, I am likely to have follow-up questions. :-)

I'm sure :-)

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.