Mailing List Archive

Serve stale content if backend is healthy but "not too healthy"
Hello everyone,

We have a backend that actually proxies different services (mangling
the original response). Sometimes one of those backends is not
available, and the overall response goes from a 200 to a 50x.
Is there a way to serve stale (but still valid) content, if present,
when the response comes from a backend that is in a healthy state?

I was thinking about something like this:
sub vcl_backend_response {
    if (beresp.status >= 500) {
        return_a_stale;  # pseudocode: no such action exists
    }
}

From the state machine
(https://varnish-cache.org/docs/6.0/reference/states.html) it seems
that I can neither return(hash) nor switch to an unhealthy
backend (which I keep configured) to achieve what I want.

Please forgive me if a facility for this already exists, and feel
free to point me to the right document.

Ah, and this is Varnish 6.x.

Thanks
Luca
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Serve stale content if backend is healthy but "not too healthy"
On Tue, Sep 21, 2021 at 1:33 PM Luca Gervasi <luca.gervasi@gmail.com> wrote:

Hi Luca,

Varnish Cache does not have this feature; you should be able to do
that with Varnish Enterprise instead. What you are looking for is
stale-if-error, and you may find some implementations using VCL, but I
can't vouch for any since I have no experience with them.

https://docs.varnish-software.com/varnish-cache-plus/vmods/stale/#description

Cheers,
Dridi
Re: Serve stale content if backend is healthy but "not too healthy"
Hi,

As Dridi said, what you are looking for is exactly vmod_stale, but I wanted
to point out this part:

> We have a backend that actually proxies different services

In that case, it might be worth having a separate Varnish backend for each
service behind the proxy. The backend definitions would be exactly the
same, but the probe definitions would differ, each with a service-specific
URL/host. This way, Varnish would know which service is actually unhealthy,
and you wouldn't have to deal with the stale thing.
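To illustrate the per-service idea, here is a minimal VCL sketch, assuming Varnish 6.0 or later (for `vcl 4.1` and `req.grace`). Everything specific in it is hypothetical: the proxy address (192.0.2.10:8080 is a documentation placeholder), the health-check paths, the Host headers and the URL-based routing rule. It also borrows the `std.healthy()` pattern from the Varnish grace documentation, so a short grace is used while the selected service is healthy and the object's full grace applies once its probe marks it sick:

```vcl
vcl 4.1;

import std;

# Hypothetical probes: same proxy, but each one checks a service-specific
# health URL with its own Host header.
probe probe_service_a {
    .request =
        "GET /service-a/health HTTP/1.1"
        "Host: service-a.example.com"
        "Connection: close";
    .interval = 5s;
    .timeout = 2s;
    .window = 5;
    .threshold = 3;
}

probe probe_service_b {
    .request =
        "GET /service-b/health HTTP/1.1"
        "Host: service-b.example.com"
        "Connection: close";
    .interval = 5s;
    .timeout = 2s;
    .window = 5;
    .threshold = 3;
}

# Both backends point at the same (placeholder) proxy; only the probe differs.
backend service_a {
    .host = "192.0.2.10";
    .port = "8080";
    .probe = probe_service_a;
}

backend service_b {
    .host = "192.0.2.10";
    .port = "8080";
    .probe = probe_service_b;
}

sub vcl_recv {
    # Hypothetical routing rule: pick the backend matching the service.
    if (req.url ~ "^/service-b/") {
        set req.backend_hint = service_b;
    } else {
        set req.backend_hint = service_a;
    }

    # Healthy service: only accept a short grace. Sick service: leave
    # req.grace alone so the object's full grace is used.
    if (std.healthy(req.backend_hint)) {
        set req.grace = 10s;
    }
}

sub vcl_backend_response {
    # Keep objects long enough to be served stale during an outage
    # (24h is an arbitrary example).
    set beresp.grace = 24h;
}
```

Since both backends share the same address, the probe is the only thing that lets Varnish tell which service is down.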

If you need an open-source approach, I reckon the best you can do is to
ignore grace on the first attempt and restart without the grace limit if
you detect a bad response. It does have a couple of race conditions
baked in that vmod_stale sidesteps, but that's usually good enough:

sub vcl_recv {
    # be hopeful that the backend will send us something good, ignore grace
    if (req.restarts == 0) {
        set req.grace = 0s;
    }
}

sub vcl_deliver {
    # welp, that didn't go well, try again without limiting the grace
    if (req.restarts == 0 && resp.status >= 500) {
        set req.grace = 10y;
        return (restart);
    }
}


The main issue is that you restart, so you are going to spend a little
more time/resources processing the request, and the object in cache may
have expired by the time you realize you need it.
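One detail the restart snippet relies on: `req.grace` is only an upper limit on the grace an object was stored with, so the restarted request can only find a stale copy if the object was cached with a long `beresp.grace`. Otherwise the `default_grace` parameter (10 seconds by default) applies. A minimal sketch, assuming a 24-hour stale window is acceptable for your content:

```vcl
sub vcl_backend_response {
    # Keep objects around well past their TTL so that a restarted request
    # with a large req.grace can still find them. 24h is an example value.
    set beresp.grace = 24h;
}
```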

Hope that helps,
--
Guillaume Quintard
Re: Serve stale content if backend is healthy but "not too healthy"
Hi Luca,

As Dridi and Guillaume said, what you're looking for is something like the stale VMOD. I think the OSS approach suggested by Guillaume could be extended a little in order to improve performance and avoid request serialization. The problem is that the resulting VCL is *ugly* and hard to understand. In any case, here is a VTC showing how it works:

varnishtest "Full vs. limited grace"

server s_backend {
    # First request when the cache is empty (see 1).
    rxreq
    txresp

    # Background fetch done while inside the limited grace period (see 2).
    rxreq
    txresp

    # Another request to refresh the content after the limited grace period
    # has expired (see 3).
    rxreq
    txresp

    # Another request to refresh the content after the limited grace period
    # has expired once more (see 4). This time, we return an error, so a
    # cached version will have to be returned to the client.
    rxreq
    txresp -status 500
} -start

varnish v_backend -vcl {
    import std;

    backend default {
        .host = "${s_backend_addr}";
        .port = "${s_backend_port}";
    }

    sub vcl_recv {
        # Clean up internal headers.
        if (req.restarts == 0) {
            unset req.http.X-Varnish-Restarted-5xx;
        }
        unset req.http.X-Varnish-Use-Limited-Grace;

        # Use a limited grace unless the request was restarted to enable
        # full grace.
        if (!(req.restarts > 0 && req.http.X-Varnish-Restarted-5xx)) {
            set req.http.X-Varnish-Use-Limited-Grace = "1";
            set req.grace = 2s;
        } else {
            set req.grace = 100y;
        }
    }

    sub vcl_backend_response {
        set beresp.ttl = 1s;

        # Set the full grace value. This could also be done by returning a
        # proper stale-while-revalidate value in the Cache-Control header,
        # which Varnish understands (unlike the stale-if-error directive).
        set beresp.grace = 24h;

        # Send broken backend responses to vcl_backend_error so the
        # request can be restarted.
        if (beresp.status >= 500 && beresp.status < 600) {
            return (error);
        }
    }

    sub vcl_backend_error {
        if (bereq.http.X-Varnish-Use-Limited-Grace && !bereq.uncacheable) {
            # Trigger a restart on the client side in order to enable full
            # grace and try to deliver a stale object. Also, cache the error
            # response, but under a variant, to avoid overwriting the stale
            # object that may already be in cache. Grace and keep are
            # explicitly disabled to override the current default behaviour
            # (https://github.com/varnishcache/varnish-cache/issues/3024).
            set beresp.http.X-Varnish-Restart-5xx = "1";
            set beresp.ttl = 1s;
            set beresp.grace = 0s;
            set beresp.keep = 0s;
            set beresp.http.Vary = "X-Varnish-Use-Limited-Grace";
            return (deliver);
        } else {
            # Jump to 'vcl_synth' with a 503 status code (unless this is a
            # background fetch).
            return (abandon);
        }
    }

    sub vcl_deliver {
        # Restart if the backend side requested it (see 'vcl_backend_error').
        if (resp.http.X-Varnish-Restart-5xx && !req.http.X-Varnish-Restarted-5xx) {
            set req.http.X-Varnish-Restarted-5xx = "1";
            return (restart);
        }

        # Clean up the Vary header.
        if (resp.http.Vary == "X-Varnish-Use-Limited-Grace") {
            unset resp.http.Vary;
        }

        # Debug.
        set resp.http.X-Cache-Hits = obj.hits;
    }

    sub vcl_backend_fetch {
        # Clean up internal headers.
        if (bereq.retries == 0) {
            unset bereq.http.X-Varnish-Restart-5xx;
        }

        # Do not contact the backend again for requests restarted due to a
        # 5xx backend response, regardless of whether no stale object was
        # found or a bgfetch was spawned after serving stale content.
        if (bereq.retries == 0 && bereq.http.X-Varnish-Restarted-5xx) {
            # Jump to 'vcl_synth' with a 503 status code (unless this is a
            # background fetch).
            return (abandon);
        }
    }
} -start

client c1 -connect ${v_backend_sock} {
    # 1: Ask for some content. This will hit the backend, as the cache is
    # empty.
    txreq
    rxresp
    expect resp.status == 200
    expect resp.http.X-Cache-Hits == 0

    # Wait until the TTL is over.
    delay 1.5

    # 2: Further requests for the same content inside the limited grace
    # period (set to 2 seconds) will be served from the cache. A bgfetch to
    # the backend is silently made.
    txreq
    rxresp
    expect resp.status == 200
    expect resp.http.X-Cache-Hits == 1

    # Wait until the new TTL and new limited grace period are over.
    delay 5.0

    # 3: Even though the content is still in the cache (it is kept for the
    # full grace period of 24 hours), limited grace makes sure fresh content
    # is fetched from the backend. A request for the content therefore
    # produces a new backend request.
    txreq
    rxresp
    expect resp.status == 200
    expect resp.http.X-Cache-Hits == 0

    # Wait again until the new TTL and new limited grace period are over.
    delay 3.5

    # 4: A new request will try to get fresh content, but the backend now
    # returns an error response, so Varnish restarts the request and serves
    # stale content. The background fetch never reaches the backend, because
    # vcl_backend_fetch abandons it to avoid disturbing the failing backend.
    txreq
    rxresp
    expect resp.status == 200
    expect resp.http.X-Cache-Hits == 1
} -run

varnish v_backend -expect client_req == 4

Best,

--
Carlos Abalde