Mailing List Archive

#1799: Backend request coalescing fail
--------------------+---------------------
Reporter: martin | Owner:
Type: defect | Status: new
Priority: normal | Milestone:
Component: build | Version: unknown
Severity: normal | Keywords:
--------------------+---------------------
See attached test case.

If there is a busy OC (flag OC_F_BUSY set) on the object head during
HSH_Lookup, and an expired candidate is available, the request is not put
on the waitinglist; instead the lookup returns the expired OC. If this OC
is rejected by the VCL (return(miss) from vcl_hit), the request is turned
into a pass to the backend. This is logged as SLT_VCL_Error ("vcl_hit{}
returns fetch without busy object. Doing pass").

If, for example, Varnish is configured with no grace but with keep (for
IMS against the backend), this happens as soon as the TTL of the object
has expired. There is an expired object available, but it is not within
grace, so all requests except the one that inserted the busy OC become
passes to the backend until the OC_F_BUSY flag is removed. The attached
test case covers this.
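
As a rough illustration of the configuration described above, a minimal VCL sketch could look like the following. The backend address and the 1h keep value are placeholders chosen for the example, not taken from the attached test case, and the sketch assumes a Varnish version in which vcl_hit may return miss, as in this ticket.

```vcl
vcl 4.0;

# Placeholder backend; host and port are assumptions for illustration.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # No grace: an expired object is never delivered stale.
    set beresp.grace = 0s;
    # Keep the expired object so a later fetch can revalidate it
    # against the backend with a conditional (IMS) request.
    set beresp.keep = 1h;
}

sub vcl_hit {
    if (obj.ttl >= 0s) {
        # Fresh object: a plain hit.
        return (deliver);
    }
    # Expired and, with zero grace, never within grace: reject it.
    # While another request holds the busy OC, this is the path that
    # ends up as a pass instead of waiting on the waitinglist.
    return (miss);
}
```

With a setup like this, every request that arrives between TTL expiry and completion of the refreshing fetch takes the return (miss) branch.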

Another scenario that exhibits this is a large grace period combined with
a vcl_hit that varies how much grace it accepts based on the backend
health status. If the object was not refreshed during the shorter grace
window that applies while the backend is healthy, you get one backend
request for each client request until the backend answers (slow backends
suffer more).
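
The health-dependent grace scenario can be sketched roughly as follows. The backend address, the probe settings, the 10s healthy-backend window and the 6h stored grace are all assumptions picked for the example.

```vcl
vcl 4.0;
import std;

# Placeholder backend with a health probe; all values are assumptions.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}

sub vcl_backend_response {
    # Store objects with a large grace period.
    set beresp.grace = 6h;
}

sub vcl_hit {
    if (obj.ttl >= 0s) {
        # Fresh object: a plain hit.
        return (deliver);
    }
    if (std.healthy(req.backend_hint)) {
        # Healthy backend: accept only a short grace window.
        if (obj.ttl + 10s > 0s) {
            return (deliver);
        }
    } else if (obj.ttl + obj.grace > 0s) {
        # Sick backend: accept the full stored grace.
        return (deliver);
    }
    # Expired beyond the accepted window: reject the object.
    return (miss);
}
```

Once an object is more than 10s past its TTL while the backend is healthy, each request reaches the final return (miss). If a fetch for that object is already in progress, that is the case this ticket describes.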

The expected outcome in my opinion is that the requests should have been
placed on the waitinglist of the object head until the time the OC_F_BUSY
flag is gone.

I've been pondering how to fix this, but every attempt at re-entering the
waitinglist on the OH after HSH_Lookup has completed seems to open race
conditions. The only sane fix that I can see is to reintroduce the
req.grace attribute, and move the decision on whether the expired object
is within grace back into the critical region protected by the object
head mutex in HSH_Lookup.
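
To make the req.grace idea concrete, here is a hedged sketch of how VCL might express the health-dependent grace limit if such an attribute were available. It assumes a Varnish build that provides a writable req.grace in vcl_recv, which is the proposal here and not something the versions discussed in this ticket offer. The backend, probe and durations are placeholders.

```vcl
vcl 4.0;
import std;

# Placeholder backend with a health probe; all values are assumptions.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";
        .interval = 5s;
        .timeout = 1s;
        .window = 5;
        .threshold = 3;
    }
}

sub vcl_recv {
    # Assumption: req.grace exists and caps the grace applied in lookup.
    if (std.healthy(req.backend_hint)) {
        # Healthy backend: accept at most 10s of grace.
        set req.grace = 10s;
    }
}

sub vcl_backend_response {
    # Store objects with a large grace for use when the backend is sick.
    set beresp.grace = 6h;
}
```

Because the limit is known before the lookup, the grace decision can be made while the object head mutex is held, and no vcl_hit override is needed to reject the object afterwards.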

Martin

--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1799>
Varnish <https://varnish-cache.org/>
The Varnish HTTP Accelerator

_______________________________________________
varnish-bugs mailing list
varnish-bugs@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-bugs
Re: #1799: Backend request coalescing fail
#1799: Backend request coalescing fail
----------------------+--------------------
Reporter: martin | Owner:
Type: defect | Status: new
Priority: normal | Milestone:
Component: varnishd | Version: trunk
Severity: normal | Resolution:
Keywords: |
----------------------+--------------------
Changes (by martin):

* version: unknown => trunk
* component: build => varnishd


Comment:

Note: Original test case by Dag Haavi Finstad

This has been the behaviour since Varnish 4.0

--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1799#comment:1>
Re: #1799: Backend request coalescing fail

Comment (by martin):

This was discussed during bugwash today. The sketched solution looks like:

* A return(miss) from vcl_hit without a busy OC will give the expired OC
an OC_F_DONTUSE flag. The request thread runs STP_LOOKUP again (instead of
falling back to pass). This flag acts like a signal to subsequent lookups
that we didn't like this expired object, and subsequent attempts at using
it should be prevented.

* During HSH_Lookup, if the expired OC selected (the newest expired
option) has the OC_F_DONTUSE flag, we will continue as if we don't have an
expired option. If then busy_found is true (OC with OC_F_BUSY on OH),
we'll go to waitinglist as before.

* On a failed fetch, a bgfetch thread will clear the OC_F_DONTUSE flag if
the expired OC it holds a reference to has it set (on successful fetches
the object is purged as normal). This re-enables the object for use as an
IMS candidate when fetching again, or as an extended grace candidate for
requests after e.g. the backend is marked sick.

We will ponder the issue for a day to make sure the strategy is viable.

Martin

--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1799#comment:2>
Re: #1799: Backend request coalescing fail

Comment (by martin):

A couple of VSL related things to consider:

* SLT_Hit should probably only be logged once we are sure that we won't
jump back to STP_LOOKUP, so that we avoid multiple confusing log lines.
* The waitinglist timestamp also needs to be dealt with so it's logged
only when we know we won't jump back to STP_LOOKUP.

Martin

--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1799#comment:3>
Re: #1799: Backend request coalescing fail

Comment (by franveux):

Hi,

Here is some VCL that converts a pass back into a deliver when this
happens. It works for me in production and avoids pass requests to the
backend.


{{{
// Workaround for #1799, assuming max-age is always more than 5 seconds.
// std.log and std.integer need the std vmod; omit this import if your
// VCL already has it.
import std;

sub vcl_hit {
    if (req.http.X-VCL-HITPASS1799) {
        // don't pass but deliver instead
        std.log("vcl_hit: FIX, pass convert to deliver");
        return (deliver);
    }
    // we've got a hit
    set req.http.X-VCL-HIT = true;
}

sub vcl_pass {
    if (req.http.X-VCL-HIT && req.restarts < 2) {
        // we've got a hit followed by a pass: restart and deliver instead
        set req.http.X-VCL-HITPASS1799 = true;
        return (restart);
    }
}

sub vcl_deliver {
    if (req.http.X-VCL-HITPASS1799) {
        // Fake the Age header so the client does not come back too quickly.
        set req.http.X-max-age = regsub(resp.http.cache-control,
            ".*max-age=([0-9]+).*", "\1");
        if (std.integer(req.http.X-max-age, 5) <= std.integer(resp.http.age, 0)) {
            set resp.http.age = std.integer(req.http.X-max-age, 5) - 3;
        }
    }
}
}}}

François V.

--
Ticket URL: <https://www.varnish-cache.org/trac/ticket/1799#comment:4>