Mailing List Archive

Hit ratio dropped significantly after recent upgrades
Hello,

I use Varnish on four load-balanced web servers running Apache with several virtual hosts providing MediaWiki wikis. Last week I upgraded the web servers to support upgrading MediaWiki from 1.26 to 1.27 due to PHP requirements. This meant upgrading (reinstalling) Ubuntu 12.04 to 16.04, meaning I went from PHP 5.3 to 7, Apache 2.4 + mod_php to Apache 2.4 + PHP-FPM, and Varnish 3 to 4.

For the upgrade from Varnish 3 to 4, I adapted my VCL code<https://gist.github.com/Calygos/105957a997ea3bde6b8257a1f34bbd20> accordingly, based on Mediawiki Varnish sample code<https://www.mediawiki.org/wiki/Manual:Varnish_caching>, but for reasons I don't understand, my Varnish hit ratio dropped from around 86% to ~20-37%. This picture<http://imgur.com/jX4iu6s> shows a graph of the last 10 days hit ratio calculations for the four web servers. The graph is calculated using metrics from the Collectd Varnish plugin as (conns - bereqs / conns), i.e.:

asPercent(
diffSeries(
linux.hostname.varnish-default-connections.connections-received,
linux.hostname.varnish-default-backend.http_requests-requests
),
linux.hostname.varnish-default-connections.connections-received
)

I've also tried the Varnish hit and miss metrics but those give similar results. So I'm not sure what changed to give such a poor hit ratio, despite otherwise good performance on large, busy wikis. It's possible it could be something in MediaWiki's cookie and/or session handling and I've been investigating that possibility as well, but I figured this would be a good place to ask, too.

Thanks,
Justin
Re: Hit ratio dropped significantly after recent upgrades [ In reply to ]
I think we're going to need something a little more specific to go on.
That is a mile of changes all at once.

Finding a single request that should be cached, but isn't and producing the
varnish log for that request will probably help illuminate what's going on.

On Wed, Dec 7, 2016 at 4:22 PM, Justin Lloyd <justinl@arena.net> wrote:

> Hello,
>
>
>
> I use Varnish on four load-balanced web servers running Apache with
> several virtual hosts providing MediaWiki wikis. Last week I upgraded the
> web servers to support upgrading MediaWiki from 1.26 to 1.27 due to PHP
> requirements. This meant upgrading (reinstalling) Ubuntu 12.04 to 16.04,
> meaning I went from PHP 5.3 to 7, Apache 2.4 + mod_php to Apache 2.4 +
> PHP-FPM, and Varnish 3 to 4.
>
>
>
> For the upgrade from Varnish 3 to 4, I adapted my VCL code
> <https://gist.github.com/Calygos/105957a997ea3bde6b8257a1f34bbd20>
> accordingly, based on Mediawiki Varnish sample code
> <https://www.mediawiki.org/wiki/Manual:Varnish_caching>, but for reasons
> I don’t understand, my Varnish hit ratio dropped from around 86% to
> ~20-37%. This picture <http://imgur.com/jX4iu6s> shows a graph of the
> last 10 days hit ratio calculations for the four web servers. The graph is
> calculated using metrics from the Collectd Varnish plugin as (conns -
> bereqs / conns), i.e.:
>
>
>
> asPercent(
>
> diffSeries(
>
> linux.*hostname*.varnish-default-connections.connections-received,
>
> linux.*hostname*.varnish-default-backend.http_requests-requests
>
> ),
>
> linux.*hostname*.varnish-default-connections.connections-received
>
> )
>
>
>
> I’ve also tried the Varnish hit and miss metrics but those give similar
> results. So I’m not sure what changed to give such a poor hit ratio,
> despite otherwise good performance on large, busy wikis. It’s possible it
> could be something in MediaWiki’s cookie and/or session handling and I’ve
> been investigating that possibility as well, but I figured this would be a
> good place to ask, too.
>
>
>
> Thanks,
>
> Justin
>
>
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
Re: Hit ratio dropped significantly after recent upgrades [ In reply to ]
> results. So I’m not sure what changed to give such a poor hit ratio, despite
> otherwise good performance on large, busy wikis. It’s possible it could be

Hello Justin,

I'm afraid too many things changed at once, with so many moving parts
it will be hard to find the cause. I suggest you start with transaction logs,
looking essentially at what the backend is responding and how your VCL
is handling it.

Dridi

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Hit ratio dropped significantly after recent upgrades [ In reply to ]
On Thu, Dec 8, 2016 at 2:35 PM, Jason Price <japrice@gmail.com> wrote:
> I think we're going to need something a little more specific to go on. That
> is a mile of changes all at once.

Yes: varnishlog, coffee, and a lot of patience.

> Finding a single request that should be cached, but isn't and producing the
> varnish log for that request will probably help illuminate what's going on.

There's currently no way to query the transaction log of a specific request:
https://github.com/varnishcache/varnish-cache/issues/2154

I'm just saying...

Dridi

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Hit ratio dropped significantly after recent upgrades [ In reply to ]
On Thu, Dec 8, 2016 at 3:48 PM, Dridi Boukelmoune <dridi@varni.sh> wrote:
>
> There's currently no way to query the transaction log of a specific
> request:
> https://github.com/varnishcache/varnish-cache/issues/2154


awk is love, awk is life:

varnishlog -d -v -g raw | awk '$1 == XXXXXX'

still better than nothing.

--
Guillaume Quintard
RE: Hit ratio dropped significantly after recent upgrades [ In reply to ]
I have been doing a lot of digging with varnishtop and varnishlog, and our VCL really didn’t change from this upgrade except as needed to migrate from Varnish 3 to 4. As I mentioned, our web app is MediaWiki so we don't control its caching requirements and recommendations, so what I'm trying to understand is whether the drop in the hit rate is due to some change(s) in MediaWiki's cookie and/or cache handling (e.g. via Cache-Control and Set-Cookie headers) or if something in Varnish changed that affects how it determines things. For example, a while back I had been using the Varnish hit and miss metrics in Collectd to calculate the ratio but apparently how those values are calculated with respect to purges changed so the hit ratio dropped, causing me to change the ratio calculation to use incoming connections and backend requests instead.

That said, based on my varnishlog and varnishtop testing, I have a strong feeling that the biggest part of the problem is thumbnail images. If you look again at my VCL code (https://gist.github.com/Calygos/105957a997ea3bde6b8257a1f34bbd20), you can see I strip cookies from thumbnails so they should get cached, but I seem to get a lot more misses than hits when watching for thumbnail URL requests through varnishtop. I give 8 GB to Varnish and its process is typically only around 1 to 2 GB when previous it would be at 8 GB with frequent nukes and the occasional spike of expires that would temporarily eliminate nukes while memory filled up again. For what it's worth, I added the thumbnail stripping a couple of years ago due to a performance issue and it helped tremendously, so I don't know why it would become problematic with these latest upgrades.

Justin

-----Original Message-----
From: Dridi Boukelmoune [mailto:dridi@varni.sh]
Sent: Thursday, December 8, 2016 6:49 AM
To: Jason Price <japrice@gmail.com>
Cc: Justin Lloyd <justinl@arena.net>; varnish-misc@varnish-cache.org
Subject: Re: Hit ratio dropped significantly after recent upgrades

On Thu, Dec 8, 2016 at 2:35 PM, Jason Price <japrice@gmail.com> wrote:
> I think we're going to need something a little more specific to go on.
> That is a mile of changes all at once.

Yes: varnishlog, coffee, and a lot of patience.

> Finding a single request that should be cached, but isn't and
> producing the varnish log for that request will probably help illuminate what's going on.

There's currently no way to query the transaction log of a specific request:
https://github.com/varnishcache/varnish-cache/issues/2154

I'm just saying...

Dridi
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Hit ratio dropped significantly after recent upgrades [ In reply to ]
Hi

Is this Varnish 4.1 ?

We have an unsolved bug open describing something very similar,
https://github.com/varnishcache/varnish-cache/issues/1859

On Thu, Dec 8, 2016 at 9:26 PM, Justin Lloyd <justinl@arena.net> wrote:
> I have been doing a lot of digging with varnishtop and varnishlog, and our VCL really didn’t change from this upgrade except as needed to migrate from Varnish 3 to 4. As I mentioned, our web app is MediaWiki so we don't control its caching requirements and recommendations, so what I'm trying to understand is whether the drop in the hit rate is due to some change(s) in MediaWiki's cookie and/or cache handling (e.g. via Cache-Control and Set-Cookie headers) or if something in Varnish changed that affects how it determines things. For example, a while back I had been using the Varnish hit and miss metrics in Collectd to calculate the ratio but apparently how those values are calculated with respect to purges changed so the hit ratio dropped, causing me to change the ratio calculation to use incoming connections and backend requests instead.
>
> That said, based on my varnishlog and varnishtop testing, I have a strong feeling that the biggest part of the problem is thumbnail images. If you look again at my VCL code (https://gist.github.com/Calygos/105957a997ea3bde6b8257a1f34bbd20), you can see I strip cookies from thumbnails so they should get cached, but I seem to get a lot more misses than hits when watching for thumbnail URL requests through varnishtop. I give 8 GB to Varnish and its process is typically only around 1 to 2 GB when previous it would be at 8 GB with frequent nukes and the occasional spike of expires that would temporarily eliminate nukes while memory filled up again. For what it's worth, I added the thumbnail stripping a couple of years ago due to a performance issue and it helped tremendously, so I don't know why it would become problematic with these latest upgrades.
>
> Justin
>
> -----Original Message-----
> From: Dridi Boukelmoune [mailto:dridi@varni.sh]
> Sent: Thursday, December 8, 2016 6:49 AM
> To: Jason Price <japrice@gmail.com>
> Cc: Justin Lloyd <justinl@arena.net>; varnish-misc@varnish-cache.org
> Subject: Re: Hit ratio dropped significantly after recent upgrades
>
> On Thu, Dec 8, 2016 at 2:35 PM, Jason Price <japrice@gmail.com> wrote:
>> I think we're going to need something a little more specific to go on.
>> That is a mile of changes all at once.
>
> Yes: varnishlog, coffee, and a lot of patience.
>
>> Finding a single request that should be cached, but isn't and
>> producing the varnish log for that request will probably help illuminate what's going on.
>
> There's currently no way to query the transaction log of a specific request:
> https://github.com/varnishcache/varnish-cache/issues/2154
>
> I'm just saying...
>
> Dridi
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc



--
Dag Haavi Finstad
Software Developer | Varnish Software
Mobile: +47 476 64 134
We Make Websites Fly!

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
RE: Hit ratio dropped significantly after recent upgrades [ In reply to ]
Hello! Yes, it is Varnish 4.1.1-1 from the Ubuntu 16.04 repo. I'll look at the issue you've linked and see if I can match it to our situation. Thanks!

Justin

-----Original Message-----
From: Dag Haavi Finstad [mailto:daghf@varnish-software.com]
Sent: Friday, December 9, 2016 4:47 AM
To: Justin Lloyd <justinl@arena.net>
Cc: Dridi Boukelmoune <dridi@varni.sh>; Jason Price <japrice@gmail.com>; varnish-misc@varnish-cache.org
Subject: Re: Hit ratio dropped significantly after recent upgrades

Hi

Is this Varnish 4.1 ?

We have an unsolved bug open describing something very similar,
https://github.com/varnishcache/varnish-cache/issues/1859

On Thu, Dec 8, 2016 at 9:26 PM, Justin Lloyd <justinl@arena.net> wrote:
> I have been doing a lot of digging with varnishtop and varnishlog, and our VCL really didn’t change from this upgrade except as needed to migrate from Varnish 3 to 4. As I mentioned, our web app is MediaWiki so we don't control its caching requirements and recommendations, so what I'm trying to understand is whether the drop in the hit rate is due to some change(s) in MediaWiki's cookie and/or cache handling (e.g. via Cache-Control and Set-Cookie headers) or if something in Varnish changed that affects how it determines things. For example, a while back I had been using the Varnish hit and miss metrics in Collectd to calculate the ratio but apparently how those values are calculated with respect to purges changed so the hit ratio dropped, causing me to change the ratio calculation to use incoming connections and backend requests instead.
>
> That said, based on my varnishlog and varnishtop testing, I have a strong feeling that the biggest part of the problem is thumbnail images. If you look again at my VCL code (https://gist.github.com/Calygos/105957a997ea3bde6b8257a1f34bbd20), you can see I strip cookies from thumbnails so they should get cached, but I seem to get a lot more misses than hits when watching for thumbnail URL requests through varnishtop. I give 8 GB to Varnish and its process is typically only around 1 to 2 GB when previous it would be at 8 GB with frequent nukes and the occasional spike of expires that would temporarily eliminate nukes while memory filled up again. For what it's worth, I added the thumbnail stripping a couple of years ago due to a performance issue and it helped tremendously, so I don't know why it would become problematic with these latest upgrades.
>
> Justin
>
> -----Original Message-----
> From: Dridi Boukelmoune [mailto:dridi@varni.sh]
> Sent: Thursday, December 8, 2016 6:49 AM
> To: Jason Price <japrice@gmail.com>
> Cc: Justin Lloyd <justinl@arena.net>; varnish-misc@varnish-cache.org
> Subject: Re: Hit ratio dropped significantly after recent upgrades
>
> On Thu, Dec 8, 2016 at 2:35 PM, Jason Price <japrice@gmail.com> wrote:
>> I think we're going to need something a little more specific to go on.
>> That is a mile of changes all at once.
>
> Yes: varnishlog, coffee, and a lot of patience.
>
>> Finding a single request that should be cached, but isn't and
>> producing the varnish log for that request will probably help illuminate what's going on.
>
> There's currently no way to query the transaction log of a specific request:
> https://github.com/varnishcache/varnish-cache/issues/2154
>
> I'm just saying...
>
> Dridi
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc



--
Dag Haavi Finstad
Software Developer | Varnish Software
Mobile: +47 476 64 134
We Make Websites Fly!
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
RE: Hit ratio dropped significantly after recent upgrades [ In reply to ]
We may be seeing this issue though I can't confirm since I all of my servers are running 4.1.1 now, but I calculate my hit ratio based on the total number of Varnish connections and how many are sent to Apache, per server. Again, for reference from my first email, here's my Graphite dashboard function for generating my hit ratio graph:

asPercent(
diffSeries(
linux.hostname.varnish-default-connections.connections-received,
linux.hostname.varnish-default-backend.http_requests-requests
),
linux.hostname.varnish-default-connections.connections-received
)

I also look at varnishstat to confirm, here's about a 10 minute sample:

NAME CURRENT CHANGE AVERAGE AVG_10 AVG_100 AVG_1000
MAIN.client_req 7937622 0.00 47.00 5.27 83.94 60.92
MAIN.backend_req 6341302 51.91 37.00 38.97 53.59 43.80
MAIN.cache_hit 2267216 0.00 13.00 1.08 25.91 18.19
MAIN.cache_miss 4906402 0.00 29.00 1.50 43.93 37.26

So I don't think our problem is a cache_hit/miss value calculation issue since the number of client vs. backend requests is similar and very underperforming.

To reiterate on a point in another of my responses in this thread, I think it may be something about MediaWiki thumbnail images not being cached properly despite our current VCL in that regard not having changed from how it worked prior to the upgrade during which time we were seeing a very high (86%-ish) hit ratio from the same formula.

-----Original Message-----
From: varnish-misc-bounces+justinl=arena.net@varnish-cache.org [mailto:varnish-misc-bounces+justinl=arena.net@varnish-cache.org] On Behalf Of Justin Lloyd
Sent: Friday, December 9, 2016 5:44 AM
To: Dag Haavi Finstad <daghf@varnish-software.com>
Cc: varnish-misc@varnish-cache.org
Subject: RE: Hit ratio dropped significantly after recent upgrades

Hello! Yes, it is Varnish 4.1.1-1 from the Ubuntu 16.04 repo. I'll look at the issue you've linked and see if I can match it to our situation. Thanks!

Justin

-----Original Message-----
From: Dag Haavi Finstad [mailto:daghf@varnish-software.com]
Sent: Friday, December 9, 2016 4:47 AM
To: Justin Lloyd <justinl@arena.net>
Cc: Dridi Boukelmoune <dridi@varni.sh>; Jason Price <japrice@gmail.com>; varnish-misc@varnish-cache.org
Subject: Re: Hit ratio dropped significantly after recent upgrades

Hi

Is this Varnish 4.1 ?

We have an unsolved bug open describing something very similar,
https://github.com/varnishcache/varnish-cache/issues/1859

On Thu, Dec 8, 2016 at 9:26 PM, Justin Lloyd <justinl@arena.net> wrote:
> I have been doing a lot of digging with varnishtop and varnishlog, and our VCL really didn’t change from this upgrade except as needed to migrate from Varnish 3 to 4. As I mentioned, our web app is MediaWiki so we don't control its caching requirements and recommendations, so what I'm trying to understand is whether the drop in the hit rate is due to some change(s) in MediaWiki's cookie and/or cache handling (e.g. via Cache-Control and Set-Cookie headers) or if something in Varnish changed that affects how it determines things. For example, a while back I had been using the Varnish hit and miss metrics in Collectd to calculate the ratio but apparently how those values are calculated with respect to purges changed so the hit ratio dropped, causing me to change the ratio calculation to use incoming connections and backend requests instead.
>
> That said, based on my varnishlog and varnishtop testing, I have a strong feeling that the biggest part of the problem is thumbnail images. If you look again at my VCL code (https://gist.github.com/Calygos/105957a997ea3bde6b8257a1f34bbd20), you can see I strip cookies from thumbnails so they should get cached, but I seem to get a lot more misses than hits when watching for thumbnail URL requests through varnishtop. I give 8 GB to Varnish and its process is typically only around 1 to 2 GB when previous it would be at 8 GB with frequent nukes and the occasional spike of expires that would temporarily eliminate nukes while memory filled up again. For what it's worth, I added the thumbnail stripping a couple of years ago due to a performance issue and it helped tremendously, so I don't know why it would become problematic with these latest upgrades.
>
> Justin
>
> -----Original Message-----
> From: Dridi Boukelmoune [mailto:dridi@varni.sh]
> Sent: Thursday, December 8, 2016 6:49 AM
> To: Jason Price <japrice@gmail.com>
> Cc: Justin Lloyd <justinl@arena.net>; varnish-misc@varnish-cache.org
> Subject: Re: Hit ratio dropped significantly after recent upgrades
>
> On Thu, Dec 8, 2016 at 2:35 PM, Jason Price <japrice@gmail.com> wrote:
>> I think we're going to need something a little more specific to go on.
>> That is a mile of changes all at once.
>
> Yes: varnishlog, coffee, and a lot of patience.
>
>> Finding a single request that should be cached, but isn't and
>> producing the varnish log for that request will probably help illuminate what's going on.
>
> There's currently no way to query the transaction log of a specific request:
> https://github.com/varnishcache/varnish-cache/issues/2154
>
> I'm just saying...
>
> Dridi
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc



--
Dag Haavi Finstad
Software Developer | Varnish Software
Mobile: +47 476 64 134
We Make Websites Fly!
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Hit ratio dropped significantly after recent upgrades [ In reply to ]
> To reiterate on a point in another of my responses in this thread, I think it may be something about MediaWiki thumbnail images not being cached properly despite our current VCL in that regard not having changed from how it worked prior to the upgrade during which time we were seeing a very high (86%-ish) hit ratio from the same formula.

To reiterate on a point I made on a couple occasions, it's time to
give varnishlog a spin. Too much focus on VCL, and not enough on
what's happening.

Dridi

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
RE: Hit ratio dropped significantly after recent upgrades [ In reply to ]
I really am looking at what's happening as well. I have been looking at both varnishlog and varnishtop and I see a lot of thumbnail image requests being sent to the backend when there is still plenty of room for them in the cache, so even though there are a lot of thumbnail images, I shouldn't see so many backend requests for them. As I previously mentioned, I give Varnish 8 GB and it used to stay full (based on RSS usage and looking at nukes vs. expires) but now it hovers around only about 2 GB used. A related statistics is that there used to be 600-700k objects in Varnish (based on our graphs of MAIN.n_object via Collectd's varnish-default-struct.objects-object metric) but now there are only roughly 40-70k objects in Varnish at any given time. So it's definitely caching a lot fewer things than it was before the upgrade, and most of the requested URLs for requests that have cookies are for a lot of images and thumbnails. Images shouldn't be cached due to size and overall volume but thumbnails should, which is why I strip cookies from the thumbnails. These varnishtop commands break out /images and /images/thumb client requests, showing IMHO too many regular images being cached and nowhere near enough thumbnails:

# varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/" and not ReqURL ~ "/images/thumb"'

349.47 VCL_call HASH
349.47 VCL_call RECV
349.47 VCL_call DELIVER
207.22 VCL_call HIT
116.40 VCL_call MISS
116.30 VCL_call PASS

# varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/thumb"'

1859.60 VCL_call HASH
1859.60 VCL_call RECV
1859.60 VCL_call DELIVER
1424.83 VCL_call MISS
422.84 VCL_call HIT
218.82 VCL_call PASS

I'm still poking around trying to correlate caching of other types of URLs based on whether or not the requests have cookies, if Cache-Control gets returned, etc. but I just wanted to reply with this info. I do appreciate the responses I'm getting! :)


-----Original Message-----
From: Dridi Boukelmoune [mailto:dridi@varni.sh]
Sent: Friday, December 9, 2016 10:11 AM
To: Justin Lloyd <justinl@arena.net>
Cc: Dag Haavi Finstad <daghf@varnish-software.com>; varnish-misc@varnish-cache.org
Subject: Re: Hit ratio dropped significantly after recent upgrades

> To reiterate on a point in another of my responses in this thread, I think it may be something about MediaWiki thumbnail images not being cached properly despite our current VCL in that regard not having changed from how it worked prior to the upgrade during which time we were seeing a very high (86%-ish) hit ratio from the same formula.

To reiterate on a point I made on a couple occasions, it's time to give varnishlog a spin. Too much focus on VCL, and not enough on what's happening.

Dridi
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
RE: Hit ratio dropped significantly after recent upgrades [ In reply to ]
To follow up on my last email from Friday, at this point the problem boils down to one thing that I've not been able to determine: Why are far fewer things being cached now than before the upgrade?

1. Cookies don't seem to be the problem. Most appear to be Google Analytics (as opposed to session), which are being unset by vcl_recv.

2. varnishlog/varnishtop shows many thumbnail URLs being missed and virtually none are requested with a no-cache cache-control header. Is it possible to use these tools determine if they (or any URLs for that matter) are being cached following a miss-deliver sequence? There are about 1.5m thumbnail files totaling around 30 GB, which prior to the upgrades wasn't an issue, and I don't think it is now since there are only a few expires and purges per minute and no nukes at all. Varnish is only using about 2 GB out of the 8 GB allocated to it, where it used to use all 8 GB and have lots of nukes and far fewer expires, so it's not a memory constraint.

Could there be some other resource limitation I'm hitting without knowing it (nothing in any logs I've seen)? Everything else I could think of so far seems fine, e.g. open files, threads, tcp connections.


-----Original Message-----
From: varnish-misc-bounces+justinl=arena.net@varnish-cache.org [mailto:varnish-misc-bounces+justinl=arena.net@varnish-cache.org] On Behalf Of Justin Lloyd
Sent: Friday, December 9, 2016 11:19 AM
To: Dridi Boukelmoune <dridi@varni.sh>
Cc: varnish-misc@varnish-cache.org
Subject: RE: Hit ratio dropped significantly after recent upgrades

I really am looking at what's happening as well. I have been looking at both varnishlog and varnishtop and I see a lot of thumbnail image requests being sent to the backend when there is still plenty of room for them in the cache, so even though there are a lot of thumbnail images, I shouldn't see so many backend requests for them. As I previously mentioned, I give Varnish 8 GB and it used to stay full (based on RSS usage and looking at nukes vs. expires) but now it hovers around only about 2 GB used. A related statistics is that there used to be 600-700k objects in Varnish (based on our graphs of MAIN.n_object via Collectd's varnish-default-struct.objects-object metric) but now there are only roughly 40-70k objects in Varnish at any given time. So it's definitely caching a lot fewer things than it was before the upgrade, and most of the requested URLs for requests that have cookies are for a lot of images and thumbnails. Images shouldn't be cached due to size and overall volume but thumbnails should, which is why I strip cookies from the thumbnails. These varnishtop commands break out /images and /images/thumb client requests, showing IMHO too many regular images being cached and nowhere near enough thumbnails:

# varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/" and not ReqURL ~ "/images/thumb"'

349.47 VCL_call HASH
349.47 VCL_call RECV
349.47 VCL_call DELIVER
207.22 VCL_call HIT
116.40 VCL_call MISS
116.30 VCL_call PASS

# varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/thumb"'

1859.60 VCL_call HASH
1859.60 VCL_call RECV
1859.60 VCL_call DELIVER
1424.83 VCL_call MISS
422.84 VCL_call HIT
218.82 VCL_call PASS

I'm still poking around trying to correlate caching of other types of URLs based on whether or not the requests have cookies, if Cache-Control gets returned, etc. but I just wanted to reply with this info. I do appreciate the responses I'm getting! :)


-----Original Message-----
From: Dridi Boukelmoune [mailto:dridi@varni.sh]
Sent: Friday, December 9, 2016 10:11 AM
To: Justin Lloyd <justinl@arena.net>
Cc: Dag Haavi Finstad <daghf@varnish-software.com>; varnish-misc@varnish-cache.org
Subject: Re: Hit ratio dropped significantly after recent upgrades

> To reiterate on a point in another of my responses in this thread, I think it may be something about MediaWiki thumbnail images not being cached properly despite our current VCL in that regard not having changed from how it worked prior to the upgrade during which time we were seeing a very high (86%-ish) hit ratio from the same formula.

To reiterate on a point I made on a couple occasions, it's time to give varnishlog a spin. Too much focus on VCL, and not enough on what's happening.

Dridi
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Hit ratio dropped significantly after recent upgrades [ In reply to ]
> Could there be some other resource limitation I'm hitting without knowing it (nothing in any logs I've seen)? Everything else I could think of so far seems fine, e.g. open files, threads, tcp connections.

We can't tell if we can't have a look at a transaction's log so please
find one thumbnail's req+bereq log with varnishlog and post it on the
list. Check that it doesn't contain sensitive information and otherwise
sanitize it before posting.

Dridi

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Hit ratio dropped significantly after recent upgrades [ In reply to ]
Can you pastebin the req+bereq transactions in varnishlog, related to such
a miss?

--
Guillaume Quintard

On Tue, Dec 13, 2016 at 3:37 AM, Justin Lloyd <justinl@arena.net> wrote:

> To follow up on my last email from Friday, at this point the problem boils
> down to one thing that I've not been able to determine: Why are far fewer
> things being cached now than before the upgrade?
>
> 1. Cookies don't seem to be the problem. Most appear to be Google
> Analytics (as opposed to session), which are being unset by vcl_recv.
>
> 2. varnishlog/varnishtop shows many thumbnail URLs being missed and
> virtually none are requested with a no-cache cache-control header. Is it
> possible to use these tools determine if they (or any URLs for that matter)
> are being cached following a miss-deliver sequence? There are about 1.5m
> thumbnail files totaling around 30 GB, which prior to the upgrades wasn't
> an issue, and I don't think it is now since there are only a few expires
> and purges per minute and no nukes at all. Varnish is only using about 2 GB
> out of the 8 GB allocated to it, where it used to use all 8 GB and have
> lots of nukes and far fewer expires, so it's not a memory constraint.
>
> Could there be some other resource limitation I'm hitting without knowing
> it (nothing in any logs I've seen)? Everything else I could think of so far
> seems fine, e.g. open files, threads, tcp connections.
>
>
> -----Original Message-----
> From: varnish-misc-bounces+justinl=arena.net@varnish-cache.org [mailto:
> varnish-misc-bounces+justinl=arena.net@varnish-cache.org] On Behalf Of
> Justin Lloyd
> Sent: Friday, December 9, 2016 11:19 AM
> To: Dridi Boukelmoune <dridi@varni.sh>
> Cc: varnish-misc@varnish-cache.org
> Subject: RE: Hit ratio dropped significantly after recent upgrades
>
> I really am looking at what's happening as well. I have been looking at
> both varnishlog and varnishtop and I see a lot of thumbnail image requests
> being sent to the backend when there is still plenty of room for them in
> the cache, so even though there are a lot of thumbnail images, I shouldn't
> see so many backend requests for them. As I previously mentioned, I give
> Varnish 8 GB and it used to stay full (based on RSS usage and looking at
> nukes vs. expires) but now it hovers around only about 2 GB used. A related
> statistics is that there used to be 600-700k objects in Varnish (based on
> our graphs of MAIN.n_object via Collectd's varnish-default-struct.objects-object
> metric) but now there are only roughly 40-70k objects in Varnish at any
> given time. So it's definitely caching a lot fewer things than it was
> before the upgrade, and most of the requested URLs for requests that have
> cookies are for a lot of images and thumbnails. Images shouldn't be cached
> due to size and overall volume but thumbnails should, which is why I strip
> cookies from the thumbnails. These varnishtop commands break out /images
> and /images/thumb client requests, showing IMHO too many regular images
> being cached and nowhere near enough thumbnails:
>
> # varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/" and not ReqURL ~
> "/images/thumb"'
>
> 349.47 VCL_call HASH
> 349.47 VCL_call RECV
> 349.47 VCL_call DELIVER
> 207.22 VCL_call HIT
> 116.40 VCL_call MISS
> 116.30 VCL_call PASS
>
> # varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/thumb"'
>
> 1859.60 VCL_call HASH
> 1859.60 VCL_call RECV
> 1859.60 VCL_call DELIVER
> 1424.83 VCL_call MISS
> 422.84 VCL_call HIT
> 218.82 VCL_call PASS
>
> I'm still poking around trying to correlate caching of other types of URLs
> based on whether or not the requests have cookies, if Cache-Control gets
> returned, etc. but I just wanted to reply with this info. I do appreciate
> the responses I'm getting! :)
>
>
> -----Original Message-----
> From: Dridi Boukelmoune [mailto:dridi@varni.sh]
> Sent: Friday, December 9, 2016 10:11 AM
> To: Justin Lloyd <justinl@arena.net>
> Cc: Dag Haavi Finstad <daghf@varnish-software.com>;
> varnish-misc@varnish-cache.org
> Subject: Re: Hit ratio dropped significantly after recent upgrades
>
> > To reiterate on a point in another of my responses in this thread, I
> think it may be something about MediaWiki thumbnail images not being cached
> properly despite our current VCL in that regard not having changed from how
> it worked prior to the upgrade during which time we were seeing a very high
> (86%-ish) hit ratio from the same formula.
>
> To reiterate on a point I made on a couple occasions, it's time to give
> varnishlog a spin. Too much focus on VCL, and not enough on what's
> happening.
>
> Dridi
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
RE: Hit ratio dropped significantly after recent upgrades [ In reply to ]
Here’s a typical varnishlog miss for a thumbnail image, appropriately sanitized. I can provide more if it helps

https://gist.github.com/Calygos/ca7906da005569046a7031d1fcaa6372


From: Guillaume Quintard [mailto:guillaume@varnish-software.com]
Sent: Tuesday, December 13, 2016 12:17 AM
To: Justin Lloyd <justinl@arena.net>
Cc: Dridi Boukelmoune <dridi@varni.sh>; varnish-misc@varnish-cache.org
Subject: Re: Hit ratio dropped significantly after recent upgrades

Can you pastebin the req+bereq transactions in varnishlog, related to such a miss?

--
Guillaume Quintard

On Tue, Dec 13, 2016 at 3:37 AM, Justin Lloyd <justinl@arena.net<mailto:justinl@arena.net>> wrote:
To follow up on my last email from Friday, at this point the problem boils down to one thing that I've not been able to determine: Why are far fewer things being cached now than before the upgrade?

1. Cookies don't seem to be the problem. Most appear to be Google Analytics (as opposed to session), which are being unset by vcl_recv.

2. varnishlog/varnishtop shows many thumbnail URLs being missed and virtually none are requested with a no-cache cache-control header. Is it possible to use these tools determine if they (or any URLs for that matter) are being cached following a miss-deliver sequence? There are about 1.5m thumbnail files totaling around 30 GB, which prior to the upgrades wasn't an issue, and I don't think it is now since there are only a few expires and purges per minute and no nukes at all. Varnish is only using about 2 GB out of the 8 GB allocated to it, where it used to use all 8 GB and have lots of nukes and far fewer expires, so it's not a memory constraint.

Could there be some other resource limitation I'm hitting without knowing it (nothing in any logs I've seen)? Everything else I could think of so far seems fine, e.g. open files, threads, tcp connections.


-----Original Message-----
From: varnish-misc-bounces+justinl=arena.net@varnish-cache.org<mailto:arena.net@varnish-cache.org> [mailto:varnish-misc-bounces+justinl<mailto:varnish-misc-bounces%2Bjustinl>=arena.net@varnish-cache.org<mailto:arena.net@varnish-cache.org>] On Behalf Of Justin Lloyd
Sent: Friday, December 9, 2016 11:19 AM
To: Dridi Boukelmoune <dridi@varni.sh<mailto:dridi@varni.sh>>
Cc: varnish-misc@varnish-cache.org<mailto:varnish-misc@varnish-cache.org>
Subject: RE: Hit ratio dropped significantly after recent upgrades

I really am looking at what's happening as well. I have been looking at both varnishlog and varnishtop and I see a lot of thumbnail image requests being sent to the backend when there is still plenty of room for them in the cache, so even though there are a lot of thumbnail images, I shouldn't see so many backend requests for them. As I previously mentioned, I give Varnish 8 GB and it used to stay full (based on RSS usage and looking at nukes vs. expires) but now it hovers around only about 2 GB used. A related statistics is that there used to be 600-700k objects in Varnish (based on our graphs of MAIN.n_object via Collectd's varnish-default-struct.objects-object metric) but now there are only roughly 40-70k objects in Varnish at any given time. So it's definitely caching a lot fewer things than it was before the upgrade, and most of the requested URLs for requests that have cookies are for a lot of images and thumbnails. Images shouldn't be cached due to size and overall volume but thumbnails should, which is why I strip cookies from the thumbnails. These varnishtop commands break out /images and /images/thumb client requests, showing IMHO too many regular images being cached and nowhere near enough thumbnails:

# varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/" and not ReqURL ~ "/images/thumb"'

349.47 VCL_call HASH
349.47 VCL_call RECV
349.47 VCL_call DELIVER
207.22 VCL_call HIT
116.40 VCL_call MISS
116.30 VCL_call PASS

# varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/thumb"'

1859.60 VCL_call HASH
1859.60 VCL_call RECV
1859.60 VCL_call DELIVER
1424.83 VCL_call MISS
422.84 VCL_call HIT
218.82 VCL_call PASS

I'm still poking around trying to correlate caching of other types of URLs based on whether or not the requests have cookies, if Cache-Control gets returned, etc. but I just wanted to reply with this info. I do appreciate the responses I'm getting! :)


-----Original Message-----
From: Dridi Boukelmoune [mailto:dridi@varni.sh<mailto:dridi@varni.sh>]
Sent: Friday, December 9, 2016 10:11 AM
To: Justin Lloyd <justinl@arena.net<mailto:justinl@arena.net>>
Cc: Dag Haavi Finstad <daghf@varnish-software.com<mailto:daghf@varnish-software.com>>; varnish-misc@varnish-cache.org<mailto:varnish-misc@varnish-cache.org>
Subject: Re: Hit ratio dropped significantly after recent upgrades

> To reiterate on a point in another of my responses in this thread, I think it may be something about MediaWiki thumbnail images not being cached properly despite our current VCL in that regard not having changed from how it worked prior to the upgrade during which time we were seeing a very high (86%-ish) hit ratio from the same formula.

To reiterate on a point I made on a couple occasions, it's time to give varnishlog a spin. Too much focus on VCL, and not enough on what's happening.

Dridi
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org<mailto:varnish-misc@varnish-cache.org>
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org<mailto:varnish-misc@varnish-cache.org>
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Hit ratio dropped significantly after recent upgrades [ In reply to ]
On Tue, Dec 13, 2016 at 3:23 PM, Justin Lloyd <justinl@arena.net> wrote:
> Here’s a typical varnishlog miss for a thumbnail image, appropriately
> sanitized. I can provide more if it helps
>
> https://gist.github.com/Calygos/ca7906da005569046a7031d1fcaa6372

After a quick look at the log, I was able to find which line [1] of
VCL is responsible for this behavior. Your backend doesn't say that
the resource is public, so you may have to configure mediawiki to mark
thumbnails as public. This is not something I can help with.

Cheers

[1] https://gist.github.com/Calygos/105957a997ea3bde6b8257a1f34bbd20#file-wiki-vcl-L144

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
RE: Hit ratio dropped significantly after recent upgrades [ In reply to ]
Hmm, interesting. I've never made the connection to that line, which is part of the recommended Varnish configuration for MediaWiki[1], but it's not been a problem in the past as far as I know. I'll definitely investigate this more deeply. Thank you!

[1] https://www.mediawiki.org/wiki/Manual:Varnish_caching


-----Original Message-----
From: Dridi Boukelmoune [mailto:dridi@varni.sh]
Sent: Tuesday, December 13, 2016 6:40 AM
To: Justin Lloyd <justinl@arena.net>
Cc: Guillaume Quintard <guillaume@varnish-software.com>; varnish-misc@varnish-cache.org
Subject: Re: Hit ratio dropped significantly after recent upgrades

On Tue, Dec 13, 2016 at 3:23 PM, Justin Lloyd <justinl@arena.net> wrote:
> Here’s a typical varnishlog miss for a thumbnail image, appropriately
> sanitized. I can provide more if it helps
>
> https://gist.github.com/Calygos/ca7906da005569046a7031d1fcaa6372

After a quick look at the log, I was able to find which line [1] of VCL is responsible for this behavior. Your backend doesn't say that the resource is public, so you may have to configure mediawiki to mark thumbnails as public. This is not something I can help with.

Cheers

[1] https://gist.github.com/Calygos/105957a997ea3bde6b8257a1f34bbd20#file-wiki-vcl-L144
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
RE: Hit ratio dropped significantly after recent upgrades [ In reply to ]
Actually, unless I'm missing something, that line would only matter for requests with Authorization headers, which none of our responses have since we don’t use http authentication.

-----Original Message-----
From: Justin Lloyd
Sent: Tuesday, December 13, 2016 6:47 AM
To: 'Dridi Boukelmoune' <dridi@varni.sh>
Cc: Guillaume Quintard <guillaume@varnish-software.com>; varnish-misc@varnish-cache.org
Subject: RE: Hit ratio dropped significantly after recent upgrades

Hmm, interesting. I've never made the connection to that line, which is part of the recommended Varnish configuration for MediaWiki[1], but it's not been a problem in the past as far as I know. I'll definitely investigate this more deeply. Thank you!

[1] https://www.mediawiki.org/wiki/Manual:Varnish_caching


-----Original Message-----
From: Dridi Boukelmoune [mailto:dridi@varni.sh]
Sent: Tuesday, December 13, 2016 6:40 AM
To: Justin Lloyd <justinl@arena.net>
Cc: Guillaume Quintard <guillaume@varnish-software.com>; varnish-misc@varnish-cache.org
Subject: Re: Hit ratio dropped significantly after recent upgrades

On Tue, Dec 13, 2016 at 3:23 PM, Justin Lloyd <justinl@arena.net> wrote:
> Here’s a typical varnishlog miss for a thumbnail image, appropriately
> sanitized. I can provide more if it helps
>
> https://gist.github.com/Calygos/ca7906da005569046a7031d1fcaa6372

After a quick look at the log, I was able to find which line [1] of VCL is responsible for this behavior. Your backend doesn't say that the resource is public, so you may have to configure mediawiki to mark thumbnails as public. This is not something I can help with.

Cheers

[1] https://gist.github.com/Calygos/105957a997ea3bde6b8257a1f34bbd20#file-wiki-vcl-L144
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Hit ratio dropped significantly after recent upgrades [ In reply to ]
On Tue, Dec 13, 2016 at 4:01 PM, Justin Lloyd <justinl@arena.net> wrote:
> Actually, unless I'm missing something, that line would only matter for requests with Authorization headers, which none of our responses have since we don’t use http authentication.

I still have trouble getting the hang of reading. And I had seen this
one but for some reason missed it...

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Hit ratio dropped significantly after recent upgrades [ In reply to ]
> I still have trouble getting the hang of reading. And I had seen this
> one but for some reason missed it...

Actually, this looks like the response is properly cached, otherwise
it'd be using the Transient storage.

Dridi

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
RE: Hit ratio dropped significantly after recent upgrades [ In reply to ]
That's really the crux of my problem, as I tried to summarize in my last big email though it was longer than it probably should have been. Is it possible to determine from varnishlog whether/which requests are being missed and fetched but not cached? There's plenty of room in the cache, so I'm at a loss to determine why more stuff isn't being or staying cached long enough to result in a good hit rate like I saw prior to my recent upgrades.

I get the feeling that this will turn out to be either something very simple in the Varnish and/or MediaWiki configuration or else a bug somewhere, though I'd lean towards the former as being more likely.

-----Original Message-----
From: Dridi Boukelmoune [mailto:dridi@varni.sh]
Sent: Tuesday, December 13, 2016 7:13 AM
To: Justin Lloyd <justinl@arena.net>
Cc: Guillaume Quintard <guillaume@varnish-software.com>; varnish-misc@varnish-cache.org
Subject: Re: Hit ratio dropped significantly after recent upgrades

> I still have trouble getting the hang of reading. And I had seen this
> one but for some reason missed it...

Actually, this looks like the response is properly cached, otherwise it'd be using the Transient storage.

Dridi
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: Hit ratio dropped significantly after recent upgrades [ In reply to ]
On Tue, Dec 13, 2016 at 4:20 PM, Justin Lloyd <justinl@arena.net> wrote:
> That's really the crux of my problem, as I tried to summarize in my last big email though it was longer than it probably should have been. Is it possible to determine from varnishlog whether/which requests are being missed and fetched but not cached?

There are more things you can see with varnishlog like hashes for
instance, but you could start by logging markers in your VCL to see
which branches you are taking.

Example:

if (beresp.ttl <= 0s) {
std.log("negative ttl");
set beresp.uncacheable = true;
return (deliver);
}

And do the same for all branches, and see which code is executed.

> I get the feeling that this will turn out to be either something very simple in the Varnish and/or MediaWiki configuration or else a bug somewhere, though I'd lean towards the former as being more likely.

If you can reproduce it in lab conditions where you are the only user,
look for a transaction containing:

- VCL_return hash
- VCL_call HASH
- VCL_return lookup
- VCL_call PASS

And then look for the previous transaction on the same resource.

Dridi

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
RE: Hit ratio dropped significantly after recent upgrades [ In reply to ]
Yep! Getting tons of uncacheable responses, including lots of /images/thumb URLs, due to negative TTLs. Not sure yet what the best way to handle this is since I'm sure this is done for a reason, so I'll need to dig back in to the MW code and possibly hit up the devs on IRC again. Thanks! I'll let you know how this goes.

Justin

-----Original Message-----
From: Dridi Boukelmoune [mailto:dridi@varni.sh]
Sent: Tuesday, December 13, 2016 7:42 AM
To: Justin Lloyd <justinl@arena.net>
Cc: Guillaume Quintard <guillaume@varnish-software.com>; varnish-misc@varnish-cache.org
Subject: Re: Hit ratio dropped significantly after recent upgrades

On Tue, Dec 13, 2016 at 4:20 PM, Justin Lloyd <justinl@arena.net> wrote:
> That's really the crux of my problem, as I tried to summarize in my last big email though it was longer than it probably should have been. Is it possible to determine from varnishlog whether/which requests are being missed and fetched but not cached?

There are more things you can see with varnishlog like hashes for instance, but you could start by logging markers in your VCL to see which branches you are taking.

Example:

if (beresp.ttl <= 0s) {
std.log("negative ttl");
set beresp.uncacheable = true;
return (deliver);
}

And do the same for all branches, and see which code is executed.

> I get the feeling that this will turn out to be either something very simple in the Varnish and/or MediaWiki configuration or else a bug somewhere, though I'd lean towards the former as being more likely.

If you can reproduce it in lab conditions where you are the only user, look for a transaction containing:

- VCL_return hash
- VCL_call HASH
- VCL_return lookup
- VCL_call PASS

And then look for the previous transaction on the same resource.

Dridi
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
RE: Hit ratio dropped significantly after recent upgrades [ In reply to ]
The log shows that the fetched object is introduced into the cache with
both TTL and grace time set to 120s each:

-- VCL_call BACKEND_RESPONSE
-- TTL VCL 120 120 0 1481637557
-- VCL_return deliver
-- Storage malloc s0

It would be interesting to see if a subsequent request to the same URL
within less than 4 minutes would yield another miss or not.

Regards,

Florian


Am 13. Dezember 2016 15:27:16 schrieb Justin Lloyd <justinl@arena.net>:

> Here’s a typical varnishlog miss for a thumbnail image, appropriately
> sanitized. I can provide more if it helps
>
> https://gist.github.com/Calygos/ca7906da005569046a7031d1fcaa6372
>
>
> From: Guillaume Quintard [mailto:guillaume@varnish-software.com]
> Sent: Tuesday, December 13, 2016 12:17 AM
> To: Justin Lloyd <justinl@arena.net>
> Cc: Dridi Boukelmoune <dridi@varni.sh>; varnish-misc@varnish-cache.org
> Subject: Re: Hit ratio dropped significantly after recent upgrades
>
> Can you pastebin the req+bereq transactions in varnishlog, related to such
> a miss?
>
> --
> Guillaume Quintard
>
> On Tue, Dec 13, 2016 at 3:37 AM, Justin Lloyd
> <justinl@arena.net<mailto:justinl@arena.net>> wrote:
> To follow up on my last email from Friday, at this point the problem boils
> down to one thing that I've not been able to determine: Why are far fewer
> things being cached now than before the upgrade?
>
> 1. Cookies don't seem to be the problem. Most appear to be Google Analytics
> (as opposed to session), which are being unset by vcl_recv.
>
> 2. varnishlog/varnishtop shows many thumbnail URLs being missed and
> virtually none are requested with a no-cache cache-control header. Is it
> possible to use these tools determine if they (or any URLs for that matter)
> are being cached following a miss-deliver sequence? There are about 1.5m
> thumbnail files totaling around 30 GB, which prior to the upgrades wasn't
> an issue, and I don't think it is now since there are only a few expires
> and purges per minute and no nukes at all. Varnish is only using about 2 GB
> out of the 8 GB allocated to it, where it used to use all 8 GB and have
> lots of nukes and far fewer expires, so it's not a memory constraint.
>
> Could there be some other resource limitation I'm hitting without knowing
> it (nothing in any logs I've seen)? Everything else I could think of so far
> seems fine, e.g. open files, threads, tcp connections.
>
>
> -----Original Message-----
> From:
> varnish-misc-bounces+justinl=arena.net@varnish-cache.org<mailto:arena.net@varnish-cache.org>
> [mailto:varnish-misc-bounces+justinl<mailto:varnish-misc-bounces%2Bjustinl>=arena.net@varnish-cache.org<mailto:arena.net@varnish-cache.org>]
> On Behalf Of Justin Lloyd
> Sent: Friday, December 9, 2016 11:19 AM
> To: Dridi Boukelmoune <dridi@varni.sh<mailto:dridi@varni.sh>>
> Cc: varnish-misc@varnish-cache.org<mailto:varnish-misc@varnish-cache.org>
> Subject: RE: Hit ratio dropped significantly after recent upgrades
>
> I really am looking at what's happening as well. I have been looking at
> both varnishlog and varnishtop and I see a lot of thumbnail image requests
> being sent to the backend when there is still plenty of room for them in
> the cache, so even though there are a lot of thumbnail images, I shouldn't
> see so many backend requests for them. As I previously mentioned, I give
> Varnish 8 GB and it used to stay full (based on RSS usage and looking at
> nukes vs. expires) but now it hovers around only about 2 GB used. A related
> statistics is that there used to be 600-700k objects in Varnish (based on
> our graphs of MAIN.n_object via Collectd's
> varnish-default-struct.objects-object metric) but now there are only
> roughly 40-70k objects in Varnish at any given time. So it's definitely
> caching a lot fewer things than it was before the upgrade, and most of the
> requested URLs for requests that have cookies are for a lot of images and
> thumbnails. Images shouldn't be cached due to size and overall volume but
> thumbnails should, which is why I strip cookies from the thumbnails. These
> varnishtop commands break out /images and /images/thumb client requests,
> showing IMHO too many regular images being cached and nowhere near enough
> thumbnails:
>
> # varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/" and not ReqURL ~
> "/images/thumb"'
>
> 349.47 VCL_call HASH
> 349.47 VCL_call RECV
> 349.47 VCL_call DELIVER
> 207.22 VCL_call HIT
> 116.40 VCL_call MISS
> 116.30 VCL_call PASS
>
> # varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/thumb"'
>
> 1859.60 VCL_call HASH
> 1859.60 VCL_call RECV
> 1859.60 VCL_call DELIVER
> 1424.83 VCL_call MISS
> 422.84 VCL_call HIT
> 218.82 VCL_call PASS
>
> I'm still poking around trying to correlate caching of other types of URLs
> based on whether or not the requests have cookies, if Cache-Control gets
> returned, etc. but I just wanted to reply with this info. I do appreciate
> the responses I'm getting! :)
>
>
> -----Original Message-----
> From: Dridi Boukelmoune [mailto:dridi@varni.sh<mailto:dridi@varni.sh>]
> Sent: Friday, December 9, 2016 10:11 AM
> To: Justin Lloyd <justinl@arena.net<mailto:justinl@arena.net>>
> Cc: Dag Haavi Finstad
> <daghf@varnish-software.com<mailto:daghf@varnish-software.com>>;
> varnish-misc@varnish-cache.org<mailto:varnish-misc@varnish-cache.org>
> Subject: Re: Hit ratio dropped significantly after recent upgrades
>
>> To reiterate on a point in another of my responses in this thread, I think
>> it may be something about MediaWiki thumbnail images not being cached
>> properly despite our current VCL in that regard not having changed from how
>> it worked prior to the upgrade during which time we were seeing a very high
>> (86%-ish) hit ratio from the same formula.
>
> To reiterate on a point I made on a couple occasions, it's time to give
> varnishlog a spin. Too much focus on VCL, and not enough on what's happening.
>
> Dridi
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org<mailto:varnish-misc@varnish-cache.org>
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org<mailto:varnish-misc@varnish-cache.org>
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
>
>
> ----------
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc



_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
RE: Hit ratio dropped significantly after recent upgrades [ In reply to ]
Based on this conversation, I added a 1h TTL to thumbnail images in vcl_backend_response and that has gotten my hit ratio up to about 55-60% depending on how you calculate it (hit/miss values vs. frontend/backend connections), with up to about 72k objects in memory, up from about 60k max before, though before the upgrades it was more like 600-700k objects.

It's been an hour now and I'm seeing a spike in expired objects and a drop in the number of objects, so I'll probably increase the TTL until I find a sweet spot. I don't think there's any risk since thumbnails don't change often, so even a max of 48h may be reasonable. So I'll do more testing today and see how things go.

Thanks!


-----Original Message-----
From: Florian Tham [mailto:fgtham@gmail.com]
Sent: Tuesday, December 13, 2016 12:13 PM
To: Justin Lloyd <justinl@arena.net>
Cc: varnish-misc@varnish-cache.org
Subject: RE: Hit ratio dropped significantly after recent upgrades

The log shows that the fetched object is introduced into the cache with both TTL and grace time set to 120s each:

-- VCL_call BACKEND_RESPONSE
-- TTL VCL 120 120 0 1481637557
-- VCL_return deliver
-- Storage malloc s0

It would be interesting to see if a subsequent request to the same URL within less than 4 minutes would yield another miss or not.

Regards,

Florian


Am 13. Dezember 2016 15:27:16 schrieb Justin Lloyd <justinl@arena.net>:

> Here’s a typical varnishlog miss for a thumbnail image, appropriately
> sanitized. I can provide more if it helps
>
> https://gist.github.com/Calygos/ca7906da005569046a7031d1fcaa6372
>
>
> From: Guillaume Quintard [mailto:guillaume@varnish-software.com]
> Sent: Tuesday, December 13, 2016 12:17 AM
> To: Justin Lloyd <justinl@arena.net>
> Cc: Dridi Boukelmoune <dridi@varni.sh>; varnish-misc@varnish-cache.org
> Subject: Re: Hit ratio dropped significantly after recent upgrades
>
> Can you pastebin the req+bereq transactions in varnishlog, related to
> such a miss?
>
> --
> Guillaume Quintard
>
> On Tue, Dec 13, 2016 at 3:37 AM, Justin Lloyd
> <justinl@arena.net<mailto:justinl@arena.net>> wrote:
> To follow up on my last email from Friday, at this point the problem
> boils down to one thing that I've not been able to determine: Why are
> far fewer things being cached now than before the upgrade?
>
> 1. Cookies don't seem to be the problem. Most appear to be Google
> Analytics (as opposed to session), which are being unset by vcl_recv.
>
> 2. varnishlog/varnishtop shows many thumbnail URLs being missed and
> virtually none are requested with a no-cache cache-control header. Is
> it possible to use these tools determine if they (or any URLs for that
> matter) are being cached following a miss-deliver sequence? There are
> about 1.5m thumbnail files totaling around 30 GB, which prior to the
> upgrades wasn't an issue, and I don't think it is now since there are
> only a few expires and purges per minute and no nukes at all. Varnish
> is only using about 2 GB out of the 8 GB allocated to it, where it
> used to use all 8 GB and have lots of nukes and far fewer expires, so it's not a memory constraint.
>
> Could there be some other resource limitation I'm hitting without
> knowing it (nothing in any logs I've seen)? Everything else I could
> think of so far seems fine, e.g. open files, threads, tcp connections.
>
>
> -----Original Message-----
> From:
> varnish-misc-bounces+justinl=arena.net@varnish-cache.org<mailto:arena.
> varnish-misc-bounces+net@varnish-cache.org>
> [mailto:varnish-misc-bounces+justinl<mailto:varnish-misc-bounces%2Bjus
> tinl>=arena.net@varnish-cache.org<mailto:arena.net@varnish-cache.org>]
> On Behalf Of Justin Lloyd
> Sent: Friday, December 9, 2016 11:19 AM
> To: Dridi Boukelmoune <dridi@varni.sh<mailto:dridi@varni.sh>>
> Cc:
> varnish-misc@varnish-cache.org<mailto:varnish-misc@varnish-cache.org>
> Subject: RE: Hit ratio dropped significantly after recent upgrades
>
> I really am looking at what's happening as well. I have been looking
> at both varnishlog and varnishtop and I see a lot of thumbnail image
> requests being sent to the backend when there is still plenty of room
> for them in the cache, so even though there are a lot of thumbnail
> images, I shouldn't see so many backend requests for them. As I
> previously mentioned, I give Varnish 8 GB and it used to stay full
> (based on RSS usage and looking at nukes vs. expires) but now it
> hovers around only about 2 GB used. A related statistics is that there
> used to be 600-700k objects in Varnish (based on our graphs of
> MAIN.n_object via Collectd's varnish-default-struct.objects-object
> metric) but now there are only roughly 40-70k objects in Varnish at
> any given time. So it's definitely caching a lot fewer things than it
> was before the upgrade, and most of the requested URLs for requests
> that have cookies are for a lot of images and thumbnails. Images
> shouldn't be cached due to size and overall volume but thumbnails
> should, which is why I strip cookies from the thumbnails. These
> varnishtop commands break out /images and /images/thumb client
> requests, showing IMHO too many regular images being cached and
> nowhere near enough
> thumbnails:
>
> # varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/" and not ReqURL ~
> "/images/thumb"'
>
> 349.47 VCL_call HASH
> 349.47 VCL_call RECV
> 349.47 VCL_call DELIVER
> 207.22 VCL_call HIT
> 116.40 VCL_call MISS
> 116.30 VCL_call PASS
>
> # varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/thumb"'
>
> 1859.60 VCL_call HASH
> 1859.60 VCL_call RECV
> 1859.60 VCL_call DELIVER
> 1424.83 VCL_call MISS
> 422.84 VCL_call HIT
> 218.82 VCL_call PASS
>
> I'm still poking around trying to correlate caching of other types of
> URLs based on whether or not the requests have cookies, if
> Cache-Control gets returned, etc. but I just wanted to reply with this
> info. I do appreciate the responses I'm getting! :)
>
>
> -----Original Message-----
> From: Dridi Boukelmoune [mailto:dridi@varni.sh<mailto:dridi@varni.sh>]
> Sent: Friday, December 9, 2016 10:11 AM
> To: Justin Lloyd <justinl@arena.net<mailto:justinl@arena.net>>
> Cc: Dag Haavi Finstad
> <daghf@varnish-software.com<mailto:daghf@varnish-software.com>>;
> varnish-misc@varnish-cache.org<mailto:varnish-misc@varnish-cache.org>
> Subject: Re: Hit ratio dropped significantly after recent upgrades
>
>> To reiterate on a point in another of my responses in this thread, I
>> think it may be something about MediaWiki thumbnail images not being
>> cached properly despite our current VCL in that regard not having
>> changed from how it worked prior to the upgrade during which time we
>> were seeing a very high
>> (86%-ish) hit ratio from the same formula.
>
> To reiterate on a point I made on a couple occasions, it's time to
> give varnishlog a spin. Too much focus on VCL, and not enough on what's happening.
>
> Dridi
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org<mailto:varnish-misc@varnish-cache.org>
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org<mailto:varnish-misc@varnish-cache.org>
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
>
>
> ----------
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc


_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc

1 2  View All