Mailing List Archive

RE: Hit ratio dropped significantly after recent upgrades
After increasing the TTL for thumbnail images to 4 hours, the hit ratio rose to 65-70%. Objects in memory climbed to around 150k, then tapered off (as expected, since lots of expirations began after 4 hours) and slowly dipped back to around 100k before starting on the upswing again. I'm continuing to test longer TTLs: I've now set it to 24h, and we'll see whether any problems get reported in case that's too long. At any rate, we definitely appear to have found the smoking gun, and I know where to tinker to better optimize things.

I'm not sure why this changed with the upgrades, whether it was something in MediaWiki or Varnish, but at least I know where to spend cycles on optimizations.

Thank you all very much for the help!

Justin


-----Original Message-----
From: varnish-misc-bounces+justinl=arena.net@varnish-cache.org [mailto:varnish-misc-bounces+justinl=arena.net@varnish-cache.org] On Behalf Of Justin Lloyd
Sent: Tuesday, December 13, 2016 12:19 PM
To: Florian Tham <fgtham@gmail.com>
Cc: varnish-misc@varnish-cache.org
Subject: RE: Hit ratio dropped significantly after recent upgrades

Based on this conversation, I added a 1h TTL to thumbnail images in vcl_backend_response, and that has gotten my hit ratio up to about 55-60%, depending on how you calculate it (hit/miss values vs. frontend/backend connections). There are now up to about 72k objects in memory, up from a max of about 60k before the change, though before the upgrades it was more like 600-700k objects.
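
For anyone comparing the two ways of calculating the ratio, here is a minimal Python sketch. The function names are made up for illustration, and the second formula is only one plausible reading of "frontend/backend connections". The first call uses the /images/thumb HIT and MISS figures from the varnishtop output quoted further down this thread; the numbers in the second call are hypothetical.

```python
def hit_ratio(hits: float, misses: float) -> float:
    """Share of cache lookups answered from cache; passes are not counted."""
    lookups = hits + misses
    return hits / lookups if lookups else 0.0


def offload_ratio(frontend_count: float, backend_count: float) -> float:
    """Share of frontend traffic that did not need a trip to the backend."""
    if not frontend_count:
        return 0.0
    return 1.0 - backend_count / frontend_count


# HIT and MISS figures for /images/thumb, taken from the varnishtop
# output quoted later in this thread (before any TTL change).
thumb_hits, thumb_misses = 422.84, 1424.83
print(f"thumbnail hit ratio: {hit_ratio(thumb_hits, thumb_misses):.1%}")

# Hypothetical counts, only to show the second formula; not from the thread.
print(f"offload ratio: {offload_ratio(1000, 450):.1%}")
```

The two numbers can differ because passes reach the backend without counting as misses, so it helps to state which formula a quoted percentage comes from.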

It's been an hour now and I'm seeing a spike in expired objects and a drop in the number of objects, so I'll probably increase the TTL until I find a sweet spot. I don't think there's any risk since thumbnails don't change often, so even a max of 48h may be reasonable. So I'll do more testing today and see how things go.

Thanks!


-----Original Message-----
From: Florian Tham [mailto:fgtham@gmail.com]
Sent: Tuesday, December 13, 2016 12:13 PM
To: Justin Lloyd <justinl@arena.net>
Cc: varnish-misc@varnish-cache.org
Subject: RE: Hit ratio dropped significantly after recent upgrades

The log shows that the fetched object is inserted into the cache with TTL and grace time set to 120s each:

-- VCL_call BACKEND_RESPONSE
-- TTL VCL 120 120 0 1481637557
-- VCL_return deliver
-- Storage malloc s0

It would be interesting to see whether a subsequent request to the same URL within 4 minutes (TTL plus grace) yields another miss.
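
The four-minute figure is just the TTL and grace values from that record added together. A small Python sketch of the arithmetic, assuming the fields after the `VCL` source tag are TTL, grace, keep and a reference timestamp, in that order (this matches the reading above, but the layout is an assumption and the helper names are hypothetical):

```python
def parse_ttl_record(record: str) -> dict:
    """Split a varnishlog TTL record into named fields (assumed layout)."""
    parts = record.split()
    # Expected shape: "TTL VCL <ttl> <grace> <keep> <reference>"
    if len(parts) < 6 or parts[0] != "TTL":
        raise ValueError(f"not a TTL record: {record!r}")
    return {
        "source": parts[1],
        "ttl": float(parts[2]),
        "grace": float(parts[3]),
        "keep": float(parts[4]),
        "reference": float(parts[5]),
    }


def serve_window_seconds(fields: dict) -> float:
    """Seconds during which the object may still be served: TTL plus grace."""
    return fields["ttl"] + fields["grace"]


fields = parse_ttl_record("TTL VCL 120 120 0 1481637557")
print(serve_window_seconds(fields))  # 240.0, i.e. 4 minutes
```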

Regards,

Florian


On 13 December 2016 at 15:27:16, Justin Lloyd <justinl@arena.net> wrote:

> Here's a typical varnishlog miss for a thumbnail image, appropriately
> sanitized. I can provide more if it helps.
>
> https://gist.github.com/Calygos/ca7906da005569046a7031d1fcaa6372
>
>
> From: Guillaume Quintard [mailto:guillaume@varnish-software.com]
> Sent: Tuesday, December 13, 2016 12:17 AM
> To: Justin Lloyd <justinl@arena.net>
> Cc: Dridi Boukelmoune <dridi@varni.sh>; varnish-misc@varnish-cache.org
> Subject: Re: Hit ratio dropped significantly after recent upgrades
>
> Can you pastebin the req+bereq transactions in varnishlog, related to
> such a miss?
>
> --
> Guillaume Quintard
>
> On Tue, Dec 13, 2016 at 3:37 AM, Justin Lloyd <justinl@arena.net> wrote:
> To follow up on my last email from Friday, at this point the problem
> boils down to one thing that I've not been able to determine: Why are
> far fewer things being cached now than before the upgrade?
>
> 1. Cookies don't seem to be the problem. Most appear to be Google
> Analytics (as opposed to session), which are being unset by vcl_recv.
>
> 2. varnishlog/varnishtop show many thumbnail URLs being missed and
> virtually none are requested with a no-cache cache-control header. Is
> it possible to use these tools to determine if they (or any URLs for
> that matter) are being cached following a miss-deliver sequence? There
> are about 1.5m thumbnail files totaling around 30 GB, which prior to
> the upgrades wasn't an issue, and I don't think it is now since there
> are only a few expires and purges per minute and no nukes at all.
> Varnish is only using about 2 GB out of the 8 GB allocated to it,
> where it used to use all 8 GB and have lots of nukes and far fewer
> expires, so it's not a memory constraint.
>
> Could there be some other resource limitation I'm hitting without
> knowing it (nothing in any logs I've seen)? Everything else I could
> think of so far seems fine, e.g. open files, threads, tcp connections.
>
>
> -----Original Message-----
> From: varnish-misc-bounces+justinl=arena.net@varnish-cache.org
> [mailto:varnish-misc-bounces+justinl=arena.net@varnish-cache.org]
> On Behalf Of Justin Lloyd
> Sent: Friday, December 9, 2016 11:19 AM
> To: Dridi Boukelmoune <dridi@varni.sh>
> Cc: varnish-misc@varnish-cache.org
> Subject: RE: Hit ratio dropped significantly after recent upgrades
>
> I really am looking at what's happening as well. I have been looking
> at both varnishlog and varnishtop and I see a lot of thumbnail image
> requests being sent to the backend when there is still plenty of room
> for them in the cache, so even though there are a lot of thumbnail
> images, I shouldn't see so many backend requests for them. As I
> previously mentioned, I give Varnish 8 GB and it used to stay full
> (based on RSS usage and looking at nukes vs. expires) but now it
> hovers around only about 2 GB used. A related statistic is that there
> used to be 600-700k objects in Varnish (based on our graphs of
> MAIN.n_object via Collectd's varnish-default-struct.objects-object
> metric) but now there are only roughly 40-70k objects in Varnish at
> any given time. So it's definitely caching a lot fewer things than it
> was before the upgrade, and most of the requested URLs for requests
> that have cookies are for a lot of images and thumbnails. Images
> shouldn't be cached due to size and overall volume but thumbnails
> should, which is why I strip cookies from the thumbnails. These
> varnishtop commands break out /images and /images/thumb client
> requests, showing IMHO too many regular images being cached and
> nowhere near enough thumbnails:
>
> # varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/" and not ReqURL ~ "/images/thumb"'
>
> 349.47 VCL_call HASH
> 349.47 VCL_call RECV
> 349.47 VCL_call DELIVER
> 207.22 VCL_call HIT
> 116.40 VCL_call MISS
> 116.30 VCL_call PASS
>
> # varnishtop -c -i VCL_call -q 'ReqURL ~ "/images/thumb"'
>
> 1859.60 VCL_call HASH
> 1859.60 VCL_call RECV
> 1859.60 VCL_call DELIVER
> 1424.83 VCL_call MISS
> 422.84 VCL_call HIT
> 218.82 VCL_call PASS
>
> I'm still poking around trying to correlate caching of other types of
> URLs based on whether or not the requests have cookies, if
> Cache-Control gets returned, etc. but I just wanted to reply with this
> info. I do appreciate the responses I'm getting! :)
>
>
> -----Original Message-----
> From: Dridi Boukelmoune [mailto:dridi@varni.sh]
> Sent: Friday, December 9, 2016 10:11 AM
> To: Justin Lloyd <justinl@arena.net>
> Cc: Dag Haavi Finstad <daghf@varnish-software.com>;
> varnish-misc@varnish-cache.org
> Subject: Re: Hit ratio dropped significantly after recent upgrades
>
>> To reiterate on a point in another of my responses in this thread, I
>> think it may be something about MediaWiki thumbnail images not being
>> cached properly despite our current VCL in that regard not having
>> changed from how it worked prior to the upgrade during which time we
>> were seeing a very high (86%-ish) hit ratio from the same formula.
>
> To reiterate on a point I made on a couple occasions, it's time to
> give varnishlog a spin. Too much focus on VCL, and not enough on what's happening.
>
> Dridi
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc


Re: Hit ratio dropped significantly after recent upgrades
On Wed, Dec 14, 2016 at 2:58 PM, Justin Lloyd <justinl@arena.net> wrote:
> So after increasing the TTL for thumbnail images to 4 hours, the hit ratio got to 65-70%, objects in memory got up to around 150k before tapering off (due to lots of expirations starting after 4 hours, as to be expected) and slowly dipping back down to around 100k before starting on the upswing again. I'm continuing to test increasing the TTL, setting it to 24h and we'll see if we have any problems reported as a result in case that's too long, but at any rate we definitely appear to have found the smoking gun and I know where to tinker to try to better optimize things.
>
> I'm not sure why this changed with the upgrades, whether it was something in MediaWiki or Varnish, but at least I know where to spend cycles on optimizations.
>
> Thank you all very much for the help!

Hi,

Thanks for the feedback and glad to see things back to normal.
Consider sharing your findings with the MediaWiki folks as they will
likely know better how to deal with the thumbnails.

If images/thumbnails aren't changing often, you can safely increase
the TTL (ideally directly from MediaWiki) as long as you have an
invalidation strategy in place.

Cheers

RE: Hit ratio dropped significantly after recent upgrades
I actually have the TTL set to 48h now and it seems to work well, but I'm not sure if there's anything specific I should do regarding invalidation. In general, wikis don't allow for deleting or even changing images; you just upload new ones and the wiki software keeps previous versions in the revision history. However, I am concerned about invalidating cached images when new versions are uploaded so that the previous version is purged from the cache, but I'm not sure about the most appropriate way to handle that from a Varnish or MediaWiki perspective. If you have any thoughts on that, I'd appreciate it.

-----Original Message-----
From: Dridi Boukelmoune [mailto:dridi@varni.sh]
Sent: Wednesday, December 14, 2016 7:20 AM
To: Justin Lloyd <justinl@arena.net>
Cc: Florian Tham <fgtham@gmail.com>; varnish-misc@varnish-cache.org
Subject: Re: Hit ratio dropped significantly after recent upgrades

On Wed, Dec 14, 2016 at 2:58 PM, Justin Lloyd <justinl@arena.net> wrote:
> So after increasing the TTL for thumbnail images to 4 hours, the hit ratio got to 65-70%, objects in memory got up to around 150k before tapering off (due to lots of expirations starting after 4 hours, as to be expected) and slowly dipping back down to around 100k before starting on the upswing again. I'm continuing to test increasing the TTL, setting it to 24h and we'll see if we have any problems reported as a result in case that's too long, but at any rate we definitely appear to have found the smoking gun and I know where to tinker to try to better optimize things.
>
> I'm not sure why this changed with the upgrades, whether it was something in MediaWiki or Varnish, but at least I know where to spend cycles on optimizations.
>
> Thank you all very much for the help!

Hi,

Thanks for the feedback and glad to see things back to normal.
Consider sharing your findings with the MediaWiki folks as they will likely know better how to deal with the thumbnails.

If images/thumbnails aren't changing often, you can safely increase the TTL (ideally directly from MediaWiki) as long as you have an invalidation strategy in place.

Cheers
RE: Hit ratio dropped significantly after recent upgrades
The best thing to do (IMO) is to trigger the purges directly in MediaWiki.
I only did a quick search, but it seems to already integrate easily with
Varnish and can send cache purge requests on updates/changes:
https://m.mediawiki.org/wiki/Manual:Varnish_caching

Hope it helps!
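
To make "send cache purge requests" concrete, here is a rough Python sketch of what such a request could look like on the wire. It is not MediaWiki's code. It assumes the Varnish VCL has been set up to accept the PURGE method from the sending host, and every name, address and path in it is a placeholder.

```python
import http.client


def build_purge_request(path: str, site: str):
    """Return (method, path, headers) for purging one cached URL."""
    # The Host header tells the cache which site's copy of `path` to drop.
    return "PURGE", path, {"Host": site}


def send_purge(cache_host: str, cache_port: int, path: str, site: str) -> int:
    """Send the PURGE to the cache and return the HTTP status it answers with."""
    method, url, headers = build_purge_request(path, site)
    conn = http.client.HTTPConnection(cache_host, cache_port, timeout=5)
    try:
        conn.request(method, url, headers=headers)
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status
    finally:
        conn.close()


# Hypothetical call; host, port, path and site name are placeholders:
# send_purge("127.0.0.1", 6081,
#            "/images/thumb/a/ab/Example.png/120px-Example.png",
#            "wiki.example.org")
```

With something like this triggered on upload, a long thumbnail TTL is less of a concern, since a new version of an image would evict the stale copy instead of waiting for it to expire.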

On Dec 16, 2016 17:42, "Justin Lloyd" <justinl@arena.net> wrote:

> I actually have the TTL set to 48h now and it seems to work well, but I'm
> not sure if there's anything specific I should do regarding invalidation.
> In general, wikis don't allow for deleting or even changing images; you
> just upload new ones and the wiki software keeps previous versions in the
> revision history. However, I am concerned about invalidating cached images
> when new versions are uploaded so that the previous version is purged from
> the cache, but I'm not sure about the most appropriate way to handle that
> from a Varnish or MediaWiki perspective. If you have any thoughts on that,
> I'd appreciate it.
