Mailing List Archive

BAN - Memory consumption exploding under load
Hi there,

We're using Varnish 4.1.3 here and are running into an issue when using a
lot of BANs.

We make heavy use of ban-lurker-friendly bans, and it seems that over time
BANs accumulate.
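
For context: a "lurker-friendly" ban only tests obj.* fields, never req.*, so
the background ban lurker can evaluate it without a client request in flight.
A rough sketch of the difference via the CLI (the obj.http.x-url header and
the URL pattern below are made-up examples, not our actual setup):

    # Lurker-friendly: only references obj.*; the lurker can work it off in the background
    varnishadm ban obj.http.x-url '~' '^/products/'

    # NOT lurker-friendly: references req.*; it can only be tested at lookup time
    varnishadm ban req.url '~' '^/products/'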

They accumulate to the point where Varnish suddenly starts using all the memory
available on the system, and only a reboot of the server fixes it. CPU usage
also seems to explode.

The accumulation of BANs seems to happen when we have a sudden spike in
traffic, which in turn triggers a sudden spike in BAN requests.

Not sure how useful the following information is, but:

Earlier I was able to run `sudo varnishstat -1 | grep ban` on the server to
gather some information, but unfortunately I lost the exact numbers.

I do, however, recall seeing MAIN.bans, MAIN.ban_added and MAIN.ban_created
ever increasing, and MAIN.ban_deleted staying at the same constant number.

MAIN.bans_persisted_bytes and MAIN.bans_persisted_fragmentation both showed
around 410 MB.

We noticed a very similar issue on Varnish 3.0.6, where sending too many BAN
requests to Varnish would make it grind to a halt (without necessarily eating
all the memory and CPU, though). We upgraded to Varnish 4.1.3 hoping it would
help, but it seems like it didn't.

I also realized, thanks to varnishlog, that we were sending a lot of BANs with
invalid regular expressions. I fixed the regular expressions and varnishlog
isn't showing errors anymore, but I currently have no idea whether that had
anything to do with my original issue.

Has anybody ever run into this issue? I found an old bug report describing the
same kind of behavior: https://www.varnish-cache.org/trac/ticket/1701

Cheers,
-Hugues
Re: BAN - Memory consumption exploding under load
Hi,

We've added a bunch of ban and ban-lurker improvements which did not get
backported to 4.1.3.

Please upgrade to 5.0 or master.

Thanks, Nils

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: BAN - Memory consumption exploding under load
BTW, should you still see the issue with 5.0/master, getting hold of varnishstat
output is absolutely vital for any further analysis. Also, getting regular
outputs of varnishadm ban.list would be really helpful (it changes over time).
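
Something along these lines would do (just a sketch, adjust the interval and
log path to taste):

    # Capture the ban counters and the full ban list once a minute
    while true; do
        date
        varnishstat -1 | grep '^MAIN.bans'
        varnishadm ban.list
        sleep 60
    done >> /var/tmp/varnish-bans.log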

Thanks, Nils

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: BAN - Memory consumption exploding under load
Hi Nils,

I'll upgrade to Varnish 5.0 and see if the problem persists.

Cheers,
-Hugues

On Thu, Oct 27, 2016 at 9:07 AM, Nils Goroll <slink@schokola.de> wrote:

> BTW, should you still see the issue with 5.0/master, getting hold of
> varnishstat
> is absolutely vital for any further analysis. Also, getting regular
> outputs of
> varnishadm ban.list would be really helpful (it changes over time).
>
> Thanks, Nils
>
Re: BAN - Memory consumption exploding under load
On Thu, Oct 27, 2016 at 6:02 PM, Nils Goroll <slink@schokola.de> wrote:
> Hi,
>
> we've added a bunch of ban and ban lurker improvements which did not get
> backported to 4.1.3
>
> Please upgrade to 5.0 or master.

Hi Nils,

I was thinking (against the too-many-knobs trend) that maybe we should
have some sort of ban_queue_limit parameter to avoid piling up too
many bans. It would also play nice with your std.ban() initiative
where hitting the queue limit could be one of the reasons returned by
the function when it fails.

The default could be 0 for unlimited, or a fairly large number.

Dridi

_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: BAN - Memory consumption exploding under load
Hi there,

Re-opening this thread from 2016!

A few days ago, I ran into an issue where varnish (varnish-6.0.0 revision
a068361dff0d25a0d85cf82a6e5fdaf315e06a7d) became completely unresponsive.

While investigating, I noticed a lot of `VSL: store overflow` errors in
varnishlog, but most importantly, I noticed our ban list had accumulated to
somewhere around 130K BANs, as reported by varnishstat.

The load on the website was pretty low. Restarting Varnish fixed the issue. It
is possible I had been running some script generating a lot of BANs, but it
still seems weird that it would generate so many BANs that Varnish would become
unresponsive.

After restarting Varnish, I watched `varnishstat -1 | grep bans` and could
see MAIN.bans going up and down, but staying fairly stable on average.
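
For anyone wanting to do the same, a one-liner along these lines works (the
interval is arbitrary):

    watch -n 10 "varnishstat -1 | grep -E '^MAIN\.bans'"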

Today, I decided to check again and I noticed:

MAIN.bans                                  66863          .  Count of bans
MAIN.bans_completed                        66834          .  Number of bans marked 'completed'
MAIN.bans_obj                              66849          .  Number of bans using obj.*
MAIN.bans_req                                 14          .  Number of bans using req.*
MAIN.bans_added                           170939       1.18  Bans added
MAIN.bans_deleted                         104076       0.72  Bans deleted
MAIN.bans_tested                         3767852      26.08  Bans tested against objects (lookup)
MAIN.bans_obj_killed                       62024       0.43  Objects killed by bans (lookup)
MAIN.bans_lurker_tested               3344073359   23148.62  Bans tested against objects (lurker)
MAIN.bans_tests_tested                  94711644     655.62  Ban tests tested against objects (lookup)
MAIN.bans_lurker_tests_tested         4573910595   31661.91  Ban tests tested against objects (lurker)
MAIN.bans_lurker_obj_killed                16627       0.12  Objects killed by bans (lurker)
MAIN.bans_lurker_obj_killed_cutoff             0       0.00  Objects killed by bans for cutoff (lurker)
MAIN.bans_dups                            113845       0.79  Bans superseded by other bans
MAIN.bans_lurker_contention                    3       0.00  Lurker gave way for lookup
MAIN.bans_persisted_bytes              163879330          .  Bytes used by the persisted ban lists
MAIN.bans_persisted_fragmentation      162795194          .  Extra bytes in persisted ban lists due to fragmentation


1- MAIN.bans is almost at 70k, and Varnish has only been running for 1 day 16
hours.
2- MAIN.bans_req is > 0, but nowhere in my configuration do I ever submit bans
using req.*

So I have 2 questions:

- is 70k+ bans unreasonable, and should I find a way to reduce them?
- why am I seeing 14 MAIN.bans_req when my config never emits such bans?

Thanks,
-Hugues

On Thu, Nov 3, 2016 at 3:04 AM, Dridi Boukelmoune <dridi@varni.sh> wrote:

> On Thu, Oct 27, 2016 at 6:02 PM, Nils Goroll <slink@schokola.de> wrote:
> > Hi,
> >
> > we've added a bunch of ban and ban lurker improvements which did not get
> > backported to 4.1.3
> >
> > Please upgrade to 5.0 or master.
>
> Hi Nils,
>
> I was thinking (against the too-many-knobs trend) that maybe we should
> have some sort of ban_queue_limit parameter to avoid piling up too
> many bans. It would also play nice with your std.ban() initiative
> where hitting the queue limit could be one of the reasons returned by
> the function when it fails.
>
> The default could be 0 for unlimited, or a fairly large number.
>
> Dridi
>
> _______________________________________________
> varnish-misc mailing list
> varnish-misc@varnish-cache.org
> https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
>
Re: BAN - Memory consumption exploding under load
In case someone runs into a similar issue with BANs accumulating: I found an
experimental `ban_cutoff` runtime parameter (see below) that exists
specifically for this issue.

A search for `ban_cutoff` in this mailing list archive also yields a good
thread about this (thread: "Varnish Lurker is getting slower / Ban lists
keeps increasing")

ban_cutoff
        Value is: 0 [bans]
        Default is: 0
        Minimum is: 0

        Expurge long tail content from the cache to keep the number of
        bans below this value. 0 disables.

        When this parameter is set to a non-zero value, the ban lurker
        continues to work the ban list as usual top to bottom, but when
        it reaches the ban_cutoff-th ban, it treats all objects as if
        they matched a ban and expurges them from cache. As actively
        used objects get tested against the ban list at request time
        and thus are likely to be associated with bans near the top of
        the ban list, with ban_cutoff, least recently accessed objects
        (the "long tail") are removed.

        This parameter is a safety net to avoid bad response times due
        to bans being tested at lookup time. Setting a cutoff trades
        response time for cache efficiency. The recommended value is
        proportional to rate(bans_lurker_tests_tested) / n_objects
        while the ban lurker is working, which is the number of bans
        the system can sustain. The additional latency due to request
        ban testing is in the order of
        ban_cutoff / rate(bans_lurker_tests_tested). For example, for
        rate(bans_lurker_tests_tested) = 2M/s and a tolerable latency
        of 100ms, a good value for ban_cutoff may be 200K.

        NB: We do not know yet if it is a good idea to change this
        parameter, or if the default value is even sensible. Caution is
        advised, and feedback is most welcome.
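
For what it's worth, the parameter can be changed on a running instance
without a restart. The 200K value below is just the worked example from the
description above, not a recommendation for any particular setup:

    # inspect and set the experimental cutoff at runtime
    varnishadm param.show ban_cutoff
    varnishadm param.set ban_cutoff 200000

    # or make it permanent by passing it to varnishd at startup, e.g.
    # varnishd ... -p ban_cutoff=200000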

