Mailing List Archive

strange temporary varnish outage
Hello,

we're using Varnish v5 (Debian stretch) for image caching. Yesterday
there was a strange outage whose cause I can't find, as there are
almost no log entries apart from this one:

Feb 17 09:03:47 rowlf kernel: [1047133.190149] cgroup: fork rejected by pids controller in /system.slice/varnish.service
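
To scan a longer log for this kind of message, something like the following could work. It is only a sketch: the function name and the regex are made up for illustration, and it assumes the message always has the same wording as the single entry above.

```python
import re
from typing import Optional

# Pattern for the kernel message logged when the cgroup pids controller
# refuses a fork. The wording is taken from the single log entry above.
FORK_REJECTED = re.compile(
    r"cgroup: fork rejected by pids controller in (?P<cgroup>\S+)"
)


def rejected_cgroup(line: str) -> Optional[str]:
    """Return the cgroup path named in a 'fork rejected' message, else None."""
    # Mail clients may wrap the log line, so collapse all whitespace first.
    normalized = " ".join(line.split())
    match = FORK_REJECTED.search(normalized)
    return match.group("cgroup") if match else None


sample = (
    "Feb 17 09:03:47 rowlf kernel: [1047133.190149] cgroup: fork rejected\n"
    "by pids controller in /system.slice/varnish.service"
)
print(rejected_cgroup(sample))  # /system.slice/varnish.service
```

The cgroup path at the end of the message names the unit whose task limit was reached.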

But the problems started a couple of minutes before that, so this
message could simply be a result of earlier problems. Some munin
graphs:

Backend traffic: strange spike in backend connection retry/success,
decrease in recycle/reuse:
https://abload.de/img/varnish_backend_traffqwj74.png

Expunge: a similar spike in "Number of expired objects"
https://abload.de/img/varnish_expunge-day5kk0l.png

Threads: the thread count had been lower since the last restart (done
on Feb 14th) and suddenly went up at that time.
day: https://abload.de/img/varnish_threads-dayzoken.png
week: https://abload.de/img/varnish_threads-week7qjoo.png
Backend graph: https://abload.de/img/nginx_status-day54jkd.png

/etc/systemd/system/varnish.service : https://pastebin.com/aAhMHn4p
Here's the (shortened) vcl file: https://pastebin.com/nVu5vVaa

Does anyone have an idea how to dig into this? Is something horribly
wrong in the VCL file?


Thx,
Hubert
_______________________________________________
varnish-misc mailing list
varnish-misc@varnish-cache.org
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
Re: strange temporary varnish outage
Good morning,
I think we solved the problem: we ran into a systemd limit (4915 tasks):

https://github.com/varnishcache/varnish-cache/issues/2822
https://github.com/varnishcache/pkg-varnish-cache/blob/6c90eb775857573564dc1fe38424267143bb6b34/systemd/varnish.service#L19
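
The 4915 figure looks like a percentage-based systemd default rather than a number chosen for Varnish. As far as I know, systemd versions of that era default to a task limit of 15% of the kernel's PID limit, and with the common kernel.pid_max of 32768 that rounds down to 4915. Both values are assumptions about this host, not something measured on it; the helper name is only for illustration.

```python
def tasks_max_from_percent(pid_max: int, percent: int) -> int:
    """Absolute task limit for a percentage-style limit, rounded down."""
    return pid_max * percent // 100


# Assumed values: kernel.pid_max = 32768 and a 15% default task limit.
print(tasks_max_from_percent(32768, 15))  # 4915
```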

It seems we hit that limit; I updated the (long outdated) v5 to v6
LTS and set TasksMax=infinity. systemctl status varnish.service now
shows: Tasks: 7136 - so, yeah, solved :-) Thx for reading ;-)
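
To keep an eye on how close a unit is to its task limit, the pids controller's counters can be read directly. A rough sketch, with some assumptions: the helper names are hypothetical, and the directory mentioned in the comment is where a cgroup v1 system such as Debian stretch would normally expose the unit (on cgroup v2 it would be /sys/fs/cgroup/system.slice/varnish.service). With TasksMax=infinity the pids.max file contains the word "max".

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_pids(current_text: str, max_text: str) -> Tuple[int, Optional[int]]:
    """Parse the contents of pids.current and pids.max.

    A limit of None means unlimited (pids.max contains "max").
    """
    current = int(current_text.strip())
    raw_max = max_text.strip()
    limit = None if raw_max == "max" else int(raw_max)
    return current, limit


def usage_ratio(current: int, limit: Optional[int]) -> Optional[float]:
    """Fraction of the task limit in use, or None if there is no limit."""
    if limit is None or limit == 0:
        return None
    return current / limit


def read_unit_pids(cgroup_dir: Path) -> Tuple[int, Optional[int]]:
    """Read both counters from a cgroup directory, for example the assumed
    /sys/fs/cgroup/pids/system.slice/varnish.service on cgroup v1."""
    return parse_pids(
        (cgroup_dir / "pids.current").read_text(),
        (cgroup_dir / "pids.max").read_text(),
    )


# In-memory examples using the numbers from this thread.
print(usage_ratio(*parse_pids("4915\n", "4915\n")))  # 1.0, limit reached
print(usage_ratio(*parse_pids("7136\n", "max\n")))   # None, no limit set
```

A ratio close to 1.0 would have been an early warning before the kernel started rejecting forks.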

Hubert

On Mon, 18 Feb 2019 at 10:58, Hu Bert <revirii@googlemail.com> wrote:
Re: strange temporary varnish outage
On Tue, Feb 19, 2019 at 9:03 AM Hu Bert <revirii@googlemail.com> wrote:
>
> Good morning,
> I think we solved the problem: we ran into a systemd limit (4915 tasks):
>
> https://github.com/varnishcache/varnish-cache/issues/2822
> https://github.com/varnishcache/pkg-varnish-cache/blob/6c90eb775857573564dc1fe38424267143bb6b34/systemd/varnish.service#L19
>
> It seems we hit that limit; I updated the (long outdated) v5 to v6
> LTS and set TasksMax=infinity. systemctl status varnish.service now
> shows: Tasks: 7136 - so, yeah, solved :-) Thx for reading ;-)

Happy to see that moving to 6.0 solved the problem!
