Hi all,
I'm still investigating issues with one of our varnish instances. We
use varnish as a cache and loadbalancer behind nginx and in front of a
docker platform. We experienced an outage of about 20 minutes during
which clients received 503 errors produced by varnish, while the docker
containers responded correctly (according to the containers' logs).
Setup is:
[ nginx ==> varnish ] ==> [ docker swarm (4 hosts, lots of containers) ]
Sites are distinguished by the exposed ports of the respective swarm
services. Mapping site to service is done with a director containing
the 4 hosts and the respective service port as backends.
By comparing nginx logs with container logs we could confirm that
varnish is the culprit. It seems the backend request succeeds, but
varnish returns a 503 error anyway.
To investigate further, I activated some logging, which revealed some
concerning information. Apparently varnish sometimes has problems with
the storage, as the "FetchError" says "Could not get storage".
```
* << BeReq >> 70780723
- Begin bereq 70780722 pass
[...]
- Storage malloc Transient
- Fetch_Body 2 chunked -
- FetchError Could not get storage
```
I have attached two complete log examples to this mail.
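In case it helps anyone scan the attached logs for the same pattern, here is a minimal sketch of how the affected transactions could be pulled out. It assumes the log was saved as plain varnishlog text output in the format shown above; the function name `storage_failures` and the sample text are only illustrative, not part of any varnish tooling.

```python
import re
from collections import Counter

# Transaction header, e.g. "* << BeReq >> 70780723"
HEADER = re.compile(r"^\*+\s+<<\s+(\w+)\s+>>\s+(\d+)")
# Record line, e.g. "- FetchError Could not get storage"
RECORD = re.compile(r"^-+\s+(\w+)\s*(.*)$")


def storage_failures(log_text):
    """Return (vxid, storage) pairs for transactions that logged
    a FetchError of 'Could not get storage'.

    Hypothetical helper: assumes plain varnishlog text output.
    """
    failures = []
    vxid = storage = None
    for line in log_text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            # New transaction: remember its id, forget the old storage.
            vxid, storage = header.group(2), None
            continue
        record = RECORD.match(line)
        if not record or vxid is None:
            continue  # elided lines such as "[...]" are skipped
        tag, value = record.groups()
        if tag == "Storage":
            storage = value.strip()
        elif tag == "FetchError" and "Could not get storage" in value:
            failures.append((vxid, storage))
    return failures


# Sample taken from the excerpt above (elided lines left out).
sample = """\
* << BeReq >> 70780723
- Begin bereq 70780722 pass
- Storage malloc Transient
- Fetch_Body 2 chunked -
- FetchError Could not get storage
"""

failures = storage_failures(sample)
print(failures)
# Tally failures per storage backend to see whether only Transient is hit.
print(Counter(storage for _, storage in failures))
```

Counting per storage backend is meant to show whether the failures are limited to the Transient storage or also affect the main malloc storage.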
I did some extensive searching, including the Varnish Book, but so far
did not come up with an explanation. Can anyone help me understand
why this happens and how to avoid it?
Here is some additional information about our varnish instance:
- Debian buster
- system: HP DL360p G8, 32G RAM, Intel Xeon E5-2630
- varnish 6.6.0-1~buster (using the varnish repos)
- varnish start options:
```
ExecStart=/usr/sbin/varnishd -a :6081 \
-T :6082 \
-f /etc/varnish/default.vcl \
-p ping_interval=6 -p cli_timeout=10 -p pipe_timeout=600 \
-p listen_depth=4096 -p thread_pool_min=200 \
-p thread_pool_max=500 -p workspace_client=128k \
-p nuke_limit=1000 -S /etc/varnish/secret \
-s malloc,12G \
-s Transient=malloc,3500M
```
Thanks in advance!
--
Marco Dickert