Mailing List Archive

Varnish memory leak?
I'm running Varnish 1.1.1 on Centos 5 x86_64. varnishd has been running
for about 20 days, but now the machine is using about 100% of the CPU,
and most of it is in iowait because 100% of the swap is being used by
varnishd (this is happening on about 8 different cache servers).



Is this a known issue? Or am I doing something wrong (with the command
line arguments or whatnot)?



Here is the relevant information:



Command line arguments:

/usr/sbin/varnishd -a :80 -f /etc/varnish/photo.vcl -T 127.0.0.1:6082 \
    -t 120 -w 10,1000,120 -s file,/c01/varnish/varnish_storage.bin,40% \
    -u varnish -g varnish -P /var/run/varnish.pid
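
For reference, the options break down roughly as follows (the annotations are
added here for readability; the values are copied from the command above):

    -a :80                                         # accept client connections on port 80, all interfaces
    -f /etc/varnish/photo.vcl                      # VCL configuration to compile and load
    -T 127.0.0.1:6082                              # management (admin) interface
    -t 120                                         # default TTL for cached objects, in seconds
    -w 10,1000,120                                 # worker threads: minimum, maximum, idle timeout
    -s file,/c01/varnish/varnish_storage.bin,40%   # file-backed object storage at that path, sized as a percentage
    -u varnish -g varnish                          # drop privileges to this user and group
    -P /var/run/varnish.pid                        # write the daemon's PID to this file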



df of /c01:

/dev/sdb1 460G 102G 334G 24% /c01 (it's a RAID 10 array of 14 73 GB disks)



uname -a:

Linux <hostname> 2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 EDT 2007
x86_64 x86_64 x86_64 GNU/Linux



Output of varnishstat -1:

client_conn 30637407 17.88 Client connections accepted
client_req 69880298 40.78 Client requests received
cache_hit 65771178 38.38 Cache hits
cache_hitpass 82 0.00 Cache hits for pass
cache_miss 4107980 2.40 Cache misses
backend_conn 4108677 2.40 Backend connections success
backend_fail 0 0.00 Backend connections failures
backend_reuse 3771264 2.20 Backend connections reuses
backend_recycle 3920436 2.29 Backend connections recycles
backend_unused 16 0.00 Backend connections unused
n_srcaddr 1471 . N struct srcaddr
n_srcaddr_act 99 . N active struct srcaddr
n_sess_mem 361 . N struct sess_mem
n_sess 129 . N struct sess
n_object 4171343 . N struct object
n_objecthead 4171347 . N struct objecthead
n_smf 4107787 . N struct smf
n_smf_frag 0 . N small free smf
n_smf_large 1 . N large free smf
n_vbe_conn 145 . N struct vbe_conn
n_wrk 56 . N worker threads
n_wrk_create 70814 0.04 N worker threads created
n_wrk_failed 0 0.00 N worker threads not created
n_wrk_max 0 0.00 N worker threads limited
n_wrk_queue 0 0.00 N queued work requests
n_wrk_overflow 70814 0.04 N overflowed work requests
n_wrk_drop 0 0.00 N dropped work requests
n_expired 195 . N expired objects
n_deathrow 0 . N objects on deathrow
losthdr 0 0.00 HTTP header overflows
n_objsendfile 0 0.00 Objects sent with sendfile
n_objwrite 69142017 40.35 Objects sent with write
s_sess 30637382 17.88 Total Sessions
s_req 69880318 40.78 Total Requests
s_pipe 41 0.00 Total pipe
s_pass 0 0.00 Total pass
s_fetch 4108636 2.40 Total fetch
s_hdrbytes 23065938774 13461.07 Total header bytes
s_bodybytes 1500010438157 875392.50 Total body bytes
sess_closed 2794850 1.63 Session Closed
sess_pipeline 85042 0.05 Session Pipeline
sess_readahead 0 0.00 Session Read Ahead
sess_herd 67587880 39.44 Session herd
shm_records 3193762280 1863.85 SHM records
shm_writes 401219061 234.15 SHM writes
shm_cont 97272 0.06 SHM MTX contention
sm_nreq 4109108 2.40 allocator requests
sm_nobj 4107786 . outstanding allocations
sm_balloc 51938721792 . bytes allocated
sm_bfree 57256402944 . bytes free
backend_req 4108636 2.40 Backend requests made



VCL Code:

backend default {
    set backend.host = "127.0.0.1";
    set backend.port = "81";
}

sub vcl_recv {
    if (req.request == "GET" && req.http.cookie) {
        lookup;
    }
}

sub vcl_fetch {
    if (obj.ttl < 3600s) {
        set obj.ttl = 3600s;
    }
}

sub vcl_miss {
    if (req.url ~ "^/g/([a-f0-9])([a-f0-9])([a-f0-9]{30})\.([a-z0-9]{3,4})$") {
        set bereq.url = regsub(req.url,
            "^/g/([a-f0-9])([a-f0-9])([a-f0-9]{30})\.([a-z0-9]{3,4})$",
            "/$1/$2/$1$2$3.$4");
        fetch;
    }

    if (req.url ~ "^/g/([a-f0-9])([a-f0-9])([a-f0-9]{30})$") {
        set bereq.url = regsub(req.url,
            "^/g/([a-f0-9])([a-f0-9])([a-f0-9]{30})$",
            "/$1/$2/$1$2$3");
        fetch;
    }

    if (req.url ~ "^/g/i/([a-f0-9])([a-f0-9])([a-f0-9]{30})\.([a-z0-9]{3,4})$") {
        set bereq.url = regsub(req.url,
            "^/g/i/([a-f0-9])([a-f0-9])([a-f0-9]{30})\.([a-z0-9]{3,4})$",
            "/fullsize/$1/$2/$1$2$3.$4");
        fetch;
    }

    if (req.url ~ "^/g/i/([a-f0-9])([a-f0-9])([a-f0-9]{30})$") {
        set bereq.url = regsub(req.url,
            "^/g/i/([a-f0-9])([a-f0-9])([a-f0-9]{30})$",
            "/fullsize/$1/$2/$1$2$3");
        fetch;
    }

    if (req.url ~ "^/g/v/([a-f0-9])([a-f0-9])([a-f0-9]{30})\.jpg$") {
        set bereq.url = regsub(req.url,
            "^/g/v/([a-f0-9])([a-f0-9])([a-f0-9]{30})\.jpg$",
            "/video/$1/$2/$1$2$3/thumb_hi.jpg");
        fetch;
    }

    if (req.url ~ "^/g/v/([a-f0-9])([a-f0-9])([a-f0-9]{30})_([0-9]{5})\.jpg$") {
        set bereq.url = regsub(req.url,
            "^/g/v/([a-f0-9])([a-f0-9])([a-f0-9]{30})_([0-9]{5})\.jpg$",
            "/video/$1/$2/$1$2$3/frames/frame$4.jpg");
        fetch;
    }

    set bereq.url = "/404.html";
    fetch;
}
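
To make the rewrite rules concrete, here is a made-up example (the 32-character
hex name below is invented) run through the first rule in vcl_miss:

    # incoming request URL:
    #   /g/ab0123456789abcdef0123456789abcd.jpg
    # bereq.url after the regsub ($1 = "a", $2 = "b", $3 = the remaining
    # 30 hex characters, $4 = "jpg"):
    #   /a/b/ab0123456789abcdef0123456789abcd.jpg

The other rules follow the same pattern, mapping the flat /g/ namespace onto
the backend's directory layout (fullsize images, video thumbnails, video
frames), and anything that matches no rule is rewritten to /404.html.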



Thanks,

Andy

Varnish memory leak? [ In reply to ]
In message <0A3A6EA86530E64EA3BEA603EC4401CAF07B20 at dexbe014-8.exch014.msoutlookonline.net>, "Andrew Knapp" writes:

>I'm running Varnish 1.1.1 on Centos 5 x86_64. varnishd has been running
>for about 20 days, but now the machine is using about 100% of the CPU,
>and most of it is in iowait because 100% of the swap is being used by
>varnishd (this is happening on about 8 different cache servers).

As far as I can see, you have just cached more stuff than your RAM will
take and now the VM system moves stuff in and out of the paging area.
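
(A rough cross-check against the counters posted above: sm_balloc shows
51938721792 bytes allocated in the storage file, i.e. 51938721792 / 2^30,
roughly 48 GiB of cached object data, so the working set is far larger than
will fit in memory.)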

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Varnish memory leak? [ In reply to ]
> -----Original Message-----
> From: phk at critter.freebsd.dk [mailto:phk at critter.freebsd.dk] On Behalf
> Of Poul-Henning Kamp
> Sent: Wednesday, September 19, 2007 11:31 PM
> To: Andrew Knapp
> Cc: varnish-misc at projects.linpro.no
> Subject: Re: Varnish memory leak?
>
> In message <0A3A6EA86530E64EA3BEA603EC4401CAF07B20 at dexbe014-
> 8.exch014.msoutlookonline.net>, "Andrew Knapp" writes:
>
> >I'm running Varnish 1.1.1 on Centos 5 x86_64. varnishd has been running
> >for about 20 days, but now the machine is using about 100% of the CPU,
> >and most of it is in iowait because 100% of the swap is being used by
> >varnishd (this is happening on about 8 different cache servers).
>
> As far as I can see, you have just cached more stuff than your RAM will
> take and now the VM system moves stuff in and out of the paging area.
>

I guess I don't understand this. These servers have 8GB of RAM in them,
so why doesn't it fill up the RAM and then move on to filling up the
disk storage, instead of swap space?

-Andy