Mailing List Archive

Varnish crash (SIGABRT) about every 10 mins
Our server will crash (SIGABRT, signal 6) every 10 mins (approximately).
We tried to use "-d" to see error log, but the server hangs and won't
listen port 80.

It's running FreeBSD 6.2-STABLE (amd64) with ports/www/varnisa.
(varnish-1.1.1_2)

dmesg and sysctl shows:

real memory = 5100273664 (4864 MB)
avail memory = 4126154752 (3935 MB)

vm.kmem_size_scale: 3
vm.kmem_size_max: 419430400
vm.kmem_size: 419430400
hw.physmem: 4286906368
hw.usermem: 3878264832
hw.realmem: 5100273664

But the real memory is 4096MB not 4864MB, it might be the problem, but
I cannot make sure about this.

And, we modified some variables in sysctl:

kern.ipc.maxsockets: 262144
kern.maxfiles: 1048576
kern.maxfilesperproc: 524288
kern.threads.max_groups_per_proc: 8192
kern.threads.max_threads_per_proc: 8192

The running command is:

sudo /usr/bin/env -i /usr/local/sbin/varnishd -a 60.199.247.x:80 -f /usr/local/etc/varnish/image.vcl -h classic,1048583 -P /var/run/varnishd.pid -s file,/home/service/varnish-cache.mmap,32G -T 127.0.0.1:11957 -t 7200 -w 4096

And image.vcl is:

#
# haproxy - default
backend default {
set backend.host = "127.0.0.1";
set backend.port = "8080";
}

# vip - quota
backend vip {
set backend.host = "10.1.2.3";
set backend.port = "80";
}

sub vcl_recv {
if (!req.http.Referer || req.http.Referer ~ "^https?://.*\.pixnet\.net(/|$)") {
lookup;
}

set req.backend = vip;
pipe;
}

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
On Wed, Nov 07, 2007 at 02:16:02PM +0800, Gea-Suan Lin wrote:
> The running command is:
>
> sudo /usr/bin/env -i /usr/local/sbin/varnishd -a 60.199.247.x:80 -f /usr/local/etc/varnish/image.vcl -h classic,1048583 -P /var/run/varnishd.pid -s file,/home/service/varnish-cache.mmap,32G -T 127.0.0.1:11957 -t 7200 -w 4096

btw, I've tried another command (just without -t 7200, to use default
value 120):

sudo /usr/bin/env -i /usr/local/sbin/varnishd -a 60.199.247.x:80 -f /usr/local/etc/varnish/image.vcl -h classic,1048583 -P /var/run/varnishd.pid -s file,/home/service/varnish-cache.mmap,32G -T 127.0.0.1:11957 -w 4096

And this one works fine. (but lower hitrate)

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
Gea-Suan Lin <gslin at gslin.org> writes:
> Our server will crash (SIGABRT, signal 6) every 10 mins (approximately).
> We tried to use "-d" to see error log, but the server hangs and won't
> listen port 80.

When running in debug mode, you need to type 'start' to start
accepting connections.

DES
--
Dag-Erling Sm?rgrav
Senior Software Developer
Linpro AS - www.linpro.no
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
Hello,

This is "-t 600" case:

# sudo /usr/bin/env -i /usr/local/sbin/varnishd -a [our_ip_address]:80 -f /usr/local/etc/varnish/image.vcl -h classic,1048583 -P /var/run/varnishd.pid -s file,/home/service/varnish-cache.mmap,32G -T 127.0.0.1:11957 -t 600 -w 4096 -d -d
file /home/service/varnish-cache.mmap size 34359738368 bytes (16777216 fs-blocks, 8388608 pages)
Classic hash: 1048583 buckets
Using old SHMFILE
rolling(1)...
rolling(2)...
start
start child pid 9802
200 0

Child said (2, 9802): <<Child starts
managed to mmap 34359738368 bytes of 34359738368
Ready
CLI ready
>>
Child said (2, 9802): <<socktest: linger=0 sndtimeo=1 rcvtimeo=1
>>
Child said (2, 9802): <<socktest: linger=0 sndtimeo=1 rcvtimeo=1
>>
Child said (2, 9802): <<Assert error in SES_Delete(), cache_session.c line 339:
Condition((sp->obj) == 0) not true.
errno = 0 (Unknown error: 0)
>>
Cache child died pid=9802 status=0x6
Clean child
Child cleaned
start child pid 10394
Child said (2, 10394): <<Child starts
managed to mmap 34359738368 bytes of 34359738368
Ready
CLI ready
>>
Child said (2, 10394): <<socktest: linger=0 sndtimeo=0 rcvtimeo=0
>>
Child said (2, 10394): <<Assert error in SES_Delete(), cache_session.c line 339:
Condition((sp->obj) == 0) not true.
errno = 0 (Unknown error: 0)
>>
Child not responding to ping
Cache child died pid=10394 status=0x6
Clean child
Child cleaned
start child pid 11575
Child said (2, 11575): <<Child starts
managed to mmap 34359738368 bytes of 34359738368
Ready
CLI ready
>>
Child said (2, 11575): <<socktest: linger=0 sndtimeo=0 rcvtimeo=0
>>
Child said (2, 11575): <<socktest: linger=0 sndtimeo=0 rcvtimeo=0
>>
Child not responding to ping
Child not responding to ping
Child not responding to ping
Cache child died pid=11575 status=0x9
Clean child
Child cleaned
start child pid 11576
Pushing vcls failed:
CLI communication error
Child said (1, 11576): <<Child starts
>>
unlink ./bin.ffNEQwzp
gslin at testphp [~] (18:54)

On Wed, Nov 07, 2007 at 11:52:32AM +0100, Dag-Erling Sm?grav wrote:
> Gea-Suan Lin <gslin at gslin.org> writes:
> > Our server will crash (SIGABRT, signal 6) every 10 mins (approximately).
> > We tried to use "-d" to see error log, but the server hangs and won't
> > listen port 80.
>
> When running in debug mode, you need to type 'start' to start
> accepting connections.
>
> DES
> --
> Dag-Erling Sm?grav
> Senior Software Developer
> Linpro AS - www.linpro.no

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
Gea-Suan Lin <gslin at gslin.org> writes:
> Child said (2, 9802): <<Assert error in SES_Delete(), cache_session.c line 339:
> Condition((sp->obj) == 0) not true.
> errno = 0 (Unknown error: 0)

See ticket #162, we're working on it.

DES
--
Dag-Erling Sm?rgrav
Senior Software Developer
Linpro AS - www.linpro.no
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
On Thu, Nov 08, 2007 at 12:19:59PM +0100, Dag-Erling Sm?grav wrote:
> Gea-Suan Lin <gslin at gslin.org> writes:
> > Child said (2, 9802): <<Assert error in SES_Delete(), cache_session.c line 339:
> > Condition((sp->obj) == 0) not true.
> > errno = 0 (Unknown error: 0)
>
> See ticket #162, we're working on it.

Okay, so I'll upgrade to trunk and try again.

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
Gea-Suan Lin <gslin at gslin.org> writes:
> On Thu, Nov 08, 2007 at 12:19:59PM +0100, Dag-Erling Sm?rgrav wrote:
> > Gea-Suan Lin <gslin at gslin.org> writes:
> > > Child said (2, 9802): <<Assert error in SES_Delete(), cache_session.c line 339:
> > > Condition((sp->obj) == 0) not true.
> > > errno = 0 (Unknown error: 0)
> > See ticket #162, we're working on it.
> Okay, so I'll upgrade to trunk and try again.

We haven't fixed it yet, so upgrading to trunk won't help.

DES
--
Dag-Erling Sm?rgrav
Senior Software Developer
Linpro AS - www.linpro.no
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
On Thu, Nov 08, 2007 at 01:57:36PM +0100, Dag-Erling Sm?grav wrote:
> Gea-Suan Lin <gslin at gslin.org> writes:
> > On Thu, Nov 08, 2007 at 12:19:59PM +0100, Dag-Erling Sm???rgrav wrote:
> > > Gea-Suan Lin <gslin at gslin.org> writes:
> > > > Child said (2, 9802): <<Assert error in SES_Delete(), cache_session.c line 339:
> > > > Condition((sp->obj) == 0) not true.
> > > > errno = 0 (Unknown error: 0)
> > > See ticket #162, we're working on it.
> > Okay, so I'll upgrade to trunk and try again.
>
> We haven't fixed it yet, so upgrading to trunk won't help.

Is that ok to running 32 varnishd and use HAProxy to redirect to them ?
We found they will run out of memory.

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
Gea-Suan Lin <gslin at gslin.org> writes:
> Is that ok to running 32 varnishd and use HAProxy to redirect to them ?

32 Varnish instances? Sounds like a lot. What kind of site is it?

> We found they will run out of memory.

Hmm, you must be running 1.1; try switching to branches/1.2. It
shouldn't run out of memory.

Here's a patch I think should help with the SES_Delete() issue, BTW.
I will commit it (or something similar) once I've confirmed that it
actually fixes the bug.

DES
--
Dag-Erling Sm?rgrav
Senior Software Developer
Linpro AS - www.linpro.no

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cache_acceptor_kqueue.diff
Url: http://projects.linpro.no/pipermail/varnish-misc/attachments/20071108/5c35e59c/attachment.txt
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
On Thu, Nov 08, 2007 at 05:39:45PM +0100, Dag-Erling Sm?grav wrote:
> Gea-Suan Lin <gslin at gslin.org> writes:
> > Is that ok to running 32 varnishd and use HAProxy to redirect to them ?
>
> 32 Varnish instances? Sounds like a lot. What kind of site is it?

We're trying to serve images since the backend's I/O is not fast enough,
so we need cache server to cache them.

The original idea is use Varnish with "-s file,filename,100GB" to cache
them, but varnish abort in SES_Delete. So the alternative is to run
lots of varnishd and use HAProxy to dispatch the request.

> > We found they will run out of memory.
>
> Hmm, you must be running 1.1; try switching to branches/1.2. It
> shouldn't run out of memory.
>
> Here's a patch I think should help with the SES_Delete() issue, BTW.
> I will commit it (or something similar) once I've confirmed that it
> actually fixes the bug.

Yes we run varnish-1.1 (install from ports), I'll try 1.2 with your
patch. Thank you in advance.

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
On Thu, Nov 08, 2007 at 05:39:45PM +0100, Dag-Erling Sm?grav wrote:
> Gea-Suan Lin <gslin at gslin.org> writes:
> > We found they will run out of memory.
>
> Hmm, you must be running 1.1; try switching to branches/1.2. It
> shouldn't run out of memory.
>
> Here's a patch I think should help with the SES_Delete() issue, BTW.
> I will commit it (or something similar) once I've confirmed that it
> actually fixes the bug.

I checkout the latest version in trunk (r2231), apply the patch, and
find no object being cached. (telnet to management port and use "stats"
to check)

Now I run it without the patch and the server just crash and no
information to debug:

Child not responding to ping
Child not responding to ping
Child not responding to ping
Cache child died pid=49696 status=0x9
Clean child
Child cleaned
start child pid 49747
Child said (2, 49747): <<Child starts
sizeof(struct ws) = 48
sizeof(struct http) = 584
sizeof(struct http_conn) = 48
sizeof(struct acct) = 64
sizeof(struct worker) = 1232
sizeof(struct workreq) = 24
sizeof(struct bereq) = 656
sizeof(struct storage) = 72
sizeof(struct object) = 824
sizeof(struct objhead) = 56
sizeof(struct sess) = 448
sizeof(struct vbe_conn) = 48
sizeof(struct backend) = 88
managed to mmap 34359738368 bytes of 34359738368
Ready
CLI ready
>>

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
Gea-Suan Lin <gslin at gslin.org> writes:
> I checkout the latest version in trunk (r2231), apply the patch, and
> find no object being cached. (telnet to management port and use "stats"
> to check)

I'm not psychic, you'll have to actually *show me* the stats.

DES
--
Dag-Erling Sm?rgrav
Senior Software Developer
Linpro AS - www.linpro.no
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
On Thu, Nov 08, 2007 at 08:40:23PM +0100, Dag-Erling Sm?grav wrote:
> Gea-Suan Lin <gslin at gslin.org> writes:
> > I checkout the latest version in trunk (r2231), apply the patch, and
> > find no object being cached. (telnet to management port and use "stats"
> > to check)
>
> I'm not psychic, you'll have to actually *show me* the stats.

When I want to try it again, I found it might be my fault :-)

http://varnish.projects.linpro.no/attachment/ticket/162/cache_acceptor_kqueue.diff

It should be patched to:

AZ(kevent(kq, ki, nki, NULL, 0, NULL));

But I patched to:

AZ(kevent(kq, ki, nki, NULL, 0, NULL) <= 0);

I'll try it again.

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
Gea-Suan Lin <gslin at gslin.org> writes:
> When I want to try it again, I found it might be my fault :-)
>
> http://varnish.projects.linpro.no/attachment/ticket/162/cache_acceptor_kqueue.diff
>
> It should be patched to:
>
> AZ(kevent(kq, ki, nki, NULL, 0, NULL));
>
> But I patched to:
>
> AZ(kevent(kq, ki, nki, NULL, 0, NULL) <= 0);
>
> I'll try it again.

The correct way to apply a patch is with /usr/bin/patch.

DES
--
Dag-Erling Sm?rgrav
Senior Software Developer
Linpro AS - www.linpro.no
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
On Fri, Nov 09, 2007 at 03:44:57AM +0800, Gea-Suan Lin wrote:
> On Thu, Nov 08, 2007 at 08:40:23PM +0100, Dag-Erling Sm?grav wrote:
> > Gea-Suan Lin <gslin at gslin.org> writes:
> > > I checkout the latest version in trunk (r2231), apply the patch, and
> > > find no object being cached. (telnet to management port and use "stats"
> > > to check)
> >
> > I'm not psychic, you'll have to actually *show me* the stats.
>
> When I want to try it again, I found it might be my fault :-)
>
> http://varnish.projects.linpro.no/attachment/ticket/162/cache_acceptor_kqueue.diff
>
> It should be patched to:
>
> AZ(kevent(kq, ki, nki, NULL, 0, NULL));
>
> But I patched to:
>
> AZ(kevent(kq, ki, nki, NULL, 0, NULL) <= 0);
>
> I'll try it again.

It crash again:

gslin at testphp [~] (3:45) sudo /usr/bin/env -i /home/service/varnish/sbin/varnishd -a [our ip address]:80 -f /usr/local/etc/varnish/image.vcl -h classic,1048583 -P /var/run/varnishd.pid -s file,/home/service/varnish-cache.mmap,32G -T 127.0.0.1:11957 -t 604800 -w 32,4096 -d -d
storage_file: filename: /home/service/varnish-cache.mmap size 32768 MegaBytes.
Classic hash: 1048583 buckets
Using old SHMFILE
rolling(1)...
rolling(2)...
start
start child pid 57305
200 0

Child said (2, 57305): <<Child starts
sizeof(struct ws) = 48
sizeof(struct http) = 584
sizeof(struct http_conn) = 48
sizeof(struct acct) = 64
sizeof(struct worker) = 1232
sizeof(struct workreq) = 24
sizeof(struct bereq) = 656
sizeof(struct storage) = 72
sizeof(struct object) = 824
sizeof(struct objhead) = 56
sizeof(struct sess) = 448
sizeof(struct vbe_conn) = 48
sizeof(struct backend) = 88
managed to mmap 34359738368 bytes of 34359738368
Ready
CLI ready
>>
Child not responding to ping
Child not responding to ping
Child not responding to ping
Cache child died pid=57305 status=0x9
Clean child
Child cleaned
start child pid 58543
Child said (2, 58543): <<Child starts
sizeof(struct ws) = 48
sizeof(struct http) = 584
sizeof(struct http_conn) = 48
sizeof(struct acct) = 64
sizeof(struct worker) = 1232
sizeof(struct workreq) = 24
sizeof(struct bereq) = 656
sizeof(struct storage) = 72
sizeof(struct object) = 824
sizeof(struct objhead) = 56
sizeof(struct sess) = 448
sizeof(struct vbe_conn) = 48
sizeof(struct backend) = 88
managed to mmap 34359738368 bytes of 34359738368
Ready
CLI ready
>>
Child not responding to ping
Child not responding to ping
Cache child died pid=58543 status=0x9
Clean child
Child cleaned
start child pid 58544
Child said (2, 58544): <<Child starts
sizeof(struct ws) = 48
sizeof(struct http) = 584
sizeof(struct http_conn) = 48
sizeof(struct acct) = 64
sizeof(struct worker) = 1232
sizeof(struct workreq) = 24
sizeof(struct bereq) = 656
sizeof(struct storage) = 72
sizeof(struct object) = 824
sizeof(struct objhead) = 56
sizeof(struct sess) = 448
sizeof(struct vbe_conn) = 48
sizeof(struct backend) = 88
managed to mmap 34359738368 bytes of 34359738368
Ready
CLI ready
>>

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
On Thu, Nov 08, 2007 at 09:16:22PM +0100, Dag-Erling Sm?grav wrote:
> The correct way to apply a patch is with /usr/bin/patch.

Yes I do patch using system's patch this time.

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
Gea-Suan Lin <gslin at gslin.org> writes:
> It crash again:

Actually, it doesn't crash; the child get stuck somewhere and the
parent kills it.

I'd like to see an 'svn diff' of your tree.

Also, if you change SIGKILL in bin/varnishd/mgt_child.c to either
SIGABRT or SIGQUIT, you should get a core dump of the child process
when the parent kills it. If you don't, 'ulimit -c unlimited' and try
again.

DES
--
Dag-Erling Sm?rgrav
Senior Software Developer
Linpro AS - www.linpro.no
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
On Thu, Nov 08, 2007 at 10:21:22PM +0100, Dag-Erling Sm?grav wrote:
> Gea-Suan Lin <gslin at gslin.org> writes:
> > It crash again:
>
> Actually, it doesn't crash; the child get stuck somewhere and the
> parent kills it.
>
> I'd like to see an 'svn diff' of your tree.
>
> Also, if you change SIGKILL in bin/varnishd/mgt_child.c to either
> SIGABRT or SIGQUIT, you should get a core dump of the child process
> when the parent kills it. If you don't, 'ulimit -c unlimited' and try
> again.
>
> DES

gslin at testphp [/tmp] (5:23) svn export ~/svn/varnish/varnish-cache varnish-cache.clean
gslin at testphp [/tmp] (5:23) diff -ruN varnish-cache.clean varnish-cache > varnish-cache.diff

This file is in http://mail.pixnet.tw/~gslin/varnish-cache.diff.gz

gslin at testphp [/tmp] (5:26) cd varnish-cache.clean/
gslin at testphp [/tmp/varnish-cache.clean] (5:26) ./autogen.sh
+ aclocal
+ libtoolize --copy --force
+ autoheader
+ automake --add-missing --copy --foreign
configure.ac: installing `./install-sh'
configure.ac: installing `./missing'
bin/varnishadm/Makefile.am: installing `./compile'
bin/varnishadm/Makefile.am: installing `./depcomp'
+ autoconf
gslin at testphp [/tmp/varnish-cache.clean] (5:26) cd ..
gslin at testphp [/tmp] (5:26) diff -ruN varnish-cache.clean varnish-cache > varnish-cache-2.diff

This file is in http://mail.pixnet.tw/~gslin/varnish-cache-2.diff.gz

gslin at testphp [/tmp/varnish-cache.clean] (5:29) ./configure --prefix=/home/service/varnish
(lots of msgs)
gslin at testphp [/tmp] (5:30) diff -ruN varnish-cache.clean varnish-cache > varnish-cache-3.diff

This file is in http://mail.pixnet.tw/~gslin/varnish-cache-3.diff.gz

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
Gea-Suan Lin <gslin at gslin.org> writes:
> gslin at testphp [/tmp] (5:23) svn export ~/svn/varnish/varnish-cache varnish-cache.clean
> gslin at testphp [/tmp] (5:23) diff -ruN varnish-cache.clean varnish-cache > varnish-cache.diff

This is *really* not useful. Please give me a simple 'svn diff', or
at least drop the -N.

DES
--
Dag-Erling Sm?rgrav
Senior Software Developer
Linpro AS - www.linpro.no
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
On Thu, Nov 08, 2007 at 10:37:58PM +0100, Dag-Erling Sm?grav wrote:
> Gea-Suan Lin <gslin at gslin.org> writes:
> > gslin at testphp [/tmp] (5:23) svn export ~/svn/varnish/varnish-cache varnish-cache.clean
> > gslin at testphp [/tmp] (5:23) diff -ruN varnish-cache.clean varnish-cache > varnish-cache.diff
>
> This is *really* not useful. Please give me a simple 'svn diff', or
> at least drop the -N.

I didn't run autogen.sh & configure & make in working repository, but
"svn export varnish-cache /tmp/varnish-cache".

This is diff of bin/varnishd/cache_acceptor_kqueue.c:

gslin at testphp [/tmp/varnish-cache/bin/varnishd] (5:41) diff -ruN ~/svn/varnish/varnish-cache/bin/varnishd/cache_acceptor_kqueue.c cache_acceptor_kqueue.c
--- /net/account/admin/gslin/svn/varnish/varnish-cache/bin/varnishd/cache_acceptor_kqueue.c Fri Nov 9 01:38:12 2007
+++ cache_acceptor_kqueue.c Fri Nov 9 03:43:26 2007
@@ -66,8 +66,8 @@
if (sp->fd < 0)
return;
EV_SET(&ki[nki], sp->fd, EVFILT_READ, arm, 0, 0, sp);
- if (++nki == NKEV) {
- assert(kevent(kq, ki, nki, NULL, 0, NULL) <= 0);
+ if (++nki == NKEV || arm == EV_DELETE) {
+ AZ(kevent(kq, ki, nki, NULL, 0, NULL));
nki = 0;
}
}

An update of patched varnishd: I use gdb to attach it and it looks
fine now, this is varnishstat screenshot, now it can store >5GB.
I'll try to run without gdb later.

0+00:27:23
Hitrate ratio: 10 100 1000
Hitrate avg: 0.0933 0.2071 0.2500

30831 13.42 18.77 Client connections accepted
33441 14.38 20.35 Client requests received
5005 0.96 3.05 Cache hits
0 0.00 0.00 Cache hits for pass
28405 14.38 17.29 Cache misses
28436 14.38 17.31 Backend connections success
0 0.00 0.00 Backend connections failures
0 0.00 0.00 Backend connections reuses
0 0.00 0.00 Backend connections recycles
6 -0.96 0.00 Backend connections unused
24 . . N struct srcaddr
4 . . N active struct srcaddr
23 . . N struct sess_mem
5 . . N struct sess
28415 . . N struct object
28415 . . N struct objecthead
56817 . . N struct smf
0 . . N small free smf
1 . . N large free smf
0 . . N struct vbe_conn
17 . . N worker threads
17 0.00 0.01 N worker threads created
0 0.00 0.00 N worker threads not created
0 0.00 0.00 N worker threads limited
0 0.00 0.00 N queued work requests
17 0.00 0.01 N overflowed work requests
0 0.00 0.00 N dropped work requests
0 . . N expired objects
0 . . N LRU nuked objects
0 . . N LRU saved objects
0 . . N objects on deathrow
0 0.00 0.00 HTTP header overflows
0 0.00 0.00 Objects sent with sendfile
31598 14.38 19.23 Objects sent with write
30829 13.42 18.76 Total Sessions
33440 14.38 20.35 Total Requests
31 0.00 0.02 Total pipe
0 0.00 0.00 Total pass
28404 13.42 17.29 Total fetch
7942018 3464.68 4833.85 Total header bytes
5805632005 1854165.04 3533555.69 Total body bytes
30345 13.42 18.47 Session Closed
0 0.00 0.00 Session Pipeline
0 0.00 0.00 Session Read Ahead
3109 0.96 1.89 Session herd
2058773 926.09 1253.06 SHM records
127012 56.56 77.30 SHM writes
13 0.00 0.01 SHM MTX contention
56816 26.84 34.58 allocator requests
56816 . . outstanding allocations
5530710016 . . bytes allocated
28829028352 . . bytes free
28404 14.38 17.29 Backend requests made

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
Okay it dies, last varnishstat screenshot:

0+00:36:40
Hitrate ratio: 10 100 1000
Hitrate avg: 0.7220 0.2649 0.2517

31612 0.00 14.37 Client connections accepted
34339 0.00 15.61 Client requests received
5134 0.00 2.33 Cache hits
0 0.00 0.00 Cache hits for pass
29165 0.00 13.26 Cache misses
29202 0.00 13.27 Backend connections success
0 0.00 0.00 Backend connections failures
0 0.00 0.00 Backend connections reuses
0 0.00 0.00 Backend connections recycles
6 0.00 0.00 Backend connections unused
25 . . N struct srcaddr
7 . . N active struct srcaddr
23 . . N struct sess_mem
10 . . N struct sess
29176 . . N struct object
29176 . . N struct objecthead
58338 . . N struct smf
0 . . N small free smf
1 . . N large free smf
0 . . N struct vbe_conn
17 . . N worker threads
17 0.00 0.01 N worker threads created
0 0.00 0.00 N worker threads not created
0 0.00 0.00 N worker threads limited
0 0.00 0.00 N queued work requests
17 0.00 0.01 N overflowed work requests
0 0.00 0.00 N dropped work requests
0 . . N expired objects
0 . . N LRU nuked objects
0 . . N LRU saved objects
0 . . N objects on deathrow
0 0.00 0.00 HTTP header overflows
0 0.00 0.00 Objects sent with sendfile
32398 0.00 14.73 Objects sent with write
31606 0.00 14.37 Total Sessions
34335 0.00 15.61 Total Requests
37 0.00 0.02 Total pipe
0 0.00 0.00 Total pass
29164 0.00 13.26 Total fetch
8150499 0.00 3704.77 Total header bytes
5990402809 0.00 2722910.37 Total body bytes
31100 0.00 14.14 Session Closed
0 0.00 0.00 Session Pipeline
0 0.00 0.00 Session Read Ahead
3249 0.00 1.48 Session herd
2113909 0.00 960.87 SHM records
130287 0.00 59.22 SHM writes
13 0.00 0.01 SHM MTX contention
58337 0.00 26.52 allocator requests
58337 . . outstanding allocations
5724512256 . . bytes allocated
28635226112 . . bytes free
29164 0.00 13.26 Backend requests made

and gdb console:

gslin at testphp [~] (5:15) sudo gdb /home/service/varnish/sbin/varnishd 58794
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Attaching to program: /home/service/varnish/sbin/varnishd, process 58794
Error while mapping shared library sections:
./bin.Z0UDwDnk: No such file or directory.
Reading symbols from /home/service/varnish/lib/libvarnish.so.0...done.
Loaded symbols for /home/service/varnish/lib/libvarnish.so.0
Reading symbols from /home/service/varnish/lib/libvarnishcompat.so.0...done.
Loaded symbols for /home/service/varnish/lib/libvarnishcompat.so.0
Reading symbols from /home/service/varnish/lib/libvcl.so.0...done.
Loaded symbols for /home/service/varnish/lib/libvcl.so.0
Reading symbols from /usr/lib/libthr.so.2...done.
[New Thread 0xa67d600 (LWP 100358)]
[New Thread 0xa67d400 (LWP 100334)]
[New Thread 0xa67d200 (LWP 100202)]
[New Thread 0xa67d000 (LWP 100186)]
[New Thread 0x53de00 (LWP 100184)]
[New Thread 0x53da00 (LWP 100092)]
[New Thread 0x53d800 (LWP 100067)]
[New Thread 0x53d600 (LWP 100056)]
[New Thread 0x53d000 (LWP 100235)]
Loaded symbols for /usr/lib/libthr.so.2
Reading symbols from /lib/libm.so.4...done.
Loaded symbols for /lib/libm.so.4
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Error while reading shared library symbols:
./bin.Z0UDwDnk: No such file or directory.
Reading symbols from /libexec/ld-elf.so.1...done.
Loaded symbols for /libexec/ld-elf.so.1
[Switching to Thread 0xa67d600 (LWP 100358)]
0x0000000800c7bf5a in read () from /lib/libc.so.6
(gdb) c
Continuing.
[New Thread 0xa67d800 (LWP 100407)]
[New Thread 0xa67da00 (LWP 100580)]
[New Thread 0xa67dc00 (LWP 100660)]
[New Thread 0xa67de00 (LWP 100706)]
[New Thread 0xad49000 (LWP 100770)]
[New Thread 0xad49200 (LWP 100817)]
[New Thread 0xad49400 (LWP 100879)]
[New Thread 0xad49600 (LWP 100891)]
[New Thread 0xad49800 (LWP 101241)]
[New Thread 0xad49a00 (LWP 101286)]
[New Thread 0xad49c00 (LWP 101292)]
[New Thread 0xad49e00 (LWP 101296)]
[New Thread 0xbb4e000 (LWP 101298)]
[New Thread 0xbb4e200 (LWP 101321)]

Program received signal SIGKILL, Killed.
[Switching to Thread 0xbb4e200 (LWP 101321)]
0x0000000800c1716f in brk () from /lib/libc.so.6
(gdb) bt
#0 0x0000000800c1716f in brk () from /lib/libc.so.6
#1 0x000000000c2f1000 in ?? ()
#2 0x0000000800c165ba in _UTF8_init () from /lib/libc.so.6
#3 0x0000000800c167e8 in _UTF8_init () from /lib/libc.so.6
#4 0x0000000800c170e6 in _UTF8_init () from /lib/libc.so.6
#5 0x0000000800c7966b in calloc () from /lib/libc.so.6
#6 0x0000000000411262 in HSH_Prealloc (sp=0xa6dd008) at cache_hash.c:80
#7 0x0000000000411875 in HSH_Lookup (sp=0xa6dd008) at cache_hash.c:185
#8 0x000000000040e890 in CNT_Session (sp=0xa6dd008) at cache_center.c:534
#9 0x0000000000416209 in wrk_thread (priv=0x53e5e0) at cache_pool.c:193
#10 0x000000080098729e in pthread_create () from /usr/lib/libthr.so.2
#11 0x0000000000000000 in ?? ()
Error accessing memory address 0x7ffffd1ea000: Bad address.
(gdb) up
#1 0x000000000c2f1000 in ?? ()
(gdb) up
#2 0x0000000800c165ba in _UTF8_init () from /lib/libc.so.6
(gdb) up
#3 0x0000000800c167e8 in _UTF8_init () from /lib/libc.so.6
(gdb) up
#4 0x0000000800c170e6 in _UTF8_init () from /lib/libc.so.6
(gdb) up
#5 0x0000000800c7966b in calloc () from /lib/libc.so.6
(gdb) up
#6 0x0000000000411262 in HSH_Prealloc (sp=0xa6dd008) at cache_hash.c:80
80 w->nobjhead = calloc(sizeof *w->nobjhead, 1);
(gdb) l
75 CHECK_OBJ_NOTNULL(sp, SESS_MAGIC);
76 CHECK_OBJ_NOTNULL(sp->wrk, WORKER_MAGIC);
77 w = sp->wrk;
78
79 if (w->nobjhead == NULL) {
80 w->nobjhead = calloc(sizeof *w->nobjhead, 1);
81 XXXAN(w->nobjhead);
82 w->nobjhead->magic = OBJHEAD_MAGIC;
83 VTAILQ_INIT(&w->nobjhead->objects);
84 MTX_INIT(&w->nobjhead->mtx);
(gdb) up
#7 0x0000000000411875 in HSH_Lookup (sp=0xa6dd008) at cache_hash.c:185
185 HSH_Prealloc(sp);
(gdb) l
180 CHECK_OBJ_NOTNULL(sp->http, HTTP_MAGIC);
181 AN(hash);
182 w = sp->wrk;
183 h = sp->http;
184
185 HSH_Prealloc(sp);
186 if (sp->obj != NULL) {
187 CHECK_OBJ_NOTNULL(sp->obj, OBJECT_MAGIC);
188 o = sp->obj;
189 oh = o->objhead;
(gdb) up
#8 0x000000000040e890 in CNT_Session (sp=0xa6dd008) at cache_center.c:534
534 o = HSH_Lookup(sp);
(gdb) l
529
530 VCL_hash_method(sp);
531 /* XXX check error */
532 }
533
534 o = HSH_Lookup(sp);
535
536 if (o == NULL) {
537 /*
538 * We hit a busy object, disembark worker thread and expect
(gdb) up
#9 0x0000000000416209 in wrk_thread (priv=0x53e5e0) at cache_pool.c:193
193 CNT_Session(wrq->sess);
(gdb) l
188 CHECK_OBJ_NOTNULL(wrq->sess, SESS_MAGIC);
189 wrq->sess->wrk = w;
190 CHECK_OBJ_ORNULL(w->nobj, OBJECT_MAGIC);
191 CHECK_OBJ_ORNULL(w->nobjhead, OBJHEAD_MAGIC);
192 w->used = NAN;
193 CNT_Session(wrq->sess);
194 assert(!isnan(w->used));
195 CHECK_OBJ_ORNULL(w->nobj, OBJECT_MAGIC);
196 CHECK_OBJ_ORNULL(w->nobjhead, OBJHEAD_MAGIC);
197 w->wrq = NULL;
(gdb) up
#10 0x000000080098729e in pthread_create () from /usr/lib/libthr.so.2
(gdb) up
#11 0x0000000000000000 in ?? ()
(gdb) up
Error accessing memory address 0x7ffffd1ea000: Bad address.
(gdb)

gslin at testphp [~] (5:14) sudo /usr/bin/env -i /home/service/varnish/sbin/varnishd -a 60.199.247.118:80 -f /usr/local/etc/varn
ish/image.vcl -h classic,1048583 -P /var/run/varnishd.pid -s file,/home/service/varnish-cache.mmap,32G -T 127.0.0.1:11957 -t
604800 -w 32,4096 -d -d
storage_file: filename: /home/service/varnish-cache.mmap size 32768 MegaBytes.
Classic hash: 1048583 buckets
Using old SHMFILE
rolling(1)...
rolling(2)...
start
start child pid 58794
200 0

Child said (2, 58794): <<Child starts
sizeof(struct ws) = 48
sizeof(struct http) = 584
sizeof(struct http_conn) = 48
sizeof(struct acct) = 64
sizeof(struct worker) = 1232
sizeof(struct workreq) = 24
sizeof(struct bereq) = 656
sizeof(struct storage) = 72
sizeof(struct object) = 824
sizeof(struct objhead) = 56
sizeof(struct sess) = 448
sizeof(struct vbe_conn) = 48
sizeof(struct backend) = 88
managed to mmap 34359738368 bytes of 34359738368
Ready
CLI ready
>>

200 0


200 0

Child not responding to ping
Child not responding to ping
Child not responding to ping
Child not responding to ping
Child not responding to ping
Child not responding to ping
Child not responding to ping
Child not responding to ping
Child not responding to ping
Child not responding to ping
Child not responding to ping
Child not responding to ping
(lots of "Child not ...")

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
Gea-Suan Lin <gslin at gslin.org> writes:
> Program received signal SIGKILL, Killed.

Killed by the parent again.

> [Switching to Thread 0xbb4e200 (LWP 101321)]
> 0x0000000800c1716f in brk () from /lib/libc.so.6
> (gdb) bt
> #0 0x0000000800c1716f in brk () from /lib/libc.so.6
> #1 0x000000000c2f1000 in ?? ()
> #2 0x0000000800c165ba in _UTF8_init () from /lib/libc.so.6
> #3 0x0000000800c167e8 in _UTF8_init () from /lib/libc.so.6
> #4 0x0000000800c170e6 in _UTF8_init () from /lib/libc.so.6
> #5 0x0000000800c7966b in calloc () from /lib/libc.so.6
> #6 0x0000000000411262 in HSH_Prealloc (sp=0xa6dd008) at cache_hash.c:80
> #7 0x0000000000411875 in HSH_Lookup (sp=0xa6dd008) at cache_hash.c:185
> #8 0x000000000040e890 in CNT_Session (sp=0xa6dd008) at cache_center.c:534
> #9 0x0000000000416209 in wrk_thread (priv=0x53e5e0) at cache_pool.c:193
> #10 0x000000080098729e in pthread_create () from /usr/lib/libthr.so.2
> #11 0x0000000000000000 in ?? ()
> Error accessing memory address 0x7ffffd1ea000: Bad address.

This stack frame is of no interest, it is just the thread that
happened to be running when the SIGKILL was delivered. What would be
helpful at this point is "i thr".

As previously mentioned, this will be a lot easier if you modify
mgt_child.c to send SIGQUIT instead of SIGKILL so you get a core dump.

BTW, I assume this is all on FreeBSD? Which version?

DES
--
Dag-Erling Sm?rgrav
Senior Software Developer
Linpro AS - www.linpro.no
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
On Thu, Nov 08, 2007 at 10:57:04PM +0100, Dag-Erling Sm?grav wrote:
> This stack frame is of no interest, it is just the thread that
> happened to be running when the SIGKILL was delivered. What would be
> helpful at this point is "i thr".
>
> As previously mentioned, this will be a lot easier if you modify
> mgt_child.c to send SIGQUIT instead of SIGKILL so you get a core dump.

Okay. I'll run the following version:

gslin at testphp [~/svn/varnish] (6:02) svn up
At revision 2245.
gslin at testphp [~/svn/varnish] (6:02) svn diff
Index: varnish-cache/bin/varnishd/mgt_child.c
===================================================================
--- varnish-cache/bin/varnishd/mgt_child.c (revision 2245)
+++ varnish-cache/bin/varnishd/mgt_child.c (working copy)
@@ -114,7 +114,7 @@
return (1);
if (child_pid > 0 && mgt_cli_askchild(NULL, NULL, "ping\n")) {
fprintf(stderr, "Child not responding to ping\n");
- kill(child_pid, SIGKILL);
+ kill(child_pid, SIGQUIT);
}
return (0);
}
gslin at testphp [~/svn/varnish] (6:02) svn export varnish-cache /tmp/varnish-cache
Export complete.
gslin at testphp [/tmp/varnish-cache] (6:03) ./autogen.sh
+ aclocal
+ libtoolize --copy --force
+ autoheader
+ automake --add-missing --copy --foreign
configure.ac: installing `./install-sh'
configure.ac: installing `./missing'
bin/varnishadm/Makefile.am: installing `./compile'
bin/varnishadm/Makefile.am: installing `./depcomp'
+ autoconf
gslin at testphp [/tmp/varnish-cache] (6:03) ./configure --prefix=/home/service/varnish
(lots of msgs)
gslin at testphp [/tmp/varnish-cache] (6:04) make
(lots of msgs)
gslin at testphp [/tmp/varnish-cache] (6:05) sudo make install clean

And,

gslin at testphp [/home/service/varnish/sbin] (6:06) ldd varnishd
varnishd:
libvarnish.so.0 => /home/service/varnish/lib/libvarnish.so.0 (0x800664000)
libvarnishcompat.so.0 => /home/service/varnish/lib/libvarnishcompat.so.0 (0x80076b000)
libvcl.so.0 => /home/service/varnish/lib/libvcl.so.0 (0x80086c000)
libthr.so.2 => /usr/lib/libthr.so.2 (0x80097e000)
libm.so.4 => /lib/libm.so.4 (0x800a94000)
libc.so.6 => /lib/libc.so.6 (0x800bb0000)

> BTW, I assume this is all on FreeBSD? Which version?

gslin at testphp [~] (6:05) uname -a
FreeBSD testphp.pixnet.tw 6.2-STABLE FreeBSD 6.2-STABLE #2: Mon Mar 26 12:20:42 CST 2007 root at tmp1.pixnet.tw:/usr/obj/usr/src/sys/BLOG amd64
gslin at testphp [~] (6:05) cat /etc/libmap.conf
libc_r.so libthr.so
libc_r.so.6 libthr.so.2
libpthread.so libthr.so
libpthread.so.2 libthr.so.2

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
On Thu, Nov 08, 2007 at 10:21:22PM +0100, Dag-Erling Sm?grav wrote:
> Gea-Suan Lin <gslin at gslin.org> writes:
> > It crash again:
>
> Actually, it doesn't crash; the child get stuck somewhere and the
> parent kills it.
>
> I'd like to see an 'svn diff' of your tree.
>
> Also, if you change SIGKILL in bin/varnishd/mgt_child.c to either
> SIGABRT or SIGQUIT, you should get a core dump of the child process
> when the parent kills it. If you don't, 'ulimit -c unlimited' and try
> again.

I can see signal 3 (QUIT) from dmesg:

pid 76061 (varnishd), uid 65534: exited on signal 3
pid 76187 (varnishd), uid 65534: exited on signal 3

but I cannot find coredump. I run the following command in /tmp with
mode 1777:

sudo /usr/bin/env -i /home/service/varnish/sbin/varnishd -a 60.199.247.118:80 -f /usr/local/etc/varnish/image.vcl -h classic,1048583 -P /var/run/varnishd.pid -s file,/home/service/varnish-cache.mmap,32G -T 127.0.0.1:11957 -t 604800 -w 32,4096

And the default coredumpsize is unlimited now.

gslin at testphp [~] (6:29) limits
Resource limits (current):
cputime infinity secs
filesize infinity kB
datasize 33554432 kB
stacksize 524288 kB
coredumpsize infinity kB
memoryuse infinity kB
memorylocked infinity kB
maxprocesses 5547
openfiles 524288
sbsize infinity bytes
vmemoryuse infinity kB
gslin at testphp [~] (6:29) sudo env -i /usr/bin/limits
Resource limits (current):
cputime infinity secs
filesize infinity kB
datasize 33554432 kB
stacksize 524288 kB
coredumpsize infinity kB
memoryuse infinity kB
memorylocked infinity kB
maxprocesses 5547
openfiles 524288
sbsize infinity bytes
vmemoryuse infinity kB

Suggestion ?

--
* Gea-Suan Lin (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them. -- Harry S Truman
Varnish crash (SIGABRT) about every 10 mins [ In reply to ]
Gea-Suan Lin <gslin at gslin.org> writes:
> I can see signal 3 (QUIT) from dmesg:
>
> pid 76061 (varnishd), uid 65534: exited on signal 3
> pid 76187 (varnishd), uid 65534: exited on signal 3
>
> but I cannot find coredump. I run the following command in /tmp with
> mode 1777:

Read what I wrote earlier, you need to run 'ulimit -c unlimited'
before starting Varnish.

DES
--
Dag-Erling Sm?rgrav
Senior Software Developer
Linpro AS - www.linpro.no

1 2  View All