Mailing List Archive

Another Varnish 1.1.2 crash, in cache_backend.c
Hi,

(gdb) bt
#0 0x28129f37 in thr_kill () from /lib/libc.so.6
#1 0x280cf1a5 in pthread_mutex_unlock () from /usr/lib/libthr.so.2
#2 0x280c72ae in raise () from /usr/lib/libthr.so.2
#3 0x281a4b78 in abort () from /lib/libc.so.6
#4 0x280aa0ed in lbv_assert (func=0x806ed70 "VBE_ClosedFd",
file=0x806ebfb "cache_backend.c", line=357,
cond=0x806edba "(close(vc->fd)) == 0", err=22) at assert.c:58
#5 0x0804fbdf in VBE_ClosedFd (w=0xb3b3ecf0, vc=0xb1ce040)
at cache_backend.c:357
#6 0x0805a713 in PipeSession (sp=0x9099004) at cache_pipe.c:137
#7 0x080511df in cnt_pipe (sp=0x9099004) at cache_center.c:668
#8 0x080517c8 in CNT_Session (sp=0x9099004) at steps.h:35
#9 0x080590e2 in wrk_do_one (w=0xb3b3ecf0) at cache_pool.c:194
#10 0x08059650 in wrk_thread (priv=0x807e1f0) at cache_pool.c:248
#11 0x280c95cf in pthread_create () from /usr/lib/libthr.so.2
#12 0xb8489fec in ?? ()

Got this one during the night.

Bye,

--
Anders.
Another Varnish 1.1.2 crash, in cache_backend.c [ In reply to ]
In message <20080103073513.GA99459 at fupp.net>, Anders Nordby writes:
>Hi,
>
>(gdb) bt
>#0 0x28129f37 in thr_kill () from /lib/libc.so.6
>#1 0x280cf1a5 in pthread_mutex_unlock () from /usr/lib/libthr.so.2
>#2 0x280c72ae in raise () from /usr/lib/libthr.so.2
>#3 0x281a4b78 in abort () from /lib/libc.so.6
>#4 0x280aa0ed in lbv_assert (func=0x806ed70 "VBE_ClosedFd",
> file=0x806ebfb "cache_backend.c", line=357,
> cond=0x806edba "(close(vc->fd)) == 0", err=22) at assert.c:58
>#5 0x0804fbdf in VBE_ClosedFd (w=0xb3b3ecf0, vc=0xb1ce040)
> at cache_backend.c:357
>#6 0x0805a713 in PipeSession (sp=0x9099004) at cache_pipe.c:137
>#7 0x080511df in cnt_pipe (sp=0x9099004) at cache_center.c:668
>#8 0x080517c8 in CNT_Session (sp=0x9099004) at steps.h:35
>#9 0x080590e2 in wrk_do_one (w=0xb3b3ecf0) at cache_pool.c:194
>#10 0x08059650 in wrk_thread (priv=0x807e1f0) at cache_pool.c:248
>#11 0x280c95cf in pthread_create () from /usr/lib/libthr.so.2
>#12 0xb8489fec in ?? ()
>
>Got this one during the night.

Was there any errno information available ? I belive the assert
message would have printed it ?

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Another Varnish 1.1.2 crash, in cache_backend.c [ In reply to ]
Hi,

On Tue, Jan 08, 2008 at 10:26:54AM +0000, Poul-Henning Kamp wrote:
> Was there any errno information available ? I belive the assert
> message would have printed it ?

Nope, no assert.

Got two more of them here:

Case 1:

(gdb) bt
#0 0x0000000000408a90 in vbe_sock_conn (ai=0x0) at cache_backend.c:162
#1 0x0000000000408b98 in vbe_conn_try (bp=0xaf2d00, pai=0x7ffffadd5838)
at cache_backend.c:190
#2 0x0000000000408d14 in vbe_connect (sp=0xb4d008, bp=0xaf2d00)
at cache_backend.c:228
#3 0x000000000040912e in vbe_nextfd (sp=0xb4d008) at
cache_backend.c:299
#4 0x0000000000409372 in VBE_GetFd (sp=0xb4d008) at cache_backend.c:336
#5 0x000000000040cafa in Fetch (sp=0xb4d008) at cache_fetch.c:278
#6 0x000000000040a124 in cnt_fetch (sp=0xb4d008) at cache_center.c:300
#7 0x000000000040b18d in CNT_Session (sp=0xb4d008) at steps.h:40
#8 0x0000000000412cf8 in wrk_do_one (w=0x7ffffadd5ad0) at
cache_pool.c:194
#9 0x00000000004131f3 in wrk_thread (priv=0x856300) at cache_pool.c:248
#10 0x000000080098755b in pthread_create () from /usr/lib/libthr.so.2
#11 0x00007ffffabd8000 in ?? ()
Cannot access memory at address 0x7ffffadd8000

Child said (2, 51256): <<Child starts
managed to mmap 161061273600 bytes of 161061273600
Ready
CLI ready
>>
Cache child died pid=51256 status=0x8b
Clean child
Child cleaned

Case 2:

(gdb) bt
#0 0x0000000000408a90 in vbe_sock_conn (ai=0x0) at cache_backend.c:162
#1 0x0000000000408b98 in vbe_conn_try (bp=0xaf2c80, pai=0x7ffffd1e7838)
at cache_backend.c:190
#2 0x0000000000408d14 in vbe_connect (sp=0xb85008, bp=0xaf2c80)
at cache_backend.c:228
#3 0x000000000040912e in vbe_nextfd (sp=0xb85008) at
cache_backend.c:299
#4 0x0000000000409372 in VBE_GetFd (sp=0xb85008) at cache_backend.c:336
#5 0x000000000040cafa in Fetch (sp=0xb85008) at cache_fetch.c:278
#6 0x000000000040a124 in cnt_fetch (sp=0xb85008) at cache_center.c:300
#7 0x000000000040b18d in CNT_Session (sp=0xb85008) at steps.h:40
#8 0x0000000000412cf8 in wrk_do_one (w=0x7ffffd1e7ad0) at
cache_pool.c:194
#9 0x00000000004131f3 in wrk_thread (priv=0x856300) at cache_pool.c:248
#10 0x000000080098755b in pthread_create () from /usr/lib/libthr.so.2
#11 0x00007ffffcfea000 in ?? ()
Cannot access memory at address 0x7ffffd1ea000

Child said (2, 52204): <<Child starts
managed to mmap 161061273600 bytes of 161061273600
Ready
CLI ready
>>
Cache child died pid=52204 status=0x8b
Clean child
Child cleaned
start child pid 53434

I'm running FreeBSD 6.3-RC2/amd64 where I get this.

Bye,

--
Anders.
Another Varnish 1.1.2 crash, in cache_backend.c [ In reply to ]
Anders Nordby <anders at fupp.net> writes:
> Poul-Henning Kamp <phk at phk.freebsd.dk> writes:
> > Was there any errno information available ? I belive the assert
> > message would have printed it ?
> Nope, no assert.

Yes there was, let me show the backtrace which phk missed and you
edited out of your reply:

#0 0x28129f37 in thr_kill () from /lib/libc.so.6
#1 0x280cf1a5 in pthread_mutex_unlock () from /usr/lib/libthr.so.2
#2 0x280c72ae in raise () from /usr/lib/libthr.so.2
#3 0x281a4b78 in abort () from /lib/libc.so.6
#4 0x280aa0ed in lbv_assert (func=0x806ed70 "VBE_ClosedFd",
file=0x806ebfb "cache_backend.c", line=357,
cond=0x806edba "(close(vc->fd)) == 0", err=22) at assert.c:58
#5 0x0804fbdf in VBE_ClosedFd (w=0xb3b3ecf0, vc=0xb1ce040)
at cache_backend.c:357

errno 22 is EINVAL.

Similar issues in trunk were addressed by a number of commits
(including r2264 and r2285) which unfortunately can't be merged back
to 1.1, because the code has been completely reorganized in trunk.
They need to be reimplemented for 1.1.

> Got two more of them here:

Actually, those are two almost identical instances of a completely
different bug, a segfault in vbe_sock_conn() due to vbe_conn_try()
passing a NULL pointer:

> (gdb) bt
> #0 0x0000000000408a90 in vbe_sock_conn (ai=0x0) at cache_backend.c:162
> #1 0x0000000000408b98 in vbe_conn_try (bp=0xaf2d00, pai=0x7ffffadd5838)
> at cache_backend.c:190
> #2 0x0000000000408d14 in vbe_connect (sp=0xb4d008, bp=0xaf2d00)
> at cache_backend.c:228

The fault lies in the following loop on bin/varnishd/cache_backend.c
lines 188-196:

/* Then try the list until the cached last good address */
for (ai = bp->addr; ai != bp->last_addr; ai = ai->ai_next) {
s = vbe_sock_conn(ai);
if (s >= 0) {
bp->last_addr = ai;
*pai = ai;
return (s);
}
}

The loop condition should check that ai != NULL.

There may or may not be a race condition at the bottom of this
(between vbe_conn_try() and something else modifying the backend
struct that bp points to)

DES
--
Dag-Erling Sm?rgrav
Senior Software Developer
Linpro AS - www.linpro.no