Mailing List Archive

error auto-restart problem..
child said (2, 26344): <<Child starts
managed to mmap 10737418240 bytes of 10737418240
Ready
>>
Child said (2, 26344): <<Missing errorhandling code in FetchHeaders(),
cache_fetch.c line 318:
Condition(i == 0) not true.
errno = 104 (Connection reset by peer)
>>
Cache child died pid=26344 status=0x6
Clean child
Child cleaned
start child pid 26358
Pushing vcls failed:
CLI communication error
unlink /tmp/vcl.XXqufVqH

Usually varnish automatically restarts the cache process when it
encounters the above message. However, it wasn't able to today due to
the "CLI communication error". I'm running Trunk 995, and the error
happened on the second instance of varnishd, likely a result of my
running two instances on the same server.

Xing
error auto-restart problem.. [ In reply to ]
In message <450C9C42.2050809 at litespeedtech.com>, "xing at litespeedtech.com" writes:

>Pushing vcls failed:
>CLI communication error
>unlink /tmp/vcl.XXqufVqH

I have just committed a change to make sure we get the error messages
from the child in this case.

If you can reproduce it I'm very interested in what it said.

>[...] Likely result of me running 2 instances on the same server.

I'm not so sure.

But regarding the multiple-instance thing:

I think my inclination is that each instance of varnish gets a
user chosen "identifier". The only requirement I have is that
it can be part of a filename (ie: no '/' etc).

You will of course have to specify that identifier to all programs
that need access to the shared memory.

Is that workable ?

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
error auto-restart problem.. [ In reply to ]
>> Pushing vcls failed:
>> CLI communication error
>> unlink /tmp/vcl.XXqufVqH
>
> I have just committed a change to make sure we get the error messages
> from the child in this case.

It only happened once in the last 72+ hours. Hopefully it will resurface
soon.

> But regarding the multiple-instance thing:
>
> I think my inclination is that each instance of varnish gets a
> user chosen "identifier". The only requirement I have is that
> it can be part of a filename (ie: no '/' etc).

This sounds good. I really do not have any preference for how instances
should be tagged or identified.

> You will of course have to specify that identifier to all programs
> that need access to the shared memory.

That's fine. I will likely use the domain name as the identifier. A
custom identifier string is easier to set up than using the backend
ip:port and much easier to type at the terminal, so I'm all for it.

Xing
error auto-restart problem.. [ In reply to ]
The error happened again on instance 2 with the new build. The only
meaningful config difference between instance 1 and instance 2 is that
instance 2 is started with -s file,/raid0/varn.fp,10G

Fatal error at the end.

Cache child died pid=1688 status=0x6
Clean child
Child cleaned
start child pid 1699
Child said (2, 1699): <<Child starts
managed to mmap 10737418240 bytes of 10737418240
Ready
>>
Child said (2, 1699): <<Missing errorhandling code in FetchHeaders(),
cache_fetch.c line 318:
Condition(i == 0) not true.
errno = 104 (Connection reset by peer)
>>
Cache child died pid=1699 status=0x6
Clean child
Child cleaned
start child pid 1730
Child said (2, 1730): <<Child starts
managed to mmap 10737418240 bytes of 10737418240
Ready
>>
Child said (2, 1730): <<Missing errorhandling code in FetchHeaders(),
cache_fetch.c line 318:
Condition(i == 0) not true.
errno = 104 (Connection reset by peer)
>>
Cache child died pid=1730 status=0x6
Clean child
Child cleaned
start child pid 1740
Cache child died pid=1740 status=0x6
Clean child
Child cleaned
start child pid 1749
Child said (2, 1749): <<Child starts
managed to mmap 10737418240 bytes of 10737418240
Ready
Missing errorhandling code in FetchHeaders(), cache_fetch.c line 318:
Condition(i == 0) not true.
errno = 104 (Connection reset by peer)
>>
Cache child died pid=1749 status=0x6
Clean child
Child cleaned
start child pid 1760
Child said (2, 1760): <<Child starts
managed to mmap 10737418240 bytes of 10737418240
Ready
Missing errorhandling code in FetchHeaders(), cache_fetch.c line 318:
Condition(i == 0) not true.
errno = 104 (Connection reset by peer)
>>
Cache child died pid=1760 status=0x6
Clean child
Child cleaned
start child pid 1771
Pushing vcls failed:
CLI communication error
unlink /tmp/vcl.XXxgVLp9


Poul-Henning Kamp wrote:
> In message <450C9C42.2050809 at litespeedtech.com>, "xing at litespeedtech.com" writes:
>
>> Pushing vcls failed:
>> CLI communication error
>> unlink /tmp/vcl.XXqufVqH
>
> I have just committed a change to make sure we get the error messages
> from the child in this case.
>
> If you can reproduce it I'm very interested in what it said.
>
>> [...] Likely result of me running 2 instances on the same server.
>
> I'm not so sure.
>
> But regarding the multiple-instance thing:
>
> I think my inclination is that each instance of varnish gets a
> user chosen "identifier". The only requirement I have is that
> it can be part of a filename (ie: no '/' etc).
>
> You will of course have to specify that identifier to all programs
> that need access to the shared memory.
>
> Is that workable ?
>
error auto-restart problem.. [ In reply to ]
In message <450EC670.5010605 at litespeedtech.com>, "xing at litespeedtech.com" writes:
>The error happened again on instance 2 with the new build. The only
>meaningful config diff between instance 1 and instance 2 is that 2 is
>started with -s file,/raid0/varn.fp,10G

>Child said (2, 1699): <<Missing errorhandling code in FetchHeaders(),
>cache_fetch.c line 318:
> Condition(i == 0) not true.
> errno = 104 (Connection reset by peer)

This was a TCP connection to the backend which got broken.

I need to handle that properly.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.