Mailing List Archive

Instability when using management CLI
Hi!

We're currently testing Varnish in front of a new content generation
system. This system doesn't know exactly how long the pages it
generates will be valid, but it is important to purge them from the
cache once new data is available. In order to do this, we're connecting
to the CLI on the administration port, and issuing url.purge commands.
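The snippet below is a minimal sketch of issuing one purge over an already-connected admin socket (the `-T` port). It makes two assumptions: the CLI accepts one newline-terminated command per line, and `url.purge` takes a single regular expression argument. `send_purge` and `write_all` are hypothetical helpers, and reading the CLI's reply is left out. To avoid depending on the CLI's quoting rules, it simply refuses expressions containing whitespace, double quotes or backslashes.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: send "url.purge <regex>\n" on a connected CLI
 * socket. Refuses empty input and anything that would need CLI quoting
 * (whitespace, '"' or '\'), since that quoting is not handled here.
 * Returns 0 on success, -1 on refusal or write error.
 */
int send_purge(int fd, const char *regex)
{
    char line[1024];
    int n;

    if (regex[0] == '\0' || strpbrk(regex, " \t\r\n\"\\") != NULL)
        return -1;
    n = snprintf(line, sizeof line, "url.purge %s\n", regex);
    if (n < 0 || (size_t)n >= sizeof line)
        return -1;    /* formatting error or command too long */
    return write_all(fd, line, (size_t)n);
}
```

In a real client the caller would then read and check the CLI's status line before sending the next command.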

Once the purge commands started pouring in, we found some stability
issues: within a few minutes, Varnish would segfault in strange places,
or in some cases just keep using 100% CPU without any interaction with
anyone. ElectricFence gave me a segfault on this line:

http://varnish.projects.linpro.no/browser/tags/varnish-1.0.3/bin/varnishd/mgt_cli.c#L117

(The file doesn't seem to have changed between 1.0.3 and trunk.)

I'm not sure I understand every aspect of this function, but I'm unable
to see where the space for that extra newline is allocated, and
re-compiling version 1.0.3 with the attached patch seems to have
stabilized it. It has now been processing some thousands of purge
requests for almost 24 hours under ElectricFence without incident.

I also tried running it through Valgrind. It complained about lost
memory in several places. Would these Valgrind logs be useful to
anyone, or should I just try patching Varnish myself instead? :)

Regards,
Kristoffer.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: varnish-malloc.patch
Type: text/x-patch
Size: 407 bytes
Desc: not available
Url : http://projects.linpro.no/pipermail/varnish-dev/attachments/20070327/5f459080/attachment.bin
Instability when using management CLI [ In reply to ]
Kristoffer Gleditsch <kristoffer.gleditsch at met.no> writes:
> I'm not sure I understand every aspect of this function, but I'm unable
> to see where the space for that extra newline is allocated, and
> re-compiling version 1.0.3 with the attached patch seems to have
> stabilized it.

Good catch, but there are a couple of snags.

First of all, I think the final newline is meant to replace the
trailing space, i.e. ["a" "b" "c"\n] rather than ["a" "b" "c" \n].

However, the for loop escapes double quotes, newlines and backslashes,
so the string may grow well beyond the strlen() + 3 set aside for each
argument. I'll send you an updated patch later today.
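A minimal sketch of sizing the buffer the way DES describes, so that escaping can never overrun it. It assumes each argument is wrapped in double quotes, that `"`, `\` and newline are each escaped as a two-byte sequence (newline as `\n`), and that the final newline replaces the trailing space. The names `quote_argv` and `quoted_len` are hypothetical; this is not the code in mgt_cli.c. The first pass measures the escaped length of every argument, and only then is the buffer allocated and filled.

```c
#include <stdlib.h>
#include <string.h>

/* Bytes needed to emit one argument as "arg", with ", \ and newline escaped. */
static size_t quoted_len(const char *s)
{
    size_t n = 2;                         /* opening and closing quote */

    for (; *s != '\0'; s++)
        n += (*s == '"' || *s == '\\' || *s == '\n') ? 2 : 1;
    return n;
}

/* Build  "a" "b" "c"\n  from argv[0..argc-1]. Caller frees the result. */
char *quote_argv(int argc, char *const *argv)
{
    size_t len = 1;                       /* terminating NUL */
    const char *s;
    char *buf, *p;
    int i;

    for (i = 0; i < argc; i++)
        len += quoted_len(argv[i]) + 1;   /* +1: separating space or final newline */
    if (argc == 0)
        len += 1;                         /* just the newline */

    buf = malloc(len);
    if (buf == NULL)
        return NULL;

    p = buf;
    for (i = 0; i < argc; i++) {
        *p++ = '"';
        for (s = argv[i]; *s != '\0'; s++) {
            switch (*s) {
            case '"':  *p++ = '\\'; *p++ = '"';  break;
            case '\\': *p++ = '\\'; *p++ = '\\'; break;
            case '\n': *p++ = '\\'; *p++ = 'n';  break;
            default:   *p++ = *s;                break;
            }
        }
        *p++ = '"';
        *p++ = ' ';
    }
    if (argc > 0)
        p[-1] = '\n';                     /* final newline replaces the trailing space */
    else
        *p++ = '\n';
    *p = '\0';
    return buf;
}
```

For example, with the arguments `url.purge` and `a"b`, this returns `"url.purge" "a\"b"` followed by a newline: 19 characters plus the NUL, exactly the 20 bytes allocated.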

DES
--
Dag-Erling Smørgrav
Senior Software Developer
Linpro AS - www.linpro.no
Instability when using management CLI [ In reply to ]
In message <46090DA7.8070404 at met.no>, Kristoffer Gleditsch writes:

>I also tried running it through Valgrind. It complained about lost
>memory in several places. Would these Valgrind logs be useful to
>anyone, or should I just try patching Varnish myself instead? :)

I still haven't gotten Valgrind to work on FreeBSD, so logfiles
are most welcome.

I'm not sure your patch is enough, I'll look at it in a moment.

Poul-Henning

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Instability when using management CLI [ In reply to ]
Poul-Henning Kamp wrote:

> I still haven't gotten Valgrind to work on FreeBSD, so logfiles
> are most welcome.

I've put a tarball of Valgrind logfiles here:
http://users.linpro.no/toffer/valgrind-logs.tar.bz2

This is the output after starting valgrind/varnish like this:

    valgrind -v --leak-check=full --trace-children=yes --num-callers=30 \
        --time-stamp=yes --log-file=/tmp/valgrind-5.varnish \
        /usr/local/varnish/sbin/varnishd -a 0.0.0.0:80 -h classic \
        -f /etc/varnish/vcl.conf -T 0.0.0.0:81 -t 60 -w 1,1,60 \
        -s file,/var/lib/varnish/varnish_storage.bin,10240000

...and then killing the parent process after 20-ish hours. The Varnish
in question was version 1.0.3 from SVN with the patch I got from DES
applied.

It seems --trace-children followed the compilation of the VCL file as
well; I guess valgrind-5.varnish.27584 is the most interesting file.

I hope this is useful. I don't have a lot of experience with Valgrind,
so if there are any options I should (or shouldn't) have used, please
let me know.

Regards,
Kristoffer.
Instability when using management CLI [ In reply to ]
In message <460BA994.1040407 at met.no>, Kristoffer Gleditsch writes:
>Poul-Henning Kamp wrote:

>I've put a tarball of Valgrind logfiles here:
>http://users.linpro.no/toffer/valgrind-logs.tar.bz2

>It seems the follow-children followed the compilation of the VCL file as
>well; I guess valgrind-5.varnish.27584 is the most interesting file.

Thanks a LOT!

Yes, the VCL compiler has some minor leaks; I believe there is already
a ticket about it. So far Valgrind hasn't found any I didn't know
about, but it has found two I forgot I knew about :-)

Poul-Henning

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.