Mailing List Archive: Best Practices - Suggestions Request

anup at iamcool

Jun 26, 2007, 12:40 AM

Post #1 of 14

Hi All,

We have been running Varnish on our production servers for over a month now.
We are using the default configuration with one small change: we also
cache requests that carry cookies.

We have a number of web servers behind a load balancer, each web server
running varnish + apache.

I understand that running both services on the same server is probably a
bad idea, but due to time constraints I had no option but to go ahead
with the current setup.

The problem is that the varnishd process dies every 7 - 10 days, and I have
to restart it manually.
One solution would be to write a script which monitors varnishd and
starts it if it is not running.
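
A minimal sketch of what such a monitoring script might look like, in Python. It assumes `pgrep` is available on the server; the function names and the start command are hypothetical and would need to match however varnishd is started on your system. It only restarts the process and does nothing to explain why it died.

```python
import subprocess
import time


def is_running(process_name="varnishd"):
    """Return True if at least one process with this exact name exists."""
    # pgrep exits with status 0 when it finds a match, non-zero otherwise.
    result = subprocess.run(
        ["pgrep", "-x", process_name],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_once(start_command, probe=is_running, run=subprocess.call):
    """Start the daemon if the probe reports it is down.

    Returns True when a restart was attempted, False otherwise.
    """
    if probe():
        return False
    run(start_command)
    return True


def watch(start_command, interval_seconds=30):
    """Poll forever, attempting a restart whenever the daemon disappears."""
    while True:
        if check_once(start_command):
            print("varnishd was not running; restart attempted")
        time.sleep(interval_seconds)
```

Calling `watch([...])` with the start command used on the server (an init script path, for instance, which is an assumption here) would poll every 30 seconds. The probe and the runner are passed in as parameters so the restart logic can be exercised without touching a real process.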

I would, however, like to understand why it dies. I have been
monitoring the servers for long stretches, and it seems that a
flood of requests for big files (100+ MB) brings varnishd down. I
cannot be very sure about this though.

Is it really a bad idea to run varnishd on the same server with apache?

Another thing: is it possible to configure Varnish so that it adds
a custom header to the response (for example, the IP of the server which
processed the request, or any custom value)? This would greatly assist in
knowing which server served the request.

Is it possible to make Varnish refuse connections from clients if it
detects that the backend has stopped responding, and start serving
again the moment it detects that the backend is up?

Sorry if I have asked too many questions at one go.

Any help/pointers would be greatly appreciated.

Regards
A.S.

Jun 26, 2007, 2:17 AM

Post #2 of 14

In message <4680C2E3.2010108 at iamcool.net>, Anup Shukla writes:

>I understand that running both services on the same server is probably a
>bad idea, but due to time constraints i had no option but to go ahead
>with the current setup.

It is not an obviously bad idea, and in many cases it is likely to
work quite well.

If it works for you, be happy about the saved hardware :-)

>The problem is, the varnish process dies every 7 - 10 days, and i have
>to manually restart it.
>One solution would be to write a script which monitors varnishd and
>starts it if not running.
>
>I, however, would like to understand "Why it dies"... I have been
>monitoring the servers for long times at a stretch and it seems that a
>flood of requests for big files (100+ MB) seems to get varnishd down. I
>cannot be very sure about this though.

It's not much to go on, but here are some ideas:

1. Do you have enough storage space? By default Varnish takes a
fixed fraction of the free space in /tmp, which may not be enough.
Use the "-s" argument to specify a different directory and/or how
much space you want to use.

>Another thing is, is it possible to configure varnish such that it adds
>a custom header to the response (for example, the ip of the server which
>processed the request or any custom value). This would greatly assist in
>knowing which server served the request.

I'm working on VCL support for such header modification right now.

>Is it possible to make varnish refuse connections from the clients if it
>detects that the backend has stopped responding and start servicing
>again the moment it detects that the backend is up?

At some point we will be able to, but not right now.

>Sorry, if i have asked too many questions at one go.

You're welcome :-)

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

anup at iamcool

Jun 26, 2007, 3:23 AM

Post #3 of 14

Poul-Henning Kamp wrote:
> In message <4680C2E3.2010108 at iamcool.net>, Anup Shukla writes:
>
>
>> I understand that running both services on the same server is probably a
>> bad idea, but due to time constraints i had no option but to go ahead
>> with the current setup.
>>
>
> It is not an obviously bad idea, and in many cases it is likely to
> work quite well.
>
> If it works for you, be happy about the saved hardware :-)
>
That makes me feel a lot better now. Thank you. :)
> It's not much to go from, but here are some ideas:
>
> 1. Do you have enough storage space ? By default Varnish takes a
> fixed fraction of the free space in /tmp, that may not be enough.
> Use the "-s" argument to specify a different directory and/or how
> much space you want to use.
>
I have put the cache file (or whatever it is called) under /cache/varnish.
The parameters are .................... -s
file,/cache/varnish/varnish_storage.bin,2G

Disk space should not be a problem as there is plenty available.
Initially I had set the size to 8G, but changed it to 2G later.
I doubt that has any effect apart from limiting the cache to
2G of content.

The only reason I changed it to 2G was that I noticed the size of the
varnishd process kept increasing.
I am not a Linux expert, so I did not know about the
whole idea of using disk space as RAM.
However, a bit of Google searching showed that my fears were baseless.
Or is there a relation between the process size and the
on-disk cache file? (I hope this is not out of place.)

I did not change it back to 8G though; it works fine for me the way
it is configured. ;)
I will keep an eye on the process to see if I can find anything specific
to relate to the problem of "Varnish dying in 7-10 days".

Do I need to check for something specific?
>
>> Sorry, if i have asked too many questions at one go.
>>
>
> You're welcome :-)
>
>
Much appreciated. Thanks again. :)

Regards
A.S

Jun 26, 2007, 3:42 AM

Post #4 of 14

In message <4680E932.5040906 at iamcool.net>, Anup Shukla writes:
>Poul-Henning Kamp wrote:

>I have put the cache file (or whatever its called as) under /cache/varnish
>the parameters are .................... -s
>file,/cache/varnish/varnish_storage.bin,2G
>
>Disk space should not be a problem as there is plenty available.
>Initially i had kept the size to 8G, but changed it to 2G later.
>I doubt if that has any effect apart from being able to store not more
>than 2G of cached content.

Well, that could be your problem: right now Varnish doesn't deal
well with running out of storage (DES is working on that right now).

>The only reason i changed it to 2G was that, i noticed the size of
>varnishd process kept on increasing.

That's normal, expected and shouldn't worry you unless Varnish
forces more important processes onto the defensive.

Varnish is designed around virtual memory, and very few people have
learned yet that allocating a lot of virtual memory is not by
definition a bad thing; it all depends on how you use it.

I would try to give it a bit more space if the crashes persist.

--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

anup at iamcool

Jun 26, 2007, 4:09 AM

Post #5 of 14

Poul-Henning Kamp wrote:
> In message <4680E932.5040906 at iamcool.net>, Anup Shukla writes:
>
>> Poul-Henning Kamp wrote:
>>
>
>
>> I have put the cache file (or whatever its called as) under /cache/varnish
>> the parameters are .................... -s
>> file,/cache/varnish/varnish_storage.bin,2G
>>
>> Disk space should not be a problem as there is plenty available.
>> Initially i had kept the size to 8G, but changed it to 2G later.
>> I doubt if that has any effect apart from being able to store not more
>> than 2G of cached content.
>>
>
> Well, that could be your problem, right now Varnish doesn't deal
> well with running out of storage, (DES is working on that right now)
>
>

Okay.
Then I will just increase the storage size to a large but acceptable number.
Thank you for the help.

I will surely update you if the crashes continue. Thank you
again.

Regards
A.S

Jun 27, 2007, 1:05 PM

Post #6 of 14

"Poul-Henning Kamp" <phk at phk.freebsd.dk> writes:
> Anup Shukla <anup at iamcool.net> writes:
> > Disk space should not be a problem as there is plenty available.
> > Initially i had kept the size to 8G, but changed it to 2G later. I
> > doubt if that has any effect apart from being able to store not more
> > than 2G of cached content.
> Well, that could be your problem, right now Varnish doesn't deal
> well with running out of storage, (DES is working on that right now)

Still, only the child should die, the parent should automatically
restart it. He says he has to restart it manually, which worries me...

Anup, what version are you running?

DES
--
Dag-Erling Smørgrav
Senior Software Developer
Linpro AS - www.linpro.no

anup at iamcool

Jun 27, 2007, 8:14 PM

Post #7 of 14

Dag-Erling Smørgrav wrote:
> Still, only the child should die, the parent should automatically
> restart it. He says he has to restart it manually, which worries me...
>
> Anup, what version are you running?
>
> DES
>

I am running version 1.0.4.
Is that too old to use?
If so, I will update it immediately.

Thank you

Regards,
A.S

anup at iamcool

Jun 27, 2007, 8:23 PM

Post #8 of 14

Anup Shukla wrote:
> Dag-Erling Smørgrav wrote:
>
>> Still, only the child should die, the parent should automatically
>> restart it. He says he has to restart it manually, which worries me...
>>
>> Anup, what version are you running?
>>
>> DES
>>
>>
>
>
> I am running version 1.0.4
> Is that too old to use?
> If so, i will update it immediately.
>

Sorry for leaving out the release date. It is version 1.0.4, released
on May 20, 2007.

admin at adofms

Jun 27, 2007, 9:37 PM

Post #9 of 14

>> I am running version 1.0.4
>> Is that too old to use?
>> If so, i will update it immediately.
>>
>>
>
> Sorry for missing out on the release date. It is version 1.0.4 released
> on May 20, 2007.
>
>
hmm ...

I am running 1.0.4 as well, but I am building mine from the SVN
repository rather than from the 1.0.4 tarball that was available at the
time through the gentoo ebuilds.

I originally just did an emerge to get varnish running, but found some
issues with VCL, and it was suggested that I get the SVN release and try
some VCL extensions.

My SVN update was from 29 May 2007, and there was a noticeable
functionality difference between the SVN source and the
original 1.0.4 release. Not a whole lot of code, but it added a few VCL
commands that got the job done for me, which were not working in the
original 20 May 1.0.4 release.

In my case, the following VCL code requires the SVN 1.0.4 release:

sub vcl_hash {
    set req.hash += req.url;
    set req.hash += req.http.host;
    set req.hash += req.http.cookie;
    hash;
}

Not suggesting that the SVN code would cure your crashes - but it
definitely gives you a better VCL environment than the stock release.

I haven't seen ANY stability, performance, or memory leak issues with
Varnish. My system gets hit hard enough, but in my case, Varnish is only
serving smallish chunks, and has no constraints on disk space or RAM. My
main machine runs apache + php + mysql + varnish + a number of cronjobs
written in C, and even during very busy periods, it never goes to swap.
It has 'only' 4GB of physical RAM, and varnish is compiled in 64bit
mode. Not a problem on my setup, but every case is different.

When it comes to your situation with varnish dying - that really sounds
like a resource problem from the outside here. Hard to tell without
having your system right in front of me and watching what it is doing.
I.e. I would be very keen to run one of your varnish instances under gdb,
try to simulate the point where it is dying, and see if a pattern
shows up there. Or at least run it from a console under gdb, let
it go for several days, and hope that it crashes whilst in the debugger.

Whilst commercial reality dictates that we need to find workarounds to
make things work from one day to the next (and that applies to
EVERYTHING) - I'd encourage you to keep looking and find out exactly why
the process is stopping. Even if a new release of code fixes the
problem, it's good for you to know why the old setup didn't work and the
new one does. Keep looking :)

Sounds like Dag's latest code (which drops items from cache on an LRU
scheme as memory fills up) is more likely to solve your problems longer
term. I assume that comes out first in SVN, so that's another good reason
to try the SVN release.

Jun 27, 2007, 11:54 PM

Post #10 of 14

Anup Shukla <anup at iamcool.net> writes:
> Dag-Erling Smørgrav <des at linpro.no> writes:
> > Anup, what version are you running?
> I am running version 1.0.4 Is that too old to use? If so, i will
> update it immediately.

No, 1.0.4 should be fine, but I'm not aware of any bugs in it that might
cause the behaviour you're seeing. What hardware and OS are you running
it on?

DES
--
Dag-Erling Smørgrav
Senior Software Developer
Linpro AS - www.linpro.no

Jun 28, 2007, 12:01 AM

Post #11 of 14

"ADOFMS Admin, SteveOC" <admin at adofms.com.au> writes:
> Sounds like Dag's latest code (which drops items from cache on a LRU
> scheme as memory fills up) is more likely to solve your problems longer
> term. I assume that comes out first in SVN, so thats another good reason
> to try the SVN release.

I would like to stress once more that it is not always a good idea to
track trunk, especially now that we are in an active development phase.
For instance, we now know that the workspace management code that was
committed on June 4th had a small but significant (performance- and
correctness-affecting) bug that was fixed only yesterday, and the
instance-naming code we committed last week had serious issues that were
fixed on Tuesday.

DES
--
Dag-Erling Smørgrav
Senior Software Developer
Linpro AS - www.linpro.no

trondmm-varnish at crusaders

Jun 28, 2007, 5:28 AM

Post #12 of 14

On Thu, Jun 28, 2007 at 08:54:24AM +0200, Dag-Erling Smørgrav wrote:
> Anup Shukla <anup at iamcool.net> writes:
>> Dag-Erling Smørgrav <des at linpro.no> writes:
>>> Anup, what version are you running?
>> I am running version 1.0.4 Is that too old to use? If so, i will
>> update it immediately.
> No, 1.0.4 should be fine, but I'm not aware of any bugs in it that might
> cause the behaviour you're seeing. What hardware and OS are you running
> it on?

We're running Varnish 1.0.3, so this might not be related, but we've
also run into a bug that causes varnish to die.

In our setup, varnish is prone to crash if /tmp is cleaned up (we had
a process that removed all files that hadn't been accessed in 7 days
from /tmp). What happened in our case was that something (we couldn't
figure out what) would make varnish reload the vcl-files. I'm not sure
why it would read these files from /tmp, but I'm guessing this is
where varnish places the compiled versions. Anyway - if these files
were missing, all varnish processes would simply die.
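
To see which files a cleanup job like ours would remove, something along these lines could be run against /tmp beforehand. This is only a sketch: the function name and the 7-day threshold are illustrative, it lists files without deleting anything, and access times are not reliable on filesystems mounted with noatime.

```python
import os
import time


def files_not_accessed_since(directory, days, now=None):
    """Return paths under `directory` whose last access is older than `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    stale = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            path = os.path.join(root, name)
            try:
                atime = os.stat(path).st_atime
            except OSError:
                continue  # file vanished or cannot be examined; skip it
            if atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

For example, `files_not_accessed_since("/tmp", 7)` returns the files a 7-day access-time cleaner would consider stale, which makes it possible to check whether anything belonging to Varnish is among them.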

--
Trond Michelsen

Jun 28, 2007, 5:51 AM

Post #13 of 14

Trond Michelsen <mike at crusaders.no> writes:
> We're running Varnish 1.0.3, so this might not be related, but we've
> also run into a bug that causes varnish to die.

1.0.3 has numerous known bugs...

> In our setup, varnish is prone to crash if /tmp is cleaned up (we had
> a process that removed all files that hadn't been accessed in 7 days
> from /tmp). What happened in our case was that something (we couldn't
> figure out what) would make varnish reload the vcl-files. I'm not sure
> why it would read these files from /tmp, but I'm guessing this is
> where varnish places the compiled versions. Anyway - if these files
> were missing, all varnish processes would simply die.

If the child process dies (due to a bug or running out of space), it
will not be able to restart without the compiled VCL file, and the
parent will probably bail out...

DES
--
Dag-Erling Smørgrav
Senior Software Developer
Linpro AS - www.linpro.no

anup at iamcool

Jun 29, 2007, 8:50 PM

Post #14 of 14

Hi All.

Apologies for not being able to respond to my own problems ;)

I would personally like to give the latest SVN version a try.

I had encountered a situation where /tmp was mounted noexec and
Varnish failed to load.
This made me aware that Varnish uses /tmp to keep the compiled VCL.
I have removed the noexec option and it's fine now. I would love to have a
configuration option to set the directory though :) .
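
For anyone wanting to check for the same noexec situation, here is a rough Python sketch that reads the mount table. It assumes a Linux system exposing `/proc/mounts`; the function names are my own, and it only looks at the mount point itself, so if /tmp is not a separate mount it reports nothing rather than inspecting the parent filesystem.

```python
def mount_options(mount_point, mounts_text):
    """Return the set of options for `mount_point`, or None if it is not a mount.

    `mounts_text` is expected in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # If the mount point is listed more than once, keep the last entry.
            options = set(fields[3].split(","))
    return options


def is_noexec(mount_point, mounts_path="/proc/mounts"):
    """Report whether `mount_point` itself is mounted with the noexec option."""
    with open(mounts_path) as handle:
        options = mount_options(mount_point, handle.read())
    return options is not None and "noexec" in options
```

Calling `is_noexec("/tmp")` before starting varnishd would flag the condition described above.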

As for the crashes, it's still a mystery.
The system does not swap and everything seems to be quite fine.

Must be something else which is affecting Varnish.

There is one possibility though.
There is a lot of old unorganized PHP code on the servers that I have
never had time to inspect.
It's working, so I did not need to tinker with it.
I suspect that someone has used /tmp to cache query responses and
might be cleaning up the entire /tmp to flush the caches.
Sounds a bit stupid, but a very possible scenario in my case.

Will need some time to find out if that is really the case.

Thanks to DES for pointing out this /tmp scenario (in his reply to Trond
Michelsen).

Regards
A.S