Mailing List Archive

Apache Not Restarting
I have been having some problems restarting apache on servers that are
using LVS-NAT and was hoping someone had some insight or a workaround.

Basically, when I make a configuration change to my webservers and I try
to restart them (either with a complete shutdown or even just a graceful
restart), Apache tries to close all the current connections and re-bind
to the port. The problem is that invariably it takes several minutes for
all the current connections to clear even if I kill apache, and the
server won't start as long as any socket is open on port 80, even if it
is in a 'CLOSING' state.
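
To see which states the leftover port 80 sockets are actually sitting in, the output of `netstat -tan` can be tallied by state. This is only a rough sketch: the `count_states` helper and the sample rows below are made up for illustration and are not taken from the servers discussed here.

```python
from collections import Counter


def count_states(netstat_output, port=80):
    """Count TCP sockets whose local port is `port`, grouped by state."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like:
        # Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips the header row and non-TCP lines
        local_port = fields[3].rsplit(":", 1)[-1]
        if local_port == str(port):
            counts[fields[5]] += 1
    return counts


# Hypothetical sample rows, for illustration only.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.2:80             192.0.2.10:4312         FIN_WAIT2
tcp        0      1 10.0.0.2:80             192.0.2.11:1099         CLOSING
tcp        0      0 10.0.0.2:22             192.0.2.12:3301         ESTABLISHED
"""

if __name__ == "__main__":
    for state, n in sorted(count_states(SAMPLE).items()):
        print(state, n)
```

On a real server the input would come from running `netstat -tan` rather than from the sample string.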

I'm guessing that my problem is that I am using LVS persistent
connections, and combined with apache's lingering close this makes it
difficult for apache to know the difference between a slow connection
and a dead connection when it tries to close down, so the time it takes
to clear some of the sockets approaches my LVS persistence time.

I haven't tried turning off persistence, and I haven't tried
re-compiling apache without lingering-close. This is a production
cluster with rather heavy traffic and I don't have a test cluster to
play with. In the end rebooting the machine has been faster than waiting
for the ports to clear so I can restart apache, but this seems really
dumb, and doesn't work well because then my cluster machines have
different configuration states.

Does anyone know of a way to kill the sockets on the webserver other
than simply waiting for them to clear out or rebooting the machine?
(I also tried taking the interface down and bringing it up again ...
that didn't work either.)

Is there any way to 'reset' the MASQ table on the LVS machine to force a
reset?

Thanks in advance.

thornton
Re: Apache Not Restarting
Catch-22. I think the proper way to do something like this is to take the
affected server out of the LVS table _before_ making any configuration
changes to the machine. Wait until all connections are closed, then make
your change and restart apache. You should run into fewer problems this
way. After the server has restarted, add it back into the pool.
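
A minimal sketch of the drain-then-restart sequence described above, written as a Python helper that only builds and prints the commands instead of running them. The addresses (192.0.2.1 as the virtual IP, 10.0.0.2 as the real server) and the restored weight of 1 are assumptions for illustration. It sets the weight to 0 to quiesce the real server rather than deleting it from the table outright, and passes `-m` so the forwarding method stays masquerading (LVS-NAT).

```python
def drain_plan(vip, rip, port=80, weight=1):
    """Return (where, argv) steps for draining one real server behind LVS-NAT."""
    service = f"{vip}:{port}"
    server = f"{rip}:{port}"
    # ipvsadm -e edits an existing real server entry; -w sets its weight.
    edit = ["ipvsadm", "-e", "-t", service, "-r", server, "-m", "-w"]
    return [
        # Weight 0: no new connections are scheduled to this server.
        ("director", edit + ["0"]),
        # Re-run this until the connection counts for the server reach zero.
        ("director", ["ipvsadm", "-L", "-n"]),
        # Make the configuration change, then restart Apache.
        ("real server", ["apachectl", "restart"]),
        # Put the server back into rotation (weight is an assumed value).
        ("director", edit + [str(weight)]),
    ]


if __name__ == "__main__":
    for where, argv in drain_plan("192.0.2.1", "10.0.0.2"):
        print(f"[{where}] {' '.join(argv)}")
```

Printing the plan first makes it easy to review the exact commands before running anything on a production director.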
--
Michael Brown
Re: Apache Not Restarting
Michael E Brown wrote:
>
> Catch-22. I think the proper way to do something like this is to take the
> affected server out of the LVS table _before_ making any configuration
> changes to the machine. Wait until all connections are closed, then make
> your change and restart apache. You should run into fewer problems this
> way. After the server has restarted, add it back into the pool.

I thought of that, but unfortunately I need to make sure that the
servers in the cluster remain in a near-identical state, so the
reconfiguration time should be minimal.

So far rebooting the machine has been the fastest way to do this; it
just seems silly.

thornton
RE: Apache Not Restarting
Newbie here. If the real servers are identical, can't you use rsync via cron
(I rsync 3 real servers every hour; you could do it more often) to keep
the HTML the same? Seems to work fine on my 4-box cluster.

Dave

_______________________________________________
LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
Send requests to lvs-users-request@LinuxVirtualServer.org
or go to http://www.in-addr.de/mailman/listinfo/lvs-users
Re: Apache Not Restarting
Hello,

On Fri, 5 Jan 2001, Thornton Prime wrote:

>
> I have been having some problems restarting apache on servers that are
> using LVS-NAT and was hoping someone had some insight or a workaround.
>
> Basically, when I make a configuration change to my webservers and I try
> to restart them (either with a complete shutdown or even just a graceful
> restart), Apache tries to close all the current connections and re-bind
> to the port. The problem is that invariably it takes several minutes for
> all the current connections to clear even if I kill apache, and the
> server won't start as long as any socket is open on port 80, even if it
> is in a 'CLOSING' state.

Hm, I don't have such problems with Apache. I use the default
configure-time settings, maybe with only a higher process limit.
Are you sure you are running the latest 2.2 kernel on the real servers?

>
> I'm guessing that my problem is that I am using LVS persistent
> connections, and combined with apache's lingering close this makes it
> difficult for apache to know the difference between a slow connection
> and a dead connection when it tries to close down, so the time it takes
> to clear some of the sockets approaches my LVS persistence time.
>
> I haven't tried turning off persistence, and I haven't tried
> re-compiling apache without lingering-close. This is a production
> cluster with rather heavy traffic and I don't have a test cluster to
> play with. In the end rebooting the machine has been faster than waiting
> for the ports to clear so I can restart apache, but this seems really
> dumb, and doesn't work well because then my cluster machines have
> different configuration states.

One reason your servers might block is a very low limit on the
number of clients. You can build apache this way:

CFLAGS=-DHARD_SERVER_LIMIT=2048 ./configure ...

and then increase MaxClients (up to the above limit). Try
different values. And don't play too much with MinSpareServers and
MaxSpareServers; values near the defaults are preferred. Is your kernel
compiled with a higher value for the number of processes? See:

/usr/src/linux/include/linux/tasks.h
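
As a rough illustration of the relationship between the two settings, the check below reads a MaxClients value out of httpd.conf-style text and compares it with the compiled-in limit. The helper names and the sample configuration are hypothetical, and the hard limit has to be supplied by hand because it is a compile-time value that does not appear in the configuration file.

```python
import re


def max_clients(conf_text):
    """Return the last uncommented MaxClients value in the text, or None."""
    value = None
    for line in conf_text.splitlines():
        # Commented-out lines start with '#' and therefore do not match.
        m = re.match(r"\s*MaxClients\s+(\d+)\s*$", line, re.IGNORECASE)
        if m:
            value = int(m.group(1))
    return value


def fits_hard_limit(conf_text, hard_limit):
    """True if MaxClients is unset or does not exceed the compiled-in limit."""
    n = max_clients(conf_text)
    return n is None or n <= hard_limit


# Hypothetical configuration fragment, for illustration only.
SAMPLE_CONF = """\
# MaxClients 150
MinSpareServers 5
MaxSpareServers 10
MaxClients 512
"""

if __name__ == "__main__":
    print(max_clients(SAMPLE_CONF))
    print(fits_hard_limit(SAMPLE_CONF, 2048))
```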

>
> Does anyone know of a way to kill the sockets on the webserver other
> than simply waiting for them to clear out or rebooting the machine?
> (I also tried taking the interface down and bringing it up again ...
> that didn't work either.)
>
> Is there any way to 'reset' the MASQ table on the LVS machine to force a
> reset?

No way! The masq follows the TCP protocol and is transparent
to both ends. The expiration timeouts in the LVS/MASQ box are high
enough to allow the connection termination to complete. Do you remove
the real servers from the LVS configuration before stopping the apaches?
That can block the traffic and delay the shutdown. The fastest way to
restart apache seems to be apachectl graceful, provided, of course,
that you haven't changed anything in apachectl itself (in the httpd args).
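
For reference, in Apache 1.3 `apachectl graceful` works by sending SIGUSR1 to the parent httpd process whose PID is stored in the PidFile. A minimal sketch of the same step follows; the function names are hypothetical, the PID file location depends on the build, and the `kill` parameter exists only so the function can be exercised without signalling a real process.

```python
import os
import signal


def read_pid(pidfile):
    """Read the parent httpd PID from a PidFile."""
    with open(pidfile) as f:
        return int(f.read().strip())


def graceful_restart(pidfile, kill=os.kill):
    """Send SIGUSR1 to the PID in `pidfile` and return that PID."""
    pid = read_pid(pidfile)
    kill(pid, signal.SIGUSR1)
    return pid
```

Calling `graceful_restart` with the path from the server's PidFile directive would signal the running parent process; any arguments originally passed to httpd stay as they were, which matches the caveat about not changing the httpd args.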

> Thanks in advance.
>
> thornton


Regards

--
Julian Anastasov <ja@ssi.bg>
Re: Apache Not Restarting
> David Lambe wrote:
>
> Newbie here. If the real servers are identical, can't you use rsync
> via cron (I rsync 3 real servers every hour; you could do it more
> often) to keep the HTML the same? Seems to work fine on my 4-box
> cluster.

It's not the HTML that I'm concerned about, but the state of the
webserver configuration. If I add a new module or change a rewrite rule
I want it to appear on all the machines simultaneously, or as nearly so
as possible.

That requires an apache restart, which in turn requires re-binding to
the port. That is where I get stuck. Even a graceful restart needs to
clear all the connections out before it can re-bind, and with lingering
close + LVS persistence it appears that this sometimes takes a long time.

thornton