Mailing List Archive

[mod_backhand-users] One goes, they all go.
My front-end machine has this:

<Files ~ "\.(cgi|pl)">
Backhand byAge
Backhand byHostname (back1|back2|back3|back4|back5|back6|back7)
Backhand byRandom
Backhand byLogWindow
Backhand byLoad
</Files>

Everything works just fine until one of the "back" machines bails or locks
up. Once a machine crashes and is unavailable (i.e. can't even ping the
IP/hostname because it's dead at the network level), the front end machine
just doesn't respond anymore.

I just verified that this is true by taking down eth0 on "back4" -- and
the "Age" column on ALL of the servers in the backhand-status table
started growing higher and higher until finally, the page wouldn't load up
anymore. When I re-initialized eth0 on "back4", the entire cluster came
back up. Just FYI, it locks up the front end so badly that even normal web
requests for static "non-backhand" content don't come up.

What I'd really like is that if a server becomes unavailable, it simply
falls off the available list and everything continues as normal. When I
kill the Apache process on any of the backend boxes, it behaves as it
should -- the Age value increases on the single server, and eventually,
the status color turns red on the backhand-status table.

Thanks for any tips.

--Neil
[mod_backhand-users] One goes, they all go. [ In reply to ]
Okay. This is a bug in mod_backhand. I haven't gotten around to fixing it
because I never seem to run into that problem in real life. There is a
definite problem in back_util.c where connect() is called in the
moderator. It calls connect() on a blocking TCP socket. So, if the
connect() hangs, the whole server will hang waiting on the moderator.
If a back end machine crashes, TCP connects from front end machines
will hang for quite some time.

Obviously this is undesirable. I have yet to fix it as it requires a
pretty hefty change in the event model of the moderator.

If you download the latest CVS version of mod_backhand and use the:
BackhandConnectionPools off
configuration directive, you should not see this problem. Can you try
this until a correct fix is applied to the moderator?


On Tuesday, November 6, 2001, at 04:38 PM, Neil Mansilla wrote:

> Everything works just fine until one of the "back" machines bails or
> locks up. Once a machine crashes and is unavailable (i.e. can't even
> ping the IP/hostname because it's dead at the network level), the front
> end machine just doesn't respond anymore.
>
--
Theo Schlossnagle
1024D/82844984/95FD 30F1 489E 4613 F22E 491A 7E88 364C 8284 4984
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
[mod_backhand-users] One goes, they all go. [ In reply to ]
I will gladly try the CVS version. Quick question -- do I have to apply
it to all of the backend boxes, or can compiling it for the front-end
server be enough, while still running 1.2.0 on the backend servers? Also,
is anyone else here running the bleeding edge version in a production
environment? Any issues?

On Tue, 6 Nov 2001, Theo Schlossnagle wrote:

> If you download the latest CVS version of mod_backhand and use the:
> BackhandConnectionPools off
> configuration directive, you should not see this problem. Can you try
> this until a correct fix is applied to the moderator?
[mod_backhand-users] One goes, they all go. [ In reply to ]
On Wednesday, November 7, 2001, at 12:16 AM, Neil Mansilla wrote:
> I will gladly try the CVS version. Quick question -- do I have to apply
> it to all of the backend boxes, or can compiling it for the front-end
> server be enough, while still running 1.2.0 on the backend servers?
> Also, is anyone else here running the bleeding edge version in a
> production environment? Any issues?

You only need to run it on the front end machines.

I run the bleeding edge stuff on all my installations. Serving anywhere
between 1 hit per day! and 4 million hits/day. So, under casually heavy
load. Works like a charm.

The disabling of connection pooling was written for a similar problem.

I have a setup that runs Tomcat on the backend servers. Sometimes Java
shows its true self and accept()s a connection but NEVER answers questions
or disconnects. Very very bad.

--
Theo Schlossnagle
Principal Consultant
OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
Phone: +1 301 776 6376 Fax: +1 410 880 4879
1024D/82844984/95FD 30F1 489E 4613 F22E 491A 7E88 364C 8284 4984
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
[mod_backhand-users] One goes, they all go. [ In reply to ]
There seems to be an odd behavior with the CVS snapshot version of
mod_backhand I installed the other day. Server side includes that are
CGIs are being backhanded to the backend servers; however, it seems that
the output is very goofy. It is re-sending data from the original calling
document.

For example: http://www.foo.com/test.shtml

The source looks like this:

<HTML>
This is a test. Random number = <!--#exec cgi="/random.pl" -->
</HTML>

Normal output (because it seems that mod_backhand 1.2.0 does not backhand
the SSI exec cgi request) looks like this:

<HTML>
This is a test. Random number = 86751930
</HTML>

When it backhands the page on the CVS snapshot, it looks like this:

<HTML>
This is a test. Random number =
1000
<HTML>
This is a test. Random number =

Has anyone else noticed any odd behavior with the SSI and the latest
snapshot? Is there a directive available to command mod_backhand not to
load balance server side include requests?

Thanks,
Neil


On Wed, 7 Nov 2001, Theo Schlossnagle wrote:

> You only need to run it on the front end machines.
[mod_backhand-users] One goes, they all go. [ In reply to ]
On Friday, November 9, 2001, at 03:44 PM, Neil Mansilla wrote:
> There seems to be an odd behavior with the CVS snapshot version of
> mod_backhand I installed the other day. Server side includes that are
> CGIs are being backhanded to the backend servers; however, it seems that
> the output is very goofy. It is re-sending data from the original
> calling document.
>
> Has anyone else noticed any odd behavior with the SSI and the latest
> snapshot? Is there a directive available to command mod_backhand not to
> load balance server side include requests?

I thought there was a reason I did that trickery in post_read_request.
There was a change between mod_backhand.c,1.41 and mod_backhand.c,1.42
to support mod_gzip on the front end machine. I think this _may_ have
induced this problem. Can you back out the change manually and
recompile? Normally this is a pain in the ass with CVS, but since it's
only adding back in 3 lines of code, it should be really easy.

The changes are depicted visually here:
http://commedia.cnds.jhu.edu/cgi-bin/viewcvs/jesus/mod_backhand/mod_backhand.c.diff?r1=1.41&r2=1.42

Your CVS copy should look like the one on the right and you need to add
back in the three lines so that it looks like the one on the left. I
hope the link comes out right [my mail client doesn't always agree with
me].

If you can verify that this fixes your problem, I will start thinking
about how I can make both includes and gzip work automagically...

--
Theo Schlossnagle
1024D/82844984/95FD 30F1 489E 4613 F22E 491A 7E88 364C 8284 4984
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
[mod_backhand-users] One goes, they all go. [ In reply to ]
On Fri, 9 Nov 2001, Theo Schlossnagle wrote:

> If you can verify that this fixes your problem, I will start thinking
> about how I can make both includes and gzip work automagically...

Still behaves the same way, even after I added the previous diff lines.
What's weird is the output. I've simplified the CGI so that the output is
static.

http://foo.com/include.cgi output is simply: 12345

The HTML file is:

This is a test: <!--#exec cgi="/include.cgi" -->

With mod_backhand 1.2.0 it comes out right, just like this:

This is a test: 12345

With the CVS snapshot (post patch from your last post, Theo), it looks
like this:

This is a test: 16
This is a test: 12345

0


-[eof]-

I'll gladly try any other code mods you suggest. I did check out the
server IP address in my include.cgi script, and placed it in the output,
and indeed, the include.cgi was still being backhanded to the backend
servers.

--Neil
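
A note on the stray numbers in the output above: they look like HTTP/1.1
chunked transfer coding leaking into the included body. In that coding each
chunk is preceded by its length in hexadecimal, and the body ends with a
zero-length chunk. 0x16 is 22, which is exactly the length of
"This is a test: 12345" plus one trailing newline, and the lone "0" would be
the terminating chunk. This is an interpretation, not something confirmed in
the thread, and it assumes the CGI output ends in a single newline. The
check can be sketched as follows; chunk_size_matches is a hypothetical name
used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical check, for illustration only: does a hexadecimal chunk-size
 * line (as used by HTTP/1.1 chunked transfer coding) equal the byte length
 * of the data that follows it? Returns 1 on a match, 0 otherwise. */
static int chunk_size_matches(const char *size_line, const char *data)
{
    char *end = NULL;
    unsigned long size = strtoul(size_line, &end, 16);

    if (end == size_line)
        return 0;               /* no hex digits were parsed */
    return size == strlen(data);
}
```

For example, chunk_size_matches("16", "This is a test: 12345\n") holds,
which is what one would expect if the front end were passing the back end's
chunked response through without decoding it.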
[mod_backhand-users] One goes, they all go. [ In reply to ]
On Friday, November 9, 2001, at 05:14 PM, Neil Mansilla wrote:
> On Fri, 9 Nov 2001, Theo Schlossnagle wrote:
>
> Still behaves the same way, even after I added the previous diff lines.
> What's weird is the output. I've simplified the CGI so that the output is
> static.

Around line 1309 in your mod_backhand.c, there should be a line like:

r->filename = ap_pstrdup(r->pool, r->uri);

Try changing it to:

r->filename = r->uri;

> I'll gladly try any other code mods you suggest. I did check out the
> server IP address in my include.cgi script, and placed it in the output,
> and indeed, the include.cgi was still being backhanded to the backend
> servers.

So when you have it spit out the IP, do you get an IP from the back end
and from the front end?

--
Theo Schlossnagle
1024D/82844984/95FD 30F1 489E 4613 F22E 491A 7E88 364C 8284 4984
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
[mod_backhand-users] One goes, they all go. [ In reply to ]
On Fri, 9 Nov 2001, Theo Schlossnagle wrote:

> Around line 1309 in your mod_backhand.c, there should be a line like:
>
> r->filename = ap_pstrdup(r->pool, r->uri);
>
> Try changing it to:
>
> r->filename = r->uri;
>
> So when you have it spit out the IP, do you get an IP from the back end
> and from the front end?

IP from the back end, never from the front end. With the mod above, it
behaved the same way -- still sent requests to the front end and had
double responses (this time, instead of a random number, it's spitting out
server IP address):

This is a test: 1a
This is a test: 10.8.1.10

0


[-eof-]

This is a test: 1b
This is a test: 10.8.1.10

0

[-eof-]

And once again, 1.2.0 only pulls SSI CGIs from the frontend.

Thanks,
--Neil
[mod_backhand-users] One goes, they all go. [ In reply to ]
Hi Theo - the problem has not yet been resolved. I did replace
mod_backhand with the CVS version only on the front-end machine. I did
compile mod_backhand *into* Apache (not loading it as an external module),
and the only module I seem to be loading externally is mod_perl
with LoadModule perl_module libexec/libperl.so

Now that I have an isolated test-bed of servers back here @ the office, I
can get really freaky with the configs/code to see what I can whirl up.
I'll pull the latest from the CVS and see if I can find out any more
useful information to share.

Thanks,
Neil

On Fri, 16 Nov 2001, Theo Schlossnagle wrote:

> Neil,
>
> Did you ever get this resolved?
>
> Did you replace mod_backhand with CVS only on the front end machines?
>
> Have you tried messing around with the AddModule ordering in the
> httpd.conf file?
>
> On Saturday, November 10, 2001, at 01:28 AM, Neil Mansilla wrote:
> > IP from the back end, never from the front end. With the mod above, it
> > behaved the same way -- still sent requests to the front end and had
> > double responses (this time, instead of a random number, it's spitting
> > out server IP address):
> >
> > This is a test: 1a
> > This is a test: 10.8.1.10
> >
> > And once again, 1.2.0 only pulls SSI CGIs from the frontend.
>
> I think that the ordering of AddModule lines might affect this.... maybe.
>
> --
> Theo Schlossnagle
> 1024D/82844984/95FD 30F1 489E 4613 F22E 491A 7E88 364C 8284 4984
> 2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
>
>