Mailing List Archive

Making backhand do tricks.
Hey,

We have been using mod_backhand for load balancing for a little while and
have been experiencing some weirdness (which someone may or may not have
an answer for). We have also hit a case where we need to modify backhand
to get the behavior we want.

The bug:
Apache processes lock up in the 'W' state forever. We're running 1.3.19
(used to be 1.3.17; the upgrade didn't help), CVS mod_backhand, and
mod_fastcgi for CGI requests. The lockups seem to be essentially random,
but I believe they always happen on requests that are being backhanded. I
see this problem cropping up every so often on the main Apache mailing
lists, but I have not seen a good answer yet, so if anyone knows, I'd be
grateful... We're getting pretty sick of that bug.

The feature:

We need to be able to configure mod_backhand to _never_ handle requests
locally if told to. We're trying to get it to do this via

Backhand removeSelf
Backhand byAge 10
Backhand byRandom

etc., but sometimes it handles the request itself just for fun. We have
four dedicated webservers in the loop, but it doesn't always backhand
correctly.

Upon looking through the source code, I see that it defaults to handling
the request locally if the decision process returns an empty list, or
if there are connection issues (correct me if I'm wrong). I would like to
give it another option: essentially to dump an error, or to redirect by
default to a specific machine, even if that is a hard-coded definition in
the config file; the redirect would be preferred. I see where I can do
this in the code, but it would be painful and would require editing more
than a little bit of code. I'm not terribly familiar with Apache module
code in general yet, but I know the mod_backhand code pretty well at this
point, so I will help out and I will write patches.

Here is part of the configuration, in case you need this to answer
anything:
<IfModule mod_backhand.c>
UnixSocketDir /home/lj/var/backhand
MulticastStats 10.0.0.1 10.0.0.255:4445
AcceptStats 10.0.0.0/8
</IfModule>

Thanks,
-Alan
Re: Making backhand do tricks.
> We have been using mod_backhand for load balancing for a little while and
> have been experiencing some weirdness (which someone may or may not have
> an answer for). We have also hit a case where we need to modify backhand
> to get the behavior we want.
>
> The bug:
> Apache processes lock up in the 'W' state forever. We're running 1.3.19
> (used to be 1.3.17; the upgrade didn't help), CVS mod_backhand, and
> mod_fastcgi for CGI requests. The lockups seem to be essentially random,
> but I believe they always happen on requests that are being backhanded. I
> see this problem cropping up every so often on the main Apache mailing
> lists, but I have not seen a good answer yet, so if anyone knows, I'd be
> grateful... We're getting pretty sick of that bug.

I have seen this bug before, but not on any of my production machines.
Can you trace a process in the W state? Figure out which process ID it is
and use strace/ktrace/truss to figure out what it is stuck on.

If you are running on Solaris, there is a known IPC bug that will screw
with things badly. There is a note about it in the NOTES file in the
distribution.

> The feature:
>
> We need to be able to configure mod_backhand to _never_ handle requests
> locally if told to. We're trying to get it to do this via
>
> Backhand removeSelf
> Backhand byAge 10
> Backhand byRandom
>
> etc., but sometimes it handles the request itself just for fun. We have
> four dedicated webservers in the loop, but it doesn't always backhand
> correctly.
>
> Upon looking through the source code, I see that it defaults to handling
> the request locally if the decision process returns an empty list, or
> if there are connection issues (correct me if I'm wrong). I would like to
> give it another option: essentially to dump an error, or to redirect by
> default to a specific machine, even if that is a hard-coded definition in
> the config file; the redirect would be preferred. I see where I can do
> this in the code, but it would be painful and would require editing more
> than a little bit of code. I'm not terribly familiar with Apache module
> code in general yet, but I know the mod_backhand code pretty well at this
> point, so I will help out and I will write patches.

Actually, I think that you can jury-rig the setup to never direct to
itself. The question is which machine it should send the request to if
none of the servers behind it are available. If you want to choose one of
them arbitrarily, call it w.x.y.z.

Then change your MulticastStats so that the first parameter is w.x.y.z
instead of 10.0.0.1. Also, use "BackhandSelfRedirect On" and set your
rules up like this:

Backhand removeSelf
Backhand byAge 10
Backhand byRandom
Backhand addSelf

This will randomize the list of hosts and then _append_ the local
machine's IP. But the local machine's IP has been "spoofed" as w.x.y.z
by the MulticastStats line, so in the event that all "other" hosts are
not available, it will proxy to "itself" -- w.x.y.z.
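
To see why the trailing addSelf matters, here is a toy Python model of the
candidate list described above. It is only a sketch, not mod_backhand's
code: the function names simply mirror the directive names, the "age"
field stands in for the seconds since a host's stats were last heard, the
10.0.0.2 and 10.0.0.3 backends are made up, and it assumes the request
goes to the first entry of the final list and is handled locally when the
list comes back empty.

```python
import random

# Toy model only: names mirror the Backhand directives, not the module's C code.
SELF = "w.x.y.z"  # placeholder for the address advertised via MulticastStats


def remove_self(hosts):
    """Drop the local machine from the candidate list."""
    return [h for h in hosts if h["addr"] != SELF]


def by_age(hosts, max_age):
    """Keep hosts whose stats were heard within max_age seconds (assumed meaning)."""
    return [h for h in hosts if h["age"] <= max_age]


def by_random(hosts, rng):
    """Return the candidates in random order."""
    shuffled = list(hosts)
    rng.shuffle(shuffled)
    return shuffled


def add_self(hosts):
    """Append the local machine's (spoofed) address if it is not already listed."""
    if any(h["addr"] == SELF for h in hosts):
        return list(hosts)
    return hosts + [{"addr": SELF, "age": 0}]


def pick(hosts, rng, use_add_self=True):
    """Run the rule chain; return the chosen address, or None for 'handle locally'."""
    result = by_random(by_age(remove_self(hosts), 10), rng)
    if use_add_self:
        result = add_self(result)
    return result[0]["addr"] if result else None


# Hypothetical state: every backend's stats are older than 10 seconds.
stale = [
    {"addr": SELF, "age": 0},
    {"addr": "10.0.0.2", "age": 45},
    {"addr": "10.0.0.3", "age": 60},
]
rng = random.Random(0)
print(pick(stale, rng, use_add_self=False))  # None: falls back to local handling
print(pick(stale, rng))                      # w.x.y.z: proxied to the spoofed address
```

With at least one fresh backend in the list, pick() returns that backend
and the appended w.x.y.z entry is never reached.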

Let me know if this works the way you want it.

--
Theo Schlossnagle
1024D/A8EBCF8F/13BD 8C08 6BE2 629A 527E 2DC2 72C2 AD05 A8EB CF8F
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
Re: Making backhand do tricks.
> I have seen this bug before, but not on any of my production machines.
> Can you trace a process in the W state? Figure out which process ID it is
> and use strace/ktrace/truss to figure out what it is stuck on.
>
> If you are running on Solaris, there is a known IPC bug that will screw
> with things badly. There is a note about it in the NOTES file in the
> distribution.

We're using FreeBSD 4.3-RC... Gathering trace/log data has been extremely
difficult, but I'm going to get it soon. The problem is that the bug
only happens on one server (which happens to be the frontend server), and
that server gets an enormous number of hits, so dumping logs and trace
data would fill the hard drive in a short period of time, and we need a
45-minute window to get accurate reports on stuck processes... I'll post
interesting information here if it's getting stuck within backhand.

> Actually, I think that you can jury-rig the setup to never direct to
> itself. The question is which machine it should send the request to if
> none of the servers behind it are available. If you want to choose one of
> them arbitrarily, call it w.x.y.z.
>
> Then change your MulticastStats so that the first parameter is w.x.y.z
> instead of 10.0.0.1. Also, use "BackhandSelfRedirect On" and set your
> rules up like this:
>
> Backhand removeSelf
> Backhand byAge 10
> Backhand byRandom
> Backhand addSelf
>
> This will randomize the list of hosts and then _append_ the local
> machine's IP. But the local machine's IP has been "spoofed" as w.x.y.z
> by the MulticastStats line, so in the event that all "other" hosts are
> not available, it will proxy to "itself" -- w.x.y.z.
>
> Let me know if this works the way you want it.

D'oh. I had done all of that, except for the Backhand addSelf at the
bottom. Adding that one line made the spoofing config work as I wanted. We
can supply true errors if it still handles things... Thanks a bunch!

Now, can you get mod_backhand to dump a pidfile of the moderator? :)

Thanks,
-Alan