Mailing List Archive

[mod_backhand-users] Odd/even IP-adresses
Hi,

I am currently trying to get mod_backhand working on a cluster
of about 10 servers, but I am seeing some very strange behaviour:

I'm on a class-C network. Every Apache on an even
IP address (x.x.16.34, x.x.16.36, x.x.16.38) can see all
the Apaches running on even IP addresses, but no others. All Apaches
running on an odd IP address (35, 37, ...) see
only those running on odd IP addresses.

By 'see' I mean that they are shown on the backhand-handler
status page. My backhand-related setup is just the
following on each server (/var/backhand is set
to 0777 to rule that out as the problem):

------------ httpd.conf -----------------
UnixSocketDir /var/backhand
MultiCastStats x.x.16.255:4445
AcceptStats x.x.16.0/24

<Location "/intern/status/backhand">
SetHandler backhand-handler
</Location>
-----------------------------------------


Is this a known problem (I don't think so)? What could
I try to solve this? All machines have just one Ethernet
interface and are configured in absolutely the same way.

If more input is needed, cry loud ;-)


Besides that, I think mod_backhand will be a great solution
for my cluster; I'm just not totally sure yet due to the lack of
testing...


Thanks in advance for any replies
Matt
[mod_backhand-users] Odd/even IP-adresses [ In reply to ]
"Matt D. Herold" wrote:
> I'm on a class-C network, every Apache on an even
> IP-address (x.x.16.34, x.x.16.36, x.x.16.38) can see all
> apaches running on even IP-addresses, too. All apaches
> running on an odd IP-address (35, 37, ...) are just
> seeing those running on odd IP-addresses, too.
>
> With 'see' I mean they are shown in the backhand-handler
> statuspage, my backhand related setup is just the
> following on each server, /var/backhand is set
> to 0777 to ensure that this isn't the problem:
>

This is strange indeed. Perhaps the strangest thing I have seen yet
with mod_backhand.

> ------------ httpd.conf -----------------
> UnixSocketDir /var/backhand
> MultiCastStats x.x.16.255:4445
> AcceptStats x.x.16.0/24
>
> <Location "/intern/status/backhand">
> SetHandler backhand-handler
> </Location>
> -----------------------------------------

First, I would like to make absolutely sure that your even and odd
addresses are in the same subnet; you imply it, but never explicitly
state it. I will assume that your machines are:
192.168.16.36, 192.168.16.37, 192.168.16.38, 192.168.16.39

Obviously, if the even and odd IPs had different first two octets (e.g.
10.0.16.even and 10.1.16.odd) and your netmask was a /24
(255.255.255.0), then they could not talk to each other. It sounds like
that isn't the case, though.
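
The same-subnet assumption can be checked mechanically from each host's address and netmask. The following is a minimal Python sketch using the standard `ipaddress` module; the function names `same_subnet` and `broadcast_for` are illustrative, and the sample addresses are the assumed ones from this message.

```python
import ipaddress

def same_subnet(ip_a, ip_b, prefix_len=24):
    """Return True if both addresses fall in the same network for the prefix."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix_len}").network
    return net_a == net_b

def broadcast_for(ip, prefix_len=24):
    """Return the directed broadcast address of the network containing ip."""
    iface = ipaddress.ip_interface(f"{ip}/{prefix_len}")
    return str(iface.network.broadcast_address)

if __name__ == "__main__":
    # Assumed addresses from the message above; substitute the real ones.
    print(same_subnet("192.168.16.36", "192.168.16.37"))
    print(same_subnet("10.0.16.36", "10.1.16.37"))
    print(broadcast_for("192.168.16.36"))
```

If `same_subnet` is false for an even/odd pair, a broadcast sent to one network's broadcast address would not be expected to reach hosts in the other.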

> Is this a known problem (I don't think so)? What could
> I try to solve this? All machines have just one ethernet
> interface and are configured absolutly the same way.

This is not a known problem. I have never seen it in the many
installations I have done. My first guess is that you could have the
even machines plugged into one switch (or blade) and the odd ones plugged
into another switch (or blade), and you have directed broadcasts turned
off.

Try it with IP multicast. Use a multicast address of
225.10.16.38:4445,2 (and make sure that multicast packet forwarding is
enabled on your switching hardware).

Another first test would be to verify that you can ping an even machine
from an odd machine. If you can't do this, there are networking
hardware (switching, ARP, etc.) issues that need to be resolved.

> If more input is needed, cry loud ;-)

A network topology diagram would help. If the even machines are behind one
load balancer (LVS, BIG/ip, Alteon, Arrowpoint) and the odd machines are
behind another, that would lead down a different troubleshooting path.

Good luck! I am very interested in the diagnosis of the problem. If
(when) you figure out what is wrong, please post your findings.

Try using tcpdump or a similar tool and listen on port 4445. See if you see
only packets from even machines on even machines and likewise for odd
machines. This would imply that mod_backhand isn't working because your
networking isn't working right. If you see the packets from odd machines
coming in on an even machine, that sounds like a problem with
mod_backhand. I doubt it is a problem with mod_backhand because that
part of the code is VERY straightforward and no tricks are played.


--
Theo Schlossnagle
1024D/A8EBCF8F/13BD 8C08 6BE2 629A 527E 2DC2 72C2 AD05 A8EB CF8F
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
[mod_backhand-users] Odd/even IP-adresses [ In reply to ]
On Fri, Jun 16, 2000 at 10:30:06AM -0400, Theo E. Schlossnagle wrote:

Theo,

thanks for your fast and detailed reply, but my problem is
getting more strange...

> > I'm on a class-C network, every Apache on an even
> > IP-address (x.x.16.34, x.x.16.36, x.x.16.38) can see all
> > apaches running on even IP-addresses, too. All apaches
> > running on an odd IP-address (35, 37, ...) are just
> > seeing those running on odd IP-addresses, too.
>
> First I would like to make absolutely sure that your even and odd
> addresses are in the same subnet, you imply it, but never explicitly
> state it. I will assume that you machines are:
> 192.168.16.36, 192.168.16.37, 192.168.16.38, 192.168.16.39

OK.

> > Is this a known problem (I don't think so)? What could
> > I try to solve this? All machines have just one ethernet
> > interface and are configured absolutly the same way.
>

> Try it will IP multicast. Use a Multicast address of
> 225.10.16.38:4445,2 (and make sure that multicast packet forwarding is
> enabled on your switching hardware).

Multicast doesn't work (no host sees itself, and
nobody sees any other), but I don't know where to start
in this case because I have very little knowledge about
multicast in general.

> Another first test would be to verify that you can ping an even machine
> from an odd machine.. If you can't do this, there are networking
> hardware (switching, arp, etc.) issues that need to be resolved.

This works. All these machines have been running for two years without
any networking problems, and there aren't any (IP) hops between them.

> Try using tcpdump or friend and listen on port 4445. See if you see

Here is the output of tcpdump, running on machine 192.168.16.34

$ tcpdump -vvv -x -n -c 20 port 4445
05:31:51.610000 192.168.16.37.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 60058)
4500 0084 ea9a 0000 4011 bce5 d8e2 1025
d8e2 10ff 115d 115d 0070 3016 6576 656e
6265 7474 6572 372e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:51.630000 192.168.16.38.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 56741)
4500 0084 dda5 0000 4011 c9d9 d8e2 1026
d8e2 10ff 115d 115d 0070 635d 6576 656e
6265 7474 6572 382e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:51.630000 192.168.16.35.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 41528)
4500 0084 a238 0000 4011 054a d8e2 1023
d8e2 10ff 115d 115d 0070 fad8 6576 656e
6265 7474 6572 352e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:51.630000 192.168.16.34.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 40045)
4500 0084 9c6d 0000 4011 0b16 d8e2 1022
d8e2 10ff 115d 115d 0070 0b61 6576 656e
6265 7474 6572 342e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:51.630000 192.168.16.36.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 2999)
4500 0084 0bb7 0000 4011 9bca d8e2 1024
d8e2 10ff 115d 115d 0070 2f1d 6576 656e
6265 7474 6572 362e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:52.640000 192.168.16.37.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 60062)
4500 0084 ea9e 0000 4011 bce1 d8e2 1025
d8e2 10ff 115d 115d 0070 9042 6576 656e
6265 7474 6572 372e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:52.660000 192.168.16.38.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 56744)
4500 0084 dda8 0000 4011 c9d6 d8e2 1026
d8e2 10ff 115d 115d 0070 d1ce 6576 656e
6265 7474 6572 382e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:52.660000 192.168.16.34.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 40172)
4500 0084 9cec 0000 4011 0a97 d8e2 1022
d8e2 10ff 115d 115d 0070 8bf0 6576 656e
6265 7474 6572 342e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:52.660000 192.168.16.36.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 3049)
4500 0084 0be9 0000 4011 9b98 d8e2 1024
d8e2 10ff 115d 115d 0070 6f4d 6576 656e
6265 7474 6572 362e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:52.660000 192.168.16.35.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 41532)
4500 0084 a23c 0000 4011 0546 d8e2 1023
d8e2 10ff 115d 115d 0070 080e 6576 656e
6265 7474 6572 352e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:53.660000 192.168.16.37.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 60096)
4500 0084 eac0 0000 4011 bcbf d8e2 1025
d8e2 10ff 115d 115d 0070 8066 6576 656e
6265 7474 6572 372e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:53.680000 192.168.16.38.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 56771)
4500 0084 ddc3 0000 4011 c9bb d8e2 1026
d8e2 10ff 115d 115d 0070 43a1 6576 656e
6265 7474 6572 382e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:53.680000 192.168.16.35.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 41559)
4500 0084 a257 0000 4011 052b d8e2 1023
d8e2 10ff 115d 115d 0070 ea28 6576 656e
6265 7474 6572 352e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:53.680000 192.168.16.34.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 40240)
4500 0084 9d30 0000 4011 0a53 d8e2 1022
d8e2 10ff 115d 115d 0070 eba1 6576 656e
6265 7474 6572 342e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:53.680000 192.168.16.36.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 3075)
4500 0084 0c03 0000 4011 9b7e d8e2 1024
d8e2 10ff 115d 115d 0070 7f12 6576 656e
6265 7474 6572 362e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:54.690000 192.168.16.37.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 60260)
4500 0084 eb64 0000 4011 bc1b d8e2 1025
d8e2 10ff 115d 115d 0070 20bd 6576 656e
6265 7474 6572 372e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:54.710000 192.168.16.38.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 56825)
4500 0084 ddf9 0000 4011 c985 d8e2 1026
d8e2 10ff 115d 115d 0070 82f6 6576 656e
6265 7474 6572 382e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:54.710000 192.168.16.36.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 3102)
4500 0084 0c1e 0000 4011 9b63 d8e2 1024
d8e2 10ff 115d 115d 0070 2f3a 6576 656e
6265 7474 6572 362e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:54.710000 192.168.16.35.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 41564)
4500 0084 a25c 0000 4011 0526 d8e2 1023
d8e2 10ff 115d 115d 0070 9b11 6576 656e
6265 7474 6572 352e 6576 656e 6265 7474
6572 2e63 6f6d
05:31:54.710000 192.168.16.34.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 40301)
4500 0084 9d6d 0000 4011 0a16 d8e2 1022
d8e2 10ff 115d 115d 0070 fb58 6576 656e
6265 7474 6572 342e 6576 656e 6265 7474
6572 2e63 6f6d

$ /sbin/ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Bcast:127.255.255.255  Mask:255.0.0.0
          UP BROADCAST LOOPBACK RUNNING  MTU:3584  Metric:1
          RX packets:234334 errors:0 dropped:0 overruns:0 frame:0
          TX packets:234334 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0

eth0      Link encap:Ethernet  HWaddr 00:C0:F0:17:0E:43
          inet addr:192.168.16.34  Bcast:192.168.16.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8483379 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6720924 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0
          Interrupt:12 Base address:0x6c00

$ /sbin/route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.16.0    *               255.255.255.0   U     0      0      268 eth0
127.0.0.0       *               255.0.0.0       U     0      0        4 lo
default         *               0.0.0.0         U     0      0    16387 eth0


On the status pages, the host sees only .34, .36 and .38. It would
be great if this problem could be solved, but I really don't know
how. Although I'm not very familiar with UDP (I only know TCP
and the layers above it well), I think tcpdump shows that the packets
are being transmitted.
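
A capture like the one above can be summarised by grouping the senders by the parity of their last octet, which makes it easy to confirm that both groups arrive at the interface. This is a minimal Python sketch, assuming tcpdump summary lines in the format shown in this message; the name `senders_by_parity` is illustrative.

```python
import re
import sys
from collections import Counter

# Matches tcpdump summary lines such as:
# 05:31:51.610000 192.168.16.37.4445 > 192.168.16.255.4445: udp 104 (...)
SUMMARY = re.compile(r"^\S+ (\d+\.\d+\.\d+\.\d+)\.\d+ > ")

def senders_by_parity(lines):
    """Count packets per sender, split by even/odd last octet."""
    counts = {"even": Counter(), "odd": Counter()}
    for line in lines:
        match = SUMMARY.match(line)
        if not match:
            continue  # skip hex dump lines and anything else
        ip = match.group(1)
        last_octet = int(ip.rsplit(".", 1)[1])
        counts["even" if last_octet % 2 == 0 else "odd"][ip] += 1
    return counts

if __name__ == "__main__":
    # Usage (assumed): tcpdump -n -c 20 port 4445 | python parity.py
    result = senders_by_parity(sys.stdin)
    for parity in ("even", "odd"):
        print(parity, dict(result[parity]))
```

If both groups show up here but only one appears on the status page, the packets are reaching the host and the loss is happening further up the stack.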

All the servers are running Linux RedHat 5.2, Kernel 2.0.36,
glibc-2.0.7. mod_backhand was compiled & installed with apxs
in an apache_1.3.12, other modules are

$ httpd -l
Compiled-in modules:
http_core.c
mod_env.c
mod_log_config.c
mod_mime.c
mod_status.c
mod_cgi.c
mod_actions.c
mod_alias.c
mod_access.c
mod_auth.c
mod_so.c

Additionally, mod_perl is loaded as a shared object; the mod_backhand
version is 1.0.9. Today I'll also try mod_backhand on some
newer machines on the same net; perhaps it's just a bug
in a Linux library (although I don't think so)...

Thanks for replies
Matt
[mod_backhand-users] Odd/even IP-adresses [ In reply to ]
"Matt D. Herold" wrote:
> thanks for your fast and detailed reply, but my problem is
> getting more strange...
> > Try it will IP multicast. Use a Multicast address of
> > 225.10.16.38:4445,2 (and make sure that multicast packet forwarding is
> > enabled on your switching hardware).
>
> Mulicast doesn't work (each host doesn't see itself and
> nobody sees any other), but I don't know where to start
> in this case because I have very little knowledge about
> MultiCast in general.

A valid multicast address for you would be 225.1.1.1:4445,1

Unless you have an external interface you clipped out of the netstat -rn
(to protect the innocent). Linux's multicasting is a little screwy when
you have two interfaces.
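
For readers new to multicast, the sketch below shows in Python what joining a group and sending to it with a given TTL involves at the socket level. It assumes that the trailing number in an address such as 225.1.1.1:4445,1 is the multicast TTL; the helper names (`parse_spec`, `make_receiver`, `make_sender`) are illustrative and are not part of mod_backhand.

```python
import ipaddress
import socket
import struct

def parse_spec(spec):
    """Split 'group:port,ttl' into (group, port, ttl); ttl defaults to 1."""
    hostport, _, ttl = spec.partition(",")
    host, _, port = hostport.rpartition(":")
    group = ipaddress.IPv4Address(host)
    if not group.is_multicast:
        raise ValueError(f"{host} is not in 224.0.0.0/4")
    return str(group), int(port), int(ttl or 1)

def make_receiver(group, port):
    """Bind a UDP socket and join the multicast group on the default interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    mreq = struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock

def make_sender(ttl):
    """Create a UDP socket whose multicast packets carry the given TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock

if __name__ == "__main__":
    group, port, ttl = parse_spec("225.1.1.1:4445,1")
    receiver = make_receiver(group, port)
    make_sender(ttl).sendto(b"hello", (group, port))
    print(receiver.recvfrom(1024))
```

Running this on two hosts is a quick way to see whether multicast traffic passes between them at all, independently of Apache.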

Your tcpdumps look right, though.

The next debugging step is to strace the backhand process to see if it
sees these packets coming in. If you run top, there should be an httpd
process with considerable processor time. Also, if you look in your
error logs, you should see a line like:
[notice] backhand_init(12368) spawning stats things (PID 12380)

This would mean the process you are looking for is 12380 (the last number).

Now, strace the process -- I will assume you are running bash:
strace -p 12380 2>&1 | grep recvfrom

This will show you what the backhand process is receiving. If it is
receiving the packets from your odd AND even machines, then backhand must
be throwing them away (erroneously). If it is not receiving them, then I
would wager it is a problem on the OS side of things.
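
The strace output can be compared against the list of expected senders with a short script. This is a rough Python sketch that assumes strace prints the peer address of each recvfrom call as inet_addr("a.b.c.d"); the exact output format depends on the strace version, and the names `senders_seen` and `missing` are illustrative.

```python
import re
import sys

# Assumed strace format: ... sin_addr=inet_addr("192.168.16.37")}, ...
PEER = re.compile(r'recvfrom\(.*inet_addr\("(\d+\.\d+\.\d+\.\d+)"\)')

def senders_seen(lines):
    """Collect the distinct peer addresses found in recvfrom lines."""
    seen = set()
    for line in lines:
        match = PEER.search(line)
        if match:
            seen.add(match.group(1))
    return seen

def missing(expected, lines):
    """Return the expected senders that never appear in the strace output."""
    return sorted(set(expected) - senders_seen(lines))

if __name__ == "__main__":
    # Usage (assumed): strace -p PID 2>&1 | python seen.py 192.168.16.35 ...
    print("never received from:", missing(sys.argv[1:], sys.stdin))
```

An empty result would mean every expected host reached the process, which points at mod_backhand discarding them; a non-empty one points at the OS side.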

> Here is the output of tcpdump, running on machine 192.168.16.34
>
> $ tcpdump -vvv -x -n -c 20 port 4445
> 05:31:51.610000 192.168.16.37.4445 > 192.168.16.255.4445: udp 104 (ttl 64, id 60058)
> 4500 0084 ea9a 0000 4011 bce5 d8e2 1025
> d8e2 10ff 115d 115d 0070 3016 6576 656e
> 6265 7474 6572 372e 6576 656e 6265 7474
> 6572 2e63 6f6d
> [ ... snip ... ]

I am now very interested in your problem... This is very strange
behaviour indeed.

--
Theo Schlossnagle
[mod_backhand-users] Odd/even IP-adresses [ In reply to ]
On Sun, Jun 18, 2000 at 12:55:27PM -0400, Theo E. Schlossnagle wrote:
> "Matt D. Herold" wrote:
> > thanks for your fast and detailed reply, but my problem is
> > getting more strange...

> I am now very interested in your problem... This is very strange
> behaviour indeed.

Theo,

first, let me thank you for your extended replies and the good information
on where to start to get things working.

I think I've solved the problem now: I compiled mod_backhand statically
(but this isn't the point, I think) and I ensured that no logrotate
script does a 'killall -HUP httpd'. So the servers don't get restarted
and the server numbers are '0/ 0'. In this state, everything works very well.
If I try to restart a server, the stats show '7/ 7' and the server
behaves very strangely.

Theo: Although I'm not having this problem any longer, I am willing
to test or give more debug info if you would like to investigate
this further.


Just another small problem: I have a web application that takes
a very long time to load (it calculates information in real time while
the customer waits for the response). I want to output the
header of the page immediately to give the user some response.
I found that mod_backhand doesn't support this directly,
right? There must be buffering of some kind in it. I'm not
very familiar with C (just a Perl/Java app coder ;-), so is there
an easy solution for this?


Regards,
Matt
[mod_backhand-users] Odd/even IP-adresses [ In reply to ]
"Matt D. Herold" wrote:
>
> On Sun, Jun 18, 2000 at 12:55:27PM -0400, Theo E. Schlossnagle wrote:
> > "Matt D. Herold" wrote:
> > > thanks for your fast and detailed reply, but my problem is
> > > getting more strange...
>
> > I am now very interested in your problem... This is very strange
> > behaviour indeed.
>
> Theo,
>
> first let me thank for your extended replies and good information
> where to start to get things working.
>
> I think I've solved the problem now: I compiled mod_backhand statically
> (but this isn't the point, I think) and I ensured that no logrotate
> script does a 'killall -HUP httpd'. So, servers don't get restarted
> and the server numbers are '0/ 0'. In this state, all works very well.
> If I try to restart a server, the stats show '7/ 7' and the server
> behaves very strange.
>
> Theo: Although I'm not having this problem any longer, I am willing
> to test or give more debug info if you would like to investigate
> this further.

Thanks. I think the problem is with the -HUP signal. I haven't had a
chance to try it here yet, but I would wager that Apache is calling a
hook in mod_backhand that I didn't prepare for. I would suggest sending
Apache a USR1 signal instead of HUP. It is much more graceful :) [That
is what ./apachectl graceful does.] This will fix the 0/0 servers problem
and it should rotate your logs fine.
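
In a log rotation script, the USR1 suggestion amounts to signalling only the parent httpd process rather than running killall against every httpd. A minimal Python sketch, assuming the parent's PID is stored in a PID file; the path shown is hypothetical and depends on the PidFile setting.

```python
import os
import signal

def read_pid(pidfile):
    """Read the parent httpd PID from a PID file."""
    with open(pidfile) as handle:
        return int(handle.read().strip())

def graceful_restart(pidfile):
    """Send SIGUSR1 to the PID in pidfile and return that PID."""
    pid = read_pid(pidfile)
    os.kill(pid, signal.SIGUSR1)
    return pid

if __name__ == "__main__":
    # Hypothetical path; use whatever PidFile points to in httpd.conf.
    print("signalled", graceful_restart("/var/run/httpd.pid"))
```

A rotation job would rename the log files first and then call `graceful_restart`, so that Apache reopens its logs without dropping connections in progress.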

If you want a more flexible way to log in Apache (much more on the fly
and cleaner), look into mod_log_spread. It takes some time to set up,
but the monitoring that can be done with it and its flexibility are
worth the investment of time.
http://www.lethargy.org/~george/mod_log_spread/

This will allow you to rotate logs without ever touching apache. (Except
for a very minor one time configuration change)

> Just another small problem: I have a web application that tooks
> very long to load (calculates information in realtime while
> customer waits to get the response). I want to output the
> header of the page immediately to give the user some response.
> I figured out that mod_backhand doesn't support this directly,
> right? There must be buffering of some kind in it. I'm not
> very familar with C (just a Perl/Java-App-Coder ;-), so is there
> an easy solution for this?

Hmm... It might do that now? Do you see it behaving differently? I
don't do any buffering in mod_backhand. I let Apache take care of that when it
sets the send and receive buffers on the TCP sockets. But, in the
mod_backhand code, I basically perform a:
while(read some amount) { write some amount }
I guess if your header is small, it will be blocked waiting for more. I
intended to write a non-blocking I/O loop to push I/O more aggressively,
but this is the first time I have heard of it being needed. I will try to get that
into the next versions of mod_backhand. It is hard to make that
code portable.
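
The idea of a loop that pushes data on as soon as it arrives can be illustrated outside Apache. The following is a Python sketch only, not mod_backhand's actual code: it waits with `select` until the source is readable and forwards whatever is available, however small. The function name `relay` and the chunk size are illustrative.

```python
import select
import socket

def relay(src, dst, chunk_size=4096):
    """Copy src to dst, forwarding each chunk as soon as it is readable.

    Returns the number of bytes relayed once src reaches end of stream.
    """
    src.setblocking(False)
    total = 0
    while True:
        select.select([src], [], [])  # wait until src has data or EOF
        try:
            data = src.recv(chunk_size)
        except BlockingIOError:
            continue  # spurious wakeup, wait again
        if not data:
            break  # peer closed the connection
        dst.sendall(data)  # push on immediately, even a short header
        total += len(data)
    return total

if __name__ == "__main__":
    backend_out, backend_in = socket.socketpair()
    client_out, client_in = socket.socketpair()
    backend_out.sendall(b"HTTP/1.0 200 OK\r\n\r\n")
    backend_out.close()
    print(relay(backend_in, client_out), client_in.recv(100))
```

The design choice here is to never wait for a full buffer: each `recv` returns whatever has arrived, so a small header is passed on before the slow body is ready.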

--
Theo Schlossnagle