Mailing List Archive

HA failover testing
I am hoping someone out there may be able to help me out. I have set up a
5 node Piranha cluster (exactly like the one issued in the HOWTO). At
this point I would like to test the failover capacity and see how well it
really works. Does anyone know of any tools out there that will do this
(or maybe have scripts of their own that they have written or
suggestions)? Once I have taken down the eth0 interface on the primary
router what is the best way to test and make sure that the node has failed
over?

Thanks,
Ruel
HA failover testing [ In reply to ]
"michael.r.loehr.1" wrote:
>
> I am hoping someone out there may be able to help me out. I have set up a
> 5 node Piranha cluster (exactly like the one issued in the HOWTO). At
> this point I would like to test the failover capacity and see how well it
> really works. Does anyone know of any tools out there that will do this
> (or maybe have scripts of their own that they have written or
> suggestions)?

What kind of tools? Do you mean tools to measure performance, or to trigger failovers?

In an LVS setup, there are 2 places where a failure produces a result:
either the active router can fail and cause a failover, or a real server
can fail and be taken out of the pool.


REAL SERVERS

There are several ways I can think of to play with the real servers.
The most obvious are to unplug the network cable, or to stop (or kill)
and restart a monitored service. You should be able to view the
results in one of 2 ways.

The regular "ipvsadm -l" command will let you see the entries change.
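If you would rather not re-run "ipvsadm -l" by hand, a small polling loop can print the table only when it changes, so a real server dropping out of (or rejoining) the pool stands out. This is a minimal sketch, not part of piranha: watch_changes is a hypothetical helper name, and it assumes a POSIX shell running as root on the active router.

```shell
#!/bin/sh
# Hypothetical helper (not part of piranha or ipvsadm).
# Run a command repeatedly and print its output, with a timestamp,
# only when it differs from the previous run.
#   $1 = command string, $2 = number of runs (0 = forever),
#   $3 = seconds between runs (default 1)
watch_changes() {
    cmd=$1
    count=${2:-0}
    interval=${3:-1}
    prev=""
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        cur=$(sh -c "$cmd")
        if [ "$cur" != "$prev" ]; then
            echo "--- $(date '+%H:%M:%S')"
            printf '%s\n' "$cur"
            prev=$cur
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, run watch_changes 'ipvsadm -L -n' and then unplug or stop a service on one real server. The -n flag keeps ipvsadm from doing a DNS lookup for every entry on every pass.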

Or you can do the following cool trick if you want to watch the
load balancing in action...

1. Start a web service on each real server and create a
special default web page on each. The web page should just
identify the server itself, and contain 1k-4k of non-displayed
content (such as a comment).

2. Type the following command on your client PC (running Linux):

    while true; do lynx -dump nn.nn.nn.nn; done

(where nn.nn.nn.nn is the Virtual address of your web service
being handled by the real servers).

3. You should see the web page constantly reload, and (if
load balancing is on) that page should rotate between the
servers based on whatever rule you chose.

4. Disable one of the real servers. Its page should drop out of
the rotation (Note: you may have to restart the loop or wait for
lynx to time out).
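Steps 1 and 4 can be scripted roughly as follows. This is a sketch under assumptions: make_test_page and tally_servers are hypothetical helper names, you pass your web server's actual document root explicitly, "uname -n" supplies the server's name, and 2 KB of filler inside an HTML comment is just one way to get the 1k-4k of non-displayed content.

```shell
#!/bin/sh
# Hypothetical helper: run on each real server (step 1).
# Writes index.html naming this server, plus ~2 KB of padding inside
# an HTML comment so browsers and "lynx -dump" do not display it.
#   $1 = document root of the web server (required; site-specific)
make_test_page() {
    docroot=$1
    host=$(uname -n)
    {
        printf '<html><body><h1>Served by %s</h1>\n' "$host"
        printf '<!-- '
        # 2048 filler characters of non-displayed content
        head -c 2048 /dev/zero | tr '\0' 'x'
        printf ' -->\n</body></html>\n'
    } > "$docroot/index.html"
}

# Hypothetical helper: run on the client (steps 2-4).
# Reads rendered pages on stdin and counts how often each real server
# answered, most frequent first.
tally_servers() {
    grep -o 'Served by [^ ]*' | sort | uniq -c | sort -rn
}
```

On the client, something like for i in 1 2 3 4 5 6 7 8 9 10; do lynx -dump nn.nn.nn.nn; done | tally_servers, run before and after disabling a real server, should show that server's count fall to zero once the change takes effect.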



ACTIVE ROUTER

The easiest way to simulate a failure on the active router is to send
SIGTERM ("kill -s SIGTERM") to one of the nanny processes or to lvs.
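A sketch of that test, under a few assumptions: pidof and iproute2's "ip" command are available, and you run it as root. kill_one_nanny and vip_present are hypothetical helper names. Checking that the backup router now holds the virtual IP is one way to confirm the failover; it is an assumption about how to verify, not a check that piranha provides.

```shell
#!/bin/sh
# Hypothetical helper: run on the ACTIVE router to simulate a failure.
# Sends SIGTERM to the first nanny process found (assumes pidof exists).
kill_one_nanny() {
    pid=$(pidof nanny | awk '{print $1}')
    [ -n "$pid" ] || { echo "no nanny process found" >&2; return 1; }
    kill -s TERM "$pid"
}

# Hypothetical helper: run on the BACKUP router afterwards.
# Reads "ip -o -4 addr show" output on stdin and succeeds if the given
# virtual IP is configured on some interface. -F matches the address
# literally, and the trailing "/" avoids matching e.g. .5 inside .50.
vip_present() {
    vip=$1
    grep -qF "inet $vip/"
}
```

On the active router run kill_one_nanny; then on the backup run ip -o -4 addr show | vip_present nn.nn.nn.nn && echo "backup holds the VIP". Leaving the lynx loop running on the client at the same time shows roughly how long service is interrupted.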



NOTE: Make sure you are using the latest piranha RPMs. They can be
found at http://people.redhat.com/kbarrett/


--

Keith Barrett
Red Hat Inc. HA Team
kbarrett@redhat.com
