Mailing List Archive

reproducible crashes of a node
I already reported this problem to the list, but there was no reaction.

I have just tested it a second time: the secondary crashes reproducibly when it should become
primary while a write to the primary is in progress.

In detail:
1) primary up
2) secondary up
3) SyncAll is done
4) A Windows client uses a share via smbd
5) On the client, I copy a large file from the local CD to the share, so it is written first to the primary's hard disk
6) At this moment, during the write, I switch off the primary to simulate a crash.
7) The secondary should become primary and take over the service, but it crashes completely; only
a reset is possible.
8) The log files show nothing, but the screen showed:

Code: 39 4a 08 76 f4 89 d0 39 48 04 77 e3 85 c0 74 03 89 43 08 5b
kernel BUG at timer.c:306!
invalid argument: 0000

... a lot more text ...

Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

Does anybody know what this is? Is there a kernel patch available?

As it stands, the cluster is nearly unusable, because it only works as long as the primary does
not crash while a client is writing.

After this, I rebooted the former primary and answered "yes" to become primary, but it waited until the
former secondary had been rebooted too. Why? In my opinion the primary should also come up stand-alone
so that it can provide the service. Can anybody explain this behavior?
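
The write load from steps 5 and 6 can probably also be generated without the Windows client.
The following is only a minimal sketch under assumptions: the function name copy_with_sync and
both paths are hypothetical and were not part of the test described above, and it assumes the
replicated directory is reachable as a local path on the machine running the script. It copies
a file in chunks and forces every chunk to disk, so that a write is in flight for as long as
possible when the primary is switched off.

```python
import os


def copy_with_sync(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path in chunks and fsync after every chunk.

    Returns the number of bytes written.
    """
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            # Push the data out of Python's buffer and ask the kernel
            # to flush it to the underlying device.
            dst.flush()
            os.fsync(dst.fileno())
            written += len(chunk)
    return written


if __name__ == "__main__":
    # Hypothetical paths; adjust to the local CD and the mounted share.
    total = copy_with_sync("/mnt/cdrom/large.iso", "/mnt/share/large.iso")
    print("bytes written:", total)
```

Switching off the primary while this is still running should correspond to step 6.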


Kind regards, ar

--
mailto:andreas@example.com
http://www.rittershofer.de
PGP-Public-Key http://www.rittershofer.de/ari.htm