Mailing List Archive

6.1pre7 still unstable
After my unsuccessful attempts with the 2.2.12 and 2.2.19 kernels I have
switched to the SuSE 7.3 distribution, which uses the 2.4.10 kernel.

I am using (as always) the following test scenario:
- a 400MB drbd device
- the primary is connected and synchronized
- doing disk IO on the mounted drbd device (cp /usr/bin/perl; sync)
- then reset the secondary; let it reboot; reconnect; quicksync;
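The test loop above could be sketched roughly like this. This is only a
sketch: the mount point, the reset mechanism, and the helper names
`reset_secondary` / `wait_for_reconnect` are assumptions, and run() merely
echoes each command so nothing destructive happens by default.

```shell
#!/bin/sh
# Sketch of the reproduction loop described above. MOUNTPOINT and the
# helper names are assumptions; run() only echoes each command, so
# replace its body with "$@" to actually execute on a real cluster.

MOUNTPOINT=/mnt/drbd0   # assumed mount point of the drbd device
RUNS=10

run() {
    echo "$@"           # dry run; change to: "$@" on a live cluster
}

test_loop() {
    i=1
    while [ "$i" -le "$RUNS" ]; do
        # Generate disk IO on the mounted drbd device, then flush.
        run cp /usr/bin/perl "$MOUNTPOINT"
        run sync

        # Hard-reset the secondary, wait for reboot and reconnect;
        # drbd then quick-syncs the blocks changed in the meantime.
        run reset_secondary      # hypothetical helper
        run wait_for_reconnect   # hypothetical helper

        i=$((i + 1))
    done
}
```

On a live cluster one would watch the primary's console during `test_loop`
for the oops or hang described below.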

Result:
- in 2 out of 10 test runs the primary dies (kernel oops or complete
system hang)

I have seen this behaviour with a number of kernels (2.2.12, 2.2.19, 2.4.10),
drbd revisions (5.8.1, 6.1pre3, 6.1pre4, 6.1pre6, 6.1pre7), and protocols
(B and C). Since other people on the mailing list report similar errors in
similar test scenarios, I don't think this is a configuration or
integration issue.

Any ideas how to get rid of this?

/Wolfram


To get support for the Microsoft Cluster Server, a test suite must be
passed successfully. One part of this test creates a scenario in which 8
client PCs put heavy load over the network on every cluster disk. This
includes ftp, http, smb, writing large files, and writing many files, so
there is a good application mix. Then, at random points in time, one node
is crashed and has to rejoin. The test is passed if the system still
offers highly available services after a 48-hour test run with more than
100 node crashes. Would be nice to have such a test for Linux, wouldn't it?
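A minimal driver for such a test on Linux might look like the sketch below.
Everything here is hypothetical: the node list, crash_node and check_service
are stubs that would have to be replaced by a real remote power switch and
real client load (ftp, http, smb) against the cluster's service address.

```shell
#!/bin/sh
# Hypothetical crash-test driver in the spirit of the MSCS test above.
# NODES, crash_node() and check_service() are assumptions/stubs.

NODES="node1 node2"     # assumed cluster members
CRASHES=100             # target number of node crashes

crash_node()    { echo "crashing $1"; }   # stub: e.g. remote power switch
check_service() { echo "service ok"; }    # stub: e.g. HTTP GET on the VIP

pick_node() {
    # Pick one cluster node pseudo-randomly (awk for portability).
    set -- $NODES
    idx=$(awk -v n="$#" 'BEGIN { srand(); print int(rand() * n) + 1 }')
    eval "echo \$$idx"
}

crash_loop() {
    n=1
    while [ "$n" -le "$CRASHES" ]; do
        node=$(pick_node)
        crash_node "$node"
        check_service        # service must stay up while the node reboots
        n=$((n + 1))
    done
}
```

A real run would additionally spread the crashes over the 48-hour window
and keep the client load running the whole time.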


=======================================================================
Wolfram Weyer FORCE COMPUTERS GmbH
Staff Engineer - Systems Engineering A Solectron Subsidiary

phone: +49 89 60814-523 Street: Prof.-Messerschmitt-Str. 1
fax: +49 89 60814-112 City: D-85579 Neubiberg/Muenchen
mailto:Wolfram.Weyer@example.com
http://www.forcecomputers.com
=======================================================================