Mailing List Archive

crashme crashes U60(2x300) almost as quickly as it does (2x450) (fwd)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Recalling IRC discussions, I guess this is of general interest.
If you haven't seen it before, read the second note first. :)

Truth in publishing ethics compels me to note that I have made a
correction to the U2 failure report.

If you have no idea what this is about, or if you have seen it many times
already, just ignore it.

Regards,
Ferris

- --
Ferris McCormick (P44646, MI) <fmccor@gentoo.org>
Developer, Gentoo Linux (sparc, devrel)

Date: Sat, 22 Oct 2005 08:53:55 +0000 (UTC)
From: Ferris McCormick <fmccor@gentoo.org>
To: squash@gentoo.org, weeve@gentoo.org
Cc: sparc@gentoo.org
Subject: crashme crashes U60(2x300) almost as quickly as it does (2x450) (fwd)

- --[PinePGP]--------------------------------------------------[begin]--
So, to finish the story duplicated below:
1. Disk involved (/dev/sda) in this test is a standard SUN-branded
18GB disk, Vendor: SEAGATE Model: ST318203LSUN18G Rev: 034A;
second disk on the system is the same.
2. To summarize my crashme results with this kernel:
a. U60(2x300), U60(2x450) --- pretty much the same, as described
in the original note, duplicated below.
b. U2(2x400) --- much worse. This system could not make it through
the first untar in pass 1.
3. Problem is SCSI disk I/O. I suspect increased CPU utilization might
make it less likely to show up, because if the CPUs are busy doing
other things, they can't hit the disk as hard (observation from
emerge --sync) --- this is speculation.
4. For the record, U2(2x400), U60(2x450) are both completely stable
under kernel 2.4.31-sparc-r2; actually, U2 perhaps more so.

This raises a question: Jason stated that a SUNESP patch made his U2
do much better. Is this patch in kernel 2.6.14-rc3-gb4d1b825? If
not, I would like to apply it and retest U2(2x400) on Monday. Clearly,
it would simplify the situation if case 2(b) -- the U2 failure -- could be
eliminated. A sample size of 1 is not all that useful, but if I recall
correctly (and I might be rewriting history based on current status), for
me the problem on a running system first came to light on that U2; it
seems to me, at least, that the U2 is more prone to failure.

So, if there is a U2-specific patch which is not in the kernel, that would
be significant. We might be looking at two SCSI-related problems which
result in the same symptom. Answering that seems to me to be important.

Sorry (not very, really) to include another copy of my first note.

Thoughts, comments, suggestions, etc. to list please, not to me
personally.

Regards,

- --
Ferris McCormick (P44646, MI) <fmccor@gentoo.org>
Developer, Gentoo Linux (sparc, devrel)

- ---------- Forwarded message ----------
Date: Sat, 22 Oct 2005 01:38:38 +0000 (UTC)
From: Ferris McCormick <fmccor@gentoo.org>
To: squash@gentoo.org, weeve@gentoo.org
Cc: sparc@gentoo.org
Subject: crashme crashes U60(2x300) almost as quickly as it does (2x450)

I ran crashme on this system (as identified by 'uname -a') Friday evening:

Linux fer-de-lance 2.6.14-rc3-git-gb4d1b825 #1 SMP Fri Oct 21 23:20:37 UTC
2005 sparc64 sun4u TI UltraSparc II (BlackBird) GNU/Linux

gb4d1b825 is davem's current git.

This is a U60(2x300), /proc/cpuinfo thus:
==============================
fmccor@fer-de-lance ~ $ cat /proc/cpuinfo
cpu : TI UltraSparc II (BlackBird)
fpu : UltraSparc II integrated FPU
promlib : Version 3 Revision 31
prom : 3.31.0
type : sun4u
ncpus probed : 2
ncpus active : 2
D$ parity tl1 : 0
I$ parity tl1 : 0
Cpu0Bogo : 589.82
Cpu0ClkTck : 0000000011a53054
Cpu2Bogo : 589.82
Cpu2ClkTck : 0000000011a53054
MMU Type : Spitfire
State:
CPU0: online
CPU2: online
=================================
On this system, crashme died beginning pass 4 (as opposed to pass 3 on
2x450). I modified crashme.sh to keep a log file; here it is.

=================================
Fri Oct 21 23:41:32 UTC 2005
2.6.14-rc3-git-gb4d1b825
Copying /usr/portage to /CRASH/crash.
Create tarfile
Removing portage
Untar
Removing portage
Run 1 completed
Copying /usr/portage to /CRASH/crash.
Create tarfile
Removing portage
Untar
Removing portage
Run 2 completed
Copying /usr/portage to /CRASH/crash.
Create tarfile
Removing portage
Untar
Removing portage
Run 3 completed
Copying /usr/portage to /CRASH/crash.
====================================

The log does not show it, but /usr/portage and /CRASH/crash are on
the same partition (/dev/sda4).
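
For anyone who wants to reproduce the sequence shown in the log, here is a
minimal sketch of one way to write such a loop. It is not the actual
crashme.sh, which is not included in this note. The function names one_pass
and crashme, the argument order, and the exact commands (cp -R, tar) are
assumptions inferred only from the log messages.

```shell
#!/bin/sh
# Hypothetical reconstruction of the logged steps; NOT the original crashme.sh.

# one_pass SRC WORK LOG
# Copy SRC into WORK, tar the copy, remove it, untar it, remove it again.
one_pass() {
    src=$1
    work=$2
    log=$3
    name=$(basename "$src")

    echo "Copying $src to $work." >> "$log"
    cp -R "$src" "$work/" || return 1

    echo "Create tarfile" >> "$log"
    tar -C "$work" -cf "$work/$name.tar" "$name" || return 1

    echo "Removing $name" >> "$log"
    rm -rf "$work/$name"

    echo "Untar" >> "$log"
    tar -C "$work" -xf "$work/$name.tar" || return 1

    echo "Removing $name" >> "$log"
    rm -rf "$work/$name" "$work/$name.tar"
}

# crashme SRC WORK LOG PASSES
# Write a date/kernel header to LOG, then run one_pass PASSES times.
crashme() {
    mkdir -p "$2" || return 1
    date -u >> "$3"
    uname -r >> "$3"
    run=1
    while [ "$run" -le "$4" ]; do
        one_pass "$1" "$2" "$3" || return 1
        echo "Run $run completed" >> "$3"
        run=$((run + 1))
    done
}
```

With the paths from the log, an invocation might look like
`crashme /usr/portage /CRASH/crash ./crashme.log 10`, where the log file name
and the pass count are placeholders. Because each step is appended to the log
before the command runs, the last line in the log names the step that was in
progress when the machine died.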

So, crashme will kill (some) (2x300) systems if they are sensitive to the
problem. However, fer-de-lance (2x300) is much more robust running an
'emerge --sync' than antaresia (2x450) is. That might be because the CPUs
are slower, and so can't drive the disks as hard.

Hope this is useful,
Regards,
Ferris

- --
Ferris McCormick (P44646, MI) <fmccor@gentoo.org>
Developer, Gentoo Linux (sparc, devrel)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFDWkg8Qa6M3+I///cRAkODAKCIVOZdWsa0rLFh+P13uy6j3VO5NQCbBs3t
NO5RIaCds27WpDuxpFhyUh4=
=qOKp
-----END PGP SIGNATURE-----
--
gentoo-sparc@gentoo.org mailing list
Re: crashme crashes U60(2x300) almost as quickly as it does (2x450) (fwd)
In current 2.6 kernels, the sysrq combination is tricky. This is how
you trigger it.

Hold down CTRL. Hold down ALT. Hold down SysRQ. Hold down SHIFT.
A usage message should display. Let go of SHIFT. Hit your command key.

Josh