Mailing List Archive

U1/U2 failures with kernel 2.6.<anything> --- maybe a clue?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

OK, I've been thinking about this, and here is what we have.
(1) Some U1/U2 systems do very well on these kernels;
(2) Some are unusable: I have one which, on 2.6.xx, has a mean time
between (very hard lock) failures of about a day; on kernel-2.4.32 it
never fails (literally).
(3) Weeve and (I believe) squash are as in point 2.

Now, I am not imagining things: a system which responds to nothing at all
is hard to make up.

Further, my unusable-with-2.6 system is 2x400; stable ones are I think a
bit slower.

Here's the clue: I tried cdrecord (which works perfectly on 2.4.xx) on
the 2x400 system with 2.6.15-rc4. It wrote the disk. Then it tried
to fixate it.
That killed it within about 1 second. I *think* fixating is one long
system call (I haven't read cdrecord yet), and scsi disk activity I know
is the general killer. So maybe looking at cdrecord's fixating system
activity can tell where the problem is. (I do know cdrecord on this
system with 2.6.xx has a 100% failure rate, based on several attempts.)
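
One way to look at that fixating activity might be to run only the fixate
step under strace from a remote login, so that the last system call printed
before the lock is still visible on another machine. A minimal sketch,
assuming strace is installed; the helper name fixate_trace_command and the
device address 0,0,0 are made up for illustration and are not from this
thread:

```python
import shlex


def fixate_trace_command(device):
    """Build an strace command line that runs only cdrecord's fixate step."""
    return [
        "strace",
        "-f",    # follow any child processes cdrecord starts
        "-tt",   # timestamp each system call, with microseconds
        "-T",    # show the time spent inside each call
        "cdrecord",
        "-fix",  # fixate only, do not write any data
        "dev=" + device,
    ]


# "0,0,0" is a placeholder SCSI bus,target,lun; substitute the real burner.
print(shlex.join(fixate_trace_command("0,0,0")))
```

strace writes to stderr by default, so running the printed command over ssh
or a serial console streams the trace off the box as it happens. Writing it
to a local file instead would probably lose the final lines in a hard lock.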

Thoughts, Comments?
(By the way, I regret my rash remarks from earlier.)
Regards,
Ferris
- --
Ferris McCormick (P44646, MI) <fmccor@gentoo.org>
Developer, Gentoo Linux (Devrel, Sparc)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (GNU/Linux)

iD8DBQFEAR06Qa6M3+I///cRAmq7AJ9SCiBS/sXieWdWF/Xu6nBMxIplngCdF2/3
Ku3jz0TMhUjvbnbT1md+Y/0=
=tMFQ
-----END PGP SIGNATURE-----
--
gentoo-sparc@gentoo.org mailing list
Re: U1/U2 failures with kernel 2.6.<anything> --- maybe a clue?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ferris McCormick wrote:

> OK, I've been thinking about this, and here is what we have.
> (1) Some U1/U2 systems do very well on these kernels;
> (2) Some are unusable: I have one which, on 2.6.xx, has a mean time
> between (very hard lock) failures of about a day; on kernel-2.4.32
> it never fails (literally).
> (3) Weeve and (I believe) squash are as in point 2.
>
> Now, I am not imagining things: a system which responds to nothing
> at all is hard to make up.
>
> Further, my unusable-with-2.6 system is 2x400; stable ones are I
> think a bit slower.


This could be. I currently have 3 U1's running 2.6.15-r5 perfectly
fine. I'm still sort of in the process of building them out (just
adding packages I need and such), so they have been compiling (with
distcc) fine for about 3 days straight. Two are 200MHz UltraSPARC I's
and one is a 166MHz U1.

They each only have one processor, but SMP is enabled.

>
> Here's the clue: I tried cdrecord (which works perfectly on 2.4.xx)
> on the 2x400 system with 2.6.15-rc4. It wrote the disk.
> Then it tried to fixate it. That killed it within about 1 second. I
> *think* fixating is one long system call (I haven't read cdrecord
> yet), and scsi disk activity I know is the general killer. So maybe
> looking at cdrecord's fixating system activity can tell where the
> problem is. (I do know cdrecord on this
> system with 2.6.xx has a 100% failure rate, based on several attempts.)
>
> Thoughts, Comments?
> (By the way, I regret my rash remarks from earlier.)
> Regards,
> Ferris
> --
> Ferris McCormick (P44646, MI) <fmccor@gentoo.org>
> Developer, Gentoo Linux (Devrel, Sparc)


- --
gentux
echo "hfouvyAdpy/ofu" | perl -pe 's/(.)/chr(ord($1)-1)/ge'

gentux's gpg fingerprint ==> 34CE 2E97 40C7 EF6E EC40 9795 2D81 924A
6996 0993
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (GNU/Linux)

iD8DBQFEAS04LYGSSmmWCZMRAmqtAKC1RoYtSKkJywwjC4j49pqQmwgr9wCeLE7A
rX6mbX/BI8o7YTdfsV5Nvtk=
=j9Yv
-----END PGP SIGNATURE-----

Re: U1/U2 failures with kernel 2.6.<anything> --- maybe a clue?
On Sun, 26 Feb 2006 03:14:58 +0000 (UTC)
Ferris McCormick <fmccor@gentoo.org> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> OK, I've been thinking about this, and here is what we have.
> (1) Some U1/U2 systems do very well on these kernels;
> (2) Some are unusable: I have one which, on 2.6.xx, has a mean time
> between (very hard lock) failures of about a day; on kernel-2.4.32
> it never fails (literally).
> (3) Weeve and (I believe) squash are as in point 2.

Yes, I have an Ultra 2 (2x300, 2GB RAM) and an Ultra 1 (143MHz, 448MB
RAM) that can readily be locked up with what appears to be the I/O
issue. In both cases, running the systems with either a serial
console or a graphical console reveals nothing when they lock up
(even with the syslog daemon turned off).

For both systems, I've tried kernels built with gcc-3.4.5 and gcc-3.3.6
and the only possible difference is that it *seems* to take longer to
crash on the gcc-3.4.5 built kernels (but let me generate some data to
back that up).
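
A comparison like that could be as simple as logging the hours until each
lockup per kernel build and comparing the averages. A minimal sketch; the
function name mean_hours_to_lockup is hypothetical, and the numbers below
are placeholders only, not measurements from any of these machines:

```python
from statistics import mean


def mean_hours_to_lockup(runs):
    """Average time to lockup per build; runs maps a build label to hours."""
    # Builds with no recorded runs are skipped rather than reported as zero.
    return {build: mean(hours) for build, hours in runs.items() if hours}


# Placeholder values for illustration only, NOT real results.
example_runs = {
    "gcc-3.3.6": [1.0, 2.0, 3.0],
    "gcc-3.4.5": [2.0, 4.0, 6.0],
}

for build, avg in sorted(mean_hours_to_lockup(example_runs).items()):
    count = len(example_runs[build])
    print(f"{build}: {avg:.1f} h mean time to lockup over {count} runs")
```

With only a handful of runs per build the averages will be noisy, so the
run count is printed next to each mean.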

Cheers,
--
Jason Wever
Gentoo/Sparc Team Co-Lead