Sun Gem (RIO GEM r01) errors...
I recently rebuilt a SunBlade 2000 system, moving it
from Solaris 8 to Gentoo 2006.0. The system sports a
Sun RIO GEM NIC and worked quite well for the first
few days; however, we didn't hit it hard during that
time period either. The system's primary task is to be
our source repository, so it needs to be network
enabled.

The system was initially set up on 3/9/2006 and ran
fine until 3/15/2006, when we started getting the
following error messages:

Mar 15 15:39:25 tsdfft1 NETDEV WATCHDOG: eth0: transmit timed out
Mar 15 15:39:25 tsdfft1 eth0: transmit timed out, resetting
Mar 15 15:39:25 tsdfft1 eth0: TX_STATE[003ffc05:00000001:00000019]
Mar 15 15:39:25 tsdfft1 eth0: RX_STATE[0100c805:00000001:00000021]
Mar 15 15:39:25 tsdfft1 eth0: Link is up at 100 Mbps, half-duplex.
Mar 15 15:39:25 tsdfft1 eth0: Pause is disabled

And:

Mar 15 16:11:58 tsdfft1 eth0: TX MAC xmit underrun.

We're presently using the 2.6.16 kernel (vanilla) with
sungem driver version 0.98. We have also seen this
issue with the 2.6.15.6 kernel (vanilla) and the
2.4.32_r2 kernel (provided by Gentoo 2006.0).

The first one is sporadic; it happens from time to
time, with the same error message every time apart
from the date and time. The second one is much more
reproducible: all I have to do is pull down source
from the repository (hosted on Apache2 via WebDAV),
and after about 6 MiB of data transfer the link dies
until I run ifconfig down/up. It then works for a
while longer before it requires a system reboot.

In researching the issue, I concluded that either the
card is going bad or there is a driver problem. I
found a link to an xmit underrun issue for Solaris,
but was unable to access it because it is locked
behind a login on sunsolve.sun.com. So I have no
guarantee that going back to Solaris will solve the
issue either.

I have had a hard time finding reports of xmit
underruns under Linux; most searches turn up
references to where the message is generated in the
source rather than users trying to find solutions to
the problem.

I did, however, notice that there was a similar
problem with overflows on the RX portion of the chip,
which was solved through resetting the chip's RX unit
via gem_rxmac_reset().

My first attempt at a fix was to modify the driver
where the underrun is reported so that it schedules a
reset, based on code elsewhere in the driver. (See
sungem-fix1.patch.txt)

At first this patch did not seem to work; however, I
have been running the kernel with it for about a week
now, and at least SSH and Apache seem to keep running.
So I think it has at least improved the situation, but
it does not solve the problem on the Subversion side
(Apache/WebDAV), which still dies once the errors
start (I just tested again to make sure).

I then tried building a solution based on
gem_rxmac_reset() and the various init functions, and
produced gem_txmac_reset(). However, the first time I
used it, it locked up the kernel. It may just be that
I tried to take a lock when I shouldn't have (I did
try to take both the lock and the tx_lock for the
driver), but I am not sure that I did it correctly.

I would very much appreciate it if someone who is more
familiar with the sungem driver would look at the
patches and verify that (a) it is the correct thing to
do, and (b) I did it correctly.

I am aware that the network the system is running on
is supposed to be full duplex, 100 Mbps. However, I
have noticed that the card/driver seems to think it is
half-duplex. Could this simply be a duplexing issue? I
have no control of the switch it is plugged into (so
far as settings go), but have not been able to find a
way to get ifconfig to force it to full-duplex. (We've
typically built the driver into the kernel.)

If there is any information that I missed which would
be helpful, please let me know and I will be glad to
pass on what I can.

Patches and additional error log information on eth0
are available at the following URL:
http://tinyurl.com/hxfbp

Summary of system information:
System: Sun Microsystems SunBlade 2000
Purchased: roughly 11/03.
Processor: UltraSparcIII+/cheetah+/sparc64
NIC: Sun RIO GEM 10/100, built-in on SunBlade 2000
Linux Distro: Gentoo 2006.0
Kernel Versions: 2.6.16, 2.6.15.6, Gentoo's 2.4.32_r2

Specific error:

NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out, resetting
eth0: TX_STATE[003ffc05:00000001:00000019]
eth0: RX_STATE[0100c805:00000001:00000021]
eth0: Link is up at 100 Mbps, half-duplex.
eth0: Pause is disabled
...
eth0: TX MAC xmit underrun.

Any advice, help, etc. would be greatly appreciated.

TIA,

Benjamen R. Meyer

P.S. I also posted to the netdev list at
vger.kernel.org, but I have not heard anything.
--
gentoo-sparc@gentoo.org mailing list
Re: Sun Gem (RIO GEM r01) errors...
On Mon, 3 Apr 2006, BRM wrote:

> I am aware that the network the system is running on is supposed to
> be full duplex, 100 Mbps. However, I have noticed that the
> card/driver seems to think it is half-duplex. Could this simply be a
> duplexing issue? I have no control of the switch it is plugged into
> (so far as settings go), but have not been able to find a way to get
> ifconfig to force it to full-duplex. (We've typically built the
> driver into the kernel.)

ethtool is the utility you want for setting speed/duplex, e.g.,

ethtool -s eth0 speed 100 duplex full autoneg off

The sys-apps/ethtool ebuild will install it, if it's not already on
your system. See the ethtool(8) man page for all the details.
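The ethtool approach above can be wrapped in a small shell sketch that first checks what the link actually came up as and only then forces the setting. This is illustrative only: it assumes ethtool is installed, that the interface is eth0, and that force_full_duplex is run as root. The helper names duplex_from_ethtool and force_full_duplex are hypothetical and are not part of ethtool or Gentoo.

```shell
#!/bin/sh
# Hypothetical helper: read the output of "ethtool IFACE" on stdin and
# print the value of its "Duplex:" line (for example "Half" or "Full").
duplex_from_ethtool() {
    awk -F': *' '/^[ \t]*Duplex:/ { print $2; exit }'
}

# Hypothetical helper (must be run as root): force IFACE to 100 Mbps,
# full duplex, autonegotiation off, then print the duplex ethtool reports.
force_full_duplex() {
    iface=${1:-eth0}
    ethtool -s "$iface" speed 100 duplex full autoneg off || return 1
    echo "$iface duplex now: $(ethtool "$iface" | duplex_from_ethtool)"
}
```

For example, running "ethtool eth0 | duplex_from_ethtool" would be expected to print Half on the link described in the original post, and "force_full_duplex eth0" applies the command above and re-reads the result. One caveat: forcing speed and duplex only makes sense if the switch port is also fixed at 100 Mbps full duplex. An autonegotiating port facing a forced NIC falls back to half duplex, and the reverse case, a forced switch port facing an autonegotiating NIC that then settles on half duplex, is a common explanation for a half-duplex report like the one above.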

--
Paul Heinlein <> heinlein@madboa.com <> www.madboa.com
Re: Sun Gem (RIO GEM r01) errors...
--- Paul Heinlein <heinlein@madboa.com> wrote:
> On Mon, 3 Apr 2006, BRM wrote:
> > I am aware that the network the system is running on is supposed to
> > be full duplex, 100 Mbps. However, I have noticed that the
> > card/driver seems to think it is half-duplex. Could this simply be a
> > duplexing issue? I have no control of the switch it is plugged into
> > (so far as settings go), but have not been able to find a way to get
> > ifconfig to force it to full-duplex. (We've typically built the
> > driver into the kernel.)
> ethtool is the utility you want for setting speed/duplex, e.g.,
> ethtool -s eth0 speed 100 duplex full autoneg off

Thanks. That did the trick. Unfortunately, I was not
able to pull it from Portage because an
authenticating firewall is in the way. (We tried the
proxy instructions, but they didn't work.) Anyhow, for
anyone interested, you can also get the software from
the following link:
http://sourceforge.net/projects/gkernel/

Thanks,

Ben