Mailing List Archive

Testing traffic forwarding issues on FESX448-PREM
I'm working on testing several FESX448-PREM switches.

One of the switches in my test group is known to be bad. It was previously
installed as a top-of-cabinet switch with 42 servers connected to it. All
port lights came on at full duplex with no errors, and CPU usage was low,
but ports 13-24 would not forward traffic. As I understand it, this is due
to a bad ASIC that covers port region 13-24.

However, I now have this bad switch on my workbench and I cannot replicate
the same forwarding issue with port region 13-24.

On the bench, I have port 1 as the uplink and I've been connecting my
laptop to ports 2-48 sequentially using a CAT6 cable while running a
continuous ping to a public IP. Interestingly, all the ports now work fine;
port region 13-24 no longer has forwarding issues.
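
For a sequential bench test like the one described above, it may help to tally ping results per port region rather than per port, so a failing region stands out. The following is only a minimal sketch: it assumes the regions are consecutive 12-port blocks (1-12, 13-24, 25-36, 37-48), which matches the "13-24" region mentioned here but is not confirmed for the other ports, and the names `port_region` and `loss_by_region` are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Assumption: regions are consecutive 12-port blocks (1-12, 13-24, ...).
PORTS_PER_REGION = 12


def port_region(port: int) -> str:
    """Return the region label (e.g. '13-24') for a 1-based port number."""
    if port < 1:
        raise ValueError("port numbers start at 1")
    start = ((port - 1) // PORTS_PER_REGION) * PORTS_PER_REGION + 1
    return f"{start}-{start + PORTS_PER_REGION - 1}"


def loss_by_region(results: dict[int, tuple[int, int]]) -> dict[str, float]:
    """Map port -> (pings sent, replies received) to loss percentage per region."""
    sent = defaultdict(int)
    received = defaultdict(int)
    for port, (tx, rx) in results.items():
        region = port_region(port)
        sent[region] += tx
        received[region] += rx
    # Skip regions with no pings sent to avoid dividing by zero.
    return {r: 100.0 * (sent[r] - received[r]) / sent[r] for r in sent if sent[r]}
```

The counts fed into `loss_by_region` would come from whatever ping tool is used on the laptop; collecting them is left out here because ping options differ between operating systems.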

Does anyone know how this is possible? If there's a bad ASIC for port
region 13-24, then I'd expect this problem to occur 100% of the time.

I tried a couple of other things afterwards. My theory was that more ports
needed to be active at once to trigger the forwarding issue. So I first
took a layer 2 switch and connected it to a bunch of ports on the FESX448.
CPU usage immediately went to 100% on the FESX448. I figured some kind of
recursive routing was happening with the layer 2 switch. Next, I put the
layer 2 switch into boot monitor mode so it wouldn't do any routing. That
resolved the 100% CPU issue, but I'm still unable to replicate the traffic
forwarding issue on ports 13-24.
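
One way to get more ports active at once without cabling a single layer 2 switch to many ports (several links to one switch in the same VLAN form a loop unless spanning tree blocks them, which is possibly what drove the CPU to 100%) is to place two hosts on different ports and ping between them. The sketch below lists one representative port pair for every combination of regions, including pairs inside the same region. It assumes 48 ports in 12-port regions, and `representative_pairs` is a hypothetical helper, not a switch feature.

```python
from itertools import combinations_with_replacement


def region_ports(region: int, ports_per_region: int = 12) -> list:
    """Return the 1-based port numbers in a 0-based region index."""
    start = region * ports_per_region + 1
    return list(range(start, start + ports_per_region))


def representative_pairs(num_ports: int = 48, ports_per_region: int = 12) -> list:
    """Return one (src_port, dst_port) pair per region combination."""
    regions = range(num_ports // ports_per_region)
    pairs = []
    for a, b in combinations_with_replacement(regions, 2):
        src = region_ports(a, ports_per_region)[0]
        # Within a single region, pick the second port so src != dst.
        dst = region_ports(b, ports_per_region)[1 if a == b else 0]
        pairs.append((src, dst))
    return pairs
```

With the defaults this yields ten pairs, such as (13, 14) for traffic that stays inside the suspect region and (1, 13) for traffic that crosses into it.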

Any suggestions on how I can replicate the forwarding issue and effectively
test the remaining switches would be much appreciated!
Re: Testing traffic forwarding issues on FESX448-PREM
We have had issues with one or two older FESX switches that failed
intermittently (i.e., they would work after a cold boot for a random period
of time, anywhere from days to months, then stop passing traffic on a port
region). One was a fesx 624hf+2xg that had well outlasted its useful
lifetime. It was a good excuse to upgrade ;). I seem to recall seeing
this behavior once before then, but I could be imagining things.
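
Because a failure like this can take days or months to appear, a long-running soak test with timestamped results is more likely to catch it than a quick pass over the ports. As a rough sketch, assuming ping results have already been logged as `(timestamp, ok)` samples in time order (the logging itself and the name `outage_windows` are assumptions for illustration), runs of consecutive failures can be collapsed into outage windows:

```python
def outage_windows(samples):
    """Collapse consecutive failed samples into (first_fail, last_fail) windows.

    samples: iterable of (timestamp, ok) tuples, sorted by timestamp.
    """
    windows = []
    start = last = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts  # first failure of a new outage
            last = ts
        elif start is not None:
            windows.append((start, last))  # outage ended with this success
            start = None
    if start is not None:
        windows.append((start, last))  # outage still open at end of log
    return windows
```

Running this separately on the samples for each port region would show whether outages cluster on one region and how long after a cold boot they begin.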

On Dec 8, 2016 5:08 PM, "removed" <removed> wrote:

> Does anyone know how this is possible? If there's a bad ASIC for port
> region 13-24, then I'd expect this problem to occur 100% of the time.
Re: Testing traffic forwarding issues on FESX448-PREM
removed wrote:
> Does anyone know how this is possible? If there's a bad ASIC for port
> region 13-24, then I'd expect this problem to occur 100% of the time.

naah, could be a weak solder joint or any one of a variety of problems
that could strike under arbitrary circumstances (heat, humidity, etc).
That said, once a port region has started giving trouble, it will
eventually die and you are best off throwing the box away.

Nick
_______________________________________________
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp
Re: Testing traffic forwarding issues on FESX448-PREM
I absolutely agree. It could be something as weird as a difference in
thermal conditions causing the problem to happen in the rack but not on the
bench.



On Fri, Dec 9, 2016 at 5:23 AM, Nick Hilliard <nick@foobar.org> wrote:

> naah, could be a weak solder joint or any one of a variety of problems
> that could strike under arbitrary circumstances (heat, humidity, etc).
> That said, once a port region has started giving trouble, it will
> eventually die and you are best off throwing the box away.