Mailing List Archive

Trying to diagnose a possibly failing FESX648-PREM
Hi all,

I believe I have a failing switch on my hands and I'm wondering if you
might be able to provide an assessment based on the symptoms I've been seeing.

I'm currently running a Foundry FESX648-PREM with the following version
info:

SSH@FESX648 Router>show version
SW: Version 07.4.00eT3e3 Copyright (c) 1996-2012 Brocade Communications
Systems, Inc. All rights reserved.
Compiled on Dec 11 2013 at 19:00:43 labeled as SXR07400e
(4593059 bytes) Primary sxr07400e.bin
BootROM: Version 07.4.01T3e5 (FEv2)
HW: Stackable FESX648-PREM6 (PROM-TYPE FESX648-L3U-IPV6)
==========================================================================
Serial #: FL18090011
License: SX_V6_HW_ROUTER_IPv6_SOFT_PACKAGE (LID: XXXXXXXXXXX)
P-ASIC 0: type 0111, rev 00 subrev 01
P-ASIC 1: type 0111, rev 00 subrev 01
P-ASIC 2: type 0111, rev 00 subrev 01
P-ASIC 3: type 0111, rev 00 subrev 01
==========================================================================
300 MHz Power PC processor 8245 (version 0081/1014) 66 MHz bus
512 KB boot flash memory
8192 KB code flash memory
256 MB DRAM
The system uptime is 26 minutes 49 seconds
The system : started=warm start reloaded=by "reload"


Quick summary of the symptoms:

1. These problems started only after ~15 servers were connected to the
switch. Although many servers are connected, utilization remains low, only
~40Mbit/s on a 1Gbit/s uplink.

2. I just rebooted my switch 20 minutes ago, but I'm already seeing a ton
of FCS errors across many ports: http://pbrd.co/SABLtk

3. Inexplicably high and erratic ping times (80ms instead of the usual
20ms over the same route, with +/- 20ms of variation on every ping). Ping
times were low and stable before many servers were connected.

4. High packet loss. Before a lot of servers were connected, there was no
packet loss. Yesterday, the packet loss was hovering around 10%. It seems
to be worsening now. Today the average packet loss is 20%.

Screen capture: http://pbrd.co/SADKO7 <http://pbrd.co/SABZ3D>

5. Yesterday I was also able to temporarily eliminate packet loss and the
high ping times by disabling specific ports. Today, disabling ports 7 and
11 has no effect.

6. The cross-connect cables were suspect, but all cables have since been
tested with a MicroTest PentaScanner and all passed. We even replaced the
CAT5 cross-connect with a machined and molded CAT6 cable -- the same packet
loss and erratic ping times persisted.

7. Other strange things have happened. Yesterday I attempted to connect
two new servers to the switch on ports 37 and 38. Ports 5-48 belong to the
same default VLAN. The servers could connect to the switch, and ping the
gateway IP, but they could not ping to the outside world. I then moved the
CAT5 cables to ports 22 and 23 -- same VLAN -- and everything worked
perfectly.

Does this seem like a failing switch? Are there any further diagnostic
tests I could run to verify this?

Thanks,
Elliot
Re: Trying to diagnose a possibly failing FESX648-PREM
On 07/05/2014 19:46, redacted@gmail.com wrote:
> Does this seem like a failing switch? Are there any further diagnostic
> tests I could run to verify this?

It could be a failing switch. You could try some boot-time diagnostics to
see what the story is.

The FES-X6xx boxes are built around port ASICs which control port regions.
On a 648, the regions are: 1-12, 13-24, 25-36 and 37-48. One of the
failure modes on this box is that a port ASIC can die and cause forwarding
problems. If this happens, the problems will be restricted to a specific
port group.
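
You can sanity-check that theory by mapping error-prone ports to their
ASIC group and seeing whether they cluster. A minimal sketch in Python
(it assumes the four regions correspond to P-ASIC 0-3 in order, which
'show version' suggests but I haven't verified):

    # Minimal sketch: map a FESX648 port to its port-ASIC group using the
    # regions above (1-12, 13-24, 25-36, 37-48). Assumes the groups map
    # to P-ASIC 0-3 in order.
    from collections import Counter

    def asic_group(port: int) -> int:
        """Return the assumed P-ASIC index (0-3) serving a given port."""
        if not 1 <= port <= 48:
            raise ValueError("FESX648 copper ports are 1-48")
        return (port - 1) // 12

    error_ports = [5, 6, 7, 9, 11]  # illustrative: ports logging FCS errors
    print(Counter(asic_group(p) for p in error_ports))  # -> Counter({0: 5})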

Nick

_______________________________________________
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp
Re: Trying to diagnose a possibly failing FESX648-PREM
This is a stand-alone switch in a cabinet so no L2 loop there. Pretty
simple setup -- single BGP session with an upstream provider with the
default route pointing right to them. CPU utilization currently sitting at
1%.

Initially when I noticed the packet loss I thought I was getting DoS
attacked, but I have sFlow monitoring activated on all ports and don't see
anything out of the ordinary.

I'll check the boot-time diagnostics soon -- thanks for your input.

- Elliot


On Wed, May 7, 2014 at 4:28 PM, Jeroen Wunnink | Hibernia Networks <
jeroen.wunnink@atrato.com> wrote:

> Could be an L2 loop or a DDoS against the mgmt IP. Is the CPU load also
> high?
>
> --
>
> Jeroen Wunnink
> IP NOC Manager - Hibernia Networks
> jeroen.wunnink@hibernianetworks.com
> Phone: +1 908 516 4200 (Ext: 1011)
> 24/7 NOC Phone: +31 20 82 00 623
>
>
Re: Trying to diagnose a possibly failing FESX648-PREM
I just had a replacement FESX648-PREM delivered overnight, hooked it up and
initially all looked good. However, when I imported my config and moved
over all of the CAT5e cables, the packet loss and erratic pings resumed.

Suspecting that a firmware issue was at play, I started removing
different parts of my config while running a continuous ping test in the
background. The moment I removed all rate-limiting from the device, packet
loss halted and ping times stabilized. However, I continue to have problems
downloading files at full speed -- speed test files will do these 'stop and
start' pauses. Ultimately I can only average 6MB/s where I'd
normally expect to pull down at least 200MB/s.
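
For what it's worth, the continuous ping test was essentially the
following (a minimal sketch, assuming Linux iputils ping output; the
target address is a placeholder):

    # Minimal sketch of a continuous ping check: run batches of pings and
    # report loss and round-trip jitter. Assumes Linux iputils ping output.
    import re
    import statistics
    import subprocess
    import time

    TARGET = "203.0.113.1"  # placeholder: gateway or upstream next-hop

    def ping_batch(host, count=20):
        """Return (loss_percent, rtt_samples_ms) for one batch of pings."""
        out = subprocess.run(["ping", "-c", str(count), host],
                             capture_output=True, text=True).stdout
        rtts = [float(m) for m in re.findall(r"time=([\d.]+)", out)]
        return 100.0 * (count - len(rtts)) / count, rtts

    while True:
        loss, rtts = ping_batch(TARGET)
        avg = statistics.mean(rtts) if rtts else float("nan")
        jitter = statistics.pstdev(rtts) if rtts else float("nan")
        print(f"loss={loss:.0f}%  avg={avg:.1f}ms  jitter={jitter:.1f}ms")
        time.sleep(1)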

My original switch was running sxr07400e.bin and the replacement is running
sxr07400d.bin.

All my other switches are FESX448-PREMs, so unfortunately I don't have an
existing example config to model after.

Can anyone recommend a boot ROM and firmware version that works well with a
FESX648-PREM?




Re: Trying to diagnose a possibly failing FESX648-PREM
Just spoke with a sysadmin working out of a different datacenter. They have
FESX648-PREMs deployed and they're running sxr07400e.bin firmware as well.
Completely stumped at this point :-/


Re: Trying to diagnose a possibly failing FESX648-PREM
Could it be a cabling issue? Are there any errors?

Is flow control enabled?

--
Eldon Koyle

Re: Trying to diagnose a possibly failing FESX648-PREM
Plenty of FCS errors and they're incrementing on the new switch as well.
Flow control is enabled on all ports. Here's my 'show statistics' output:

SSH@FESX648 Router(config)#show statistics
Port    In Packets   Out Packets    In Errors   Out Errors
1           180855             0            0            0
2                0             0            0            0
3        123136488      70341679            0            0
4                0             0            0            0
5          5315137       6604598       648949            0
6           342105       1549867       535454            0
7          9669516      16503017      3137016            0
8         14399232      29683571            1            0
9          9974691      18817287      3853703            0
10         4152353       4000770            0            0
11        13630527      25175503      5483288            0
12           71369        149477         1642            0
13         6881418       1668386       158036            0
14          939892       3171692       261376            0
15        11008907      20921720      4404347            0
16           77529        222362        24009            0
17             433         87820            0            0
18           82308       1759389       759693            0
19               0             0            0            0
20           27175        109184         1567            0
21               0             0            0            0
22               0             0            0            0
23               0             0            0            0
24               0             0            0            0
25               0           391            0            0
26             410             0            0            0
27               0             0            0            0
28               0             0            0            0
29               0             0            0            0
Almost every port that is active has FCS errors.
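
To quantify which ports are suffering, a rough pass over the table above
(a quick sketch; it assumes 'In Packets' counts good frames separately
from 'In Errors'):

    # Quick sketch: flag ports with a high inbound error ratio from the
    # 'show statistics' body above. Paste the full table body into
    # stats_text; a few rows are shown here.
    stats_text = """
    5          5315137       6604598       648949            0
    7          9669516      16503017      3137016            0
    11        13630527      25175503      5483288            0
    """

    for line in stats_text.splitlines():
        if not line.strip():
            continue
        port, in_pkts, out_pkts, in_errs, out_errs = map(int, line.split())
        if in_errs:
            ratio = 100.0 * in_errs / (in_pkts + in_errs)
            print(f"port {port}: {in_errs} in-errors ({ratio:.1f}% of inbound)")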

I've had such a bizarre combination of symptoms (15% packet loss
and erratic pings that were resolved by removing rate-limiting) that I
initially discounted the possibility that my cables were bad. However, I
did self-terminate all of them (I've terminated thousands of cables) and I
was using a new bag of RJ45 plugs that I haven't used elsewhere.

The datacenter technician who tested my uplink cross-connect cable also
tested one of my self-terminated cables. Both cables passed the test, but
maybe the rest of my self-terminated cables are bad...


Re: Trying to diagnose a possibly failing FESX648-PREM
Hi all,

After spamming this mailing list so heavily, I might as well spam it once
more with the resolution for my perplexing problem.

Turns out the problem was more straightforward than I anticipated. As I
hooked more servers up to the switch, I set many switch ports manually to
100-full -- as a crude means of rate-limiting. It turns out many of those
servers dropped down to 100-half on their end -- with the *notable
exception* of the server I was using for testing.

I suppose the packet loss, erratic ping times and degraded transfer speeds
(even on my correctly negotiated test server) were all just a result of the
switch becoming overwhelmed with duplex mismatch errors.
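
For anyone who hits the same thing: the check boils down to comparing the
forced setting on each switch port against what each server NIC actually
negotiated. A minimal sketch (the values below are illustrative; collect
the real ones from the switch port config and from ethtool on each server):

    # Minimal sketch: flag duplex mismatches between switch ports and
    # server NICs. The (speed, duplex) values are illustrative placeholders.
    switch_side = {7: ("100M", "full"), 11: ("100M", "full"), 22: ("1G", "full")}
    server_side = {7: ("100M", "half"), 11: ("100M", "half"), 22: ("1G", "full")}

    for port, setting in sorted(switch_side.items()):
        negotiated = server_side.get(port)
        if negotiated != setting:
            print(f"port {port}: switch {setting} vs server {negotiated}  <-- mismatch")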

Thanks for all the tips -- one of you on here prompted me to check all the
connected servers for duplex mismatches, and that was exactly the nudge I needed.


Re: Trying to diagnose a possibly failing FESX648-PREM
Hi

As general advice, I would sort it out layer by layer with a checklist:

Check grounding
Check cable type and shield
Check cable (FastIron# phy cable-diag tdr <n>) or with a diag tool
Check speed/duplex
Check the input/output counters on the other side
Check packet size

Next, save a 'show tech' before any config changes and compare revisions with a diff or version tool.
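
For the diff step, something this simple works (a sketch assuming the two
captures were saved to the files named below):

    # Sketch: diff two saved 'show tech' captures to spot drift between a
    # known-good state and the current one. File names are placeholders.
    import difflib

    with open("showtech_before.txt") as f1, open("showtech_after.txt") as f2:
        diff = difflib.unified_diff(f1.readlines(), f2.readlines(),
                                    fromfile="before", tofile="after")
    print("".join(diff))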


On the device, check the hardware limits (TCAM) and the impact of changing system values (max VLANs), because the IPv6 models differ in how they allocate TCAM. Did you enable IPv6?

Regards, Erich





