Mailing List Archive

FWSX eating arp packets
Hi all, sorry for the long-winded post, but this has been eating away at
me. Feel free to reply on or off list.

Everything was fine for almost 2 years; then, out of the blue, a
near-complete black hole appeared for traffic between two FWSX switches.
In case you aren't aware, FWSX are just regular FESX switches neutered so
they can't be upgraded with a PREM layer 3 license. Here's a diagram:

      +----------xc1----------[FWSX 1]--[server1]
      |                           |
[upstream switch]                xc3
      |                           |
      +----------xc2----------[FWSX 2]--[server2]

Both FWSXs are pure layer 2 and form an 802.1w loop with xc2 as the
blocking link. No frills, bells or whistles.

After many hours of tcpdumping on servers connected to a pair of FWSX
(basic layer 2) switches, it turns out unicast ARP packets are being
dropped on the x-connect between the two switches, but only in one
direction. Below, you'll see the unicast reply to the initial broadcast,
but the subsequent unicast probes are dropped (thus only a single reply
using arping).

Traffic between two servers - SERVER 1 (switch1) to SERVER 2 (switch2):
[root@cl-ash-s1 ~]# arping -I eth1.2 10.11.13.11
ARPING 10.11.13.11 from 10.11.13.5 eth1.2
Unicast reply from 10.11.13.11 [A0:36:9F:0E:13:B2] 2.453ms
Sent 11 probes (1 broadcast(s))
Received 1 response(s)

And on the other server - SERVER 2 (switch2) to SERVER 1 (switch1):
[root@localhost ~]# arping 10.11.13.5 -I xenbr2
ARPING 10.11.13.5 from 10.11.13.11 xenbr2
Sent 11 probes (11 broadcast(s))
Received 0 response(s)


In a nutshell -- Unicast ARP from server1 to server2 is completely
dropped. Broadcast works in both directions, and unicast works only
from server2 to server1.
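
The two summaries above can be compared mechanically. What follows is only a minimal sketch, assuming iputils-style arping output like the transcripts above; the function name summarize_arping is hypothetical and not part of any tool. It splits the "Sent" count into broadcast and unicast probes. By default, iputils arping starts with a broadcast and switches to unicast once it has received a reply, which is why the two runs look so different.

```shell
#!/bin/sh
# Hypothetical helper: summarize iputils-style arping output read from stdin.
# Prints one line: sent=<n> broadcast=<n> unicast=<n> received=<n>
summarize_arping() {
    awk '
        # Example line: "Sent 11 probes (1 broadcast(s))"
        /^Sent [0-9]+ probes/ {
            sent = $2
            bcast = $4                 # looks like "(1"
            gsub(/[^0-9]/, "", bcast)  # keep only the digits
        }
        # Example line: "Received 1 response(s)"
        /^Received [0-9]+ response/ { received = $2 }
        END {
            printf "sent=%d broadcast=%d unicast=%d received=%d\n",
                   sent, bcast, sent - bcast, received
        }
    '
}
```

Piping a run into it, for example `arping -c 11 -I eth1.2 10.11.13.11 | summarize_arping`, gives a one-line summary per direction. A run that sends many unicast probes but receives only one response, next to a run in the other direction that never leaves broadcast mode, matches the one-way unicast loss described here.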

MAC tables on the FWSXs are sane. Every server is shown where it
should be.

I can reproduce this with any device or operating system. It's
definitely NOT a problem with the host configuration(s).

Now the kicker: if I remove the x-connect between the switches (and let
spanning tree re-converge through the upstream switch both are connected
to), things work normally. I tried moving xc3 to different ports, with
no change. So as long as I boomerang inter-switch traffic through the
upstream switch, we're good. But that is quite a bit of traffic,
including SAN traffic, so I need to avoid this. I rebooted both
switches, with no change. Software is the latest for the platform
(05.1.00eT1e0), so I can't try upgrading.

So my simple question is: has anyone ever seen Brocade switches (in
pure L2 duties) just straight up eat ARP packets? And not only that,
but JUST unicast ARP, and in only one direction?


--
Randy McAnally
_______________________________________________
foundry-nsp mailing list
foundry-nsp@puck.nether.net
http://puck.nether.net/mailman/listinfo/foundry-nsp
Re: FWSX eating arp packets
I've seen ASIC failures in a lot of my old FESXs recently. Each failure
has had its own unique set of features.

I'd try moving the cross connect to a different tower (port group, ASIC) on
both switches to rule out ASIC failure.

On Thu, May 28, 2015 at 5:52 PM, Randy McAnally <rsm@fast-serv.com> wrote:

> [...]
Re: FWSX eating arp packets
Also, 5.1.00F has been available since 2012, so 5.1.00E is no longer the latest ;-)

On Thu, May 28, 2015 at 5:52 PM, Randy McAnally <rsm@fast-serv.com> wrote:

> [...]