Mailing List Archive

Cluster stall issue going to switched cluster...
Hi guys

I just had a bad experience today when trying to convert my switchless AFF8080 to a switched cluster using a pair of CN1610 switches.
The two switches are about 100 m from the AFF8080, so the round trip of course increases from 0.5 m to 200 m.
But I do not think this is the problem here? We of course use multimode fibers between the two DCs, and the fibers check out and didn’t seem to have any errors.
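
As a rough sanity check on the distance question, the added propagation delay can be estimated. This is a minimal sketch, assuming a typical fiber refractive index of about 1.47 (an assumption, not a measured value for these fibers); the function name is hypothetical. It uses the 0.5 m and 200 m round-trip figures from the message above.

```python
# Rough propagation delay of light in optical fiber.
C_VACUUM_M_PER_S = 299_792_458   # speed of light in vacuum
REFRACTIVE_INDEX = 1.47          # assumed typical value for glass fiber


def propagation_delay_us(length_m, refractive_index=REFRACTIVE_INDEX):
    """Return the time in microseconds for light to travel length_m of fiber."""
    speed_m_per_s = C_VACUUM_M_PER_S / refractive_index
    return length_m / speed_m_per_s * 1e6


# Round-trip lengths as stated in the message: 0.5 m before, 200 m after.
for label, round_trip_m in [("back-to-back", 0.5), ("via switches", 200.0)]:
    print(f"{label}: {propagation_delay_us(round_trip_m):.4f} us round trip")
```

Under that assumption the 200 m round trip adds roughly one microsecond, which suggests the extra distance by itself is unlikely to explain a slowdown that users notice.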

I followed this guide to convert two ports at a time:
https://library.netapp.com/ecm/ecm_download_file/ECMP1140535

And it seemed to work just great on the console.
I must admit that I didn’t down the CN1610 ports as described because you also down the ports on the NetApp side, so it seems pointless to do it on both sides…
Other than that I followed the guide.

As soon as I completed the migration, the users started to complain about very slow access on CIFS, NFS and iSCSI.

We looked at the port stats of the link but there were no errors… and the system seemed to serve data, just very much slower than normal… so we had to bite the bullet and switch back again, and as soon as we were back to the normal back-to-back cables, the system was normal again…

Going through the logs I found this error on one of the four links…

11/9/2020 13:40:34 STOR01-01 ERROR netif.sfpNotSupported: The SFP+ or QSFP+ module (FINISAR CORP. FTLF8528P3BCV-QL) installed in e0a is not supported with this network interface.

And I must admit that a non-NetApp SFP+ module had found its way into my stash. The other SFP+ modules were the same PN but with an “NA1” at the end…. I’m not sure if this could cause this issue?
I have seen poor performance when using a non-supported NIC in a NetApp, but non-supported SFP+ module?
Has anyone had issues like this?

We are on ONTAP 9.7P7 and the CN1610s are running quite old FW, but I have used them for other projects with no issues (i.e. multi-node NetApp clusters)…

/Heino
Re: Cluster stall issue going to switched cluster... [ In reply to ]
The CN1610s should be running EFOS 1.3.0.2 or 1.3.0.3 with ONTAP 9.7, and
the Reference Configuration File (RCF) should be current (1.2).
The Supported Optics include:
332-00363, AFBR-703SMZ-NA3, AFBR-703SMZ-NA4, AFBR-709SMZ-NA3,
AFBR-709SMZ-NA4, PLRXPL-SC-S43NA1 and PLRXPL-SC-S43NA2

Anything is possible with unsupported Optics.
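
A quick way to screen a stash of modules against the part numbers listed above is a simple lookup. This is only a sketch: the set contains just the part numbers quoted in this message (the list says "include", so it may not be exhaustive), and the function name is hypothetical. The two FINISAR part numbers tested are the ones mentioned elsewhere in this thread.

```python
# Part numbers copied from the list in this message; possibly not exhaustive.
LISTED_OPTICS = {
    "332-00363",
    "AFBR-703SMZ-NA3",
    "AFBR-703SMZ-NA4",
    "AFBR-709SMZ-NA3",
    "AFBR-709SMZ-NA4",
    "PLRXPL-SC-S43NA1",
    "PLRXPL-SC-S43NA2",
}


def is_listed_optic(part_number):
    """Return True if the part number appears in the list quoted above."""
    return part_number.strip().upper() in LISTED_OPTICS


for pn in ["FTLF8528P3BCV-QL", "FTLF8528P3BCV-N1", "AFBR-703SMZ-NA3"]:
    status = "listed" if is_listed_optic(pn) else "NOT listed"
    print(f"{pn}: {status}")
```

A part that is not in this set is not necessarily unsupported everywhere; it only means it does not appear among the part numbers quoted here.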

Another test you could do: use the same optics and fibers on the
point-to-point (switchless) connections first.
Replace both e0a ports, then e0c, then e0b and finally e0d.
If the issue returns, then the optics could certainly be to blame.

--tmac

*Tim McCarthy, **Principal Consultant*

*Proud Member of the #NetAppATeam <https://twitter.com/NetAppATeam>*

*I Blog at TMACsRack <https://tmacsrack.wordpress.com/>*



Re: Cluster stall issue going to switched cluster... [ In reply to ]
The other three SFP+ modules were FTLF8528P3BCV-N1 and the controller didn’t complain about them.
The links also came up correctly as expected, and after migrating all ports and switching the network option to switchless-cluster true, the “cluster show” and “health” commands all showed that everything was OK…
It clearly was not.
So we had to roll back fast, and didn’t have any time to investigate further.
But I think I will use some AFBR* SFP+ modules (Avago-branded, I think), just to be 100% sure everything is supported.

/Heino


RE: Cluster stall issue going to switched cluster... [ In reply to ]
Heino,

When you looked at the port stats for errors, did you look on both the switch and in ONTAP? I would be curious to see the stats from the cluster ports in ONTAP.

Is there any chance the multimode fiber is OM1 or OM2? You would need at least OM3 for 10Gb/s at 100m.
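
To illustrate the fiber-grade point, here is a minimal sketch using the commonly cited nominal 10GBASE-SR reach figures per multimode grade. The numbers are nominal values (actual reach depends on the fiber's modal bandwidth and link loss), and the function name is hypothetical.

```python
# Commonly cited nominal 10GBASE-SR reach per multimode fiber grade, in meters.
NOMINAL_REACH_10GBASE_SR_M = {
    "OM1": 33,
    "OM2": 82,
    "OM3": 300,
    "OM4": 400,
}


def within_nominal_reach(fiber_grade, link_length_m):
    """Return True if the link length is within the nominal 10GBASE-SR reach."""
    return link_length_m <= NOMINAL_REACH_10GBASE_SR_M[fiber_grade.upper()]


# The links in this thread are about 100 m long.
for grade in NOMINAL_REACH_10GBASE_SR_M:
    verdict = "OK" if within_nominal_reach(grade, 100) else "too long"
    print(f"{grade} at 100 m: {verdict}")
```

With these figures, a 100 m link is beyond the nominal reach on OM1 and OM2 and within it on OM3 and OM4.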

Thank you,
Tim
