Mailing List Archive

Upgrade thoughts: 8.3 -> 9.1 -> 9.3
Guys,

We're getting ready to upgrade our 4 node 8060 cluster from 8.3.2P9 to
9.1P19 and then onto 9.3P17, all this weekend. My only real concern
is the upgrade from 9.1 to 9.3, which lists a major warning for bug
1250500:

Expired truststore security certificates causing upgrade and
new installation failures.

Unfortunately, I can't run an upgrade advisor report for my cluster
going from 9.1P19 to 9.3P17, because I'm not yet running 9.1, and it
can take up to a week for the autosupport data to get pushed to Upgrade
Advisor. Sigh...

Has anyone run into this issue when doing the 9.1 -> 9.3 upgrade with
the expired certificate? Otherwise, it all looks good, my cluster
switches are supported at their current version, etc.

John
_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters
Re: Upgrade thoughts: 8.3 -> 9.1 -> 9.3 [ In reply to ]
Is 9.4 supported on the 8060? I think later versions of 9.4 avoid that cert expiration issue.

Kevin

----- Original Message -----
From: "John Stoffel" <john@stoffel.org>
To: toasters@teaparty.net
Sent: Tuesday, March 3, 2020 4:35:13 PM
Subject: Upgrade thoughts: 8.3 -> 9.1 -> 9.3


Re: Upgrade thoughts: 8.3 -> 9.1 -> 9.3 [ In reply to ]
John,
Did you have a look at fastpath?
https://whyistheinternetbroken.wordpress.com/2018/02/16/ipfastpath-ontap92/
It seems every time we put in a case for upgrades to 9.3 netapp support
tries to make sure we looked into this! So it must've bitten a lot of folks.

We did run into a pretty big bug on the upgrade from 9.1 to 9.3P15 -- we
have a case/core in now. I've seen NFS stop serving from a node in at least
3 clusters roughly 2-5 hours after the upgrade. We fix it by identifying
the unresponsive node and either powering it down, or via NMI/SP. It will
not respond to normal takeover commands. Preliminary core analysis (no full
core analysis yet) points at at least 1 bug fixed in 9.3P17.

https://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=1236722


Typically when we roll updates it can take months given the number of nodes
and clusters. So we stick with whatever P patch we rolled on the first set
of nodes, and then by the end of upgrades 1-3 P patches are released.

With this experience, always use the latest P patch possible on the
intermediary update especially if you are going to take a bit to roll it
through your entire deployment. I also recommend taking a look at going to
9.5, it sounds nuts, but we've had better stability with this release. We
moved to this release because of a specific feature that was needed
(CIFS/SMB enhancements, and flexcache/flexgroups).

Regards,
Douglas


On Tue, Mar 3, 2020 at 4:41 PM John Stoffel <john@stoffel.org> wrote:

Re: Upgrade thoughts: 8.3 -> 9.1 -> 9.3 [ In reply to ]
John,

You will be fine in performing that upgrade from 9.1P19 to 9.3P17
(I’d recommend going to the latest 9.3P18 release). Regarding the
bug around the security certificates, you need to be at 9.1P14 or higher
prior to performing the upgrade to 9.3.
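
The patch-level requirement above can be checked mechanically before the upgrade window. The following is a minimal sketch in Python, not a NetApp tool: the function names are hypothetical, and it only compares P patch numbers within the same release train (for example 9.1P19 against 9.1P14). It assumes version strings in the "9.1P19" style used in this thread.

```python
import re

# Matches strings such as "9.1P19", "8.3.2P9" or "9.1" (no P patch).
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:P(\d+))?$")


def parse_ontap_version(text):
    """Split a version string such as '9.1P19' into ((9, 1), 19)."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    # A version without a P suffix is treated as patch level 0.
    patch = int(match.group(2)) if match.group(2) else 0
    return release, patch


def meets_patch_minimum(current, minimum):
    """True if `current` is on the same release train as `minimum`
    and at the same or a later P patch."""
    cur_release, cur_patch = parse_ontap_version(current)
    min_release, min_patch = parse_ontap_version(minimum)
    return cur_release == min_release and cur_patch >= min_patch


if __name__ == "__main__":
    # 9.1P14 is the minimum mentioned in this thread for the 9.1 -> 9.3 step.
    for version in ("9.1P19", "9.1P9", "8.3.2P9"):
        print(version, meets_patch_minimum(version, "9.1P14"))
```

A version on a different release train (such as 8.3.2P9) simply returns False here, which matches the plan in this thread of landing on 9.1P19 first.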

There are some other items that I would recommend that you check in your
environment. I believe I saw in another email reply the note about
Fastpath. Definitely do your homework on that one.

Also, and this is very important, make sure that your SP firmware is up to
date for the releases that you are on/going to. I ran into this issue 3
times last week during an upgrade where the SP firmware wasn’t up to date
and the controllers, when rebooting during the ONTAP upgrade, halted and
returned to the loader prompt with the error "This platform is not
supported in this release".

I was able to resolve it via the SP command by performing a “dirty
shutdown” of that node and powering it back up and then performing a SP
reboot. There is a NetApp KB article, 1009154, that is related (talks about
a different platform) but the fix resolves this.

Also, I *highly* recommend updating all disk/shelf firmwares and
qualification files ahead of the game (at least 24 hours).

Good luck and HTH

Regards,
André M. Clark

On March 3, 2020 at 16:41:53, Stoffel John (john@stoffel.org) wrote:


Re: Upgrade thoughts: 8.3 -> 9.1 -> 9.3 [ In reply to ]
Kevin> Is 9.4 supported on the 8060? I think later versions of 9.4
Kevin> avoid that cert expiration issue.

It looks like ONTAP 9.7 is still supported on the FAS8060s. The
Hardware Universe Tool seems to show that they're supported all the
way up to 9.7, which I doubt we'll ever get to before we retire this
cluster and move to something newer.


Re: Upgrade thoughts: 8.3 -> 9.1 -> 9.3 [ In reply to ]
Douglas> Did you have a look at fastpath?

I did, and I think I'm all ok because most of my SVMs have just one
network associated with them, and I've also moved all my management
off to a completely separate subnet.

Our cluster is stupid simple, just 4 x 10Gb LACP trunks from each node
(4 nodes total) with a number of VLANs running over those trunks. The
SVMs are reasonably designed, though we did make some mistakes many
years ago when we first set things up that I would do differently
now.

Douglas> https://whyistheinternetbroken.wordpress.com/2018/02/16/ipfastpath-ontap92/
Douglas> It seems every time we put in a case for upgrades to 9.3
Douglas> netapp support tries to make sure we looked into this! So it
Douglas> must've bitten a lot of folks.

I think so. I hope we're all set.

Douglas> We did run into a pretty big bug on the upgrade from 9.1 to
Douglas> 9.3P15 -- we have a case/core in now. I've seen nfs stop
Douglas> serving from a node in at least 3 clusters roughly 2-5
Douglas> hours after the upgrade. We fix it by identifying the
Douglas> unresponsive node and either powering it down, or via NMI/SP.
Douglas> It will not respond to normal takeover commands. Preliminary
Douglas> core analysis (no full core analysis yet) points at at least
Douglas> 1 bug fixed in 9.3P17.

Yikes! That is a big bug to have to deal with.


Douglas> https://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=1236722

Douglas> Typically when we roll updates it can take months given the
Douglas> number of nodes and clusters. So we stick with whatever P
Douglas> patch we rolled on the first set of nodes, and then by the
Douglas> end of upgrades 1-3 P patches are released.

You have a much bigger environment than we have! I think next time we
might do smaller pairs of nodes, so we can just VMotion VMs back and
forth and hopefully have enough space to be a bit more proactive on
upgrades without disrupting everything with a full shutdown.

Douglas> With this experience, always use the latest P patch possible
Douglas> on the intermediary update especially if you are going to
Douglas> take a bit to roll it through your entire deployment. I also
Douglas> recommend taking a look at going to 9.5, it sounds nuts, but
Douglas> we've had better stability with this release. We moved to this
Douglas> release because of a specific feature that was needed
Douglas> (CIFS/SMB enhancements, and flexcache/flexgroups).

I had thought of going that far up, but just getting the downtime for
the two jumps I need to do has been hard enough. But we're learning
our lesson and trying to do upgrades more frequently. We only have
this one cluster though and it runs everything.



Re: Upgrade thoughts: 8.3 -> 9.1 -> 9.3 [ In reply to ]
André> You will be fine in performing that upgrade from 9.1P19 to
André> 9.3P17 (I’d recommend going to the latest 9.3P18
André> release). Regarding the bug around the security certificates,
André> you need to be at 9.1P14 or higher prior to performing the
André> upgrade to 9.3.

That's good then, since we're going to 9.1P19.

André> There are some other items that I would recommend that you
André> check in your environment. I believe I saw in another email
André> reply the note about Fastpath. Definitely do your homework on
André> that one.

I think we're all set there.

André> Also, and this is very important, make sure that your SP
André> firmware is up to date for the releases that you are on/going
André> to. I ran into this issue 3 times last week during an upgrade
André> where the SP firmware wasn’t up to date and the controllers,
André> when rebooting during the ONTAP upgrade, halted and returned to
André> the loader prompt with the error "This platform is not
André> supported in this release".

This is a good thing to know; mine are all at version 3.1.2. I'll see
if there's a newer version and plan on upgrading them all ahead of
time if I can. Though I might not be able to, since 3.1.2 is the latest
version supported with ONTAP 8.3P2, so 9.1P19 will give me up to SP
version 3.9 to install.

André> I was able to resolve it via the SP command by performing
André> a “dirty shutdown” of that node and powering it back up and
André> then performing a SP reboot. There is a NetApp KB article,
André> 1009154, that is related (talks about a different platform) but
André> the fix resolves this.

Thanks for this info, I'll certainly look into this ASAP.

André> Also, I highly recommend updating all disk/shelf firmwares and
André> qualification files ahead of the game (at least 24 hours).

Great idea, I can start this tonight I think.

Re: Upgrade thoughts: 8.3 -> 9.1 -> 9.3 [ In reply to ]
Well, just to let you all know that the upgrade went great, and I even
did the CN1610 cluster switch upgrade as well, even though the version
I was on was still supported with 9.3P17.

The only real gotcha that hit me was having both regular and e0M ports
in the same broadcast domain, which since the node_management ports
were on e0M made it a *pain* in the ass to get them moved out.

I basically had to do:

- create new node_mgmt interface on another VLAN port.
- delete the one I really wanted,
- remove the e0M port from the broadcast-domain
- create a new failover group and add e0M to it
- recreate a new node management lif using e0M again
- delete other temp lif.

Across four nodes, this sucked, especially to figure out. Then I
found the "broadcast-domain split" command, which might have made
this all oh so much easier.
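
For anyone hitting the same thing, here is a rough sketch of what the split approach might look like, written as a small Python helper that only builds the command text for review. The node names, the "Default" ipspace and broadcast domain, and the "Mgmt" target domain are all hypothetical, and the flag names are given as recalled from the ONTAP 9 CLI, so check them against the "network port broadcast-domain split" man page on your release before running anything.

```python
def build_split_command(ipspace, source_domain, new_domain, ports):
    """Return the CLI text that would move `ports` out of `source_domain`
    into a newly created `new_domain`. Nothing is executed here."""
    port_list = ",".join(ports)
    return (
        "network port broadcast-domain split"
        f" -ipspace {ipspace}"
        f" -broadcast-domain {source_domain}"
        f" -new-broadcast-domain {new_domain}"
        f" -ports {port_list}"
    )


if __name__ == "__main__":
    # Hypothetical node names for a four-node cluster.
    nodes = ["node-01", "node-02", "node-03", "node-04"]
    e0m_ports = [f"{node}:e0M" for node in nodes]
    print(build_split_command("Default", "Default", "Mgmt", e0m_ports))
```

Generating one command for all four e0M ports, rather than repeating the manual LIF shuffle per node, is the main point of the sketch.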

Anyway, the actual upgrade went great, no outage for the VMs still on
the ESX cluster, etc. Very very nice. Next time I'll ask them if I can
just do this during the day instead. LOL!

John

Re: Upgrade thoughts: 8.3 -> 9.1 -> 9.3 [ In reply to ]
Oh No!

You should have asked!

There is a broadcast-domain merge and a broadcast-domain split for just
such occasions!

--tmac

*Tim McCarthy, **Principal Consultant*

*Proud Member of the #NetAppATeam <https://twitter.com/NetAppATeam>*

*I Blog at TMACsRack <https://tmacsrack.wordpress.com/>*




On Sun, Mar 8, 2020 at 8:37 PM John Stoffel <john@stoffel.org> wrote:

Re: Upgrade thoughts: 8.3 -> 9.1 -> 9.3 [ In reply to ]
I did the same thing a few ago and then gave myself the forehead slap after
I found out.
The Split/Merge is so quick and easy.

I know what a PITA it was to do what you did!

--tmac

*Tim McCarthy, **Principal Consultant*

*Proud Member of the #NetAppATeam <https://twitter.com/NetAppATeam>*

*I Blog at TMACsRack <https://tmacsrack.wordpress.com/>*



On Sun, Mar 8, 2020 at 8:44 PM tmac <tmacmd@gmail.com> wrote:

Re: Upgrade thoughts: 8.3 -> 9.1 -> 9.3 [ In reply to ]
tmac> I did the same thing a few ago and then gave myself the forehead
tmac> slap after I found out. The Split/Merge is so quick and easy.

tmac> I know what a PITA it was to do what you did!

It totally was a pain, and I think I've still got some issues, but
doing the work by hand was a good learning experience. But still
painful. Heh.