Mailing List Archive

RAID Reconstruction after ONTAP upgrade?
Hi,

Has anyone ever seen a RAID reconstruction start immediately after an ONTAP upgrade?
I just upgraded one of my older filers to 8.3.1P2, and it is now running a reconstruction on one of its aggregates for (at least to me) no obvious reason.

While this controller was booting after the upgrade, I saw the following message on the console, which did not show up on the second controller:

Creating trace file /etc/log/rastrace/RAID_0_20170402_17:18:00:095890.dmp

No disks show as broken, or in maintenance mode, or anything like that - so any hints would be welcome.

Here's the output of `aggr status -r` on this controller:

RAID Disk Device   HA SHELF BAY CHAN Pool Type RPM  Used (MB/blks)     Phys (MB/blks)
--------- -------- -- ----- --- ---- ---- ---- ---- ------------------ ------------------
dparity   0b.22.23 0b 22    23  SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
parity    0b.22.4  0b 22    4   SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.5  2a 21    5   SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.5  0b 22    5   SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.6  2a 21    6   SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.6  0b 22    6   SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.7  2a 21    7   SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.7  0b 22    7   SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816 (reconstruction 3% completed)
data      2a.21.8  2a 21    8   SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.8  0b 22    8   SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.9  2a 21    9   SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.9  0b 22    9   SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.10 2a 21    10  SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.10 0b 22    10  SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.11 2a 21    11  SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.11 0b 22    11  SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.12 2a 21    12  SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      0b.22.12 0b 22    12  SA:A 0    BSAS 7200 1695466/3472315904 1695759/3472914816
data      2a.21.13 2a 21    13  SA:B 0    BSAS 7200 1695466/3472315904 1695759/3472914816
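
For anyone who wants to watch for this from a script, here is a minimal Python sketch (not part of the original post) that scans saved `aggr status -r` output for rows carrying a reconstruction annotation. It assumes the whitespace-separated row layout shown above; the three sample rows are copied from that output, and the function name `reconstructing_disks` is purely illustrative.

```python
import re

# Three rows copied from the output above; all other rows share this layout.
SAMPLE = """\
data 2a.21.7 2a 21 7 SA:B 0 BSAS 7200 1695466/3472315904 1695759/3472914816
data 0b.22.7 0b 22 7 SA:A 0 BSAS 7200 1695466/3472315904 1695759/3472914816 (reconstruction 3% completed)
data 2a.21.8 2a 21 8 SA:B 0 BSAS 7200 1695466/3472315904 1695759/3472914816
"""

# Matches the trailing annotation, e.g. "(reconstruction 3% completed)".
RECON = re.compile(r"\(reconstruction (\d+)% completed\)")


def reconstructing_disks(text):
    """Return (role, device, percent) for each row flagged as reconstructing."""
    found = []
    for line in text.splitlines():
        match = RECON.search(line)
        if not match:
            continue
        fields = line.split()
        # Assumption: the first field is the RAID role, the second the device name.
        found.append((fields[0], fields[1], int(match.group(1))))
    return found


print(reconstructing_disks(SAMPLE))
```

With the sample rows this reports only `0b.22.7`, the one data disk being rebuilt.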

Thanks,

Alexander Griesser
Head of Systems Operations

ANEXIA Internetdienstleistungs GmbH

E-Mail: AGriesser@anexia-it.com
Web: http://www.anexia-it.com

Headquarters address, Klagenfurt: Feldkirchnerstraße 140, 9020 Klagenfurt
Managing Director: Alexander Windbichler
Company register: FN 289918a | Place of jurisdiction: Klagenfurt | VAT ID: AT U63216601
RE: RAID Reconstruction after ONTAP upgrade?
Hi Alexander,

We have seen this on 7-Mode (FAS3250) following a cf takeover and giveback.

We got the same output as you, and no disks failed before or after the procedure.

Kind Regards,
Chris.

From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Alexander Griesser
Sent: 02 April 2017 19:08
To: toasters@teaparty.net
Subject: RAID Reconstruction after ONTAP upgrade?

RE: RAID Reconstruction after ONTAP upgrade?
Did you ever find out the reason for this?

Alexander Griesser

From: Chris Hague [mailto:Chris_Hague@ajg.com]
Sent: Monday, 3 April 2017 11:56
To: Alexander Griesser <AGriesser@anexia-it.com>; toasters@teaparty.net
Subject: RE: RAID Reconstruction after ONTAP upgrade?

RE: RAID Reconstruction after ONTAP upgrade?
NetApp told us to upgrade the disk firmware and let an aggregate scrub run to completion. When we looked at `aggr scrub status -v`, some of the scrubs hadn't completed for over a year, and others had never completed!

As an aside, we have another issue on the same system: disk reservation errors and missing RAID group child objects, which are also preventing a graceful takeover. This is a rare bug that requires an ONTAP upgrade, but since we cannot take over gracefully, we are waiting for an outage window to perform a disruptive upgrade (DU).

Kind Regards,
Chris.

From: Alexander Griesser [mailto:AGriesser@anexia-it.com]
Sent: 03 April 2017 10:57
To: Chris Hague; toasters@teaparty.net
Subject: AW: RAID Reconstruction after ONTAP upgrade?

RE: RAID Reconstruction after ONTAP upgrade?
I ran shelf, ACP and disk firmware upgrades prior to the ONTAP upgrade, and I also installed the new disk qualification package as recommended.
Here's the output of my scrub status for the affected aggregate:

::> aggr scrub -action status -aggregate blabla_data

Raid Group:/blabla_data/plex0/rg0, Is Suspended:false, Last Scrub:Sun Apr 2 04:24:20 2017
Raid Group:/blabla_data/plex0/rg1, Is Suspended:true, Last Scrub:Sun Mar 19 06:24:32 2017, Percentage Completed:38%
Raid Group:/blabla_data/plex0/rg2, Is Suspended:true, Percentage Completed:40%
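
As a rough way to pick out the raid groups whose scrub is suspended or has never finished, a minimal Python sketch (not from the thread) could split each status record into key/value pairs. It assumes the `Key:Value, Key:Value` layout shown above and also tolerates a record that continues on a following line starting with a comma; `parse_scrub_status` is a hypothetical helper name.

```python
# Sample records in the format shown above, including one split across two lines.
SAMPLE = """\
Raid Group:/blabla_data/plex0/rg0, Is Suspended:false, Last Scrub:Sun Apr 2 04:24:20 2017

Raid Group:/blabla_data/plex0/rg1, Is Suspended:true, Last Scrub:Sun Mar 19 06:24:32 2017
, Percentage Completed:38%
Raid Group:/blabla_data/plex0/rg2, Is Suspended:true, Percentage Completed:40%
"""


def parse_scrub_status(text):
    """Parse scrub status text into one dict per raid group."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Raid Group:"):
            target = {}
            records.append(target)
        elif line.startswith(",") and records:
            # Continuation of the previous record.
            target = records[-1]
        else:
            continue
        for part in line.split(","):
            if ":" not in part:
                continue
            # Split on the first colon only, so timestamps stay intact.
            key, value = part.split(":", 1)
            target[key.strip()] = value.strip()
    return records


records = parse_scrub_status(SAMPLE)
suspended = [r["Raid Group"] for r in records if r.get("Is Suspended") == "true"]
never_scrubbed = [r["Raid Group"] for r in records if "Last Scrub" not in r]
print(suspended)
print(never_scrubbed)
```

Here a missing `Last Scrub` field is read as "no completed scrub on record", which is an interpretation of the output rather than documented behaviour.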

So I guess I'll leave that running for some time before I try another takeover.

How did you check for the disk reservation and missing raid group child object errors? Does `cf status` on this system tell you that a takeover is not possible due to this issue or does it only tell you when you try to run a takeover?

Best,

Alexander Griesser

From: Chris Hague [mailto:Chris_Hague@ajg.com]
Sent: Monday, 3 April 2017 12:02
To: Alexander Griesser <AGriesser@anexia-it.com>; toasters@teaparty.net
Subject: RE: RAID Reconstruction after ONTAP upgrade?

RE: RAID Reconstruction after ONTAP upgrade?
Hi Alexander,

No, unfortunately `cf status` does not show this issue.

It only appears when a takeover is attempted (we have tried three times now, with the same results).
The errors are:
00000016.0000e424 01ca8287 Sat Mar 18 2017 08:45:50 +00:00 [disk.reserveFailed:error] Disk reservation failed on 3a.25.1 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)
00000016.0000e425 01ca8287 Sat Mar 18 2017 08:45:50 +00:00 [disk.reserveFailed:error] Disk reservation failed on 3a.25.5 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)
00000016.0000e426 01ca8287 Sat Mar 18 2017 08:45:50 +00:00 [disk.reserveFailed:error] Disk reservation failed on 3a.25.3 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)
00000016.0000e427 01ca8287 Sat Mar 18 2017 08:45:50 +00:00 [disk.reserveFailed:error] Disk reservation failed on 3a.25.2 CDB 0x5f:0001 - SCSI:illegal request (5 55 4)
Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 0 has only 7 valid children, expected 17.
Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 2 has only 5 valid children, expected 17.
Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 1 has only 12 valid children, expected 17.
Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 3 has only 8 valid children, expected 17.
Sat Mar 18 08:46:15 GMT [BARBIE:ha.takeoverImpNotDef:warning]: Takeover of the partner node is impossible due to reason waiting for partner to recover.
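
To summarise messages like these from a saved log, a minimal Python sketch (not from the thread) can extract how many children each RAID object is missing. The regular expression is written against the exact `raid.assim.rg.missingChild` wording quoted above and is an assumption about the format in general; `missing_children` is a hypothetical helper name.

```python
import re

# Two of the log lines quoted above; the others follow the same pattern.
LOG = """\
Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 0 has only 7 valid children, expected 17.
Sat Mar 18 08:46:00 GMT [KEN:raid.assim.rg.missingChild:error]: Aggregate partner:aggr0, rgobj_verify: RAID object 1 has only 12 valid children, expected 17.
"""

PATTERN = re.compile(
    r"Aggregate (?P<aggr>\S+), rgobj_verify: RAID object (?P<obj>\d+) "
    r"has only (?P<valid>\d+) valid children, expected (?P<expected>\d+)"
)


def missing_children(log_text):
    """Map (aggregate, RAID object id) to the number of missing children."""
    result = {}
    for m in PATTERN.finditer(log_text):
        key = (m["aggr"], int(m["obj"]))
        result[key] = int(m["expected"]) - int(m["valid"])
    return result


for (aggr, obj), count in missing_children(LOG).items():
    print(f"{aggr} RAID object {obj}: {count} children missing")
```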

By default, the aggregate scrub is only configured to run for 10 hours every Sunday at 1 am.
Once the reconstruction had completed and the disk firmware had been upgraded, we changed this to run continuously so that the scrubs could complete.

Kind Regards,
Chris.

From: Alexander Griesser [mailto:AGriesser@anexia-it.com]
Sent: 03 April 2017 11:08
To: Chris Hague; toasters@teaparty.net
Subject: AW: RAID Reconstruction after ONTAP upgrade?
