Mailing List Archive: X235_SCHT5073F10 Medium Errors

Aug 15, 2002, 8:13 AM

Post #1 of 5

Has anyone been noticing an unusual occurrence of medium errors (i.e.
unrecovered read errors) for the 72 GB FC-AL disks (X235_SCHT5073F10)? I
have several recently purchased F840 filers, running ONTAP 6.2R2, with
these disks in DS14 disk shelves. The disks all have the latest firmware
revision (NA04). The DS14 shelf firmware is up to date as well (Rev. 11).

I've been getting about 1 medium error every week (a total of 9 so far),
during the disk scrub process. The errors are on different disks, in
different filers. The only common factor, so far, is the drive type. I'm
finding this unusual since I've run F740s for years with medium errors
being VERY infrequent (i.e. one). Is this possibly related to the drive
size? The F740s had the Seagate (ST118202FC and ST318203FC) 18 GB drives.

I have pursued this with NetApp Technical Support. However, since the
errors are still considered infrequent, the only suggestion I have received
so far is to replace the disks that have shown errors, which might be
happening shortly. I'm just wondering if anyone else is seeing the same
thing.

Re: X235_SCHT5073F10 Medium Errors

paul.bell at rbccm

Aug 15, 2002, 8:46 AM

Post #2 of 5

i also have seen too many occurrences of these errors. if i see the same disk two - three weeks in a row i replace the disk. overkill? perhaps, but why take the risk.

-paul

Paul Caron wrote:

> Has anyone been noticing an unusual occurrence of medium errors (i.e.
> unrecovered read errors) for the 72 GB FC-AL disks (X235_SCHT5073F10)? I
> have several recently purchased F840 filers, running ONTAP 6.2R2, with
> these disks in DS14 disk shelves. The disks all have the latest firmware
> revision (NA04). The DS14 shelf firmware is up to date as well (Rev. 11).
>
> I've been getting about 1 medium error every week (a total of 9 so far),
> during the disk scrub process. The errors are on different disks, in
> different filers. The only common factor, so far, is the drive type. I'm
> finding this unusual since I've run F740s for years with medium errors
> being VERY infrequent (i.e. one). Is this possibly related to the drive
> size? The F740s had the Seagate (ST118202FC and ST318203FC) 18 GB drives.
>
> I have pursued this with Netapp Technical Support. However, since the
> errors are still considered infrequent, the only suggestion I have received
> so far is to replace the disks that have shown errors, which might be
> happening shortly. I'm just wondering if anyone else is seeing the same
> thing.

--
"There is magic in the web" - Shakespeare (Othello, Act 3, Scene 4)

Re: X235_SCHT5073F10 Medium Errors

caron at sig

Aug 15, 2002, 9:22 AM

Post #3 of 5

Don't you worry about having another medium error on a different disk in
the same volume (i.e. your parity disk) during the disk reconstruction? I
believe that would count as a multiple disk failure since the disk that was
failed for replacement would no longer be available for parity
reconstruction. Dunno exactly how ONTAP would handle this since I've never
experienced a medium error during reconstruction....yet.
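
To make the concern concrete, here is a minimal sketch of single-parity (RAID 4 style) reconstruction. It is not ONTAP's code; the function names and the use of None to stand in for an unreadable block are assumptions made for illustration. With one block missing, XOR of the survivors recovers it. With a second unreadable block in the same stripe, there is nothing left to solve for.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def reconstruct(stripe, missing):
    """Rebuild the block at index `missing` from the rest of the stripe.

    `stripe` holds the data blocks plus the parity block. None marks a
    block that came back with a medium error (illustrative convention).
    """
    survivors = [b for i, b in enumerate(stripe) if i != missing]
    if any(b is None for b in survivors):
        # A second unreadable block: single parity cannot recover this stripe.
        raise ValueError("second unreadable block in stripe: cannot reconstruct")
    return xor_blocks(survivors)

# Three data blocks and their parity block.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = xor_blocks(data)
stripe = data + [parity]

# Disk 1 failed and is being rebuilt: works while all other blocks are readable.
rebuilt = reconstruct(stripe, missing=1)

# Same rebuild, but the block on disk 2 also returns a medium error.
damaged = [data[0], None, None, parity]
try:
    reconstruct(damaged, missing=1)
    double_failure_detected = False
except ValueError:
    double_failure_detected = True
```

In this model the failed disk and the disk with the medium error are both unknowns in a single equation, which is why the case counts as a multiple disk failure for that stripe.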

On Thu, 15 Aug 2002 11:46:49 -0400, "Paul J. Bell" <paul.bell@rbccm.com> wrote:

> i also have seen too many occurrences of these errors. if i see the same
> disk two - three weeks in a row i replace the disk. overkill? perhaps, but
> why take the risk.
>
> -paul

Re: X235_SCHT5073F10 Medium Errors

caron at sig

Aug 15, 2002, 10:06 AM

Post #4 of 5

I received an update from Network Appliance Services Engineering on this
issue. It contains two links I found helpful. Anissa gave permission for
me to post this to toasters.

Also, I should mention that the "unrecovered read error" message I've been
getting is a bit misleading. I failed to mention that these were always
followed by the "Scrub rewriting bad data block" message. Since the data was
successfully reconstructed, there was no data loss. In that sense, the
errors were actually recoverable.
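
The sequence described above (read error, then a rewrite of the bad block) can be sketched roughly as follows. This is a toy model, not ONTAP's scrub: the `scrub` function, the list-of-lists disk layout, and None as a stand-in for a medium error are all assumptions for illustration. It only handles read errors and does not verify parity on stripes that read cleanly.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def scrub(disks):
    """Scan every stripe; rebuild and rewrite any single unreadable block.

    `disks` is a list of per-disk block lists, with the parity disk last.
    None marks a block that returns a medium error on read.
    Returns the (disk, stripe) positions that were rewritten.
    """
    rewritten = []
    for s in range(len(disks[0])):
        bad = [d for d, disk in enumerate(disks) if disk[s] is None]
        if not bad:
            continue  # stripe read cleanly
        if len(bad) > 1:
            raise RuntimeError(f"stripe {s}: {len(bad)} unreadable blocks")
        d = bad[0]
        # Recompute the block from the other disks and write it back.
        others = [disk[s] for i, disk in enumerate(disks) if i != d]
        disks[d][s] = xor_blocks(others)
        rewritten.append((d, s))
    return rewritten

# Two data disks, one parity disk, two stripes.
d0 = [b"\xaa", b"\x01"]
d1 = [b"\x55", b"\x02"]
par = [xor_blocks([d0[0], d1[0]]), xor_blocks([d0[1], d1[1]])]

d1[1] = None                  # simulate a medium error on disk 1, stripe 1
disks = [d0, d1, par]
fixed = scrub(disks)          # rebuilds and rewrites that one block
```

Because the block is recomputed from the remaining disks and written back, the read error costs no data in this model, which matches the observation that the errors were recoverable at the RAID level.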

------------- Begin Forwarded Message -------------

From: "Mohler, Anissa" <Anissa.Mohler@netapp.com>
To: "'caron@sig.com'" <caron@sig.com>
Cc: "Mohler, Anissa" <Anissa.Mohler@netapp.com>
Subject: Medium Errors
Date: Thu, 15 Aug 2002 09:38:30 -0700
MIME-Version: 1.0

Hi Paul,

The information given to you was in error. Medium errors at the rate you
describe (e.g. 1 per week) are perfectly OK... there is a 'bug' you can look
up on the NOW site that describes this in a bit more detail:

bug 68517
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=68517

We're also working on publishing more information about understanding and
handling medium errors to the NOW site.

Our apologies for the confusion.

Data ONTAP will warn you if the medium errors are an issue... the function
called Storage Health Monitor monitors the error frequency and will
recommend failing the drive if needed. If you want to read up on SHM check
out the following:
http://now.netapp.com/NOW/knowledge/docs/ontap/rel621/html/sag/appendi4.htm#1154205

Hope this helps :)

Anissa Mohler
Network Appliance Services Engineering

-------------------------------------------

--- Anissa Mohler - 408.822.6404 - NetApp Services Engineering - RAS ---

------------- End Forwarded Message -------------

Re: X235_SCHT5073F10 Medium Errors

paul.bell at rbccm

Aug 15, 2002, 10:30 AM

Post #5 of 5

i used to think about that until it happened. now i worry.

ontap handles it by halting.

i rebooted and reconstruction continued; it halted again, etc.

to escape i set "bypass_media_errors" to on and held my breath until the
reconstruction was completed.

moist palms were the order of the day.

a wafl_check after completion ran clean so perhaps the errors were bogus.

a good case for minimal size raid groups. (i try to hold the 72GB disks to
not more than 10 disks/raid group,
so it's do as i say not as i do)
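
a back-of-the-envelope sketch of why smaller raid groups help. everything numeric here is an assumption: the error rate of one unrecoverable read per 10^15 bits is an illustrative figure and not a spec for these drives, the rebuild is assumed to read every bit on every surviving disk, and errors are treated as independent. the function name is made up for the example.

```python
import math

def p_medium_error_during_rebuild(disks_in_group, disk_bytes, unrecoverable_per_bit):
    """Chance that at least one unrecoverable read error occurs while
    reading every surviving disk in full to rebuild one failed disk."""
    bits_read = (disks_in_group - 1) * disk_bytes * 8
    # 1 - (1 - p)^n, computed in a numerically stable way for tiny p
    return -math.expm1(bits_read * math.log1p(-unrecoverable_per_bit))

ASSUMED_RATE = 1e-15   # assumption: one unrecoverable error per 10^15 bits read
DISK_BYTES = 72e9      # 72 GB disk

results = {
    n: p_medium_error_during_rebuild(n, DISK_BYTES, ASSUMED_RATE)
    for n in (8, 10, 14)
}
for n, p in results.items():
    print(f"{n} disks/raid group: {p:.4%} chance of a medium error during rebuild")
```

under these assumptions the risk grows almost linearly with the number of disks that have to be read, so a 14 disk group carries noticeably more exposure per rebuild than a 10 disk group.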

-paul

Paul Caron wrote:

> Don't you worry about having another medium error on a different disk in
> the same volume (i.e. your parity disk) during the disk reconstruction? I
> believe that would count as a multiple disk failure since the disk that was
> failed for replacement would no longer be available for parity
> reconstruction. Dunno exactly how ONTAP would handle this since I've never
> experienced a medium error during reconstruction....yet.
>
> On Thu, 15 Aug 2002 11:46:49 -0400, "Paul J. Bell" <paul.bell@rbccm.com> wrote:
>
> > i also have seen too many occurrences of these errors. if i see the same
> > disk two - three weeks in a row i replace the disk. overkill? perhaps, but
> > why take the risk.
> >
> > -paul
