Mailing List Archive

RE: # of volumes
I generally liked to keep one spare per 14 drives. Since spares are global and can be used by any volume, having "x spares per volume" didn't make sense to me, so I figured out how many drives I was comfortable covering with one spare, and arbitrarily decided on 14.

Mad Dog
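
As a rough illustration of the ratio described above, here is a minimal Python sketch. The function name `spares_needed` is hypothetical, and it assumes "one spare per 14 drives" means one spare for every started group of 14 drives in the filer; the thread does not say whether the 14 includes the spare itself.

```python
import math


def spares_needed(total_drives: int, drives_per_spare: int = 14) -> int:
    """Return how many hot spares a filer would carry under a fixed ratio.

    Assumption: any partially filled group of `drives_per_spare` drives
    still gets its own spare, so the result is rounded up.
    """
    if total_drives <= 0:
        return 0
    return math.ceil(total_drives / drives_per_spare)


if __name__ == "__main__":
    for drives in (14, 28, 42, 84):
        print(f"{drives} drives: {spares_needed(drives)} spare(s)")
```

Because spares are global to the filer, the count depends only on the total number of drives, not on how they are divided into volumes.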
RE: # of volumes [ In reply to ]
Just to chime in -

We run with 2 spares per filer. That applies to all our filers, from F840s (12 shelves, 36 GB drives) and F760s (6 shelves, 36 GB drives) to an F760 (8 shelves, 18 GB drives). Since spares are per filer, not per volume, the main thing we wanted to avoid was volume failure, which would occur if 2 drives in a RAID group failed at the same time. That has never happened, although we almost hit it after moving a filer, when 2 disks had a hard time spinning up. We also keep 2 because the filers are remote from our offices, so the "insurance" or peace of mind is worth it to us.

--sam

-----Original Message-----
From: Tom Yergeau [mailto:yergeau@yergeau.net]
Sent: Wednesday, August 07, 2002 1:20 PM
To: toasters@mathworks.com
Subject: RE: # of volumes


I generally liked to keep one spare per 14 drives. Since spares are global and can be used by any volume, having "x spares per volume" didn't make sense to me, so I figured out how many drives I was comfortable covering with one spare, and arbitrarily decided on 14.

Mad Dog
Re: # of volumes [ In reply to ]
Spares not only cross shelves, they even cross FCAL adapters if you have shelves on different FCAL HBAs.

A single volume would have to lose two disks simultaneously, or two more disks after the first was rebuilt, in order to lose data with only one spare.

If it lost one disk, then shortly thereafter another disk, having a second spare would not help, as the first one would still be rebuilding and you'd lose the volume.

If you lose one disk, rebuild, and lose another disk, you're still intact but in degraded mode and vulnerable. Having a second spare would be helpful. But the odds of losing two disks, the second one after the first one rebuilt but before NetApp got you a replacement 4 hrs later or the next day, seem low.

If you have two spares per filer like Sam mentioned, you're covered anyway since hot spares are global to the entire filer.

Gosh this sounds convoluted! :-)

MD
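
The scenarios above can be walked through with a small Python sketch. It is a simplified model, not a description of how the filer actually behaves: it assumes a single RAID group that tolerates one missing disk at a time, and the function name `simulate` and the event names `fail`, `rebuild_done` and `replace` are made up for illustration.

```python
from typing import Iterable


def simulate(spares: int, events: Iterable[str]) -> str:
    """Replay disk events against one single-parity RAID group.

    Events (hypothetical names):
      "fail"         - a disk in the group fails
      "rebuild_done" - reconstruction onto a hot spare finishes
      "replace"      - a new disk is added to the spare pool
    Returns "healthy", "rebuilding", "degraded" or "lost".
    """
    missing = 0          # group members failed and not yet rebuilt
    rebuilding = False   # True while a spare is being reconstructed
    for event in events:
        if event == "fail":
            missing += 1
            if missing > 1:
                return "lost"  # two disks down at once: parity cannot cover it
        elif event == "rebuild_done":
            if rebuilding:
                rebuilding = False
                missing -= 1
        elif event == "replace":
            spares += 1
        else:
            raise ValueError(f"unknown event: {event}")
        # Start a rebuild whenever a disk is missing and a spare is free.
        if missing and not rebuilding and spares:
            spares -= 1
            rebuilding = True
    if missing == 0:
        return "healthy"
    return "rebuilding" if rebuilding else "degraded"


if __name__ == "__main__":
    cases = [
        (1, ["fail", "fail"]),
        (2, ["fail", "fail"]),
        (1, ["fail", "rebuild_done", "fail"]),
        (2, ["fail", "rebuild_done", "fail"]),
    ]
    for spares, events in cases:
        print(spares, events, "->", simulate(spares, events))
```

In this model a second spare makes no difference when two disks fail back to back, and only helps when the second failure comes after the first rebuild has finished, which matches the reasoning in the message.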


----- Original Message -----
From: Steve Evans
To: Tom Yergeau
Sent: Wednesday, August 07, 2002 4:35 PM
Subject: RE: # of volumes


Well, if you have a 14-disk shelf and two volumes, then each volume could have a disk fail on top of one more that the spare would cover. But yeah, if you're doing a 1:13 ratio then it doesn't matter anyway, because I don't believe spares can cross shelves.



Steve Evans
Computing Services
(619) 594-0653
RE: # of volumes [ In reply to ]
I should mention one other "annoying" behavior: when a disk drops from
the filer, it is not marked as "failed" and no autosupport is issued. So
you can have a missing disk that the filer will spare out, using
whatever spare(s) you have provided. However, unless you monitor for
this condition by comparing total disk counts to what should be there,
it's possible to have a failure, use your spare, not replace the spare,
and get a second failure. This condition does exist under 6.1R1; it
may be fixed in later ONTAP revs. In this scenario, having 2 or more
spares is very good indeed.
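
The count comparison described above could look something like the following Python sketch. It is only an illustration: the function name `check_disk_inventory` and its parameters are hypothetical, and it assumes you already have the disk counts from whatever status output you collect from the filer.

```python
from typing import List


def check_disk_inventory(expected_total: int, in_raid: int,
                         spares: int, failed: int) -> List[str]:
    """Compare the disks the filer reports with the disks that should exist.

    expected_total - disks physically installed (from your own records)
    in_raid        - disks the filer reports as RAID group members
    spares         - disks the filer reports as hot spares
    failed         - disks the filer reports as failed
    Returns a list of warning strings; an empty list means nothing odd.
    """
    warnings = []
    unaccounted = expected_total - (in_raid + spares + failed)
    if unaccounted > 0:
        # A disk that dropped off without being marked failed shows up here.
        warnings.append(f"{unaccounted} disk(s) unaccounted for")
    if spares == 0:
        warnings.append("no hot spares left")
    return warnings


if __name__ == "__main__":
    # 28 disks installed; one dropped silently and a spare took its place.
    print(check_disk_inventory(expected_total=28, in_raid=26, spares=1, failed=0))
```

A check like this would have to run on a schedule to be useful, since the point is to notice a consumed spare before a second failure arrives.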

Re: # of volumes [ In reply to ]
On Wed, Aug 07, 2002 at 05:26:11PM -0400, Tom Yergeau wrote:
> If you lose one disk, rebuild, and lose another disk, you're still intact but
> in degraded mode and vulnerable. Having a second spare would be helpful. But
> the odds of losing two disks, the second one after the first one rebuilt but
> before NetApp got you a replacement 4 hrs or the next day, seem low.

If you have multiple filers, another option is to have one hot spare per
filer, plus one or two cold spares sitting in a box, so you can plug
a new disk in right away. Of course, this only works if there is
someone on site to plug it in. For a lights out operation, I would
probably feel more comfortable with two hot spares.

Also, if you have multiple disk sizes in a filer, you will want one hot
spare for each disk size.

--
Deron Johnson
djohnson@amgen.com
Re: # of volumes [ In reply to ]
Spares can fill in for drives smaller than the spare, so technically as long as
your spare is the same size as your largest drive, you're okay. So if you have
18, 36 and 72 GB drives and a 72 GB spare, and you lose an 18 GB drive, then the
72 GB spare will take over for it.

On the downside, that disk is now a permanent member of the RAID set and you've
lost all that extra space. The only way I know of to reclaim it would be to
replace the failed 18 GB drive, then do a disk fail on the 72 GB drive, let it
rebuild onto the 18 GB drive, then disk erase the 72 and put it back in as a
spare. Big PITA.

Generally, I prefer to keep a spare for each disk size. Life is easier that
way.


On Thu, 8 Aug 2002, Deron Johnson wrote:

>
> Also, if you have multiple disk sizes in a filer, you will want one hot
> spare for each disk size.
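
To make the size trade-off above concrete, here is a minimal Python sketch. The function name `pick_spare` is hypothetical, and the rule it uses (take the smallest spare that is at least as large as the failed disk) is an assumption for illustration; the thread does not describe how the filer actually chooses among several spares.

```python
from typing import List, Optional, Tuple


def pick_spare(failed_size_gb: int,
               spare_sizes_gb: List[int]) -> Optional[Tuple[int, int]]:
    """Choose a spare for a failed disk and report the capacity lost.

    Returns (spare_size_gb, wasted_gb), or None if no spare is big enough.
    Assumption: the smallest adequate spare is preferred.
    """
    candidates = [size for size in spare_sizes_gb if size >= failed_size_gb]
    if not candidates:
        return None
    chosen = min(candidates)
    return chosen, chosen - failed_size_gb


if __name__ == "__main__":
    # Only a 72 GB spare on hand when an 18 GB drive fails.
    print(pick_spare(18, [72]))
    # One spare per disk size: the 18 GB spare is used and nothing is wasted.
    print(pick_spare(18, [18, 36, 72]))
```

With a single 72 GB spare covering an 18 GB failure, 54 GB of the spare goes unused until the disk is swapped back out, which is the cost the message describes.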