Mailing List Archive

Re: Question about flash pool maximum SSD size and local tiering
Hi Florian,


On 16.10.2023 15:01, Fenn, Michael wrote:
> FlashPool SSDs (unlike FlashCache) are attached to aggregates as normal RAID groups, so you can use as many 3.8 TB drives in RAID-DP as you like to hit the maximum FlashPool capacity.

Consider using RAID4 for the SSD RAID groups... more cache, and for the
write-cached blocks not really less safety. (SSDs are more reliable, and
cached blocks are usually written down to HDD fairly soon anyway.)

Regarding sizes: any size that's supported for the HW should be fine
(3.8/15.3 TB). Within the cache RAID group, however, you should use
disks of the same size.
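
For the archives, a rough sketch of how that looks on the CLI (the
aggregate name is made up; double-check the syntax against your ONTAP
version). First enable the hybrid capability on the HDD aggregate, then
add the SSDs as a raid4 cache RAID group:

::> storage aggregate modify -aggregate aggr_nlsas01 -hybrid-enabled true
::> storage aggregate add-disks -aggregate aggr_nlsas01 -disktype SSD -diskcount 4 -raidtype raid4

Note that -raidtype here applies only to the new SSD cache RAID group;
the existing HDD RAID groups stay RAID-DP.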


>
> Note that FlashPool and FabricPool use the same underlying tiering metadata structures, so you can only have one or the other enabled on any given aggregate.
>
> Thanks,
> Michael
>
> On 10/16/23, 8:03 AM, "Toasters on behalf of Florian Schmid via Toasters" <toasters-bounces@teaparty.net on behalf of toasters@teaparty.net> wrote:
>
>
> Hi Alexander,
>
> this is a very good tip! Thank you very much.
> I will have a look at this.
>
> Best regards,
> Florian
>
> ----- Original Message -----
> From: "Alexander Griesser" <AGriesser@anexia.com>
> To: "Florian Schmid" <fschmid@ubimet.com>, "toasters" <toasters@teaparty.net>
> Sent: Monday, 16 October 2023 12:07:53
> Subject: RE: Question about flash pool maximum SSD size and local tiering
>
> Hi Florian,
>
> I cannot answer the question about the SSD sizes. I'm not sure whether this is really a hard requirement or whether the slices just may not be bigger than 3.8 TB (in that case, you could probably partition the SSDs manually); maybe someone else has more insight into this.
>
> As for your second question: you can spin up ONTAP's integrated S3 server on your old boxes and use them as FabricPool targets:
> https://www.netapp.com/media/17219-tr4814.pdf
>
> Best,
>
> Alexander Griesser
> Head of Systems Operations
>
> ANEXIA Internetdienstleistungs GmbH
>
> E-Mail: AGriesser@anexia.com
> Web: https://www.anexia.com
>
> Address (headquarters Klagenfurt): Feldkirchnerstraße 140, 9020 Klagenfurt
> Managing Director: Alexander Windbichler
> Commercial register: FN 289918a | Place of jurisdiction: Klagenfurt | VAT ID: AT U63216601
>
> -----Original Message-----
> From: Florian Schmid <fschmid@ubimet.com>
> Sent: Monday, 16 October 2023 11:53
> To: toasters@teaparty.net
> Subject: Question about flash pool maximum SSD size and local tiering
>
> Hi,
>
> I have checked NetApp HWU for a FAS 8300 and system cache limits.
>
> OK, so far the maximum FlashPool size is 72 TB, which is way more than I want to use, but I haven't seen usable SSDs larger than 3.8 TB.
>
> Is it really true that I can't use a 7.6 TB or 15.3 TB SSD for FlashPool?
>
> It would be nice if someone with a deeper understanding of this than I have could give me some clarification.
>
> May I ask a second question?
> Is FlashPool still the way to go for speeding up NL-SAS aggregates?
> I had a look at FabricPool tiering, but it seems that it only works with S3 storage, which we don't have.
> We have plenty of NL-SAS storage and also plenty of SSDs, and it would be great to have tiering between them, or at least use the SSDs for caching.
>
> Best regards,
> Florian
>
_______________________________________________
Toasters mailing list
Toasters@teaparty.net
https://www.teaparty.net/mailman/listinfo/toasters
Re: Question about flash pool maximum SSD size and local tiering
OK, so this is a minimal deployment: just one (1) FAS8300 HA pair.
This "archive" storage, is it for pure compliance reasons? (You mention
even writing it out to tape...)

> I thought that to speed up the NL-SAS aggregates a little, we would also
> use some SSDs as FlashPool, like we have now on our old dev NetApp
> cluster.

Sure, it is definitely advisable to have flash as a cache in such a system.
But you won't need much at all, that's my tip. Do the simulations with AWA
over at least 4 weeks (like Johan Gislén wrote) and see for yourself.
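
FWIW, AWA is run from the nodeshell. Something along these lines (the
aggregate name is made up, and the exact subcommands vary between ONTAP
releases, so check TR-4070 for yours):

::> system node run -node <nodename> wafl awa start aggr_nlsas01
    (let it observe the live workload for 4+ weeks)
::> system node run -node <nodename> wafl awa print
::> system node run -node <nodename> wafl awa stop

It estimates cache hit rates for various FlashPool sizes from the
observed workload, so you can size the SSD tier from data instead of
guessing.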

Now, if you know your application and use cases well, you will know whether
the data written to all these NL-SAS drives will occasionally be read a lot,
and whether those reads will be random (cache will help, and be necessary) or
sequential. If the latter, the SSDs won't really do much for you; the data
will be sucked in from spinning disks in that scenario, and 7.2K rpm drives
are VERY VERY slow... you risk spending $$$ on SSD for almost no benefit if
your scenario is like that.

Again: if you know your workload, and it is indeed very light -- it's
"archive" type data in the true sense and it pretty much just sits there on
NL-SAS once it has been written -- you could just as well skip FlashPool and
use an adequate amount of FlashCache. FlashCache won't cache writes, of
course, but for archive type use cases that's unlikely to matter.

It is possible to tune FlashPool a bit; there are quite a few parameters
you can change, but it's hard to make a difference in real life (I tried it
once with our heavy aggregated NFS workload and then just gave up; it wasn't
worth the effort). If you know you have a large portion of random
overwrites in your workload (>> 10%), then FlashPool will "win" over
FlashCache. For reads they're pretty much the same, and I cannot believe
you'd ever notice any difference.

This is too little info for me to understand:

> The other SSDs will be used for backup and DR purposes.

Do you perhaps mean that the 100% SSD Aggrs you plan to put in this FAS8300
node pair are for DR purposes? DR of what? Do you perhaps plan to sync-mirror
data from your AFF based production cluster to this FAS8300? That's fine if
the workload is small enough for the FAS8300 to handle in your DR
situation, but if I were you I would think long and hard about how to
recover from such a potential state where [part of] your production workload
goes to the FAS8300's SSD Aggr... I.e. how do you get back to your normal
production state once this has happened? If you cannot do that in any way
that makes sense, the cure might be worse than the disease, so to speak.

I would also think through very thoroughly what your definition of
"disaster" is (in your specific situation) and which ones exactly this DR
you're referring to will protect from. It's always a complex optimisation
problem.

> We are thinking about moving some data to the 8300 cluster in the long term,

So this data would be "low pressure" production data on your AFF cluster
now, I take it. It's not very intense, but still not "archive" type data.
Putting such data on very slow spinning disks is often dangerous in that it
risks performance issues. And the FlashPool might not help as much as you
would wish, even if you have lots of it.
This is the kind of scenario that will inevitably give you headaches in the
long run; moving data back and forth between different clusters isn't even
non-disruptive. How can you be sure that data you've moved to this slow
FAS8300 doesn't "pick up speed" again later and the application/data owners
start to complain? How can you know that you'll have adequate space at that
point in time to migrate it back to your AFF based production cluster?
If you know this, then no prob!

The very good thing about AFF (Cx00 and Ax00) is that you don't have to
care. You can throw anything and everything at it and all the workloads will
just be absorbed w.r.t. the back end -- it's a gift from Flash Land. The
limiter will be the CPU utilisation in the node itself. For this type of
scenario I strongly recommend you leverage FabricPool (you need an S3 back
end). The AFF Ax00 or Cx00 will have all Storage Efficiency running all the
time, and this will be preserved when the blocks are sent out into S3
buckets. You can't run the full Storage Efficiency chain on your FAS8300
with slow NL-SAS and FlashPool. (It's supported AFAIK, but it will
inevitably bite you.)


> I haven't looked deeper into it [FabricPool], as we are not using S3 at
> all at the moment. That was some years ago.
> As we have always had only one all-flash cluster, I hadn't thought about it.

Well, if you happen to have NetApp gear (older FAS) including lots of NL-SAS
shelves, then you should definitely start running FabricPool on this one AFF
based production cluster you have. You still have to have some sort of
backup (SnapMirror/-Vault), just as you have now (I assume).
If you have lots of NL-SAS shelves already, but lack controllers, you can
buy some for a small sum of money.
FP will automagically move all the "cold" WAFL blocks out to S3 based
storage, and ONTAP S3 is *fast*. No problem there, ever; the (only)
challenge for you is to make sure the network connection between the two
clusters is rock solid.
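
For anyone wanting to try this: on the old-FAS side, the gist per
TR-4814 is to create an object store server and a bucket on an SVM,
roughly like this (SVM, server, and bucket names are made up, and I'm
leaving out the TLS certificate and the S3 user/key setup; see the TR
for the full procedure):

::> vserver object-store-server create -vserver svm_s3 -object-store-server s3.archive.example.com -is-https-enabled true
::> vserver object-store-server bucket create -vserver svm_s3 -bucket fabricpool-bucket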


> Should fabricpool not also work on a 2-node cluster?
> So instead of using some SSDs for flashpool, we could create an aggregate
> on SSD and one on NL-SAS and use the NL-SAS one for S3 storage and then
> forlocal fabricpool?

Yes, this way of doing things (FabricPool internally, inside the same
cluster) should work. I'm not sure if you can do it within the same *node*,
though; it may be that you have to have the S3 bucket on a different node
than the S3 client (= the FabricPool back end).

Please anyone correct me if I'm wrong around this.

I agree that if this type of FP setup you describe is supported with a
2-node FAS8300, it's not a bad idea at all.
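
Assuming it is supported, the client side should just be the standard
FabricPool steps pointed at the cluster's own S3 server (names and keys
below are placeholders; the bucket itself is created as sketched further
up in the thread):

::> storage aggregate object-store config create -object-store-name local_s3 -provider-type ONTAP_S3 -server s3.archive.example.com -container-name fabricpool-bucket -access-key <key> -secret-password <secret>
::> storage aggregate object-store attach -aggregate aggr_ssd01 -object-store-name local_s3
::> volume modify -vserver svm1 -volume vol_archive -tiering-policy auto

Once the object store is attached to the SSD aggregate, cold blocks from
volumes with a tiering policy are moved to the NL-SAS backed bucket
automatically.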

/M

-------- Original Message --------
Subject: Re: Question about flash pool maximum SSD size and local tiering
Date: Tue, 17 Oct 2023 14:12:28 +0000 (UTC)
From: Florian Schmid <fschmid@ubimet.com>
To: Michael Bergman <michael.bergman@norsborg.net>
CC: Toasters <toasters@teaparty.net>

Hi Michael,

wow, thank you very much for taking the time to write this very detailed
explanation!

It will be one 2-node FAS8300 cluster, switchless.
The cluster will mainly be used for long-term archive storage until the data
goes to tape, and for tape restores.
For this, we want to use a huge number of NL-SAS drives.

I thought that to speed up the NL-SAS aggregates a little, we would also use
some SSDs as FlashPool, like we have now on our old dev NetApp cluster.

The other SSDs will be used for backup and DR purposes.
We have a full production all-flash cluster for our normal workloads.

We are thinking about moving some data to the 8300 cluster in the long term,
because not all volumes we now have on SSD need to be on flash, and they
might consume too much "expensive" space there.

I will also have a deeper look at FabricPool. I looked at it in the past,
but as soon as I read "S3 storage" I didn't look any deeper, since we are
not using S3 at all at the moment. That was some years ago.
As we have always had only one all-flash cluster, I hadn't thought about it.

Should FabricPool not also work on a 2-node cluster?
So instead of using some SSDs for FlashPool, we could create one aggregate on
SSD and one on NL-SAS, and use the NL-SAS one for S3 storage and then for
local FabricPool?

Best regards,
Florian
_______________________________________________
Toasters mailing list
Toasters@teaparty.net
https://www.teaparty.net/mailman/listinfo/toasters
Re: Question about flash pool maximum SSD size and local tiering
On 2023-10-18 19:28, Jeffrey Steiner wrote:
> FlashPool was almost miraculous in its day, and it's still important.
> I've seen a bit of a resurgence for FlashPool in the past year for
> similar reasons to what you seem to have. We see these massive archival
> systems, and I'm strongly recommending generous FlashPool so whatever
> random IO might happen will be captured by the SSD layer.

For random read IOPS, a big FlashCache will do the job just as well.
Arguably even better, depending on the workload (the two cache at different
levels: FlashCache is a victim cache underneath the WAFL buffer cache,
whereas FlashPool is very different from that).

Sure, you can have a bigger FlashPool than FlashCache, but in reality it
won't matter; that's my experience from having lots of unstructured
"archive type" file data in very large NL-SAS Aggrs with FlashPool for a
good number of years.

If you have a lot of random overwrites in your workload (>> 10%), then
FlashPool will win when there are 7.2k rpm drives in the back end.

YMMV, but when testing things with AWA and measuring in real production,
it's not easy to get a large aggregated file storage workload coming into a
7.2k rpm based ONTAP Aggr to perform well most of the time, even if you have
a big FlashPool. No matter how you tune it (I did try), it tends not to be
used (filled up) as much as you'd expect, and more IOPS go to spinning disks
more often than you'd like.
(E.g. our 8 TB FlashPool per Aggr was never ever filled to more than 20-30%,
so there was waste there; stranded capacity.)

It's different, of course, if you have a well defined application whose
behaviour is known. Ideally, the working set size and its temporal locality
need to be such that they "suit" how FlashPool works, in order to leverage a
large FlashPool size. How to match this is beyond me, to be honest; very few
NetApp customers would be even close to knowing any of these things about
their workloads.

All this said: it's MUCH MUCH better to have a "too large" FlashPool/-Cache
than nothing on a 7.2K rpm based Aggr!

The difficulty is not to overspend on SSDs in this scenario, because
NetApp's price model makes SSD shelves very, very expensive.

N.B. I'm not experienced at all with workloads coming from databases.
For that stuff you'd all be wise to listen to Jeff ;-)

I agree that looking at Cx00 is a good idea for this use case, and then
leveraging FabricPool in a smart way.
I also concur here:

"...but there are also some huge capacity projects where
that [C-series, large QLC Flash] doesn't quite make financial sense."

That depends on your definition of huge, but let's say PiB scale. Today,
and for the foreseeable future (5 y), there's no way flash will be able to
compete with large spinning 7.2K rpm NL-SAS in terms of $/(TiB*month).
Perhaps not even in the next 10 y.

And the cheapest option for true archiving use cases is still tape, to this
day. I don't expect this to change soon either.

/M

Jeffrey Steiner wrote:
> I spent years building database setups, and if I could get 5% of the total
> dataset size in the form of FlashPool SSD, then virtually all the IOPS would
> hit that SSD layer. There was often barely any difference between all-flash
> and Flashpool configurations. There would still be a lot of IO hitting the
> spinning drives, but it was the sequential IO, which honestly doesn't
> benefit much from SSD anyway.
>
> That approach mostly went out the window because all-flash got affordable.
> Even if you didn't technically need all-flash at the moment, it was cheap
> enough and futureproof. A second reason is the size of spinning drives. We
> used to regularly sell systems with eight SSDs and 500 spinning drives.
> There was a decent amount of spinning disk IOPS to go around. These days,
> you're often buying dramatically fewer spinning drives, which means it's
> easier to push them to their collective IOPS limits. FlashPool can be a nice
> cushion against IO surges.
>
> I'd also recommend taking a look at C-Series. The whole point of C-Series
> is all-flash capacity projects. It's the natural upgrade path for hybrid SSD
> systems. I don't know what price looks like. Some customers are definitely
> swapping hybrid for C-Series, but there are also some huge capacity projects
> where that doesn't quite make financial sense.
>
> Someday, though. Someday there will be no hybrid spinning disk systems and
> it will all be on these capacity-based all-flash systems, but there is still
> a role for FlashPool at present.

--
Sr Human ;-) Alt: r.m.bergman@gmail-DEL_THIS-.com
--
"Qui vicit non est victor nisi victus fatetur." - Ennius
_______________________________________________
Toasters mailing list
Toasters@teaparty.net
https://www.teaparty.net/mailman/listinfo/toasters