Mailing List Archive

RE: NVRAM weirdness (UNCLASSIFIED)
Classification: UNCLASSIFIED
Caveats: NONE

If "NVRAM has no place in the IO operation of a filer", why doesn't every NetApp storage system have the same amount of NVRAM/NVMEM?
--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Jeff Mohler
Sent: Thursday, April 11, 2013 2:10 PM
To: Scott Eno
Cc: Toasters@teaparty.net
Subject: Re: NVRAM weirdness

For the OP and others..NVRAM has no place in the IO operation of a filer.

It holds copies of data that are in main RAM, just in -case- there is a problem with site power/etc.


There has been a continued misunderstanding of how Netapp NVRAM works over time..when in fact, it does no work at all.

:)


In sysstat..a view of "disks" being 100% busy is more a view of "storage" as a whole. Not 'a' disk. If storage is stuck..then CP time will also be likewise stuck at 100%...


Incorrect interpretation of sysstat and how Ontap works...can lead to a lot of confusion, and continued incorrect phraseology.






On Thu, Apr 11, 2013 at 12:31 PM, Scott Eno <s.eno@me.com> wrote:


Looks like you're hanging on a consistency point operation, but the sysstat output starts after that last CP was logged. This looks very similar to the ":s" CP bug, but can't say for sure. You have a long running CP operation, you NVRAM has filled up waiting for the operation to complete, but with no "s" after the "#", then it's not processing special files. It's doing something else.

Could be one of these bugs:

651573 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651597 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651926 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651929 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
655724 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
657018 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
650554 Inefficient pre-fetching of metadata blocks delays WAFL consistency point



On Apr 10, 2013, at 6:46 PM, Peta Thames <petathames@gmail.com> wrote:


Hi all,


Has anyone seen this pattern in a sysstat before?

TOASTERa> sysstat -u 1
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
29% 36942 1 0 264 0 0 0 7s 91% 100% # 100%
29% 36896 1 0 1616 0 0 0 7s 90% 100% # 100%
28% 36658 1 0 32 24 0 0 7s 100% 100% # 100%
28% 36625 1 0 108 0 0 0 7s 97% 100% # 100%
27% 36318 1 0 136 8 0 0 7s 100% 100% # 100%
28% 36391 1 0 364 144 0 0 7s 97% 100% # 100%
25% 30387 1 0 88 0 0 0 7s 98% 100% # 100%
21% 24450 1 0 8 0 0 0 7s 100% 100% # 100%
20% 22042 1 0 24 672 0 0 7s 100% 100% # 100%
18% 19781 1 0 284 8 0 0 7s 69% 100% # 100%
18% 19636 1 0 0 0 0 0 7s 100% 100% # 100%
16% 18066 1 0 24 24 0 0 7s 100% 100% # 100%
15% 16999 1 0 56 0 0 0 7s 56% 100% # 100%
15% 16622 1 0 11040 6496 0 0 7s 46% 100% # 1824%
14% 15947 1 0 24 32 0 0 7s 100% 100% # 100%
14% 15448 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15902 0 0 3304 4532 0 0 7s 88% 100% # 100%
14% 15800 0 0 32 24 0 0 7s 100% 100% # 100%
15% 16651 0 0 0 0 0 0 7s 100% 100% # 100%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
14% 15778 0 0 1520 2932 0 0 7s 100% 100% # 100%
13% 14311 0 1 28 24 0 0 7s 100% 100% # 100%
12% 12153 0 0 0 0 0 0 7s 100% 100% # 100%
12% 11993 0 0 8 0 0 0 7s 100% 100% # 100%
12% 12134 0 0 7368 5996 0 0 7s 18% 100% # 100%
13% 11981 0 0 20356 14956 <tel:20356%C2%A0%2014956> 0 0 7s 100% 100% # 100%
12% 11978 0 0 0 0 0 0 7s 100% 100% # 100%
11% 11617 0 0 24 24 0 0 7s 100% 100% # 100%
12% 11708 0 0 8 0 0 0 7s 100% 100% # 100%
11% 11630 0 0 11936 17872 0 0 7s 100% 100% # 100%
11% 11639 0 0 24 32 0 0 7s 100% 100% # 100%
12% 11791 0 0 12 0 0 0 7s 87% 100% # 100%
22% 11874 0 0 64452 92188 0 0 7s 100% 100% # 100%
12% 12015 0 0 32 24 0 0 7s 100% 100% # 100%
12% 11954 0 0 556 0 0 0 7s 85% 100% # 100%
12% 11899 0 0 0 8 0 0 7s 100% 100% # 100%
13% 13408 0 0 23816 35600 0 0 7s 91% 100% # 100%
13% 14610 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15355 0 0 1248 0 0 0 7s 72% 100% # 100%
14% 15584 0 0 18652 15224 0 0 7s 98% 100% # 100%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
14% 15397 0 0 0 8 0 0 7s 100% 100% # 100%
15% 15859 0 1 0 0 0 0 7s 100% 100% # 100%
14% 15557 0 0 4944 4568 0 0 7s 100% 100% # 100%
14% 15559 0 0 8 0 0 0 7s 100% 100% # 100%
14% 15603 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15507 0 0 3652 4724 0 0 7s 100% 100% # 100%
14% 15593 0 0 20 0 0 0 7s 75% 100% # 100%
12% 12571 0 0 268 0 0 0 7s 100% 100% # 100%
10% 9962 0 0 92 2072 0 0 7s 96% 100% # 100%
9% 8521 0 0 0 0 0 0 7s 100% 100% # 100%
8% 7680 0 0 4 8 0 0 7s 100% 100% # 100%
8% 7459 0 0 24 24 0 0 7s 100% 100% # 100%
8% 7050 0 0 2580 1600 0 0 7s 100% 100% # 100%
8% 6961 0 0 8 0 0 0 7s 100% 100% # 100%
8% 6954 0 0 24 24 0 0 7s 100% 100% # 100%
7% 6936 0 0 584 2400 0 0 7s 100% 100% # 100%
8% 7043 0 0 248 0 0 0 7s 0% 100% # 100%
7% 6946 0 0 24 24 0 0 7s 100% 100% # 100%
8% 7320 0 0 1860 7252 0 0 7s 100% 100% # 100%
55% 19882 2 2 89920 29733 0 0 1 86% 99% B 45%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
88% 17409 31 187 267319 196942 0 0 6s 83% 100% : 89%
87% 16064 11 249 304646 192912 0 0 1 78% 100% : 85%
85% 6274 70 86 322234 233245 0 0 1 81% 100% # 87%
85% 12486 18 340 236474 102383 0 0 3s 82% 100% B 73%
98% 13842 3 29 247937 255188 0 0 2s 85% 100% : 80%
99% 17878 1 1 348355 223367 0 0 3s 84% 100% # 86%
99% 4117 1 1 359628 269212 0 0 3s 78% 100% # 92%
83% 12622 10 31 252272 170064 0 0 2s 83% 100% B 73%
94% 19321 4 10 248864 249292 0 0 2s 88% 100% : 96%
99% 15893 0 0 318375 261482 0 0 2s 85% 100% : 89%
88% 13326 34 76 284960 194032 0 0 2s 82% 100% B 90%
86% 11106 73 104 267760 231524 0 0 2s 81% 100% : 100%
72% 11160 0 0 206927 128869 0 0 3s 75% 100% : 100%
63% 11304 0 0 220736 25276 0 0 3s 67% 100% : 100%
68% 11216 0 0 193324 15184 0 0 3s 66% 100% : 100%
65% 11343 0 0 175120 12588 0 0 3s 60% 100% : 100%
52% 7384 0 0 205780 104564 0 0 3s 61% 100% # 100%
83% 7205 0 0 261108 178256 0 0 3s 74% 100% B 90%
75% 11256 0 0 225164 96332 0 0 3s 76% 100% : 93%
99% 6279 0 0 296056 251191 0 0 2s 87% 100% : 90%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
65% 8616 0 0 220068 165749 0 0 2s 76% 73% : 71%
89% 5443 0 1 338684 239932 0 0 3s 80% 94% F 87%
95% 6726 0 0 379242 240750 0 0 3s 79% 100% : 91%
83% 8721 1 0 342940 113394 0 0 2s 82% 61% : 89%
77% 11971 1 0 314356 24 0 0 2s 85% 0% - 92%


... and back to normal.

Looking in Perf Advisor, it was during an extremely short, sharp latency spike that impacted every volume on every aggregate at exact the same time. It was the fourth of four spikes that had happened a few minutes apart during the previous 20 mins.

It looks like the NVRAM flush got stuck somehow but I have no idea what would cause that.

We're running ONTAP 8.0.2 7-mode on a very busy FAS3170.


Thanks,
Peta

_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters




_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters






--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters
Re: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
It is sized appropriately to the capabilities it [controller] was designed
for.

Hence..adding more, would not provide an improvement for any given system.




On Thu, Apr 11, 2013 at 7:31 PM, Blackburn, James M (Jim) Jr. CIV USARMY
PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:

> Classification: UNCLASSIFIED
> Caveats: NONE
>
> If "NVRAM has no place in the IO operation of a filer", why doesn't every
> NetApp storage system have the same amount of NVRAM/NVMEM?
> --
> Jim Blackburn
> US Army
> james.m.blackburn33.civ@mail.mil
> Note: I am not authorized to obligate, commit, or execute United States
> government funds. No part of this message should be construed as a request
> or directive to obligate, commit, or execute United States government funds.
>
>
> -----Original Message-----
> From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net]
> On Behalf Of Jeff Mohler
> Sent: Thursday, April 11, 2013 2:10 PM
> To: Scott Eno
> Cc: Toasters@teaparty.net
> Subject: Re: NVRAM weirdness
>
> For the OP and others..NVRAM has no place in the IO operation of a filer.
>
> It holds copies of data that are in main RAM, just in -case- there is a
> problem with site power/etc.
>
>
> There has been a continued misunderstanding of how Netapp NVRAM works over
> time..when in fact, it does no work at all.
>
> :)
>
>
> In sysstat..a view of "disks" being 100% busy is more a view of "storage"
> as a whole. Not 'a' disk. If storage is stuck..then CP time will also be
> likewise stuck at 100%...
>
>
> Incorrect interpretation of sysstat and how Ontap works...can lead to a
> lot of confusion, and continued incorrect phraseology.
>
>
>
>
>
>
> On Thu, Apr 11, 2013 at 12:31 PM, Scott Eno <s.eno@me.com> wrote:
>
>
> Looks like you're hanging on a consistency point operation, but
> the sysstat output starts after that last CP was logged. This looks very
> similar to the ":s" CP bug, but can't say for sure. You have a long
> running CP operation, you NVRAM has filled up waiting for the operation to
> complete, but with no "s" after the "#", then it's not processing special
> files. It's doing something else.
>
> Could be one of these bugs:
>
> 651573 Inefficient pre-fetching of metadata blocks delays WAFL
> consistency point
> 651597 Inefficient pre-fetching of metadata blocks delays WAFL
> consistency point
> 651926 Inefficient pre-fetching of metadata blocks delays WAFL
> consistency point
> 651929 Inefficient pre-fetching of metadata blocks delays WAFL
> consistency point
> 655724 Inefficient pre-fetching of metadata blocks delays WAFL
> consistency point
> 657018 Inefficient pre-fetching of metadata blocks delays WAFL
> consistency point
> 650554 Inefficient pre-fetching of metadata blocks delays WAFL
> consistency point
>
>
>
> On Apr 10, 2013, at 6:46 PM, Peta Thames <petathames@gmail.com>
> wrote:
>
>
> Hi all,
>
>
> Has anyone seen this pattern in a sysstat before?
>
> TOASTERa> sysstat -u 1
> CPU Total Net kB/s Disk kB/s Tape kB/s
> Cache Cache CP CP Disk
> ops/s in out read write read write
> age hit time ty util
> 29% 36942 1 0 264 0 0 0
> 7s 91% 100% # 100%
> 29% 36896 1 0 1616 0 0 0
> 7s 90% 100% # 100%
> 28% 36658 1 0 32 24 0 0
> 7s 100% 100% # 100%
> 28% 36625 1 0 108 0 0 0
> 7s 97% 100% # 100%
> 27% 36318 1 0 136 8 0 0
> 7s 100% 100% # 100%
> 28% 36391 1 0 364 144 0 0
> 7s 97% 100% # 100%
> 25% 30387 1 0 88 0 0 0
> 7s 98% 100% # 100%
> 21% 24450 1 0 8 0 0 0
> 7s 100% 100% # 100%
> 20% 22042 1 0 24 672 0 0
> 7s 100% 100% # 100%
> 18% 19781 1 0 284 8 0 0
> 7s 69% 100% # 100%
> 18% 19636 1 0 0 0 0 0
> 7s 100% 100% # 100%
> 16% 18066 1 0 24 24 0 0
> 7s 100% 100% # 100%
> 15% 16999 1 0 56 0 0 0
> 7s 56% 100% # 100%
> 15% 16622 1 0 11040 6496 0 0
> 7s 46% 100% # 1824%
> 14% 15947 1 0 24 32 0 0
> 7s 100% 100% # 100%
> 14% 15448 0 0 0 0 0 0
> 7s 100% 100% # 100%
> 14% 15902 0 0 3304 4532 0 0
> 7s 88% 100% # 100%
> 14% 15800 0 0 32 24 0 0
> 7s 100% 100% # 100%
> 15% 16651 0 0 0 0 0 0
> 7s 100% 100% # 100%
> CPU Total Net kB/s Disk kB/s Tape kB/s
> Cache Cache CP CP Disk
> ops/s in out read write read write
> age hit time ty util
> 14% 15778 0 0 1520 2932 0 0
> 7s 100% 100% # 100%
> 13% 14311 0 1 28 24 0 0
> 7s 100% 100% # 100%
> 12% 12153 0 0 0 0 0 0
> 7s 100% 100% # 100%
> 12% 11993 0 0 8 0 0 0
> 7s 100% 100% # 100%
> 12% 12134 0 0 7368 5996 0 0
> 7s 18% 100% # 100%
> 13% 11981 0 0 20356 14956
> <tel:20356%C2%A0%2014956> 0 0 7s 100% 100% # 100%
> 12% 11978 0 0 0 0 0 0
> 7s 100% 100% # 100%
> 11% 11617 0 0 24 24 0 0
> 7s 100% 100% # 100%
> 12% 11708 0 0 8 0 0 0
> 7s 100% 100% # 100%
> 11% 11630 0 0 11936 17872 0 0
> 7s 100% 100% # 100%
> 11% 11639 0 0 24 32 0 0
> 7s 100% 100% # 100%
> 12% 11791 0 0 12 0 0 0
> 7s 87% 100% # 100%
> 22% 11874 0 0 64452 92188 0 0
> 7s 100% 100% # 100%
> 12% 12015 0 0 32 24 0 0
> 7s 100% 100% # 100%
> 12% 11954 0 0 556 0 0 0
> 7s 85% 100% # 100%
> 12% 11899 0 0 0 8 0 0
> 7s 100% 100% # 100%
> 13% 13408 0 0 23816 35600 0 0
> 7s 91% 100% # 100%
> 13% 14610 0 0 0 0 0 0
> 7s 100% 100% # 100%
> 14% 15355 0 0 1248 0 0 0
> 7s 72% 100% # 100%
> 14% 15584 0 0 18652 15224 0 0
> 7s 98% 100% # 100%
> CPU Total Net kB/s Disk kB/s Tape kB/s
> Cache Cache CP CP Disk
> ops/s in out read write read write
> age hit time ty util
> 14% 15397 0 0 0 8 0 0
> 7s 100% 100% # 100%
> 15% 15859 0 1 0 0 0 0
> 7s 100% 100% # 100%
> 14% 15557 0 0 4944 4568 0 0
> 7s 100% 100% # 100%
> 14% 15559 0 0 8 0 0 0
> 7s 100% 100% # 100%
> 14% 15603 0 0 0 0 0 0
> 7s 100% 100% # 100%
> 14% 15507 0 0 3652 4724 0 0
> 7s 100% 100% # 100%
> 14% 15593 0 0 20 0 0 0
> 7s 75% 100% # 100%
> 12% 12571 0 0 268 0 0 0
> 7s 100% 100% # 100%
> 10% 9962 0 0 92 2072 0 0
> 7s 96% 100% # 100%
> 9% 8521 0 0 0 0 0 0
> 7s 100% 100% # 100%
> 8% 7680 0 0 4 8 0 0
> 7s 100% 100% # 100%
> 8% 7459 0 0 24 24 0 0
> 7s 100% 100% # 100%
> 8% 7050 0 0 2580 1600 0 0
> 7s 100% 100% # 100%
> 8% 6961 0 0 8 0 0 0
> 7s 100% 100% # 100%
> 8% 6954 0 0 24 24 0 0
> 7s 100% 100% # 100%
> 7% 6936 0 0 584 2400 0 0
> 7s 100% 100% # 100%
> 8% 7043 0 0 248 0 0 0
> 7s 0% 100% # 100%
> 7% 6946 0 0 24 24 0 0
> 7s 100% 100% # 100%
> 8% 7320 0 0 1860 7252 0 0
> 7s 100% 100% # 100%
> 55% 19882 2 2 89920 29733 0 0
> 1 86% 99% B 45%
> CPU Total Net kB/s Disk kB/s Tape kB/s
> Cache Cache CP CP Disk
> ops/s in out read write read write
> age hit time ty util
> 88% 17409 31 187 267319 196942 0 0
> 6s 83% 100% : 89%
> 87% 16064 11 249 304646 192912 0 0
> 1 78% 100% : 85%
> 85% 6274 70 86 322234 233245 0 0
> 1 81% 100% # 87%
> 85% 12486 18 340 236474 102383 0 0
> 3s 82% 100% B 73%
> 98% 13842 3 29 247937 255188 0 0
> 2s 85% 100% : 80%
> 99% 17878 1 1 348355 223367 0 0
> 3s 84% 100% # 86%
> 99% 4117 1 1 359628 269212 0 0
> 3s 78% 100% # 92%
> 83% 12622 10 31 252272 170064 0 0
> 2s 83% 100% B 73%
> 94% 19321 4 10 248864 249292 0 0
> 2s 88% 100% : 96%
> 99% 15893 0 0 318375 261482 0 0
> 2s 85% 100% : 89%
> 88% 13326 34 76 284960 194032 0 0
> 2s 82% 100% B 90%
> 86% 11106 73 104 267760 231524 0 0
> 2s 81% 100% : 100%
> 72% 11160 0 0 206927 128869 0 0
> 3s 75% 100% : 100%
> 63% 11304 0 0 220736 25276 0 0
> 3s 67% 100% : 100%
> 68% 11216 0 0 193324 15184 0 0
> 3s 66% 100% : 100%
> 65% 11343 0 0 175120 12588 0 0
> 3s 60% 100% : 100%
> 52% 7384 0 0 205780 104564 0 0
> 3s 61% 100% # 100%
> 83% 7205 0 0 261108 178256 0 0
> 3s 74% 100% B 90%
> 75% 11256 0 0 225164 96332 0 0
> 3s 76% 100% : 93%
> 99% 6279 0 0 296056 251191 0 0
> 2s 87% 100% : 90%
> CPU Total Net kB/s Disk kB/s Tape kB/s
> Cache Cache CP CP Disk
> ops/s in out read write read write
> age hit time ty util
> 65% 8616 0 0 220068 165749 0 0
> 2s 76% 73% : 71%
> 89% 5443 0 1 338684 239932 0 0
> 3s 80% 94% F 87%
> 95% 6726 0 0 379242 240750 0 0
> 3s 79% 100% : 91%
> 83% 8721 1 0 342940 113394 0 0
> 2s 82% 61% : 89%
> 77% 11971 1 0 314356 24 0 0
> 2s 85% 0% - 92%
>
>
> ... and back to normal.
>
> Looking in Perf Advisor, it was during an extremely short,
> sharp latency spike that impacted every volume on every aggregate at exact
> the same time. It was the fourth of four spikes that had happened a few
> minutes apart during the previous 20 mins.
>
> It looks like the NVRAM flush got stuck somehow but I have
> no idea what would cause that.
>
> We're running ONTAP 8.0.2 7-mode on a very busy FAS3170.
>
>
> Thanks,
> Peta
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>



--
---
Gustatus Similis Pullus
RE: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
Classification: UNCLASSIFIED
Caveats: NONE

Would "capabilities" happen to be a reference to "I/O capabilities?"
--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----
From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
Sent: Thursday, April 11, 2013 2:56 PM
To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
Cc: Toasters@teaparty.net
Subject: Re: NVRAM weirdness (UNCLASSIFIED)

It is sized appropriately to the capabilities it [controller] was designed for.

Hence..adding more, would not provide an improvement for any given system.





On Thu, Apr 11, 2013 at 7:31 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:


Classification: UNCLASSIFIED
Caveats: NONE

If "NVRAM has no place in the IO operation of a filer", why doesn't every NetApp storage system have the same amount of NVRAM/NVMEM?
--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Jeff Mohler
Sent: Thursday, April 11, 2013 2:10 PM
To: Scott Eno
Cc: Toasters@teaparty.net
Subject: Re: NVRAM weirdness

For the OP and others..NVRAM has no place in the IO operation of a filer.

It holds copies of data that are in main RAM, just in -case- there is a problem with site power/etc.


There has been a continued misunderstanding of how Netapp NVRAM works over time..when in fact, it does no work at all.

:)


In sysstat..a view of "disks" being 100% busy is more a view of "storage" as a whole. Not 'a' disk. If storage is stuck..then CP time will also be likewise stuck at 100%...


Incorrect interpretation of sysstat and how Ontap works...can lead to a lot of confusion, and continued incorrect phraseology.






On Thu, Apr 11, 2013 at 12:31 PM, Scott Eno <s.eno@me.com> wrote:


Looks like you're hanging on a consistency point operation, but the sysstat output starts after that last CP was logged. This looks very similar to the ":s" CP bug, but can't say for sure. You have a long running CP operation, you NVRAM has filled up waiting for the operation to complete, but with no "s" after the "#", then it's not processing special files. It's doing something else.

Could be one of these bugs:

651573 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651597 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651926 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651929 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
655724 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
657018 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
650554 Inefficient pre-fetching of metadata blocks delays WAFL consistency point



On Apr 10, 2013, at 6:46 PM, Peta Thames <petathames@gmail.com> wrote:


Hi all,


Has anyone seen this pattern in a sysstat before?

TOASTERa> sysstat -u 1
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
29% 36942 1 0 264 0 0 0 7s 91% 100% # 100%
29% 36896 1 0 1616 0 0 0 7s 90% 100% # 100%
28% 36658 1 0 32 24 0 0 7s 100% 100% # 100%
28% 36625 1 0 108 0 0 0 7s 97% 100% # 100%
27% 36318 1 0 136 8 0 0 7s 100% 100% # 100%
28% 36391 1 0 364 144 0 0 7s 97% 100% # 100%
25% 30387 1 0 88 0 0 0 7s 98% 100% # 100%
21% 24450 1 0 8 0 0 0 7s 100% 100% # 100%
20% 22042 1 0 24 672 0 0 7s 100% 100% # 100%
18% 19781 1 0 284 8 0 0 7s 69% 100% # 100%
18% 19636 1 0 0 0 0 0 7s 100% 100% # 100%
16% 18066 1 0 24 24 0 0 7s 100% 100% # 100%
15% 16999 1 0 56 0 0 0 7s 56% 100% # 100%
15% 16622 1 0 11040 6496 0 0 7s 46% 100% # 1824%
14% 15947 1 0 24 32 0 0 7s 100% 100% # 100%
14% 15448 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15902 0 0 3304 4532 0 0 7s 88% 100% # 100%
14% 15800 0 0 32 24 0 0 7s 100% 100% # 100%
15% 16651 0 0 0 0 0 0 7s 100% 100% # 100%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
14% 15778 0 0 1520 2932 0 0 7s 100% 100% # 100%
13% 14311 0 1 28 24 0 0 7s 100% 100% # 100%
12% 12153 0 0 0 0 0 0 7s 100% 100% # 100%
12% 11993 0 0 8 0 0 0 7s 100% 100% # 100%
12% 12134 0 0 7368 5996 0 0 7s 18% 100% # 100%
13% 11981 0 0 20356 14956 <tel:20356%C2%A0%2014956> 0 0 7s 100% 100% # 100%
12% 11978 0 0 0 0 0 0 7s 100% 100% # 100%
11% 11617 0 0 24 24 0 0 7s 100% 100% # 100%
12% 11708 0 0 8 0 0 0 7s 100% 100% # 100%
11% 11630 0 0 11936 17872 0 0 7s 100% 100% # 100%
11% 11639 0 0 24 32 0 0 7s 100% 100% # 100%
12% 11791 0 0 12 0 0 0 7s 87% 100% # 100%
22% 11874 0 0 64452 92188 0 0 7s 100% 100% # 100%
12% 12015 0 0 32 24 0 0 7s 100% 100% # 100%
12% 11954 0 0 556 0 0 0 7s 85% 100% # 100%
12% 11899 0 0 0 8 0 0 7s 100% 100% # 100%
13% 13408 0 0 23816 35600 0 0 7s 91% 100% # 100%
13% 14610 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15355 0 0 1248 0 0 0 7s 72% 100% # 100%
14% 15584 0 0 18652 15224 0 0 7s 98% 100% # 100%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
14% 15397 0 0 0 8 0 0 7s 100% 100% # 100%
15% 15859 0 1 0 0 0 0 7s 100% 100% # 100%
14% 15557 0 0 4944 4568 0 0 7s 100% 100% # 100%
14% 15559 0 0 8 0 0 0 7s 100% 100% # 100%
14% 15603 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15507 0 0 3652 4724 0 0 7s 100% 100% # 100%
14% 15593 0 0 20 0 0 0 7s 75% 100% # 100%
12% 12571 0 0 268 0 0 0 7s 100% 100% # 100%
10% 9962 0 0 92 2072 0 0 7s 96% 100% # 100%
9% 8521 0 0 0 0 0 0 7s 100% 100% # 100%
8% 7680 0 0 4 8 0 0 7s 100% 100% # 100%
8% 7459 0 0 24 24 0 0 7s 100% 100% # 100%
8% 7050 0 0 2580 1600 0 0 7s 100% 100% # 100%
8% 6961 0 0 8 0 0 0 7s 100% 100% # 100%
8% 6954 0 0 24 24 0 0 7s 100% 100% # 100%
7% 6936 0 0 584 2400 0 0 7s 100% 100% # 100%
8% 7043 0 0 248 0 0 0 7s 0% 100% # 100%
7% 6946 0 0 24 24 0 0 7s 100% 100% # 100%
8% 7320 0 0 1860 7252 0 0 7s 100% 100% # 100%
55% 19882 2 2 89920 29733 0 0 1 86% 99% B 45%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
88% 17409 31 187 267319 196942 0 0 6s 83% 100% : 89%
87% 16064 11 249 304646 192912 0 0 1 78% 100% : 85%
85% 6274 70 86 322234 233245 0 0 1 81% 100% # 87%
85% 12486 18 340 236474 102383 0 0 3s 82% 100% B 73%
98% 13842 3 29 247937 255188 0 0 2s 85% 100% : 80%
99% 17878 1 1 348355 223367 0 0 3s 84% 100% # 86%
99% 4117 1 1 359628 269212 0 0 3s 78% 100% # 92%
83% 12622 10 31 252272 170064 0 0 2s 83% 100% B 73%
94% 19321 4 10 248864 249292 0 0 2s 88% 100% : 96%
99% 15893 0 0 318375 261482 0 0 2s 85% 100% : 89%
88% 13326 34 76 284960 194032 0 0 2s 82% 100% B 90%
86% 11106 73 104 267760 231524 0 0 2s 81% 100% : 100%
72% 11160 0 0 206927 128869 0 0 3s 75% 100% : 100%
63% 11304 0 0 220736 25276 0 0 3s 67% 100% : 100%
68% 11216 0 0 193324 15184 0 0 3s 66% 100% : 100%
65% 11343 0 0 175120 12588 0 0 3s 60% 100% : 100%
52% 7384 0 0 205780 104564 0 0 3s 61% 100% # 100%
83% 7205 0 0 261108 178256 0 0 3s 74% 100% B 90%
75% 11256 0 0 225164 96332 0 0 3s 76% 100% : 93%
99% 6279 0 0 296056 251191 0 0 2s 87% 100% : 90%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
65% 8616 0 0 220068 165749 0 0 2s 76% 73% : 71%
89% 5443 0 1 338684 239932 0 0 3s 80% 94% F 87%
95% 6726 0 0 379242 240750 0 0 3s 79% 100% : 91%
83% 8721 1 0 342940 113394 0 0 2s 82% 61% : 89%
77% 11971 1 0 314356 24 0 0 2s 85% 0% - 92%


... and back to normal.

Looking in Perf Advisor, it was during an extremely short, sharp latency spike that impacted every volume on every aggregate at exact the same time. It was the fourth of four spikes that had happened a few minutes apart during the previous 20 mins.

It looks like the NVRAM flush got stuck somehow but I have no idea what would cause that.

We're running ONTAP 8.0.2 7-mode on a very busy FAS3170.


Thanks,
Peta

_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters




_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters






--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters





--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters
Re: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
Sure. NVRAM only holds a copy of data to be written..if all goes well,
it's discarded.



On Thu, Apr 11, 2013 at 8:00 PM, Blackburn, James M (Jim) Jr. CIV USARMY
PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:

> Classification: UNCLASSIFIED
> Caveats: NONE
>
> Would "capabilities" happen to be a reference to "I/O capabilities?"
> --
> Jim Blackburn
> US Army
> james.m.blackburn33.civ@mail.mil
> Note: I am not authorized to obligate, commit, or execute United States
> government funds. No part of this message should be construed as a request
> or directive to obligate, commit, or execute United States government funds.
>
>
> -----Original Message-----
> From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
> Sent: Thursday, April 11, 2013 2:56 PM
> To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
> Cc: Toasters@teaparty.net
> Subject: Re: NVRAM weirdness (UNCLASSIFIED)
>
> It is sized appropriately to the capabilities it [controller] was designed
> for.
>
> Hence..adding more, would not provide an improvement for any given system.
>
>
>
>
>
> On Thu, Apr 11, 2013 at 7:31 PM, Blackburn, James M (Jim) Jr. CIV USARMY
> PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:
>
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> If "NVRAM has no place in the IO operation of a filer", why
> doesn't every NetApp storage system have the same amount of NVRAM/NVMEM?
> --
> Jim Blackburn
> US Army
> james.m.blackburn33.civ@mail.mil
> Note: I am not authorized to obligate, commit, or execute United
> States government funds. No part of this message should be construed as a
> request or directive to obligate, commit, or execute United States
> government funds.
>
>
> -----Original Message-----
> From: toasters-bounces@teaparty.net [mailto:
> toasters-bounces@teaparty.net] On Behalf Of Jeff Mohler
> Sent: Thursday, April 11, 2013 2:10 PM
> To: Scott Eno
> Cc: Toasters@teaparty.net
> Subject: Re: NVRAM weirdness
>
> For the OP and others..NVRAM has no place in the IO operation of a
> filer.
>
> It holds copies of data that are in main RAM, just in -case- there
> is a problem with site power/etc.
>
>
> There has been a continued misunderstanding of how Netapp NVRAM
> works over time..when in fact, it does no work at all.
>
> :)
>
>
> In sysstat..a view of "disks" being 100% busy is more a view of
> "storage" as a whole. Not 'a' disk. If storage is stuck..then CP time
> will also be likewise stuck at 100%...
>
>
> Incorrect interpretation of sysstat and how Ontap works...can lead
> to a lot of confusion, and continued incorrect phraseology.
>
>
>
>
>
>
> On Thu, Apr 11, 2013 at 12:31 PM, Scott Eno <s.eno@me.com> wrote:
>
>
> Looks like you're hanging on a consistency point
> operation, but the sysstat output starts after that last CP was logged.
> This looks very similar to the ":s" CP bug, but can't say for sure. You
> have a long running CP operation, you NVRAM has filled up waiting for the
> operation to complete, but with no "s" after the "#", then it's not
> processing special files. It's doing something else.
>
> Could be one of these bugs:
>
> 651573 Inefficient pre-fetching of metadata blocks delays
> WAFL consistency point
> 651597 Inefficient pre-fetching of metadata blocks delays
> WAFL consistency point
> 651926 Inefficient pre-fetching of metadata blocks delays
> WAFL consistency point
> 651929 Inefficient pre-fetching of metadata blocks delays
> WAFL consistency point
> 655724 Inefficient pre-fetching of metadata blocks delays
> WAFL consistency point
> 657018 Inefficient pre-fetching of metadata blocks delays
> WAFL consistency point
> 650554 Inefficient pre-fetching of metadata blocks delays
> WAFL consistency point
>
>
>
> On Apr 10, 2013, at 6:46 PM, Peta Thames <
> petathames@gmail.com> wrote:
>
>
> Hi all,
>
>
> Has anyone seen this pattern in a sysstat before?
>
> TOASTERa> sysstat -u 1
> CPU Total Net kB/s Disk kB/s Tape
> kB/s Cache Cache CP CP Disk
> ops/s in out read write read
> write age hit time ty util
> 29% 36942 1 0 264 0 0
> 0 7s 91% 100% # 100%
> 29% 36896 1 0 1616 0 0
> 0 7s 90% 100% # 100%
> 28% 36658 1 0 32 24 0
> 0 7s 100% 100% # 100%
> 28% 36625 1 0 108 0 0
> 0 7s 97% 100% # 100%
> 27% 36318 1 0 136 8 0
> 0 7s 100% 100% # 100%
> 28% 36391 1 0 364 144 0
> 0 7s 97% 100% # 100%
> 25% 30387 1 0 88 0 0
> 0 7s 98% 100% # 100%
> 21% 24450 1 0 8 0 0
> 0 7s 100% 100% # 100%
> 20% 22042 1 0 24 672 0
> 0 7s 100% 100% # 100%
> 18% 19781 1 0 284 8 0
> 0 7s 69% 100% # 100%
> 18% 19636 1 0 0 0 0
> 0 7s 100% 100% # 100%
> 16% 18066 1 0 24 24 0
> 0 7s 100% 100% # 100%
> 15% 16999 1 0 56 0 0
> 0 7s 56% 100% # 100%
> 15% 16622 1 0 11040 6496 0
> 0 7s 46% 100% # 1824%
> 14% 15947 1 0 24 32 0
> 0 7s 100% 100% # 100%
> 14% 15448 0 0 0 0 0
> 0 7s 100% 100% # 100%
> 14% 15902 0 0 3304 4532 0
> 0 7s 88% 100% # 100%
> 14% 15800 0 0 32 24 0
> 0 7s 100% 100% # 100%
> 15% 16651 0 0 0 0 0
> 0 7s 100% 100% # 100%
> CPU Total Net kB/s Disk kB/s Tape
> kB/s Cache Cache CP CP Disk
> ops/s in out read write read
> write age hit time ty util
> 14% 15778 0 0 1520 2932 0
> 0 7s 100% 100% # 100%
> 13% 14311 0 1 28 24 0
> 0 7s 100% 100% # 100%
> 12% 12153 0 0 0 0 0
> 0 7s 100% 100% # 100%
> 12% 11993 0 0 8 0 0
> 0 7s 100% 100% # 100%
> 12% 12134 0 0 7368 5996 0
> 0 7s 18% 100% # 100%
> 13% 11981 0 0 20356 14956
> <tel:20356%C2%A0%2014956> 0 0 7s 100% 100% # 100%
> 12% 11978 0 0 0 0 0
> 0 7s 100% 100% # 100%
> 11% 11617 0 0 24 24 0
> 0 7s 100% 100% # 100%
> 12% 11708 0 0 8 0 0
> 0 7s 100% 100% # 100%
> 11% 11630 0 0 11936 17872 0
> 0 7s 100% 100% # 100%
> 11% 11639 0 0 24 32 0
> 0 7s 100% 100% # 100%
> 12% 11791 0 0 12 0 0
> 0 7s 87% 100% # 100%
> 22% 11874 0 0 64452 92188 0
> 0 7s 100% 100% # 100%
> 12% 12015 0 0 32 24 0
> 0 7s 100% 100% # 100%
> 12% 11954 0 0 556 0 0
> 0 7s 85% 100% # 100%
> 12% 11899 0 0 0 8 0
> 0 7s 100% 100% # 100%
> 13% 13408 0 0 23816 35600 0
> 0 7s 91% 100% # 100%
> 13% 14610 0 0 0 0 0
> 0 7s 100% 100% # 100%
> 14% 15355 0 0 1248 0 0
> 0 7s 72% 100% # 100%
> 14% 15584 0 0 18652 15224 0
> 0 7s 98% 100% # 100%
> CPU Total Net kB/s Disk kB/s Tape
> kB/s Cache Cache CP CP Disk
> ops/s in out read write read
> write age hit time ty util
> 14% 15397 0 0 0 8 0
> 0 7s 100% 100% # 100%
> 15% 15859 0 1 0 0 0
> 0 7s 100% 100% # 100%
> 14% 15557 0 0 4944 4568 0
> 0 7s 100% 100% # 100%
> 14% 15559 0 0 8 0 0
> 0 7s 100% 100% # 100%
> 14% 15603 0 0 0 0 0
> 0 7s 100% 100% # 100%
> 14% 15507 0 0 3652 4724 0
> 0 7s 100% 100% # 100%
> 14% 15593 0 0 20 0 0
> 0 7s 75% 100% # 100%
> 12% 12571 0 0 268 0 0
> 0 7s 100% 100% # 100%
> 10% 9962 0 0 92 2072 0
> 0 7s 96% 100% # 100%
> 9% 8521 0 0 0 0 0
> 0 7s 100% 100% # 100%
> 8% 7680 0 0 4 8 0
> 0 7s 100% 100% # 100%
> 8% 7459 0 0 24 24 0
> 0 7s 100% 100% # 100%
> 8% 7050 0 0 2580 1600 0
> 0 7s 100% 100% # 100%
> 8% 6961 0 0 8 0 0
> 0 7s 100% 100% # 100%
> 8% 6954 0 0 24 24 0
> 0 7s 100% 100% # 100%
> 7% 6936 0 0 584 2400 0
> 0 7s 100% 100% # 100%
> 8% 7043 0 0 248 0 0
> 0 7s 0% 100% # 100%
> 7% 6946 0 0 24 24 0
> 0 7s 100% 100% # 100%
> 8% 7320 0 0 1860 7252 0
> 0 7s 100% 100% # 100%
> 55% 19882 2 2 89920 29733 0
> 0 1 86% 99% B 45%
> CPU Total Net kB/s Disk kB/s Tape
> kB/s Cache Cache CP CP Disk
> ops/s in out read write read
> write age hit time ty util
> 88% 17409 31 187 267319 196942 0
> 0 6s 83% 100% : 89%
> 87% 16064 11 249 304646 192912 0
> 0 1 78% 100% : 85%
> 85% 6274 70 86 322234 233245 0
> 0 1 81% 100% # 87%
> 85% 12486 18 340 236474 102383 0
> 0 3s 82% 100% B 73%
> 98% 13842 3 29 247937 255188 0
> 0 2s 85% 100% : 80%
> 99% 17878 1 1 348355 223367 0
> 0 3s 84% 100% # 86%
> 99% 4117 1 1 359628 269212 0
> 0 3s 78% 100% # 92%
> 83% 12622 10 31 252272 170064 0
> 0 2s 83% 100% B 73%
> 94% 19321 4 10 248864 249292 0
> 0 2s 88% 100% : 96%
> 99% 15893 0 0 318375 261482 0
> 0 2s 85% 100% : 89%
> 88% 13326 34 76 284960 194032 0
> 0 2s 82% 100% B 90%
> 86% 11106 73 104 267760 231524 0
> 0 2s 81% 100% : 100%
> 72% 11160 0 0 206927 128869 0
> 0 3s 75% 100% : 100%
> 63% 11304 0 0 220736 25276 0
> 0 3s 67% 100% : 100%
> 68% 11216 0 0 193324 15184 0
> 0 3s 66% 100% : 100%
> 65% 11343 0 0 175120 12588 0
> 0 3s 60% 100% : 100%
> 52% 7384 0 0 205780 104564 0
> 0 3s 61% 100% # 100%
> 83% 7205 0 0 261108 178256 0
> 0 3s 74% 100% B 90%
> 75% 11256 0 0 225164 96332 0
> 0 3s 76% 100% : 93%
> 99% 6279 0 0 296056 251191 0
> 0 2s 87% 100% : 90%
> CPU Total Net kB/s Disk kB/s Tape
> kB/s Cache Cache CP CP Disk
> ops/s in out read write read
> write age hit time ty util
> 65% 8616 0 0 220068 165749 0
> 0 2s 76% 73% : 71%
> 89% 5443 0 1 338684 239932 0
> 0 3s 80% 94% F 87%
> 95% 6726 0 0 379242 240750 0
> 0 3s 79% 100% : 91%
> 83% 8721 1 0 342940 113394 0
> 0 2s 82% 61% : 89%
> 77% 11971 1 0 314356 24 0
> 0 2s 85% 0% - 92%
>
>
> ... and back to normal.
>
> Looking in Perf Advisor, it was during an
> extremely short, sharp latency spike that impacted every volume on every
> aggregate at exact the same time. It was the fourth of four spikes that
> had happened a few minutes apart during the previous 20 mins.
>
> It looks like the NVRAM flush got stuck somehow
> but I have no idea what would cause that.
>
> We're running ONTAP 8.0.2 7-mode on a very busy
> FAS3170.
>
>
> Thanks,
> Peta
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>



--
---
Gustatus Similis Pullus
RE: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
Classification: UNCLASSIFIED
Caveats: NONE

If NVRAM has no place in a NetApp storage system's I/O operations, why would the size of NVRAM correlate with the I/O capabilities of a NetApp storage system?
--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----
From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
Sent: Thursday, April 11, 2013 3:12 PM
To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
Cc: Toasters@teaparty.net
Subject: Re: NVRAM weirdness (UNCLASSIFIED)

Sure. NVRAM only holds a copy of data to be written..if all goes well, it's discarded.




On Thu, Apr 11, 2013 at 8:00 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:


Classification: UNCLASSIFIED
Caveats: NONE

Would "capabilities" happen to be a reference to "I/O capabilities?"

--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----

From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
Sent: Thursday, April 11, 2013 2:56 PM
To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
Cc: Toasters@teaparty.net

Subject: Re: NVRAM weirdness (UNCLASSIFIED)

It is sized appropriately to the capabilities it [controller] was designed for.

Hence..adding more, would not provide an improvement for any given system.





On Thu, Apr 11, 2013 at 7:31 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:


Classification: UNCLASSIFIED
Caveats: NONE

If "NVRAM has no place in the IO operation of a filer", why doesn't every NetApp storage system have the same amount of NVRAM/NVMEM?
--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Jeff Mohler
Sent: Thursday, April 11, 2013 2:10 PM
To: Scott Eno
Cc: Toasters@teaparty.net
Subject: Re: NVRAM weirdness

For the OP and others..NVRAM has no place in the IO operation of a filer.

It holds copies of data that are in main RAM, just in -case- there is a problem with site power/etc.


There has been a continued misunderstanding of how Netapp NVRAM works over time..when in fact, it does no work at all.

:)


In sysstat..a view of "disks" being 100% busy is more a view of "storage" as a whole. Not 'a' disk. If storage is stuck..then CP time will also be likewise stuck at 100%...


Incorrect interpretation of sysstat and how Ontap works...can lead to a lot of confusion, and continued incorrect phraseology.






On Thu, Apr 11, 2013 at 12:31 PM, Scott Eno <s.eno@me.com> wrote:


Looks like you're hanging on a consistency point operation, but the sysstat output starts after that last CP was logged. This looks very similar to the ":s" CP bug, but can't say for sure. You have a long running CP operation, you NVRAM has filled up waiting for the operation to complete, but with no "s" after the "#", then it's not processing special files. It's doing something else.

Could be one of these bugs:

651573 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651597 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651926 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651929 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
655724 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
657018 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
650554 Inefficient pre-fetching of metadata blocks delays WAFL consistency point



On Apr 10, 2013, at 6:46 PM, Peta Thames <petathames@gmail.com> wrote:


Hi all,


Has anyone seen this pattern in a sysstat before?

TOASTERa> sysstat -u 1
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
29% 36942 1 0 264 0 0 0 7s 91% 100% # 100%
29% 36896 1 0 1616 0 0 0 7s 90% 100% # 100%
28% 36658 1 0 32 24 0 0 7s 100% 100% # 100%
28% 36625 1 0 108 0 0 0 7s 97% 100% # 100%
27% 36318 1 0 136 8 0 0 7s 100% 100% # 100%
28% 36391 1 0 364 144 0 0 7s 97% 100% # 100%
25% 30387 1 0 88 0 0 0 7s 98% 100% # 100%
21% 24450 1 0 8 0 0 0 7s 100% 100% # 100%
20% 22042 1 0 24 672 0 0 7s 100% 100% # 100%
18% 19781 1 0 284 8 0 0 7s 69% 100% # 100%
18% 19636 1 0 0 0 0 0 7s 100% 100% # 100%
16% 18066 1 0 24 24 0 0 7s 100% 100% # 100%
15% 16999 1 0 56 0 0 0 7s 56% 100% # 100%
15% 16622 1 0 11040 6496 0 0 7s 46% 100% # 1824%
14% 15947 1 0 24 32 0 0 7s 100% 100% # 100%
14% 15448 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15902 0 0 3304 4532 0 0 7s 88% 100% # 100%
14% 15800 0 0 32 24 0 0 7s 100% 100% # 100%
15% 16651 0 0 0 0 0 0 7s 100% 100% # 100%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
14% 15778 0 0 1520 2932 0 0 7s 100% 100% # 100%
13% 14311 0 1 28 24 0 0 7s 100% 100% # 100%
12% 12153 0 0 0 0 0 0 7s 100% 100% # 100%
12% 11993 0 0 8 0 0 0 7s 100% 100% # 100%
12% 12134 0 0 7368 5996 0 0 7s 18% 100% # 100%
13% 11981 0 0 20356 14956 <tel:20356%C2%A0%2014956> 0 0 7s 100% 100% # 100%
12% 11978 0 0 0 0 0 0 7s 100% 100% # 100%
11% 11617 0 0 24 24 0 0 7s 100% 100% # 100%
12% 11708 0 0 8 0 0 0 7s 100% 100% # 100%
11% 11630 0 0 11936 17872 0 0 7s 100% 100% # 100%
11% 11639 0 0 24 32 0 0 7s 100% 100% # 100%
12% 11791 0 0 12 0 0 0 7s 87% 100% # 100%
22% 11874 0 0 64452 92188 0 0 7s 100% 100% # 100%
12% 12015 0 0 32 24 0 0 7s 100% 100% # 100%
12% 11954 0 0 556 0 0 0 7s 85% 100% # 100%
12% 11899 0 0 0 8 0 0 7s 100% 100% # 100%
13% 13408 0 0 23816 35600 0 0 7s 91% 100% # 100%
13% 14610 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15355 0 0 1248 0 0 0 7s 72% 100% # 100%
14% 15584 0 0 18652 15224 0 0 7s 98% 100% # 100%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
14% 15397 0 0 0 8 0 0 7s 100% 100% # 100%
15% 15859 0 1 0 0 0 0 7s 100% 100% # 100%
14% 15557 0 0 4944 4568 0 0 7s 100% 100% # 100%
14% 15559 0 0 8 0 0 0 7s 100% 100% # 100%
14% 15603 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15507 0 0 3652 4724 0 0 7s 100% 100% # 100%
14% 15593 0 0 20 0 0 0 7s 75% 100% # 100%
12% 12571 0 0 268 0 0 0 7s 100% 100% # 100%
10% 9962 0 0 92 2072 0 0 7s 96% 100% # 100%
9% 8521 0 0 0 0 0 0 7s 100% 100% # 100%
8% 7680 0 0 4 8 0 0 7s 100% 100% # 100%
8% 7459 0 0 24 24 0 0 7s 100% 100% # 100%
8% 7050 0 0 2580 1600 0 0 7s 100% 100% # 100%
8% 6961 0 0 8 0 0 0 7s 100% 100% # 100%
8% 6954 0 0 24 24 0 0 7s 100% 100% # 100%
7% 6936 0 0 584 2400 0 0 7s 100% 100% # 100%
8% 7043 0 0 248 0 0 0 7s 0% 100% # 100%
7% 6946 0 0 24 24 0 0 7s 100% 100% # 100%
8% 7320 0 0 1860 7252 0 0 7s 100% 100% # 100%
55% 19882 2 2 89920 29733 0 0 1 86% 99% B 45%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
88% 17409 31 187 267319 196942 0 0 6s 83% 100% : 89%
87% 16064 11 249 304646 192912 0 0 1 78% 100% : 85%
85% 6274 70 86 322234 233245 0 0 1 81% 100% # 87%
85% 12486 18 340 236474 102383 0 0 3s 82% 100% B 73%
98% 13842 3 29 247937 255188 0 0 2s 85% 100% : 80%
99% 17878 1 1 348355 223367 0 0 3s 84% 100% # 86%
99% 4117 1 1 359628 269212 0 0 3s 78% 100% # 92%
83% 12622 10 31 252272 170064 0 0 2s 83% 100% B 73%
94% 19321 4 10 248864 249292 0 0 2s 88% 100% : 96%
99% 15893 0 0 318375 261482 0 0 2s 85% 100% : 89%
88% 13326 34 76 284960 194032 0 0 2s 82% 100% B 90%
86% 11106 73 104 267760 231524 0 0 2s 81% 100% : 100%
72% 11160 0 0 206927 128869 0 0 3s 75% 100% : 100%
63% 11304 0 0 220736 25276 0 0 3s 67% 100% : 100%
68% 11216 0 0 193324 15184 0 0 3s 66% 100% : 100%
65% 11343 0 0 175120 12588 0 0 3s 60% 100% : 100%
52% 7384 0 0 205780 104564 0 0 3s 61% 100% # 100%
83% 7205 0 0 261108 178256 0 0 3s 74% 100% B 90%
75% 11256 0 0 225164 96332 0 0 3s 76% 100% : 93%
99% 6279 0 0 296056 251191 0 0 2s 87% 100% : 90%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
65% 8616 0 0 220068 165749 0 0 2s 76% 73% : 71%
89% 5443 0 1 338684 239932 0 0 3s 80% 94% F 87%
95% 6726 0 0 379242 240750 0 0 3s 79% 100% : 91%
83% 8721 1 0 342940 113394 0 0 2s 82% 61% : 89%
77% 11971 1 0 314356 24 0 0 2s 85% 0% - 92%


... and back to normal.

Looking in Perf Advisor, it was during an extremely short, sharp latency spike that impacted every volume on every aggregate at exact the same time. It was the fourth of four spikes that had happened a few minutes apart during the previous 20 mins.

It looks like the NVRAM flush got stuck somehow but I have no idea what would cause that.

We're running ONTAP 8.0.2 7-mode on a very busy FAS3170.


Thanks,
Peta

_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters




_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters






--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters





--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters





--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters
Re: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
NVRAM isnt the producer of writes, its not the arbiter of what writes it
can do..its just matched to what will be kept in RAM an brokered within an
HA pair.

Plus, some really smart people made the determination. Trust me, 48G of
NVRAM wont write to disk any faster than 1.

That's it..that simple.


On Thu, Apr 11, 2013 at 8:45 PM, Blackburn, James M (Jim) Jr. CIV USARMY
PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:

> Classification: UNCLASSIFIED
> Caveats: NONE
>
> If NVRAM has no place in a NetApp storage system's I/O operations, why
> would the size of NVRAM correlate with the I/O capabilities of a NetApp
> storage system?
> --
> Jim Blackburn
> US Army
> james.m.blackburn33.civ@mail.mil
> Note: I am not authorized to obligate, commit, or execute United States
> government funds. No part of this message should be construed as a request
> or directive to obligate, commit, or execute United States government funds.
>
>
> -----Original Message-----
> From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
> Sent: Thursday, April 11, 2013 3:12 PM
> To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
> Cc: Toasters@teaparty.net
> Subject: Re: NVRAM weirdness (UNCLASSIFIED)
>
> Sure. NVRAM only holds a copy of data to be written..if all goes well,
> it's discarded.
>
>
>
>
> On Thu, Apr 11, 2013 at 8:00 PM, Blackburn, James M (Jim) Jr. CIV USARMY
> PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:
>
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> Would "capabilities" happen to be a reference to "I/O
> capabilities?"
>
> --
> Jim Blackburn
> US Army
> james.m.blackburn33.civ@mail.mil
> Note: I am not authorized to obligate, commit, or execute United
> States government funds. No part of this message should be construed as a
> request or directive to obligate, commit, or execute United States
> government funds.
>
>
> -----Original Message-----
>
> From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
> Sent: Thursday, April 11, 2013 2:56 PM
> To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
> Cc: Toasters@teaparty.net
>
> Subject: Re: NVRAM weirdness (UNCLASSIFIED)
>
> It is sized appropriately to the capabilities it [controller] was
> designed for.
>
> Hence..adding more, would not provide an improvement for any given
> system.
>
>
>
>
>
> On Thu, Apr 11, 2013 at 7:31 PM, Blackburn, James M (Jim) Jr. CIV
> USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:
>
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> If "NVRAM has no place in the IO operation of a filer",
> why doesn't every NetApp storage system have the same amount of NVRAM/NVMEM?
> --
> Jim Blackburn
> US Army
> james.m.blackburn33.civ@mail.mil
> Note: I am not authorized to obligate, commit, or execute
> United States government funds. No part of this message should be
> construed as a request or directive to obligate, commit, or execute United
> States government funds.
>
>
> -----Original Message-----
> From: toasters-bounces@teaparty.net [mailto:
> toasters-bounces@teaparty.net] On Behalf Of Jeff Mohler
> Sent: Thursday, April 11, 2013 2:10 PM
> To: Scott Eno
> Cc: Toasters@teaparty.net
> Subject: Re: NVRAM weirdness
>
> For the OP and others..NVRAM has no place in the IO
> operation of a filer.
>
> It holds copies of data that are in main RAM, just in
> -case- there is a problem with site power/etc.
>
>
> There has been a continued misunderstanding of how Netapp
> NVRAM works over time..when in fact, it does no work at all.
>
> :)
>
>
> In sysstat..a view of "disks" being 100% busy is more a
> view of "storage" as a whole. Not 'a' disk. If storage is stuck..then CP
> time will also be likewise stuck at 100%...
>
>
> Incorrect interpretation of sysstat and how Ontap
> works...can lead to a lot of confusion, and continued incorrect phraseology.
>
>
>
>
>
>
> On Thu, Apr 11, 2013 at 12:31 PM, Scott Eno <s.eno@me.com>
> wrote:
>
>
> Looks like you're hanging on a consistency point
> operation, but the sysstat output starts after that last CP was logged.
> This looks very similar to the ":s" CP bug, but can't say for sure. You
> have a long running CP operation, you NVRAM has filled up waiting for the
> operation to complete, but with no "s" after the "#", then it's not
> processing special files. It's doing something else.
>
> Could be one of these bugs:
>
> 651573 Inefficient pre-fetching of metadata blocks
> delays WAFL consistency point
> 651597 Inefficient pre-fetching of metadata blocks
> delays WAFL consistency point
> 651926 Inefficient pre-fetching of metadata blocks
> delays WAFL consistency point
> 651929 Inefficient pre-fetching of metadata blocks
> delays WAFL consistency point
> 655724 Inefficient pre-fetching of metadata blocks
> delays WAFL consistency point
> 657018 Inefficient pre-fetching of metadata blocks
> delays WAFL consistency point
> 650554 Inefficient pre-fetching of metadata blocks
> delays WAFL consistency point
>
>
>
> On Apr 10, 2013, at 6:46 PM, Peta Thames <
> petathames@gmail.com> wrote:
>
>
> Hi all,
>
>
> Has anyone seen this pattern in a sysstat
> before?
>
> TOASTERa> sysstat -u 1
> CPU Total Net kB/s Disk kB/s
> Tape kB/s Cache Cache CP CP Disk
> ops/s in out read write
> read write age hit time ty util
> 29% 36942 1 0 264 0
> 0 0 7s 91% 100% # 100%
> 29% 36896 1 0 1616 0
> 0 0 7s 90% 100% # 100%
> 28% 36658 1 0 32 24
> 0 0 7s 100% 100% # 100%
> 28% 36625 1 0 108 0
> 0 0 7s 97% 100% # 100%
> 27% 36318 1 0 136 8
> 0 0 7s 100% 100% # 100%
> 28% 36391 1 0 364 144
> 0 0 7s 97% 100% # 100%
> 25% 30387 1 0 88 0
> 0 0 7s 98% 100% # 100%
> 21% 24450 1 0 8 0
> 0 0 7s 100% 100% # 100%
> 20% 22042 1 0 24 672
> 0 0 7s 100% 100% # 100%
> 18% 19781 1 0 284 8
> 0 0 7s 69% 100% # 100%
> 18% 19636 1 0 0 0
> 0 0 7s 100% 100% # 100%
> 16% 18066 1 0 24 24
> 0 0 7s 100% 100% # 100%
> 15% 16999 1 0 56 0
> 0 0 7s 56% 100% # 100%
> 15% 16622 1 0 11040 6496
> 0 0 7s 46% 100% # 1824%
> 14% 15947 1 0 24 32
> 0 0 7s 100% 100% # 100%
> 14% 15448 0 0 0 0
> 0 0 7s 100% 100% # 100%
> 14% 15902 0 0 3304 4532
> 0 0 7s 88% 100% # 100%
> 14% 15800 0 0 32 24
> 0 0 7s 100% 100% # 100%
> 15% 16651 0 0 0 0
> 0 0 7s 100% 100% # 100%
> CPU Total Net kB/s Disk kB/s
> Tape kB/s Cache Cache CP CP Disk
> ops/s in out read write
> read write age hit time ty util
> 14% 15778 0 0 1520 2932
> 0 0 7s 100% 100% # 100%
> 13% 14311 0 1 28 24
> 0 0 7s 100% 100% # 100%
> 12% 12153 0 0 0 0
> 0 0 7s 100% 100% # 100%
> 12% 11993 0 0 8 0
> 0 0 7s 100% 100% # 100%
> 12% 12134 0 0 7368 5996
> 0 0 7s 18% 100% # 100%
> 13% 11981 0 0 20356 14956
> <tel:20356%C2%A0%2014956> 0 0 7s 100% 100% # 100%
> 12% 11978 0 0 0 0
> 0 0 7s 100% 100% # 100%
> 11% 11617 0 0 24 24
> 0 0 7s 100% 100% # 100%
> 12% 11708 0 0 8 0
> 0 0 7s 100% 100% # 100%
> 11% 11630 0 0 11936 17872
> 0 0 7s 100% 100% # 100%
> 11% 11639 0 0 24 32
> 0 0 7s 100% 100% # 100%
> 12% 11791 0 0 12 0
> 0 0 7s 87% 100% # 100%
> 22% 11874 0 0 64452 92188
> 0 0 7s 100% 100% # 100%
> 12% 12015 0 0 32 24
> 0 0 7s 100% 100% # 100%
> 12% 11954 0 0 556 0
> 0 0 7s 85% 100% # 100%
> 12% 11899 0 0 0 8
> 0 0 7s 100% 100% # 100%
> 13% 13408 0 0 23816 35600
> 0 0 7s 91% 100% # 100%
> 13% 14610 0 0 0 0
> 0 0 7s 100% 100% # 100%
> 14% 15355 0 0 1248 0
> 0 0 7s 72% 100% # 100%
> 14% 15584 0 0 18652 15224
> 0 0 7s 98% 100% # 100%
> CPU Total Net kB/s Disk kB/s
> Tape kB/s Cache Cache CP CP Disk
> ops/s in out read write
> read write age hit time ty util
> 14% 15397 0 0 0 8
> 0 0 7s 100% 100% # 100%
> 15% 15859 0 1 0 0
> 0 0 7s 100% 100% # 100%
> 14% 15557 0 0 4944 4568
> 0 0 7s 100% 100% # 100%
> 14% 15559 0 0 8 0
> 0 0 7s 100% 100% # 100%
> 14% 15603 0 0 0 0
> 0 0 7s 100% 100% # 100%
> 14% 15507 0 0 3652 4724
> 0 0 7s 100% 100% # 100%
> 14% 15593 0 0 20 0
> 0 0 7s 75% 100% # 100%
> 12% 12571 0 0 268 0
> 0 0 7s 100% 100% # 100%
> 10% 9962 0 0 92 2072
> 0 0 7s 96% 100% # 100%
> 9% 8521 0 0 0 0
> 0 0 7s 100% 100% # 100%
> 8% 7680 0 0 4 8
> 0 0 7s 100% 100% # 100%
> 8% 7459 0 0 24 24
> 0 0 7s 100% 100% # 100%
> 8% 7050 0 0 2580 1600
> 0 0 7s 100% 100% # 100%
> 8% 6961 0 0 8 0
> 0 0 7s 100% 100% # 100%
> 8% 6954 0 0 24 24
> 0 0 7s 100% 100% # 100%
> 7% 6936 0 0 584 2400
> 0 0 7s 100% 100% # 100%
> 8% 7043 0 0 248 0
> 0 0 7s 0% 100% # 100%
> 7% 6946 0 0 24 24
> 0 0 7s 100% 100% # 100%
> 8% 7320 0 0 1860 7252
> 0 0 7s 100% 100% # 100%
> 55% 19882 2 2 89920 29733
> 0 0 1 86% 99% B 45%
> CPU Total Net kB/s Disk kB/s
> Tape kB/s Cache Cache CP CP Disk
> ops/s in out read write
> read write age hit time ty util
> 88% 17409 31 187 267319 196942
> 0 0 6s 83% 100% : 89%
> 87% 16064 11 249 304646 192912
> 0 0 1 78% 100% : 85%
> 85% 6274 70 86 322234 233245
> 0 0 1 81% 100% # 87%
> 85% 12486 18 340 236474 102383
> 0 0 3s 82% 100% B 73%
> 98% 13842 3 29 247937 255188
> 0 0 2s 85% 100% : 80%
> 99% 17878 1 1 348355 223367
> 0 0 3s 84% 100% # 86%
> 99% 4117 1 1 359628 269212
> 0 0 3s 78% 100% # 92%
> 83% 12622 10 31 252272 170064
> 0 0 2s 83% 100% B 73%
> 94% 19321 4 10 248864 249292
> 0 0 2s 88% 100% : 96%
> 99% 15893 0 0 318375 261482
> 0 0 2s 85% 100% : 89%
> 88% 13326 34 76 284960 194032
> 0 0 2s 82% 100% B 90%
> 86% 11106 73 104 267760 231524
> 0 0 2s 81% 100% : 100%
> 72% 11160 0 0 206927 128869
> 0 0 3s 75% 100% : 100%
> 63% 11304 0 0 220736 25276
> 0 0 3s 67% 100% : 100%
> 68% 11216 0 0 193324 15184
> 0 0 3s 66% 100% : 100%
> 65% 11343 0 0 175120 12588
> 0 0 3s 60% 100% : 100%
> 52% 7384 0 0 205780 104564
> 0 0 3s 61% 100% # 100%
> 83% 7205 0 0 261108 178256
> 0 0 3s 74% 100% B 90%
> 75% 11256 0 0 225164 96332
> 0 0 3s 76% 100% : 93%
> 99% 6279 0 0 296056 251191
> 0 0 2s 87% 100% : 90%
> CPU Total Net kB/s Disk kB/s
> Tape kB/s Cache Cache CP CP Disk
> ops/s in out read write
> read write age hit time ty util
> 65% 8616 0 0 220068 165749
> 0 0 2s 76% 73% : 71%
> 89% 5443 0 1 338684 239932
> 0 0 3s 80% 94% F 87%
> 95% 6726 0 0 379242 240750
> 0 0 3s 79% 100% : 91%
> 83% 8721 1 0 342940 113394
> 0 0 2s 82% 61% : 89%
> 77% 11971 1 0 314356 24
> 0 0 2s 85% 0% - 92%
>
>
> ... and back to normal.
>
> Looking in Perf Advisor, it was during an
> extremely short, sharp latency spike that impacted every volume on every
> aggregate at exact the same time. It was the fourth of four spikes that
> had happened a few minutes apart during the previous 20 mins.
>
> It looks like the NVRAM flush got stuck
> somehow but I have no idea what would cause that.
>
> We're running ONTAP 8.0.2 7-mode on a very
> busy FAS3170.
>
>
> Thanks,
> Peta
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
>
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>



--
---
Gustatus Similis Pullus
Re: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
...that is to say..if your controller was designed to work with 1.


On Thu, Apr 11, 2013 at 9:00 PM, Jeff Mohler <speedtoys.racing@gmail.com>wrote:

> NVRAM isnt the producer of writes, its not the arbiter of what writes it
> can do..its just matched to what will be kept in RAM an brokered within an
> HA pair.
>
> Plus, some really smart people made the determination. Trust me, 48G of
> NVRAM wont write to disk any faster than 1.
>
> That's it..that simple.
>
>
> On Thu, Apr 11, 2013 at 8:45 PM, Blackburn, James M (Jim) Jr. CIV USARMY
> PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:
>
>> Classification: UNCLASSIFIED
>> Caveats: NONE
>>
>> If NVRAM has no place in a NetApp storage system's I/O operations, why
>> would the size of NVRAM correlate with the I/O capabilities of a NetApp
>> storage system?
>> --
>> Jim Blackburn
>> US Army
>> james.m.blackburn33.civ@mail.mil
>> Note: I am not authorized to obligate, commit, or execute United States
>> government funds. No part of this message should be construed as a request
>> or directive to obligate, commit, or execute United States government funds.
>>
>>
>> -----Original Message-----
>> From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
>> Sent: Thursday, April 11, 2013 3:12 PM
>> To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
>> Cc: Toasters@teaparty.net
>> Subject: Re: NVRAM weirdness (UNCLASSIFIED)
>>
>> Sure. NVRAM only holds a copy of data to be written..if all goes well,
>> it's discarded.
>>
>>
>>
>>
>> On Thu, Apr 11, 2013 at 8:00 PM, Blackburn, James M (Jim) Jr. CIV USARMY
>> PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:
>>
>>
>> Classification: UNCLASSIFIED
>> Caveats: NONE
>>
>> Would "capabilities" happen to be a reference to "I/O
>> capabilities?"
>>
>> --
>> Jim Blackburn
>> US Army
>> james.m.blackburn33.civ@mail.mil
>> Note: I am not authorized to obligate, commit, or execute United
>> States government funds. No part of this message should be construed as a
>> request or directive to obligate, commit, or execute United States
>> government funds.
>>
>>
>> -----Original Message-----
>>
>> From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
>> Sent: Thursday, April 11, 2013 2:56 PM
>> To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
>> Cc: Toasters@teaparty.net
>>
>> Subject: Re: NVRAM weirdness (UNCLASSIFIED)
>>
>> It is sized appropriately to the capabilities it [controller] was
>> designed for.
>>
>> Hence..adding more, would not provide an improvement for any
>> given system.
>>
>>
>>
>>
>>
>> On Thu, Apr 11, 2013 at 7:31 PM, Blackburn, James M (Jim) Jr. CIV
>> USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:
>>
>>
>> Classification: UNCLASSIFIED
>> Caveats: NONE
>>
>> If "NVRAM has no place in the IO operation of a filer",
>> why doesn't every NetApp storage system have the same amount of NVRAM/NVMEM?
>> --
>> Jim Blackburn
>> US Army
>> james.m.blackburn33.civ@mail.mil
>> Note: I am not authorized to obligate, commit, or execute
>> United States government funds. No part of this message should be
>> construed as a request or directive to obligate, commit, or execute United
>> States government funds.
>>
>>
>> -----Original Message-----
>> From: toasters-bounces@teaparty.net [mailto:
>> toasters-bounces@teaparty.net] On Behalf Of Jeff Mohler
>> Sent: Thursday, April 11, 2013 2:10 PM
>> To: Scott Eno
>> Cc: Toasters@teaparty.net
>> Subject: Re: NVRAM weirdness
>>
>> For the OP and others..NVRAM has no place in the IO
>> operation of a filer.
>>
>> It holds copies of data that are in main RAM, just in
>> -case- there is a problem with site power/etc.
>>
>>
>> There has been a continued misunderstanding of how Netapp
>> NVRAM works over time..when in fact, it does no work at all.
>>
>> :)
>>
>>
>> In sysstat..a view of "disks" being 100% busy is more a
>> view of "storage" as a whole. Not 'a' disk. If storage is stuck..then CP
>> time will also be likewise stuck at 100%...
>>
>>
>> Incorrect interpretation of sysstat and how Ontap
>> works...can lead to a lot of confusion, and continued incorrect phraseology.
>>
>>
>>
>>
>>
>>
>> On Thu, Apr 11, 2013 at 12:31 PM, Scott Eno <s.eno@me.com>
>> wrote:
>>
>>
>> Looks like you're hanging on a consistency point
>> operation, but the sysstat output starts after that last CP was logged.
>> This looks very similar to the ":s" CP bug, but can't say for sure. You
>> have a long running CP operation, you NVRAM has filled up waiting for the
>> operation to complete, but with no "s" after the "#", then it's not
>> processing special files. It's doing something else.
>>
>> Could be one of these bugs:
>>
>> 651573 Inefficient pre-fetching of metadata
>> blocks delays WAFL consistency point
>> 651597 Inefficient pre-fetching of metadata
>> blocks delays WAFL consistency point
>> 651926 Inefficient pre-fetching of metadata
>> blocks delays WAFL consistency point
>> 651929 Inefficient pre-fetching of metadata
>> blocks delays WAFL consistency point
>> 655724 Inefficient pre-fetching of metadata
>> blocks delays WAFL consistency point
>> 657018 Inefficient pre-fetching of metadata
>> blocks delays WAFL consistency point
>> 650554 Inefficient pre-fetching of metadata
>> blocks delays WAFL consistency point
>>
>>
>>
>> On Apr 10, 2013, at 6:46 PM, Peta Thames <
>> petathames@gmail.com> wrote:
>>
>>
>> Hi all,
>>
>>
>> Has anyone seen this pattern in a sysstat
>> before?
>>
>> TOASTERa> sysstat -u 1
>> CPU Total Net kB/s Disk
>> kB/s Tape kB/s Cache Cache CP CP Disk
>> ops/s in out read
>> write read write age hit time ty util
>> 29% 36942 1 0 264
>> 0 0 0 7s 91% 100% # 100%
>> 29% 36896 1 0 1616
>> 0 0 0 7s 90% 100% # 100%
>> 28% 36658 1 0 32
>> 24 0 0 7s 100% 100% # 100%
>> 28% 36625 1 0 108
>> 0 0 0 7s 97% 100% # 100%
>> 27% 36318 1 0 136
>> 8 0 0 7s 100% 100% # 100%
>> 28% 36391 1 0 364
>> 144 0 0 7s 97% 100% # 100%
>> 25% 30387 1 0 88
>> 0 0 0 7s 98% 100% # 100%
>> 21% 24450 1 0 8
>> 0 0 0 7s 100% 100% # 100%
>> 20% 22042 1 0 24
>> 672 0 0 7s 100% 100% # 100%
>> 18% 19781 1 0 284
>> 8 0 0 7s 69% 100% # 100%
>> 18% 19636 1 0 0
>> 0 0 0 7s 100% 100% # 100%
>> 16% 18066 1 0 24
>> 24 0 0 7s 100% 100% # 100%
>> 15% 16999 1 0 56
>> 0 0 0 7s 56% 100% # 100%
>> 15% 16622 1 0 11040
>> 6496 0 0 7s 46% 100% # 1824%
>> 14% 15947 1 0 24
>> 32 0 0 7s 100% 100% # 100%
>> 14% 15448 0 0 0
>> 0 0 0 7s 100% 100% # 100%
>> 14% 15902 0 0 3304
>> 4532 0 0 7s 88% 100% # 100%
>> 14% 15800 0 0 32
>> 24 0 0 7s 100% 100% # 100%
>> 15% 16651 0 0 0
>> 0 0 0 7s 100% 100% # 100%
>> CPU Total Net kB/s Disk
>> kB/s Tape kB/s Cache Cache CP CP Disk
>> ops/s in out read
>> write read write age hit time ty util
>> 14% 15778 0 0 1520
>> 2932 0 0 7s 100% 100% # 100%
>> 13% 14311 0 1 28
>> 24 0 0 7s 100% 100% # 100%
>> 12% 12153 0 0 0
>> 0 0 0 7s 100% 100% # 100%
>> 12% 11993 0 0 8
>> 0 0 0 7s 100% 100% # 100%
>> 12% 12134 0 0 7368
>> 5996 0 0 7s 18% 100% # 100%
>> 13% 11981 0 0 20356
>> 14956 <tel:20356%C2%A0%2014956> 0 0 7s 100% 100% #
>> 100%
>> 12% 11978 0 0 0
>> 0 0 0 7s 100% 100% # 100%
>> 11% 11617 0 0 24
>> 24 0 0 7s 100% 100% # 100%
>> 12% 11708 0 0 8
>> 0 0 0 7s 100% 100% # 100%
>> 11% 11630 0 0 11936
>> 17872 0 0 7s 100% 100% # 100%
>> 11% 11639 0 0 24
>> 32 0 0 7s 100% 100% # 100%
>> 12% 11791 0 0 12
>> 0 0 0 7s 87% 100% # 100%
>> 22% 11874 0 0 64452
>> 92188 0 0 7s 100% 100% # 100%
>> 12% 12015 0 0 32
>> 24 0 0 7s 100% 100% # 100%
>> 12% 11954 0 0 556
>> 0 0 0 7s 85% 100% # 100%
>> 12% 11899 0 0 0
>> 8 0 0 7s 100% 100% # 100%
>> 13% 13408 0 0 23816
>> 35600 0 0 7s 91% 100% # 100%
>> 13% 14610 0 0 0
>> 0 0 0 7s 100% 100% # 100%
>> 14% 15355 0 0 1248
>> 0 0 0 7s 72% 100% # 100%
>> 14% 15584 0 0 18652
>> 15224 0 0 7s 98% 100% # 100%
>> CPU Total Net kB/s Disk
>> kB/s Tape kB/s Cache Cache CP CP Disk
>> ops/s in out read
>> write read write age hit time ty util
>> 14% 15397 0 0 0
>> 8 0 0 7s 100% 100% # 100%
>> 15% 15859 0 1 0
>> 0 0 0 7s 100% 100% # 100%
>> 14% 15557 0 0 4944
>> 4568 0 0 7s 100% 100% # 100%
>> 14% 15559 0 0 8
>> 0 0 0 7s 100% 100% # 100%
>> 14% 15603 0 0 0
>> 0 0 0 7s 100% 100% # 100%
>> 14% 15507 0 0 3652
>> 4724 0 0 7s 100% 100% # 100%
>> 14% 15593 0 0 20
>> 0 0 0 7s 75% 100% # 100%
>> 12% 12571 0 0 268
>> 0 0 0 7s 100% 100% # 100%
>> 10% 9962 0 0 92
>> 2072 0 0 7s 96% 100% # 100%
>> 9% 8521 0 0 0
>> 0 0 0 7s 100% 100% # 100%
>> 8% 7680 0 0 4
>> 8 0 0 7s 100% 100% # 100%
>> 8% 7459 0 0 24
>> 24 0 0 7s 100% 100% # 100%
>> 8% 7050 0 0 2580
>> 1600 0 0 7s 100% 100% # 100%
>> 8% 6961 0 0 8
>> 0 0 0 7s 100% 100% # 100%
>> 8% 6954 0 0 24
>> 24 0 0 7s 100% 100% # 100%
>> 7% 6936 0 0 584
>> 2400 0 0 7s 100% 100% # 100%
>> 8% 7043 0 0 248
>> 0 0 0 7s 0% 100% # 100%
>> 7% 6946 0 0 24
>> 24 0 0 7s 100% 100% # 100%
>> 8% 7320 0 0 1860
>> 7252 0 0 7s 100% 100% # 100%
>> 55% 19882 2 2 89920
>> 29733 0 0 1 86% 99% B 45%
>> CPU Total Net kB/s Disk
>> kB/s Tape kB/s Cache Cache CP CP Disk
>> ops/s in out read
>> write read write age hit time ty util
>> 88% 17409 31 187 267319
>> 196942 0 0 6s 83% 100% : 89%
>> 87% 16064 11 249 304646
>> 192912 0 0 1 78% 100% : 85%
>> 85% 6274 70 86 322234
>> 233245 0 0 1 81% 100% # 87%
>> 85% 12486 18 340 236474
>> 102383 0 0 3s 82% 100% B 73%
>> 98% 13842 3 29 247937
>> 255188 0 0 2s 85% 100% : 80%
>> 99% 17878 1 1 348355
>> 223367 0 0 3s 84% 100% # 86%
>> 99% 4117 1 1 359628
>> 269212 0 0 3s 78% 100% # 92%
>> 83% 12622 10 31 252272
>> 170064 0 0 2s 83% 100% B 73%
>> 94% 19321 4 10 248864
>> 249292 0 0 2s 88% 100% : 96%
>> 99% 15893 0 0 318375
>> 261482 0 0 2s 85% 100% : 89%
>> 88% 13326 34 76 284960
>> 194032 0 0 2s 82% 100% B 90%
>> 86% 11106 73 104 267760
>> 231524 0 0 2s 81% 100% : 100%
>> 72% 11160 0 0 206927
>> 128869 0 0 3s 75% 100% : 100%
>> 63% 11304 0 0 220736
>> 25276 0 0 3s 67% 100% : 100%
>> 68% 11216 0 0 193324
>> 15184 0 0 3s 66% 100% : 100%
>> 65% 11343 0 0 175120
>> 12588 0 0 3s 60% 100% : 100%
>> 52% 7384 0 0 205780
>> 104564 0 0 3s 61% 100% # 100%
>> 83% 7205 0 0 261108
>> 178256 0 0 3s 74% 100% B 90%
>> 75% 11256 0 0 225164
>> 96332 0 0 3s 76% 100% : 93%
>> 99% 6279 0 0 296056
>> 251191 0 0 2s 87% 100% : 90%
>> CPU Total Net kB/s Disk
>> kB/s Tape kB/s Cache Cache CP CP Disk
>> ops/s in out read
>> write read write age hit time ty util
>> 65% 8616 0 0 220068
>> 165749 0 0 2s 76% 73% : 71%
>> 89% 5443 0 1 338684
>> 239932 0 0 3s 80% 94% F 87%
>> 95% 6726 0 0 379242
>> 240750 0 0 3s 79% 100% : 91%
>> 83% 8721 1 0 342940
>> 113394 0 0 2s 82% 61% : 89%
>> 77% 11971 1 0 314356
>> 24 0 0 2s 85% 0% - 92%
>>
>>
>> ... and back to normal.
>>
>> Looking in Perf Advisor, it was during an
>> extremely short, sharp latency spike that impacted every volume on every
>> aggregate at exact the same time. It was the fourth of four spikes that
>> had happened a few minutes apart during the previous 20 mins.
>>
>> It looks like the NVRAM flush got stuck
>> somehow but I have no idea what would cause that.
>>
>> We're running ONTAP 8.0.2 7-mode on a
>> very busy FAS3170.
>>
>>
>> Thanks,
>> Peta
>>
>>
>> _______________________________________________
>> Toasters mailing list
>> Toasters@teaparty.net
>>
>> http://www.teaparty.net/mailman/listinfo/toasters
>>
>>
>>
>>
>> _______________________________________________
>> Toasters mailing list
>> Toasters@teaparty.net
>> http://www.teaparty.net/mailman/listinfo/toasters
>>
>>
>>
>>
>>
>>
>> --
>> ---
>> Gustatus Similis Pullus
>>
>> Classification: UNCLASSIFIED
>> Caveats: NONE
>>
>>
>>
>> _______________________________________________
>> Toasters mailing list
>> Toasters@teaparty.net
>> http://www.teaparty.net/mailman/listinfo/toasters
>>
>>
>>
>>
>>
>> --
>> ---
>> Gustatus Similis Pullus
>>
>> Classification: UNCLASSIFIED
>> Caveats: NONE
>>
>>
>>
>> _______________________________________________
>> Toasters mailing list
>> Toasters@teaparty.net
>> http://www.teaparty.net/mailman/listinfo/toasters
>>
>>
>>
>>
>>
>> --
>> ---
>> Gustatus Similis Pullus
>>
>> Classification: UNCLASSIFIED
>> Caveats: NONE
>>
>>
>>
>> _______________________________________________
>> Toasters mailing list
>> Toasters@teaparty.net
>> http://www.teaparty.net/mailman/listinfo/toasters
>>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>



--
---
Gustatus Similis Pullus
Re: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> If NVRAM has no place in a NetApp storage system's I/O operations, why would the size of NVRAM correlate with the I/O capabilities of a NetApp storage system?
> --

I think the point is that I/O does not pass through NVRAM, i.e., NVRAM
is not used as a disk cache. It is used as a transaction log and in
the event of an outage, the NVRAM log is replayed. When a write request
arrives, the request is logged to NVRAM and the filer ACKs the request.
"At its leisure" the filer commits a batch of write requests to disk
by writing a CP (consistency point). Then the corresponding NVRAM
log storage is reused to log new requests. When NVRAM approaches
capacity, the filer MUST write a CP. Otherwise there is no room in
NVRAM to log any more incoming write requests, forcing the filer to
reject them.

So the size of NVRAM determines to some degree how much of a backlog
of uncommitted write requests that a filer can tolerate.

> Jim Blackburn
> US Army
> james.m.blackburn33.civ@mail.mil
> Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.
>
>
> -----Original Message-----
> From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
> Sent: Thursday, April 11, 2013 3:12 PM
> To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
> Cc: Toasters@teaparty.net
> Subject: Re: NVRAM weirdness (UNCLASSIFIED)
>
> Sure. NVRAM only holds a copy of data to be written..if all goes well, it's discarded.
>
>
>
>
> On Thu, Apr 11, 2013 at 8:00 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:
>
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> Would "capabilities" happen to be a reference to "I/O capabilities?"
>
> --
> Jim Blackburn
> US Army
> james.m.blackburn33.civ@mail.mil
> Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.
>
>
> -----Original Message-----
>
> From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
> Sent: Thursday, April 11, 2013 2:56 PM
> To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
> Cc: Toasters@teaparty.net
>
> Subject: Re: NVRAM weirdness (UNCLASSIFIED)
>
> It is sized appropriately to the capabilities it [controller] was designed for.
>
> Hence..adding more, would not provide an improvement for any given system.
>
>
>
>
>
> On Thu, Apr 11, 2013 at 7:31 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:
>
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> If "NVRAM has no place in the IO operation of a filer", why doesn't every NetApp storage system have the same amount of NVRAM/NVMEM?
> --
> Jim Blackburn
> US Army
> james.m.blackburn33.civ@mail.mil
> Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.
>
>
> -----Original Message-----
> From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Jeff Mohler
> Sent: Thursday, April 11, 2013 2:10 PM
> To: Scott Eno
> Cc: Toasters@teaparty.net
> Subject: Re: NVRAM weirdness
>
> For the OP and others..NVRAM has no place in the IO operation of a filer.
>
> It holds copies of data that are in main RAM, just in -case- there is a problem with site power/etc.
>
>
> There has been a continued misunderstanding of how Netapp NVRAM works over time..when in fact, it does no work at all.
>
> :)
>
>
> In sysstat..a view of "disks" being 100% busy is more a view of "storage" as a whole. Not 'a' disk. If storage is stuck..then CP time will also be likewise stuck at 100%...
>
>
> Incorrect interpretation of sysstat and how Ontap works...can lead to a lot of confusion, and continued incorrect phraseology.
>
>
>
>
>
>
> On Thu, Apr 11, 2013 at 12:31 PM, Scott Eno <s.eno@me.com> wrote:
>
>
> Looks like you're hanging on a consistency point operation, but the sysstat output starts after that last CP was logged. This looks very similar to the ":s" CP bug, but can't say for sure. You have a long running CP operation, you NVRAM has filled up waiting for the operation to complete, but with no "s" after the "#", then it's not processing special files. It's doing something else.
>
> Could be one of these bugs:
>
> 651573 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
> 651597 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
> 651926 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
> 651929 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
> 655724 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
> 657018 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
> 650554 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
>
>
>
> On Apr 10, 2013, at 6:46 PM, Peta Thames <petathames@gmail.com> wrote:
>
>
> Hi all,
>
>
> Has anyone seen this pattern in a sysstat before?
>
> TOASTERa> sysstat -u 1
> CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
> ops/s in out read write read write age hit time ty util
> 29% 36942 1 0 264 0 0 0 7s 91% 100% # 100%
> 29% 36896 1 0 1616 0 0 0 7s 90% 100% # 100%
> 28% 36658 1 0 32 24 0 0 7s 100% 100% # 100%
> 28% 36625 1 0 108 0 0 0 7s 97% 100% # 100%
> 27% 36318 1 0 136 8 0 0 7s 100% 100% # 100%
> 28% 36391 1 0 364 144 0 0 7s 97% 100% # 100%
> 25% 30387 1 0 88 0 0 0 7s 98% 100% # 100%
> 21% 24450 1 0 8 0 0 0 7s 100% 100% # 100%
> 20% 22042 1 0 24 672 0 0 7s 100% 100% # 100%
> 18% 19781 1 0 284 8 0 0 7s 69% 100% # 100%
> 18% 19636 1 0 0 0 0 0 7s 100% 100% # 100%
> 16% 18066 1 0 24 24 0 0 7s 100% 100% # 100%
> 15% 16999 1 0 56 0 0 0 7s 56% 100% # 100%
> 15% 16622 1 0 11040 6496 0 0 7s 46% 100% # 1824%
> 14% 15947 1 0 24 32 0 0 7s 100% 100% # 100%
> 14% 15448 0 0 0 0 0 0 7s 100% 100% # 100%
> 14% 15902 0 0 3304 4532 0 0 7s 88% 100% # 100%
> 14% 15800 0 0 32 24 0 0 7s 100% 100% # 100%
> 15% 16651 0 0 0 0 0 0 7s 100% 100% # 100%
> CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
> ops/s in out read write read write age hit time ty util
> 14% 15778 0 0 1520 2932 0 0 7s 100% 100% # 100%
> 13% 14311 0 1 28 24 0 0 7s 100% 100% # 100%
> 12% 12153 0 0 0 0 0 0 7s 100% 100% # 100%
> 12% 11993 0 0 8 0 0 0 7s 100% 100% # 100%
> 12% 12134 0 0 7368 5996 0 0 7s 18% 100% # 100%
> 13% 11981 0 0 20356 14956 <tel:20356%C2%A0%2014956> 0 0 7s 100% 100% # 100%
> 12% 11978 0 0 0 0 0 0 7s 100% 100% # 100%
> 11% 11617 0 0 24 24 0 0 7s 100% 100% # 100%
> 12% 11708 0 0 8 0 0 0 7s 100% 100% # 100%
> 11% 11630 0 0 11936 17872 0 0 7s 100% 100% # 100%
> 11% 11639 0 0 24 32 0 0 7s 100% 100% # 100%
> 12% 11791 0 0 12 0 0 0 7s 87% 100% # 100%
> 22% 11874 0 0 64452 92188 0 0 7s 100% 100% # 100%
> 12% 12015 0 0 32 24 0 0 7s 100% 100% # 100%
> 12% 11954 0 0 556 0 0 0 7s 85% 100% # 100%
> 12% 11899 0 0 0 8 0 0 7s 100% 100% # 100%
> 13% 13408 0 0 23816 35600 0 0 7s 91% 100% # 100%
> 13% 14610 0 0 0 0 0 0 7s 100% 100% # 100%
> 14% 15355 0 0 1248 0 0 0 7s 72% 100% # 100%
> 14% 15584 0 0 18652 15224 0 0 7s 98% 100% # 100%
> CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
> ops/s in out read write read write age hit time ty util
> 14% 15397 0 0 0 8 0 0 7s 100% 100% # 100%
> 15% 15859 0 1 0 0 0 0 7s 100% 100% # 100%
> 14% 15557 0 0 4944 4568 0 0 7s 100% 100% # 100%
> 14% 15559 0 0 8 0 0 0 7s 100% 100% # 100%
> 14% 15603 0 0 0 0 0 0 7s 100% 100% # 100%
> 14% 15507 0 0 3652 4724 0 0 7s 100% 100% # 100%
> 14% 15593 0 0 20 0 0 0 7s 75% 100% # 100%
> 12% 12571 0 0 268 0 0 0 7s 100% 100% # 100%
> 10% 9962 0 0 92 2072 0 0 7s 96% 100% # 100%
> 9% 8521 0 0 0 0 0 0 7s 100% 100% # 100%
> 8% 7680 0 0 4 8 0 0 7s 100% 100% # 100%
> 8% 7459 0 0 24 24 0 0 7s 100% 100% # 100%
> 8% 7050 0 0 2580 1600 0 0 7s 100% 100% # 100%
> 8% 6961 0 0 8 0 0 0 7s 100% 100% # 100%
> 8% 6954 0 0 24 24 0 0 7s 100% 100% # 100%
> 7% 6936 0 0 584 2400 0 0 7s 100% 100% # 100%
> 8% 7043 0 0 248 0 0 0 7s 0% 100% # 100%
> 7% 6946 0 0 24 24 0 0 7s 100% 100% # 100%
> 8% 7320 0 0 1860 7252 0 0 7s 100% 100% # 100%
> 55% 19882 2 2 89920 29733 0 0 1 86% 99% B 45%
> CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
> ops/s in out read write read write age hit time ty util
> 88% 17409 31 187 267319 196942 0 0 6s 83% 100% : 89%
> 87% 16064 11 249 304646 192912 0 0 1 78% 100% : 85%
> 85% 6274 70 86 322234 233245 0 0 1 81% 100% # 87%
> 85% 12486 18 340 236474 102383 0 0 3s 82% 100% B 73%
> 98% 13842 3 29 247937 255188 0 0 2s 85% 100% : 80%
> 99% 17878 1 1 348355 223367 0 0 3s 84% 100% # 86%
> 99% 4117 1 1 359628 269212 0 0 3s 78% 100% # 92%
> 83% 12622 10 31 252272 170064 0 0 2s 83% 100% B 73%
> 94% 19321 4 10 248864 249292 0 0 2s 88% 100% : 96%
> 99% 15893 0 0 318375 261482 0 0 2s 85% 100% : 89%
> 88% 13326 34 76 284960 194032 0 0 2s 82% 100% B 90%
> 86% 11106 73 104 267760 231524 0 0 2s 81% 100% : 100%
> 72% 11160 0 0 206927 128869 0 0 3s 75% 100% : 100%
> 63% 11304 0 0 220736 25276 0 0 3s 67% 100% : 100%
> 68% 11216 0 0 193324 15184 0 0 3s 66% 100% : 100%
> 65% 11343 0 0 175120 12588 0 0 3s 60% 100% : 100%
> 52% 7384 0 0 205780 104564 0 0 3s 61% 100% # 100%
> 83% 7205 0 0 261108 178256 0 0 3s 74% 100% B 90%
> 75% 11256 0 0 225164 96332 0 0 3s 76% 100% : 93%
> 99% 6279 0 0 296056 251191 0 0 2s 87% 100% : 90%
> CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
> ops/s in out read write read write age hit time ty util
> 65% 8616 0 0 220068 165749 0 0 2s 76% 73% : 71%
> 89% 5443 0 1 338684 239932 0 0 3s 80% 94% F 87%
> 95% 6726 0 0 379242 240750 0 0 3s 79% 100% : 91%
> 83% 8721 1 0 342940 113394 0 0 2s 82% 61% : 89%
> 77% 11971 1 0 314356 24 0 0 2s 85% 0% - 92%
>
>
> ... and back to normal.
>
> Looking in Perf Advisor, it was during an extremely short, sharp latency spike that impacted every volume on every aggregate at exact the same time. It was the fourth of four spikes that had happened a few minutes apart during the previous 20 mins.
>
> It looks like the NVRAM flush got stuck somehow but I have no idea what would cause that.
>
> We're running ONTAP 8.0.2 7-mode on a very busy FAS3170.
>
>
> Thanks,
> Peta
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>



Steve Losen scl@virginia.edu phone: 434-924-0640

University of Virginia ITC Unix Support


_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters
Re: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) wrote:
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> If "NVRAM has no place in the IO operation of a filer", why doesn't every
> NetApp storage system have the same amount of NVRAM/NVMEM?

It all depends on what the definition of 'has a place' is.
I'd argue that the NVRAM is involved in the I/O operation of a Filer.

Putting copies of data (which sits in RAM) into NVRAM, mucking about with it
(to keep it 100% consistent at all times in case of something sinister
happening at *any* time!), and making sure it's emptied out etc etc kindof
puts it in the I/O path w.r.t. capacity. It's an internal limiting factor
inside the ONTAP kernel so...

The size of the NVRAM is effectively limited by the desired cluster pair
failover time. The contents have to to be mirrored across the CI link all
the time as well. It's an optimisation problem, like so many other things
inside a storage system of the complexity that a FAS box has

In the coming new hi-end FAS models (not before early 2014 is my guess), the
NVRAM will be larger than now due to other things mitigating the increasing
failover time. (It's a whole new HW platform.)

/M
--
Michael Bergman
Sr Systems Analyst / Storage Architect
Engineering Hub Stockholm Phone +46 10 7152945
Service Delivery, Engineering EMEA N SMS/MMS +46 70 5480835
Ericsson Torshamnsg 33, 16480 Sthlm, Sweden
--
This communication is confidential. We only send and receive email on the
basis of the terms set out at www.ericsson.com/email_disclaimer
_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters
Re: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
Main RAM holds writes.

When the area in main RAM reaches its capacity, things are kicked off.

NVRAM is a paralleled part of the life of a write cycle, but it's not a
step that a write passes through.

I see the differences in how its explained..both are right..but the
original intent, was to remove NVRAM as the culprit of an IO problem.

Short of the battery being dead, or the memory on the card itself
failing..its not in play.


On Thu, Apr 11, 2013 at 9:23 PM, Michael Bergman <
michael.bergman@ericsson.com> wrote:

> Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) wrote:
>
>> Classification: UNCLASSIFIED
>> Caveats: NONE
>>
>> If "NVRAM has no place in the IO operation of a filer", why doesn't every
>> NetApp storage system have the same amount of NVRAM/NVMEM?
>>
>
> It all depends on what the definition of 'has a place' is.
> I'd argue that the NVRAM is involved in the I/O operation of a Filer.
>
> Putting copies of data (which sits in RAM) into NVRAM, mucking about with
> it (to keep it 100% consistent at all times in case of something sinister
> happening at *any* time!), and making sure it's emptied out etc etc kindof
> puts it in the I/O path w.r.t. capacity. It's an internal limiting factor
> inside the ONTAP kernel so...
>
> The size of the NVRAM is effectively limited by the desired cluster pair
> failover time. The contents have to to be mirrored across the CI link all
> the time as well. It's an optimisation problem, like so many other things
> inside a storage system of the complexity that a FAS box has
>
> In the coming new hi-end FAS models (not before early 2014 is my guess),
> the NVRAM will be larger than now due to other things mitigating the
> increasing failover time. (It's a whole new HW platform.)
>
> /M
> --
> Michael Bergman
> Sr Systems Analyst / Storage Architect
> Engineering Hub Stockholm Phone +46 10 7152945
> Service Delivery, Engineering EMEA N SMS/MMS +46 70 5480835
> Ericsson Torshamnsg 33, 16480 Sthlm, Sweden
> --
> This communication is confidential. We only send and receive email on the
> basis of the terms set out at www.ericsson.com/email_**disclaimer<http://www.ericsson.com/email_disclaimer>
>
> ______________________________**_________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/**mailman/listinfo/toasters<http://www.teaparty.net/mailman/listinfo/toasters>
>



--
---
Gustatus Similis Pullus
RE: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
Classification: UNCLASSIFIED
Caveats: NONE

While I think it's rational to state that increasing NVRAM won't result in faster writes to disk, I don't think that statement addresses the premise that "NVRAM has no place in the IO operation of a filer". Here's a quote from NetApp KB article 3011527:

***
The NetApp Storage Controller has two buffers for accepting and caching write data. The Storage Controller can also only process one Consistency Point at a time due to this buffered writing scenario. Under certain circumstances, while one Consistency Point is being processed and written to disk, the second memory buffer can become full prior to the previous Consistency Point being completed. Since the Consistency Point process is global ( meaning ALL writes flow through this mechanism ), a Storage Controller in this situation must stop acknowledging ALL incoming write data requests until the first Consistency Point is completed and the corresponding Non-Volatile RAM (NVRAM) and local memory buffers are cleared.
***
--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----
From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
Sent: Thursday, April 11, 2013 4:01 PM
To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
Cc: Toasters@teaparty.net
Subject: Re: NVRAM weirdness (UNCLASSIFIED)

...that is to say..if your controller was designed to work with 1.



On Thu, Apr 11, 2013 at 9:00 PM, Jeff Mohler <speedtoys.racing@gmail.com> wrote:


NVRAM isnt the producer of writes, its not the arbiter of what writes it can do..its just matched to what will be kept in RAM an brokered within an HA pair.


Plus, some really smart people made the determination. Trust me, 48G of NVRAM wont write to disk any faster than 1.


That's it..that simple.



On Thu, Apr 11, 2013 at 8:45 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:


Classification: UNCLASSIFIED
Caveats: NONE

If NVRAM has no place in a NetApp storage system's I/O operations, why would the size of NVRAM correlate with the I/O capabilities of a NetApp storage system?

--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----
From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]

Sent: Thursday, April 11, 2013 3:12 PM
To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
Cc: Toasters@teaparty.net
Subject: Re: NVRAM weirdness (UNCLASSIFIED)

Sure. NVRAM only holds a copy of data to be written..if all goes well, it's discarded.




On Thu, Apr 11, 2013 at 8:00 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:


Classification: UNCLASSIFIED
Caveats: NONE

Would "capabilities" happen to be a reference to "I/O capabilities?"

--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----

From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
Sent: Thursday, April 11, 2013 2:56 PM
To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
Cc: Toasters@teaparty.net

Subject: Re: NVRAM weirdness (UNCLASSIFIED)

It is sized appropriately to the capabilities it [controller] was designed for.

Hence..adding more, would not provide an improvement for any given system.





On Thu, Apr 11, 2013 at 7:31 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:


Classification: UNCLASSIFIED
Caveats: NONE

If "NVRAM has no place in the IO operation of a filer", why doesn't every NetApp storage system have the same amount of NVRAM/NVMEM?
--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----
From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Jeff Mohler
Sent: Thursday, April 11, 2013 2:10 PM
To: Scott Eno
Cc: Toasters@teaparty.net
Subject: Re: NVRAM weirdness

For the OP and others..NVRAM has no place in the IO operation of a filer.

It holds copies of data that are in main RAM, just in -case- there is a problem with site power/etc.


There has been a continued misunderstanding of how Netapp NVRAM works over time..when in fact, it does no work at all.

:)


In sysstat..a view of "disks" being 100% busy is more a view of "storage" as a whole. Not 'a' disk. If storage is stuck..then CP time will also be likewise stuck at 100%...


Incorrect interpretation of sysstat and how Ontap works...can lead to a lot of confusion, and continued incorrect phraseology.






On Thu, Apr 11, 2013 at 12:31 PM, Scott Eno <s.eno@me.com> wrote:


Looks like you're hanging on a consistency point operation, but the sysstat output starts after that last CP was logged. This looks very similar to the ":s" CP bug, but can't say for sure. You have a long running CP operation, you NVRAM has filled up waiting for the operation to complete, but with no "s" after the "#", then it's not processing special files. It's doing something else.

Could be one of these bugs:

651573 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651597 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651926 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651929 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
655724 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
657018 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
650554 Inefficient pre-fetching of metadata blocks delays WAFL consistency point



On Apr 10, 2013, at 6:46 PM, Peta Thames <petathames@gmail.com> wrote:


Hi all,


Has anyone seen this pattern in a sysstat before?

TOASTERa> sysstat -u 1
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
29% 36942 1 0 264 0 0 0 7s 91% 100% # 100%
29% 36896 1 0 1616 0 0 0 7s 90% 100% # 100%
28% 36658 1 0 32 24 0 0 7s 100% 100% # 100%
28% 36625 1 0 108 0 0 0 7s 97% 100% # 100%
27% 36318 1 0 136 8 0 0 7s 100% 100% # 100%
28% 36391 1 0 364 144 0 0 7s 97% 100% # 100%
25% 30387 1 0 88 0 0 0 7s 98% 100% # 100%
21% 24450 1 0 8 0 0 0 7s 100% 100% # 100%
20% 22042 1 0 24 672 0 0 7s 100% 100% # 100%
18% 19781 1 0 284 8 0 0 7s 69% 100% # 100%
18% 19636 1 0 0 0 0 0 7s 100% 100% # 100%
16% 18066 1 0 24 24 0 0 7s 100% 100% # 100%
15% 16999 1 0 56 0 0 0 7s 56% 100% # 100%
15% 16622 1 0 11040 6496 0 0 7s 46% 100% # 1824%
14% 15947 1 0 24 32 0 0 7s 100% 100% # 100%
14% 15448 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15902 0 0 3304 4532 0 0 7s 88% 100% # 100%
14% 15800 0 0 32 24 0 0 7s 100% 100% # 100%
15% 16651 0 0 0 0 0 0 7s 100% 100% # 100%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
14% 15778 0 0 1520 2932 0 0 7s 100% 100% # 100%
13% 14311 0 1 28 24 0 0 7s 100% 100% # 100%
12% 12153 0 0 0 0 0 0 7s 100% 100% # 100%
12% 11993 0 0 8 0 0 0 7s 100% 100% # 100%
12% 12134 0 0 7368 5996 0 0 7s 18% 100% # 100%
13% 11981 0 0 20356 14956 <tel:20356%20%C2%A014956> <tel:20356%C2%A0%2014956> 0 0 7s 100% 100% # 100%
12% 11978 0 0 0 0 0 0 7s 100% 100% # 100%
11% 11617 0 0 24 24 0 0 7s 100% 100% # 100%
12% 11708 0 0 8 0 0 0 7s 100% 100% # 100%
11% 11630 0 0 11936 17872 0 0 7s 100% 100% # 100%
11% 11639 0 0 24 32 0 0 7s 100% 100% # 100%
12% 11791 0 0 12 0 0 0 7s 87% 100% # 100%
22% 11874 0 0 64452 92188 0 0 7s 100% 100% # 100%
12% 12015 0 0 32 24 0 0 7s 100% 100% # 100%
12% 11954 0 0 556 0 0 0 7s 85% 100% # 100%
12% 11899 0 0 0 8 0 0 7s 100% 100% # 100%
13% 13408 0 0 23816 35600 0 0 7s 91% 100% # 100%
13% 14610 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15355 0 0 1248 0 0 0 7s 72% 100% # 100%
14% 15584 0 0 18652 15224 0 0 7s 98% 100% # 100%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
14% 15397 0 0 0 8 0 0 7s 100% 100% # 100%
15% 15859 0 1 0 0 0 0 7s 100% 100% # 100%
14% 15557 0 0 4944 4568 0 0 7s 100% 100% # 100%
14% 15559 0 0 8 0 0 0 7s 100% 100% # 100%
14% 15603 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15507 0 0 3652 4724 0 0 7s 100% 100% # 100%
14% 15593 0 0 20 0 0 0 7s 75% 100% # 100%
12% 12571 0 0 268 0 0 0 7s 100% 100% # 100%
10% 9962 0 0 92 2072 0 0 7s 96% 100% # 100%
9% 8521 0 0 0 0 0 0 7s 100% 100% # 100%
8% 7680 0 0 4 8 0 0 7s 100% 100% # 100%
8% 7459 0 0 24 24 0 0 7s 100% 100% # 100%
8% 7050 0 0 2580 1600 0 0 7s 100% 100% # 100%
8% 6961 0 0 8 0 0 0 7s 100% 100% # 100%
8% 6954 0 0 24 24 0 0 7s 100% 100% # 100%
7% 6936 0 0 584 2400 0 0 7s 100% 100% # 100%
8% 7043 0 0 248 0 0 0 7s 0% 100% # 100%
7% 6946 0 0 24 24 0 0 7s 100% 100% # 100%
8% 7320 0 0 1860 7252 0 0 7s 100% 100% # 100%
55% 19882 2 2 89920 29733 0 0 1 86% 99% B 45%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
88% 17409 31 187 267319 196942 0 0 6s 83% 100% : 89%
87% 16064 11 249 304646 192912 0 0 1 78% 100% : 85%
85% 6274 70 86 322234 233245 0 0 1 81% 100% # 87%
85% 12486 18 340 236474 102383 0 0 3s 82% 100% B 73%
98% 13842 3 29 247937 255188 0 0 2s 85% 100% : 80%
99% 17878 1 1 348355 223367 0 0 3s 84% 100% # 86%
99% 4117 1 1 359628 269212 0 0 3s 78% 100% # 92%
83% 12622 10 31 252272 170064 0 0 2s 83% 100% B 73%
94% 19321 4 10 248864 249292 0 0 2s 88% 100% : 96%
99% 15893 0 0 318375 261482 0 0 2s 85% 100% : 89%
88% 13326 34 76 284960 194032 0 0 2s 82% 100% B 90%
86% 11106 73 104 267760 231524 0 0 2s 81% 100% : 100%
72% 11160 0 0 206927 128869 0 0 3s 75% 100% : 100%
63% 11304 0 0 220736 25276 0 0 3s 67% 100% : 100%
68% 11216 0 0 193324 15184 0 0 3s 66% 100% : 100%
65% 11343 0 0 175120 12588 0 0 3s 60% 100% : 100%
52% 7384 0 0 205780 104564 0 0 3s 61% 100% # 100%
83% 7205 0 0 261108 178256 0 0 3s 74% 100% B 90%
75% 11256 0 0 225164 96332 0 0 3s 76% 100% : 93%
99% 6279 0 0 296056 251191 0 0 2s 87% 100% : 90%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
65% 8616 0 0 220068 165749 0 0 2s 76% 73% : 71%
89% 5443 0 1 338684 239932 0 0 3s 80% 94% F 87%
95% 6726 0 0 379242 240750 0 0 3s 79% 100% : 91%
83% 8721 1 0 342940 113394 0 0 2s 82% 61% : 89%
77% 11971 1 0 314356 24 0 0 2s 85% 0% - 92%


... and back to normal.

Looking in Perf Advisor, it was during an extremely short, sharp latency spike that impacted every volume on every aggregate at exact the same time. It was the fourth of four spikes that had happened a few minutes apart during the previous 20 mins.

It looks like the NVRAM flush got stuck somehow but I have no idea what would cause that.

We're running ONTAP 8.0.2 7-mode on a very busy FAS3170.


Thanks,
Peta

_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters




_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters






--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters





--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters





--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters





--
---
Gustatus Similis Pullus




--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters
Re: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
Correct: A copy is placed into NVRAM in parallel to main RAM, RAM
allocation fills, CP happens, contents of both are thrown away.

Other magic happens in an HA pair..but the same process is followed.


On Thu, Apr 11, 2013 at 9:32 PM, Blackburn, James M (Jim) Jr. CIV USARMY
PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:

> Classification: UNCLASSIFIED
> Caveats: NONE
>
> While I think it's rational to state that increasing NVRAM won't result in
> faster writes to disk, I don't think that statement addresses the premise
> that "NVRAM has no place in the IO operation of a filer". Here's a quote
> from NetApp KB article 3011527:
>
> ***
> The NetApp Storage Controller has two buffers for accepting and caching
> write data. The Storage Controller can also only process one Consistency
> Point at a time due to this buffered writing scenario. Under certain
> circumstances, while one Consistency Point is being processed and written
> to disk, the second memory buffer can become full prior to the previous
> Consistency Point being completed. Since the Consistency Point process is
> global ( meaning ALL writes flow through this mechanism ), a Storage
> Controller in this situation must stop acknowledging ALL incoming write
> data requests until the first Consistency Point is completed and the
> corresponding Non-Volatile RAM (NVRAM) and local memory buffers are cleared.
> ***
> --
> Jim Blackburn
> US Army
> james.m.blackburn33.civ@mail.mil
> Note: I am not authorized to obligate, commit, or execute United States
> government funds. No part of this message should be construed as a request
> or directive to obligate, commit, or execute United States government funds.
>
>
> -----Original Message-----
> From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
> Sent: Thursday, April 11, 2013 4:01 PM
> To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
> Cc: Toasters@teaparty.net
> Subject: Re: NVRAM weirdness (UNCLASSIFIED)
>
> ...that is to say..if your controller was designed to work with 1.
>
>
>
> On Thu, Apr 11, 2013 at 9:00 PM, Jeff Mohler <speedtoys.racing@gmail.com>
> wrote:
>
>
> NVRAM isnt the producer of writes, its not the arbiter of what
> writes it can do..its just matched to what will be kept in RAM an brokered
> within an HA pair.
>
>
> Plus, some really smart people made the determination. Trust me,
> 48G of NVRAM wont write to disk any faster than 1.
>
>
> That's it..that simple.
>
>
>
> On Thu, Apr 11, 2013 at 8:45 PM, Blackburn, James M (Jim) Jr. CIV
> USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:
>
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> If NVRAM has no place in a NetApp storage system's I/O
> operations, why would the size of NVRAM correlate with the I/O capabilities
> of a NetApp storage system?
>
> --
> Jim Blackburn
> US Army
> james.m.blackburn33.civ@mail.mil
> Note: I am not authorized to obligate, commit, or execute
> United States government funds. No part of this message should be
> construed as a request or directive to obligate, commit, or execute United
> States government funds.
>
>
> -----Original Message-----
> From: Jeff Mohler [mailto:speedtoys.racing@gmail.com]
>
> Sent: Thursday, April 11, 2013 3:12 PM
> To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
> Cc: Toasters@teaparty.net
> Subject: Re: NVRAM weirdness (UNCLASSIFIED)
>
> Sure. NVRAM only holds a copy of data to be written..if
> all goes well, it's discarded.
>
>
>
>
> On Thu, Apr 11, 2013 at 8:00 PM, Blackburn, James M (Jim)
> Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil> wrote:
>
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> Would "capabilities" happen to be a reference to
> "I/O capabilities?"
>
> --
> Jim Blackburn
> US Army
> james.m.blackburn33.civ@mail.mil
> Note: I am not authorized to obligate, commit, or
> execute United States government funds. No part of this message should be
> construed as a request or directive to obligate, commit, or execute United
> States government funds.
>
>
> -----Original Message-----
>
> From: Jeff Mohler [mailto:
> speedtoys.racing@gmail.com]
> Sent: Thursday, April 11, 2013 2:56 PM
> To: Blackburn, James M (Jim) Jr. CIV USARMY PEO
> EIS (US)
> Cc: Toasters@teaparty.net
>
> Subject: Re: NVRAM weirdness (UNCLASSIFIED)
>
> It is sized appropriately to the capabilities it
> [controller] was designed for.
>
> Hence..adding more, would not provide an
> improvement for any given system.
>
>
>
>
>
> On Thu, Apr 11, 2013 at 7:31 PM, Blackburn, James
> M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil>
> wrote:
>
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
> If "NVRAM has no place in the IO operation
> of a filer", why doesn't every NetApp storage system have the same amount
> of NVRAM/NVMEM?
> --
> Jim Blackburn
> US Army
> james.m.blackburn33.civ@mail.mil
> Note: I am not authorized to obligate,
> commit, or execute United States government funds. No part of this message
> should be construed as a request or directive to obligate, commit, or
> execute United States government funds.
>
>
> -----Original Message-----
> From: toasters-bounces@teaparty.net[mailto:
> toasters-bounces@teaparty.net] On Behalf Of Jeff Mohler
> Sent: Thursday, April 11, 2013 2:10 PM
> To: Scott Eno
> Cc: Toasters@teaparty.net
> Subject: Re: NVRAM weirdness
>
> For the OP and others..NVRAM has no place
> in the IO operation of a filer.
>
> It holds copies of data that are in main
> RAM, just in -case- there is a problem with site power/etc.
>
>
> There has been a continued
> misunderstanding of how Netapp NVRAM works over time..when in fact, it does
> no work at all.
>
> :)
>
>
> In sysstat..a view of "disks" being 100%
> busy is more a view of "storage" as a whole. Not 'a' disk. If storage is
> stuck..then CP time will also be likewise stuck at 100%...
>
>
> Incorrect interpretation of sysstat and
> how Ontap works...can lead to a lot of confusion, and continued incorrect
> phraseology.
>
>
>
>
>
>
> On Thu, Apr 11, 2013 at 12:31 PM, Scott
> Eno <s.eno@me.com> wrote:
>
>
> Looks like you're hanging on a
> consistency point operation, but the sysstat output starts after that last
> CP was logged. This looks very similar to the ":s" CP bug, but can't say
> for sure. You have a long running CP operation, you NVRAM has filled up
> waiting for the operation to complete, but with no "s" after the "#", then
> it's not processing special files. It's doing something else.
>
> Could be one of these bugs:
>
> 651573 Inefficient pre-fetching of
> metadata blocks delays WAFL consistency point
> 651597 Inefficient pre-fetching of
> metadata blocks delays WAFL consistency point
> 651926 Inefficient pre-fetching of
> metadata blocks delays WAFL consistency point
> 651929 Inefficient pre-fetching of
> metadata blocks delays WAFL consistency point
> 655724 Inefficient pre-fetching of
> metadata blocks delays WAFL consistency point
> 657018 Inefficient pre-fetching of
> metadata blocks delays WAFL consistency point
> 650554 Inefficient pre-fetching of
> metadata blocks delays WAFL consistency point
>
>
>
> On Apr 10, 2013, at 6:46 PM, Peta
> Thames <petathames@gmail.com> wrote:
>
>
> Hi all,
>
>
> Has anyone seen this
> pattern in a sysstat before?
>
> TOASTERa> sysstat -u 1
> CPU Total Net
> kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
> ops/s in
> out read write read write age hit time ty util
> 29% 36942 1
> 0 264 0 0 0 7s 91% 100% # 100%
> 29% 36896 1
> 0 1616 0 0 0 7s 90% 100% # 100%
> 28% 36658 1
> 0 32 24 0 0 7s 100% 100% # 100%
> 28% 36625 1
> 0 108 0 0 0 7s 97% 100% # 100%
> 27% 36318 1
> 0 136 8 0 0 7s 100% 100% # 100%
> 28% 36391 1
> 0 364 144 0 0 7s 97% 100% # 100%
> 25% 30387 1
> 0 88 0 0 0 7s 98% 100% # 100%
> 21% 24450 1
> 0 8 0 0 0 7s 100% 100% # 100%
> 20% 22042 1
> 0 24 672 0 0 7s 100% 100% # 100%
> 18% 19781 1
> 0 284 8 0 0 7s 69% 100% # 100%
> 18% 19636 1
> 0 0 0 0 0 7s 100% 100% # 100%
> 16% 18066 1
> 0 24 24 0 0 7s 100% 100% # 100%
> 15% 16999 1
> 0 56 0 0 0 7s 56% 100% # 100%
> 15% 16622 1
> 0 11040 6496 0 0 7s 46% 100% # 1824%
> 14% 15947 1
> 0 24 32 0 0 7s 100% 100% # 100%
> 14% 15448 0
> 0 0 0 0 0 7s 100% 100% # 100%
> 14% 15902 0
> 0 3304 4532 0 0 7s 88% 100% # 100%
> 14% 15800 0
> 0 32 24 0 0 7s 100% 100% # 100%
> 15% 16651 0
> 0 0 0 0 0 7s 100% 100% # 100%
> CPU Total Net
> kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
> ops/s in
> out read write read write age hit time ty util
> 14% 15778 0
> 0 1520 2932 0 0 7s 100% 100% # 100%
> 13% 14311 0
> 1 28 24 0 0 7s 100% 100% # 100%
> 12% 12153 0
> 0 0 0 0 0 7s 100% 100% # 100%
> 12% 11993 0
> 0 8 0 0 0 7s 100% 100% # 100%
> 12% 12134 0
> 0 7368 5996 0 0 7s 18% 100% # 100%
> 13% 11981 0
> 0 20356 14956 <tel:20356%20%C2%A014956> <tel:20356%C2%A0%2014956>
> 0 0 7s 100% 100% # 100%
> 12% 11978 0
> 0 0 0 0 0 7s 100% 100% # 100%
> 11% 11617 0
> 0 24 24 0 0 7s 100% 100% # 100%
> 12% 11708 0
> 0 8 0 0 0 7s 100% 100% # 100%
> 11% 11630 0
> 0 11936 17872 0 0 7s 100% 100% # 100%
> 11% 11639 0
> 0 24 32 0 0 7s 100% 100% # 100%
> 12% 11791 0
> 0 12 0 0 0 7s 87% 100% # 100%
> 22% 11874 0
> 0 64452 92188 0 0 7s 100% 100% # 100%
> 12% 12015 0
> 0 32 24 0 0 7s 100% 100% # 100%
> 12% 11954 0
> 0 556 0 0 0 7s 85% 100% # 100%
> 12% 11899 0
> 0 0 8 0 0 7s 100% 100% # 100%
> 13% 13408 0
> 0 23816 35600 0 0 7s 91% 100% # 100%
> 13% 14610 0
> 0 0 0 0 0 7s 100% 100% # 100%
> 14% 15355 0
> 0 1248 0 0 0 7s 72% 100% # 100%
> 14% 15584 0
> 0 18652 15224 0 0 7s 98% 100% # 100%
> CPU Total Net
> kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
> ops/s in
> out read write read write age hit time ty util
> 14% 15397 0
> 0 0 8 0 0 7s 100% 100% # 100%
> 15% 15859 0
> 1 0 0 0 0 7s 100% 100% # 100%
> 14% 15557 0
> 0 4944 4568 0 0 7s 100% 100% # 100%
> 14% 15559 0
> 0 8 0 0 0 7s 100% 100% # 100%
> 14% 15603 0
> 0 0 0 0 0 7s 100% 100% # 100%
> 14% 15507 0
> 0 3652 4724 0 0 7s 100% 100% # 100%
> 14% 15593 0
> 0 20 0 0 0 7s 75% 100% # 100%
> 12% 12571 0
> 0 268 0 0 0 7s 100% 100% # 100%
> 10% 9962 0
> 0 92 2072 0 0 7s 96% 100% # 100%
> 9% 8521 0
> 0 0 0 0 0 7s 100% 100% # 100%
> 8% 7680 0
> 0 4 8 0 0 7s 100% 100% # 100%
> 8% 7459 0
> 0 24 24 0 0 7s 100% 100% # 100%
> 8% 7050 0
> 0 2580 1600 0 0 7s 100% 100% # 100%
> 8% 6961 0
> 0 8 0 0 0 7s 100% 100% # 100%
> 8% 6954 0
> 0 24 24 0 0 7s 100% 100% # 100%
> 7% 6936 0
> 0 584 2400 0 0 7s 100% 100% # 100%
> 8% 7043 0
> 0 248 0 0 0 7s 0% 100% # 100%
> 7% 6946 0
> 0 24 24 0 0 7s 100% 100% # 100%
> 8% 7320 0
> 0 1860 7252 0 0 7s 100% 100% # 100%
> 55% 19882 2
> 2 89920 29733 0 0 1 86% 99% B 45%
> CPU Total Net
> kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
> ops/s in
> out read write read write age hit time ty util
> 88% 17409 31
> 187 267319 196942 0 0 6s 83% 100% : 89%
> 87% 16064 11
> 249 304646 192912 0 0 1 78% 100% : 85%
> 85% 6274 70
> 86 322234 233245 0 0 1 81% 100% # 87%
> 85% 12486 18
> 340 236474 102383 0 0 3s 82% 100% B 73%
> 98% 13842 3
> 29 247937 255188 0 0 2s 85% 100% : 80%
> 99% 17878 1
> 1 348355 223367 0 0 3s 84% 100% # 86%
> 99% 4117 1
> 1 359628 269212 0 0 3s 78% 100% # 92%
> 83% 12622 10
> 31 252272 170064 0 0 2s 83% 100% B 73%
> 94% 19321 4
> 10 248864 249292 0 0 2s 88% 100% : 96%
> 99% 15893 0
> 0 318375 261482 0 0 2s 85% 100% : 89%
> 88% 13326 34
> 76 284960 194032 0 0 2s 82% 100% B 90%
> 86% 11106 73
> 104 267760 231524 0 0 2s 81% 100% : 100%
> 72% 11160 0
> 0 206927 128869 0 0 3s 75% 100% : 100%
> 63% 11304 0
> 0 220736 25276 0 0 3s 67% 100% : 100%
> 68% 11216 0
> 0 193324 15184 0 0 3s 66% 100% : 100%
> 65% 11343 0
> 0 175120 12588 0 0 3s 60% 100% : 100%
> 52% 7384 0
> 0 205780 104564 0 0 3s 61% 100% # 100%
> 83% 7205 0
> 0 261108 178256 0 0 3s 74% 100% B 90%
> 75% 11256 0
> 0 225164 96332 0 0 3s 76% 100% : 93%
> 99% 6279 0
> 0 296056 251191 0 0 2s 87% 100% : 90%
> CPU Total Net
> kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
> ops/s in
> out read write read write age hit time ty util
> 65% 8616 0
> 0 220068 165749 0 0 2s 76% 73% : 71%
> 89% 5443 0
> 1 338684 239932 0 0 3s 80% 94% F 87%
> 95% 6726 0
> 0 379242 240750 0 0 3s 79% 100% : 91%
> 83% 8721 1
> 0 342940 113394 0 0 2s 82% 61% : 89%
> 77% 11971 1
> 0 314356 24 0 0 2s 85% 0% - 92%
>
>
> ... and back to normal.
>
> Looking in Perf Advisor,
> it was during an extremely short, sharp latency spike that impacted every
> volume on every aggregate at exact the same time. It was the fourth of
> four spikes that had happened a few minutes apart during the previous 20
> mins.
>
> It looks like the NVRAM
> flush got stuck somehow but I have no idea what would cause that.
>
> We're running ONTAP 8.0.2
> 7-mode on a very busy FAS3170.
>
>
> Thanks,
> Peta
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
>
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
>
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
>
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
>
>
>
> --
> ---
> Gustatus Similis Pullus
>
> Classification: UNCLASSIFIED
> Caveats: NONE
>
>
>
> _______________________________________________
> Toasters mailing list
> Toasters@teaparty.net
> http://www.teaparty.net/mailman/listinfo/toasters
>



--
---
Gustatus Similis Pullus
RE: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
Here's how I usually explain this -

NVRAM is a journal. Under normal circumstances, it is written but never read. When writes to disk occur, they will read from main RAM, not the NVRAM. After that's done, the NVRAM is simply overwritten. The only time it should be read is during an event such as a failure on that controller that requires the partner to take over. The partner replays the uncommitted transactions from that NVRAM onto the disks. In this respect, it's just like the journal on any journaled filesystems and it's much like the redo logging functions of databases. Most storage arrays work in a similar way.

The reason it's so small compared to most arrays is because of WAFL. We don't need that much NVRAM because when writes happen, ONTAP writes out single complete RAID stripes and calculates parity in memory. If there was a need to do lots of reads to regenerate parity, then we'd have to increase the NVRAM more to smooth out performance.

If a system is having write problems, it's not because of NVRAM sizing or speed, it's because the back-end disks' capability to receive writes is being overwhelmed. Technically speaking, increasing NVRAM would probably delay the point at which performance problems begin, but it would be relief measured in seconds. If there are problems, it's because the disks are overwhelmed. It's a pretty easy thing to monitor because of the use of aggregates. I used to work for Oracle, and we'd just measure the aggregate utilization as a whole. If the aggregate is peaking at 56% utilization or something like that, then there's plenty of headroom left. If utilization starts touching 90% then it's probably time to start the purchase order for a new shelf (or checking the workload to see if something is incorrectly loading down the filer.

The sizing of the NVRAM is related mostly to CPU capability. When the writes are committed and the journal on NVRAM is invalidated and then overwritten, there is IO to be processed. If there was 500GB of NVRAM on a FAS2240, that would exceed the ability of the platform to process IO in a timely manner. You would see performance problems because the platform wouldn't be able to write down anything close to 500GB of inbound writes during the disk writing phase. In that extreme of an example, the filer would probably cease responding for 60 seconds every so often.

From: toasters-bounces@teaparty.net [mailto:toasters-bounces@teaparty.net] On Behalf Of Jeff Mohler
Sent: Friday, April 12, 2013 6:38 AM
To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
Cc: Toasters@teaparty.net
Subject: Re: NVRAM weirdness (UNCLASSIFIED)

Correct: A copy is placed into NVRAM in parallel to main RAM, RAM allocation fills, CP happens, contents of both are thrown away.
Other magic happens in an HA pair..but the same process is followed.

On Thu, Apr 11, 2013 at 9:32 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil<mailto:james.m.blackburn33.civ@mail.mil>> wrote:
Classification: UNCLASSIFIED
Caveats: NONE

While I think it's rational to state that increasing NVRAM won't result in faster writes to disk, I don't think that statement addresses the premise that "NVRAM has no place in the IO operation of a filer". Here's a quote from NetApp KB article 3011527:

***
The NetApp Storage Controller has two buffers for accepting and caching write data. The Storage Controller can also only process one Consistency Point at a time due to this buffered writing scenario. Under certain circumstances, while one Consistency Point is being processed and written to disk, the second memory buffer can become full prior to the previous Consistency Point being completed. Since the Consistency Point process is global ( meaning ALL writes flow through this mechanism ), a Storage Controller in this situation must stop acknowledging ALL incoming write data requests until the first Consistency Point is completed and the corresponding Non-Volatile RAM (NVRAM) and local memory buffers are cleared.
***
--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil<mailto:james.m.blackburn33.civ@mail.mil>
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----
From: Jeff Mohler [mailto:speedtoys.racing@gmail.com<mailto:speedtoys.racing@gmail.com>]
Sent: Thursday, April 11, 2013 4:01 PM
To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
Cc: Toasters@teaparty.net<mailto:Toasters@teaparty.net>
Subject: Re: NVRAM weirdness (UNCLASSIFIED)

...that is to say..if your controller was designed to work with 1.



On Thu, Apr 11, 2013 at 9:00 PM, Jeff Mohler <speedtoys.racing@gmail.com<mailto:speedtoys.racing@gmail.com>> wrote:


NVRAM isnt the producer of writes, its not the arbiter of what writes it can do..its just matched to what will be kept in RAM an brokered within an HA pair.


Plus, some really smart people made the determination. Trust me, 48G of NVRAM wont write to disk any faster than 1.


That's it..that simple.



On Thu, Apr 11, 2013 at 8:45 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil<mailto:james.m.blackburn33.civ@mail.mil>> wrote:


Classification: UNCLASSIFIED
Caveats: NONE

If NVRAM has no place in a NetApp storage system's I/O operations, why would the size of NVRAM correlate with the I/O capabilities of a NetApp storage system?

--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil<mailto:james.m.blackburn33.civ@mail.mil>
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----
From: Jeff Mohler [mailto:speedtoys.racing@gmail.com<mailto:speedtoys.racing@gmail.com>]

Sent: Thursday, April 11, 2013 3:12 PM
To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
Cc: Toasters@teaparty.net<mailto:Toasters@teaparty.net>
Subject: Re: NVRAM weirdness (UNCLASSIFIED)

Sure. NVRAM only holds a copy of data to be written..if all goes well, it's discarded.




On Thu, Apr 11, 2013 at 8:00 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil<mailto:james.m.blackburn33.civ@mail.mil>> wrote:


Classification: UNCLASSIFIED
Caveats: NONE

Would "capabilities" happen to be a reference to "I/O capabilities?"

--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil<mailto:james.m.blackburn33.civ@mail.mil>
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----

From: Jeff Mohler [mailto:speedtoys.racing@gmail.com<mailto:speedtoys.racing@gmail.com>]
Sent: Thursday, April 11, 2013 2:56 PM
To: Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US)
Cc: Toasters@teaparty.net<mailto:Toasters@teaparty.net>

Subject: Re: NVRAM weirdness (UNCLASSIFIED)

It is sized appropriately to the capabilities it [controller] was designed for.

Hence..adding more, would not provide an improvement for any given system.





On Thu, Apr 11, 2013 at 7:31 PM, Blackburn, James M (Jim) Jr. CIV USARMY PEO EIS (US) <james.m.blackburn33.civ@mail.mil<mailto:james.m.blackburn33.civ@mail.mil>> wrote:


Classification: UNCLASSIFIED
Caveats: NONE

If "NVRAM has no place in the IO operation of a filer", why doesn't every NetApp storage system have the same amount of NVRAM/NVMEM?
--
Jim Blackburn
US Army
james.m.blackburn33.civ@mail.mil<mailto:james.m.blackburn33.civ@mail.mil>
Note: I am not authorized to obligate, commit, or execute United States government funds. No part of this message should be construed as a request or directive to obligate, commit, or execute United States government funds.


-----Original Message-----
From: toasters-bounces@teaparty.net<mailto:toasters-bounces@teaparty.net> [mailto:toasters-bounces@teaparty.net<mailto:toasters-bounces@teaparty.net>] On Behalf Of Jeff Mohler
Sent: Thursday, April 11, 2013 2:10 PM
To: Scott Eno
Cc: Toasters@teaparty.net<mailto:Toasters@teaparty.net>
Subject: Re: NVRAM weirdness

For the OP and others..NVRAM has no place in the IO operation of a filer.

It holds copies of data that are in main RAM, just in -case- there is a problem with site power/etc.


There has been a continued misunderstanding of how Netapp NVRAM works over time..when in fact, it does no work at all.

:)


In sysstat..a view of "disks" being 100% busy is more a view of "storage" as a whole. Not 'a' disk. If storage is stuck..then CP time will also be likewise stuck at 100%...


Incorrect interpretation of sysstat and how Ontap works...can lead to a lot of confusion, and continued incorrect phraseology.






On Thu, Apr 11, 2013 at 12:31 PM, Scott Eno <s.eno@me.com<mailto:s.eno@me.com>> wrote:


Looks like you're hanging on a consistency point operation, but the sysstat output starts after that last CP was logged. This looks very similar to the ":s" CP bug, but can't say for sure. You have a long running CP operation, you NVRAM has filled up waiting for the operation to complete, but with no "s" after the "#", then it's not processing special files. It's doing something else.

Could be one of these bugs:

651573 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651597 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651926 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
651929 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
655724 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
657018 Inefficient pre-fetching of metadata blocks delays WAFL consistency point
650554 Inefficient pre-fetching of metadata blocks delays WAFL consistency point



On Apr 10, 2013, at 6:46 PM, Peta Thames <petathames@gmail.com<mailto:petathames@gmail.com>> wrote:


Hi all,


Has anyone seen this pattern in a sysstat before?

TOASTERa> sysstat -u 1
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
29% 36942 1 0 264 0 0 0 7s 91% 100% # 100%
29% 36896 1 0 1616 0 0 0 7s 90% 100% # 100%
28% 36658 1 0 32 24 0 0 7s 100% 100% # 100%
28% 36625 1 0 108 0 0 0 7s 97% 100% # 100%
27% 36318 1 0 136 8 0 0 7s 100% 100% # 100%
28% 36391 1 0 364 144 0 0 7s 97% 100% # 100%
25% 30387 1 0 88 0 0 0 7s 98% 100% # 100%
21% 24450 1 0 8 0 0 0 7s 100% 100% # 100%
20% 22042 1 0 24 672 0 0 7s 100% 100% # 100%
18% 19781 1 0 284 8 0 0 7s 69% 100% # 100%
18% 19636 1 0 0 0 0 0 7s 100% 100% # 100%
16% 18066 1 0 24 24 0 0 7s 100% 100% # 100%
15% 16999 1 0 56 0 0 0 7s 56% 100% # 100%
15% 16622 1 0 11040 6496 0 0 7s 46% 100% # 1824%
14% 15947 1 0 24 32 0 0 7s 100% 100% # 100%
14% 15448 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15902 0 0 3304 4532 0 0 7s 88% 100% # 100%
14% 15800 0 0 32 24 0 0 7s 100% 100% # 100%
15% 16651 0 0 0 0 0 0 7s 100% 100% # 100%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
14% 15778 0 0 1520 2932 0 0 7s 100% 100% # 100%
13% 14311 0 1 28 24 0 0 7s 100% 100% # 100%
12% 12153 0 0 0 0 0 0 7s 100% 100% # 100%
12% 11993 0 0 8 0 0 0 7s 100% 100% # 100%
12% 12134 0 0 7368 5996 0 0 7s 18% 100% # 100%
13% 11981 0 0 20356 14956<tel:20356%20%2014956> <tel:20356%20%C2%A014956> <tel:20356%C2%A0%2014956> 0 0 7s 100% 100% # 100%
12% 11978 0 0 0 0 0 0 7s 100% 100% # 100%
11% 11617 0 0 24 24 0 0 7s 100% 100% # 100%
12% 11708 0 0 8 0 0 0 7s 100% 100% # 100%
11% 11630 0 0 11936 17872 0 0 7s 100% 100% # 100%
11% 11639 0 0 24 32 0 0 7s 100% 100% # 100%
12% 11791 0 0 12 0 0 0 7s 87% 100% # 100%
22% 11874 0 0 64452 92188 0 0 7s 100% 100% # 100%
12% 12015 0 0 32 24 0 0 7s 100% 100% # 100%
12% 11954 0 0 556 0 0 0 7s 85% 100% # 100%
12% 11899 0 0 0 8 0 0 7s 100% 100% # 100%
13% 13408 0 0 23816 35600 0 0 7s 91% 100% # 100%
13% 14610 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15355 0 0 1248 0 0 0 7s 72% 100% # 100%
14% 15584 0 0 18652 15224 0 0 7s 98% 100% # 100%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
14% 15397 0 0 0 8 0 0 7s 100% 100% # 100%
15% 15859 0 1 0 0 0 0 7s 100% 100% # 100%
14% 15557 0 0 4944 4568 0 0 7s 100% 100% # 100%
14% 15559 0 0 8 0 0 0 7s 100% 100% # 100%
14% 15603 0 0 0 0 0 0 7s 100% 100% # 100%
14% 15507 0 0 3652 4724 0 0 7s 100% 100% # 100%
14% 15593 0 0 20 0 0 0 7s 75% 100% # 100%
12% 12571 0 0 268 0 0 0 7s 100% 100% # 100%
10% 9962 0 0 92 2072 0 0 7s 96% 100% # 100%
9% 8521 0 0 0 0 0 0 7s 100% 100% # 100%
8% 7680 0 0 4 8 0 0 7s 100% 100% # 100%
8% 7459 0 0 24 24 0 0 7s 100% 100% # 100%
8% 7050 0 0 2580 1600 0 0 7s 100% 100% # 100%
8% 6961 0 0 8 0 0 0 7s 100% 100% # 100%
8% 6954 0 0 24 24 0 0 7s 100% 100% # 100%
7% 6936 0 0 584 2400 0 0 7s 100% 100% # 100%
8% 7043 0 0 248 0 0 0 7s 0% 100% # 100%
7% 6946 0 0 24 24 0 0 7s 100% 100% # 100%
8% 7320 0 0 1860 7252 0 0 7s 100% 100% # 100%
55% 19882 2 2 89920 29733 0 0 1 86% 99% B 45%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
88% 17409 31 187 267319 196942 0 0 6s 83% 100% : 89%
87% 16064 11 249 304646 192912 0 0 1 78% 100% : 85%
85% 6274 70 86 322234 233245 0 0 1 81% 100% # 87%
85% 12486 18 340 236474 102383 0 0 3s 82% 100% B 73%
98% 13842 3 29 247937 255188 0 0 2s 85% 100% : 80%
99% 17878 1 1 348355 223367 0 0 3s 84% 100% # 86%
99% 4117 1 1 359628 269212 0 0 3s 78% 100% # 92%
83% 12622 10 31 252272 170064 0 0 2s 83% 100% B 73%
94% 19321 4 10 248864 249292 0 0 2s 88% 100% : 96%
99% 15893 0 0 318375 261482 0 0 2s 85% 100% : 89%
88% 13326 34 76 284960 194032 0 0 2s 82% 100% B 90%
86% 11106 73 104 267760 231524 0 0 2s 81% 100% : 100%
72% 11160 0 0 206927 128869 0 0 3s 75% 100% : 100%
63% 11304 0 0 220736 25276 0 0 3s 67% 100% : 100%
68% 11216 0 0 193324 15184 0 0 3s 66% 100% : 100%
65% 11343 0 0 175120 12588 0 0 3s 60% 100% : 100%
52% 7384 0 0 205780 104564 0 0 3s 61% 100% # 100%
83% 7205 0 0 261108 178256 0 0 3s 74% 100% B 90%
75% 11256 0 0 225164 96332 0 0 3s 76% 100% : 93%
99% 6279 0 0 296056 251191 0 0 2s 87% 100% : 90%
CPU Total Net kB/s Disk kB/s Tape kB/s Cache Cache CP CP Disk
ops/s in out read write read write age hit time ty util
65% 8616 0 0 220068 165749 0 0 2s 76% 73% : 71%
89% 5443 0 1 338684 239932 0 0 3s 80% 94% F 87%
95% 6726 0 0 379242 240750 0 0 3s 79% 100% : 91%
83% 8721 1 0 342940 113394 0 0 2s 82% 61% : 89%
77% 11971 1 0 314356 24 0 0 2s 85% 0% - 92%


... and back to normal.

Looking in Perf Advisor, it was during an extremely short, sharp latency spike that impacted every volume on every aggregate at exact the same time. It was the fourth of four spikes that had happened a few minutes apart during the previous 20 mins.

It looks like the NVRAM flush got stuck somehow but I have no idea what would cause that.

We're running ONTAP 8.0.2 7-mode on a very busy FAS3170.


Thanks,
Peta

_______________________________________________
Toasters mailing list
Toasters@teaparty.net<mailto:Toasters@teaparty.net>
http://www.teaparty.net/mailman/listinfo/toasters




_______________________________________________
Toasters mailing list
Toasters@teaparty.net<mailto:Toasters@teaparty.net>
http://www.teaparty.net/mailman/listinfo/toasters






--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net<mailto:Toasters@teaparty.net>
http://www.teaparty.net/mailman/listinfo/toasters





--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net<mailto:Toasters@teaparty.net>
http://www.teaparty.net/mailman/listinfo/toasters





--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net<mailto:Toasters@teaparty.net>
http://www.teaparty.net/mailman/listinfo/toasters





--
---
Gustatus Similis Pullus




--
---
Gustatus Similis Pullus

Classification: UNCLASSIFIED
Caveats: NONE



_______________________________________________
Toasters mailing list
Toasters@teaparty.net<mailto:Toasters@teaparty.net>
http://www.teaparty.net/mailman/listinfo/toasters



--
---
Gustatus Similis Pullus
Re: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
Steve Losen wrote:
> I think the point is that I/O does not pass through NVRAM, i.e., NVRAM
> is not used as a disk cache.

Right. The NetApp NVRAM is *not* a write cache. It's a log.

But when you say that to ppl, just like that statement, they have to
understand what the semantic difference between a log and a cache really is
in a context such as this. It's pretty strict. Most people don't, not in my
experience anyway so confusion results

> It is used as a transaction log and in the event of an outage, the NVRAM
> log is replayed. When a write request arrives, the request is logged to
> NVRAM and the filer ACKs the request.
> "At its leisure" the filer commits a batch of write requests to disk
> by writing a CP (consistency point). Then the corresponding NVRAM
> log storage is reused to log new requests. When NVRAM approaches
> capacity, the filer MUST write a CP. Otherwise there is no room in
> NVRAM to log any more incoming write requests, forcing the filer to
> reject them.
>
> So the size of NVRAM determines to some degree how much of a backlog
> of uncommitted write requests that a filer can tolerate.

Good elaboration/description. The NVRAM is a limiting factor for I/O for a
whole controller. How fast you can write to disk, is limited by what needs
to be done with the buffers temporarily held in the NVRAM, it's essentially
a "wall-clock problem" when you hit sustained back-2-back CP

A multi-dimensional optimisation problem. The way it works, proven in the
field since so many years, is quite impressive IMO.

The bigger the NVRAM, the more write burst capacity, and the more
random-write-efficient the "Tetris Algorithm" becomes (up to some limit I'd
guess, I have no idea what that would be). But in a failover cluster the
issues with latency for the NVRAM mirroring and the time it takes to do a
failover limits the size of the NVRAM so...

My 2 cents is on NetApp to mitigate the issues with this very difficult
optimisation problem using some new/improved cleverness around SSD devices.
Something perhaps that really works as a write cache, at least partially, in
conjunction with the log (NVRAM).
We have FlashPool now, and that's good, but does a FlashPool really do
anything in real life to help with random writes? (or rather random re-writes)?

/M

>> Jim Blackburn
>> US Army
>> james.m.blackburn33.civ@mail.mil
>> Note: I am not authorized to obligate, commit, or execute United States
>> government funds. No part of this message should be construed as a request
>> or directive to obligate, commit, or execute United States government funds.
_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters
Re: NVRAM weirdness (UNCLASSIFIED) [ In reply to ]
>
> My 2 cents is on NetApp to mitigate the issues with this very difficult
> optimisation problem using some new/improved cleverness around SSD devices.
> Something perhaps that really works as a write cache, at least partially, in
> conjunction with the log (NVRAM).
> We have FlashPool now, and that's good, but does a FlashPool really do
> anything in real life to help with random writes? (or rather random re-writes)?
>

There is no such thing as a "random write" or "random re-write", at
least not at the physical disk block level.

Remember that netapp ALWAYS writes to fresh, unallocated blocks,
preferably to entire raid stripes. A filer never overwrites data
"in place". If netapp did overwrite in place, then CPs (consistency
points) would not be possible and neither would snapshots (as
implemented by netapp). Netapp maintains a CP on disk at all times.
This would not be possible if blocks in the current CP were
modified in place. Netapp writes a new CP to fresh blocks and
when finished, it "steps off" the old CP and onto the new CP.
Then it frees any block in the old CP that has been replaced
by a fresh block in the new CP (provided the old block is not
frozen in a snapshot).

For example, if you logically overwrite a database file block,
in reality the filer writes the new data to a fresh block
and maps the new block into the database file in place of the
old block. This happens when the filer writes the next CP to disk.

A filer lets write requests accumulate for a while before writing
them to disk in a CP. This allows the filer to intelligently
schedule a burst of efficient writes to whatever free blocks it
chooses to use. (WAFL means "Write Anywhere File Layout".)
If possible the filer writes entire raid stripes, so no reads
are necessary to recalculate parity, and so that the writes are
scattered across spindles.

I admit that writing an entire raid stripe requires all the data
blocks in the stripe to be free. This could be a tall order on
a busy aggregate that is nearly full.

Random reads are a completely different story.

Steve Losen scl@virginia.edu phone: 434-924-0640

University of Virginia ITC Unix Support


_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters