Mailing List Archive

SV: [EXTERNAL] SV: Ndmpcopy times out...
Hi again



This is the result after setting the timeout to 0… It doesn’t seem to make any difference…

Below, ndmpcopy was started via the service processor, then system console, then node run …

I then tried to log in again to see if the session was still active… but it was not.



Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 14:58:56 2021 : We have processed 2211434 files and directories.

Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 15:05:40 2021: Writing data to files.

Ndmpcopy: 10.64.9.142: Log: DUMP: Tue May 18 15:05:40 2021 : We have written 1026117 KB.

Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 15:05:40 2021 : We have read 1024120 KB from the backup.

Ndmpcopy: 10.64.9.142: Log: DUMP: Tue May 18 15:10:40 2021 : We have written 53866876 KB.

Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 15:10:40 2021 : We have read 53864840 KB from the backup.

Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 15:15:40 2021 : We have read 106737975 KB from the backup.

Ndmpcopy: 10.64.9.142: Log: DUMP: Tue May 18 15:15:40 2021 : We have written 106740019 KB.

Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 15:20:40 2021 : We have read 159429965 KB from the backup.

Ndmpcopy: 10.64.9.142: Log: DUMP: Tue May 18 15:20:40 2021 : We have written 159431993 KB.

Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 15:25:40 2021 : We have read 211327838 KB from the backup.

Ndmpcopy: 10.64.9.142: Log: DUMP: Tue May 18 15:25:40 2021 : We have written 211329882 KB.

Autologout: System Console being disconnected due to inactivity

SP NODE-01*> Autologout : Session being disconnected due to inactivity

Connection to 10.64.9.180 closed.

[BEUMER GROUP]root@NODE-dkaar1:~#

[BEUMER GROUP]root@NODE-dkaar1:~# ssh ndmpbackup@10.64.9.180

ndmpbackup@10.64.9.180's password:

SP STOR02-DKAAR1-01> system console

Type Ctrl-D to exit.

NODE-DKAAR1::*>



I’ll open a NetApp case on this…



/Heino





From: tmac <tmacmd@gmail.com>
Date: Tuesday, 18 May 2021 at 15:29
To: Alexander Griesser <AGriesser@anexia-it.com>
Cc: Heino Walther <hw@beardmann.dk>, toasters@teaparty.net <toasters@teaparty.net>
Subject: Re: [EXTERNAL] SV: Ndmpcopy times out...

So, based on your testing, the timeout only affects the actual cluster login. It does not propagate to the SP.

Autologout: System Console being disconnected due to inactivity



This is the interaction of the SP and ONTAP. Remember, you do the "system console" to get to the "serial" access of the node.

That is what is timing out. If a command is running via system console it should continue to run.

If you are getting that message (above), sounds like a bug to me.

You could try modifying the SSH settings to enable the "keepalive" option, which is supposed to send something benign every minute??
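A minimal sketch of the keepalive idea, assuming an OpenSSH client on the admin host. `ServerAliveInterval` (seconds of silence before the client sends a probe through the encrypted channel; the default of 0 disables it) and `ServerAliveCountMax` (unanswered probes tolerated before the client gives up) are real OpenSSH client options. The function names, the 60-second interval and the count of 3 are illustrative assumptions; the SP address and account are the ones used in this thread. Heino later reported that setting ServerAliveInterval did the trick in his case, but this is a sketch, not a NetApp-documented procedure.

```shell
#!/bin/sh
# Sketch: SSH to the service processor (SP) with OpenSSH client keepalives.
# sp_ssh_command / sp_ssh_connect are hypothetical helper names;
# the 60 s interval is only an example value.

# Print the ssh command line for review (dry run); nothing is executed.
# $1 = SP address, $2 = SP user, $3 = keepalive interval in seconds (default 60)
sp_ssh_command() {
  printf 'ssh -o ServerAliveInterval=%s -o ServerAliveCountMax=3 %s@%s\n' \
    "${3:-60}" "$2" "$1"
}

# Open the interactive session (prompts for the SP password).
# The client sends a probe after each idle interval and disconnects
# only after 3 consecutive probes go unanswered.
sp_ssh_connect() {
  ssh -o "ServerAliveInterval=${3:-60}" -o ServerAliveCountMax=3 "$2@$1"
}

# Example (uncomment to connect):
# sp_ssh_connect 10.64.9.180 ndmpbackup
```

The same two options can instead be set per host in `~/.ssh/config` under a `Host` entry, so a plain `ssh` picks them up.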





--tmac



Tim McCarthy, Principal Consultant

Proud Member of the #NetAppATeam <https://twitter.com/NetAppATeam>

I Blog at TMACsRack <https://tmacsrack.wordpress.com/>







On Tue, May 18, 2021 at 8:59 AM Alexander Griesser <AGriesser@anexia-it.com> wrote:


Hey Heino,



For testing purposes, I set the system timeout to 1 minute here:



$ time ssh admin@1.1.1.1

CLUSTER::> (Login timeout will occur in 30 seconds)

CLUSTER::> (Login timeout will occur in 20 seconds)

CLUSTER::> (Login timeout will occur in 10 seconds)

CLUSTER::>

Exiting due to timeout

Connection to 1.1.1.1 closed.

real 1m0.485s

user 0m0.016s

sys 0m0.000s

-> On an interactive shell, the connection closes after exactly 1 minute.



Next try, same timeout setting, but started a `sleep 120` in the interactive session:



$ time ssh admin@1.1.1.1

CLUSTER::> sleep 120

CLUSTER::> (Login timeout will occur in 30 seconds)

CLUSTER::> (Login timeout will occur in 20 seconds)

CLUSTER::> (Login timeout will occur in 10 seconds)

CLUSTER::>

Exiting due to timeout

Connection to 1.1.1.1 closed.

real 3m1.638s

user 0m0.012s

sys 0m0.004s

-> 3 minutes: 2 for the sleep, 1 for the timeout.



When I log in to the nodeshell over SSH, the timeout obviously does not count.

I then exited manually, since it did not kick me out (unless the nodeshell has a separate timeout?).



$ time ssh admin@1.1.1.1

CLUSTER::> node run -node node1

Type 'exit' or 'Ctrl-D' to return to the CLI

Node1>

Node1> exit

logout

CLUSTER::> exit

Goodbye

Connection to 1.1.1.1 closed.

real 7m21.102s

user 0m0.016s

sys 0m0.004s



Depending on exactly how you ran the command, it might be one of the timeouts on the filer, or maybe just the TCP connection being dropped due to inactivity by a firewall or the like?



Best,



Alexander Griesser

Head of Systems Operations



ANEXIA Internetdienstleistungs GmbH



E-Mail: AGriesser@anexia-it.com

Web: http://www.anexia-it.com



Head office address, Klagenfurt: Feldkirchnerstraße 140, 9020 Klagenfurt

Managing Director: Alexander Windbichler

Company register: FN 289918a | Place of jurisdiction: Klagenfurt | VAT ID: AT U63216601



From: Toasters <toasters-bounces@teaparty.net> On Behalf Of Heino Walther
Sent: Tuesday, 18 May 2021 14:15
To: toasters@teaparty.net
Subject: [EXTERNAL] SV: Ndmpcopy times out...



CAUTION: This email originates from an external sender. Please avoid opening attachments or external links.



Btw, I found this article describing the process: https://kb.netapp.com/Advice_and_Troubleshooting/Data_Protection_and_Security/NDMP/Ndmpcopy_run_via_SSH_consistently_aborts_after_a_fixed_amount_of_time



Here is the “solution” as described in the article… The problem is that once I get the “disconnect”, it really does disconnect… maybe not from the service processor, but it does disconnect the “node-shell”, and the ndmpcopy process along with it…

I cannot find any timeout settings among the service-processor options…. So I’m not sure if I’m doing something wrong? As far as I can tell, I am doing exactly what is described below…

  • Avoid SSH-related timeouts by running ndmpcopy from the console.
  • To run ndmpcopy (or any command) from the console:


1) First, find the IP of the service processor (SP) by running:

::> system service-processor show

2) After the IP of the SP is known, log in to the SP.

3) From the SP prompt, run system console to access the console.

4) Once at the system console prompt, re-run the ndmpcopy command from the console.



NOTE: It is possible that the connection to the system console will time out. Unlike with an SSH session, any process started from the system console will continue to run in the background.
  • start ndmpcopy from the clustershell, via node run.
  • DO NOT start ndmpcopy directly from nodeshell




The command I then run for step 4 is: node run -node node1 -command “ndmpcopy…..”, and then I wait…



So I’m at a loss here …



/Heino







From: Toasters <toasters-bounces@teaparty.net> on behalf of Heino Walther <hw@beardmann.dk>
Date: Tuesday, 18 May 2021 at 13:59
To: toasters@teaparty.net <toasters@teaparty.net>
Subject: Ndmpcopy times out...

Hi guys



I have to migrate a large folder from one volume to another on the same system.

We are talking ONTAP 9.something, where ndmpcopy is not part of the cDOT command set, so the nodeshell has to be used…

The process runs and starts copying, etc., but after x minutes the connection is terminated due to inactivity…

I have now tried logging in to the service processor, then running “system console”, and then “node run -node node1 -command “ndmpcopy ….”. Once again it starts, but it is then terminated along with my connection, as shown here:



Ndmpcopy: 10.64.9.142: Log: DUMP: dumping (Pass IV) [regular files]

Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:08:27 2021: Creating files and directories.

Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:10:33 2021 : We have processed 298105 files and directories.

Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:15:33 2021 : We have processed 508611 files and directories.

Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:20:33 2021 : We have processed 693207 files and directories.

Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:25:33 2021 : We have processed 860486 files and directories.

Autologout: System Console being disconnected due to inactivity



Any good suggestions are very welcome 😊



/Heino





_______________________________________________
Toasters mailing list
Toasters@teaparty.net
https://www.teaparty.net/mailman/listinfo/toasters
Re: SV: [EXTERNAL] SV: Ndmpcopy times out...
On 18.05, Heino Walther wrote:
> ...
> Autologout: System Console being disconnected due to inactivity
> SP NODE-01*> Autologout : Session being disconnected due to inactivity

So it doesn't seem like an Ssh protocol problem, more like ONTAP is
deliberately terminating your session due to apparent lack of use.

Here's my random thought, use ssh with the "-n" flag to avoid ONTAP
allocating a terminal. See man ssh.

Might help :-)

Yours,
Robb.


Re: SV: [EXTERNAL] SV: Ndmpcopy times out...
it is not ONTAP...it is the service processor kicking him off...according to
the message he keeps getting
--tmac

Tim McCarthy, Principal Consultant

Proud Member of the #NetAppATeam <https://twitter.com/NetAppATeam>

I Blog at TMACsRack <https://tmacsrack.wordpress.com/>



On Tue, May 18, 2021 at 2:50 PM Walfherder <toasters@tson.de> wrote:

>
> On 18.05, Heino Walther wrote:
> > ...
> > Autologout: System Console being disconnected due to inactivity
> > SP NODE-01*> Autologout : Session being disconnected due to inactivity
>
> So it doesn't seem like an Ssh protocol problem, more like ONTAP is
> deliberately terminating your session due to apparent lack of use.
>
> Here's my random thought, use ssh with the "-n" flag to avoid ONTAP
> allocating a terminal. See man ssh.
>
> Might help :-)
>
> Yours,
> Robb.
SV: SV: [EXTERNAL] SV: Ndmpcopy times out...
The ssh command with ServerAliveInterval set did the trick… that is, it is still running inside a “screen” session on a Linux box…
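A minimal sketch of the setup described here, assuming GNU screen and an OpenSSH client on the Linux box: the keepalive-enabled ssh session runs inside a named screen session, so closing the local terminal does not end it. `screen -dmS name command` starts a detached, named session and `screen -r name` reattaches to it. The helper names, the session name `ndmpcopy` and the `DRY_RUN` switch are hypothetical; the keepalive values are illustrative, and the SP address and account are the ones from this thread.

```shell
#!/bin/sh
# Sketch: run the SP ssh session inside a detached, named GNU screen session.
# maybe_run / start_sp_screen and DRY_RUN are hypothetical names.

# Execute "$@", or only print it when DRY_RUN=1.
maybe_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = screen session name, $2 = SP address, $3 = SP user
start_sp_screen() {
  # -d -m: create the session detached; -S: name it so it can be reattached.
  maybe_run screen -dmS "$1" \
    ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=3 "$3@$2"
}

# Usage (uncomment):
# start_sp_screen ndmpcopy 10.64.9.180 ndmpbackup
# screen -r ndmpcopy   # attach, enter the SP password, then: system console, node run ...
# Detach again with Ctrl-A d; ssh keeps running inside screen.
```

Keeping the session inside screen means a dropped laptop or closed terminal on the admin side does not kill the ssh connection, while the keepalive options address the idle link itself.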



So thanks, tmac 😊



Now I only have to face a new challenge: the ndmpcopied data is in fact stored as “non-deduped” data, and there is not enough space on the aggregate for all of it…

Yet it is so slow that I think we will be fine with a little dedupe running along the way…. (8TB NL-SAS disks aren’t the fastest, and you do not get “shared dedupe” between volumes on rotating disks) 😊



/Heino



From: tmac <tmacmd@gmail.com>
Date: Tuesday, 18 May 2021 at 21:01
To: Toasters <toasters@teaparty.net>, Heino Walther <hw@beardmann.dk>
Subject: Re: SV: [EXTERNAL] SV: Ndmpcopy times out...

it is not ONTAP...it is the service processor kicking him off...according to the message he keeps getting

--tmac



Tim McCarthy, Principal Consultant

Proud Member of the #NetAppATeam <https://twitter.com/NetAppATeam>

I Blog at TMACsRack <https://tmacsrack.wordpress.com/>






