Mailing List Archive

AW: [EXTERNAL] SV: Ndmpcopy times out...
Hey Heino,

For testing purposes, I did set the system timeout to 1 minute here:

$ time ssh admin@1.1.1.1
CLUSTER::> (Login timeout will occur in 30 seconds)
CLUSTER::> (Login timeout will occur in 20 seconds)
CLUSTER::> (Login timeout will occur in 10 seconds)
CLUSTER::>
Exiting due to timeout
Connection to 1.1.1.1 closed.
real 1m0.485s
user 0m0.016s
sys 0m0.000s
-> On an interactive shell, the connection closes exactly after 1 minute.

Next try, same timeout setting, but started a `sleep 120` in the interactive session:

$ time ssh admin@1.1.1.1
CLUSTER::> sleep 120
CLUSTER::> (Login timeout will occur in 30 seconds)
CLUSTER::> (Login timeout will occur in 20 seconds)
CLUSTER::> (Login timeout will occur in 10 seconds)
CLUSTER::>
Exiting due to timeout
Connection to 1.1.1.1 closed.
real 3m1.638s
user 0m0.012s
sys 0m0.004s
-> 3 minutes, 2 for the sleep, 1 for the timeout.
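
The arithmetic behind that reading can be checked with a short Python sketch. It only re-uses the two `real` values from the transcripts above; the helper name `parse_real` is made up for illustration. If the idle timer had kept running during the `sleep 120`, the second session would have closed after about one minute rather than three.

```python
import re


def parse_real(line: str) -> float:
    """Convert a `time` output line such as 'real 3m1.638s' to seconds."""
    match = re.fullmatch(r"real\s+(\d+)m(\d+(?:\.\d+)?)s", line.strip())
    if match is None:
        raise ValueError(f"not a 'real' timing line: {line!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


SLEEP_SECONDS = 120  # the `sleep 120` typed in the second session

idle_only = parse_real("real 1m0.485s")    # first test: no command run
with_sleep = parse_real("real 3m1.638s")   # second test: sleep 120 first
idle_after_sleep = with_sleep - SLEEP_SECONDS

# Both values land close to the configured 1-minute timeout, which fits
# the idea that the idle timer only starts once the command has returned.
print(f"{idle_only:.3f} s idle, {idle_after_sleep:.3f} s idle after sleep")
```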

When I log in to the nodeshell over SSH, the timeout evidently does not apply.
I exited manually in the end, since it did not kick me out, unless the nodeshell has a separate timeout?

$ time ssh admin@1.1.1.1
CLUSTER::> node run -node node1
Type 'exit' or 'Ctrl-D' to return to the CLI
Node1>
Node1> exit
logout
CLUSTER::> exit
Goodbye
Connection to 1.1.1.1 closed.
real 7m21.102s
user 0m0.016s
sys 0m0.004s

Depending on how exactly you ran the command, it might be one of the timeouts on the filer, or simply the TCP connection being dropped for inactivity by a firewall or the like?

Best,

Alexander Griesser
Head of Systems Operations

ANEXIA Internetdienstleistungs GmbH

E-Mail: AGriesser@anexia-it.com
Web: http://www.anexia-it.com

Headquarters address Klagenfurt: Feldkirchnerstraße 140, 9020 Klagenfurt
Managing Director: Alexander Windbichler
Commercial register: FN 289918a | Place of jurisdiction: Klagenfurt | VAT ID: AT U63216601

From: Toasters <toasters-bounces@teaparty.net> On behalf of Heino Walther
Sent: Tuesday, 18 May 2021 14:15
To: toasters@teaparty.net
Subject: [EXTERNAL] SV: Ndmpcopy times out...

CAUTION: This email comes from an external sender. Please avoid opening attachments or external links.

Btw. Found this article describing the process: https://kb.netapp.com/Advice_and_Troubleshooting/Data_Protection_and_Security/NDMP/Ndmpcopy_run_via_SSH_consistently_aborts_after_a_fixed_amount_of_time

Here is the “solution” as described in the article. The problem is that once I get the “disconnect”, it really does disconnect: maybe not from the service processor, but it does disconnect the nodeshell, and the ndmpcopy process is terminated as a result.
I cannot find any timeout options among the service-processor options, so I am not sure whether I am doing something wrong. I believe I am doing exactly as described below.


* Avoid SSH-related timeouts by running ndmpcopy from the console.
* To run ndmpcopy (or any command) from the console:
1) First, find the IP of the service processor (SP) by running:
::> system service-processor show
2) After the IP of the SP is known, log in to the SP.
3) From the SP prompt, run system console to access the console.
4) Once at the system console prompt, re-run the ndmpcopy command from the console.

NOTE: It is possible the connection to the system console will time out. Unlike an SSH session, any process started from system console will continue to run in the background.

* start ndmpcopy from the clustershell, via node run.
* DO NOT start ndmpcopy directly from nodeshell

The command I then run at step 4 is: node run -node node1 -command “ndmpcopy…..” and then I wait…

So I’m at a loss here …

/Heino



From: Toasters <toasters-bounces@teaparty.net> on behalf of Heino Walther <hw@beardmann.dk>
Date: Tuesday, 18 May 2021, 13:59
To: toasters@teaparty.net
Subject: Ndmpcopy times out...
Hi guys

I have to migrate a large folder from one volume to another on the same system.
We are talking ONTAP 9.something, where ndmpcopy is not part of the cDot command set, so the nodeshell has to be used…
The process runs and starts to copy, but after x minutes the connection is terminated due to inactivity…
I have now tried logging in to the service processor, then running “system console”, and then “node run -node node1 -command "ndmpcopy …."”. Once again it starts, but it is then terminated along with my connection, as shown here:

Ndmpcopy: 10.64.9.142: Log: DUMP: dumping (Pass IV) [regular files]
Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:08:27 2021: Creating files and directories.
Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:10:33 2021 : We have processed 298105 files and directories.
Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:15:33 2021 : We have processed 508611 files and directories.
Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:20:33 2021 : We have processed 693207 files and directories.
Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:25:33 2021 : We have processed 860486 files and directories.
Autologout: System Console being disconnected due to inactivity
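
One thing the excerpt above suggests is that the console was still receiving output every five minutes right up to the disconnect, so the "inactivity" being measured is presumably about keyboard input rather than output. That is an inference, not something the log states. A minimal Python sketch, using the hypothetical helper `restore_gaps`, that extracts the RESTORE timestamps from the lines above and prints the gaps between them:

```python
import re
from datetime import datetime
from typing import List

# Log excerpt copied from the message above.
LOG = """\
Ndmpcopy: 10.64.9.142: Log: DUMP: dumping (Pass IV) [regular files]
Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:08:27 2021: Creating files and directories.
Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:10:33 2021 : We have processed 298105 files and directories.
Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:15:33 2021 : We have processed 508611 files and directories.
Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:20:33 2021 : We have processed 693207 files and directories.
Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 13:25:33 2021 : We have processed 860486 files and directories.
Autologout: System Console being disconnected due to inactivity
"""

# Matches e.g. "RESTORE: Tue May 18 13:10:33 2021 :" (space before the
# trailing colon is optional, as in the first RESTORE line).
STAMP = re.compile(r"RESTORE: (\w{3} \w{3}\s+\d+ \d{2}:\d{2}:\d{2} \d{4})\s*:")


def restore_gaps(log: str) -> List[int]:
    """Return the seconds between consecutive RESTORE log timestamps."""
    stamps = [
        datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
        for m in STAMP.finditer(log)
    ]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


print(restore_gaps(LOG))  # prints [126, 300, 300, 300]
```

The last three gaps are exactly 300 seconds, so ndmpcopy itself was reporting progress on a steady five-minute cadence when the autologout message appeared.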

Any good suggestions are very welcome.

/Heino
Re: [EXTERNAL] SV: Ndmpcopy times out...
So based on your testing, the timeout only affects the actual cluster
login. It does not propagate to the SP.

Autologout: System Console being disconnected due to inactivity


This is the interaction between the SP and ONTAP. Remember, you run "system
console" to get "serial" access to the node.
That is what is timing out. If a command is running via the system console,
it should continue to run.
If you are getting that message (above), it sounds like a bug to me.
You could try changing the SSH settings to enable the "keepalive" option,
which is supposed to send something benign every minute?


--tmac

Tim McCarthy, Principal Consultant

Proud Member of the #NetAppATeam <https://twitter.com/NetAppATeam>

I Blog at TMACsRack <https://tmacsrack.wordpress.com/>



Re: [EXTERNAL] SV: Ndmpcopy times out...
One more thing to try, if you haven't already:

SSH to ONTAP and run the command:

ssh -o ServerAliveInterval=60 admin@cluster "system node run -node node1 ndmpcopy..."
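
For scripting this, a small Python wrapper can assemble the same ssh invocation. This is only a sketch: `build_ssh_command` and its parameters are hypothetical names, `ServerAliveCountMax` is an additional OpenSSH client option that the thread does not mention, and the ndmpcopy arguments are left as the placeholder from the command above. Whether ONTAP treats SSH keepalive messages as activity for its own inactivity timer is not confirmed here; the keepalives mainly stop a firewall or similar device from dropping an idle TCP connection.

```python
import shlex
from typing import List


def build_ssh_command(
    user: str,
    host: str,
    node: str,
    nodeshell_command: str,
    alive_interval: int = 60,
    alive_count_max: int = 3,
) -> List[str]:
    """Build an argv list for ssh with client-side keepalives enabled."""
    # The clustershell command that wraps the nodeshell command.
    remote = f"system node run -node {node} {nodeshell_command}"
    return [
        "ssh",
        "-o", f"ServerAliveInterval={alive_interval}",   # seconds between probes
        "-o", f"ServerAliveCountMax={alive_count_max}",  # unanswered probes tolerated
        f"{user}@{host}",
        remote,
    ]


# "ndmpcopy..." is the placeholder from the thread, not a real argument list.
argv = build_ssh_command("admin", "cluster", "node1", "ndmpcopy...")
print(shlex.join(argv))

# To actually run it (requires network access and credentials):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing the command as an argv list rather than a single shell string avoids a second layer of quoting on the local side.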

--tmac




On Tue, May 18, 2021 at 12:06 PM Heino Walther <hw@beardmann.dk> wrote:

> Hi again
>
> This is the result after setting the timeout to 0. It doesn’t seem to make
> any difference…
>
> Below, ndmpcopy has been started via the service processor, system
> console, node run …
>
> I then try to log in again to see if the session is still active, but it is
> not.
>
> Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 14:58:56 2021 : We have processed 2211434 files and directories.
> Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 15:05:40 2021: Writing data to files.
> Ndmpcopy: 10.64.9.142: Log: DUMP: Tue May 18 15:05:40 2021 : We have written 1026117 KB.
> Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 15:05:40 2021 : We have read 1024120 KB from the backup.
> Ndmpcopy: 10.64.9.142: Log: DUMP: Tue May 18 15:10:40 2021 : We have written 53866876 KB.
> Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 15:10:40 2021 : We have read 53864840 KB from the backup.
> Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 15:15:40 2021 : We have read 106737975 KB from the backup.
> Ndmpcopy: 10.64.9.142: Log: DUMP: Tue May 18 15:15:40 2021 : We have written 106740019 KB.
> Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 15:20:40 2021 : We have read 159429965 KB from the backup.
> Ndmpcopy: 10.64.9.142: Log: DUMP: Tue May 18 15:20:40 2021 : We have written 159431993 KB.
> Ndmpcopy: 10.64.9.142: Log: RESTORE: Tue May 18 15:25:40 2021 : We have read 211327838 KB from the backup.
> Ndmpcopy: 10.64.9.142: Log: DUMP: Tue May 18 15:25:40 2021 : We have written 211329882 KB.
> Autologout: System Console being disconnected due to inactivity
> SP NODE-01*> Autologout : Session being disconnected due to inactivity
> Connection to 10.64.9.180 closed.
> [BEUMER GROUP]root@NODE-dkaar1:~#
> [BEUMER GROUP]root@NODE-dkaar1:~# ssh ndmpbackup@10.64.9.180
> ndmpbackup@10.64.9.180's password:
> SP STOR02-DKAAR1-01> system console
> Type Ctrl-D to exit.
> NODE-DKAAR1::*>
>
> I’ll start a NetApp Case on this…
>
> /Heino