Mailing List Archive

SSH hang question
Very rarely, but more than once, we have seen OpenSSH hang on the client
side. On the server side there is no indication of a connection in the
logs. These are always scripted remote commands with no user
interaction when we find it. This seems to happen only in VM
environments, but I could be wrong. It seems surprising to me that there
would not be timeouts and retries on the protocol, but I'm curious
whether this is expected behavior in some configuration and, if not, what
I should try to gather when it happens again. Or maybe there is some
setting to make the connection reliable.

We have seen this maybe 2 or 3 times over a couple of years, so it is not
frequent. It happened yesterday during a complex distributed installation
process between RHEL 7 VMs on the same data center LAN.

Any advice appreciated.

steve
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: SSH hang question
On Sun, 10 Nov 2019 at 05:10, Steve McAfee <smcafee.social@gmail.com> wrote:
> Very rarely, but more than once, we have seen OpenSSH hang on the client
> side. On the server side there is no indication of a connection in the
> logs. These are always scripted remote commands with no user
> interaction when we find it. This seems to happen only in VM
> environments, but I could be wrong.

What's the VM platform and underlying network technology? At least one
(VMware Fusion) is known to have problems, although your case doesn't
sound exactly like this:
https://marc.info/?l=openssh-unix-dev&m=153535111501535&w=2

> It seems surprising to me that there
> would not be timeouts and retries on the protocol,

SSH is built on top of TCP, which provides the reliable bytestream and
thus implements the timeouts and retries, so if you can find the
problematic connection in the output of netstat you may get some clues
about what's going on.

One of the failure modes that can behave as you describe is the infamous
TCP MTU blackhole, wherein a large packet gets fragmented, the 2nd
fragment gets dropped for some reason and the IP packet times out during
reassembly. TCP retransmits the packet, which again gets fragmented, and
the cycle repeats until TCP eventually times out the connection. PPPoE
and 802.1Q VLANs are common culprits because they reduce the MTU just a
little bit.
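
To put rough numbers on "just a little bit", here is a minimal Python
sketch of the largest ping payload that fits a given MTU unfragmented.
It assumes IPv4 without options (20-byte header) and an 8-byte ICMP echo
header. The function names and the host name are made up for
illustration; "-M do" (set Don't Fragment) and "-s" are the Linux
iputils ping flags. The 1496 figure for 802.1Q is an assumption that
only holds on equipment that cannot carry the extra 4-byte tag in an
oversize frame.

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
ICMP_HEADER = 8    # bytes, ICMP echo request header

# Illustrative MTUs; the 802.1Q value is an assumption (see the text above).
LINK_MTUS = {
    "ethernet": 1500,
    "pppoe": 1500 - 8,    # PPPoE and PPP headers take 8 bytes
    "802.1q": 1500 - 4,   # VLAN tag takes 4 bytes if oversize frames are not allowed
}


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_probe_command(host, mtu):
    """Build a Linux ping argv that should only get replies if the whole
    path carries packets of `mtu` bytes without fragmenting them."""
    return ["ping", "-c", "3", "-M", "do", "-s", str(max_ping_payload(mtu)), host]


if __name__ == "__main__":
    # "server.example.com" is a placeholder host name.
    for name, mtu in LINK_MTUS.items():
        print(name, mtu, " ".join(ping_probe_command("server.example.com", mtu)))
```

If the probe for 1500 gets no replies while a smaller one does, something
on the path has a reduced MTU and is worth a closer look.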

I'd suggest checking:
- netstat for the failing connections, looking for increasing Send-Q values,
- netstat -s on the problematic machines, looking for atypical counter values,
- MTUs on the hosts and everything in between them.

If it's none of these things then it's probably time to break out tcpdump.
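
The first check can be roughly scripted. The sketch below assumes the
column layout of Linux net-tools "netstat -tn" (Proto, Recv-Q, Send-Q,
Local Address, Foreign Address, State) and compares two snapshots taken
some seconds apart. The function names and sample addresses are
hypothetical, and other platforms format netstat output differently.

```python
def parse_send_queues(netstat_output):
    """Map (local, remote) -> Send-Q bytes for each TCP row of `netstat -tn` text."""
    queues = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: tcp 0 1448 10.0.0.5:40002 10.0.0.9:22 ESTABLISHED
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[2].isdigit():
            queues[(fields[3], fields[4])] = int(fields[2])
    return queues


def stalled_send_queues(earlier, later):
    """Connections whose Send-Q is non-zero and has not shrunk between snapshots."""
    before = parse_send_queues(earlier)
    after = parse_send_queues(later)
    return sorted(
        conn for conn, sendq in after.items()
        if sendq > 0 and sendq >= before.get(conn, 0)
    )


# Made-up sample snapshots, for illustration only.
SNAPSHOT_1 = """Proto Recv-Q Send-Q Local Address    Foreign Address  State
tcp        0      0 10.0.0.5:40000   10.0.0.9:22      ESTABLISHED
tcp        0   1448 10.0.0.5:40002   10.0.0.9:22      ESTABLISHED
"""
SNAPSHOT_2 = """Proto Recv-Q Send-Q Local Address    Foreign Address  State
tcp        0      0 10.0.0.5:40000   10.0.0.9:22      ESTABLISHED
tcp        0   2896 10.0.0.5:40002   10.0.0.9:22      ESTABLISHED
"""

if __name__ == "__main__":
    for local, remote in stalled_send_queues(SNAPSHOT_1, SNAPSHOT_2):
        print("Send-Q not draining:", local, "->", remote)
```

A Send-Q that stays non-zero or keeps growing means the peer is not
acknowledging data, which fits the retransmit loop described above.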

> Or maybe there is some setting to make the connection reliable.

The ServerAliveInterval and ServerAliveCountMax settings can detect the
class of failure I described above, but in those cases the root cause is
a broken network and the network is what needs to be fixed.
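
For scripted callers, a minimal sketch of passing those two settings on
the command line might look like the following. The interval and count
values are arbitrary examples, the helper names are hypothetical, and
BatchMode plus the subprocess-level hard timeout are additions assumed
here as a backstop rather than anything suggested in this thread.

```python
import subprocess


def ssh_command(host, remote_command, interval=15, count_max=3):
    """Build an ssh argv that disconnects after about interval * count_max
    seconds without any response from the server."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                     # never prompt; suits scripted use
        "-o", f"ServerAliveInterval={interval}",   # seconds of silence before probing
        "-o", f"ServerAliveCountMax={count_max}",  # unanswered probes before giving up
        host,
        remote_command,
    ]


def run_remote(host, remote_command, hard_timeout=600):
    """Run the command; raises subprocess.TimeoutExpired if ssh never returns."""
    return subprocess.run(
        ssh_command(host, remote_command),
        capture_output=True,
        text=True,
        timeout=hard_timeout,
    )
```

The keepalive probes travel inside the encrypted channel, so they only
help once the session is established; the hard timeout is there to catch
a hang at any earlier stage.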

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA (new)
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.

Re: SSH hang question
Hi,

On Sun, Nov 10, 2019 at 06:58:47PM +1100, Darren Tucker wrote:
> One of the failure modes that can behave as you describe is the infamous
> TCP MTU blackhole, wherein a large packet gets fragmented, the 2nd
> fragment gets dropped for some reason and the IP packet times out
> during reassembly.

I've recently run into mobile networks that drop packets if you change
the QoS flags. SSH negotiation works fine, but afterwards the client
changes the QoS bits to "interactive", and that seems to confuse their
NAT gateway... "ssh $machine $command" worked, so I changed my .ssh/config
to

host $myjumphost
    # gert, 19.10.19, "like non-interactive session" - DTAG is acting up at the moment
    ipqos cs1

... and it went back to working.

Might or might not be the case here.

gert

--
"If was one thing all people took for granted, was conviction that if you
feed honest figures into a computer, honest figures come out. Never doubted
it myself till I met a computer with a sense of humor."
Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany gert@greenie.muc.de

Re: SSH hang question
ISTR that I've read about some bugs with packet traversal through VM NAT
setups.
It was something about the RST packet triggering NAT destruction but not
always being relayed further (some race condition?!), which sounds
as if it fits the bill here.


https://forums.virtualbox.org/viewtopic.php?f=1&t=20579 comes close but
is a bit old...


You could try to use the TCP keepalive settings in SSH, and/or to reduce
the MTU size (as mentioned in another mail here already).


Else a few more setup details might help...

Re: SSH hang question
Thanks, everyone, for the feedback on the OpenSSH hang. I'm going to ask the
customer to review the MTU in their configuration first to see if they can
find a problem. Also, if their host OS is Windows there are some things
suggested to check, but I don't think it is Windows. If it ever happens
again I'll try to investigate more, as suggested by Darren, before we
interrupt it.

steve

On Sun, Nov 10, 2019 at 5:29 AM Philipp Marek <philipp@marek.priv.at> wrote:

> ISTR that I've read about some bugs with packet traversal through VM NAT
> setups.
> It was something about the RST packet triggering NAT destruction but not
> always being relayed further (some race condition?!), which sounds
> as if it fits the bill here.
>
>
> https://forums.virtualbox.org/viewtopic.php?f=1&t=20579 comes close but
> is a bit old...
>
>
> You could try to use the TCP keepalive settings in SSH, and/or to reduce
> the MTU size (as mentioned in another mail here already).
>
>
> Else a few more setup details might help...
>

Re: SSH hang question
It was reported on VMware, not VirtualBox, but perhaps the NATs are
similarly broken. It might be worth trying to revert the change from
old-style QoS to DiffServ by setting "IPQoS lowdelay throughput".

--
Sent from a phone, apologies for poor formatting.

On 15 November 2019 14:29:39 Steve McAfee <smcafee.social@gmail.com> wrote:

> Thanks, everyone, for the feedback on the OpenSSH hang. I'm going to ask the
> customer to review the MTU in their configuration first to see if they can
> find a problem. Also, if their host OS is Windows there are some things
> suggested to check, but I don't think it is Windows. If it ever happens
> again I'll try to investigate more, as suggested by Darren, before we
> interrupt it.
>
> steve