Mailing List Archive

OS X poll breakage (Was: Please help test recent changes)
On Fri, Jan 07, 2022 at 09:52:09AM +1100, Damien Miller wrote:
> 1. Conversion of the ssh and sshd mainloop from select() to poll()
>
> This should be entirely invisible to users, so any behaviour change
> is a bug. If you see something and want to help debug it further,
> uncomment the DEBUG_CHANNEL_POLL #define in channels.c for helps of
> extra debug logging.

This change breaks (at least) the exit-status test on Mac OS X. I
haven't had a chance to investigate yet,

$ git bisect run sh -c "autoreconf && ./configure --without-openssl &&
make t-exec LTESTS=exit-status"
[... lots ...]
17877bc81db3846e6e7d4cfb124d966bb9c9296b is the first bad commit
commit 17877bc81db3846e6e7d4cfb124d966bb9c9296b
Author: djm@openbsd.org <djm@openbsd.org>
Date: Thu Jan 6 21:48:38 2022 +0000

upstream: convert ssh, sshd mainloops from select() to poll();

feedback & ok deraadt@ and markus@ has been in snaps for a few months

OpenBSD-Commit-ID: a77e16a667d5b194dcdb3b76308b8bba7fa7239c

$ uname -a
Darwin osx-highsierra 17.7.0 Darwin Kernel Version 17.7.0: Fri Oct 30
13:34:27 PDT 2020; root:xnu-4570.71.82.8~1/RELEASE_X86_64 x86_64

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA (new)
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: OS X poll breakage (Was: Please help test recent changes) [ In reply to ]
On Mon, 10 Jan 2022, Darren Tucker wrote:

> On Fri, Jan 07, 2022 at 09:52:09AM +1100, Damien Miller wrote:
> > 1. Conversion of the ssh and sshd mainloop from select() to poll()
> >
> > This should be entirely invisible to users, so any behaviour change
> > is a bug. If you see something and want to help debug it further,
> > uncomment the DEBUG_CHANNEL_POLL #define in channels.c for helps of
> > extra debug logging.
>
> This change breaks (at least) the exit-status test on Mac OS X. I
> haven't had a chance to investigate yet,
>
> $ git bisect run sh -c "autoreconf && ./configure --without-openssl &&
> make t-exec LTESTS=exit-status"
> [... lots ...]
> 17877bc81db3846e6e7d4cfb124d966bb9c9296b is the first bad commit
> commit 17877bc81db3846e6e7d4cfb124d966bb9c9296b
> Author: djm@openbsd.org <djm@openbsd.org>
> Date: Thu Jan 6 21:48:38 2022 +0000
>
> upstream: convert ssh, sshd mainloops from select() to poll();
>
> feedback & ok deraadt@ and markus@ has been in snaps for a few months
>
> OpenBSD-Commit-ID: a77e16a667d5b194dcdb3b76308b8bba7fa7239c
>
> $ uname -a
> Darwin osx-highsierra 17.7.0 Darwin Kernel Version 17.7.0: Fri Oct 30
> 13:34:27 PDT 2020; root:xnu-4570.71.82.8~1/RELEASE_X86_64 x86_64

Here's the client side log of the failure. It comes from the
"same with early close of stdout/err" section of the test, but I can't
actually see anything get closed...

debug3: receive packet: type 91
debug2: channel_input_open_confirmation: channel 0: callback start
debug2: client_session2_setup: id 0
debug1: Sending command: exec sh -c 'sleep 2; exec > /dev/null 2>&1; sleep 3; exit 0'
debug2: channel 0: request exec confirm 1
debug3: send packet: type 98
debug2: channel_input_open_confirmation: channel 0: callback done
debug2: channel 0: open confirm rwindow 0 rmax 32768
debug2: channel 0: rcvd adjust 2097152
debug3: receive packet: type 99
debug2: channel_input_status_confirm: type 99 id 0
debug2: exec request accepted on channel 0
channel 0: invalid rfd pollfd[2].fd 4 r4 w7 e8 s-1

and (separate run) with DEBUG_CHANNEL_POLL set:

debug2: channel_input_status_confirm: type 99 id 0
debug2: exec request accepted on channel 0
debug3: dump_channel_poll: channel 0: rfd r4 w7 e8 s-1 pfd[2].fd=4 want 0x01 ev 0x01 ready 0x00 rev 0x00
debug3: dump_channel_poll: channel 0: rfd r4 w7 e8 s-1 pfd[3].fd=7 want 0x01 ev 0x00 ready 0x00 rev 0x00
debug3: dump_channel_poll: channel 0: rfd r4 w7 e8 s-1 pfd[4].fd=8 want 0x01 ev 0x00 ready 0x00 rev 0x00
debug1: channel_after_poll: pfd[2].fd 4 rev 0x0020
debug3: dump_channel_poll: channel 0: rfd r4 w7 e8 s-1 pfd[2].fd=4 want 0x01 ev 0x01 ready 0x00 rev 0x20
channel 0: invalid rfd pollfd[2].fd 4 r4 w7 e8 s-1
FAIL: exit code (with sleep) mismatch for: 255 != 0

It looks like it fails with HAVE_POLL set to 0 too.

Still looking...

-d
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: OS X poll breakage (Was: Please help test recent changes) [ In reply to ]
On Tue, 11 Jan 2022, Damien Miller wrote:

> Here's the client side log of the failure. It comes from the
> "same with early close of stdout/err" section of the test, but I can't
> actually see anything get closed...
>
> debug3: receive packet: type 91
> debug2: channel_input_open_confirmation: channel 0: callback start
> debug2: client_session2_setup: id 0
> debug1: Sending command: exec sh -c 'sleep 2; exec > /dev/null 2>&1; sleep 3; exit 0'
> debug2: channel 0: request exec confirm 1
> debug3: send packet: type 98
> debug2: channel_input_open_confirmation: channel 0: callback done
> debug2: channel 0: open confirm rwindow 0 rmax 32768
> debug2: channel 0: rcvd adjust 2097152
> debug3: receive packet: type 99
> debug2: channel_input_status_confirm: type 99 id 0
> debug2: exec request accepted on channel 0
> channel 0: invalid rfd pollfd[2].fd 4 r4 w7 e8 s-1
>
> and (separate run) with DEBUG_CHANNEL_POLL set:
>
> debug2: channel_input_status_confirm: type 99 id 0
> debug2: exec request accepted on channel 0
> debug3: dump_channel_poll: channel 0: rfd r4 w7 e8 s-1 pfd[2].fd=4 want 0x01 ev 0x01 ready 0x00 rev 0x00
> debug3: dump_channel_poll: channel 0: rfd r4 w7 e8 s-1 pfd[3].fd=7 want 0x01 ev 0x00 ready 0x00 rev 0x00
> debug3: dump_channel_poll: channel 0: rfd r4 w7 e8 s-1 pfd[4].fd=8 want 0x01 ev 0x00 ready 0x00 rev 0x00
> debug1: channel_after_poll: pfd[2].fd 4 rev 0x0020
> debug3: dump_channel_poll: channel 0: rfd r4 w7 e8 s-1 pfd[2].fd=4 want 0x01 ev 0x01 ready 0x00 rev 0x20
> channel 0: invalid rfd pollfd[2].fd 4 r4 w7 e8 s-1
> FAIL: exit code (with sleep) mismatch for: 255 != 0
>
> It looks like it fails with HAVE_POLL set to 0 too.
>
> Still looking...

Wow, it looks like Darwin's poll(2) is completely broken for character-
special devices (at least). E.g. the attached program spins shows similar
behaviour when run on /dev/null - it spins, returning revents=POLLNVAL.

It looks like I'm not the first to see this either, some people noticed
it 17 years ago!

https://lists.apple.com/archives/darwin-dev/2005/May/msg00220.html

There's apparently a bug open (Apple bug 3710161), but I can't see it
and if they haven't fixed it by now then they're presumably not in any
great hurry.

Unsetting HAVE_POLL lets the test pass. It seems like some other
programs use a similar approach, e.g.

https://www.mail-archive.com/bug-gnulib@gnu.org/msg00296.html

So I think we need a HAVE_BROKEN_POLL :(

-d
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: OS X poll breakage (Was: Please help test recent changes) [ In reply to ]
On Tue, 11 Jan 2022, Damien Miller wrote:

> Wow, it looks like Darwin's poll(2) is completely broken for character-
> special devices (at least). E.g. the attached program spins shows similar
> behaviour when run on /dev/null - it spins, returning revents=POLLNVAL.

oops, this was the test program I was using.

Compare "./demo -" and "./demo /dev/null"

#include <sys/stat.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <poll.h>
#include <err.h>

int
main(int argc, char **argv)
{
struct pollfd p;
struct stat s;
int r;
char c;

setlinebuf(stdout);
p.fd = 0;
if (argc > 1 && strcmp(argv[1], "-") != 0) {
if ((p.fd = open(argv[1], O_RDONLY)) == -1)
err(1, "open %s", argv[1]);
}
if (fstat(p.fd, &s) != 0)
err(1, "stat");
printf("fd %d mode 0%o\n", p.fd, s.st_mode);
for (;;) {
p.events = POLLIN;
r = poll(&p, 1, -1);
if (r == -1)
err(1, "poll");
printf("poll %d rev 0x%02x\n", r, p.revents);
if ((p.revents & (POLLIN|POLLHUP)) == 0)
continue;
r = read(p.fd, &c, 1);
if (r == -1)
err(1, "read");
if (r == 0)
break;
}
return 0;
}


_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: OS X poll breakage (Was: Please help test recent changes) [ In reply to ]
On Tue, 11 Jan 2022, Damien Miller wrote:

> So I think we need a HAVE_BROKEN_POLL :(

This seems to work. I don't think the preprocessor will do the right thing
if we have to support a system with BROKEN_POLL && HAVE_PPOLL, but we can
cross that bridge if we come to it...

diff --git a/configure.ac b/configure.ac
index 1feb73ef9..eb265143b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -731,6 +731,10 @@ main() { if (NSVersionOfRunTimeLibrary("System") >= (60 << 16))
# proc_pidinfo()-based closefrom() replacement.
AC_CHECK_HEADERS([libproc.h])
AC_CHECK_FUNCS([proc_pidinfo])
+ # poll(2) is broken for character-special devices (at least).
+ # cf. Apple bug 3710161 (not public, but searchable)
+ AC_DEFINE([BROKEN_POLL], [1],
+ [System poll(2) implementation is broken])
;;
*-*-dragonfly*)
SSHDLIBS="$SSHDLIBS -lcrypt"
diff --git a/openbsd-compat/bsd-poll.c b/openbsd-compat/bsd-poll.c
index 2d28eed5b..8084776ce 100644
--- a/openbsd-compat/bsd-poll.c
+++ b/openbsd-compat/bsd-poll.c
@@ -15,7 +15,7 @@
*/

#include "includes.h"
-#if !defined(HAVE_PPOLL) || !defined(HAVE_POLL)
+#if !defined(HAVE_PPOLL) || !defined(HAVE_POLL) || defined(BROKEN_POLL)

#include <sys/types.h>
#include <sys/time.h>
@@ -29,7 +29,7 @@
#include <unistd.h>
#include "bsd-poll.h"

-#ifndef HAVE_PPOLL
+#if !defined(HAVE_PPOLL) || defined(BROKEN_POLL)
/*
* A minimal implementation of ppoll(2), built on top of pselect(2).
*
@@ -109,9 +109,9 @@ out:
errno = saved_errno;
return ret;
}
-#endif /* HAVE_PPOLL */
+#endif /* !HAVE_PPOLL || BROKEN_POLL */

-#ifndef HAVE_POLL
+#if !defined(HAVE_POLL) || defined(BROKEN_POLL)
int
poll(struct pollfd *fds, nfds_t nfds, int timeout)
{
@@ -126,6 +126,6 @@ poll(struct pollfd *fds, nfds_t nfds, int timeout)

return ppoll(fds, nfds, tsp, NULL);
}
-#endif /* HAVE_POLL */
+#endif /* !HAVE_POLL || BROKEN_POLL */

-#endif /* HAVE_PPOLL || HAVE_POLL */
+#endif /* !HAVE_PPOLL || !HAVE_POLL || BROKEN_POLL */
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: OS X poll breakage (Was: Please help test recent changes) [ In reply to ]
On Tue, 11 Jan 2022 at 18:44, Damien Miller <djm@mindrot.org> wrote:

> On Tue, 11 Jan 2022, Damien Miller wrote:
>
> > So I think we need a HAVE_BROKEN_POLL :(
>
> This seems to work.


Nice work! Change looks ok to me.


> I don't think the preprocessor will do the right thing
> if we have to support a system with BROKEN_POLL && HAVE_PPOLL, but we can
> cross that bridge if we come to it...
>

It's shims all the way down.

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA (new)
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: OS X poll breakage (Was: Please help test recent changes) [ In reply to ]
On 1/11/22 02:30, Damien Miller wrote:
> On Tue, 11 Jan 2022, Damien Miller wrote:
>
>> Here's the client side log of the failure. It comes from the
>> "same with early close of stdout/err" section of the test, but I can't
>> actually see anything get closed...
>>
>> debug3: receive packet: type 91
>> debug2: channel_input_open_confirmation: channel 0: callback start
>> debug2: client_session2_setup: id 0
>> debug1: Sending command: exec sh -c 'sleep 2; exec > /dev/null 2>&1; sleep 3; exit 0'
>> debug2: channel 0: request exec confirm 1
>> debug3: send packet: type 98
>> debug2: channel_input_open_confirmation: channel 0: callback done
>> debug2: channel 0: open confirm rwindow 0 rmax 32768
>> debug2: channel 0: rcvd adjust 2097152
>> debug3: receive packet: type 99
>> debug2: channel_input_status_confirm: type 99 id 0
>> debug2: exec request accepted on channel 0
>> channel 0: invalid rfd pollfd[2].fd 4 r4 w7 e8 s-1
>>
>> and (separate run) with DEBUG_CHANNEL_POLL set:
>>
>> debug2: channel_input_status_confirm: type 99 id 0
>> debug2: exec request accepted on channel 0
>> debug3: dump_channel_poll: channel 0: rfd r4 w7 e8 s-1 pfd[2].fd=4 want 0x01 ev 0x01 ready 0x00 rev 0x00
>> debug3: dump_channel_poll: channel 0: rfd r4 w7 e8 s-1 pfd[3].fd=7 want 0x01 ev 0x00 ready 0x00 rev 0x00
>> debug3: dump_channel_poll: channel 0: rfd r4 w7 e8 s-1 pfd[4].fd=8 want 0x01 ev 0x00 ready 0x00 rev 0x00
>> debug1: channel_after_poll: pfd[2].fd 4 rev 0x0020
>> debug3: dump_channel_poll: channel 0: rfd r4 w7 e8 s-1 pfd[2].fd=4 want 0x01 ev 0x01 ready 0x00 rev 0x20
>> channel 0: invalid rfd pollfd[2].fd 4 r4 w7 e8 s-1
>> FAIL: exit code (with sleep) mismatch for: 255 != 0
>>
>> It looks like it fails with HAVE_POLL set to 0 too.
>>
>> Still looking...
>
> Wow, it looks like Darwin's poll(2) is completely broken for character-
> special devices (at least). E.g. the attached program spins shows similar
> behaviour when run on /dev/null - it spins, returning revents=POLLNVAL.

Is using kqueue(2) an option? That’s faster, and it doesn’t have the
problem with large file descriptors that poll(2) has.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Re: OS X poll breakage (Was: Please help test recent changes) [ In reply to ]
On Tue, 11 Jan 2022, Demi Marie Obenour wrote:

> Is using kqueue(2) an option?

The linked-to post says that there’s an open Apple bug that
both poll(2) and kqueue(2) on anything in /dev/ are broken.

bye,
//mirabilos
--
Infrastrukturexperte • tarent solutions GmbH
Am Dickobskreuz 10, D-53121 Bonn • http://www.tarent.de/
Telephon +49 228 54881-393 • Fax: +49 228 54881-235
HRB AG Bonn 5168 • USt-ID (VAT): DE122264941
Geschäftsführer: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg

****************************************************
/?\ The UTF-8 Ribbon
? ? Campaign against Mit dem tarent-Newsletter nichts mehr verpassen:
 ?  HTML eMail! Also, https://www.tarent.de/newsletter
? ? header encryption!
****************************************************
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: OS X poll breakage (Was: Please help test recent changes) [ In reply to ]
On Tue, 11 Jan 2022 23:12:18 +0100, Thorsten Glaser wrote:

> The linked-to post says that there’s an open Apple bug that
> both poll(2) and kqueue(2) on anything in /dev/ are broken.

The reason poll(2) is broken is that it is implemented in terms of
kqueue(2). If Apple fixes kqueue(2) for character devices poll
would probably start working...

- todd
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: OS X poll breakage (Was: Please help test recent changes) [ In reply to ]
On Tue, 11 Jan 2022, Demi Marie Obenour wrote:

> > Wow, it looks like Darwin's poll(2) is completely broken for
> > character- special devices (at least). E.g. the attached program
> > spins shows similar behaviour when run on /dev/null - it spins,
> > returning revents=POLLNVAL.
>
> Is using kqueue(2) an option? That’s faster, and it doesn’t have the
> problem with large file descriptors that poll(2) has.

No, because it's not cross-platform and I don't want to have to maintain
multiple mainloops (believe me, one is more than enough). It would have
been really nice if the OSS Unix-like operating systems has standardised
on one event notification; kqueue and epoll are both fine.

Also, it seems like the OS X kqueue(2) implementation suffers from the
same bug anyway. I suspect that this is the root cause, and that their
poll(2) is built on top of kqueue(2).

-d
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: OS X poll breakage (Was: Please help test recent changes) [ In reply to ]
epoll doesn't (or didn't) work with /dev/null either in any case.

Only select works on everything on macOS and only poll or select on Linux.



On Tue, 11 Jan 2022, 22:46 Damien Miller, <djm@mindrot.org> wrote:

> On Tue, 11 Jan 2022, Demi Marie Obenour wrote:
>
> > > Wow, it looks like Darwin's poll(2) is completely broken for
> > > character- special devices (at least). E.g. the attached program
> > > spins shows similar behaviour when run on /dev/null - it spins,
> > > returning revents=POLLNVAL.
> >
> > Is using kqueue(2) an option? That’s faster, and it doesn’t have the
> > problem with large file descriptors that poll(2) has.
>
> No, because it's not cross-platform and I don't want to have to maintain
> multiple mainloops (believe me, one is more than enough). It would have
> been really nice if the OSS Unix-like operating systems has standardised
> on one event notification; kqueue and epoll are both fine.
>
> Also, it seems like the OS X kqueue(2) implementation suffers from the
> same bug anyway. I suspect that this is the root cause, and that their
> poll(2) is built on top of kqueue(2).
>
> -d
> _______________________________________________
> openssh-unix-dev mailing list
> openssh-unix-dev@mindrot.org
> https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
>
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: OS X poll breakage (Was: Please help test recent changes) [ In reply to ]
On Wed, 12 Jan 2022, Nicholas Marriott wrote:

> epoll doesn't (or didn't) work with /dev/null either in any case.

I didn't know this. At least there's the prospect of patching it ourselves :)

https://github.com/torvalds/linux/compare/master...djmdjm:epoll-devnull

(untested)

-d
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev
Re: OS X poll breakage (Was: Please help test recent changes) [ In reply to ]
On 1/11/22 17:42, Damien Miller wrote:
> On Tue, 11 Jan 2022, Demi Marie Obenour wrote:
>
>>> Wow, it looks like Darwin's poll(2) is completely broken for
>>> character- special devices (at least). E.g. the attached program
>>> spins shows similar behaviour when run on /dev/null - it spins,
>>> returning revents=POLLNVAL.
>>
>> Is using kqueue(2) an option? That’s faster, and it doesn’t have the
>> problem with large file descriptors that poll(2) has.
>
> No, because it's not cross-platform and I don't want to have to maintain
> multiple mainloops (believe me, one is more than enough). It would have
> been really nice if the OSS Unix-like operating systems has standardised
> on one event notification; kqueue and epoll are both fine.
>
> Also, it seems like the OS X kqueue(2) implementation suffers from the
> same bug anyway. I suspect that this is the root cause, and that their
> poll(2) is built on top of kqueue(2).

What about filtering out file descriptors belonging to /dev files?

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Re: OS X poll breakage (Was: Please help test recent changes) [ In reply to ]
On Thu, 13 Jan 2022 at 17:15, Demi Marie Obenour <demiobenour@gmail.com>
wrote:

> [...]
> What about filtering out file descriptors belonging to /dev files?
>

Well given that one of the main things people want sshd to interact with
are pseudoterminals (/dev/ttyp*)...

--
Darren Tucker (dtucker at dtucker.net)
GPG key 11EAA6FA / A86E 3E07 5B19 5880 E860 37F4 9357 ECEF 11EA A6FA (new)
Good judgement comes with experience. Unfortunately, the experience
usually comes from bad judgement.
_______________________________________________
openssh-unix-dev mailing list
openssh-unix-dev@mindrot.org
https://lists.mindrot.org/mailman/listinfo/openssh-unix-dev