Mailing List Archive

[patch 0/6] lightweight robust futexes: -V3
This is release -V3 of the "lightweight robust futexes" patchset. The
patchset can also be downloaded from:

http://redhat.com/~mingo/lightweight-robust-futexes/

Changes since -V2:

Ulrich Drepper ran the code through more glibc testcases, which
unearthed a couple of bugs:

- fixed bug in the i386 and x86_64 assembly code (Ulrich Drepper)

- fixed bug in the list walking futex-wakeups (found by Ulrich Drepper)

- race fix: do not bail out in the list walk when the list_op_pending
pointer cannot be followed by the kernel - another userspace thread
may have already destroyed the mutex (and unmapped it), before this
thread had a chance to clear the field.

- cleanup: renamed list_add_pending to list_op_pending. (the field is
used for list removals too)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [patch 0/6] lightweight robust futexes: -V3
Another thing I noticed was that futex_offset on the surface looks like
a malicious user's dream variable. I didn't notice security addressed
at all in your initial write-up. I was told it was a big topic at last
year's OLS. In your write-up you did say you corrupted the
robust_list, but did you corrupt the offset? Is this even a concern?

Daniel


On Thu, 2006-02-16 at 10:41 +0100, Ingo Molnar wrote:
> This is release -V3 of the "lightweight robust futexes" patchset. The
> patchset can also be downloaded from:
>
> http://redhat.com/~mingo/lightweight-robust-futexes/
>
> Changes since -V2:
>
> Ulrich Drepper ran the code through more glibc testcases, which
> unearthed a couple of bugs:
>
> - fixed bug in the i386 and x86_64 assembly code (Ulrich Drepper)
>
> - fixed bug in the list walking futex-wakeups (found by Ulrich Drepper)
>
> - race fix: do not bail out in the list walk when the list_op_pending
> pointer cannot be followed by the kernel - another userspace thread
> may have already destroyed the mutex (and unmapped it), before this
> thread had a chance to clear the field.
>
> - cleanup: renamed list_add_pending to list_op_pending. (the field is
> used for list removals too)
>
> Ingo

Re: [patch 0/6] lightweight robust futexes: -V3
* Daniel Walker <dwalker@mvista.com> wrote:

> Another thing I noticed was that futex_offset on the surface looks
> like a malicious users dream variable .. [...]

i have no idea what you mean by that - could you explain whatever threat
you have in mind, in more detail?

Ingo

Re: [patch 0/6] lightweight robust futexes: -V3
On Thu, 2006-02-16 at 18:24 +0100, Ingo Molnar wrote:
> * Daniel Walker <dwalker@mvista.com> wrote:
>
> > Another thing I noticed was that futex_offset on the surface looks
> > like a malicious users dream variable .. [...]
>
> i have no idea what you mean by that - could you explain whatever threat
> you have in mind, in more detail?

As I said, "on the surface" you could manipulate the futex_offset to
access memory unrelated to the futex structure. That's all I'm
referring to.

Daniel

Re: [patch 0/6] lightweight robust futexes: -V3 - Why in userspace?
I just jump into a thread somewhere to ask my question :-)

Why does the list have to be in userspace?

As I see it there can only be a problem when some thread has done
FUTEX_WAIT and is blocked. If no task is blocked (or on its way to
being blocked) there is no problem.

The solution I could imagine was the FUTEX_WAIT operation adds the
waiting task to a list of waiters attached to the mutex owner's task_t
(which is known by its pid in the userspace flag) just before calling
schedule(). This list needs to be protected by a spinlock, of course.
When a task dies it can wake up the waiters on its list without relying
on userspace.

What race conditions have I missed?

Esben




On Thu, 16 Feb 2006, Daniel Walker wrote:

> On Thu, 2006-02-16 at 18:24 +0100, Ingo Molnar wrote:
> > * Daniel Walker <dwalker@mvista.com> wrote:
> >
> > > Another thing I noticed was that futex_offset on the surface looks
> > > like a malicious users dream variable .. [...]
> >
> > i have no idea what you mean by that - could you explain whatever threat
> > you have in mind, in more detail?
>
> As I said, "on the surface" you could manipulate the futex_offset to
> access memory unrelated to the futex structure . That's all I'm
> referring too ..
>
> Daniel
>

Re: [patch 0/6] lightweight robust futexes: -V3 - Why in userspace?
On Thu, 2006-02-16 at 20:06 +0100, Esben Nielsen wrote:
> I just jump into a thread somewhere to ask my question :-)
>
> Why does the list have to be in userspace?

because it's faster ;)



Re: [patch 0/6] lightweight robust futexes: -V3 - Why in userspace?
On Thu, 16 Feb 2006, Arjan van de Ven wrote:

> On Thu, 2006-02-16 at 20:06 +0100, Esben Nielsen wrote:
> > I just jump into a thread somewhere to ask my question :-)
> >
> > Why does the list have to be in userspace?
>
> because it's faster ;)
>
>
Faster???
As I see it, extra manipulations have to be done even in the non-congested
case: Every time the lock is taken the locking thread has to add the lock
to the list, and conversely remove the lock from the list. I.e.
instructions are _added_ to the fast path where you stay purely in
userspace.

I am of course comparing to a solution where you do a syscall every time
you do a lock. What I am asking about is whether it wouldn't be
enough to maintain the list at the FUTEX_WAIT/FUTEX_WAKE operation - i.e.
the slow path where you have to go into the kernel.

Esben


Re: [patch 0/6] lightweight robust futexes: -V3 - Why in userspace?
On Thu, 16 Feb 2006, Esben Nielsen wrote:

> On Thu, 16 Feb 2006, Arjan van de Ven wrote:
>
> > On Thu, 2006-02-16 at 20:06 +0100, Esben Nielsen wrote:
> > > I just jump into a thread somewhere to ask my question :-)
> > >
> > > Why does the list have to be in userspace?
> >
> > because it's faster ;)
> >
> >
> Faster???
> As I see it, extra manipulations have to be done even in the non-congested
> case: Every time the lock is taken the locking thread has to add the lock
> to a the list, and reversely remove the lock from the list. I.e.
> instructions are _added_ to the fast path where you stay purely in
> userspace.
>
> I am ofcourse comparing to a solution where you do a syscall on everytime
Correction:
"I am ofcourse NOT comparing"
...
> you do a lock. What I am asking about is whethere it wouldn't be
> enough to maintain the list at the FUTEX_WAIT/FUTEX_WAKE operation - i.e.
> the slow path where you have to go into the kernel.
>
> Esben
>
Esben

Re: [patch 0/6] lightweight robust futexes: -V3 - Why in userspace?
Esben Nielsen wrote:
> On Thu, 16 Feb 2006, Arjan van de Ven wrote:
>
>
>>On Thu, 2006-02-16 at 20:06 +0100, Esben Nielsen wrote:

>>>Why does the list have to be in userspace?
>>
>>because it's faster ;)

> Faster???
> As I see it, extra manipulations have to be done even in the non-congested
> case: Every time the lock is taken the locking thread has to add the lock
> to a the list, and reversely remove the lock from the list. I.e.
> instructions are _added_ to the fast path where you stay purely in
> userspace.
>
> I am ofcourse comparing to a solution where you do a syscall on everytime
> you do a lock.


The whole *point* of futexes is that on uncontested operations you don't
have to do a syscall. Thus, if you can avoid taking a syscall while
still getting reliability, you'll be faster.

Dropping to kernelspace isn't free.

Chris

Re: [patch 0/6] lightweight robust futexes: -V3
* Daniel Walker <dwalker@mvista.com> wrote:

> On Thu, 2006-02-16 at 18:24 +0100, Ingo Molnar wrote:
> > * Daniel Walker <dwalker@mvista.com> wrote:
> >
> > > Another thing I noticed was that futex_offset on the surface looks
> > > like a malicious users dream variable .. [...]
> >
> > i have no idea what you mean by that - could you explain whatever threat
> > you have in mind, in more detail?
>
> As I said, "on the surface" you could manipulate the
> futex_offset to access memory unrelated to the futex structure .
> That's all I'm referring too ..

and? You can 'manipulate' arbitrary userspace memory, be that used by
the kernel or not, and you can do a sys_futex(FUTEX_WAKE) on any
arbitrary userspace memory address too (this is a core property of
futexes). You must have meant something specific when you said "on the
surface looks like a malicious users dream variable". In other words:
please move your statement out of innuendo by backing it up with
specifics (or by retracting it) - right now it's hanging in the air :)

Ingo

Re: [patch 0/6] lightweight robust futexes: -V3 - Why in userspace?
* Esben Nielsen <simlo@phys.au.dk> wrote:

> > On Thu, 2006-02-16 at 20:06 +0100, Esben Nielsen wrote:
> > > I just jump into a thread somewhere to ask my question :-)
> > >
> > > Why does the list have to be in userspace?
> >
> > because it's faster ;)
> >
> >
> Faster???

yes, it's faster.

> As I see it, extra manipulations have to be done even in the non-congested
> case: Every time the lock is taken the locking thread has to add the lock
> to a the list, and reversely remove the lock from the list. I.e.
> instructions are _added_ to the fast path where you stay purely in
> userspace.

Note that glibc was already doing these list ops for the current
(limited, userspace-only) implementation of robust mutexes, so moving
the cleanup to kernel-space has only a small effect on the userspace
fastpath.

even considering the list ops, they are 2-3 (non-LOCK-ed) instructions
on memory values that are already cached => it's almost for free. Ulrich
(who is the kind of person who writes glibc code in assembly whenever he
can excuse it with a performance argument) would be pretty upset if it
wasnt cheap :-)
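
To make the "2-3 instructions" claim concrete, here is a minimal
single-threaded sketch of what such a userspace fastpath could look like.
It is not glibc's code: thread_state, demo_lock, demo_trylock() and
demo_unlock() are hypothetical names, the list is double-linked only so
that unlinking is O(1), and the memory-ordering details a real
implementation needs are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* List node embedded in every lock; a cleanup walker would only follow 'next'. */
struct robust_node {
    struct robust_node *next;
    struct robust_node *prev;
};

/* Hypothetical per-thread state: list head plus the "operation pending" slot. */
struct thread_state {
    struct robust_node head;
    struct robust_node *op_pending;
    uint32_t tid;
};

/* Hypothetical lock: 0 means unlocked, otherwise the owner's TID. */
struct demo_lock {
    _Atomic uint32_t word;
    struct robust_node node;
};

static void thread_state_init(struct thread_state *t, uint32_t tid)
{
    assert(tid != 0);                   /* 0 is reserved for "unlocked" */
    t->head.next = t->head.prev = &t->head;
    t->op_pending = NULL;
    t->tid = tid;
}

/* Uncontended acquire: one atomic op plus a few plain list stores, no syscall. */
static bool demo_trylock(struct thread_state *t, struct demo_lock *l)
{
    uint32_t expected = 0;
    bool ok;

    t->op_pending = &l->node;           /* announce the lock we are about to touch */
    ok = atomic_compare_exchange_strong(&l->word, &expected, t->tid);
    if (ok) {                           /* link the lock at the front of the list */
        l->node.next = t->head.next;
        l->node.prev = &t->head;
        t->head.next->prev = &l->node;
        t->head.next = &l->node;
    }
    t->op_pending = NULL;
    return ok;
}

/* Release: unlink first, then clear the lock word (waking waiters is omitted). */
static void demo_unlock(struct thread_state *t, struct demo_lock *l)
{
    t->op_pending = &l->node;
    l->node.prev->next = l->node.next;
    l->node.next->prev = l->node.prev;
    atomic_store(&l->word, 0);
    t->op_pending = NULL;
}
```

The pending slot covers the window in which the lock word and the list
disagree, which is why it is set before either is modified and cleared
only after both are consistent again.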

> I am ofcourse comparing to a solution where you do a syscall on
> everytime you do a lock. What I am asking about is whethere it
> wouldn't be enough to maintain the list at the FUTEX_WAIT/FUTEX_WAKE
> operation - i.e. the slow path where you have to go into the kernel.

no, that's not enough at all: we need to be able to clean up after
futexes even if the kernel was _never involved_ with them. The pure
userspace futex fastpath still can keep a lock stuck! In fact that is
the common-case.

Ingo

Re: [patch 0/6] lightweight robust futexes: -V3
Daniel wrote:
> "on the surface" you could manipulate the futex_offset to
> access memory unrelated to the futex structure .

If a piece of malicious code has wormed its way far enough into my
application to be manipulating this list, then I don't think that code
will gain any further advantage by manipulating this list. I think my
application is already pwned.

That malicious code would have no need to have the kernel futex handling
code do its dirty work indirectly via manipulations of this list. It can
just do the dirty work directly.

All Ingo needs to ensure is that the kernel will assume no more
privilege when reading/writing this list than the current task had,
from user space, reading/writing this list.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401

Re: [patch 0/6] lightweight robust futexes: -V3
On Thu, 16 Feb 2006, Ingo Molnar wrote:
> and? You can 'manipulate' arbitrary userspace memory, be that used by
> the kernel or not, and you can do a sys_futex(FUTEX_WAKE) on any
> arbitrary userspace memory address too (this is a core property of
> futexes). You must have meant something specific when you said "on the
> surface looks like a malicious users dream variable". In other words:
> please move your statement out of innuendo by backing it up with
> specifics (or by retracting it) - right now it's hanging in the air :)


Sorry I didn't mean to leave something hanging out there. I was
just making an observation. The 'dream variable' comment was maybe a little
too much and I'll gladly retract that. I'll replace it with: I think the
code needs more review in that area.

Daniel

Re: [patch 0/6] lightweight robust futexes: -V3
Nice stuff ...

I wonder if some of the initial questions about whether gcc would be
forcing something on the kernel, and whether it was unsafe for the
kernel to be walking a user list, are distracting from a more
interesting (in my view) question.

One can view this as just another sort of "interesting" system call,
where user code puts some data in various register and memory
locations, and then ends up by some predictable path in kernel code
which is acting on the request encoded in that data.

As always with system calls:
1) the kernel can't trust the user data any further than the user
could have thrown it, and
2) the interface needs a robust ABI and one or more language API's,
which will stand the test of time, over various architectures
and 32-64 emulations.

From what I could see glancing at the code and comments, Ingo has (1)
covered easily enough.

Would it make sense to have a language independent specification of
this interface, providing a detailed ABI, suitably generalized to cover
the various big endian, little endian, 32 and 64 and cross environments
that Linux normally supports?

I have in mind something that a competent assembly language coder could
write to, directly, when coding user access to this facility? Or some
other language or library implementor, besides C and glibc, could
develop to?

The biggest problem that I find in new and interesting ways for the
kernel to interact with user space is not thinking carefully through
and documenting in obscene detail the exact interface (this byte here
means this, that little endian quad there means thus, ...) for all
archs and emulations of interest. This tends to result in some corner
cases that have warts which can never be fixed, in order to maintain
compatibility.

This is sort of like specifying the over-the-wire protocols of the
internet, where each byte is spelled out, avoiding any assumption
of what sort of computing device is on the other end. Well, not
quite that bad. I guess we can assume the user code is running
on the same arch as the kernel, give or take possible word size
and endian emulations ... though if performance of this even from
within machine architecture emulators was a priority, even that
assumption is perhaps not desirable.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401

Re: [patch 0/6] lightweight robust futexes: -V3
* Daniel Walker <dwalker@mvista.com> wrote:

>
> On Thu, 16 Feb 2006, Ingo Molnar wrote:
> >and? You can 'manipulate' arbitrary userspace memory, be that used by
> >the kernel or not, and you can do a sys_futex(FUTEX_WAKE) on any
> >arbitrary userspace memory address too (this is a core property of
> >futexes). You must have meant something specific when you said "on the
> >surface looks like a malicious users dream variable". In other words:
> >please move your statement out of innuendo by backing it up with
> >specifics (or by retracting it) - right now it's hanging in the air :)
>
>
> Sorry I didn't mean to leave something hanging out there. I was
> just making an observation. The 'dream variable' comment was maybe a
> little to much and I'll gladly retract that .. I'll replace it with ,
> I think the code needs more review in that area ..

basically, ->futex_offset is not blindly trusted by the kernel either:
it's simply used to calculate a "userspace pointer" value, which it then
uses in a (secure) get_user() access, to do a FUTEX_WAKEUP. [Note that
FUTEX_WAKEUP is already done at do_exit() time via the ->clear_child_tid
userspace pointer.] All in one: this is totally safe.

The purpose of ->futex_offset is to not hardcode glibc's data structure
layout into the kernel. Since 'clean up after locks' is a relatively
rare operation, it was the prudent thing to do.

We could also have registered the futex_offset in the kernel itself, but
I didnt do it that way because that would add another word to
task_struct (for the sake of an operation that is rare), and it would
also make the sys_set_robust_list() operation a bit more expensive. So I
minimized the API to only take a single userspace pointer, which pointer
points to the robust_list_head structure which contains all data to
continue. That data is never trusted and is handled very carefully.
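
As an illustration of the above, here is a minimal userspace sketch of how
a list entry plus futex_offset yields the address of the lock word. The
field names futex_offset and list_op_pending come from this thread; the
remaining field names, demo_mutex, entry_to_futex() and collect_futexes()
are assumptions made up for the sketch, and plain pointer arithmetic
stands in for the kernel's checked get_user() accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One link of the per-thread list; embedded in each userspace lock. */
struct robust_list {
    struct robust_list *next;
};

/* The single structure whose address is registered with the kernel. */
struct robust_list_head {
    struct robust_list list;              /* empty when it points back at itself */
    long futex_offset;                    /* distance from a list entry to its lock word */
    struct robust_list *list_op_pending;  /* lock being added or removed right now */
};

/* Hypothetical userspace lock; the kernel never needs to know this layout. */
struct demo_mutex {
    uint32_t futex;                       /* lock word, holds the owner's TID */
    struct robust_list entry;
};

/* Turn a list entry into the address of the lock word it belongs to. */
static uint32_t *entry_to_futex(struct robust_list *entry, long futex_offset)
{
    return (uint32_t *)((char *)entry + futex_offset);
}

/*
 * Walk the list and record each lock word address in 'out'. The 'limit'
 * bound keeps a corrupted, looping list from spinning forever.
 * Returns the number of entries visited.
 */
static int collect_futexes(struct robust_list_head *head, uint32_t **out, int limit)
{
    int n = 0;
    struct robust_list *e;

    assert(head != NULL && out != NULL);
    for (e = head->list.next; e != &head->list && n < limit; e = e->next)
        out[n++] = entry_to_futex(e, head->futex_offset);
    return n;
}
```

With this layout userspace would set futex_offset to
offsetof(struct demo_mutex, futex) - offsetof(struct demo_mutex, entry),
so a library with a different lock structure only has to register a
different offset.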

Ingo

Re: [patch 0/6] lightweight robust futexes: -V3
* Paul Jackson <pj@sgi.com> wrote:

> That malicious code would have no need to have the kernel futext
> handling code do its dirty work indirectly via manipulations of this
> list. It can just do the dirty work directly.
>
> All Ingo needs to insure is that the kernel will assume no more
> priviledge when reading/writing this list than the current task had,
> from user space, reading/writing this list.

Correct, this is precisely what happens.

Furthermore, the new exit-time futex code within the kernel will do only
one, very limited thing with userspace memory: it will atomically set
bit 30 of a word at a userspace address (if the word is accessible to
and writable by userspace), if and only if that word is equal to
current->pid. This is really not the sort of memory writing capability
attackers are looking for :-)
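
The operation described above can be modelled in a few lines. This is a
userspace sketch using C11 atomics, not the kernel code: OWNER_DIED_BIT
and mark_owner_died() are hypothetical names, and a real kernel would
additionally have to fault-check the user address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* "bit 30" from the description above; the macro name is an assumption. */
#define OWNER_DIED_BIT (UINT32_C(1) << 30)

/*
 * Atomically set bit 30 of *word, but only if the word currently equals
 * the TID of the dying task. Any other value is left untouched.
 * Returns true if the word was marked.
 */
static bool mark_owner_died(_Atomic uint32_t *word, uint32_t dying_tid)
{
    uint32_t expected = dying_tid;

    assert(dying_tid != 0);     /* 0 would mean "unlocked", not an owner */
    return atomic_compare_exchange_strong(word, &expected,
                                          dying_tid | OWNER_DIED_BIT);
}
```

Because the write is a compare-and-exchange against the dying task's own
TID, a word that holds anything else cannot be modified through this path.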

Btw., we already have a similar mechanism in the kernel (and had for
years): the current->clear_child_tid pointer will be overwritten with 0
by the kernel at do_exit() time, and causes a futex wakeup. See
kernel/fork.c:mm_release():

        if (tsk->clear_child_tid && atomic_read(&mm->mm_users) > 1) {
                u32 __user * tidptr = tsk->clear_child_tid;
                tsk->clear_child_tid = NULL;

                /*
                 * We don't check the error code - if userspace has
                 * not set up a proper pointer then tough luck.
                 */
                put_user(0, tidptr);
                sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
        }

So the concept is not unprecedented at all, nor did it ever cause any
security problems [and i think i'd know - i wrote the above code too].
And 'write 0' is slightly more interesting to attackers than 'set bit 30
if word equals TID'.

Ingo

Re: [patch 0/6] lightweight robust futexes: -V3
Ingo Molnar wrote:

> basically, ->futex_offset is not blindly trusted by the kernel either:
> it's simply used to calculate a "userspace pointer" value, which it then
> uses in a (secure) get_user() access, to do a FUTEX_WAKEUP. [Note that
> FUTEX_WAKEUP is already done at do_exit() time via the ->clear_child_tid
> userspace pointer.] All in one: this is totally safe.

As mentioned by Paul...how do you deal with 32/64 compatibility where
your pointers are different sizes?

Chris

Re: [patch 0/6] lightweight robust futexes: -V3
* Paul Jackson <pj@sgi.com> wrote:

> Nice stuff ...
>
> I wonder if some of the initial questions about whether gcc would be
> forcing something on the kernel, and whether it was unsafe for the
> kernel to be walking a user list, are distracting from a more
> interesting (in my view) question.
>
> One can view this as just another sort of "interesting" system call,
> where user code puts some data in various register and memory
> locations, and then ends up by some predictable path in kernel code
> which is acting on the request encoded in that data.

correct.

> As always with system calls:
> 1) the kernel can't trust the user data any further than the user
> could have thrown it, and
> 2) the interface needs a robust ABI and one or more language API's,
> which will stand the test of time, over various architectures
> and 32-64 emulations.
>
> From what I could see glancing at the code and comments, Ingo has (1)
> covered easily enough.
>
> Would it make sense to have a language independent specification of
> this interface, providing a detailed ABI, suitably generalized to
> cover the various big endian, little endian, 32 and 64 and cross
> environments that Linux normally supports?

little/big endian shouldnt be a problem i think, as this is a
nonpersistent object. (futexes do not survive reboot)

The 32-bit-on-64-bit support code was indeed interesting, but it's also
pretty straightforward. See kernel/futex_compat.c where the 64-bit
kernel walks a 32-bit userspace. The method i took was to have _two_
lists:

        struct robust_list_head __user *robust_list;
#ifdef CONFIG_COMPAT
        struct compat_robust_list_head __user *compat_robust_list;
#endif

and at do_exit() time we process _both_ lists, first the 64-bit one,
then the 32-bit one. This handles execution environments that have both
32-bit and 64-bit state - they could crash in e.g. 32-bit mode holding
robust futexes, while holding 64-bit robust futexes too. This method
correctly handles e.g. x86 binaries on x86_64 [i checked that], and
native binaries too.
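
To show why the 32-bit list needs its own walker, here is a rough
userspace model of the compat half. It is a sketch under stated
assumptions, not the code in kernel/futex_compat.c: 32-bit "pointers" are
modelled as offsets into a small byte arena, and ARENA_SIZE, read_u32()
and compat_walk() are names invented for the example. read_u32() plays the
role of a checked userspace access that can fail.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ARENA_SIZE 256          /* stand-in for the 32-bit task's address space */

/* In the 32-bit view every link is a 32-bit value, whatever the kernel's word size. */
struct compat_robust_list {
    uint32_t next;
};
static_assert(sizeof(struct compat_robust_list) == 4,
              "compat links must be exactly 32 bits wide");

/* Bounds-checked 32-bit read; returns 0 on success, -1 on a bad address. */
static int read_u32(const unsigned char *arena, uint32_t addr, uint32_t *out)
{
    if (addr > ARENA_SIZE - sizeof(uint32_t))
        return -1;
    memcpy(out, arena + addr, sizeof(uint32_t));
    return 0;
}

/*
 * Walk a 32-bit list whose head (with 'next' at offset 0) lives at
 * head_addr. Returns the number of entries visited, or -1 as soon as a
 * link cannot be read. 'limit' bounds the walk on corrupted lists.
 */
static int compat_walk(const unsigned char *arena, uint32_t head_addr, int limit)
{
    uint32_t entry;
    int n = 0;

    if (read_u32(arena, head_addr, &entry))
        return -1;
    while (entry != head_addr && n < limit) {
        n++;
        if (read_u32(arena, entry, &entry))
            return -1;
    }
    return n;
}
```

A native walker would follow full-width pointers instead; keeping the two
walkers separate means neither has to guess the other's link size.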

> I have in mind something that a competent assembly language coder
> could write to, directly, when coding user access to this facility?
> Or some other language or library implementor, besides C and glibc,
> could develop to?

in this particular case i dont think it could be described in a more
generic way. I'm not against your idea per se - but someone would have
to code it up ;) Nor do i think that in this particular case we'd need
more flexibility than the patch offers: only a minimal amount of things
are 'hardcoded' in the robust-list approach, and even those are either
known futex properties, or are 'obvious' approaches like the fact that
it's represented as a linked list. (which is what glibc uses anyway) But
e.g. we dont force the single linked list: userspace can use a
double-linked list too - the kernel will simply walk the single-linked
component of that list in a forwards way.

> This is sort of like specifying the over the wire protocols the
> internet, where each byte is spelled out, avoiding any assumption of
> what sort of computing device is on the other end. Well, not quite
> that bad. I guess we can assume the user code is running on the same
> arch as the kernel, give or take possible word size and endian
> emulations ... though if performance of this even from within machine
> architecture emulators was a priority, even that assumption is perhaps
> not desirable.

i think my patch is a good example of how to do it with our existing
tools: i separated the list walking into a separate function
(exit_robust_list() and compat_exit_robust_list()), which purely handles
the data structure details.

In theory you are right, these two functions do essentially the same
thing, and we could have automatically 'converted'
compat_exit_robust_list() from the native exit_robust_list() function -
but in practice it was a pretty straightforward process anyway for these
~50-line functions. I think it would need a more complex example than
this to justify some sort of new language.

Ingo

Re: [patch 0/6] lightweight robust futexes: -V3
* Christopher Friesen <cfriesen@nortel.com> wrote:

> Ingo Molnar wrote:
>
> >basically, ->futex_offset is not blindly trusted by the kernel either:
> >it's simply used to calculate a "userspace pointer" value, which it then
> >uses in a (secure) get_user() access, to do a FUTEX_WAKEUP. [Note that
> >FUTEX_WAKEUP is already done at do_exit() time via the ->clear_child_tid
> >userspace pointer.] All in one: this is totally safe.
>
> As mentioned by Paul...how do you deal with 32/64 compatibility where
> your pointers are different sizes?

i just replied to Paul's mail with details about this. (Please reply to
that mail if there are any open questions.)

Ingo

Re: [patch 0/6] lightweight robust futexes: -V3 - Why in userspace?
On Thu, 16 Feb 2006, Ingo Molnar wrote:

> [...]
>
> > I am ofcourse comparing to a solution where you do a syscall on
> > everytime you do a lock. What I am asking about is whethere it
> > wouldn't be enough to maintain the list at the FUTEX_WAIT/FUTEX_WAKE
> > operation - i.e. the slow path where you have to go into the kernel.
>
> no, that's not enough at all: we need to be able to clean up after
> futexes even if the kernel was _never involved_ with them. The pure
> userspace futex fastpath still can keep a lock stuck! In fact that is
> the common-case.
>

As I understand the protocol the userspace task writes its pid into the
lock atomically when locking it and erases it atomically when it leaves
the lock. If it is killed in between the pid is still there.
Now if another task comes along it reads the pid, sets the wait flag and goes
into the kernel. The kernel will now be able to see that the pid is no
longer valid and therefore the owner must be dead.

Esben

> Ingo
>

Re: [patch 0/6] lightweight robust futexes: -V3 - Why in userspace?
* Esben Nielsen <simlo@phys.au.dk> wrote:

> As I understand the protocol the userspace task writes it's pid into
> the lock atomically when locking it and erases it atomically when it
> leaves the lock. If it is killed inbetween the pid is still there. Now
> if another task comes along it reads the pid, sets the wait flag and
> goes into the kernel. The kernel will now be able to see that the pid
> is no longer valid and therefore the owner must be dead.

this is racy - we cannot know whether the PID wrapped around.

nor does this method offer any solution for the case where there are
already waiters pending: they might be hung forever. With our solution
one of those waiters gets woken up and notices that the lock is dead.
(and in the unlikely event of that thread dying too while trying to
recover the data, the kernel will do yet another wakeup, of the next
waiter.)

Ingo

Re: [patch 0/6] lightweight robust futexes: -V3 - Why in userspace?
On Thu, 16 Feb 2006, Ingo Molnar wrote:

>
> * Esben Nielsen <simlo@phys.au.dk> wrote:
>
> > As I understand the protocol the userspace task writes it's pid into
> > the lock atomically when locking it and erases it atomically when it
> > leaves the lock. If it is killed inbetween the pid is still there. Now
> > if another task comes along it reads the pid, sets the wait flag and
> > goes into the kernel. The kernel will now be able to see that the pid
> > is no longer valid and therefore the owner must be dead.
>
> this is racy - we cannot know whether the PID wrapped around.
>
What about adding more bits to check on? The PID to lookup the task_t and
then some extra bits to uniquely identify the actual task.

> nor does this method offer any solution for the case where there are
> already waiters pending: they might be hung forever.
It was for this case I suggested maintaining a list of waiters within the
kernel on each task_t. The adding has to be done at FUTEX_WAIT so the adding
operation needs to be protected.

> With our solution
> one of those waiters gets woken up and notices that the lock is dead.
> (and in the unlikely event of that thread dying too while trying to
> recover the data, the kernel will do yet another wakeup, of the next
> waiter.)
>
I admit your solution is a good one. The only drawback - besides being
untraditional - is that memory corruption can leave futexes locked at
exit.

Esben

> Ingo
Re: [patch 0/6] lightweight robust futexes: -V3 - Why in userspace? [ In reply to ]
* Esben Nielsen <simlo@phys.au.dk> wrote:

> > this is racy - we cannot know whether the PID wrapped around.
> >
> What about adding more bits to check on? The PID to lookup the task_t
> and then some extra bits to uniquely identify the actual task.

which would just be a fancy name for a wider PID space, and would thus
still not protect against PID reuse :-)

> > nor does this method offer any solution for the case where there are
> > already waiters pending: they might be hung forever.
>
> It was for this case I suggested maintaining a list of waiters within
> the kernel on each task_t. The adding has to be done at FUTEX_WAIT so the
> adding operation needs to be protected.

i'm not sure i follow - what list is this and how would it be
maintained?

> > With our solution
> > one of those waiters gets woken up and notices that the lock is dead.
> > (and in the unlikely event of that thread dying too while trying to
> > recover the data, the kernel will do yet another wakeup, of the next
> > waiter.)
> >
> I admit your solution is a good one. The only drawback - besides being
> untraditional - is that memory corruption can leave futexes locked at
> exit.

so? Memory corruption can overwrite the futex value anyway, and can thus
cause the wrong owner to be identified - causing a locked futex. This
patch does not protect against bad effects of memory corruption -
there's really no way to keep userspace from breaking itself.

Ingo
Re: [patch 0/6] lightweight robust futexes: -V3 - Why in userspace? [ In reply to ]
On Fri, 17 Feb 2006, Ingo Molnar wrote:

>
> * Esben Nielsen <simlo@phys.au.dk> wrote:
>
> > > this is racy - we cannot know whether the PID wrapped around.
> > >
> > What about adding more bits to check on? The PID to lookup the task_t
> > and then some extra bits to uniquely identify the actual task.
>
> which would just be a fancy name for a wider PID space, and would thus
> still not protect against PID reuse :-)
>
Can it really be correct there is no way to uniquely identify a thread in
the uptime of the system? It could be done with BigIntegers :-)


> > > nor does this method offer any solution for the case where there are
> > > already waiters pending: they might be hung forever.
> >
> > It was for this case I suggested maintaining a list of waiters within
> > the kernel on each task_t. The adding has to be done at FUTEX_WAIT so the
> > adding operation needs to be protected.
>
> i'm not sure i follow - what list is this and how would it be
> maintained?
>

At the FUTEX_WAIT operation add the waiter to a list of waiters on the
owner's task_t. At FUTEX_WAKE remove the waiter. At task exit wake up the
waiters.
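
As a rough illustration of the bookkeeping, the three operations could
look like the userspace sketch below. It is only a toy model: toy_task
and toy_waiter are hypothetical stand-ins for task_t and the futex
waiter, there is no locking (a real version would need to protect the
list), and a flag stands in for the actual wakeup.

```c
#include <assert.h>
#include <stddef.h>

/* A blocked thread, queued on the task that owns the lock. */
struct toy_waiter {
    struct toy_waiter *next;
    int woken;
};

/* Hypothetical per-task state: everyone waiting on locks we own. */
struct toy_task {
    struct toy_waiter *waiters;
};

/* FUTEX_WAIT side: queue the waiter on the owner. */
static void toy_wait_enqueue(struct toy_task *owner, struct toy_waiter *w)
{
    w->woken = 0;
    w->next = owner->waiters;
    owner->waiters = w;
}

/* FUTEX_WAKE side: unlink one waiter. Returns 1 if it was queued. */
static int toy_wake_dequeue(struct toy_task *owner, struct toy_waiter *w)
{
    for (struct toy_waiter **pp = &owner->waiters; *pp != NULL;
         pp = &(*pp)->next) {
        if (*pp == w) {
            *pp = w->next;
            w->next = NULL;
            w->woken = 1;
            return 1;
        }
    }
    return 0;
}

/* Task exit: wake everyone still queued. Returns how many were woken. */
static int toy_task_exit(struct toy_task *owner)
{
    int n = 0;

    while (owner->waiters != NULL) {
        struct toy_waiter *w = owner->waiters;

        owner->waiters = w->next;
        w->next = NULL;
        w->woken = 1;
        n++;
    }
    return n;
}
```

Note that toy_wait_enqueue() is handed the owner directly. Finding the
owner's task from the TID stored in the futex is a separate step that
this sketch leaves out.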

> > > With our solution
> > > one of those waiters gets woken up and notices that the lock is dead.
> > > (and in the unlikely event of that thread dying too while trying to
> > > recover the data, the kernel will do yet another wakeup, of the next
> > > waiter.)
> > >
> > I admit your solution is a good one. The only drawback - besides being
> > untraditional - is that memory corruption can leave futexes locked at
> > exit.
>
> so? Memory corruption can overwrite the futex value anyway, and can thus
> cause the wrong owner to be identified - causing a locked futex. This
> patch does not protect against bad effects of memory corruption -
> there's really no way to keep userspace from breaking itself.
>

At least you could wake up those who are already blocked in the kernel...

Esben

> Ingo
>

Re: [patch 0/6] lightweight robust futexes: -V3 - Why in userspace? [ In reply to ]
* Esben Nielsen <simlo@phys.au.dk> wrote:

> > > > this is racy - we cannot know whether the PID wrapped around.
> > > >
> > > What about adding more bits to check on? The PID to lookup the task_t
> > > and then some extra bits to uniquely identify the actual task.
> >
> > which would just be a fancy name for a wider PID space, and would thus
> > still not protect against PID reuse :-)
> >
>
> Can it really be correct there is no way to uniquely identify a thread
> in the uptime of the system? It could be done with BigIntegers :-)

well, that's how PIDs work. I'd have no problem with 64-bit PIDs/TIDs in
theory, but besides a _massive_ ABI break (which makes the whole thing a
non-starter), users would probably resist 'ps' output like:

917239487098712349 tty8 Ss+ 0:00 -bash
1844674407370955161 tty9 Ss+ 0:00 -bash
1356415698798712343 tty10 Ss+ 0:00 -bash

:-| Another problem is that futexes are fundamentally 32-bit, so there's
no space in them for 64-bit TIDs ...
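
The space constraint is easy to see in a sketch of the 32-bit word. The
layout below is an assumption for illustration: two flag bits on top
and 30 bits of TID, which mirrors the FUTEX_WAITERS, FUTEX_OWNER_DIED
and FUTEX_TID_MASK values in today's <linux/futex.h>. The toy_* names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the 32-bit futex word. */
#define TOY_FUTEX_WAITERS     0x80000000u  /* bit 31: waiters present */
#define TOY_FUTEX_OWNER_DIED  0x40000000u  /* bit 30: owner has died  */
#define TOY_FUTEX_TID_MASK    0x3fffffffu  /* bits 0-29: owner TID    */

/* Pack an owner TID and the two flags into one word. */
static uint32_t toy_pack(uint32_t tid, int waiters, int owner_died)
{
    assert((tid & ~TOY_FUTEX_TID_MASK) == 0);  /* TID must fit in 30 bits */
    return tid
         | (waiters ? TOY_FUTEX_WAITERS : 0u)
         | (owner_died ? TOY_FUTEX_OWNER_DIED : 0u);
}

/* Extract the owner TID again. */
static uint32_t toy_owner(uint32_t word)
{
    return word & TOY_FUTEX_TID_MASK;
}

/* Would this (possibly 64-bit) ID survive a round trip through the word? */
static int toy_fits(uint64_t id)
{
    return (id & ~(uint64_t)TOY_FUTEX_TID_MASK) == 0;
}
```

With this layout toy_fits() accepts any ordinary TID but rejects IDs
like the ones in the 'ps' output above, since everything above bit 29
is either a flag or simply not there.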

> > i'm not sure i follow - what list is this and how would it be
> > maintained?
>
> At the FUTEX_WAIT operation add the waiter to a list of waiters on the
> owner's task_t. At FUTEX_WAKE remove the waiter. At task exit wake up
> the waiters.

well, but even in this case the kernel has no idea (currently) about the
owner - it is the waiters that have kernel state in this case. So extra
code would have to be added to look up the TID, make sure the lookup is
secure: check that the task looked up is really mapping that vma, and to
queue waiters to the owner. Quite clumsy ... and this is still only the
easy case, when the kernel is involved in the futex use.

> > > I admit your solution is a good one. The only drawback - besides being
> > > untraditional - is that memory corruption can leave futexes locked at
> > > exit.
> >
> > so? Memory corruption can overwrite the futex value anyway, and can thus
> > cause the wrong owner to be identified - causing a locked futex. This
> > patch does not protect against bad effects of memory corruption -
> > there's really no way to keep userspace from breaking itself.
> >
>
> At least you could wake up those who are already blocked in the
> kernel...

which would be of little use if the wrong TID is in the futex. There's
really no protection against userspace breaking itself.

Ingo