Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls
On Fri, Nov 26 2021 at 22:52, Peter Zijlstra wrote:
> On Fri, Nov 26, 2021 at 10:11:17PM +0100, Thomas Gleixner wrote:
>> On Wed, Nov 24 2021 at 22:19, Peter Zijlstra wrote:
>> > On Mon, Nov 22, 2021 at 01:13:24PM -0800, Peter Oskolkov wrote:
>> >
>> >> + * Timestamp: a 46-bit CLOCK_MONOTONIC timestamp, at 16ns resolution.
>> >
>> >> +static int umcg_update_state(u64 __user *state_ts, u64 *expected, u64 desired,
>> >> +			     bool may_fault)
>> >> +{
>> >> +	u64 curr_ts = (*expected) >> (64 - UMCG_STATE_TIMESTAMP_BITS);
>> >> +	u64 next_ts = ktime_get_ns() >> UMCG_STATE_TIMESTAMP_GRANULARITY;
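
For orientation: the quoted helper unpacks a timestamp kept in the high
UMCG_STATE_TIMESTAMP_BITS = 46 bits of the state word, leaving the low
18 bits for state and flags. A minimal packing sketch consistent with
that unpacking -- illustrative only, not the patch's code, and assuming
the 16ns resolution corresponds to a granularity shift of 4:

	/* high 46 bits: CLOCK_MONOTONIC in 16ns units; low 18: state + flags */
	static u64 umcg_pack_state_ts(u64 state_flags, u64 ns)
	{
		u64 ts = ns >> UMCG_STATE_TIMESTAMP_GRANULARITY;

		return (state_flags & ((1ULL << (64 - UMCG_STATE_TIMESTAMP_BITS)) - 1)) |
		       (ts << (64 - UMCG_STATE_TIMESTAMP_BITS));
	}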
>> >
>> > I'm still very hesitant to use ktime (fear the HPET); but I suppose it
>> > makes sense to use a time base that's accessible to userspace. Was
>> > MONOTONIC_RAW considered?
>>
>> MONOTONIC_RAW is not really useful as you can't sleep on it and it won't
>> solve the HPET crap either.
>
> But its ns are of equal size to sched_clock(), if both share TSC IIRC.
> Whereas MONOTONIC, being subject to ntp rate stuff, has differently
> sized ns.

The size is the same, i.e. 1 bit per nanosecond :)

> The only time that's relevant though is when you're going to mix these
> timestamps with CLOCK_THREAD_CPUTIME_ID, which might just be
> interesting.

Uuurg. If you want to go towards CLOCK_THREAD_CPUTIME_ID, that's going
to be really nasty. Actually you can sleep on that clock, but that's a
completely different universe. If anything like that is desired then we
need to rewrite that posix CPU timer muck completely with all the bells
and whistles and race conditions attached to it. *Shudder*

Thanks,

tglx
Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls
On Mon, Nov 29, 2021 at 09:34:49AM -0800, Peter Oskolkov wrote:
> On Mon, Nov 29, 2021 at 8:41 AM Peter Zijlstra <peterz@infradead.org> wrote:

> > However, do note this whole scheme fundamentally has some of that: from
> > the moment the syscall unblocks until sys_exit there is 'unmanaged'
> > runtime for all tasks; they can consume however much time the syscall
> > needs there.
> >
> > Also, timeout on sys_umcg_wait() gets you the exact same situation (or
> > worse, multiple running workers).
>
> It should not. Timed out workers should be added to the runnable list
> and not become running unless a server chooses so. So sys_umcg_wait()
> with a timeout should behave similarly to a normal sleep, in that the
> server is woken upon the worker blocking, and upon the worker wakeup
> the worker is added to the woken workers list and waits for a server
> to run it. The only difference is that in a sleep the worker becomes
> BLOCKED, while in sys_umcg_wait() the worker is RUNNABLE the whole
> time.

OK, that's somewhat subtle and I hadn't gotten that either.

Currently it returns -ETIMEDOUT in RUNNING state for both server and
worker callers.

Let me go fix that then.
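
To make the intended flow concrete, the server side would look roughly
like this -- a sketch with illustrative helper names, not the patch's
API:

	/* server: woken whenever one of its workers blocks or wakes */
	self->state = UMCG_TASK_RUNNABLE;
	sys_umcg_wait(0, 0);

	/* timed-out workers show up here as ordinary wakeup events ... */
	runnable = xchg(&self->runnable_workers_ptr, NULL);

	/* ... and run only once the server switches to one of them */
	self->next_tid = pick_one(runnable);	/* hypothetical */
	sys_umcg_wait(0, 0);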

> > > Another big concern I have is that you removed UMCG_TF_LOCKED. I
> >
> > OOh yes, I forgot to mention that. I couldn't figure out what it was
> > supposed to do.
> >
> > > definitely needed it to guard workers during "sched work" in the
> > > userspace in my approach. I'm not sure if the flag is absolutely
> > > needed with your approach, but most likely it is - the kernel-side
> > > scheduler does lock tasks and runqueues and disables interrupts and
> > > migrations and other things so that the scheduling logic is not
> > > hijacked by concurrent stuff. Why do you assume that the userspace
> > > scheduling code does not need similar protections?
> >
> > I've not yet come across a case where this is needed. Migration for
> > instance is possible when RUNNABLE, simply write ::server_tid before
> > ::state. Userspace just needs to make sure who actually owns the task,
> > but it can do that outside of this state.
> >
> > But like I said; I've not yet done the userspace part (and I lost most
> > of today trying to install a new machine), so perhaps I'll run into it
> > soon enough.
>
> The most obvious scenario where I needed locking is when worker A
> wants to context switch into worker B, while another worker C wants to
> context switch into worker A, and worker A pagefaults. This involves:
>
> worker A context: worker A context switches into worker B:
>
> - worker B::server_tid = worker A::server_tid
> - worker A::server_tid = none
> - worker A::state = runnable
> - worker B::state = running
> - worker A::next_tid = worker B
> - worker A calls sys_umcg_wait()
>
worker C context: before the above completes, worker C wants to
context switch into worker A, with similar steps.
>
> "interrupt context": in the middle of the mess above, worker A pagefaults
>
> Too many moving parts. UMCG_TF_LOCKED helped me make this mess
> manageable. Maybe without pagefaults clever ordering of the operations
> listed above could make things work, but pagefaults mess things badly,
> so some kind of "preempt_disable()" for the userspace scheduling code
> was needed, and UMCG_TF_LOCKED was the solution I had.

I'm not sure I'm following. For this to be true, A and C must be running
on different servers, right?

So we have something like:

	S0 running A                     S1 running B

Therefore:

	S0::state == RUNNABLE            S1::state == RUNNABLE
	A::server_tid == S0.tid          B::server_tid == S1.tid
	A::state == RUNNING              B::state == RUNNING

Now, you want A to switch to C, therefore C had better be with S0, e.g. we
have:

	C::server_tid == S0.tid
	C::state == RUNNABLE

So then A does:

	A::next_tid = C.tid;
	sys_umcg_wait();

Which will:

	pin(A);
	pin(S0);

	cmpxchg(A::state, RUNNING, RUNNABLE);

	next_tid = A::next_tid; // C

	enqueue(S0::runnable, A);

At which point B steals S0's runnable queue, and tries to make A go.

	runnable = xchg(S0::runnable_list_ptr, NULL); // == A
	A::server_tid = S1.tid;
	B::next_tid = A.tid;
	sys_umcg_wait();

	wake(C)
	cmpxchg(C::state, RUNNABLE, RUNNING); <-- *fault*


Something like that, right?

What currently happens is that S0 goes back to S0 and S1 ends up in A.
That is, if, for any reason we fail to wake next_tid, we'll wake
server_tid.

So then S0 wakes up and gets to re-evaluate life. If it has another
worker it can go run that, otherwise it can try and steal a worker
somewhere or just idle out.

Now arguably, the only reason A->C can fault is because C is garbage, at
which point your program is malformed and it doesn't matter what
happens one way or the other.
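
That fallback, as a sketch (helper name illustrative):

	/* sys_umcg_wait(): if waking ::next_tid faults or fails,
	 * wake ::server_tid instead so the server can re-evaluate */
	if (umcg_wake_task(next_tid))
		umcg_wake_task(server_tid);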
Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls
On Mon, Nov 29, 2021 at 1:08 PM Peter Zijlstra <peterz@infradead.org> wrote:
[...]
> > > > Another big concern I have is that you removed UMCG_TF_LOCKED. I
> > >
> > > OOh yes, I forgot to mention that. I couldn't figure out what it was
> > > supposed to do.
[...]
>
> So then A does:
>
> 	A::next_tid = C.tid;
> 	sys_umcg_wait();
>
> Which will:
>
> 	pin(A);
> 	pin(S0);
>
> 	cmpxchg(A::state, RUNNING, RUNNABLE);

Hmm.... That's another difference between your patch and mine: my
approach was "the side that initiates the change updates the state".
So in my code the userspace changes the current task's state RUNNING
=> RUNNABLE and the next task's state, or the server's state, RUNNABLE
=> RUNNING before calling sys_umcg_wait(). The kernel changed worker
states to BLOCKED/RUNNABLE during block/wake detection, and marked
servers RUNNING when waking them during block/wake detection; but all
applicable state changes for sys_umcg_wait() happen in the userspace.

The reasoning behind this approach was:
- do in kernel only that which cannot be done in the userspace, to
  make the kernel code smaller/simpler
- similar to how futexes work: futex_wait does not change the futex
  value to the desired value, but just checks whether the futex value
  matches the desired value
- similar to how futexes work, concurrent state changes can happen in
  the userspace without calling into the kernel at all
  for example:
  - (a): worker A goes to sleep into sys_umcg_wait()
  - (b): worker B wants to context switch into worker A "a moment" later
  - due to preemption/interrupts/pagefaults/whatnot, (b) happens
    in reality before (a)
  in my patchset, the situation above happily resolves in the
  userspace so that worker A keeps running without ever calling
  sys_umcg_wait().

Again, I don't think this is deal breaking, and your approach will
work, just a bit less efficiently in some cases :)
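
A sketch of that userspace-only resolution (cmpxchg() standing for an
atomic compare-and-exchange on the shared state word; illustrative, not
the patchset's literal code):

	/* worker B, context switching into worker A "a moment" later: */
	if (cmpxchg(&A->state, UMCG_TASK_RUNNABLE, UMCG_TASK_RUNNING) ==
	    UMCG_TASK_RUNNABLE) {
		/* A is RUNNING again, without any syscall */
	}

	/* worker A, reaching sys_umcg_wait() afterwards: the kernel,
	 * futex-style, only checks that A::state still matches
	 * RUNNABLE; it doesn't, so A returns and keeps running. */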

I'm still not sure we can live without UMCG_TF_LOCKED. What if worker
A transfers its server to worker B that A intends to context switch
into, and then worker A pagefaults or gets interrupted before calling
sys_umcg_wait()? The server will be woken up and will see that it is
assigned to worker B; now what? If worker A is "locked" before the
whole thing starts, the pagefault/interrupt will not trigger
block/wake detection, worker A will keep RUNNING for all intents and
purposes, and eventually will call sys_umcg_wait() as it had
intended...

[...]
Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls
Sorry, I haven't been feeling too well and as such procrastinated on this
because thinking is required :/ Trying to pick up the bits.

On Mon, Nov 29, 2021 at 03:38:38PM -0800, Peter Oskolkov wrote:
> On Mon, Nov 29, 2021 at 1:08 PM Peter Zijlstra <peterz@infradead.org> wrote:
> [...]
> > > > > Another big concern I have is that you removed UMCG_TF_LOCKED. I
> > > >
> > > > OOh yes, I forgot to mention that. I couldn't figure out what it was
> > > > supposed to do.
> [...]
> >
> > So then A does:
> >
> > 	A::next_tid = C.tid;
> > 	sys_umcg_wait();
> >
> > Which will:
> >
> > 	pin(A);
> > 	pin(S0);
> >
> > 	cmpxchg(A::state, RUNNING, RUNNABLE);
>
> Hmm.... That's another difference between your patch and mine: my
> approach was "the side that initiates the change updates the state".
> So in my code the userspace changes the current task's state RUNNING
> => RUNNABLE and the next task's state,

I couldn't make that work for wakeups; when a thread blocks in a
random syscall there is no userspace to wake the next thread. And since
it seems required in this case, it's easier and more consistent to
always do it.

> or the server's state, RUNNABLE
> => RUNNING before calling sys_umcg_wait().

Yes, this is indeed required; I've found the same when trying to build
the userspace server loop. And yes, I'm starting to see where you're
coming from.

> I'm still not sure we can live without UMCG_TF_LOCKED. What if worker
> A transfers its server to worker B that A intends to context switch

	S0 running A

Therefore:

	S0::state == RUNNABLE
	A::server_tid == S0.tid
	A::state == RUNNING

you want A to switch to B, therefore:

	B::state == RUNNABLE

if B is not yet on S0 then:

	B::server_tid = S0.tid;

finally:

	0:
		A::next_tid = B.tid;
	1:
		A::state = RUNNABLE;
	2:
		sys_umcg_wait();
	3:
3:

> into, and then worker A pagefaults or gets interrupted before calling
> sys_umcg_wait()?

So the problem is tripping umcg_notify_resume() on the labels 1 and 2,
right? Tripping it on 0 and 3 is trivially correct.

If we trip it on 1 and !(A::state & TF_PREEMPT), then nothing, since
::state == RUNNING we'll just continue onwards and all is well. That is,
nothing has happened yet.

However, if we trip it on 2: we're screwed. Because at that point
::state is scribbled.

> The server will be woken up and will see that it is
> assigned to worker B; now what? If worker A is "locked" before the
> whole thing starts, the pagefault/interrupt will not trigger
> block/wake detection, worker A will keep RUNNING for all intents and
> purposes, and eventually will call sys_umcg_wait() as it had
> intended...

No, the failure case is different; umcg_notify_resume() will simply
block A until someone sets A::state == RUNNING and kicks it, which will
be no-one.
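
Roughly, because the resume path behaves like this (a simplified sketch
of the behaviour just described, not the patch verbatim):

	/* umcg_notify_resume(), on the way back to userspace */
	if (self->state == UMCG_TASK_RUNNING)
		return;		/* carry on to userspace */

	/* anything else: sleep until a server sets us RUNNING and
	 * kicks us -- which, for A above, never happens */
	umcg_wait_for_running();	/* hypothetical helper */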

Now, the above situation is actually simple to fix, but it gets more
interesting when we're using sys_umcg_wait() to build wait primitives.
Because in that case we get stuff like:

	for (;;) {
		self->state = RUNNABLE;
		smp_mb();
		if (cond)
			break;
		sys_umcg_wait();
	}
	self->state = RUNNING;

And we really need to not block and also not do sys_umcg_wait() early.

So yes, I agree that we need a special case here that ensures
umcg_notify_resume() doesn't block. Let me ponder naming and comments.
Either a TF_COND_WAIT or a whole new state. I can't decide yet.

Now, obviously if you do a random syscall anywhere around here, you get
to keep the pieces :-)



I've also added ::next_tid to the whole umcg_pin_pages() thing, and made
it so that ::next_tid gets cleared when it's been used. That way things
like:

	self->next_tid = pick_from_runqueue();
	sys_that_is_expected_to_sleep();
	if (self->next_tid) {
		return_to_runqueue(self->next_tid);
		self->next_tid = 0;
	}

Are much simpler to manage. Either it did sleep and ::next_tid is
consumed, or it didn't sleep and it needs to be returned to the
runqueue.
Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls
On Mon, Nov 29, 2021 at 09:34:49AM -0800, Peter Oskolkov wrote:
> On Mon, Nov 29, 2021 at 8:41 AM Peter Zijlstra <peterz@infradead.org> wrote:

> > Also, timeout on sys_umcg_wait() gets you the exact same situation (or
> > worse, multiple running workers).
>
> It should not. Timed out workers should be added to the runnable list
> and not become running unless a server chooses so. So sys_umcg_wait()
> with a timeout should behave similarly to a normal sleep, in that the
> server is woken upon the worker blocking, and upon the worker wakeup
> the worker is added to the woken workers list and waits for a server
> to run it. The only difference is that in a sleep the worker becomes
> BLOCKED, while in sys_umcg_wait() the worker is RUNNABLE the whole
> time.
>
> Why then have sys_umcg_wait() with a timeout at all, instead of
> calling nanosleep()? Because the worker in sys_umcg_wait() can be
> context-switched into by another worker, or made running by a server;
> if the worker is in nanosleep(), it just sleeps.

I've been trying to figure out the semantics of that timeout thing, and
I can't seem to make sense of it.

Consider two workers:

	S0 running A                     S1 running B

therefore:

	S0::state == RUNNABLE            S1::state == RUNNABLE
	A::server_tid == S0.tid          B::server_tid == S1.tid
	A::state == RUNNING              B::state == RUNNING

Doing:

	self->state = RUNNABLE;          self->state = RUNNABLE;
	sys_umcg_wait(0);                sys_umcg_wait(10);
	  umcg_enqueue_runnable()          umcg_enqueue_runnable()
	  umcg_wake()                      umcg_wake()
	  umcg_wait()                      umcg_wait()
	                                     hrtimer_start()

In both cases we get the exact same outcome:

	A::state == RUNNABLE             B::state == RUNNABLE
	S0::state == RUNNING             S1::state == RUNNING
	S0::runnable_ptr == &A           S1::runnable_ptr == &B


Which is, AFAICT, the exact state you wanted to achieve, except B now
has an active timer, but what do you want it to do when that goes off?

I'm tempted to say workers cannot have a timeout, and servers can use one
to wake themselves.
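
If it were indeed server-only, a server could use the timeout as a
simple tick -- a sketch, with illustrative helper names:

	for (;;) {
		self->state = UMCG_TASK_RUNNABLE;
		sys_umcg_wait(0, now_ns() + TICK_NS);	/* self-wake */
		collect_runnable_workers(self);		/* hypothetical */
		run_next_worker(self);			/* hypothetical */
	}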
Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls
On Mon, Dec 06, 2021 at 12:32:22PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 29, 2021 at 03:38:38PM -0800, Peter Oskolkov wrote:
> > On Mon, Nov 29, 2021 at 1:08 PM Peter Zijlstra <peterz@infradead.org> wrote:

> Now, the above situation is actually simple to fix, but it gets more
> interesting when we're using sys_umcg_wait() to build wait primitives.
> Because in that case we get stuff like:
>
> 	for (;;) {
> 		self->state = RUNNABLE;
> 		smp_mb();
> 		if (cond)
> 			break;
> 		sys_umcg_wait();
> 	}
> 	self->state = RUNNING;
>
> And we really need to not block and also not do sys_umcg_wait() early.
>
> So yes, I agree that we need a special case here that ensures
> umcg_notify_resume() doesn't block. Let me ponder naming and comments.
> Either a TF_COND_WAIT or a whole new state. I can't decide yet.

Hurmph... OTOH since self above hasn't actually done anything yet, it
isn't reported as runnable yet, and so for all intents and purposes the
userspace state thinks it's running (which is true) and nobody should be
trying a concurrent wakeup and there aren't any races.

Bah, now I'm confused again :-) Let me go think more.
Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls
On Mon, Dec 06, 2021 at 12:32:22PM +0100, Peter Zijlstra wrote:
>
> Sorry, I haven't been feeling too well and as such procrastinated on this
> because thinking is required :/ Trying to pick up the bits.

*sigh* and yet another week gone... someone was unhappy about refcount_t.


> No, the failure case is different; umcg_notify_resume() will simply
> block A until someone sets A::state == RUNNING and kicks it, which will
> be no-one.
>
> Now, the above situation is actually simple to fix, but it gets more
> interesting when we're using sys_umcg_wait() to build wait primitives.
> Because in that case we get stuff like:
>
> 	for (;;) {
> 		self->state = RUNNABLE;
> 		smp_mb();
> 		if (cond)
> 			break;
> 		sys_umcg_wait();
> 	}
> 	self->state = RUNNING;
>
> And we really need to not block and also not do sys_umcg_wait() early.
>
> So yes, I agree that we need a special case here that ensures
> umcg_notify_resume() doesn't block. Let me ponder naming and comments.
> Either a TF_COND_WAIT or a whole new state. I can't decide yet.
>
> Now, obviously if you do a random syscall anywhere around here, you get
> to keep the pieces :-)

Something like so, I suppose...

--- a/include/uapi/linux/umcg.h
+++ b/include/uapi/linux/umcg.h
@@ -42,6 +42,32 @@
  *
  */
 #define UMCG_TF_PREEMPT			0x0100U
+/*
+ * UMCG_TF_COND_WAIT: indicate the task *will* call sys_umcg_wait()
+ *
+ * Enables server loops like (vs umcg_sys_exit()):
+ *
+ *	for(;;) {
+ *		self->status = UMCG_TASK_RUNNABLE | UMCG_TF_COND_WAIT;
+ *		// smp_mb() implied by xchg()
+ *
+ *		runnable_ptr = xchg(self->runnable_workers_ptr, NULL);
+ *		while (runnable_ptr) {
+ *			next = runnable_ptr->runnable_workers_ptr;
+ *
+ *			umcg_server_add_runnable(self, runnable_ptr);
+ *
+ *			runnable_ptr = next;
+ *		}
+ *
+ *		self->next = umcg_server_pick_next(self);
+ *		sys_umcg_wait(0, 0);
+ *	}
+ *
+ * without a signal or interrupt in between setting umcg_task::state and
+ * sys_umcg_wait() resulting in an infinite wait in umcg_notify_resume().
+ */
+#define UMCG_TF_COND_WAIT		0x0200U
 
 #define UMCG_TF_MASK			0xff00U
 
--- a/kernel/sched/umcg.c
+++ b/kernel/sched/umcg.c
@@ -180,7 +180,7 @@ void umcg_worker_exit(void)
 /*
  * Do a state transition, @from -> @to, and possible read @next after that.
  *
- * Will clear UMCG_TF_PREEMPT.
+ * Will clear UMCG_TF_PREEMPT, UMCG_TF_COND_WAIT.
  *
  * When @to == {BLOCKED,RUNNABLE}, update timestamps.
  *
@@ -216,7 +216,8 @@ static int umcg_update_state(struct task
 		if ((old & UMCG_TASK_MASK) != from)
 			goto fail;
 
-		new = old & ~(UMCG_TASK_MASK | UMCG_TF_PREEMPT);
+		new = old & ~(UMCG_TASK_MASK |
+			      UMCG_TF_PREEMPT | UMCG_TF_COND_WAIT);
 		new |= to & UMCG_TASK_MASK;
 
 	} while (!unsafe_try_cmpxchg_user(&self->state, &old, new, Efault));
@@ -567,11 +568,13 @@ void umcg_notify_resume(struct pt_regs *
 	if (state == UMCG_TASK_RUNNING)
 		goto done;
 
-	// XXX can get here when:
-	//
-	//	self->state = RUNNABLE
-	//	<signal>
-	//	sys_umcg_wait();
+	/*
+	 * See comment at UMCG_TF_COND_WAIT; TL;DR: user *will* call
+	 * sys_umcg_wait() and signals/interrupts shouldn't block
+	 * return-to-user.
+	 */
+	if (state == (UMCG_TASK_RUNNABLE | UMCG_TF_COND_WAIT))
+		goto done;
 
 	if (state & UMCG_TF_PREEMPT) {
 		if (umcg_pin_pages())
@@ -658,6 +661,13 @@ SYSCALL_DEFINE2(umcg_wait, u32, flags, u
 	if (ret)
 		goto unblock;
 
+	/*
+	 * Clear UMCG_TF_COND_WAIT *and* check state == RUNNABLE.
+	 */
+	ret = umcg_update_state(self, tsk, UMCG_TASK_RUNNABLE, UMCG_TASK_RUNNABLE);
+	if (ret)
+		goto unpin;
+
 	if (worker) {
 		ret = umcg_enqueue_runnable(tsk);
 		if (ret)
Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls
On Mon, Dec 6, 2021 at 3:47 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Nov 29, 2021 at 09:34:49AM -0800, Peter Oskolkov wrote:
> > On Mon, Nov 29, 2021 at 8:41 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> > > Also, timeout on sys_umcg_wait() gets you the exact same situation (or
> > > worse, multiple running workers).
> >
> > It should not. Timed out workers should be added to the runnable list
> > and not become running unless a server chooses so. So sys_umcg_wait()
> > with a timeout should behave similarly to a normal sleep, in that the
> > server is woken upon the worker blocking, and upon the worker wakeup
> > the worker is added to the woken workers list and waits for a server
> > to run it. The only difference is that in a sleep the worker becomes
> > BLOCKED, while in sys_umcg_wait() the worker is RUNNABLE the whole
> > time.
> >
> > Why then have sys_umcg_wait() with a timeout at all, instead of
> > calling nanosleep()? Because the worker in sys_umcg_wait() can be
> > context-switched into by another worker, or made running by a server;
> > if the worker is in nanosleep(), it just sleeps.
>
> I've been trying to figure out the semantics of that timeout thing, and
> I can't seem to make sense of it.
>
> Consider two workers:
>
> > 	S0 running A                     S1 running B
>
> therefore:
>
> > 	S0::state == RUNNABLE            S1::state == RUNNABLE
> > 	A::server_tid == S0.tid          B::server_tid == S1.tid
> > 	A::state == RUNNING              B::state == RUNNING
>
> Doing:
>
> > 	self->state = RUNNABLE;          self->state = RUNNABLE;
> > 	sys_umcg_wait(0);                sys_umcg_wait(10);
> > 	  umcg_enqueue_runnable()          umcg_enqueue_runnable()

sys_umcg_wait() should not enqueue the worker as runnable; workers are
enqueued to indicate wakeup events.

> 	  umcg_wake()                      umcg_wake()
> 	  umcg_wait()                      umcg_wait()
> 	                                     hrtimer_start()
>
> In both cases we get the exact same outcome:
>
> 	A::state == RUNNABLE             B::state == RUNNABLE
> 	S0::state == RUNNING             S1::state == RUNNING
> 	S0::runnable_ptr == &A           S1::runnable_ptr == &B

So without sys_umcg_wait() enqueueing into the queue, the state now is:

	A::state == RUNNABLE             B::state == RUNNABLE
	S0::state == RUNNING             S1::state == RUNNING
	S0::runnable_ptr == NULL         S1::runnable_ptr == NULL

>
>
> Which is, AFAICT, the exact state you wanted to achieve, except B now
> has an active timer, but what do you want it to do when that goes off?

When the timer goes off, _then_ B is enqueued into the queue, so the
state becomes

	A::state == RUNNABLE             B::state == RUNNABLE
	S0::state == RUNNING             S1::state == RUNNING
	S0::runnable_ptr == NULL         S1::runnable_ptr == &B

So worker timeouts in sys_umcg_wait are treated as wakeup events, with
the difference that when the worker is eventually scheduled by a
server, sys_umcg_wait returns with ETIMEDOUT.
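
Spelled out from the worker's side -- a sketch, assuming the
sys_umcg_wait(flags, timeout) form used elsewhere in the thread:

	self->state = UMCG_TASK_RUNNABLE;
	ret = sys_umcg_wait(0, timeout);
	if (ret == -ETIMEDOUT) {
		/*
		 * The timer fired: we were enqueued as a wakeup event,
		 * and a server has since chosen to run us again.
		 */
	}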

>
> I'm tempted to say workers cannot have a timeout, and servers can use one
> to wake themselves.
Re: [PATCH v0.9.1 3/6] sched/umcg: implement UMCG syscalls
On Wed, Jan 19, 2022 at 09:26:41AM -0800, Peter Oskolkov wrote:
> On Mon, Dec 6, 2021 at 3:47 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Nov 29, 2021 at 09:34:49AM -0800, Peter Oskolkov wrote:
> > > On Mon, Nov 29, 2021 at 8:41 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > > > Also, timeout on sys_umcg_wait() gets you the exact same situation (or
> > > > worse, multiple running workers).
> > >
> > > It should not. Timed out workers should be added to the runnable list
> > > and not become running unless a server chooses so. So sys_umcg_wait()
> > > with a timeout should behave similarly to a normal sleep, in that the
> > > server is woken upon the worker blocking, and upon the worker wakeup
> > > the worker is added to the woken workers list and waits for a server
> > > to run it. The only difference is that in a sleep the worker becomes
> > > BLOCKED, while in sys_umcg_wait() the worker is RUNNABLE the whole
> > > time.
> > >
> > > Why then have sys_umcg_wait() with a timeout at all, instead of
> > > calling nanosleep()? Because the worker in sys_umcg_wait() can be
> > > context-switched into by another worker, or made running by a server;
> > > if the worker is in nanosleep(), it just sleeps.
> >
> > I've been trying to figure out the semantics of that timeout thing, and
> > I can't seem to make sense of it.
> >
> > Consider two workers:
> >
> > 	S0 running A                     S1 running B
> >
> > therefore:
> >
> > 	S0::state == RUNNABLE            S1::state == RUNNABLE
> > 	A::server_tid == S0.tid          B::server_tid == S1.tid
> > 	A::state == RUNNING              B::state == RUNNING
> >
> > Doing:
> >
> > 	self->state = RUNNABLE;          self->state = RUNNABLE;
> > 	sys_umcg_wait(0);                sys_umcg_wait(10);
> > 	  umcg_enqueue_runnable()          umcg_enqueue_runnable()
>
> sys_umcg_wait() should not enqueue the worker as runnable; workers are
> enqueued to indicate wakeup events.

Oooh... I see.

> So worker timeouts in sys_umcg_wait are treated as wakeup events, with
> the difference that when the worker is eventually scheduled by a
> server, sys_umcg_wait returns with ETIMEDOUT.

Right... OK, let me go fold and polish what I have now before I go change
things again though.
