Mailing List Archive

[PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
The IRQ_WORK_BUSY flag is set right before we execute the
work. Once this flag value is set, the work enters a
claimable state again.

This is necessary because if we want to enqueue a work but we
fail the claim, we want to ensure that the CPU where that work
is still pending will see and handle the data we expected the
work to compute.

This might not work as expected though because IRQ_WORK_BUSY
isn't set atomically. By the time a CPU fails a work claim,
this work may well have been already executed by the CPU where
it was previously pending.

Due to the lack of an appropriate memory barrier, the IRQ_WORK_BUSY
flag value may not be visible to the CPU trying to claim while
the work is executing, and this lasts until we clear the busy bit
in the work flags using cmpxchg(), which implies a full barrier.

One solution could involve a full barrier between setting
IRQ_WORK_BUSY flag and the work execution. This way we
ensure that the work execution site sees the expected data
and the claim site sees the IRQ_WORK_BUSY:

CPU 0                                  CPU 1

data = something                       flags = IRQ_WORK_BUSY
smp_mb() (implicit with cmpxchg        smp_mb()
on flags in claim)                     execute_work (sees data from CPU 0)
try to claim

As a shortcut, let's just use xchg() that implies a full memory
barrier.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
---
kernel/irq_work.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 764240a..ea79365 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -130,9 +130,12 @@ void irq_work_run(void)

/*
* Clear the PENDING bit, after this point the @work
- * can be re-used.
+ * can be re-used. Use xchg to force ordering against
+ * data to process, such that if claiming fails on
+ * another CPU, we see and handle the data it wants
+ * us to process on the work.
*/
- work->flags = IRQ_WORK_BUSY;
+ xchg(&work->flags, IRQ_WORK_BUSY);
work->func(work);
/*
* Clear the BUSY bit and return to the free state if
--
1.7.5.4
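For readers following along in userspace, the effect of the one-line change can be sketched with GCC's __atomic builtins (an assumption on my part: the kernel's xchg() is arch-specific assembly, but like __atomic_exchange_n() it returns the old value and implies a full barrier):

```c
#include <assert.h>

#define IRQ_WORK_PENDING 1UL
#define IRQ_WORK_BUSY    2UL
#define IRQ_WORK_FLAGS   3UL	/* PENDING | BUSY */

/*
 * Model of the patched line: publish IRQ_WORK_BUSY with a
 * sequentially consistent exchange instead of a plain store.
 * Like the kernel's xchg(), it returns the previous value and
 * orders the store against surrounding accesses.
 */
static unsigned long set_busy(unsigned long *flags)
{
	return __atomic_exchange_n(flags, IRQ_WORK_BUSY, __ATOMIC_SEQ_CST);
}
```

A plain store, by contrast, gives no ordering guarantee at all, which is exactly the visibility problem the changelog describes.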

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Tue, 2012-10-30 at 16:35 +0100, Frederic Weisbecker wrote:
> The IRQ_WORK_BUSY flag is set right before we execute the
> work. Once this flag value is set, the work enters a
> claimable state again.
>
> This is necessary because if we want to enqueue a work but we
> fail the claim, we want to ensure that the CPU where that work
> is still pending will see and handle the data we expected the
> work to compute.
>
> This might not work as expected though because IRQ_WORK_BUSY
> isn't set atomically. By the time a CPU fails a work claim,
> this work may well have been already executed by the CPU where
> it was previously pending.
>
> Due to the lack of appropriate memory barrier, the IRQ_WORK_BUSY
> flag value may not be visible by the CPU trying to claim while
> the work is executing, and that until we clear the busy bit in
> the work flags using cmpxchg() that implies the full barrier.
>
> One solution could involve a full barrier between setting
> IRQ_WORK_BUSY flag and the work execution. This way we
> ensure that the work execution site sees the expected data
> and the claim site sees the IRQ_WORK_BUSY:
>
> CPU 0 CPU 1
>
> data = something flags = IRQ_WORK_BUSY
> smp_mb() (implicit with cmpxchg smp_mb()
> on flags in claim) execute_work (sees data from CPU 0)
> try to claim
>
> As a shortcut, let's just use xchg() that implies a full memory
> barrier.
>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

> ---
> kernel/irq_work.c | 7 +++++--
> 1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 764240a..ea79365 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -130,9 +130,12 @@ void irq_work_run(void)
>
> /*
> * Clear the PENDING bit, after this point the @work
> - * can be re-used.
> + * can be re-used. Use xchg to force ordering against
> + * data to process, such that if claiming fails on
> + * another CPU, we see and handle the data it wants
> + * us to process on the work.
> */
> - work->flags = IRQ_WORK_BUSY;
> + xchg(&work->flags, IRQ_WORK_BUSY);
> work->func(work);
> /*
> * Clear the BUSY bit and return to the free state if


Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
> The IRQ_WORK_BUSY flag is set right before we execute the
> work. Once this flag value is set, the work enters a
> claimable state again.
>
> This is necessary because if we want to enqueue a work but we
> fail the claim, we want to ensure that the CPU where that work
> is still pending will see and handle the data we expected the
> work to compute.
>
> This might not work as expected though because IRQ_WORK_BUSY
> isn't set atomically. By the time a CPU fails a work claim,
> this work may well have been already executed by the CPU where
> it was previously pending.
>
> Due to the lack of appropriate memory barrier, the IRQ_WORK_BUSY
> flag value may not be visible by the CPU trying to claim while
> the work is executing, and that until we clear the busy bit in
> the work flags using cmpxchg() that implies the full barrier.
>
> One solution could involve a full barrier between setting
> IRQ_WORK_BUSY flag and the work execution. This way we
> ensure that the work execution site sees the expected data
> and the claim site sees the IRQ_WORK_BUSY:
>
> CPU 0                                 CPU 1
>
> data = something                      flags = IRQ_WORK_BUSY
> smp_mb() (implicit with cmpxchg       smp_mb()
> on flags in claim)                    execute_work (sees data from CPU 0)
> try to claim
>
As I understand, without the memory barrier proposed by you the situation
would be as below:

CPU 0                                  CPU 1

data = something                       flags = IRQ_WORK_BUSY
smp_mb() (implicit with cmpxchg        execute_work (sees data from CPU 0)
on flags in claim)
_success_ in claiming and goes
ahead and execute the work (wrong?)
cmpxchg cause flag to IRQ_WORK_BUSY

Now knows the flag == IRQ_WORK_BUSY

Am I right?

Probably a stupid question: why do we return a bool from irq_work_queue
when no one bothers to check the return value? Wouldn't it be better if
this function were void, as its users effectively treat it, or am I
looking at the wrong code?
>
> As a shortcut, let's just use xchg() that implies a full memory
> barrier.
>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
> kernel/irq_work.c | 7 +++++--
> 1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 764240a..ea79365 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -130,9 +130,12 @@ void irq_work_run(void)
>
> /*
> * Clear the PENDING bit, after this point the @work
> - * can be re-used.
> + * can be re-used. Use xchg to force ordering against
> + * data to process, such that if claiming fails on
> + * another CPU, we see and handle the data it wants
> + * us to process on the work.
> */
> - work->flags = IRQ_WORK_BUSY;
> + xchg(&work->flags, IRQ_WORK_BUSY);
> work->func(work);
> /*
> * Clear the BUSY bit and return to the free state if
> --
> 1.7.5.4
>

Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Wed, 2012-10-31 at 03:33 +0900, anish kumar wrote:

> > CPU 0                                 CPU 1
> >
> > data = something                      flags = IRQ_WORK_BUSY
> > smp_mb() (implicit with cmpxchg       smp_mb()
> > on flags in claim)                    execute_work (sees data from CPU 0)
> > try to claim
> >
> As I understand without the memory barrier proposed by you the situation
> would be as below:
> CPU 0                                 CPU 1
>
> data = something                      flags = IRQ_WORK_BUSY
> smp_mb() (implicit with cmpxchg       execute_work (sees data from CPU 0)
> on flags in claim)
> _success_ in claiming and goes

Correct, because it would see the stale value of flags.

> ahead and execute the work(wrong?)
> cmpxchg cause flag to IRQ_WORK_BUSY
>
> Now knows the flag==IRQ_WORK_BUSY
>
> Am I right?

right.

>
> Probably a stupid question.Why do we return the bool from irq_work_queue
> when no one bothers to check the return value?Wouldn't it be better if
> this function is void as used by the users of this function or am I
> looking at the wrong code.

Not a stupid question, as I was asking that myself just earlier
today, but forgot to mention it as well. Especially because it makes it
look like there's a bug in the code. Maybe someday someone will care if
their work was finished by itself, or by some other CPU.

Probably we should just nix the return value.

-- Steve


Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
2012/10/30 anish kumar <anish198519851985@gmail.com>:
> As I understand without the memory barrier proposed by you the situation
> would be as below:
> CPU 0 CPU 1
>
> data = something flags = IRQ_WORK_BUSY
> smp_mb() (implicit with cmpxchg execute_work (sees data from CPU 0)
> on flags in claim)
> _success_ in claiming and goes
> ahead and execute the work(wrong?)
> cmpxchg cause flag to IRQ_WORK_BUSY
>
> Now knows the flag==IRQ_WORK_BUSY
>
> Am I right?

(Adding Paul in Cc because I'm again confused with memory barriers)

Actually what I had in mind is rather that CPU 0 fails its claim
because it's not seeing the IRQ_WORK_BUSY flag as it should:


CPU 0                                  CPU 1

data = something                       flags = IRQ_WORK_BUSY
cmpxchg() for claim                    execute_work (sees data from CPU 0)

CPU 0 should see IRQ_WORK_BUSY but it may not because CPU 1 sets this
value in a non-atomic way.

Also, while browsing Paul's perfbook, I realize my changelog is buggy.
It seems we can't reliably use memory barriers here because we would
be in the following case:

CPU 0                                  CPU 1

store(work data)                       store(flags)
smp_mb()                               smp_mb()
load(flags)                            load(work data)

Given this barrier pairing, we can't make the assumption that, for
example, if CPU 1 sees the work data stored by CPU 0 then CPU 0 sees
the flags stored by CPU 1.

So now I wonder if cmpxchg() can give us more confidence:


CPU 0                                  CPU 1

store(work data)                       xchg(flags, IRQ_WORK_BUSY)
cmpxchg(flags, IRQ_WORK_FLAGS)         load(work data)

Can I make this assumption?

- If CPU 0 fails the cmpxchg() (which means CPU 1 has not yet done its
xchg()), then CPU 1 will execute the work and see our data.

At least the cmpxchg()/xchg() pair orders correctly enough to ensure that
somebody will execute our work. Some locking is probably still needed in
the work function itself if it's not per-CPU.

>
> Probably a stupid question.Why do we return the bool from irq_work_queue
> when no one bothers to check the return value?Wouldn't it be better if
> this function is void as used by the users of this function or am I
> looking at the wrong code.

No idea. Probably Peter had plans there.
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Wed, 2012-10-31 at 01:36 +0100, Frederic Weisbecker wrote:
> 2012/10/30 anish kumar <anish198519851985@gmail.com>:
> > As I understand without the memory barrier proposed by you the situation
> > would be as below:
> > CPU 0 CPU 1
> >
> > data = something flags = IRQ_WORK_BUSY
> > smp_mb() (implicit with cmpxchg execute_work (sees data from CPU 0)
> > on flags in claim)
> > _success_ in claiming and goes
> > ahead and execute the work(wrong?)
> > cmpxchg cause flag to IRQ_WORK_BUSY
> >
> > Now knows the flag==IRQ_WORK_BUSY
> >
> > Am I right?
>
> (Adding Paul in Cc because I'm again confused with memory barriers)
>
> Actually what I had in mind is rather that CPU 0 fails its claim
> because it's not seeing the IRQ_WORK_BUSY flag as it should:
>
>
> CPU 0 CPU 1
>
> data = something flags = IRQ_WORK_BUSY
> cmpxchg() for claim execute_work (sees data from CPU 0)
>
> CPU 0 should see IRQ_WORK_BUSY but it may not because CPU 1 sets this
> value in a non-atomic way.
>
> Also, while browsing Paul's perfbook, I realize my changelog is buggy.
> It seems we can't reliably use memory barriers here because we would
> be in the following case:
>
> CPU 0 CPU 1
>
> store(work data) store(flags)
> smp_mb() smp_mb()
> load(flags) load(work data)
>
> On top of this barrier pairing, we can't make the assumption that, for
> example, if CPU 1 sees the work data stored in CPU 0 then CPU 0 sees
> the flags stored in CPU 1.
>
> So now I wonder if cmpxchg() can give us more confidence:

More confidence over what? The xchg()? They are equivalent (wrt memory
barriers).

Here's the issue that currently exists. Let's look at the code:


/*
 * Claim the entry so that no one else will poke at it.
 */
static bool irq_work_claim(struct irq_work *work)
{
	unsigned long flags, nflags;

	for (;;) {
		flags = work->flags;
		if (flags & IRQ_WORK_PENDING)
			return false;
		nflags = flags | IRQ_WORK_FLAGS;
		if (cmpxchg(&work->flags, flags, nflags) == flags)
			break;
		cpu_relax();
	}

	return true;
}

and

	llnode = llist_del_all(this_list);
	while (llnode != NULL) {
		work = llist_entry(llnode, struct irq_work, llnode);

		llnode = llist_next(llnode);

		/*
		 * Clear the PENDING bit, after this point the @work
		 * can be re-used.
		 */
		work->flags = IRQ_WORK_BUSY;
		work->func(work);
		/*
		 * Clear the BUSY bit and return to the free state if
		 * no-one else claimed it meanwhile.
		 */
		(void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
	}

The irq_work_claim() will only queue its work if it's not already
pending. If it is pending, then someone is going to process it anyway.
But once we start running the function, new work needs to be processed
again.

Thus we have:

CPU 1                                   CPU 2
-----                                   -----
(flags = 0)
cmpxchg(flags, 0, IRQ_WORK_FLAGS)
(flags = 3)
[...]

                                        if (flags & IRQ_WORK_PENDING)
                                                return false
flags = IRQ_WORK_BUSY
(flags = 2)
func()

The above shows the normal case, where CPU 2 doesn't need to queue work
because it's going to be done for it by CPU 1. But...



CPU 1                                   CPU 2
-----                                   -----
(flags = 0)
cmpxchg(flags, 0, IRQ_WORK_FLAGS)
(flags = 3)
[...]
flags = IRQ_WORK_BUSY
(flags = 2)
func()
                                        (sees flags = 3)
                                        if (flags & IRQ_WORK_PENDING)
                                                return false
cmpxchg(flags, 2, 0);
(flags = 0)


Here, because we did not do a memory barrier after
flags = IRQ_WORK_BUSY, CPU 2 saw stale data and did not queue its work,
and missed the opportunity. Now if you had this fix with the xchg() as
you have in your patch, then CPU 2 would not see the stale flags.
Except, with the code I showed above it still can!

CPU 1                                   CPU 2
-----                                   -----
(flags = 0)
cmpxchg(flags, 0, IRQ_WORK_FLAGS)
(flags = 3)
[...]
                                        (fetch flags)
xchg(&flags, IRQ_WORK_BUSY)
(flags = 2)
func()
                                        (sees flags = 3)
                                        if (flags & IRQ_WORK_PENDING)
                                                return false
cmpxchg(flags, 2, 0);
(flags = 0)


Even with the update of xchg(), if CPU2 fetched the flags before CPU1
did the xchg, then it would still lose out. But that's where your
previous patch comes in that does:

	flags = work->flags & ~IRQ_WORK_PENDING;
	for (;;) {
		nflags = flags | IRQ_WORK_FLAGS;
		oflags = cmpxchg(&work->flags, flags, nflags);
		if (oflags == flags)
			break;
		if (oflags & IRQ_WORK_PENDING)
			return false;
		flags = oflags;
		cpu_relax();
	}


This now does:

CPU 1                                   CPU 2
-----                                   -----
(flags = 0)
cmpxchg(flags, 0, IRQ_WORK_FLAGS)
(flags = 3)
[...]
xchg(&flags, IRQ_WORK_BUSY)
(flags = 2)
func()
                                        oflags = cmpxchg(&flags, flags, nflags);
                                        (sees flags = 2)
                                        if (flags & IRQ_WORK_PENDING)
                                                (not true)
                                        (loop)
cmpxchg(flags, 2, 0);
(flags = 2)
                                        flags = 3



With both patch 1 and 2 you fixed the bug.
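A single-threaded sketch of the patch-1 loop quoted above shows the claim semantics this combination relies on: BUSY alone does not refuse a claim, only PENDING does (__atomic builtins stand in for cmpxchg(); claim_v2 is my name, not the kernel's; cpu_relax() omitted):

```c
#include <assert.h>
#include <stdbool.h>

#define IRQ_WORK_PENDING 1UL
#define IRQ_WORK_BUSY    2UL
#define IRQ_WORK_FLAGS   3UL

/* Model of the patch-1 claim loop: only PENDING refuses the
 * claim; a concurrent flag update is observed and retried. */
static bool claim_v2(unsigned long *pflags)
{
	unsigned long flags, oflags, nflags;

	flags = *pflags & ~IRQ_WORK_PENDING;
	for (;;) {
		nflags = flags | IRQ_WORK_FLAGS;
		oflags = flags;
		if (__atomic_compare_exchange_n(pflags, &oflags, nflags,
						false, __ATOMIC_SEQ_CST,
						__ATOMIC_RELAXED))
			return true;	/* cmpxchg succeeded: claimed */
		if (oflags & IRQ_WORK_PENDING)
			return false;	/* someone else holds the claim */
		flags = oflags;		/* retry against the observed value */
	}
}
```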

I guess you suffer from...

http://weknowmemes.com/wp-content/uploads/2012/09/my-code-doesnt-work-i-have-no-idea-why.jpg

;-)


>
>
> CPU 0 CPU 1
>
> store(work data) xchg(flags, IRQ_WORK_BUSY)
> cmpxchg(flags, IRQ_WORK_FLAGS) load(work data)
>
> Can I make this assumption?
>
> - If CPU 0 fails the cmpxchg() (which means CPU 1 has not yet xchg())
> then CPU 1 will execute the work and see our data.
>
> At least cmpxchg / xchg pair orders correctly to ensure somebody will
> execute our work. Now probably some locking is needed from the work
> function itself if it's not per cpu.

Yeah, there's nothing stopping the work->func() from executing from two
different CPUs. The protection is just for the internal use of llist
that is used. We don't need to worry about it being queued twice and
corrupting the work->llnode.

But the synchronization of the func() should be up to the func() code
itself, not a guarantee of the irq_work.

-- Steve


Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
2012/10/31 Steven Rostedt <rostedt@goodmis.org>:
> More confidence over what? The xchg()? They are equivalent (wrt memory
> barriers).
>
> Here's the issue that currently exists. Let's look at the code:
>
>
> /*
> * Claim the entry so that no one else will poke at it.
> */
> static bool irq_work_claim(struct irq_work *work)
> {
> unsigned long flags, nflags;
>
> for (;;) {
> flags = work->flags;
> if (flags & IRQ_WORK_PENDING)
> return false;
> nflags = flags | IRQ_WORK_FLAGS;
> if (cmpxchg(&work->flags, flags, nflags) == flags)
> break;
> cpu_relax();
> }
>
> return true;
> }
>
> and
>
> llnode = llist_del_all(this_list);
> while (llnode != NULL) {
> work = llist_entry(llnode, struct irq_work, llnode);
>
> llnode = llist_next(llnode);
>
> /*
> * Clear the PENDING bit, after this point the @work
> * can be re-used.
> */
> work->flags = IRQ_WORK_BUSY;
> work->func(work);
> /*
> * Clear the BUSY bit and return to the free state if
> * no-one else claimed it meanwhile.
> */
> (void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
> }
>
> The irq_work_claim() will only queue its work if it's not already
> pending. If it is pending, then someone is going to process it anyway.
> But once we start running the function, new work needs to be processed
> again.
>
> Thus we have:
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
>
> if (flags & IRQ_WORK_PENDING)
> return false
> flags = IRQ_WORK_BUSY
> (flags = 2)
> func()
>
> The above shows the normal case were CPU 2 doesn't need to queue work,
> because its going to be done for it by CPU 1. But...
>
>
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> flags = IRQ_WORK_BUSY
> (flags = 2)
> func()
> (sees flags = 3)
> if (flags & IRQ_WORK_PENDING)
> return false
> cmpxchg(flags, 2, 0);
> (flags = 0)
>
>
> Here, because we did not do a memory barrier after
> flags = IRQ_WORK_BUSY, CPU 2 saw stale data and did not queue its work,
> and missed the opportunity. Now if you had this fix with the xchg() as
> you have in your patch, then CPU 2 would not see the stale flags.
> Except, with the code I showed above it still can!
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> (fetch flags)
> xchg(&flags, IRQ_WORK_BUSY)
> (flags = 2)
> func()
> (sees flags = 3)
> if (flags & IRQ_WORK_PENDING)
> return false
> cmpxchg(flags, 2, 0);
> (flags = 0)
>
>
> Even with the update of xchg(), if CPU2 fetched the flags before CPU1
> did the xchg, then it would still lose out. But that's where your
> previous patch comes in that does:
>
> flags = work->flags & ~IRQ_WORK_PENDING;
> for (;;) {
> nflags = flags | IRQ_WORK_FLAGS;
> oflags = cmpxchg(&work->flags, flags, nflags);
> if (oflags == flags)
> break;
> if (oflags & IRQ_WORK_PENDING)
> return false;
> flags = oflags;
> cpu_relax();
> }
>
>
> This now does:
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> xchg(&flags, IRQ_WORK_BUSY)
> (flags = 2)
> func()
> oflags = cmpxchg(&flags, flags, nflags);
> (sees flags = 2)
> if (flags & IRQ_WORK_PENDING)
> (not true)
> (loop)
> cmpxchg(flags, 2, 0);
> (flags = 2)
> flags = 3
>
>
>
> With both patch 1 and 2 you fixed the bug.
>
> I guess you suffer from...
>
> http://weknowmemes.com/wp-content/uploads/2012/09/my-code-doesnt-work-i-have-no-idea-why.jpg
>
> ;-)
>

Heh. No, I was ok with all the above :) My point was that
xchg()/cmpxchg() is good to synchronize on the flag values and on the
need or not to queue the work, but memory barriers couldn't replace
that, and my ordering assumptions against user data manipulated by the
work were wrong. As such my changelogs are buggy.

My last doubt was: can the cmpxchg()/xchg() pair we have settle an
ordering between flags and user data such that we don't need locking
for these? But as you highlight below, we definitely need locking for
user data anyway.
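The conclusion that serialization belongs in the work function itself can be sketched like this — a userspace analogy, not kernel API (a real kernel work function would use a spinlock_t or similar; the names here are mine):

```c
#include <assert.h>
#include <stdbool.h>

/* func() may run on two CPUs at once; irq_work only protects the
 * llist linkage, so the function must serialize its own data.
 * A minimal spinlock built on __atomic_test_and_set (illustrative). */
static bool work_locked;
static int work_data;

static void my_work_func(void)
{
	while (__atomic_test_and_set(&work_locked, __ATOMIC_ACQUIRE))
		;			/* spin until the lock is free */
	work_data++;			/* serialized critical section */
	__atomic_clear(&work_locked, __ATOMIC_RELEASE);
}
```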

>>
>>
>> CPU 0 CPU 1
>>
>> store(work data) xchg(flags, IRQ_WORK_BUSY)
>> cmpxchg(flags, IRQ_WORK_FLAGS) load(work data)
>>
>> Can I make this assumption?
>>
>> - If CPU 0 fails the cmpxchg() (which means CPU 1 has not yet xchg())
>> then CPU 1 will execute the work and see our data.
>>
>> At least cmpxchg / xchg pair orders correctly to ensure somebody will
>> execute our work. Now probably some locking is needed from the work
>> function itself if it's not per cpu.
>
> Yeah, there's nothing stopping the work->func() from executing from two
> different CPUs. The protection is just for the internal use of llist
> that is used. We don't need to worry about it being queued twice and
> corrupting the work->llnode.
>
> But the synchronization of the func() should be up to the func() code
> itself, not a guarantee of the irq_work.

Agreed.

I'll refine my changelogs accordingly.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Tue, 2012-10-30 at 22:23 -0400, Steven Rostedt wrote:
> On Wed, 2012-10-31 at 01:36 +0100, Frederic Weisbecker wrote:
> > 2012/10/30 anish kumar <anish198519851985@gmail.com>:
> > > As I understand without the memory barrier proposed by you the situation
> > > would be as below:
> > > CPU 0 CPU 1
> > >
> > > data = something flags = IRQ_WORK_BUSY
> > > smp_mb() (implicit with cmpxchg execute_work (sees data from CPU 0)
> > > on flags in claim)
> > > _success_ in claiming and goes
> > > ahead and execute the work(wrong?)
> > > cmpxchg cause flag to IRQ_WORK_BUSY
> > >
> > > Now knows the flag==IRQ_WORK_BUSY
> > >
> > > Am I right?
> >
> > (Adding Paul in Cc because I'm again confused with memory barriers)
> >
> > Actually what I had in mind is rather that CPU 0 fails its claim
> > because it's not seeing the IRQ_WORK_BUSY flag as it should:
> >
> >
> > CPU 0 CPU 1
> >
> > data = something flags = IRQ_WORK_BUSY
> > cmpxchg() for claim execute_work (sees data from CPU 0)
> >
> > CPU 0 should see IRQ_WORK_BUSY but it may not because CPU 1 sets this
> > value in a non-atomic way.
> >
> > Also, while browsing Paul's perfbook, I realize my changelog is buggy.
> > It seems we can't reliably use memory barriers here because we would
> > be in the following case:
> >
> > CPU 0 CPU 1
> >
> > store(work data) store(flags)
> > smp_mb() smp_mb()
> > load(flags) load(work data)
> >
> > On top of this barrier pairing, we can't make the assumption that, for
> > example, if CPU 1 sees the work data stored in CPU 0 then CPU 0 sees
> > the flags stored in CPU 1.
> >
> > So now I wonder if cmpxchg() can give us more confidence:
>
> More confidence over what? The xchg()? They are equivalent (wrt memory
> barriers).
>
> Here's the issue that currently exists. Let's look at the code:
>
>
> /*
> * Claim the entry so that no one else will poke at it.
> */
> static bool irq_work_claim(struct irq_work *work)
> {
> unsigned long flags, nflags;
>
> for (;;) {
> flags = work->flags;
> if (flags & IRQ_WORK_PENDING)
> return false;
> nflags = flags | IRQ_WORK_FLAGS;
nflags = 1 | 3
nflags = 2 | 3
In both cases the result would be the same. If I am right, then wouldn't
this operation be redundant?
> if (cmpxchg(&work->flags, flags, nflags) == flags)
> break;
> cpu_relax();
> }
>
> return true;
> }
>
> and
>
> llnode = llist_del_all(this_list);
> while (llnode != NULL) {
> work = llist_entry(llnode, struct irq_work, llnode);
>
> llnode = llist_next(llnode);
>
> /*
> * Clear the PENDING bit, after this point the @work
> * can be re-used.
> */
> work->flags = IRQ_WORK_BUSY;
> work->func(work);
> /*
> * Clear the BUSY bit and return to the free state if
> * no-one else claimed it meanwhile.
> */
> (void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
> }
>
> The irq_work_claim() will only queue its work if it's not already
> pending. If it is pending, then someone is going to process it anyway.
> But once we start running the function, new work needs to be processed
> again.
>
> Thus we have:
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
>
> if (flags & IRQ_WORK_PENDING)
> return false
> flags = IRQ_WORK_BUSY
> (flags = 2)
> func()
>
> The above shows the normal case were CPU 2 doesn't need to queue work,
> because its going to be done for it by CPU 1. But...
>
>
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> flags = IRQ_WORK_BUSY
> (flags = 2)
> func()
> (sees flags = 3)
> if (flags & IRQ_WORK_PENDING)
> return false
> cmpxchg(flags, 2, 0);
> (flags = 0)
>
>
> Here, because we did not do a memory barrier after
> flags = IRQ_WORK_BUSY, CPU 2 saw stale data and did not queue its work,
> and missed the opportunity. Now if you had this fix with the xchg() as
> you have in your patch, then CPU 2 would not see the stale flags.
> Except, with the code I showed above it still can!
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> (fetch flags)
> xchg(&flags, IRQ_WORK_BUSY)
> (flags = 2)
> func()
> (sees flags = 3)
> if (flags & IRQ_WORK_PENDING)
> return false
> cmpxchg(flags, 2, 0);
> (flags = 0)
>
>
> Even with the update of xchg(), if CPU2 fetched the flags before CPU1
> did the xchg, then it would still lose out. But that's where your
> previous patch comes in that does:
>
> flags = work->flags & ~IRQ_WORK_PENDING;
> for (;;) {
> nflags = flags | IRQ_WORK_FLAGS;
> oflags = cmpxchg(&work->flags, flags, nflags);
> if (oflags == flags)
> break;
> if (oflags & IRQ_WORK_PENDING)
> return false;
> flags = oflags;
> cpu_relax();
> }
>
>
> This now does:
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> xchg(&flags, IRQ_WORK_BUSY)
> (flags = 2)
> func()
> oflags = cmpxchg(&flags, flags, nflags);
> (sees flags = 2)
> if (flags & IRQ_WORK_PENDING)
> (not true)
> (loop)
> cmpxchg(flags, 2, 0);
> (flags = 2)
> flags = 3
>
>
>
> With both patch 1 and 2 you fixed the bug.
This is the best explanation anyone can put forward for this problem.
>
> I guess you suffer from...
>
> http://weknowmemes.com/wp-content/uploads/2012/09/my-code-doesnt-work-i-have-no-idea-why.jpg
>
> ;-)
>
>
> >
> >
> > CPU 0 CPU 1
> >
> > store(work data) xchg(flags, IRQ_WORK_BUSY)
> > cmpxchg(flags, IRQ_WORK_FLAGS) load(work data)
> >
> > Can I make this assumption?
> >
> > - If CPU 0 fails the cmpxchg() (which means CPU 1 has not yet xchg())
> > then CPU 1 will execute the work and see our data.
> >
> > At least cmpxchg / xchg pair orders correctly to ensure somebody will
> > execute our work. Now probably some locking is needed from the work
> > function itself if it's not per cpu.
>
> Yeah, there's nothing stopping the work->func() from executing from two
> different CPUs. The protection is just for the internal use of llist
> that is used. We don't need to worry about it being queued twice and
> corrupting the work->llnode.
>
> But the synchronization of the func() should be up to the func() code
> itself, not a guarantee of the irq_work.
>
> -- Steve
>
>


Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Tue, 2012-10-30 at 14:45 -0400, Steven Rostedt wrote:
> On Wed, 2012-10-31 at 03:33 +0900, anish kumar wrote:
>
> > > CPU 0 CPU 1
> > >
> > > data = something flags = IRQ_WORK_BUSY
> > > smp_mb() (implicit with cmpxchg smp_mb()
> > > on flags in claim) execute_work (sees data from CPU
> > > 0)
> > > try to claim
> > >
> > As I understand without the memory barrier proposed by you the situation
> > would be as below:
> > CPU 0 CPU 1
> >
> > data = something flags = IRQ_WORK_BUSY
> > smp_mb() (implicit with cmpxchg execute_work (sees data from CPU 0)
> > on flags in claim)
> > _success_ in claiming and goes
>
> Correct, because it would see the stale value of flags.
>
> > ahead and execute the work(wrong?)
> > cmpxchg cause flag to IRQ_WORK_BUSY
> >
> > Now knows the flag==IRQ_WORK_BUSY
> >
> > Am I right?
>
> right.
>
> >
> > Probably a stupid question.Why do we return the bool from irq_work_queue
> > when no one bothers to check the return value?Wouldn't it be better if
> > this function is void as used by the users of this function or am I
> > looking at the wrong code.
>
> Not a stupid question, as I was asking that to myself just earlier
> today. But forgot to mention it as well. Especially, because it makes it
> look like there's a bug in the code. Maybe someday someone will care if
> their work was finished by itself, or some other CPU.
>
> Probably should just nix the return value.
Can I send a patch to fix this?
>
> -- Steve
>
>


Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Wed, 2012-10-31 at 20:04 +0900, anish kumar wrote:

> > /*
> > * Claim the entry so that no one else will poke at it.
> > */
> > static bool irq_work_claim(struct irq_work *work)
> > {
> > unsigned long flags, nflags;
> >
> > for (;;) {
> > flags = work->flags;
> > if (flags & IRQ_WORK_PENDING)
> > return false;
> > nflags = flags | IRQ_WORK_FLAGS;
> nflags = 1 | 3
> nflags = 2 | 3
> In both cases the result would be the same. If I am right, then wouldn't
> this operation be redundant?

Right. Actually we could change the new loop to:

for (;;) {
oflags = cmpxchg(&work->flags, flags, IRQ_WORK_FLAGS);
if (oflags == flags)
break;
if (oflags & IRQ_WORK_PENDING)
return false;
flags = oflags;
cpu_relax();
}


> > if (cmpxchg(&work->flags, flags, nflags) == flags)
> > break;
> > cpu_relax();
> > }
> >
> > return true;
> > }
> >


> >
> > This now does:
> >
> > CPU 1 CPU 2
> > ----- -----
> > (flags = 0)
> > cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> > (flags = 3)
> > [...]
> > xchg(&flags, IRQ_WORK_BUSY)
> > (flags = 2)
> > func()
> > oflags = cmpxchg(&flags, flags, nflags);
> > (sees flags = 2)
> > if (flags & IRQ_WORK_PENDING)
> > (not true)
> > (loop)
> > cmpxchg(flags, 2, 0);
> > (flags = 2)
> > flags = 3
> >
> >
> >
> > With both patch 1 and 2 you fixed the bug.
> This is the best explanation anyone can put forward for this problem.

Frederic,

Would you like to add my explanation to your change log? You can add the
entire thing, which I think would explain a lot to people.

Thanks,

-- Steve


Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
2012/10/31 Steven Rostedt <rostedt@goodmis.org>:
> On Wed, 2012-10-31 at 20:04 +0900, anish kumar wrote:
>> nflags = 1 | 3
>> nflags = 2 | 3
>> In both cases the result would be the same. If I am right, then wouldn't
>> this operation be redundant?
>
> Right. Actually we could change the new loop to:
>
> for (;;) {
> oflags = cmpxchg(&work->flags, flags, IRQ_WORK_FLAGS);
> if (oflags == flags)
> break;
> if (oflags & IRQ_WORK_PENDING)
> return false;
> flags = oflags;
> cpu_relax();
> }

We could. But I wanted to keep the code able to handle new flags in
the future (such as IRQ_WORK_LAZY).

> Frederic,
>
> Would you like to add my explanation to your change log? You can add the
> entire thing, which I think would explain a lot to people.

It's indeed a very clear explanation. I'll put that in the changelog, thanks!
Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting
On Wed, 2012-10-31 at 20:04 +0900, anish kumar wrote:

> > This now does:
> >
> > CPU 1 CPU 2
> > ----- -----
> > (flags = 0)
> > cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> > (flags = 3)
> > [...]

We can still add here:

(fetch flags)

> > xchg(&flags, IRQ_WORK_BUSY)
> > (flags = 2)
> > func()
> > oflags = cmpxchg(&flags, flags, nflags);

Then the cmpxchg() would fail, and oflags would be 2

> > (sees flags = 2)
> > if (flags & IRQ_WORK_PENDING)

This should be:
if (oflags & IRQ_WORK_PENDING)


> > (not true)
> > (loop)
> > cmpxchg(flags, 2, 0);
> > (flags = 2)

This should be:
(flags = 0)
as we described the previous cmpxchg() as failing, flags would still be
2 before this cmpxchg(), and this one would succeed.

-- Steve

> > flags = 3
> >
> >
> >

