Mailing List Archive

[REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6
After resuming a ptracee with PTRACE_SINGLESTEP, in the following
ptrace stop retrieving the dr6 value for the tracee gets a value that
does not include DR_STEP (it is in fact always DR6_RESERVED). I
bisected this to the 13cb73490f475f8e7669f9288be0bcfa85399b1f merge. I
did not bisect further.

I don't see any handling to ever set DR_STEP in virtual_dr6, so I
think this code is just broken.

Sorry for not testing this when I was CCd on the original patch series :)

- Kyle
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 07:33:08AM -0700, Kyle Huey wrote:
> After resuming a ptracee with PTRACE_SINGLESTEP, in the following
> ptrace stop retrieving the dr6 value for the tracee gets a value that
> does not include DR_STEP (it is in fact always DR6_RESERVED). I
> bisected this to the 13cb73490f475f8e7669f9288be0bcfa85399b1f merge. I
> did not bisect further.
>
> I don't see any handling to ever set DR_STEP in virtual_dr6, so I
> think this code is just broken.
>
> Sorry for not testing this when I was CCd on the original patch series :)

Urgh, now I have to try and remember how all that worked again ...

I suspect it's either one (or both) of the last two:

f4956cf83ed1 ("x86/debug: Support negative polarity DR6 bits")
d53d9bc0cf78 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6")


Just to clarify, the sequence is something like:

- tracer: ptrace(PTRACE_SINGLESTEP)
- tracee: #DB, DR6 contains DR_STEP
- tracer: ptrace_get_debugreg(6)

?

You're right that that would be broken, let me try and figure out what
the best way would be 'fix' that.

Also, can you confirm that pthread_set_debugreg(6) should not do
anything useful?
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 04:55:21PM +0100, Peter Zijlstra wrote:
> On Mon, Oct 26, 2020 at 07:33:08AM -0700, Kyle Huey wrote:
> > After resuming a ptracee with PTRACE_SINGLESTEP, in the following
> > ptrace stop retrieving the dr6 value for the tracee gets a value that
> > does not include DR_STEP (it is in fact always DR6_RESERVED). I
> > bisected this to the 13cb73490f475f8e7669f9288be0bcfa85399b1f merge. I
> > did not bisect further.
> >
> > I don't see any handling to ever set DR_STEP in virtual_dr6, so I
> > think this code is just broken.
> >
> > Sorry for not testing this when I was CCd on the original patch series :)
>
> Urgh, now I have to try and remember how all that worked again ...
>
> I suspect it's either one (or both) of the last two:
>
> f4956cf83ed1 ("x86/debug: Support negative polarity DR6 bits")
> d53d9bc0cf78 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6")
>
>
> Just to clarify, the sequence is something like:
>
> - tracer: ptrace(PTRACE_SINGLESTEP)
> - tracee: #DB, DR6 contains DR_STEP
> - tracer: ptrace_get_debugreg(6)
>
> ?
>
> You're right that that would be broken, let me try and figure out what
> the best way would be 'fix' that.
>
> Also, can you confirm that pthread_set_debugreg(6) should not do
> anything useful?


Does something like this make sense?


diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3c70fb34028b..0e7641ac19a8 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -799,6 +799,13 @@ static __always_inline unsigned long debug_read_clear_dr6(void)
*/
current->thread.virtual_dr6 = 0;

+ /*
+ * If PTRACE requested SINGLE(BLOCK)STEP, make sure to reflect that in
+ * the ptrace visible DR6 copy.
+ */
+ if (test_thread_flag(TIF_BLOCKSTEP) || test_thread_flag(TIF_SINGLESTEP))
+ current->thread.virtual_dr6 |= dr6 & DR_STEP;
+
/*
* The SDM says "The processor clears the BTF flag when it
* generates a debug exception." Clear TIF_BLOCKSTEP to keep
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 8:55 AM Peter Zijlstra <peterz@infradead.org> wrote:
> Urgh, now I have to try and remember how all that worked again ...

Sorry.

> I suspect it's either one (or both) of the last two:
>
> f4956cf83ed1 ("x86/debug: Support negative polarity DR6 bits")
> d53d9bc0cf78 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6")

I think it's the latter, particularly the removal of this assignment[0]

> Just to clarify, the sequence is something like:
>
> - tracer: ptrace(PTRACE_SINGLESTEP)
> - tracee: #DB, DR6 contains DR_STEP
> - tracer: ptrace_get_debugreg(6)

Right.

> Also, can you confirm that pthread_set_debugreg(6) should not do
> anything useful?

I don't believe it did anything useful.

- Kyle

[0] https://github.com/torvalds/linux/commit/d53d9bc0cf78#diff-51ce909c2f65ed9cc668bc36cc3c18528541d8a10e84287874cd37a5918abae5L790
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 9:05 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Oct 26, 2020 at 04:55:21PM +0100, Peter Zijlstra wrote:
> > On Mon, Oct 26, 2020 at 07:33:08AM -0700, Kyle Huey wrote:
> > > After resuming a ptracee with PTRACE_SINGLESTEP, in the following
> > > ptrace stop retrieving the dr6 value for the tracee gets a value that
> > > does not include DR_STEP (it is in fact always DR6_RESERVED). I
> > > bisected this to the 13cb73490f475f8e7669f9288be0bcfa85399b1f merge. I
> > > did not bisect further.
> > >
> > > I don't see any handling to ever set DR_STEP in virtual_dr6, so I
> > > think this code is just broken.
> > >
> > > Sorry for not testing this when I was CCd on the original patch series :)
> >
> > Urgh, now I have to try and remember how all that worked again ...
> >
> > I suspect it's either one (or both) of the last two:
> >
> > f4956cf83ed1 ("x86/debug: Support negative polarity DR6 bits")
> > d53d9bc0cf78 ("x86/debug: Change thread.debugreg6 to thread.virtual_dr6")
> >
> >
> > Just to clarify, the sequence is something like:
> >
> > - tracer: ptrace(PTRACE_SINGLESTEP)
> > - tracee: #DB, DR6 contains DR_STEP
> > - tracer: ptrace_get_debugreg(6)
> >
> > ?
> >
> > You're right that that would be broken, let me try and figure out what
> > the best way would be 'fix' that.
> >
> > Also, can you confirm that pthread_set_debugreg(6) should not do
> > anything useful?
>
>
> Does something like this make sense?
>
>
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 3c70fb34028b..0e7641ac19a8 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -799,6 +799,13 @@ static __always_inline unsigned long debug_read_clear_dr6(void)
> */
> current->thread.virtual_dr6 = 0;
>
> + /*
> + * If PTRACE requested SINGLE(BLOCK)STEP, make sure to reflect that in
> + * the ptrace visible DR6 copy.
> + */
> + if (test_thread_flag(TIF_BLOCKSTEP) || test_thread_flag(TIF_SINGLESTEP))
> + current->thread.virtual_dr6 |= dr6 & DR_STEP;
> +
> /*
> * The SDM says "The processor clears the BTF flag when it
> * generates a debug exception." Clear TIF_BLOCKSTEP to keep

I don't think the `& DR_STEP` should be necessary, that bit should be
set by the hardware, I believe.

Also if a program sets TF on itself in EFLAGS and generates a trap it
should still have DR_STEP set in the ptrace-visible dr6, which this
won't do.

Is there a reason not to always copy dr6 into virtual_dr6 here,
regardless of TIF_SINGLESTEP/etc?

- Kyle
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 09:14:13AM -0700, Kyle Huey wrote:
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index 3c70fb34028b..0e7641ac19a8 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -799,6 +799,13 @@ static __always_inline unsigned long debug_read_clear_dr6(void)
> > */
> > current->thread.virtual_dr6 = 0;
> >
> > + /*
> > + * If PTRACE requested SINGLE(BLOCK)STEP, make sure to reflect that in
> > + * the ptrace visible DR6 copy.
> > + */
> > + if (test_thread_flag(TIF_BLOCKSTEP) || test_thread_flag(TIF_SINGLESTEP))
> > + current->thread.virtual_dr6 |= dr6 & DR_STEP;
> > +
> > /*
> > * The SDM says "The processor clears the BTF flag when it
> > * generates a debug exception." Clear TIF_BLOCKSTEP to keep
>
> I don't think the `& DR_STEP` should be necessary, that bit should be
> set by the hardware, I believe.

Yeah, the idea is to only 'copy' the DR_STEP bit, not any others.

> Also if a program sets TF on itself in EFLAGS and generates a trap it
> should still have DR_STEP set in the ptrace-visible dr6, which this
> won't do.

Why? The state was not requested by ptrace(). And the signal should have
it's si_code set to TRACE_TRACE.

> Is there a reason not to always copy dr6 into virtual_dr6 here,
> regardless of TIF_SINGLESTEP/etc?

Why should we expose DR6 bits that are the result of kernel internal
actions?

Suppose someone sets an in-kernel breakpoint (perf can do that for
example), the #DB happens and we write whatever into virtual_dr6.

Even if you have a userspace breakpoint not through ptrace(), then why
should we expose that in dr6 ?


In that respect, I think the current virtual_dr6 = 0 is placed wrong, it
should only be in exc_debug_user(). The only 'problem' then is that we
seem to be able to loose BTF, but perhaps that is already an extant bug.

Consider:

- perf: setup in-kernel #DB
- tracer: ptrace(PTRACE_SINGLEBLOCK)
- tracee: #DB on perf breakpoint, looses BTF
- tracee .. never triggers actual blockstep

Hmm ? Should we re-set BTF when TIF_BLOCKSTEP && !user_mode(regs) ?
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 05:31:00PM +0100, Peter Zijlstra wrote:
> In that respect, I think the current virtual_dr6 = 0 is placed wrong, it
> should only be in exc_debug_user(). The only 'problem' then is that we
> seem to be able to loose BTF, but perhaps that is already an extant bug.
>
> Consider:
>
> - perf: setup in-kernel #DB
> - tracer: ptrace(PTRACE_SINGLEBLOCK)
> - tracee: #DB on perf breakpoint, looses BTF
> - tracee .. never triggers actual blockstep
>
> Hmm ? Should we re-set BTF when TIF_BLOCKSTEP && !user_mode(regs) ?

Something like so then.

Or sould we also have the userspace #DB re-set BTF when it was !DR_STEP?
I need to go untangle that ptrace stuff :/

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3c70fb34028b..31de8b0980ca 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -793,19 +793,6 @@ static __always_inline unsigned long debug_read_clear_dr6(void)
set_debugreg(DR6_RESERVED, 6);
dr6 ^= DR6_RESERVED; /* Flip to positive polarity */

- /*
- * Clear the virtual DR6 value, ptrace routines will set bits here for
- * things we want signals for.
- */
- current->thread.virtual_dr6 = 0;
-
- /*
- * The SDM says "The processor clears the BTF flag when it
- * generates a debug exception." Clear TIF_BLOCKSTEP to keep
- * TIF_BLOCKSTEP in sync with the hardware BTF flag.
- */
- clear_thread_flag(TIF_BLOCKSTEP);
-
return dr6;
}

@@ -873,6 +860,20 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
*/
WARN_ON_ONCE(user_mode(regs));

+ if (test_thread_flag(TIF_BLOCKSTEP)) {
+ /*
+ * The SDM says "The processor clears the BTF flag when it
+ * generates a debug exception." but PTRACE_BLOCKSTEP requested
+ * it for userspace, but we just took a kernel #DB, so re-set
+ * BTF.
+ */
+ unsigned long debugctl;
+
+ rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
+ debugctl |= DEBUGCTLMSR_BTF;
+ wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
+ }
+
/*
* Catch SYSENTER with TF set and clear DR_STEP. If this hit a
* watchpoint at the same time then that will still be handled.
@@ -935,6 +936,26 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
irqentry_enter_from_user_mode(regs);
instrumentation_begin();

+ /*
+ * Clear the virtual DR6 value, ptrace routines will set bits here for
+ * things we want signals for.
+ */
+ current->thread.virtual_dr6 = 0;
+
+ /*
+ * If PTRACE requested SINGLE(BLOCK)STEP, make sure to reflect that in
+ * the ptrace visible DR6 copy.
+ */
+ if (test_thread_flag(TIF_BLOCKSTEP) || test_thread_flag(TIF_SINGLESTEP))
+ current->thread.virtual_dr6 |= (dr6 & DR_STEP);
+
+ /*
+ * The SDM says "The processor clears the BTF flag when it
+ * generates a debug exception." Clear TIF_BLOCKSTEP to keep
+ * TIF_BLOCKSTEP in sync with the hardware BTF flag.
+ */
+ clear_thread_flag(TIF_BLOCKSTEP);
+
/*
* If dr6 has no reason to give us about the origin of this trap,
* then it's very likely the result of an icebp/int01 trap.
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 9:55 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Oct 26, 2020 at 05:31:00PM +0100, Peter Zijlstra wrote:
> > In that respect, I think the current virtual_dr6 = 0 is placed wrong, it
> > should only be in exc_debug_user(). The only 'problem' then is that we
> > seem to be able to loose BTF, but perhaps that is already an extant bug.
> >
> > Consider:
> >
> > - perf: setup in-kernel #DB
> > - tracer: ptrace(PTRACE_SINGLEBLOCK)
> > - tracee: #DB on perf breakpoint, looses BTF
> > - tracee .. never triggers actual blockstep
> >
> > Hmm ? Should we re-set BTF when TIF_BLOCKSTEP && !user_mode(regs) ?
>
> Something like so then.
>
> Or sould we also have the userspace #DB re-set BTF when it was !DR_STEP?
> I need to go untangle that ptrace stuff :/
>
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 3c70fb34028b..31de8b0980ca 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -793,19 +793,6 @@ static __always_inline unsigned long debug_read_clear_dr6(void)
> set_debugreg(DR6_RESERVED, 6);
> dr6 ^= DR6_RESERVED; /* Flip to positive polarity */
>
> - /*
> - * Clear the virtual DR6 value, ptrace routines will set bits here for
> - * things we want signals for.
> - */
> - current->thread.virtual_dr6 = 0;
> -
> - /*
> - * The SDM says "The processor clears the BTF flag when it
> - * generates a debug exception." Clear TIF_BLOCKSTEP to keep
> - * TIF_BLOCKSTEP in sync with the hardware BTF flag.
> - */
> - clear_thread_flag(TIF_BLOCKSTEP);
> -
> return dr6;
> }
>
> @@ -873,6 +860,20 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
> */
> WARN_ON_ONCE(user_mode(regs));
>
> + if (test_thread_flag(TIF_BLOCKSTEP)) {
> + /*
> + * The SDM says "The processor clears the BTF flag when it
> + * generates a debug exception." but PTRACE_BLOCKSTEP requested
> + * it for userspace, but we just took a kernel #DB, so re-set
> + * BTF.
> + */
> + unsigned long debugctl;
> +
> + rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
> + debugctl |= DEBUGCTLMSR_BTF;
> + wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
> + }
> +
> /*
> * Catch SYSENTER with TF set and clear DR_STEP. If this hit a
> * watchpoint at the same time then that will still be handled.
> @@ -935,6 +936,26 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
> irqentry_enter_from_user_mode(regs);
> instrumentation_begin();
>
> + /*
> + * Clear the virtual DR6 value, ptrace routines will set bits here for
> + * things we want signals for.
> + */
> + current->thread.virtual_dr6 = 0;
> +
> + /*
> + * If PTRACE requested SINGLE(BLOCK)STEP, make sure to reflect that in
> + * the ptrace visible DR6 copy.
> + */
> + if (test_thread_flag(TIF_BLOCKSTEP) || test_thread_flag(TIF_SINGLESTEP))
> + current->thread.virtual_dr6 |= (dr6 & DR_STEP);
> +
> + /*
> + * The SDM says "The processor clears the BTF flag when it
> + * generates a debug exception." Clear TIF_BLOCKSTEP to keep
> + * TIF_BLOCKSTEP in sync with the hardware BTF flag.
> + */
> + clear_thread_flag(TIF_BLOCKSTEP);
> +
> /*
> * If dr6 has no reason to give us about the origin of this trap,
> * then it's very likely the result of an icebp/int01 trap.

This looks good to me (at least the non BTF parts), and I'll test it
shortly, but especially now that clearing virtual_dr6 is moved to
exc_debug_user I still don't see why it's not ok to copy the entire
dr6 value into virtual_dr6 unconditionally. Any extraneous dr6 state
from an in-kernel #DB would have been picked up and cleared already
when we entered exc_debug_kernel.

- Kyle
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 10:12:30AM -0700, Kyle Huey wrote:
> On Mon, Oct 26, 2020 at 9:55 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > @@ -935,6 +936,26 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
> > irqentry_enter_from_user_mode(regs);
> > instrumentation_begin();
> >
> > + /*
> > + * Clear the virtual DR6 value, ptrace routines will set bits here for
> > + * things we want signals for.
> > + */
> > + current->thread.virtual_dr6 = 0;
> > +
> > + /*
> > + * If PTRACE requested SINGLE(BLOCK)STEP, make sure to reflect that in
> > + * the ptrace visible DR6 copy.
> > + */
> > + if (test_thread_flag(TIF_BLOCKSTEP) || test_thread_flag(TIF_SINGLESTEP))
> > + current->thread.virtual_dr6 |= (dr6 & DR_STEP);
> > +
> > + /*
> > + * The SDM says "The processor clears the BTF flag when it
> > + * generates a debug exception." Clear TIF_BLOCKSTEP to keep
> > + * TIF_BLOCKSTEP in sync with the hardware BTF flag.
> > + */
> > + clear_thread_flag(TIF_BLOCKSTEP);
> > +
> > /*
> > * If dr6 has no reason to give us about the origin of this trap,
> > * then it's very likely the result of an icebp/int01 trap.
>
> This looks good to me (at least the non BTF parts), and I'll test it
> shortly, but especially now that clearing virtual_dr6 is moved to
> exc_debug_user I still don't see why it's not ok to copy the entire
> dr6 value into virtual_dr6 unconditionally. Any extraneous dr6 state
> from an in-kernel #DB would have been picked up and cleared already
> when we entered exc_debug_kernel.

There is !ptrace user breakpoints as well. Why should we want potential
random bits in dr6 ?

Suppose perf and ptrace set a user breakpoint on the exact same
instruction. The #DB fires and has two DR_TRAP# bits set. perf consumes
one and ptrace consumes one.

Only the ptrace one should be visible to ptrace, the perf one doesn't
affect the userspace execution at all and shouldn't be visible.
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 10:18 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Oct 26, 2020 at 10:12:30AM -0700, Kyle Huey wrote:
> > On Mon, Oct 26, 2020 at 9:55 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > @@ -935,6 +936,26 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
> > > irqentry_enter_from_user_mode(regs);
> > > instrumentation_begin();
> > >
> > > + /*
> > > + * Clear the virtual DR6 value, ptrace routines will set bits here for
> > > + * things we want signals for.
> > > + */
> > > + current->thread.virtual_dr6 = 0;
> > > +
> > > + /*
> > > + * If PTRACE requested SINGLE(BLOCK)STEP, make sure to reflect that in
> > > + * the ptrace visible DR6 copy.
> > > + */
> > > + if (test_thread_flag(TIF_BLOCKSTEP) || test_thread_flag(TIF_SINGLESTEP))
> > > + current->thread.virtual_dr6 |= (dr6 & DR_STEP);
> > > +
> > > + /*
> > > + * The SDM says "The processor clears the BTF flag when it
> > > + * generates a debug exception." Clear TIF_BLOCKSTEP to keep
> > > + * TIF_BLOCKSTEP in sync with the hardware BTF flag.
> > > + */
> > > + clear_thread_flag(TIF_BLOCKSTEP);
> > > +
> > > /*
> > > * If dr6 has no reason to give us about the origin of this trap,
> > > * then it's very likely the result of an icebp/int01 trap.
> >
> > This looks good to me (at least the non BTF parts), and I'll test it
> > shortly, but especially now that clearing virtual_dr6 is moved to
> > exc_debug_user I still don't see why it's not ok to copy the entire
> > dr6 value into virtual_dr6 unconditionally. Any extraneous dr6 state
> > from an in-kernel #DB would have been picked up and cleared already
> > when we entered exc_debug_kernel.
>
> There is !ptrace user breakpoints as well. Why should we want potential
> random bits in dr6 ?
>
> Suppose perf and ptrace set a user breakpoint on the exact same
> instruction. The #DB fires and has two DR_TRAP# bits set. perf consumes
> one and ptrace consumes one.
>
> Only the ptrace one should be visible to ptrace, the perf one doesn't
> affect the userspace execution at all and shouldn't be visible.

Ok. Makes sense.

I can confirm that your second patch does fix the behavior I was
seeing and rr works again.

- Kyle
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 9:55 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Oct 26, 2020 at 05:31:00PM +0100, Peter Zijlstra wrote:
> > In that respect, I think the current virtual_dr6 = 0 is placed wrong, it
> > should only be in exc_debug_user(). The only 'problem' then is that we
> > seem to be able to loose BTF, but perhaps that is already an extant bug.
> >
> > Consider:
> >
> > - perf: setup in-kernel #DB
> > - tracer: ptrace(PTRACE_SINGLEBLOCK)
> > - tracee: #DB on perf breakpoint, looses BTF
> > - tracee .. never triggers actual blockstep
> >
> > Hmm ? Should we re-set BTF when TIF_BLOCKSTEP && !user_mode(regs) ?
>
> Something like so then.
>
> Or sould we also have the userspace #DB re-set BTF when it was !DR_STEP?
> I need to go untangle that ptrace stuff :/
>
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 3c70fb34028b..31de8b0980ca 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -793,19 +793,6 @@ static __always_inline unsigned long debug_read_clear_dr6(void)
> set_debugreg(DR6_RESERVED, 6);
> dr6 ^= DR6_RESERVED; /* Flip to positive polarity */
>
> - /*
> - * Clear the virtual DR6 value, ptrace routines will set bits here for
> - * things we want signals for.
> - */
> - current->thread.virtual_dr6 = 0;
> -
> - /*
> - * The SDM says "The processor clears the BTF flag when it
> - * generates a debug exception." Clear TIF_BLOCKSTEP to keep
> - * TIF_BLOCKSTEP in sync with the hardware BTF flag.
> - */
> - clear_thread_flag(TIF_BLOCKSTEP);
> -
> return dr6;
> }
>
> @@ -873,6 +860,20 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
> */
> WARN_ON_ONCE(user_mode(regs));
>
> + if (test_thread_flag(TIF_BLOCKSTEP)) {
> + /*
> + * The SDM says "The processor clears the BTF flag when it
> + * generates a debug exception." but PTRACE_BLOCKSTEP requested
> + * it for userspace, but we just took a kernel #DB, so re-set
> + * BTF.
> + */
> + unsigned long debugctl;
> +
> + rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
> + debugctl |= DEBUGCTLMSR_BTF;
> + wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
> + }
> +
> /*
> * Catch SYSENTER with TF set and clear DR_STEP. If this hit a
> * watchpoint at the same time then that will still be handled.
> @@ -935,6 +936,26 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
> irqentry_enter_from_user_mode(regs);
> instrumentation_begin();
>
> + /*
> + * Clear the virtual DR6 value, ptrace routines will set bits here for
> + * things we want signals for.
> + */
> + current->thread.virtual_dr6 = 0;
> +
> + /*
> + * If PTRACE requested SINGLE(BLOCK)STEP, make sure to reflect that in
> + * the ptrace visible DR6 copy.
> + */
> + if (test_thread_flag(TIF_BLOCKSTEP) || test_thread_flag(TIF_SINGLESTEP))
> + current->thread.virtual_dr6 |= (dr6 & DR_STEP);

I'm guessing that this would fail a much simpler test, though: have a
program use PUSHF to set TF and then read out DR6 from the SIGTRAP. I
can whip up such a test if you like.

Is there any compelling reason not to just drop the condition and do:

current->thread.virtual_dr6 |= (dr6 & DR_STEP);

unconditionally? This DR6 cause, along with ICEBP, have the
regrettable distinctions of being the only causes that a user program
can trigger all on its own without informing the kernel first. This
means that we can't fully separate the concept of "user mode is
single-stepping itself" from "ptrace or something else is causing the
kernel to single step a program."

I bet that, without making this tweak, the virtual_dr6 change will
regress some horrific Wine use case.

--Andy
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 4:30 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Mon, Oct 26, 2020 at 9:55 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Oct 26, 2020 at 05:31:00PM +0100, Peter Zijlstra wrote:
> > > In that respect, I think the current virtual_dr6 = 0 is placed wrong, it
> > > should only be in exc_debug_user(). The only 'problem' then is that we
> > > seem to be able to loose BTF, but perhaps that is already an extant bug.
> > >
> > > Consider:
> > >
> > > - perf: setup in-kernel #DB
> > > - tracer: ptrace(PTRACE_SINGLEBLOCK)
> > > - tracee: #DB on perf breakpoint, looses BTF
> > > - tracee .. never triggers actual blockstep
> > >
> > > Hmm ? Should we re-set BTF when TIF_BLOCKSTEP && !user_mode(regs) ?
> >
> > Something like so then.
> >
> > Or sould we also have the userspace #DB re-set BTF when it was !DR_STEP?
> > I need to go untangle that ptrace stuff :/
> >
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index 3c70fb34028b..31de8b0980ca 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -793,19 +793,6 @@ static __always_inline unsigned long debug_read_clear_dr6(void)
> > set_debugreg(DR6_RESERVED, 6);
> > dr6 ^= DR6_RESERVED; /* Flip to positive polarity */
> >
> > - /*
> > - * Clear the virtual DR6 value, ptrace routines will set bits here for
> > - * things we want signals for.
> > - */
> > - current->thread.virtual_dr6 = 0;
> > -
> > - /*
> > - * The SDM says "The processor clears the BTF flag when it
> > - * generates a debug exception." Clear TIF_BLOCKSTEP to keep
> > - * TIF_BLOCKSTEP in sync with the hardware BTF flag.
> > - */
> > - clear_thread_flag(TIF_BLOCKSTEP);
> > -
> > return dr6;
> > }
> >
> > @@ -873,6 +860,20 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
> > */
> > WARN_ON_ONCE(user_mode(regs));
> >
> > + if (test_thread_flag(TIF_BLOCKSTEP)) {
> > + /*
> > + * The SDM says "The processor clears the BTF flag when it
> > + * generates a debug exception." but PTRACE_BLOCKSTEP requested
> > + * it for userspace, but we just took a kernel #DB, so re-set
> > + * BTF.
> > + */
> > + unsigned long debugctl;
> > +
> > + rdmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
> > + debugctl |= DEBUGCTLMSR_BTF;
> > + wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctl);
> > + }
> > +
> > /*
> > * Catch SYSENTER with TF set and clear DR_STEP. If this hit a
> > * watchpoint at the same time then that will still be handled.
> > @@ -935,6 +936,26 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
> > irqentry_enter_from_user_mode(regs);
> > instrumentation_begin();
> >
> > + /*
> > + * Clear the virtual DR6 value, ptrace routines will set bits here for
> > + * things we want signals for.
> > + */
> > + current->thread.virtual_dr6 = 0;
> > +
> > + /*
> > + * If PTRACE requested SINGLE(BLOCK)STEP, make sure to reflect that in
> > + * the ptrace visible DR6 copy.
> > + */
> > + if (test_thread_flag(TIF_BLOCKSTEP) || test_thread_flag(TIF_SINGLESTEP))
> > + current->thread.virtual_dr6 |= (dr6 & DR_STEP);
>
> I'm guessing that this would fail a much simpler test, though: have a
> program use PUSHF to set TF and then read out DR6 from the SIGTRAP. I
> can whip up such a test if you like.
>
> Is there any compelling reason not to just drop the condition and do:
>
> current->thread.virtual_dr6 |= (dr6 & DR_STEP);
>
> unconditionally? This DR6 cause, along with ICEBP, have the
> regrettable distinctions of being the only causes that a user program
> can trigger all on its own without informing the kernel first. This
> means that we can't fully separate the concept of "user mode is
> single-stepping itself" from "ptrace or something else is causing the
> kernel to single step a program."
>
> I bet that, without making this tweak, the virtual_dr6 change will
> regress some horrific Wine use case.

PeterZ, this new scheme of having handlers clear bits in dr6 to
consume them and set bits in virtual_dr6 to send signals is
incomprehensible -- there is no possible way to read traps.c and tell
what the code does :(

I attached a test case. I'll make a real patch out of this in a bit.
This passes on 5.8, and I haven't tested it yet on 5.10-rc1. The real
patch will also test ICEBP, and I'm sure we'll be quite unhappy with
the result of that.
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 04:45:18PM -0700, Andy Lutomirski wrote:

> PeterZ, this new scheme of having handlers clear bits in dr6 to
> consume them and set bits in virtual_dr6 to send signals is
> incomprehensible -- there is no possible way to read traps.c and tell
> what the code does :(

IIRC, it was you who suggested it. Also the old code was equally
incomprehensible. Now it has actual semantics.

> I attached a test case. I'll make a real patch out of this in a bit.
> This passes on 5.8, and I haven't tested it yet on 5.10-rc1. The real
> patch will also test ICEBP, and I'm sure we'll be quite unhappy with
> the result of that.

> diff --git a/tools/testing/selftests/x86/single_step_syscall.c b/tools/testing/selftests/x86/single_step_syscall.c
> index 5a4c6e06872e..f6abefd4066e 100644
> --- a/tools/testing/selftests/x86/single_step_syscall.c
> +++ b/tools/testing/selftests/x86/single_step_syscall.c
> @@ -72,7 +72,6 @@ static unsigned char altstack_data[SIGSTKSZ];
> static void sigtrap(int sig, siginfo_t *info, void *ctx_void)
> {
> ucontext_t *ctx = (ucontext_t*)ctx_void;
> - unsigned long dr6 = info->si_code;
>
> if (get_eflags() & X86_EFLAGS_TF) {
> set_eflags(get_eflags() & ~X86_EFLAGS_TF);
> @@ -89,7 +88,10 @@ static void sigtrap(int sig, siginfo_t *info, void *ctx_void)
> (unsigned long)ctx->uc_mcontext.gregs[REG_IP]);
> }
>
> - printf("DR6 = 0x%lx\n", dr6);
> + if (info->si_code != TRAP_TRACE) {
> + printf("[FAIL]\tsi_code was 0x%lx; should have been TRAP_TRACE (0x%lx)\n", (unsigned long)info->si_code, (unsigned long)TRAP_TRACE);
> + _exit(1);
> + }
> }

That actually works (tested). Nothing here cares about the virtual_dr6 value.

Think of virtual_dr6 as the value used by ptrace_{get,set}_debugreg(6).
It (should) only contain bits that results from other ptrace() actions.
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 04:30:32PM -0700, Andy Lutomirski wrote:

> > @@ -935,6 +936,26 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
> > irqentry_enter_from_user_mode(regs);
> > instrumentation_begin();
> >
> > + /*
> > + * Clear the virtual DR6 value, ptrace routines will set bits here for
> > + * things we want signals for.
> > + */
> > + current->thread.virtual_dr6 = 0;
> > +
> > + /*
> > + * If PTRACE requested SINGLE(BLOCK)STEP, make sure to reflect that in
> > + * the ptrace visible DR6 copy.
> > + */
> > + if (test_thread_flag(TIF_BLOCKSTEP) || test_thread_flag(TIF_SINGLESTEP))
> > + current->thread.virtual_dr6 |= (dr6 & DR_STEP);
>
> I'm guessing that this would fail a much simpler test, though: have a
> program use PUSHF to set TF and then read out DR6 from the SIGTRAP. I
> can whip up such a test if you like.

Kyle also mentioned it. The reason I didn't do that is because ptrace()
didn't set the TF, so why should it see it in ptrace_get_debugreg(6) ?

> Is there any compelling reason not to just drop the condition and do:
>
> current->thread.virtual_dr6 |= (dr6 & DR_STEP);
>
> unconditionally? This DR6 cause, along with ICEBP, have the
> regrettable distinctions of being the only causes that a user program
> can trigger all on its own without informing the kernel first. This
> means that we can't fully separate the concept of "user mode is
> single-stepping itself" from "ptrace or something else is causing the
> kernel to single step a program."

As per the other reply; TF and INT1 should work just fine. virtual_dr6
has nothing to do with that.
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Mon, Oct 26, 2020 at 04:30:32PM -0700, Andy Lutomirski wrote:
> Is there any compelling reason not to just drop the condition and do:
>
> current->thread.virtual_dr6 |= (dr6 & DR_STEP);
>
> unconditionally?

Because why should it?

> This DR6 cause, along with ICEBP, have the
> regrettable distinctions of being the only causes that a user program
> can trigger all on its own without informing the kernel first. This
> means that we can't fully separate the concept of "user mode is
> single-stepping itself" from "ptrace or something else is causing the
> kernel to single step a program."

TIF_SINGLESTEP does that. If the kernel is single-stepping userspace it
has TIF_SINGLESTEP (and possibly TIF_FORCED_TF) to distinguish these
cases.

> I bet that, without making this tweak, the virtual_dr6 change will
> regress some horrific Wine use case.

Then we should make sure the Wine people are aware and test this. Do you
know who to poke?

If there are regressions, we'll fix them, but I'd prefer to not create a
mess just because. This whole #DB thing was a giant trainwreck, we'll
obviously have to be bug compatible, but only when people actually
notice.
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Tue, Oct 27, 2020 at 10:08:14AM +0100, Peter Zijlstra wrote:
> On Mon, Oct 26, 2020 at 04:30:32PM -0700, Andy Lutomirski wrote:
> > Is there any compelling reason not to just drop the condition and do:
> >
> > current->thread.virtual_dr6 |= (dr6 & DR_STEP);
> >
> > unconditionally?
>
> Because why should it?

Because WINE does indeed rely on that. I'll go post a new version.
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Tue, Oct 27, 2020 at 1:19 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Oct 26, 2020 at 04:30:32PM -0700, Andy Lutomirski wrote:
>
> > > @@ -935,6 +936,26 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
> > > irqentry_enter_from_user_mode(regs);
> > > instrumentation_begin();
> > >
> > > + /*
> > > + * Clear the virtual DR6 value, ptrace routines will set bits here for
> > > + * things we want signals for.
> > > + */
> > > + current->thread.virtual_dr6 = 0;
> > > +
> > > + /*
> > > + * If PTRACE requested SINGLE(BLOCK)STEP, make sure to reflect that in
> > > + * the ptrace visible DR6 copy.
> > > + */
> > > + if (test_thread_flag(TIF_BLOCKSTEP) || test_thread_flag(TIF_SINGLESTEP))
> > > + current->thread.virtual_dr6 |= (dr6 & DR_STEP);
> >
> > I'm guessing that this would fail a much simpler test, though: have a
> > program use PUSHF to set TF and then read out DR6 from the SIGTRAP. I
> > can whip up such a test if you like.
>
> Kyle also mentioned it. The reason I didn't do that is because ptrace()
> didn't set the TF, so why should it see it in ptrace_get_debugreg(6) ?

I assume you already figured this out, but my specific concern is with
the get_si_code(dr6) part -- that's sent directly to the task being
debugged or debugging itself (and, sadly, to ptrace, and who knows
what debuggers do).
Re: [REGRESSION] x86/debug: After PTRACE_SINGLESTEP DR_STEP is no longer reported in dr6 [ In reply to ]
On Tue, Oct 27, 2020 at 11:00:52AM -0700, Andy Lutomirski wrote:
> On Tue, Oct 27, 2020 at 1:19 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Oct 26, 2020 at 04:30:32PM -0700, Andy Lutomirski wrote:
> >
> > > > @@ -935,6 +936,26 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
> > > > irqentry_enter_from_user_mode(regs);
> > > > instrumentation_begin();
> > > >
> > > > + /*
> > > > + * Clear the virtual DR6 value, ptrace routines will set bits here for
> > > > + * things we want signals for.
> > > > + */
> > > > + current->thread.virtual_dr6 = 0;
> > > > +
> > > > + /*
> > > > + * If PTRACE requested SINGLE(BLOCK)STEP, make sure to reflect that in
> > > > + * the ptrace visible DR6 copy.
> > > > + */
> > > > + if (test_thread_flag(TIF_BLOCKSTEP) || test_thread_flag(TIF_SINGLESTEP))
> > > > + current->thread.virtual_dr6 |= (dr6 & DR_STEP);
> > >
> > > I'm guessing that this would fail a much simpler test, though: have a
> > > program use PUSHF to set TF and then read out DR6 from the SIGTRAP. I
> > > can whip up such a test if you like.
> >
> > Kyle also mentioned it. The reason I didn't do that is because ptrace()
> > didn't set the TF, so why should it see it in ptrace_get_debugreg(6) ?
>
> I assume you already figured this out, but my specific concern is with
> the get_si_code(dr6) part -- that's sent directly to the task being
> debugged or debugging itself (and, sadly, to ptrace, and who knows
> what debuggers do).

Right, so for a task doing TF on its own, DR_STEP should remain set in
our on-stack dr6 variable, nothing should consume it.

So the get_si_code(dr6) should be identical. So the only difference is
if DR_STEP is visible in ptrace or not.