Mailing List Archive

[PATCH AUTOSEL 6.1 12/21] fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super()
From: Eric Biggers <ebiggers@google.com>

[ Upstream commit ec64036e68634231f5891faa2b7a81cdc5dcd001 ]

Now that the key associated with the "test_dummy_encryption" mount option
is added on-demand when it's needed, rather than immediately when the
filesystem is mounted, fscrypt_destroy_keyring() no longer needs to be
called from __put_super() to avoid a memory leak on mount failure.

Remove this call, which was causing confusion because it appeared to be
a sleep-in-atomic bug (though it wasn't, for a somewhat-subtle reason).

Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20230208062107.199831-5-ebiggers@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/super.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/fs/super.c b/fs/super.c
index 4f8a626a35cd9..76d47620b930d 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -291,7 +291,6 @@ static void __put_super(struct super_block *s)
WARN_ON(s->s_inode_lru.node);
WARN_ON(!list_empty(&s->s_mounts));
security_sb_free(s);
- fscrypt_destroy_keyring(s);
put_user_ns(s->s_user_ns);
kfree(s->s_subtype);
call_rcu(&s->rcu, destroy_super_rcu);
--
2.39.0
Re: [PATCH AUTOSEL 6.1 12/21] fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super()
On Sat, Feb 25, 2023 at 10:42:47PM -0500, Sasha Levin wrote:
> From: Eric Biggers <ebiggers@google.com>
>
> [ Upstream commit ec64036e68634231f5891faa2b7a81cdc5dcd001 ]
>
> Now that the key associated with the "test_dummy_encryption" mount option
> is added on-demand when it's needed, rather than immediately when the
> filesystem is mounted, fscrypt_destroy_keyring() no longer needs to be
> called from __put_super() to avoid a memory leak on mount failure.
>
> Remove this call, which was causing confusion because it appeared to be
> a sleep-in-atomic bug (though it wasn't, for a somewhat-subtle reason).
>
> Signed-off-by: Eric Biggers <ebiggers@google.com>
> Link: https://lore.kernel.org/r/20230208062107.199831-5-ebiggers@kernel.org
> Signed-off-by: Sasha Levin <sashal@kernel.org>

Why is this being backported?

- Eric
Re: [PATCH AUTOSEL 6.1 12/21] fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super()
On Sat, Feb 25, 2023 at 08:07:55PM -0800, Eric Biggers wrote:
> On Sat, Feb 25, 2023 at 10:42:47PM -0500, Sasha Levin wrote:
> > From: Eric Biggers <ebiggers@google.com>
> >
> > [ Upstream commit ec64036e68634231f5891faa2b7a81cdc5dcd001 ]
> >
> > Now that the key associated with the "test_dummy_encryption" mount option
> > is added on-demand when it's needed, rather than immediately when the
> > filesystem is mounted, fscrypt_destroy_keyring() no longer needs to be
> > called from __put_super() to avoid a memory leak on mount failure.
> >
> > Remove this call, which was causing confusion because it appeared to be
> > a sleep-in-atomic bug (though it wasn't, for a somewhat-subtle reason).
> >
> > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > Link: https://lore.kernel.org/r/20230208062107.199831-5-ebiggers@kernel.org
> > Signed-off-by: Sasha Levin <sashal@kernel.org>
>
> Why is this being backported?
>
> - Eric

BTW, can you please permanently exclude all commits authored by me from AUTOSEL
so that I don't have to repeatedly complain about every commit individually?
Especially when these mails often come on weekends and holidays.

I know how to use Cc stable, and how to ask explicitly for a stable backport if
I find out after the fact that one is needed. (And other real people can always
ask too... not counting AUTOSEL, even though you are sending the AUTOSEL emails,
since clearly they go through no or very little human review.)

Of course, it's not just me that AUTOSEL isn't working for. So, you'll still
continue backporting random commits that I have to spend hours bisecting, e.g.
https://lore.kernel.org/stable/20220921155332.234913-7-sashal@kernel.org.

But at least I won't have to deal with this garbage for my own commits.

Now, I'm not sure I'll get a response to this --- I received no response to my
last AUTOSEL question at
https://lore.kernel.org/stable/Y1DTFiP12ws04eOM@sol.localdomain. So to
hopefully entice you to actually do something, I'm also letting you know that I
won't be reviewing any AUTOSEL mails for my commits anymore.

- Eric
Re: [PATCH AUTOSEL 6.1 12/21] fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super()
On Sat, Feb 25, 2023 at 09:30:37PM -0800, Eric Biggers wrote:
> On Sat, Feb 25, 2023 at 08:07:55PM -0800, Eric Biggers wrote:
> > On Sat, Feb 25, 2023 at 10:42:47PM -0500, Sasha Levin wrote:
> > > From: Eric Biggers <ebiggers@google.com>
> > >
> > > [ Upstream commit ec64036e68634231f5891faa2b7a81cdc5dcd001 ]
> > >
> > > Now that the key associated with the "test_dummy_encryption" mount option
> > > is added on-demand when it's needed, rather than immediately when the
> > > filesystem is mounted, fscrypt_destroy_keyring() no longer needs to be
> > > called from __put_super() to avoid a memory leak on mount failure.
> > >
> > > Remove this call, which was causing confusion because it appeared to be
> > > a sleep-in-atomic bug (though it wasn't, for a somewhat-subtle reason).
> > >
> > > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > > Link: https://lore.kernel.org/r/20230208062107.199831-5-ebiggers@kernel.org
> > > Signed-off-by: Sasha Levin <sashal@kernel.org>
> >
> > Why is this being backported?
> >
> > - Eric
>
> BTW, can you please permanently exclude all commits authored by me from AUTOSEL
> so that I don't have to repeatedly complain about every commit individually?
> Especially when these mails often come on weekends and holidays.
>
> I know how to use Cc stable, and how to ask explicitly for a stable backport if
> I find out after the fact that one is needed. (And other real people can always
> ask too... not counting AUTOSEL, even though you are sending the AUTOSEL emails,
> since clearly they go through no or very little human review.)
>
> Of course, it's not just me that AUTOSEL isn't working for. So, you'll still
> continue backporting random commits that I have to spend hours bisecting, e.g.
> https://lore.kernel.org/stable/20220921155332.234913-7-sashal@kernel.org.
>
> But at least I won't have to deal with this garbage for my own commits.
>
> Now, I'm not sure I'll get a response to this --- I received no response to my
> last AUTOSEL question at
> https://lore.kernel.org/stable/Y1DTFiP12ws04eOM@sol.localdomain. So to
> hopefully entice you to actually do something, I'm also letting you know that I
> won't be reviewing any AUTOSEL mails for my commits anymore.
>

The really annoying thing is that someone even replied to your AUTOSEL email for
that broken patch and told you it is broken
(https://lore.kernel.org/stable/d91aaff1-470f-cfdf-41cf-031eea9d6aca@mailbox.org),
and ***you ignored it and applied the patch anyway***.

Why are you even sending these emails if you are ignoring feedback anyway?

How do I even get you to not apply a patch? Is it even possible?

I guess I might as well just add an email filter that auto-deletes all AUTOSEL
emails, as apparently there's no point in responding anyway?
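
(For what it's worth, such a filter is easy to sketch. Hypothetically, in Python over a local mbox, assuming the "[PATCH AUTOSEL" subject tag is the only signal needed:)

```python
# Hypothetical sketch of the auto-delete filter described above: drop any
# message whose Subject marks it as an AUTOSEL patch mail. Assumes mail
# lives in a local mbox file; real setups would use procmail/sieve instead.
import mailbox
import re

AUTOSEL_RE = re.compile(r"\[PATCH AUTOSEL\b")

def is_autosel(subject: str) -> bool:
    """True if the subject line identifies an AUTOSEL patch mail."""
    return bool(AUTOSEL_RE.search(subject or ""))

def purge_autosel(path: str) -> int:
    """Delete AUTOSEL messages from the mbox at `path`; return count removed."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        doomed = [key for key, msg in box.items()
                  if is_autosel(msg.get("Subject", ""))]
        for key in doomed:
            box.remove(key)
        box.flush()
    finally:
        box.unlock()
        box.close()
    return len(doomed)
```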

- Eric
Re: [PATCH AUTOSEL 6.1 12/21] fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super()
On Sun, Feb 26, 2023 at 2:24 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Sat, Feb 25, 2023 at 09:30:37PM -0800, Eric Biggers wrote:
> > On Sat, Feb 25, 2023 at 08:07:55PM -0800, Eric Biggers wrote:
> > > On Sat, Feb 25, 2023 at 10:42:47PM -0500, Sasha Levin wrote:
> > > > From: Eric Biggers <ebiggers@google.com>
> > > >
> > > > [ Upstream commit ec64036e68634231f5891faa2b7a81cdc5dcd001 ]
> > > >
> > > > Now that the key associated with the "test_dummy_encryption" mount option
> > > > is added on-demand when it's needed, rather than immediately when the
> > > > filesystem is mounted, fscrypt_destroy_keyring() no longer needs to be
> > > > called from __put_super() to avoid a memory leak on mount failure.
> > > >
> > > > Remove this call, which was causing confusion because it appeared to be
> > > > a sleep-in-atomic bug (though it wasn't, for a somewhat-subtle reason).
> > > >
> > > > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > > > Link: https://lore.kernel.org/r/20230208062107.199831-5-ebiggers@kernel.org
> > > > Signed-off-by: Sasha Levin <sashal@kernel.org>
> > >
> > > Why is this being backported?
> > >
> > > - Eric
> >
> > BTW, can you please permanently exclude all commits authored by me from AUTOSEL
> > so that I don't have to repeatedly complain about every commit individually?
> > Especially when these mails often come on weekends and holidays.
> >
> > I know how to use Cc stable, and how to ask explicitly for a stable backport if
> > I find out after the fact that one is needed. (And other real people can always
> > ask too... not counting AUTOSEL, even though you are sending the AUTOSEL emails,
> > since clearly they go through no or very little human review.)
> >
> > Of course, it's not just me that AUTOSEL isn't working for. So, you'll still
> > continue backporting random commits that I have to spend hours bisecting, e.g.
> > https://lore.kernel.org/stable/20220921155332.234913-7-sashal@kernel.org.
> >
> > But at least I won't have to deal with this garbage for my own commits.
> >
> > Now, I'm not sure I'll get a response to this --- I received no response to my
> > last AUTOSEL question at
> > https://lore.kernel.org/stable/Y1DTFiP12ws04eOM@sol.localdomain. So to
> > hopefully entice you to actually do something, I'm also letting you know that I
> > won't be reviewing any AUTOSEL mails for my commits anymore.
> >
>
> The really annoying thing is that someone even replied to your AUTOSEL email for
> that broken patch and told you it is broken
> (https://lore.kernel.org/stable/d91aaff1-470f-cfdf-41cf-031eea9d6aca@mailbox.org),
> and ***you ignored it and applied the patch anyway***.
>
> Why are you even sending these emails if you are ignoring feedback anyway?
>
> How do I even get you to not apply a patch? Is it even possible?
>
> I guess I might as well just add an email filter that auto-deletes all AUTOSEL
> emails, as apparently there's no point in responding anyway?

I test this branch for Greg but don't pay attention to these emails
Sasha sends out (because there's just waaaaay too many of them to look
through unless they get a reply; I find them quite annoying
otherwise.) But if these commits automatically get applied to stable
trees, even with objections from the committers, then I personally
question the methodology for having AUTOSEL in the first place.
Commits should be tested and backported with explicit purpose by their
developers, IMO.

-- Slade
Re: [PATCH AUTOSEL 6.1 12/21] fs/super.c: stop calling fscrypt_destroy_keyring() from __put_super()
On Sun, Feb 26, 2023 at 11:24:36AM -0800, Eric Biggers wrote:
>On Sat, Feb 25, 2023 at 09:30:37PM -0800, Eric Biggers wrote:
>> On Sat, Feb 25, 2023 at 08:07:55PM -0800, Eric Biggers wrote:
>> > On Sat, Feb 25, 2023 at 10:42:47PM -0500, Sasha Levin wrote:
>> > > From: Eric Biggers <ebiggers@google.com>
>> > >
>> > > [ Upstream commit ec64036e68634231f5891faa2b7a81cdc5dcd001 ]
>> > >
>> > > Now that the key associated with the "test_dummy_encryption" mount option
>> > > is added on-demand when it's needed, rather than immediately when the
>> > > filesystem is mounted, fscrypt_destroy_keyring() no longer needs to be
>> > > called from __put_super() to avoid a memory leak on mount failure.
>> > >
>> > > Remove this call, which was causing confusion because it appeared to be
>> > > a sleep-in-atomic bug (though it wasn't, for a somewhat-subtle reason).
>> > >
>> > > Signed-off-by: Eric Biggers <ebiggers@google.com>
>> > > Link: https://lore.kernel.org/r/20230208062107.199831-5-ebiggers@kernel.org
>> > > Signed-off-by: Sasha Levin <sashal@kernel.org>
>> >
>> > Why is this being backported?
>> >
>> > - Eric
>>
>> BTW, can you please permanently exclude all commits authored by me from AUTOSEL
>> so that I don't have to repeatedly complain about every commit individually?
>> Especially when these mails often come on weekends and holidays.

Yup, no problem - I'll ignore any commits authored by you.

>> I know how to use Cc stable, and how to ask explicitly for a stable backport if
>> I find out after the fact that one is needed. (And other real people can always
>> ask too... not counting AUTOSEL, even though you are sending the AUTOSEL emails,
>> since clearly they go through no or very little human review.)

One of the challenges here is that it's difficult to solicit reviews or
really any interaction from authors after a commit lands upstream. Look
at the response rates to Greg's "FAILED" emails that ask authors to
provide backports to commits they tagged for stable.

>> Of course, it's not just me that AUTOSEL isn't working for. So, you'll still
>> continue backporting random commits that I have to spend hours bisecting, e.g.
>> https://lore.kernel.org/stable/20220921155332.234913-7-sashal@kernel.org.
>>
>> But at least I won't have to deal with this garbage for my own commits.
>>
>> Now, I'm not sure I'll get a response to this --- I received no response to my
>> last AUTOSEL question at
>> https://lore.kernel.org/stable/Y1DTFiP12ws04eOM@sol.localdomain. So to
>> hopefully entice you to actually do something, I'm also letting you know that I
>> won't be reviewing any AUTOSEL mails for my commits anymore.
>>
>
>The really annoying thing is that someone even replied to your AUTOSEL email for
>that broken patch and told you it is broken
>(https://lore.kernel.org/stable/d91aaff1-470f-cfdf-41cf-031eea9d6aca@mailbox.org),
>and ***you ignored it and applied the patch anyway***.
>
>Why are you even sending these emails if you are ignoring feedback anyway?

I obviously didn't ignore it on purpose, right?

--
Thanks,
Sasha
Re: AUTOSEL process
On Mon, Feb 27, 2023 at 09:18:59AM -0500, Sasha Levin wrote:
> On Sun, Feb 26, 2023 at 11:24:36AM -0800, Eric Biggers wrote:
> > On Sat, Feb 25, 2023 at 09:30:37PM -0800, Eric Biggers wrote:
> > > On Sat, Feb 25, 2023 at 08:07:55PM -0800, Eric Biggers wrote:
> > > > On Sat, Feb 25, 2023 at 10:42:47PM -0500, Sasha Levin wrote:
> > > > > From: Eric Biggers <ebiggers@google.com>
> > > > >
> > > > > [ Upstream commit ec64036e68634231f5891faa2b7a81cdc5dcd001 ]
> > > > >
> > > > > Now that the key associated with the "test_dummy_encryption" mount option
> > > > > is added on-demand when it's needed, rather than immediately when the
> > > > > filesystem is mounted, fscrypt_destroy_keyring() no longer needs to be
> > > > > called from __put_super() to avoid a memory leak on mount failure.
> > > > >
> > > > > Remove this call, which was causing confusion because it appeared to be
> > > > > a sleep-in-atomic bug (though it wasn't, for a somewhat-subtle reason).
> > > > >
> > > > > Signed-off-by: Eric Biggers <ebiggers@google.com>
> > > > > Link: https://lore.kernel.org/r/20230208062107.199831-5-ebiggers@kernel.org
> > > > > Signed-off-by: Sasha Levin <sashal@kernel.org>
> > > >
> > > > Why is this being backported?
> > > >
> > > > - Eric
> > >
> > > BTW, can you please permanently exclude all commits authored by me from AUTOSEL
> > > so that I don't have to repeatedly complain about every commit individually?
> > > Especially when these mails often come on weekends and holidays.
>
> Yup, no problem - I'll ignore any commits authored by you.
>
> > > I know how to use Cc stable, and how to ask explicitly for a stable backport if
> > > I find out after the fact that one is needed. (And other real people can always
> > > ask too... not counting AUTOSEL, even though you are sending the AUTOSEL emails,
> > > since clearly they go through no or very little human review.)
>
> One of the challenges here is that it's difficult to solicit reviews or
> really any interaction from authors after a commit lands upstream. Look
> at the response rates to Greg's "FAILED" emails that ask authors to
> provide backports to commits they tagged for stable.

Well, it doesn't help that most of the stable emails aren't sent to the
subsystem's mailing list, but instead just to the individual people mentioned in
the commit. So many people who would like to help never know about it.

> > > Of course, it's not just me that AUTOSEL isn't working for. So, you'll still
> > > continue backporting random commits that I have to spend hours bisecting, e.g.
> > > https://lore.kernel.org/stable/20220921155332.234913-7-sashal@kernel.org.
> > >
> > > But at least I won't have to deal with this garbage for my own commits.
> > >
> > > Now, I'm not sure I'll get a response to this --- I received no response to my
> > > last AUTOSEL question at
> > > https://lore.kernel.org/stable/Y1DTFiP12ws04eOM@sol.localdomain. So to
> > > hopefully entice you to actually do something, I'm also letting you know that I
> > > won't be reviewing any AUTOSEL mails for my commits anymore.
> > >
> >
> > The really annoying thing is that someone even replied to your AUTOSEL email for
> > that broken patch and told you it is broken
> > (https://lore.kernel.org/stable/d91aaff1-470f-cfdf-41cf-031eea9d6aca@mailbox.org),
> > and ***you ignored it and applied the patch anyway***.
> >
> > Why are you even sending these emails if you are ignoring feedback anyway?
>
> I obviously didn't ignore it on purpose, right?
>

I don't know, is it obvious? You've said in the past that sometimes you'd like
to backport a commit even if the maintainer objects and/or it is known buggy.
https://lore.kernel.org/stable/d91aaff1-470f-cfdf-41cf-031eea9d6aca@mailbox.org
also didn't explicitly say "Don't backport this" but instead "This patch has
issues", so maybe that made a difference?

Anyway, the fact is that it happened. And if it happened in the one bug that I
happened to look at because it personally affected me and I spent hours
bisecting, it probably is happening in lots of other cases too. So it seems the
process is not working...

Separately from responses to the AUTOSEL email, it also seems that you aren't
checking for any reported regressions or pending fixes for a commit before
backporting it. Simply searching lore for the commit title
https://lore.kernel.org/all/?q=%22drm%2Famdgpu%3A+use+dirty+framebuffer+helper%22
would have turned up the bug report
https://lore.kernel.org/dri-devel/20220918120926.10322-1-user@am64/ that
bisected a regression to that commit, as well as a patch that Fixes that commit:
https://lore.kernel.org/all/20220920130832.2214101-1-alexander.deucher@amd.com/
Both of these existed before you even sent the AUTOSEL email!
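
(Scripting that check is trivial, too. A hypothetical Python helper that reconstructs exactly the lore search URL quoted above:)

```python
# Hypothetical sketch: build the lore.kernel.org full-archive search URL
# for an exact commit-title match. Running this search before backporting
# is enough to surface regression reports and pending Fixes patches.
from urllib.parse import quote_plus

def lore_search_url(commit_title: str) -> str:
    """Quoted-phrase search over https://lore.kernel.org/all/."""
    return "https://lore.kernel.org/all/?q=" + quote_plus(f'"{commit_title}"')
```

For example, `lore_search_url("drm/amdgpu: use dirty framebuffer helper")` produces the query URL above.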

So to summarize, that buggy commit was backported even though:

* There were no indications that it was a bug fix (and thus potentially
suitable for stable) in the first place.
* On the AUTOSEL thread, someone told you the commit is broken.
* There was already a thread that reported a regression caused by the commit.
Easily findable via lore search.
* There was also already a pending patch that Fixes the commit. Again easily
findable via lore search.

So it seems a *lot* of things went wrong, no? Why? If so many things can go
wrong, it's not just a "mistake" but rather the process is the problem...

- Eric
Re: AUTOSEL process
On Mon, Feb 27, 2023 at 09:47:48AM -0800, Eric Biggers wrote:
> > > > Of course, it's not just me that AUTOSEL isn't working for. So, you'll still
> > > > continue backporting random commits that I have to spend hours bisecting, e.g.
> > > > https://lore.kernel.org/stable/20220921155332.234913-7-sashal@kernel.org.
> > > >
> > > > But at least I won't have to deal with this garbage for my own commits.
> > > >
> > > > Now, I'm not sure I'll get a response to this --- I received no response to my
> > > > last AUTOSEL question at
> > > > https://lore.kernel.org/stable/Y1DTFiP12ws04eOM@sol.localdomain. So to
> > > > hopefully entice you to actually do something, I'm also letting you know that I
> > > > won't be reviewing any AUTOSEL mails for my commits anymore.
> > > >
> > >
> > > The really annoying thing is that someone even replied to your AUTOSEL email for
> > > that broken patch and told you it is broken
> > > (https://lore.kernel.org/stable/d91aaff1-470f-cfdf-41cf-031eea9d6aca@mailbox.org),
> > > and ***you ignored it and applied the patch anyway***.
> > >
> > > Why are you even sending these emails if you are ignoring feedback anyway?
> >
> > I obviously didn't ignore it on purpose, right?
> >
>
> I don't know, is it obvious? You've said in the past that sometimes you'd like
> to backport a commit even if the maintainer objects and/or it is known buggy.
> https://lore.kernel.org/stable/d91aaff1-470f-cfdf-41cf-031eea9d6aca@mailbox.org
> also didn't explicitly say "Don't backport this" but instead "This patch has
> issues", so maybe that made a difference?
>
> Anyway, the fact is that it happened. And if it happened in the one bug that I
> happened to look at because it personally affected me and I spent hours
> bisecting, it probably is happening in lots of other cases too. So it seems the
> process is not working...
>
> Separately from responses to the AUTOSEL email, it also seems that you aren't
> checking for any reported regressions or pending fixes for a commit before
> backporting it. Simply searching lore for the commit title
> https://lore.kernel.org/all/?q=%22drm%2Famdgpu%3A+use+dirty+framebuffer+helper%22
> would have turned up the bug report
> https://lore.kernel.org/dri-devel/20220918120926.10322-1-user@am64/ that
> bisected a regression to that commit, as well as a patch that Fixes that commit:
> https://lore.kernel.org/all/20220920130832.2214101-1-alexander.deucher@amd.com/
> Both of these existed before you even sent the AUTOSEL email!
>
> So to summarize, that buggy commit was backported even though:
>
> * There were no indications that it was a bug fix (and thus potentially
> suitable for stable) in the first place.
> * On the AUTOSEL thread, someone told you the commit is broken.
> * There was already a thread that reported a regression caused by the commit.
> Easily findable via lore search.
> * There was also already a pending patch that Fixes the commit. Again easily
> findable via lore search.
>
> So it seems a *lot* of things went wrong, no? Why? If so many things can go
> wrong, it's not just a "mistake" but rather the process is the problem...

BTW, another cause of this is that the commit (66f99628eb24) was AUTOSEL'd after
only being in mainline for 4 days, and *released* in all LTS kernels after only
being in mainline for 12 days. Surely that's a timeline befitting a critical
security vulnerability, not some random neural-network-selected commit that
wasn't even fixing anything?

- Eric
Re: AUTOSEL process
On Mon, Feb 27, 2023 at 10:06:32AM -0800, Eric Biggers wrote:
>On Mon, Feb 27, 2023 at 09:47:48AM -0800, Eric Biggers wrote:
>> > > > Of course, it's not just me that AUTOSEL isn't working for. So, you'll still
>> > > > continue backporting random commits that I have to spend hours bisecting, e.g.
>> > > > https://lore.kernel.org/stable/20220921155332.234913-7-sashal@kernel.org.
>> > > >
>> > > > But at least I won't have to deal with this garbage for my own commits.
>> > > >
>> > > > Now, I'm not sure I'll get a response to this --- I received no response to my
>> > > > last AUTOSEL question at
>> > > > https://lore.kernel.org/stable/Y1DTFiP12ws04eOM@sol.localdomain. So to
>> > > > hopefully entice you to actually do something, I'm also letting you know that I
>> > > > won't be reviewing any AUTOSEL mails for my commits anymore.
>> > > >
>> > >
>> > > The really annoying thing is that someone even replied to your AUTOSEL email for
>> > > that broken patch and told you it is broken
>> > > (https://lore.kernel.org/stable/d91aaff1-470f-cfdf-41cf-031eea9d6aca@mailbox.org),
>> > > and ***you ignored it and applied the patch anyway***.
>> > >
>> > > Why are you even sending these emails if you are ignoring feedback anyway?
>> >
>> > I obviously didn't ignore it on purpose, right?
>> >
>>
>> I don't know, is it obvious? You've said in the past that sometimes you'd like
>> to backport a commit even if the maintainer objects and/or it is known buggy.
>> https://lore.kernel.org/stable/d91aaff1-470f-cfdf-41cf-031eea9d6aca@mailbox.org
>> also didn't explicitly say "Don't backport this" but instead "This patch has
>> issues", so maybe that made a difference?

From what I gather I missed the reply - I would not blindly ignore a
maintainer.

>> Anyway, the fact is that it happened. And if it happened in the one bug that I
>> happened to look at because it personally affected me and I spent hours
>> bisecting, it probably is happening in lots of other cases too. So it seems the
>> process is not working...

This one is tricky, because we also end up taking a lot of commits that
do fix real bugs, and were never tagged for stable or even had a fixes
tag.

Maybe I should run the numbers again, but when we compared regression
rates of stable tagged releases and AUTOSEL ones, it was fairly
identical.

>> Separately from responses to the AUTOSEL email, it also seems that you aren't
>> checking for any reported regressions or pending fixes for a commit before
>> backporting it. Simply searching lore for the commit title
>> https://lore.kernel.org/all/?q=%22drm%2Famdgpu%3A+use+dirty+framebuffer+helper%22
>> would have turned up the bug report
>> https://lore.kernel.org/dri-devel/20220918120926.10322-1-user@am64/ that
>> bisected a regression to that commit, as well as a patch that Fixes that commit:
>> https://lore.kernel.org/all/20220920130832.2214101-1-alexander.deucher@amd.com/
>> Both of these existed before you even sent the AUTOSEL email!

I would love to have a way to automatically grep lore for reported
issues that are pinpointed to a given commit. I'm hoping that Thorsten's
regression tracker could be used that way soon enough.

>> So to summarize, that buggy commit was backported even though:
>>
>> * There were no indications that it was a bug fix (and thus potentially
>> suitable for stable) in the first place.
>> * On the AUTOSEL thread, someone told you the commit is broken.
>> * There was already a thread that reported a regression caused by the commit.
>> Easily findable via lore search.
>> * There was also already a pending patch that Fixes the commit. Again easily
>> findable via lore search.
>>
>> So it seems a *lot* of things went wrong, no? Why? If so many things can go
>> wrong, it's not just a "mistake" but rather the process is the problem...
>
>BTW, another cause of this is that the commit (66f99628eb24) was AUTOSEL'd after
>only being in mainline for 4 days, and *released* in all LTS kernels after only
>being in mainline for 12 days. Surely that's a timeline befitting a critical
>security vulnerability, not some random neural-network-selected commit that
>wasn't even fixing anything?

I would love to have a mechanism that tells me with 100% confidence if a
given commit fixes a bug or not, could you provide me with one?

w.r.t timelines, this is something that was discussed on the mailing
list a few years ago where we decided that giving AUTOSEL commits 7 days
of soaking time is sufficient, if anything changed we can have this
discussion again.
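
(The rule as agreed back then is mechanical; a hypothetical sketch, with commit dates supplied by the caller rather than read from git:)

```python
# Hypothetical sketch of the 7-day "soaking time" rule discussed here:
# a commit is eligible for AUTOSEL only once it has been in mainline for
# at least SOAK_DAYS. Whether 7 days is enough is exactly the dispute.
from datetime import date, timedelta

SOAK_DAYS = 7

def soaked_enough(committed: date, today: date,
                  soak_days: int = SOAK_DAYS) -> bool:
    """True if the commit has been in mainline for at least `soak_days`."""
    return today - committed >= timedelta(days=soak_days)
```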

Note, however, that it's not enough to keep pointing at a tiny set and
using it to suggest that the entire process is broken. How many AUTOSEL
commits introduced a regression? How many -stable tagged ones did? How
many bugs did AUTOSEL commits fix?

--
Thanks,
Sasha
Re: AUTOSEL process
On Mon, Feb 27, 2023 at 03:39:14PM -0500, Sasha Levin wrote:
> > > So to summarize, that buggy commit was backported even though:
> > >
> > > * There were no indications that it was a bug fix (and thus potentially
> > > suitable for stable) in the first place.
> > > * On the AUTOSEL thread, someone told you the commit is broken.
> > > * There was already a thread that reported a regression caused by the commit.
> > > Easily findable via lore search.
> > > * There was also already a pending patch that Fixes the commit. Again easily
> > > findable via lore search.
> > >
> > > So it seems a *lot* of things went wrong, no? Why? If so many things can go
> > > wrong, it's not just a "mistake" but rather the process is the problem...
> >
> > BTW, another cause of this is that the commit (66f99628eb24) was AUTOSEL'd after
> > only being in mainline for 4 days, and *released* in all LTS kernels after only
> > being in mainline for 12 days. Surely that's a timeline befitting a critical
> > security vulnerability, not some random neural-network-selected commit that
> > wasn't even fixing anything?
>
> I would love to have a mechanism that tells me with 100% confidence if a
> given commit fixes a bug or not, could you provide me with one?

Just because you can't be 100% certain whether a commit is a fix doesn't mean
you should be rushing to backport random commits that have no indications they
are fixing anything.

> w.r.t timelines, this is something that was discussed on the mailing
> list a few years ago where we decided that giving AUTOSEL commits 7 days
> of soaking time is sufficient, if anything changed we can have this
> discussion again.

Nothing has changed, but that doesn't mean that your process is actually
working. 7 days might be appropriate for something that looks like a security
fix, but not for a random commit with no indications it is fixing anything.

BTW, based on that example it's not even 7 days between AUTOSEL and patch
applied, but actually 7 days from AUTOSEL to *release*. So e.g. if someone
takes just a 1 week vacation, in that time a commit they would have NAK'ed can
be AUTOSEL'ed and pushed out across all LTS kernels...

> Note, however, that it's not enough to keep pointing at a tiny set and
> using it to suggest that the entire process is broken. How many AUTOSEL
> commits introduced a regression? How many -stable tagged ones did? How
> many bugs did AUTOSEL commits fix?

So basically you don't accept feedback from individual people, as individual
people don't have enough data?

- Eric
Re: AUTOSEL process
On Mon, Feb 27, 2023 at 09:38:46PM +0000, Eric Biggers wrote:
>On Mon, Feb 27, 2023 at 03:39:14PM -0500, Sasha Levin wrote:
>> > > So to summarize, that buggy commit was backported even though:
>> > >
>> > > * There were no indications that it was a bug fix (and thus potentially
>> > > suitable for stable) in the first place.
>> > > * On the AUTOSEL thread, someone told you the commit is broken.
>> > > * There was already a thread that reported a regression caused by the commit.
>> > > Easily findable via lore search.
>> > > * There was also already a pending patch that Fixes the commit. Again easily
>> > > findable via lore search.
>> > >
>> > > So it seems a *lot* of things went wrong, no? Why? If so many things can go
>> > > wrong, it's not just a "mistake" but rather the process is the problem...
>> >
>> > BTW, another cause of this is that the commit (66f99628eb24) was AUTOSEL'd after
>> > only being in mainline for 4 days, and *released* in all LTS kernels after only
>> > being in mainline for 12 days. Surely that's a timeline befitting a critical
>> > security vulnerability, not some random neural-network-selected commit that
>> > wasn't even fixing anything?
>>
>> I would love to have a mechanism that tells me with 100% confidence if a
>> given commit fixes a bug or not, could you provide me with one?
>
>Just because you can't be 100% certain whether a commit is a fix doesn't mean
>you should be rushing to backport random commits that have no indications they
>are fixing anything.

The difference in opinion here is that I don't think it's rushing: the
stable kernel rules say a commit must be in a released kernel, while the
AUTOSEL timelines make it so a commit must have been in two released
kernels.

Should we extend it? Maybe, I just don't think we have enough data to
make a decision either way.

>> w.r.t timelines, this is something that was discussed on the mailing
>> list a few years ago where we decided that giving AUTOSEL commits 7 days
>> of soaking time is sufficient, if anything changed we can have this
>> discussion again.
>
>Nothing has changed, but that doesn't mean that your process is actually
>working. 7 days might be appropriate for something that looks like a security
>fix, but not for a random commit with no indications it is fixing anything.

How do we know if this is working or not though? How do you quantify the
amount of useful commits?

How do you know if a certain fix has security implications? Or even if
it actually fixes anything? For every "security" commit tagged for
stable I could probably list a "security" commit with no tags whatsoever.

>BTW, based on that example it's not even 7 days between AUTOSEL and patch
>applied, but actually 7 days from AUTOSEL to *release*. So e.g. if someone
>takes just a 1 week vacation, in that time a commit they would have NAK'ed can
>be AUTOSEL'ed and pushed out across all LTS kernels...

Right, and same as above: what's "enough"?

>> Note, however, that it's not enough to keep pointing at a tiny set and
>> using it to suggest that the entire process is broken. How many AUTOSEL
>> commits introduced a regression? How many -stable tagged ones did? How
>> many bugs did AUTOSEL commits fix?
>
>So basically you don't accept feedback from individual people, as individual
>people don't have enough data?

I'd love to improve the process, but for that we need to figure out
criteria for what we consider good or bad, collect data, and make
decisions based on that data.

What I'm getting from this thread is a few anecdotal examples and
statements that the process isn't working at all.

I took Jon's stablefixes script which he used for his previous articles
around stable kernel regressions (here:
https://lwn.net/Articles/812231/) and tried running it on the 5.15
stable tree (just a random pick). I've proceeded with ignoring the
non-user-visible regressions as Jon defined in his article (basically
issues that were introduced and fixed in the same releases) and ended up
with 604 commits that caused a user visible regression.

Out of those 604 commits:

- 170 had an explicit stable tag.
- 434 did not have a stable tag.

Looking at the commits in the 5.15 tree:

With stable tag:

$ git log --oneline -i --grep "cc.*stable" v5.15..stable/linux-5.15.y | wc -l
3676

Without stable tag (-96 commits which are version bumps):

$ git log --oneline --invert-grep -i --grep "cc.*stable" v5.15..stable/linux-5.15.y | wc -l
10649

Regression rate for commits with stable tag: 170 / 3676 = 4.62%
Regression rate for commits without a stable tag: 434 / 10553 = 4.11%
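For reference, the rates above are straightforward to recompute; a minimal
sketch, using the counts quoted in this thread (5.15 tree) rather than
freshly regenerated data:

```python
# Recompute the per-commit regression rates quoted above. The counts
# are the ones cited in this thread; rerunning the stablefixes script
# against a current tree would give different numbers.
def regression_rate(regressions: int, total: int) -> float:
    """Regression rate as a percentage, rounded to two places."""
    return round(100 * regressions / total, 2)

print(regression_rate(170, 3676))   # stable-tagged commits: 4.62
print(regression_rate(434, 10553))  # untagged (10649 - 96 bumps): 4.11
```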

Is the analysis flawed somehow? Probably, and I'd happily take feedback on
how/what I can do better, but this type of analysis is what I look for
to know if the process is working well or not.

--
Thanks,
Sasha
Re: AUTOSEL process
On Mon, Feb 27, 2023 at 05:35:30PM -0500, Sasha Levin wrote:
> On Mon, Feb 27, 2023 at 09:38:46PM +0000, Eric Biggers wrote:
> > Just because you can't be 100% certain whether a commit is a fix doesn't mean
> > you should be rushing to backport random commits that have no indications they
> > are fixing anything.
>
> The difference in opinion here is that I don't think it's rushing: the
> stable kernel rules say a commit must be in a released kernel, while the
> AUTOSEL timelines make it so a commit must have been in two released
> kernels.

Patches in -rc1 have been in _no_ released kernels. I'd feel a lot
better about AUTOSEL if it didn't pick up changes until, say, -rc4,
unless they were cc'd to stable.

> > Nothing has changed, but that doesn't mean that your process is actually
> > working. 7 days might be appropriate for something that looks like a security
> > fix, but not for a random commit with no indications it is fixing anything.
>
> How do we know if this is working or not though? How do you quantify the
> amount of useful commits?

Sasha, 7 days is too short. People have to be allowed to take holiday.

> I'd love to improve the process, but for that we need to figure out
> criteria for what we consider good or bad, collect data, and make
> decisions based on that data.
>
> What I'm getting from this thread is a few anecdotal examples and
> statements that the process isn't working at all.
>
> I took Jon's stablefixes script which he used for his previous articles
> around stable kernel regressions (here:
> https://lwn.net/Articles/812231/) and tried running it on the 5.15
> stable tree (just a random pick). I've proceeded with ignoring the
> non-user-visible regressions as Jon defined in his article (basically
> issues that were introduced and fixed in the same releases) and ended up
> with 604 commits that caused a user visible regression.
>
> Out of those 604 commits:
>
> - 170 had an explicit stable tag.
> - 434 did not have a stable tag.

I think a lot of people don't realise they have to _both_ put a Fixes
tag _and_ add a Cc: stable. How many of those 604 commits had a Fixes
tag?
Re: AUTOSEL process
On Mon, Feb 27, 2023 at 05:35:30PM -0500, Sasha Levin wrote:
> > > Note, however, that it's not enough to keep pointing at a tiny set and
> > > using it to suggest that the entire process is broken. How many AUTOSEL
> > > commits introduced a regression? How many -stable tagged ones did? How
> > > many bugs did AUTOSEL commits fix?
> >
> > So basically you don't accept feedback from individual people, as individual
> > people don't have enough data?
>
> I'd love to improve the process, but for that we need to figure out
> criteria for what we consider good or bad, collect data, and make
> decisions based on that data.
>
> What I'm getting from this thread is a few anecdotal examples and
> statements that the process isn't working at all.
>
> I took Jon's stablefixes script which he used for his previous articles
> around stable kernel regressions (here:
> https://lwn.net/Articles/812231/) and tried running it on the 5.15
> stable tree (just a random pick). I've proceeded with ignoring the
> non-user-visible regressions as Jon defined in his article (basically
> issues that were introduced and fixed in the same releases) and ended up
> with 604 commits that caused a user visible regression.
>
> Out of those 604 commits:
>
> - 170 had an explicit stable tag.
> - 434 did not have a stable tag.
>
> Looking at the commits in the 5.15 tree:
>
> With stable tag:
>
> $ git log --oneline -i --grep "cc.*stable" v5.15..stable/linux-5.15.y | wc -l
> 3676
>
> Without stable tag (-96 commits which are version bumps):
>
> $ git log --oneline --invert-grep -i --grep "cc.*stable" v5.15..stable/linux-5.15.y | wc -l
> 10649
>
> Regression rate for commits with stable tag: 170 / 3676 = 4.62%
> Regression rate for commits without a stable tag: 434 / 10553 = 4.11%
>
> Is the analysis flawed somehow? Probably, and I'd happily take feedback on
> how/what I can do better, but this type of analysis is what I look for
> to know if the process is working well or not.

I'm shocked that these are the statistics you use to claim the current AUTOSEL
process is working. I think they actually show quite the opposite!

First, since many AUTOSEL commits aren't actually fixes but nearly all
stable-tagged commits *are* fixes, the rate of regressions per commit would need
to be lower for AUTOSEL commits than for stable-tagged commits in order for
AUTOSEL commits to have the same rate of regressions *per fix*. Your numbers
suggest a similar regression rate *per commit*. Thus, AUTOSEL probably
introduces more regressions *per fix* than stable-tagged commits.

Second, the way you're identifying regression-introducing commits seems to be
excluding one of the most common, maybe *the* most common, cause of AUTOSEL
regressions: missing prerequisite commits. A very common case that I've seen
repeatedly is AUTOSEL picking just patch 2 or higher of a multi-patch series.
For an example, see the patch that started this thread... If a missing
prerequisite is backported later, my understanding is that it usually isn't
given a Fixes tag, as the upstream commit didn't have it. I think such
regressions aren't counted in your statistic, which only looks at Fixes tags.

(Of course, stable-tagged commits sometimes have missing prerequisite bugs too.
But it's expected to be at a lower rate, since the original developers and
maintainers are directly involved in adding the stable tags. These are the
people who are more familiar than anyone else with prerequisites.)

Third, the category "commits without a stable tag" doesn't include just AUTOSEL
commits, but also non-AUTOSEL commits that people asked to be added to stable
because they fixed a problem for them. Such commits often have been in mainline
for a long time, so naturally they're expected to have a lower regression rate
than stable-tagged commits due to the longer soak time, on average. So if the
regression rate of stable-tagged and non-stable-tagged commits is actually
similar, that suggests the regression rate of non-stable-tagged commits is being
brought up artificially by a high regression rate in AUTOSEL commits...

So, I think your statistics actually reflect quite badly on AUTOSEL in its
current form.

By the way, to be clear, AUTOSEL is absolutely needed. The way you are doing it
currently is not working well, though. I think it needs to be tuned to select
fewer, higher-confidence fixes, and you need to do some basic checks against
each one, like "does this commit have a pending fix" and "is this commit part of
a multi-patch series, and if so are earlier patches needed as prerequisites".
There also needs to be more soak time in mainline, and more review time.
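The "does this commit have a pending fix" check can be approximated by
scanning newer commit messages for a Fixes: tag naming the candidate's SHA.
A minimal sketch, assuming the messages have already been collected (e.g.
from `git log` output); the function name and the prefix-matching behavior
are illustrative, not any existing tool's API:

```python
import re

def has_pending_fix(candidate_sha: str, newer_messages: list[str]) -> bool:
    """Return True if any newer commit message carries a Fixes: tag whose
    (usually 12-character) abbreviated SHA prefix-matches the candidate."""
    cand = candidate_sha.lower()
    tag = re.compile(r"^Fixes:\s*([0-9a-f]{7,40})\b", re.I | re.M)
    for msg in newer_messages:
        for sha in tag.findall(msg):
            sha = sha.lower()
            # Fixes: tags abbreviate the SHA, so match on prefixes.
            if cand.startswith(sha) or sha.startswith(cand):
                return True
    return False
```

In practice this would be fed by something like `git log --grep='Fixes:'`
over the window between the AUTOSEL mail and the actual backport, and
re-run right before release.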

IMO you also need to take a hard look at whatever neural network thing you are
using, as from what I've seen its results are quite poor... It does pick up
some obvious fixes, but it seems they could have just as easily been found
through some heuristics with grep. Beyond those obvious fixes, what it picks up
seems to be barely distinguishable from a random selection.

- Eric
Re: AUTOSEL process
On Mon, Feb 27, 2023 at 10:59:27PM +0000, Matthew Wilcox wrote:
>On Mon, Feb 27, 2023 at 05:35:30PM -0500, Sasha Levin wrote:
>> On Mon, Feb 27, 2023 at 09:38:46PM +0000, Eric Biggers wrote:
>> > Just because you can't be 100% certain whether a commit is a fix doesn't mean
>> > you should be rushing to backport random commits that have no indications they
>> > are fixing anything.
>>
>> The difference in opinion here is that I don't think it's rushing: the
>> stable kernel rules say a commit must be in a released kernel, while the
>> AUTOSEL timelines make it so a commit must have been in two released
>> kernels.
>
>Patches in -rc1 have been in _no_ released kernels. I'd feel a lot
>better about AUTOSEL if it didn't pick up changes until, say, -rc4,
>unless they were cc'd to stable.

This happened before my time, but -rcs are considered releases.

The counterpoint to your argument/ask is that if you run the numbers on
regressions between -rc releases, it's the later ones that tend to
introduce (way) more issues.

I've actually written about it a few years back to ksummit discuss
(here: https://lwn.net/Articles/753329/) because the numbers I saw
indicate that later -rc releases are 3x likely to introduce a
regression.

Linus pushed back on it saying that it is "by design" because those
commits are way more complex than ones that land during the early -rc
cycles.

So yes, I don't mind modifying the release workflow to decrease the
regressions we introduce, but I think that there's a difference between
what folks see as "helpful" and the outcome it would have.

>> > Nothing has changed, but that doesn't mean that your process is actually
>> > working. 7 days might be appropriate for something that looks like a security
>> > fix, but not for a random commit with no indications it is fixing anything.
>>
>> How do we know if this is working or not though? How do you quantify the
>> amount of useful commits?
>
>Sasha, 7 days is too short. People have to be allowed to take holiday.

That's true, and I don't have strong objections to making it longer. How
often did it happen though? We don't end up getting too many replies
past the 7-day window.

I'll bump it to 14 days for a few months and see if it changes anything.

>> I'd love to improve the process, but for that we need to figure out
>> criteria for what we consider good or bad, collect data, and make
>> decisions based on that data.
>>
>> What I'm getting from this thread is a few anecdotal examples and
>> statements that the process isn't working at all.
>>
>> I took Jon's stablefixes script which he used for his previous articles
>> around stable kernel regressions (here:
>> https://lwn.net/Articles/812231/) and tried running it on the 5.15
>> stable tree (just a random pick). I've proceeded with ignoring the
>> non-user-visible regressions as Jon defined in his article (basically
>> issues that were introduced and fixed in the same releases) and ended up
>> with 604 commits that caused a user visible regression.
>>
>> Out of those 604 commits:
>>
>> - 170 had an explicit stable tag.
>> - 434 did not have a stable tag.
>
>I think a lot of people don't realise they have to _both_ put a Fixes
>tag _and_ add a Cc: stable. How many of those 604 commits had a Fixes
>tag?

What do you mean? Just a cc: stable tag is enough to land it in stable,
you don't have to do both. The numbers above reflect that.

Running the numbers, there are 9422 commits with a Fixes tag in the 5.15
tree, out of which 360 introduced a regression, so 360 / 9422 = 3.82%.

--
Thanks,
Sasha
Re: AUTOSEL process
On Mon, Feb 27, 2023 at 07:52:39PM -0500, Sasha Levin wrote:
> > > > Nothing has changed, but that doesn't mean that your process is actually
> > > > working. 7 days might be appropriate for something that looks like a security
> > > > fix, but not for a random commit with no indications it is fixing anything.
> > >
> > > How do we know if this is working or not though? How do you quantify the
> > > amount of useful commits?
> >
> > Sasha, 7 days is too short. People have to be allowed to take holiday.
>
> That's true, and I don't have strong objections to making it longer. How
> often did it happen though? We don't end up getting too many replies
> past the 7 day window.
>
> I'll bump it to 14 days for a few months and see if it changes anything.

It's not just for the review time, but also for the longer soak time in
mainline.

Of course, for that to work properly you have to actually take advantage of it,
for example by re-checking for fixes when actually applying the patch, not just
when sending the initial AUTOSEL email. Which I hope you're doing already, but
who knows.

- Eric
Re: AUTOSEL process
On Tue, Feb 28, 2023 at 12:32:20AM +0000, Eric Biggers wrote:
>On Mon, Feb 27, 2023 at 05:35:30PM -0500, Sasha Levin wrote:
>> > > Note, however, that it's not enough to keep pointing at a tiny set and
>> > > using it to suggest that the entire process is broken. How many AUTOSEL
>> > > commits introduced a regression? How many -stable tagged ones did? How
>> > > many bugs did AUTOSEL commits fix?
>> >
>> > So basically you don't accept feedback from individual people, as individual
>> > people don't have enough data?
>>
>> I'd love to improve the process, but for that we need to figure out
>> criteria for what we consider good or bad, collect data, and make
>> decisions based on that data.
>>
>> What I'm getting from this thread is a few anecdotal examples and
>> statements that the process isn't working at all.
>>
>> I took Jon's stablefixes script which he used for his previous articles
>> around stable kernel regressions (here:
>> https://lwn.net/Articles/812231/) and tried running it on the 5.15
>> stable tree (just a random pick). I've proceeded with ignoring the
>> non-user-visible regressions as Jon defined in his article (basically
>> issues that were introduced and fixed in the same releases) and ended up
>> with 604 commits that caused a user visible regression.
>>
>> Out of those 604 commits:
>>
>> - 170 had an explicit stable tag.
>> - 434 did not have a stable tag.
>>
>> Looking at the commits in the 5.15 tree:
>>
>> With stable tag:
>>
>> $ git log --oneline -i --grep "cc.*stable" v5.15..stable/linux-5.15.y | wc -l
>> 3676
>>
>> Without stable tag (-96 commits which are version bumps):
>>
>> $ git log --oneline --invert-grep -i --grep "cc.*stable" v5.15..stable/linux-5.15.y | wc -l
>> 10649
>>
>> Regression rate for commits with stable tag: 170 / 3676 = 4.62%
>> Regression rate for commits without a stable tag: 434 / 10553 = 4.11%
>>
>> Is the analysis flawed somehow? Probably, and I'd happily take feedback on
>> how/what I can do better, but this type of analysis is what I look for
>> to know if the process is working well or not.
>
>I'm shocked that these are the statistics you use to claim the current AUTOSEL
>process is working. I think they actually show quite the opposite!
>
>First, since many AUTOSEL commits aren't actually fixes but nearly all
>stable-tagged commits *are* fixes, the rate of regressions per commit would need
>to be lower for AUTOSEL commits than for stable-tagged commits in order for
>AUTOSEL commits to have the same rate of regressions *per fix*. Your numbers
>suggest a similar regression rate *per commit*. Thus, AUTOSEL probably
>introduces more regressions *per fix* than stable-tagged commits.

Interesting claim. How many of the AUTOSEL commits are "actual" fixes?
How do you know if a commit is a fix for anything or not?

Could you try and back claims with some evidence?

Yes, in a perfect world where we know if a commit is a fix we could
avoid introducing regressions into the stable trees. Heck, maybe we could
even stop writing buggy code to begin with?

>Second, the way you're identifying regression-introducing commits seems to be
>excluding one of the most common, maybe *the* most common, cause of AUTOSEL
>regressions: missing prerequisite commits. A very common case that I've seen
>repeatedly is AUTOSEL picking just patch 2 or higher of a multi-patch series.
>For an example, see the patch that started this thread... If a missing
>prerequisite is backported later, my understanding is that it usually isn't
>given a Fixes tag, as the upstream commit didn't have it. I think such
>regressions aren't counted in your statistic, which only looks at Fixes tags.

It definitely happens, but we usually end up dropping the AUTOSEL-ed
commit rather than bringing in complex dependency chains.

Look at the stable-queue for a record of those.

>(Of course, stable-tagged commits sometimes have missing prerequisite bugs too.
>But it's expected to be at a lower rate, since the original developers and
>maintainers are directly involved in adding the stable tags. These are the
>people who are more familiar than anyone else with prerequisites.)

You'd be surprised. There is documentation around how one would annotate
dependencies for stable tagged commits, something along the lines of:

cc: stable@kernel.org # dep1 dep2

Grep through the git log and see how often this is actually used.
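Parsing that annotation is simple; a minimal sketch, with the caveat that
the trailer spelling varies in the wild (`Cc: stable@vger.kernel.org`,
`cc: stable@kernel.org`) and the same `#` suffix is also used for
minimum-version notes like `# 5.10+`, which a real parser would have to
distinguish:

```python
import re

# Deliberately loose pattern: matches "Cc: stable@..." trailers
# regardless of case or exact list address, and captures whatever
# follows the '#'.
STABLE_TAG = re.compile(r"^Cc:[ \t]*stable@[\w.]+[ \t]*#[ \t]*(.+)$",
                        re.I | re.M)

def stable_deps(message: str) -> list[str]:
    """Dependencies annotated on a stable tag, or [] if none are listed."""
    m = STABLE_TAG.search(message)
    return m.group(1).split() if m else []
```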

>Third, the category "commits without a stable tag" doesn't include just AUTOSEL
>commits, but also non-AUTOSEL commits that people asked to be added to stable
>because they fixed a problem for them. Such commits often have been in mainline
>for a long time, so naturally they're expected to have a lower regression rate
>than stable-tagged commits due to the longer soak time, on average. So if the
>regression rate of stable-tagged and non-stable-tagged commits is actually
>similar, that suggests the regression rate of non-stable-tagged commits is being
>brought up artificially by a high regression rate in AUTOSEL commits...

Yes, the numbers are pretty skewed up by different aspects of the
process.

>So, I think your statistics actually reflect quite badly on AUTOSEL in its
>current form.
>
>By the way, to be clear, AUTOSEL is absolutely needed. The way you are doing it
>currently is not working well, though. I think it needs to be tuned to select
>fewer, higher-confidence fixes, and you need to do some basic checks against
>each one, like "does this commit have a pending fix" and "is this commit part of

Keep in mind that there's some lag time between when we do our thing vs
when things appear upstream.

Greg actually runs the "is there a pending fix" check even after I've
pushed the patches, before he cuts releases.

>a multi-patch series, and if so are earlier patches needed as prerequisites".
>There also needs to be more soak time in mainline, and more review time.

Tricky bit with mainline/review time is that very few of our users
actually run -rc trees.

We end up hitting many of the regressions because the commits actually
end up in stable trees. Should it work that way? No, but our testing
story around -rc releases is quite lacking.

>IMO you also need to take a hard look at whatever neural network thing you are
>using, as from what I've seen its results are quite poor... It does pick up
>some obvious fixes, but it seems they could have just as easily been found
>through some heuristics with grep. Beyond those obvious fixes, what it picks up
>seems to be barely distinguishable from a random selection.

I mean... patches welcome? Do you want to come up with a set of
heuristics that performs better and give it a go? I'll happily switch
over.

I'm not sure how feedback in the form of "this sucks but I'm sure it
could be much better" is useful.

--
Thanks,
Sasha
Re: AUTOSEL process
On Mon, Feb 27, 2023 at 08:53:31PM -0500, Sasha Levin wrote:
> >
> > I'm shocked that these are the statistics you use to claim the current AUTOSEL
> > process is working. I think they actually show quite the opposite!
> >
> > First, since many AUTOSEL commits aren't actually fixes but nearly all
> > stable-tagged commits *are* fixes, the rate of regressions per commit would need
> > to be lower for AUTOSEL commits than for stable-tagged commits in order for
> > AUTOSEL commits to have the same rate of regressions *per fix*. Your numbers
> > suggest a similar regression rate *per commit*. Thus, AUTOSEL probably
> > introduces more regressions *per fix* than stable-tagged commits.
>
> Interesting claim. How many of the AUTOSEL commits are "actual" fixes?
> How do you know if a commit is a fix for anything or not?
>
> Could you try and back claims with some evidence?
>
> Yes, in a perfect world where we know if a commit is a fix we could
> avoid introducing regressions into the stable trees. Heck, maybe we could
> even stop writing buggy code to begin with?

Are you seriously trying to claim that a random commit your neural network
picked up is just as likely to be a fix as a commit that the author explicitly
tagged as a fix and/or for stable?

That's quite an extraordinary claim, and it's not true from my experience. Lots
of AUTOSEL patches that get Cc'ed to me, if I'm familiar enough with the area to
understand fairly well whether the patch is a "fix", are not actually fixes. Or
are very borderline "fixes" that don't meet stable criteria. (Note, I generally
only bother responding to AUTOSEL if I think a patch is actually going to cause
a problem. So a lack of response isn't necessarily agreement that a patch is
really suitable for stable...)

Oh sorry, personal experience is not "evidence". Please disregard my invalid
non-evidence-based opinion.

> > (Of course, stable-tagged commits sometimes have missing prerequisite bugs too.
> > But it's expected to be at a lower rate, since the original developers and
> > maintainers are directly involved in adding the stable tags. These are the
> > people who are more familiar than anyone else with prerequisites.)
>
> You'd be surprised. There is documentation around how one would annotate
> dependencies for stable tagged commits, something along the lines of:
>
> cc: stable@kernel.org # dep1 dep2
>
> Grep through the git log and see how often this is actually used.

Well, probably more common is that prerequisites are in the same patchset, and
the prerequisites are tagged for stable too. Whereas AUTOSEL often just picks
patch X of N. Also, developers and maintainers who tag patches for stable are
probably more likely to help with the stable process in general and make sure
patches are backported correctly...

Anyway, the point is, AUTOSEL needs to be fixed to stop inappropriately
cherry-picking patch X of N so often.
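Detecting the "patch X of N" case is possible from the original posting's
subject prefix, since the `n/N` marker is stripped by the time a commit
lands in git. A minimal sketch, assuming the lore subject line is
available; the function names are illustrative:

```python
import re

# Lore subjects look like "[PATCH v3 2/7] subsystem: summary"; the n/N
# marker does not survive into the git subject, so this runs on the
# original posting.
SERIES = re.compile(r"\[PATCH[^\]]*?\b(\d+)/(\d+)\]")

def series_position(subject: str):
    """(part, total) from a '[PATCH ... n/N]' prefix, or None."""
    m = SERIES.search(subject)
    return (int(m.group(1)), int(m.group(2))) if m else None

def needs_prereq_check(subject: str) -> bool:
    """Flag candidates posted as part 2+ of a series for a manual look
    at the earlier patches as possible prerequisites."""
    pos = series_position(subject)
    return pos is not None and pos[0] > 1
```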

> > a multi-patch series, and if so are earlier patches needed as prerequisites".
> > There also needs to be more soak time in mainline, and more review time.
>
> Tricky bit with mainline/review time is that very few of our users
> actually run -rc trees.
>
> We end up hitting many of the regressions because the commits actually
> end up in stable trees. Should it work that way? No, but our testing
> story around -rc releases is quite lacking.

Well, in the bug that affected me, it *was* found on mainline almost
immediately. It just took a bit longer than the extremely aggressive 7-day
AUTOSEL period to be fixed.

Oh sorry again, one example is not "evidence". Please disregard my invalid
non-evidence-based opinion.

> I'm not sure how feedback in the form of "this sucks but I'm sure it
> could be much better" is useful.

I've already given you some specific suggestions.

I can't force you to listen to them, of course.

- Eric
Re: AUTOSEL process
On Tue, Feb 28, 2023 at 01:25:57AM +0000, Eric Biggers wrote:
> On Mon, Feb 27, 2023 at 07:52:39PM -0500, Sasha Levin wrote:
> > > > > Nothing has changed, but that doesn't mean that your process is actually
> > > > > working. 7 days might be appropriate for something that looks like a security
> > > > > fix, but not for a random commit with no indications it is fixing anything.
> > > >
> > > > How do we know if this is working or not though? How do you quantify the
> > > > amount of useful commits?
> > >
> > > Sasha, 7 days is too short. People have to be allowed to take holiday.
> >
> > That's true, and I don't have strong objections to making it longer. How
> > often did it happen though? We don't end up getting too many replies
> > past the 7 day window.
> >
> > I'll bump it to 14 days for a few months and see if it changes anything.
>
> It's not just for the review time, but also for the longer soak time in
> mainline.

There's a tradeoff to find. I'm sure there are way more stable users
than mainline users. Does this mean we have to break stable from time to
time to detect regressions? Sadly, yes. And it's not specific to Linux,
it's the same for virtually any other project that supports maintenance
branches. I personally like the principle of delaying backports to older
branches so that users relying on much older branches know they're having
a much more stable experience because the rare regressions that happen in
faster moving branches have more time to be caught. But that requires
incredibly more complex management.

On another project I'm sometimes seeing regressions caused by fixes pop
up after 6 months of exposure in stable. And comparatively, regressions
caused by new features tend to pop up faster, and users occasionally face
such bugs in stable just because the backport got delayed. So there's no
perfect balance, the problem is that any code change must be executed in
the field a few times to know if it's solid or not. The larger the exposure
(i.e. stable) the faster regressions will be reported. The more frequent
the code is triggered, the faster as well. Fixes for bugs that are very
hard to trigger can cause regressions that will take ages to be reported.
But nonetheless users want these fixes because most of the time they are
correct.

When I was maintaining extended LTS kernels such as 2.6.32 or 3.10, users
were mostly upgrading when they were waiting for a specific fix. And even
there, despite patches having been present in regular stable kernels for
months, we faced regressions. Sometimes a fix was not suitable for that
branch, sometimes it was incorrectly backported, etc. What's certain
however, is that the longer you wait for a backport, the more difficult
it becomes to find someone who still remembers well about that fix and
its specificities, even during the review. This really has to be taken
into account when suggesting increased delays.

I really think that it's important to get backports early in recent stable
branches. E.g. we could keep these 7 days till the last LTS branch, and
let them cook one or two extra weeks before reaching older releases. But
we wouldn't want to delay important fixes too much (which are still likely
to cause regressions, especially security fixes which tend to focus on a
specifically reported case). Maybe we could imagine that the Cc: stable
could have a variant meaning "urgent" for important fixes that should
bypass the wait time, and that by default other ones would flow a bit
more slowly. This could satisfy most users by staying on the branch that
brings them the update rate they need. But that would certainly be quite
some extra work for Greg and for reviewers and that's certainly not cool,
so we need to be reasonable here as well.

Just my two cents,
Willy
Re: AUTOSEL process
> > I'm not sure how feedback in the form of "this sucks but I'm sure it
> > could be much better" is useful.
>
> I've already given you some specific suggestions.
>
> I can't force you to listen to them, of course.
>

Eric,

As you probably know, this is not the first time that the subject of the
AUTOSEL process has been discussed.
Here is one example from fsdevel with a few other suggestions [1].

But just so you know, as a maintainer, you have the option to request that
patches to your subsystem will not be selected by AUTOSEL and run your
own process to select, test and submit fixes to stable trees.

xfs maintainers have done that many years ago.
This choice has consequences though - for years, no xfs fixes were flowing
into stable trees at all, because no one was doing the backport work.
It is hard to imagine that LTS kernel users were more happy about this
situation than they would be from occasional regressions, but who knows...

It has taken a long time until we found the resources and finally started a
process of reviewing, testing and submitting xfs fixes to stable trees and this
process involves a lot of resources (3 maintainers + $$$), so opting out of
AUTOSEL is not a clear win.

I will pencil in yet another discussion on fs and stable process at
LSFMM23 to update on the current status with xfs, but it is hard to
believe that this time we will be able to make significant changes to
the AUTOSEL process.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20201204160227.GA577125@mit.edu/#t
Re: AUTOSEL process
On Tue, Feb 28, 2023 at 12:41:07PM +0200, Amir Goldstein wrote:
> > > I'm not sure how feedback in the form of "this sucks but I'm sure it
> > > could be much better" is useful.
> >
> > I've already given you some specific suggestions.
> >
> > I can't force you to listen to them, of course.
> >
>
> Eric,
>
> As you probably know, this is not the first time that the subject of the
> AUTOSEL process has been discussed.
> Here is one example from fsdevel with a few other suggestions [1].
>
> But just so you know, as a maintainer, you have the option to request that
> patches to your subsystem will not be selected by AUTOSEL and run your
> own process to select, test and submit fixes to stable trees.

Yes, and simply put, that's the answer for any subsystem or maintainer
that does not want their patches picked using the AUTOSEL tool.

The problem that the AUTOSEL tool is solving is real, we have whole
major subsystems where no patches are ever marked as "for stable" and so
real bugfixes are never backported properly.

In an ideal world, all maintainers would properly mark their patches for
stable backporting (as documented for the past 15+ years, with a cc:
stable tag, NOT a Fixes: tag), but we do not live in that world, and
hence, the need for the AUTOSEL work.

thanks,

greg k-h
Re: AUTOSEL process
On Mon, Feb 27, 2023 at 07:41:31PM -0800, Eric Biggers wrote:
>On Mon, Feb 27, 2023 at 08:53:31PM -0500, Sasha Levin wrote:
>> >
>> > I'm shocked that these are the statistics you use to claim the current AUTOSEL
>> > process is working. I think they actually show quite the opposite!
>> >
>> > First, since many AUTOSEL commits aren't actually fixes but nearly all
>> > stable-tagged commits *are* fixes, the rate of regressions per commit would need
>> > to be lower for AUTOSEL commits than for stable-tagged commits in order for
>> > AUTOSEL commits to have the same rate of regressions *per fix*. Your numbers
>> > suggest a similar regression rate *per commit*. Thus, AUTOSEL probably
>> > introduces more regressions *per fix* than stable-tagged commits.
>>
>> Interesting claim. How many of the AUTOSEL commits are "actual" fixes?
>> How do you know if a commit is a fix for anything or not?
>>
>> Could you try and back claims with some evidence?
>>
>> Yes, in a perfect world where we know if a commit is a fix we could
>> avoid introducing regressions into the stable trees. Heck, maybe we could
>> even stop writing buggy code to begin with?
>
>Are you seriously trying to claim that a random commit your neural network
>picked up is just as likely to be a fix as a commit that the author explicitly
>tagged as a fix and/or for stable?

I'd like to think that this is the case after the initial selection and
multiple rounds of reviews, yes.

>That's quite an extraordinary claim, and it's not true from my experience. Lots
>of AUTOSEL patches that get Cc'ed to me, if I'm familiar enough with the area to
>understand fairly well whether the patch is a "fix", are not actually fixes. Or
>are very borderline "fixes" that don't meet stable criteria. (Note, I generally
>only bother responding to AUTOSEL if I think a patch is actually going to cause
>a problem. So a lack of response isn't necessarily agreement that a patch is
>really suitable for stable...)
>
>Oh sorry, personal experience is not "evidence". Please disregard my invalid
>non-evidence-based opinion.
>
>> > (Of course, stable-tagged commits sometimes have missing prerequisite bugs too.
>> > But it's expected to be at a lower rate, since the original developers and
>> > maintainers are directly involved in adding the stable tags. These are the
>> > people who are more familiar than anyone else with prerequisites.)
>>
>> You'd be surprised. There is documentation around how one would annotate
>> dependencies for stable tagged commits, something along the lines of:
>>
>> cc: stable@kernel.org # dep1 dep2
>>
>> Grep through the git log and see how often this is actually used.
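
(As a rough illustration of what such a grep would look for: a hypothetical
sketch, not part of any existing stable tooling, that scans commit message text
for stable tags carrying the "# dep1 dep2" annotation. The message body would
normally come from `git log --format=%B`; the sample message below is made up.)

```python
import re

# Match "Cc: stable@..." trailer lines; capture any "# ..." annotation,
# which is where dependency/version notes would be recorded.
STABLE_TAG = re.compile(
    r"^\s*cc:\s*<?stable@(?:vger\.)?kernel\.org>?\s*(?:#\s*(?P<note>.+))?",
    re.IGNORECASE | re.MULTILINE,
)

def stable_annotations(commit_message: str) -> list[str]:
    """Return the '# ...' annotation of each stable tag ('' if bare)."""
    return [m.group("note") or "" for m in STABLE_TAG.finditer(commit_message)]

msg = """fs: fix something

Cc: stable@vger.kernel.org # dep1 dep2
Signed-off-by: A Developer <a@example.org>
"""
print(stable_annotations(msg))  # ['dep1 dep2']
```

Running something like this over the git log would show how rarely the
annotation is populated, which is the point above.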
>
>Well, probably more common is that prerequisites are in the same patchset, and
>the prerequisites are tagged for stable too. Whereas AUTOSEL often just picks
>patch X of N. Also, developers and maintainers who tag patches for stable are
>probably more likely to help with the stable process in general and make sure
>patches are backported correctly...
>
>Anyway, the point is, AUTOSEL needs to be fixed to stop inappropriately
>cherry-picking patch X of N so often.

That's a fair point.

>> > a multi-patch series, and if so are earlier patches needed as prerequisites".
>> > There also needs to be more soak time in mainline, and more review time.
>>
>> Tricky bit with mainline/review time is that very few of our users
>> actually run -rc trees.
>>
>> We end up hitting many of the regressions because the commits actually
>> end up in stable trees. Should it work that way? No, but our testing
>> story around -rc releases is quite lacking.
>
>Well, in the bug that affected me, it *was* found on mainline almost
>immediately. It just took a bit longer than the extremely aggressive 7-day
>AUTOSEL period to be fixed.
>
>Oh sorry again, one example is not "evidence". Please disregard my invalid
>non-evidence-based opinion.

I'm happy that we're in agreement that significant process changes can't
happen because of opinions or anecdotal examples.

In all seriousness, I will work on addressing the issues that happened
around the commit(s) you've pointed out and improve our existing
process.

--
Thanks,
Sasha
Re: AUTOSEL process
On 2/28/23 06:28, Greg KH wrote:
>> But just so you know, as a maintainer, you have the option to request that
>> patches to your subsystem will not be selected by AUTOSEL and run your
>> own process to select, test and submit fixes to stable trees.
>
> Yes, and simply put, that's the answer for any subsystem or maintainer
> that does not want their patches picked using the AUTOSEL tool.
>
> The problem that the AUTOSEL tool is solving is real, we have whole
> major subsystems where no patches are ever marked as "for stable" and so
> real bugfixes are never backported properly.

Yeah, I agree.

And I'm throwing this out here [after having time to think about it due to an
internet outage], but, would Cc'ing the patch's relevant subsystems on AUTOSEL
emails help? This was sort of mentioned in this email[1] from Eric, and I
think it _could_ help? I don't know, just something that crossed my mind earlier.

>
> In an ideal world, all maintainers would properly mark their patches for
> stable backporting (as documented for the past 15+ years, with a cc:
> stable tag, NOT a Fixes: tag), but we do not live in that world, and
> hence, the need for the AUTOSEL work.

(I wish we did... Oh well.)

[1] https://lore.kernel.org/stable/Y%2Fzswi91axMN8OsA@sol.localdomain/

-- Slade
Re: AUTOSEL process
On Tue, Feb 28, 2023 at 09:05:16PM -0500, Slade Watkins wrote:
> On 2/28/23 06:28, Greg KH wrote:
> >> But just so you know, as a maintainer, you have the option to request that
> >> patches to your subsystem will not be selected by AUTOSEL and run your
> >> own process to select, test and submit fixes to stable trees.
> >
> > Yes, and simply put, that's the answer for any subsystem or maintainer
> > that does not want their patches picked using the AUTOSEL tool.
> >
> > The problem that the AUTOSEL tool is solving is real, we have whole
> > major subsystems where no patches are ever marked as "for stable" and so
> > real bugfixes are never backported properly.
>
> Yeah, I agree.
>
> And I'm throwing this out here [after having time to think about it due to an
> internet outage], but, would Cc'ing the patch's relevant subsystems on AUTOSEL
> emails help? This was sort of mentioned in this email[1] from Eric, and I
> think it _could_ help? I don't know, just something that crossed my mind earlier.
>

AFAICT, that's already being done now, which is good. What I was talking about
is that the subsystem lists aren't included on the *other* stable emails. Most
importantly, the "FAILED: patch failed to apply to stable tree" emails.

- Eric
Re: AUTOSEL process
On Tue, Feb 28, 2023 at 09:05:16PM -0500, Slade Watkins wrote:
> On 2/28/23 06:28, Greg KH wrote:
> >> But just so you know, as a maintainer, you have the option to request that
> >> patches to your subsystem will not be selected by AUTOSEL and run your
> >> own process to select, test and submit fixes to stable trees.
> >
> > Yes, and simply put, that's the answer for any subsystem or maintainer
> > that does not want their patches picked using the AUTOSEL tool.
> >
> > The problem that the AUTOSEL tool is solving is real, we have whole
> > major subsystems where no patches are ever marked as "for stable" and so
> > real bugfixes are never backported properly.
>
> Yeah, I agree.
>
> And I'm throwing this out here [after having time to think about it due to an
> internet outage], but, would Cc'ing the patch's relevant subsystems on AUTOSEL
> emails help? This was sort of mentioned in this email[1] from Eric, and I
> think it _could_ help? I don't know, just something that crossed my mind earlier.

I don't know, maybe? Note that determining a patch's "subsystem" in an
automated fashion is often difficult; any idea how to do that reliably
in a way that doesn't just hit lkml all the time?

But again, how is that going to help much, the people who should be
saying "no" are the ones on the signed-off-by and cc: lines in the patch
itself.

thanks,

greg k-h
Re: AUTOSEL process
On Tue, Feb 28, 2023 at 09:13:56PM -0800, Eric Biggers wrote:
> On Tue, Feb 28, 2023 at 09:05:16PM -0500, Slade Watkins wrote:
> > On 2/28/23 06:28, Greg KH wrote:
> > >> But just so you know, as a maintainer, you have the option to request that
> > >> patches to your subsystem will not be selected by AUTOSEL and run your
> > >> own process to select, test and submit fixes to stable trees.
> > >
> > > Yes, and simply put, that's the answer for any subsystem or maintainer
> > > that does not want their patches picked using the AUTOSEL tool.
> > >
> > > The problem that the AUTOSEL tool is solving is real, we have whole
> > > major subsystems where no patches are ever marked as "for stable" and so
> > > real bugfixes are never backported properly.
> >
> > Yeah, I agree.
> >
> > And I'm throwing this out here [after having time to think about it due to an
> > internet outage], but, would Cc'ing the patch's relevant subsystems on AUTOSEL
> > emails help? This was sort of mentioned in this email[1] from Eric, and I
> > think it _could_ help? I don't know, just something that crossed my mind earlier.
> >
>
> AFAICT, that's already being done now, which is good. What I was talking about
> is that the subsystem lists aren't included on the *other* stable emails. Most
> importantly, the "FAILED: patch failed to apply to stable tree" emails.

Why would the FAILED emails want to go to a mailing list? If the people
that were part of making the patch don't want to respond to a FAILED
email, why would anyone on the mailing list?

But hey, I'll be glad to take a change to my script to add that
functionality if you want to make it, it's here:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/scripts/bad_stable

thanks,

greg k-h
