News item: Multiple root kernel command-line arguments
Hi,

please review the news item below:

- I am not 100% happy with the title but the 50 char limit
doesn't allow any more details.

- No Display-If condition because it is neither a genkernel nor a
kexec-tools issue. There may even be additional packages appending
to the kernel command-line that I am not aware of.

- In theory this shouldn't be news for anyone: If you don't
use a persistent device name, you are basically asking for
trouble like that. However, people just using kexec from the
kexec-tools package may be unaware that auto-detection of the
ROOT device, which is the default, might cause trouble like
that because it won't use persistent names.

- Experiencing a boot failure is always bad -- especially for
headless/remote systems -- so a warning shouldn't hurt.

- The latest kexec-tools ebuild in the repository now also warns
the user in pkg_postinst, see
https://gitweb.gentoo.org/repo/gentoo.git/tree/sys-apps/kexec-tools/kexec-tools-2.0.20-r3.ebuild?id=61c03ffab76740c0420e3c8a3185d047d461f7a7#n111


---
Title: Multiple root kernel command-line arguments
Author: Thomas Deutschmann <whissi@gentoo.org>
Posted: 2020-08-05
Revision: 1
News-Item-Format: 2.0

During genkernel-4.1 development, which changes the device manager
from MDEV to (E)UDEV, it was noticed that some tools like kexec
append an additional root argument to the kernel command-line. If
such a tool sets root to a non-persistent device name like
root=/dev/dm-3, the next boot might fail if there is *no* root device
with that name in the start environment (i.e. initramfs).

While kexec's runscript was changed in >=sys-apps/kexec-tools-2.0.20-r2
to no longer append a root kernel command-line argument when an option
like "--reuse-cmdline" (default) is used, a cold reboot *without*
kexec may be needed to restore the kernel command-line.

NOTE: This issue is *not* specific to kexec or genkernel usage.
The kernel will always use the last root kernel command-line argument
set. Any tool which appends a root argument without a persistent
device name might cause a boot failure if the system cannot find the
referenced root device during boot.

To avoid boot problems, users should review their current kernel
command-line (/proc/cmdline) to ensure that only *one* root kernel
command-line argument is set. The use of persistent device names
like root=UUID=<...> is highly recommended.
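
A minimal sketch of the check described above, written in Python for
illustration. The function names and the list of prefixes treated as
persistent are assumptions of this sketch; adjust the list to whatever
your own start environment actually accepts.

```python
from pathlib import Path

# Prefixes treated as persistent in this sketch (an assumption, not an
# exhaustive list of what every initramfs understands).
PERSISTENT_PREFIXES = ("UUID=", "PARTUUID=", "LABEL=", "PARTLABEL=", "/dev/disk/by-")


def root_arguments(cmdline: str) -> list[str]:
    """Return the value of every root= argument, in command-line order."""
    return [arg[len("root="):] for arg in cmdline.split() if arg.startswith("root=")]


def audit(cmdline: str) -> list[str]:
    """Return human-readable warnings for a kernel command line."""
    roots = root_arguments(cmdline)
    warnings = []
    if len(roots) > 1:
        warnings.append(
            f"{len(roots)} root arguments set; the last one ({roots[-1]}) wins"
        )
    if roots and not roots[-1].startswith(PERSISTENT_PREFIXES):
        warnings.append(
            f"effective root {roots[-1]} is not a persistent device name"
        )
    return warnings


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel."""
    return Path(path).read_text()
```

On a running system, `audit(read_cmdline())` returns an empty list when
exactly one persistent root argument is set.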


--
Regards,
Thomas Deutschmann / Gentoo Linux Developer
C4DD 695F A713 8F24 2AA1 5638 5849 7EE5 1D5D 74A5
Re: News item: Multiple root kernel command-line arguments
On Wed, 2020-08-05 at 14:02 +0200, Thomas Deutschmann wrote:
> Hi,
>
> please review the news item below:
>
> - I am not 100% happy with the title but the 50 char limit
> doesn't allow any more details.

Yes, the title says nothing about why anyone would or would not want
to read this news item.

>
> - No Display-If condition because it is neither a genkernel nor
> kexec-tools issue. We maybe even have additional packages
> which are appending to kernel command-line I am not aware of.

Showing this news on all old and new Gentoo systems makes little sense.
Either someone is newly affected, then Display-if should determine it,
or someone is *not* newly affected, then you're either telling him
something he already knows or something that is of little value to him.

News items should be precisely this -- news. Not random pieces of
information you've just discovered and want to share with everyone.
This is what documentation is about, and it should be in some kernel-
related piece of documentation (handbook?) and not scattered around
in news items.

--
Best regards,
Michał Górny
Re: News item: Multiple root kernel command-line arguments
On 2020-08-06 14:22, Michał Górny wrote:
>> - I am not 100% happy with the title but the 50 char limit
>> doesn't allow any more details.
>
> Yes, the title doesn't say a thing why would anyone want to read this
> news item or not.

Maybe

> Be aware of possible reboot problems

instead?


>> - No Display-If condition because it is neither a genkernel nor
>> kexec-tools issue. We maybe even have additional packages
>> which are appending to kernel command-line I am not aware of.
>
> Showing this news on all old and new Gentoo systems makes little sense.
> Either someone is newly affected, then Display-if should determine it,
> or someone is *not* newly affected, then you're either telling him
> something he already knows or something that is of little value to him.
>
> News items should be precisely this -- news. Not random pieces of
> information you've just discovered and want to share with everyone.
> This is what documentation is about, and it should be in some kernel-
> related piece of documentation (handbook?) and not scattered around
> in news items.

Sure, you are basically repeating what I wrote in my prolog.

But the reason I drafted that news item despite this is the
consideration that the risk of an unbootable system outweighs the risk
of wasting anyone's time reading something that does not affect them.
Note that news items will appear through multiple channels. So if this
helps someone who didn't read the documentation before or just didn't
realize the obvious risk he/she is taking when using non-persistent
names ("It worked that way for me for the past 15 years!") I believe it
has served its purpose.


--
Regards,
Thomas Deutschmann / Gentoo Linux Developer
C4DD 695F A713 8F24 2AA1 5638 5849 7EE5 1D5D 74A5
Re: News item: Multiple root kernel command-line arguments
On Thu, 2020-08-06 at 17:14 +0200, Thomas Deutschmann wrote:
> On 2020-08-06 14:22, Michał Górny wrote:
> > > - I am not 100% happy with the title but the 50 char limit
> > > doesn't allow any more details.
> >
> > Yes, the title doesn't say a thing why would anyone want to read this
> > news item or not.
>
> Maybe
>
> > Be aware of possible reboot problems
>
> instead?
>
>
> > > - No Display-If condition because it is neither a genkernel nor
> > > kexec-tools issue. We maybe even have additional packages
> > > which are appending to kernel command-line I am not aware of.
> >
> > Showing this news on all old and new Gentoo systems makes little sense.
> > Either someone is newly affected, then Display-if should determine it,
> > or someone is *not* newly affected, then you're either telling him
> > something he already knows or something that is of little value to him.
> >
> > News items should be precisely this -- news. Not random pieces of
> > information you've just discovered and want to share with everyone.
> > This is what documentation is about, and it should be in some kernel-
> > related piece of documentation (handbook?) and not scattered around
> > in news items.
>
> Sure, you are basically repeating what I wrote in my prolog.
>
> But the reason why I drafted that news item despite of this is the
> consideration that an unbootable system outweigh the risk to waste
> anyone's time to read something even if they are not affected. Note that
> news items will appear through multiple channels. So if this will help
> someone who didn't read documentation before or just didn't realize the
> obvious risk he/she is taking when using non-persistent names ("It
> worked that way for me past 15 years!") I believe it has served its purpose.

I'm not sure if you've noticed but there are people actively working
towards removing stale news items and trying not to dump everything
at once on a user freshly installing the system. Don't you consider
this a worthwhile goal?


--
Best regards,
Michał Górny
Re: News item: Multiple root kernel command-line arguments
On 2020-08-06 17:44, Michał Górny wrote:
> I'm not sure if you've noticed but there are people actively working
> towards removing stale news items and trying not to dump everything
> on once on a user freshly installing the system. Don't you consider
> this a worthwhile goal?

I don't see how this is conflicting.

This news item can probably go away after 1-2 years.

But for now, people who were just lucky will probably trigger this on
their first reboot after upgrading to genkernel-4.1, due to the
switched device manager.

But again: It's not a genkernel issue, so displaying that only for
people who have genkernel installed would miss a bunch of users.


--
Regards,
Thomas Deutschmann / Gentoo Linux Developer
C4DD 695F A713 8F24 2AA1 5638 5849 7EE5 1D5D 74A5
Re: News item: Multiple root kernel command-line arguments
On Thu, Aug 06, 2020 at 05:59:14PM +0200, Thomas Deutschmann wrote:
> On 2020-08-06 17:44, Michał Górny wrote:
> > I'm not sure if you've noticed but there are people actively working
> > towards removing stale news items and trying not to dump everything
> > on once on a user freshly installing the system. Don't you consider
> > this a worthwhile goal?
>
> I don't see how this is conflicting.
>
> This news item can probably go away after 1-2 years.
>
> But for now, people who were just lucky will probably trigger this when
> upgrading to genkernel-4.1 on their first reboot due to switched device
> manager.
>
> But again: It's not a genkernel issue, so displaying that only for
> people who have genkernel installed would miss a bunch of users.
>

Wait, changes were made to genkernel to switch from mdev to (e)udev
which causes breakage, but it is *not* an issue with genkernel?

Aside from this, do we have any evidence or bugs validating that users
experience breakage with randomly named boot devices in kexec?

It is great that you found an issue, but why try and be agnostic as to
which one caused the issue? It looks worse that we cannot simply say:

"genkernel changed for the better and things *may* break now... please
read this!"

Instead, we are pushing a news item to a lot of people simply because we
*assume* it may be an issue for others with no evidence.

--
Cheers,
Aaron
Re: News item: Multiple root kernel command-line arguments
On Thu, Aug 6, 2020 at 11:59 AM Thomas Deutschmann <whissi@gentoo.org> wrote:
>
> On 2020-08-06 17:44, Michał Górny wrote:
> > I'm not sure if you've noticed but there are people actively working
> > towards removing stale news items and trying not to dump everything
> > on once on a user freshly installing the system. Don't you consider
> > this a worthwhile goal?
>
> I don't see how this is conflicting.
>
> This news item can probably go away after 1-2 years.
>
> But for now, people who were just lucky will probably trigger this when
> upgrading to genkernel-4.1 on their first reboot due to switched device
> manager.
>
> But again: It's not a genkernel issue, so displaying that only for
> people who have genkernel installed would miss a bunch of users.

I would guess that most users do not utilize kexec at all, and this
news item is irrelevant for them.

Personally, I agree that this is not worth spamming every Gentoo user.
Re: News item: Multiple root kernel command-line arguments
On Thu, Aug 6, 2020 at 1:41 PM Mike Gilbert <floppym@gentoo.org> wrote:
>
> On Thu, Aug 6, 2020 at 11:59 AM Thomas Deutschmann <whissi@gentoo.org> wrote:
> >
> > On 2020-08-06 17:44, Michał Górny wrote:
> > > I'm not sure if you've noticed but there are people actively working
> > > towards removing stale news items and trying not to dump everything
> > > on once on a user freshly installing the system. Don't you consider
> > > this a worthwhile goal?
> >
> > I don't see how this is conflicting.
> >
> > This news item can probably go away after 1-2 years.
> >
> > But for now, people who were just lucky will probably trigger this when
> > upgrading to genkernel-4.1 on their first reboot due to switched device
> > manager.
> >
> > But again: It's not a genkernel issue, so displaying that only for
> > people who have genkernel installed would miss a bunch of users.
>
> I would guess that most users do not utilize kexec at all, and this
> news item is irrelevant for them.
>
> Personally, I agree that this is not worth spamming every Gentoo user.
>

Has anything even changed with kexec? Or is this an issue that has
been an issue for many years in kexec, that will suddenly become an
issue in genkernel? In that case it is news from a genkernel
perspective, and something anybody with a correctly-booting system
fixed a long time ago if they're using kexec.

--
Rich
Re: News item: Multiple root kernel command-line arguments
On 2020-08-06 20:56, Rich Freeman wrote:
> Has anything even changed with kexec? Or is this an issue that has
> been an issue for many years in kexec, that will suddenly become an
> issue in genkernel? In that case it is news from a genkernel
> perspective, and something anybody with a correctly-booting system
> fixed a long time ago if they're using kexec.

Well, at first it was an annoyance I became aware of myself when I
noticed a system with dozens of root arguments in its kernel
command-line. I think we even talked about this in #gentoo-base a while
ago:

> # cat /proc/cmdline
> domdadm dolvm dosshd crypt_root=UUID=a-b-c-d root=UUID=e-f-g-h rootfs=xfs scandelay=3 root_trim=yes vga=0x317 gk.log.keep=/var/log/genkernel-boot.log root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2

...you can count how often the system was rebooted using kexec ;)
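
The accumulation shown above can be reproduced with a small Python
model. This is a sketch under the assumption that each kexec reboot
reuses the running command line and appends one root argument for the
currently detected device; the helper names and the shortened sample
command line are hypothetical.

```python
def kexec_reboot(cmdline: str, detected_root: str) -> str:
    """Model one reboot that reuses the command line and appends root=."""
    return f"{cmdline} root={detected_root}"


def appended_count(cmdline: str, device: str) -> int:
    """Count how often root=<device> occurs on the command line."""
    return cmdline.split().count(f"root={device}")


# Shortened sample boot entry with a persistent root value.
boot_entry = "root=UUID=e-f-g-h rootfs=xfs"

cmdline = boot_entry
for _ in range(3):
    cmdline = kexec_reboot(cmdline, "/dev/dm-2")

print(cmdline)
print(appended_count(cmdline, "/dev/dm-2"))  # one per modelled kexec reboot
```

The persistent value from the boot entry is still present, but it is no
longer the last root argument, so it is no longer the one in effect.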

This week I also received a bug report from a user who upgraded to
genkernel-4.1 where the first reboot failed but everything was working
after a reset (cold boot).

During my investigation I was able to trigger this myself, for
example when I close and re-open an LVM volume and trigger a new
symlink for the re-opened root volume (this sounds like a non-typical
use case for some people but when dealing with LVM backups/snapshots
it's not that uncommon).

So this became a bug for me in our kexec runscript which I fixed
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4860fce5434f46d90e913ff10515a9a256fc6c6a
and already warn about
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=61c03ffab76740c0420e3c8a3185d047d461f7a7

But note: This is not even a kexec issue per se. If you use kexec on
your own with your own scripts (which is also not that uncommon) you may
also be appending an additional root argument, which has the potential
to cause boot failures if you are using non-permanent device names and
something is different in the start environment.


--
Regards,
Thomas Deutschmann / Gentoo Linux Developer
C4DD 695F A713 8F24 2AA1 5638 5849 7EE5 1D5D 74A5
Re: News item: Multiple root kernel command-line arguments
Hi,

On 2020/08/06 21:45, Thomas Deutschmann wrote:
> On 2020-08-06 20:56, Rich Freeman wrote:
>> Has anything even changed with kexec? Or is this an issue that has
>> been an issue for many years in kexec, that will suddenly become an
>> issue in genkernel? In that case it is news from a genkernel
>> perspective, and something anybody with a correctly-booting system
>> fixed a long time ago if they're using kexec.
> Well, first it was an annoyance I became aware of myself when I noticed
> a system having dozen of root arguments in kernel command-line. I think
> we even talked about this in #gentoo-base a while ago:
>
>> # cat /proc/cmdline
>> domdadm dolvm dosshd crypt_root=UUID=a-b-c-d root=UUID=e-f-g-h rootfs=xfs scandelay=3 root_trim=yes vga=0x317 gk.log.keep=/var/log/genkernel-boot.log root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2 root=/dev/dm-2
> ...you can count how often the system was rebooted using kexec ;)
>
> This week I also received a bug report from a user who upgraded to
> genkernel-4.1 where first reboot failed but everything was working after
> a reset (cold boot).
>
> During my investigation I was able to trigger this by myself, for
> example when I close and re-open LVM volume and trigger new symlink for
> re-opened root volume (this sounds like a non-typical use case for some
> people but when dealing LVM backups/snapshots it's not that uncommon).
>
> So this became a bug for me in our kexec runscript which I fixed
> https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4860fce5434f46d90e913ff10515a9a256fc6c6a
> and already warn about
> https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=61c03ffab76740c0420e3c8a3185d047d461f7a7

Can you detect in the runscript that this is about to happen and issue
a cold reboot instead?

Having never used kexec before ... I may well be missing the point.  But
I'd rather have the system issue a cold reboot if kexec (which sounds
really cool in principle) stands any chance of failing.

Kind Regards,
Jaco
Re: News item: Multiple root kernel command-line arguments
On 2020-08-06 19:20, Aaron Bauman wrote:
> Wait, changes were made to genkernel to switch from mdev to (e)udev
> which causes breakage, but it is *not* an issue with genkernel?

Exactly.

This failure can happen with a genkernel version created 15 years ago,
with the new genkernel-4.1 which switched device manager, or even with
dracut -- the mistake is using non-permanent device names for things
like root.

I assume that most users don't do that. At least their default boot
entry in /boot/extlinux/extlinux.conf or via /etc/default/grub will have
a permanent name -- but the problem is tools/scripts appending to that
existing command-line. They will overwrite a good value...
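
One way a tool or script could avoid overwriting a good value is to
append a root argument only when none is set yet. The following Python
sketch illustrates that idea; `with_root` is a hypothetical helper and
does not reproduce the actual change made to the kexec runscript.

```python
def with_root(cmdline: str, detected_root: str) -> str:
    """Append root=<detected_root> only if no root argument exists yet."""
    has_root = any(arg.startswith("root=") for arg in cmdline.split())
    if has_root:
        # Keep the value from the boot entry instead of overriding it
        # with an auto-detected, possibly non-persistent device name.
        return cmdline
    return f"{cmdline} root={detected_root}".strip()


print(with_root("root=UUID=e-f-g-h rootfs=xfs", "/dev/dm-2"))
print(with_root("rootfs=xfs", "/dev/dm-2"))
```

With this guard, repeated invocations leave a command line that already
has a root argument unchanged.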

And it's even more of a problem because even when you notice "Ah,
something is appending a root argument" you won't question it, because
the value you notice matches your expectation from the POV of the
currently running system. So you have to realize that this is a
non-permanent value which could be different on the next boot because
you did X, which caused an offset in numbering for example...


> Aside from this, do we have any evidence or bugs validating that users
> experience breakage with randomly named boot devices in kexec?
>
> It is great that you found an issue, but why try and be agnostic as to
> which one caused the issue? It looks worse that we cannot simply say:
>
> "genkernel changed for the better and things *may* break now... please
> read this!"
>
> Instead, we are pushing a news item to a lot of people simply because we
> *assume* it may be an issue for others with no evidence.

Well, the purpose of this is to educate and avoid problems for
headless/server users. But if so many devs care about not pushing
possibly unrelated information and believe that avoiding that has much
more value than avoiding a problem like an unbootable system for just a
few people (and for headless/servers this is a major problem in case you
cannot trigger a remote reboot)... ¯\_(ツ)_/¯


--
Regards,
Thomas Deutschmann / Gentoo Linux Developer
C4DD 695F A713 8F24 2AA1 5638 5849 7EE5 1D5D 74A5
Re: News item: Multiple root kernel command-line arguments
On 8/6/20 10:58 PM, Thomas Deutschmann wrote:
> Well, the purpose of this is to educate and avoid problems for
> headless/server users. But if so many devs seem to care about pushing
> maybe unrelated information and believe that avoiding that has much more
> value than avoid a problem like an unbootable system for just a few
> people (and for headless/servers this is a major problem in case you
> cannot trigger remote reboot)... ¯\_(ツ)_/¯
>
Yeah let's break some setups and make people change distributions instead.

I'd support showing it. Weren't we all taught that too much
communication is better than no communication?

-- juippis
Re: News item: Multiple root kernel command-line arguments
On 8/5/20 5:02 AM, Thomas Deutschmann wrote:
> Hi,
>
> please review the news item below:
>

Not all arches support --reuse-cmdline btw.
It may be only x86 which supports it.

This looks like a candidate for a wiki page or a word of warning in
the handbook, not a news item.
Re: News item: Multiple root kernel command-line arguments
On 2020-08-06 21:50, Jaco Kroon wrote:
> Can you detect in the runscript that this will trigger and issue a cold
> reboot instead if this would trigger?
>
> Having never used kexec before ... I may well be missing the point.  But
> I'd rather have the system issue a cold reboot if kexec (which sounds
> really cool in principle) stands any chance of failing.

It's software. Of course you can do everything you can do in POSIX shell
script :)

But I doubt that something like this is practicable -- we also should
avoid overengineering.

For example, I was wondering whether we should teach the kexec runscript
to return a persistent name instead (utilizing lsblk for example), but
this raises questions like what to do if the tools aren't available, and
maybe the user's start environment can't even handle a root=UUID=...
value :/
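
The idea and its two open questions can be sketched in Python. This is
not how the kexec runscript works; the function names and the
`initramfs_handles_uuid` switch are assumptions made for illustration,
and the sketch simply falls back to the plain device name when lsblk is
missing or reports no UUID.

```python
import shutil
import subprocess
from typing import Optional


def lookup_uuid(device: str) -> Optional[str]:
    """Ask lsblk for the filesystem UUID of device; None if unavailable."""
    if shutil.which("lsblk") is None:
        return None  # tool is not installed
    result = subprocess.run(
        ["lsblk", "-d", "-n", "-o", "UUID", device],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None  # device unknown or lsblk failed
    fields = result.stdout.split()
    return fields[0] if fields else None


def root_argument(device: str, uuid: Optional[str],
                  initramfs_handles_uuid: bool = True) -> str:
    """Prefer root=UUID=... and fall back to the plain device name."""
    if uuid and initramfs_handles_uuid:
        return f"root=UUID={uuid}"
    return f"root={device}"
```

A caller would combine both, for example
`root_argument(dev, lookup_uuid(dev))`. Whether the start environment
can resolve a UUID has to be known by the caller; the sketch cannot
detect that.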


--
Regards,
Thomas Deutschmann / Gentoo Linux Developer
C4DD 695F A713 8F24 2AA1 5638 5849 7EE5 1D5D 74A5
Re: News item: Multiple root kernel command-line arguments
On Thu, 2020-08-06 at 23:03 +0300, Joonas Niilola wrote:
> On 8/6/20 10:58 PM, Thomas Deutschmann wrote:
> > Well, the purpose of this is to educate and avoid problems for
> > headless/server users. But if so many devs seem to care about pushing
> > maybe unrelated information and believe that avoiding that has much more
> > value than avoid a problem like an unbootable system for just a few
> > people (and for headless/servers this is a major problem in case you
> > cannot trigger remote reboot)... ¯\_(ツ)_/¯
> >
> Yeah let's break some setups and make people change distributions instead.
>
> I'd support showing it. Weren't we all taught that too much
> communication is better than no communication?
>

That's actually bullshit. Too much noise leads to people no longer
reading stuff, and missing important info as a result. Compare: our
mailing lists.

--
Best regards,
Michał Górny
Re: News item: Multiple root kernel command-line arguments
On Thu, Aug 6, 2020 at 4:19 PM Michał Górny <mgorny@gentoo.org> wrote:
>
> On Thu, 2020-08-06 at 23:03 +0300, Joonas Niilola wrote:
> > On 8/6/20 10:58 PM, Thomas Deutschmann wrote:
> > > Well, the purpose of this is to educate and avoid problems for
> > > headless/server users. But if so many devs seem to care about pushing
> > > maybe unrelated information and believe that avoiding that has much more
> > > value than avoid a problem like an unbootable system for just a few
> > > people (and for headless/servers this is a major problem in case you
> > > cannot trigger remote reboot)... ¯\_(ツ)_/¯
> > >
> > Yeah let's break some setups and make people change distributions instead.
> >
> > I'd support showing it. Weren't we all taught that too much
> > communication is better than no communication?
> >
>
> That's actually bullshit. Too much noise leads to people stopping to
> read stuff, and losing important info as a result. Compare: our mailing
> lists.

Well, we could solve that problem and the Foundation funding
challenges by switching the mailing list to a paid-superchat system.

The only thing that might bother some would be having to give me a
slot on the sponsor's page, but then again I would be paying your
salary at that point. :)

More seriously, I think the simplest compromise is to just display the
news item to those with the relevant genkernel versions installed.
While the general principle of using UUIDs and such in boot lines is
important, this is better placed in the handbook, wiki, and so on.
The news should focus on what is actually changing, IMO.

--
Rich