Mailing List Archive

block: sleeping in atomic warnings
These are static checker warnings from Smatch. The line numbers are
based on next-20230203. To reproduce these warnings then you need to
have the latest Smatch from git and you need to rebuild the cross
function probably four times. I have reviewed these. The first few
seem like real issues. I can't make heads or tails out of the
__blk_mq_run_hw_queue() warning. I suspect that the last warning is a
false positive. I remember I reported some a while back but never
heard back. https://lore.kernel.org/all/YNx1r8Jr3+t4bch%2F@mwanda/

regards,
dan carpenter

block/blk-crypto-profile.c:382 __blk_crypto_evict_key() warn: sleeping in atomic context
block/blk-crypto-profile.c:390 __blk_crypto_evict_key() warn: sleeping in atomic context
put_super() <- disables preempt
__iterate_supers() <- disables preempt
iterate_supers() <- disables preempt
iterate_supers_type() <- disables preempt
get_super() <- disables preempt
user_get_super() <- disables preempt
-> __put_super()
-> fscrypt_destroy_keyring()
-> fscrypt_put_master_key_activeref()
-> fscrypt_destroy_prepared_key()
-> fscrypt_destroy_inline_crypt_key()
-> blk_crypto_evict_key()
blk_crypto_evict_key() <duplicate>
-> blk_crypto_fallback_evict_key()
-> __blk_crypto_evict_key()

block/blk-mq.c:206 blk_freeze_queue() warn: sleeping in atomic context
rexmit_timer() <- disables preempt
-> aoedev_downdev()
-> blk_mq_freeze_queue()
-> blk_freeze_queue()

block/blk-mq.c:4083 blk_mq_destroy_queue() warn: sleeping in atomic context
nvme_fc_match_disconn_ls() <- disables preempt
-> nvme_fc_ctrl_put()
-> nvme_fc_ctrl_free()
-> nvme_remove_admin_tag_set()
nvme_fc_ctrl_free() <duplicate>
-> nvme_remove_io_tag_set()
-> blk_mq_destroy_queue()

block/blk-mq.c:2174 __blk_mq_run_hw_queue() warn: sleeping in atomic context
__blk_mq_run_hw_queue() <duplicate>
-> blk_mq_sched_dispatch_requests()
-> __blk_mq_sched_dispatch_requests()
-> blk_mq_do_dispatch_sched()
blk_mq_do_dispatch_sched() <duplicate>
-> __blk_mq_do_dispatch_sched()
-> blk_mq_dispatch_hctx_list()
__blk_mq_do_dispatch_sched() <duplicate>
__blk_mq_sched_dispatch_requests() <duplicate>
-> blk_mq_do_dispatch_ctx()
__blk_mq_sched_dispatch_requests() <duplicate>
-> blk_mq_dispatch_rq_list()
__blk_mq_do_dispatch_sched() <duplicate>
blk_mq_do_dispatch_ctx() <duplicate>
-> blk_mq_delay_run_hw_queues()
-> blk_mq_delay_run_hw_queue()
sg_remove_sfp_usercontext() <- disables preempt
-> sg_finish_rem_req()
dd_insert_requests() <- disables preempt
-> dd_insert_request()
-> blk_mq_free_requests()
mspro_block_complete_req() <- disables preempt
mspro_queue_rq() <- disables preempt
-> mspro_block_issue_req()
mspro_block_complete_req() <- disables preempt <duplicate>
aoe_flush_iocq_by_index() <- disables preempt
rexmit_timer() <- disables preempt
-> aoedev_downdev()
-> aoe_failip()
aoedev_downdev() <duplicate>
-> downdev_frame()
-> aoe_failbuf()
-> aoe_end_buf()
aoe_failip() <duplicate>
-> aoe_end_request()
-> __blk_mq_end_request()
-> blk_mq_free_request()
-> __blk_mq_free_request()
-> blk_mq_sched_restart()
-> __blk_mq_sched_restart()
blk_mq_sched_dispatch_requests() <duplicate>
blk_mq_dispatch_rq_list() <duplicate>
rexmit_timer() <- disables preempt <duplicate>
aoe_end_request() <duplicate>
bfq_finish_requeue_request() <- disables preempt
-> bfq_completed_request()
bfq_idle_slice_timer_body() <- disables preempt
bfq_pd_offline() <- disables preempt
-> bfq_put_async_queues()
-> __bfq_put_async_bfqq()
bfq_bio_merge() <- disables preempt
bfq_insert_request() <- disables preempt
-> bfq_init_rq()
-> bfq_bic_update_cgroup()
-> __bfq_bic_change_cgroup()
-> bfq_sync_bfqq_move()
bfq_pd_offline() <- disables preempt <duplicate>
-> bfq_reparent_active_queues()
-> bfq_reparent_leaf_entity()
-> bfq_bfqq_move()
-> bfq_schedule_dispatch()
nvme_fc_match_disconn_ls() <- disables preempt
-> nvme_fc_ctrl_put()
-> nvme_fc_ctrl_free()
-> nvme_unquiesce_admin_queue()
-> blk_mq_unquiesce_queue()
-> blk_mq_run_hw_queues()
virtblk_done() <- disables preempt
virtblk_poll() <- disables preempt
-> blk_mq_start_stopped_hw_queues()
-> blk_mq_start_stopped_hw_queue()
-> blk_mq_run_hw_queue()
-> __blk_mq_delay_run_hw_queue()
-> __blk_mq_run_hw_queue()

block/blk-wbt.c:843 wbt_init() warn: sleeping in atomic context
ioc_qos_write() <- disables preempt
-> wbt_enable_default()
-> wbt_init()
Re: block: sleeping in atomic warnings [ In reply to ]
On Tue, Feb 7, 2023 at 6:06 AM Dan Carpenter <error27@gmail.com> wrote:
>
> block/blk-crypto-profile.c:382 __blk_crypto_evict_key() warn: sleeping in atomic context
> block/blk-crypto-profile.c:390 __blk_crypto_evict_key() warn: sleeping in atomic context

Yeah, that looks very real, but doesn't really seem to be a block bug.

__put_super() has a big comment that it's called under the sb_lock
spinlock, so it's all in atomic context, but then:

> -> __put_super()
> -> fscrypt_destroy_keyring()
> -> fscrypt_put_master_key_activeref()
> -> fscrypt_destroy_prepared_key()
> -> fscrypt_destroy_inline_crypt_key()
> -> blk_crypto_evict_key()

and we have a comment in __blk_crypto_evict_key() that it must be
called in "process context".

However, the *normal* unmount sequence does all the cleanup *before*
it gets sb_lock, and calls fscrypt_destroy_keyring() in process
context, which is probably why it never triggers in practice, because
the "last put" is normally there, not in __put_super.

Eric? Al?

It smells like __put_super() may need to do some parts delayed, not
under sb_lock.

Linus
Re: block: sleeping in atomic warnings [ In reply to ]
On Tue, Feb 07, 2023 at 08:15:04AM -0800, Linus Torvalds wrote:
> On Tue, Feb 7, 2023 at 6:06 AM Dan Carpenter <error27@gmail.com> wrote:
> >
> > block/blk-crypto-profile.c:382 __blk_crypto_evict_key() warn: sleeping in atomic context
> > block/blk-crypto-profile.c:390 __blk_crypto_evict_key() warn: sleeping in atomic context
>
> Yeah, that looks very real, but doesn't really seem to be a block bug.
>
> __put_super() has a big comment that it's called under the sb_lock
> spinlock, so it's all in atomic context, but then:
>
> > -> __put_super()
> > -> fscrypt_destroy_keyring()
> > -> fscrypt_put_master_key_activeref()
> > -> fscrypt_destroy_prepared_key()
> > -> fscrypt_destroy_inline_crypt_key()
> > -> blk_crypto_evict_key()
>
> and we have a comment in __blk_crypto_evict_key() that it must be
> called in "process context".
>
> However, the *normal* unmount sequence does all the cleanup *before*
> it gets sb_lock, and calls fscrypt_destroy_keyring() in process
> context, which is probably why it never triggers in practice, because
> the "last put" is normally there, not in __put_super.
>
> Eric? Al?
>
> It smells like __put_super() may need to do some parts delayed, not
> under sb_lock.
>

It's a false positive. See the comment above fscrypt_destroy_keyring(), which
is meant to explain this, though I can update the comment to be clearer. If the
filesystem has been mounted, then fscrypt_destroy_keyring() is called from
generic_shutdown_super(), which can sleep, and the call from __put_super() is a
no-op. If the filesystem has not been mounted, then the call from __put_super()
is needed, but blk_crypto_evict_key() can never be executed in that case.

- Eric
Re: block: sleeping in atomic warnings [ In reply to ]
On 2/7/23 7:06?AM, Dan Carpenter wrote:
> block/blk-mq.c:206 blk_freeze_queue() warn: sleeping in atomic context
> rexmit_timer() <- disables preempt
> -> aoedev_downdev()
> -> blk_mq_freeze_queue()
> -> blk_freeze_queue()

That is definitely a legit bug, aoe should punt to a workqueue or
similar.

> block/blk-mq.c:4083 blk_mq_destroy_queue() warn: sleeping in atomic context
> nvme_fc_match_disconn_ls() <- disables preempt
> -> nvme_fc_ctrl_put()
> -> nvme_fc_ctrl_free()
> -> nvme_remove_admin_tag_set()
> nvme_fc_ctrl_free() <duplicate>
> -> nvme_remove_io_tag_set()
> -> blk_mq_destroy_queue()

Also looks like a legitimate bug.

> block/blk-mq.c:2174 __blk_mq_run_hw_queue() warn: sleeping in atomic context
> __blk_mq_run_hw_queue() <duplicate>
> -> blk_mq_sched_dispatch_requests()
> -> __blk_mq_sched_dispatch_requests()
> -> blk_mq_do_dispatch_sched()
> blk_mq_do_dispatch_sched() <duplicate>
> -> __blk_mq_do_dispatch_sched()
> -> blk_mq_dispatch_hctx_list()
> __blk_mq_do_dispatch_sched() <duplicate>
> __blk_mq_sched_dispatch_requests() <duplicate>
> -> blk_mq_do_dispatch_ctx()
> __blk_mq_sched_dispatch_requests() <duplicate>
> -> blk_mq_dispatch_rq_list()
> __blk_mq_do_dispatch_sched() <duplicate>
> blk_mq_do_dispatch_ctx() <duplicate>
> -> blk_mq_delay_run_hw_queues()
> -> blk_mq_delay_run_hw_queue()

This one I'm not really following... Would it be possible to expand the
reporting to be a bit more verbose?

> sg_remove_sfp_usercontext() <- disables preempt
> block/blk-wbt.c:843 wbt_init() warn: sleeping in atomic context
> ioc_qos_write() <- disables preempt
> -> wbt_enable_default()
> -> wbt_init()

Also definitely a bug.

--
Jens Axboe
Re: block: sleeping in atomic warnings [ In reply to ]
On Tue, Feb 07, 2023 at 10:24:45AM -0800, Linus Torvalds wrote:
> On Tue, Feb 7, 2023 at 9:53 AM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > It's a false positive. See the comment above fscrypt_destroy_keyring()
>
> Hmm. Ok. Unfortunate.
>
> > If the filesystem has not been mounted, then the call from __put_super()
> > is needed, but blk_crypto_evict_key() can never be executed in that case.
>
> It's not all that clear that some *other* error might not have
> happened to keep the mount from actually succeeding, but after the
> keys have been instantiated?
>
> IOW, what's the thing that makes "blk_crypto_evict_key() can never be
> executed in that case" be obvious?
>
> I think _that_ is what might want a comment, about how we always call
> generic_shutdown_super() before the last put_super() happens.
>
> It does seem like Dan's automated checks could be useful, but if
> there's no sane way to avoid the false positives, it's always going to
> be a lot of noise ;(
>

blk_crypto_evict_key() only runs if a key was prepared for inline encryption,
which can only happen if a user does I/O to an encrypted file. That can only
happen after the filesystem was successfully mounted.

Also note that keys are normally added using an ioctl, which can only be
executed after the filesystem was mounted. The only exception is the key
associated with the "test_dummy_encryption" mount option.

By the way, the following code is in generic_shutdown_super(), and not in
__put_super(), for a very similar reason:

if (sb->s_dio_done_wq) {
destroy_workqueue(sb->s_dio_done_wq);
sb->s_dio_done_wq = NULL;
}

That code is only needed if there has been user I/O to the filesystem, which
again can only have happened if the filesystem was successfully mounted.

- Eric
Re: block: sleeping in atomic warnings [ In reply to ]
On Tue, Feb 7, 2023 at 9:53 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> It's a false positive. See the comment above fscrypt_destroy_keyring()

Hmm. Ok. Unfortunate.

> If the filesystem has not been mounted, then the call from __put_super()
> is needed, but blk_crypto_evict_key() can never be executed in that case.

It's not all that clear that some *other* error might not have
happened to keep the mount from actually succeeding, but after the
keys have been instantiated?

IOW, what's the thing that makes "blk_crypto_evict_key() can never be
executed in that case" be obvious?

I think _that_ is what might want a comment, about how we always call
generic_shutdown_super() before the last put_super() happens.

It does seem like Dan's automated checks could be useful, but if
there's no sane way to avoid the false positives, it's always going to
be a lot of noise ;(

Linus
Re: block: sleeping in atomic warnings [ In reply to ]
On Tue, Feb 7, 2023 at 10:36 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> Also note that keys are normally added using an ioctl, which can only be
> executed after the filesystem was mounted. The only exception is the key
> associated with the "test_dummy_encryption" mount option.

Could we perhaps then replace the

fscrypt_destroy_keyring(s);

with a more specific

fscrypt_destroy_dummy_keyring(s);

thing, that would only handle the dummy encryption case?

Ot could we just *fix* the dummy encryption test to actually work like
real encryption cases, so that it doesn't have this bogus case?


> By the way, the following code is in generic_shutdown_super(), and not in
> __put_super(), for a very similar reason:

Well, but that isn't a problem, exactly because that code isn't in
__put_super().

Put another way: the problem with the dummy encryption appears to be
exactly that it doesn't actually act like any real encryption would,
and then triggers that "this whole sequence gets called even under the
spinlock, even though it's documented to not be valid for that case,
because we added a bogus test-case that doesn't actually match
reality".

Looking at Jens' reply to the other cases, Dan's tool seems to be on
the money here except for this self-inflicted bogus crypto key thing.

Linus
Re: block: sleeping in atomic warnings [ In reply to ]
On Tue, Feb 7, 2023 at 10:57 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Looking at Jens' reply to the other cases, Dan's tool seems to be on
> the money here except for this self-inflicted bogus crypto key thing.

Oh, except you probably wouldn't have seen it, because Jens replied to
the original message where you weren't cc'd.

So here's the context for you wrt the other parts of the report:

https://lore.kernel.org/all/4321724d-9a24-926c-5d2d-5d5d902bda72@kernel.dk/

where there isn't that same indirection through the super-block, but
more of a clear-cut "this is all in the block layer".

Linus
Re: block: sleeping in atomic warnings [ In reply to ]
On Tue, Feb 07, 2023 at 10:57:08AM -0800, Linus Torvalds wrote:
> On Tue, Feb 7, 2023 at 10:36 AM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > Also note that keys are normally added using an ioctl, which can only be
> > executed after the filesystem was mounted. The only exception is the key
> > associated with the "test_dummy_encryption" mount option.
>
> Could we perhaps then replace the
>
> fscrypt_destroy_keyring(s);
>
> with a more specific
>
> fscrypt_destroy_dummy_keyring(s);
>
> thing, that would only handle the dummy encryption case?


Sure, they would still need to do most of the same things though.

> Or could we just *fix* the dummy encryption test to actually work like
> real encryption cases, so that it doesn't have this bogus case?

We've wanted to do that for a very long time, but there never has been a way to
actually do it. Especially with the filesystem-level keyring now, if the kernel
doesn't automatically add the key for test_dummy_encryption, then userspace
would have to do it *every time it mounts the filesystem*.

The point of the "test_dummy_encryption" mount option is that you can just add
it to the mount options and run existing tests, such as a full run of xfstests,
and test all the encrypted I/O paths that way. Which is extremely useful; it
wouldn't really be possible to properly test the encryption feature without it.

So that's why we've gone through some pain to keep "test_dummy_encryption"
working over time.

Now, it's possible that "the kernel automatically adds the key for
test_dummy_encryption" could be implemented a bit differently. It maybe could
be done at the last minute, when the key is being looked for due to a user
filesystem operation, instead of during the mount itself. That would eliminate
the need to call fscrypt_destroy_keyring() from __put_super(), which would avoid
the issue being discussed here. I'll see if there's a good way to do that.

- Eric
Re: block: sleeping in atomic warnings [ In reply to ]
On Tue, Feb 7, 2023 at 11:35 AM Eric Biggers <ebiggers@kernel.org> wrote:
>
> The point of the "test_dummy_encryption" mount option is that you can just add
> it to the mount options and run existing tests, such as a full run of xfstests,
> and test all the encrypted I/O paths that way. Which is extremely useful; it
> wouldn't really be possible to properly test the encryption feature without it.

Yes, I see how useful that is, but:

> Now, it's possible that "the kernel automatically adds the key for
> test_dummy_encryption" could be implemented a bit differently. It maybe could
> be done at the last minute, when the key is being looked for due to a user
> filesystem operation, instead of during the mount itself.

Yeah, that sounds like it would be the right solution, and get rid of
the fscrypt_destroy_keyring() case in __put_super().

Hmm.

I guess the filesystem only ever sees the '->get_tree()' call, and
then never gets any "this mount succeeded" callback.

And we do have at least that

error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
if (unlikely(error)) {
fc_drop_locked(fc);
return error;
}

error case that does fc_drop_locked() -> deactivate_locked_super() ->
put_super().

Hmm. It does get "kill_sb()", if that happens, so if

(a) the filesystem registers the keys late only in the success case

and

(b) ->kill_sb() does the fscrypt_destroy_keyring(s)

then I *think* everything would be fine, and put_super() doesn't need to do it.

Or am I missing some case?

Linus
Re: block: sleeping in atomic warnings [ In reply to ]
Hi,

? 2023/02/08 2:31, Jens Axboe ??:

>> block/blk-wbt.c:843 wbt_init() warn: sleeping in atomic context
>> ioc_qos_write() <- disables preempt
>> -> wbt_enable_default()
>> -> wbt_init()
>
> Also definitely a bug.
>

This won't happen currently, wbt_init() will be called while
initializing device, and later wbt_enable_devault() from iocost won't
call wbt_init().

However, we might support rq_qos destruction dynamically in the
future, I'll fix this warning.

Thanks,
Kuai
Re: block: sleeping in atomic warnings [ In reply to ]
On Tue, Feb 07, 2023 at 07:35:54PM +0000, Eric Biggers wrote:
> Now, it's possible that "the kernel automatically adds the key for
> test_dummy_encryption" could be implemented a bit differently. It maybe could
> be done at the last minute, when the key is being looked for due to a user
> filesystem operation, instead of during the mount itself. That would eliminate
> the need to call fscrypt_destroy_keyring() from __put_super(), which would avoid
> the issue being discussed here. I'll see if there's a good way to do that.

"[PATCH 0/5] Add the test_dummy_encryption key on-demand"
(https://lore.kernel.org/linux-fscrypt/20230208062107.199831-1-ebiggers@kernel.org/T/#u)
implements this.

- Eric