Mailing List Archive

Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
On 3/14/23 21:25, Marc Smith wrote:
> On Mon, Feb 8, 2021 at 7:49 PM Guoqing Jiang
> <guoqing.jiang@cloud.ionos.com> wrote:
>> Hi Donald,
>>
>> On 2/8/21 19:41, Donald Buczek wrote:
>>> Dear Guoqing,
>>>
>>> On 08.02.21 15:53, Guoqing Jiang wrote:
>>>>
>>>> On 2/8/21 12:38, Donald Buczek wrote:
>>>>>> 5. maybe don't hold reconfig_mutex when try to unregister
>>>>>> sync_thread, like this.
>>>>>>
>>>>>> /* resync has finished, collect result */
>>>>>> mddev_unlock(mddev);
>>>>>> md_unregister_thread(&mddev->sync_thread);
>>>>>> mddev_lock(mddev);
>>>>> As above: While we wait for the sync thread to terminate, wouldn't it
>>>>> be a problem, if another user space operation takes the mutex?
>>>> I don't think other places can be blocked while hold mutex, otherwise
>>>> these places can cause potential deadlock. Please try above two lines
>>>> change. And perhaps others have better idea.
>>> Yes, this works. No deadlock after >11000 seconds,
>>>
>>> (Time till deadlock from previous runs/seconds: 1723, 37, 434, 1265,
>>> 3500, 1136, 109, 1892, 1060, 664, 84, 315, 12, 820 )
>> Great. I will send a formal patch with your reported-by and tested-by.
>>
>> Thanks,
>> Guoqing
> I'm still hitting this issue with Linux 5.4.229 -- it looks like 1/2
> of the patches that supposedly resolve this were applied to the stable
> kernels, however, one was omitted due to a regression:
> md: don't unregister sync_thread with reconfig_mutex held (upstream
> commit 8b48ec23cc51a4e7c8dbaef5f34ebe67e1a80934)
>
> I don't see any follow-up on the thread from June 8th 2022 asking for
> this patch to be dropped from all stable kernels since it caused a
> regression.
>
> The patch doesn't appear to be present in the current mainline kernel
> (6.3-rc2) either. So I assume this issue is still present there, or it
> was resolved differently and I just can't find the commit/patch.

It should be fixed by commit 9dfbdafda3b3 "md: unlock mddev before reap
sync_thread in action_store".

Thanks,
Guoqing
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
On Tue, Mar 14, 2023 at 9:55 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
> [...]
>
> It should be fixed by commit 9dfbdafda3b3 "md: unlock mddev before reap
> sync_thread in action_store".

Okay, let me try applying that patch... it does not appear to be
present in my 5.4.229 kernel source. Thanks.
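
(For anyone checking their own tree: a quick way to tell whether that fix is
already present is to search for its subject line, since stable backports keep
the subject even though the SHA differs. This is only a sketch and assumes you
build from a git checkout; the subject string is the one quoted above.)

  # run from the top of the kernel git tree being built
  git log --oneline --grep='md: unlock mddev before reap sync_thread in action_store'

  # if the tree also carries Linus' history, check the upstream commit directly
  git merge-base --is-ancestor 9dfbdafda3b3 HEAD && echo "commit present"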

--Marc


>
> Thanks,
> Guoqing
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
On 2023/03/14 21:55, Guoqing Jiang wrote:
>
>> [...]
>> I'm still hitting this issue with Linux 5.4.229 -- it looks like 1/2
>> of the patches that supposedly resolve this were applied to the stable
>> kernels, however, one was omitted due to a regression:
>> md: don't unregister sync_thread with reconfig_mutex held (upstream
>> commit 8b48ec23cc51a4e7c8dbaef5f34ebe67e1a80934)
Hi, Guoqing,

Just borrowing this thread to discuss: I think this commit might have
a problem in some corner cases:

t1:                t2:
action_store
 mddev_lock
  if (mddev->sync_thread)
   mddev_unlock
   md_unregister_thread
                md_check_recovery
                 set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
                 queue_work(md_misc_wq, &mddev->del_work)
   mddev_lock_nointr
   md_reap_sync_thread
   // clear running
 mddev_lock

t3:
md_start_sync
// running is not set

Our test reported a problem that could, in theory, be caused by this, but we
can't be sure for now...

We thought about how to fix this: instead of calling
md_register_thread() here to wait for sync_thread to be done
synchronously, we could do this asynchronously, like md_set_readonly() and
do_md_stop() do.

What do you think?

Thanks,
Kuai
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Hi,

I can just comment that the simple patch I proposed at https://lore.kernel.org/linux-raid/bc342de0-98d2-1733-39cd-cc1999777ff3@molgen.mpg.de/ works for us with several different kernel versions and currently 195 raid6 JBODs on 105 systems, going through several "idle->sync->idle" transitions each month, for over two years now.

So if you suffer from the problem and are able to add patches to the kernel you use, you might give it a try.
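
(The test script itself is not included in this archive; what it exercises is
the sync_action transition described above. The loop below is only an
illustrative sketch -- the md0 device name, delays and iteration count are
assumptions, not the actual script used in this thread:)

  #!/bin/sh
  # Repeatedly start a "check" and then force it back to "idle"; on affected
  # kernels the "idle" write eventually hangs as described in this thread.
  MD=/sys/block/md0/md

  for i in $(seq 1 1000); do
      echo check > "$MD/sync_action"
      sleep 30                                 # let the check run for a while
      echo idle > "$MD/sync_action"            # the problematic transition
      while [ "$(cat $MD/sync_action)" != "idle" ]; do
          sleep 1                              # wait until the array settles
      done
  done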

Best
Donald

On 3/14/23 14:25, Marc Smith wrote:
> [...]
>
> I'm still hitting this issue with Linux 5.4.229 -- it looks like 1/2
> of the patches that supposedly resolve this were applied to the stable
> kernels, however, one was omitted due to a regression:
> md: don't unregister sync_thread with reconfig_mutex held (upstream
> commit 8b48ec23cc51a4e7c8dbaef5f34ebe67e1a80934)
>
> I don't see any follow-up on the thread from June 8th 2022 asking for
> this patch to be dropped from all stable kernels since it caused a
> regression.
>
> The patch doesn't appear to be present in the current mainline kernel
> (6.3-rc2) either. So I assume this issue is still present there, or it
> was resolved differently and I just can't find the commit/patch.
>
> I can induce the issue by using Donald's script above which will
> eventually result in hangs:
> ...
> [147948.504621] INFO: task md_test_2.sh:68033 blocked for more than 122 seconds.
> [147948.504624] Tainted: P OE 5.4.229-esos.prod #1
> [147948.504624] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [147948.504625] md_test_2.sh D 0 68033 1 0x00000004
> [147948.504627] Call Trace:
> [147948.504634] __schedule+0x4ab/0x4f3
> [147948.504637] ? usleep_range+0x7a/0x7a
> [147948.504638] schedule+0x67/0x81
> [147948.504639] schedule_timeout+0x2c/0xe5
> [147948.504643] ? do_raw_spin_lock+0x2b/0x52
> [147948.504644] __wait_for_common+0xc4/0x13a
> [147948.504647] ? wake_up_q+0x40/0x40
> [147948.504649] kthread_stop+0x9a/0x117
> [147948.504653] md_unregister_thread+0x43/0x4d
> [147948.504655] md_reap_sync_thread+0x1c/0x1d5
> [147948.504657] action_store+0xc9/0x284
> [147948.504658] md_attr_store+0x9f/0xb8
> [147948.504661] kernfs_fop_write+0x10a/0x14c
> [147948.504664] vfs_write+0xa0/0xdd
> [147948.504666] ksys_write+0x71/0xba
> [147948.504668] do_syscall_64+0x52/0x60
> [147948.504671] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
> ...
> [147948.504748] INFO: task md120_resync:135315 blocked for more than
> 122 seconds.
> [147948.504749] Tainted: P OE 5.4.229-esos.prod #1
> [147948.504749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [147948.504749] md120_resync D 0 135315 2 0x80004000
> [147948.504750] Call Trace:
> [147948.504752] __schedule+0x4ab/0x4f3
> [147948.504754] ? printk+0x53/0x6a
> [147948.504755] schedule+0x67/0x81
> [147948.504756] md_do_sync+0xae7/0xdd9
> [147948.504758] ? remove_wait_queue+0x41/0x41
> [147948.504759] md_thread+0x128/0x151
> [147948.504761] ? _raw_spin_lock_irqsave+0x31/0x5d
> [147948.504762] ? md_start_sync+0xdc/0xdc
> [147948.504763] kthread+0xe4/0xe9
> [147948.504764] ? kthread_flush_worker+0x70/0x70
> [147948.504765] ret_from_fork+0x35/0x40
> ...
>
> This happens on 'raid6' MD RAID arrays that initially have sync_action==resync.
>
> Any guidance would be greatly appreciated.
>
> --Marc

--
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
On 3/15/23 11:02, Yu Kuai wrote:
>
>>> [...]
>>> I'm still hitting this issue with Linux 5.4.229 -- it looks like 1/2
>>> of the patches that supposedly resolve this were applied to the stable
>>> kernels, however, one was omitted due to a regression:
>>> md: don't unregister sync_thread with reconfig_mutex held (upstream
>>> commit 8b48ec23cc51a4e7c8dbaef5f34ebe67e1a80934)
> Hi, Guoqing,
>
> Just borrow this thread to discuss, I think this commit might have
> problem in some corner cases:
>
> t1:                t2:
> action_store
>  mddev_lock
>   if (mddev->sync_thread)
>    mddev_unlock
>    md_unregister_thread
>                 md_check_recovery
>                  set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
>                  queue_work(md_misc_wq, &mddev->del_work)
>    mddev_lock_nointr
>    md_reap_sync_thread
>    // clear running
>  mddev_lock
>
> t3:
> md_start_sync
> // running is not set

What does 'running' mean? MD_RECOVERY_RUNNING?

> Our test report a problem that can be cause by this in theory, by we
> can't be sure for now...

I guess you tried to describe a race between

action_store -> md_register_thread

and

md_start_sync -> md_register_thread

Didn't you already fix them in the series?

[PATCH -next 0/5] md: fix uaf for sync_thread

Sorry, I didn't follow the problem and also your series, I might try your
test with latest mainline kernel if the test is available somewhere.

> We thought about how to fix this, instead of calling
> md_register_thread() here to wait for sync_thread to be done
> synchronisely,

IMO, md_register_thread just creates and wakes a thread; I'm not sure why it
would wait for sync_thread.

> we do this asynchronously like what md_set_readonly() and 
> do_md_stop() does.

Still, I don't have a clear picture of the problem, so I can't judge it.

Thanks,
Guoqing
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Hi,

On 2023/03/15 17:30, Guoqing Jiang wrote:
>
> [...]
>
> Still, I don't have clear picture about the problem, so I can't judge it.
>

Sorry that I didn't explain the problem clearly. Let me describe the
problem we hit first:

1) raid10d is waiting for sync_thread to stop:
raid10d
md_unregister_thread
kthread_stop

2) sync_thread is waiting for io to finish:
md_do_sync
wait_event(... atomic_read(&mddev->recovery_active) == 0)

3) io is waiting for raid10d to finish (online crash analysis found 2 IOs in
conf->retry_list)

Additional information from the online crash:
mddev->recovery = 29, // DONE, RUNNING, INTR are set

PID: 138293 TASK: ffff0000de89a900 CPU: 7 COMMAND: "md0_resync"
#0 [ffffa00107c178a0] __switch_to at ffffa0010001d75c
#1 [ffffa00107c178d0] __schedule at ffffa001017c7f14
#2 [ffffa00107c179f0] schedule at ffffa001017c880c
#3 [ffffa00107c17a20] md_do_sync at ffffa0010129cdb4
#4 [ffffa00107c17d50] md_thread at ffffa00101290d9c
#5 [ffffa00107c17e50] kthread at ffffa00100187a74

PID: 138294 TASK: ffff0000eba13d80 CPU: 5 COMMAND: "md0_resync"
#0 [ffffa00107e47a60] __switch_to at ffffa0010001d75c
#1 [ffffa00107e47a90] __schedule at ffffa001017c7f14
#2 [ffffa00107e47bb0] schedule at ffffa001017c880c
#3 [ffffa00107e47be0] schedule_timeout at ffffa001017d1298
#4 [ffffa00107e47d50] md_thread at ffffa00101290ee8
#5 [ffffa00107e47e50] kthread at ffffa00100187a74
// there are two sync_thread for md0

I believe the root cause is that two sync_threads exist for the same
mddev, and this is how I think this is possible:

t1:                t2:
action_store
 mddev_lock
  if (mddev->sync_thread)
   mddev_unlock
   md_unregister_thread
   // first sync_thread is done
                md_check_recovery
                 set_bit(MD_RECOVERY_RUNNING, &mddev->recovery)
                 queue_work(md_misc_wq, &mddev->del_work)
   mddev_lock_nointr
   md_reap_sync_thread
   // MD_RECOVERY_RUNNING is cleared
 mddev_unlock

t3:
md_start_sync
// second sync_thread is registered

t3:
md_check_recovery
 queue_work(md_misc_wq, &mddev->del_work)
// MD_RECOVERY_RUNNING is not set, a new sync_thread can be started

This is just a guess; I can't reproduce the problem yet. Please let me
know if you have any questions.
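
(A quick way to check a live system for this particular symptom -- more than
one resync kthread for the same array -- is shown below; "md0" is simply the
array from the crash output above, substitute your own:)

  # normally there is at most one mdX_resync kthread per array;
  # two of them matches the broken state described above
  ps -e -o pid,comm | grep md0_resync
  cat /proc/mdstat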

Thanks,
Kuai
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
On Tue, Mar 14, 2023 at 10:45 AM Marc Smith <msmith626@gmail.com> wrote:
>
> [...]
>
> > It should be fixed by commit 9dfbdafda3b3 "md: unlock mddev before reap
> > sync_thread in action_store".
>
> Okay, let me try applying that patch... it does not appear to be
> present in my 5.4.229 kernel source. Thanks.

Yes, applying this '9dfbdafda3b3 "md: unlock mddev before reap
sync_thread in action_store"' patch on top of vanilla 5.4.229 source
appears to fix the problem for me -- I can't reproduce the issue with
the script, and it's been running for >24 hours now. (Previously I was
able to induce the issue within a matter of minutes.)


Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
On Thu, Mar 16, 2023 at 8:25 AM Marc Smith <msmith626@gmail.com> wrote:
>
> [...]
>
> Yes, applying this '9dfbdafda3b3 "md: unlock mddev before reap
> sync_thread in action_store"' patch on top of vanilla 5.4.229 source
> appears to fix the problem for me -- I can't reproduce the issue with
> the script, and it's been running for >24 hours now. (Previously I was
> able to induce the issue within a matter of minutes.)

Hi Marc,

Could you please run your reproducer on the md-tmp branch?

https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-tmp

This contains a different version of the fix by Yu Kuai.
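
(For reference, one way to pull that branch into an existing kernel checkout is
sketched below; the remote name is arbitrary, and md-tmp is a temporary branch
that may be rebased or removed later:)

  git remote add song-md https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git
  git fetch song-md md-tmp
  git checkout -b md-tmp-test song-md/md-tmp
  # then build and boot this kernel and rerun the reproducer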

Thanks,
Song
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
On Tue, 3/28/23 17:01, Song Liu wrote:
>
> [...]
>
> Hi Marc,
>
> Could you please run your reproducer on the md-tmp branch?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-tmp
>
> This contains a different version of the fix by Yu Kuai.
>
> Thanks,
> Song

Hi Song, I can easily reproduce this issue on 5.10.133 and 5.10.53. The change
9dfbdafda3b3 ("md: unlock mddev before reap sync_thread in action_store")
does not fix the issue for me.

But I did pull the changes from the md-tmp branch you are referring to:
https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-tmp

I was not totally clear on which change exactly to pull, but I pulled
the following changes:
2023-03-28  md: enhance checking in md_check_recovery()                      Yu Kuai  1  -7/+15
2023-03-28  md: wake up 'resync_wait' at last in md_reap_sync_thread()       Yu Kuai  1  -1/+1
2023-03-28  md: refactor idle/frozen_sync_thread()                           Yu Kuai  2  -4/+22
2023-03-28  md: add a mutex to synchronize idle and frozen in action_store() Yu Kuai  2  -0/+8
2023-03-28  md: refactor action_store() for 'idle' and 'frozen'              Yu Kuai  1  -16/+45

I used to be able to reproduce the lockup within minutes, but with those
changes the test system has been running for more than 120 hours.

When you said a "different fix", can you confirm that I grabbed the right
changes and that I need all 5 of them.

And second question was, has this fix been submitted upstream yet?
If so which kernel version?

Thank you
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Hi,

On 2023/08/23 5:16, Dragan Stancevic wrote:
> [...]
>
> I was not totally clear on which change exactly to pull, but I pulled
> the following changes:
> 2023-03-28  md: enhance checking in md_check_recovery()                      Yu Kuai  1  -7/+15
> 2023-03-28  md: wake up 'resync_wait' at last in md_reap_sync_thread()       Yu Kuai  1  -1/+1
> 2023-03-28  md: refactor idle/frozen_sync_thread()                           Yu Kuai  2  -4/+22
> 2023-03-28  md: add a mutex to synchronize idle and frozen in action_store() Yu Kuai  2  -0/+8
> 2023-03-28  md: refactor action_store() for 'idle' and 'frozen'              Yu Kuai  1  -16/+45
>
> I used to be able to reproduce the lockup within minutes, but with those
> changes the test system has been running for more than 120 hours.
>
> When you said a "different fix", can you confirm that I grabbed the right
> changes and that I need all 5 of them.

Yes, you grabbed the right changes, and these patches are merged into
linux-next as well.
>
> And second question was, has this fix been submitted upstream yet?
> If so which kernel version?

This fix is currently in linux-next, and will be applied to v6.6-rc1
soon.

Thanks,
Kuai

Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Hi Kuai-

On 8/22/23 20:22, Yu Kuai wrote:
> Hi,
>
> [...]
>
> Yes, you grabbed the right changes, and these patches is merged to
> linux-next as well.
>>
>> And second question was, has this fix been submitted upstream yet?
>> If so which kernel version?
>
> This fix is currently in linux-next, and will be applied to v6.6-rc1
> soon.

Thank you, that is great news. I'd like to see this change backported to
5.10 and 6.1; do you have any plans to backport it to any of the
previous kernels?

If not, I would like to try to get your changes into 5.10 and 6.1 if
Greg will accept them.


Four out of five of your changes were a straight cherry-pick into 5.10;
one needed a minor conflict resolution. But I can definitely confirm
that your changes fix the lockup issue on 5.10.

I am now switching to 6.1 and will test the changes there too.


Thanks


--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Hi,

On 2023/08/23 23:33, Dragan Stancevic wrote:
> Hi Kuai-
>
> [...]
>
> Thank you, that is great news. I'd like to see this change backported to
> 5.10 and 6.1, do you have any plans of backporting to any of the
> previous kernels?
>
> If not, I would like to try to get your changes into 5.10 and 6.1 if
> Greg will accept them.
>

I don't have plans yet, so feel free to do this. I guess these patches
won't be picked up automatically due to the conflict. Feel free to ask if
you run into any problems.

Thanks,
Kuai

Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Hi Kuai,


On 8/23/23 20:18, Yu Kuai wrote:
> Hi,
>
> [...]
>
> I don't have plans yet, so feel free to do this, I guess these patches
> won't be picked automatically due to the conflict. Feel free to ask if
> you meet any problems.

Just a follow-up on 6.1 testing: I tried reproducing this problem for 5
days with the 6.1.42 kernel without your patches and was not able to
reproduce it.

It seems that 6.1 has some other code that prevents this from happening.

On 5.10 I can reproduce it within minutes to an hour.




--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Hi,

On 2023/08/29 4:32, Dragan Stancevic wrote:

> Just a followup on 6.1 testing. I tried reproducing this problem for 5
> days with 6.1.42 kernel without your patches and I was not able to
> reproduce it.
>
> It seems that 6.1 has some other code that prevents this from happening.
>

I see that there are lots of patches for raid456 between 5.10 and 6.1;
however, I remember that I used to reproduce the deadlock after 6.1, and
it's true that it's not easy to reproduce, see below:

https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/

My guess is that 6.1 is harder to reproduce on than 5.10 due to some
changes inside raid456.

By the way, raid10 had a similar deadlock and can be fixed the same
way, so it makes sense to backport these patches.

https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com

Thanks,
Kuai


> On 5.10 I can reproduce it within minutes to an hour.
>
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition
Hi,

On 2023/08/30 9:36, Yu Kuai wrote:
> Hi,
>
> On 2023/08/29 4:32, Dragan Stancevic wrote:
>
>> Just a followup on 6.1 testing. I tried reproducing this problem for 5
>> days with 6.1.42 kernel without your patches and I was not able to
>> reproduce it.

Oops, I forgot that you need to backport this patch first to reproduce
this problem:

https://lore.kernel.org/all/20230529132037.2124527-2-yukuai1@huaweicloud.com/

The patch fixes the deadlock as well, but it introduces some regressions.

Thanks,
Kuai

Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition [ In reply to ]
On 9/4/23 22:50, Yu Kuai wrote:
> Hi,
>
> On 2023/08/30 9:36, Yu Kuai wrote:
>> Hi,
>>
>> On 2023/08/29 4:32, Dragan Stancevic wrote:
>>
>>> Just a followup on 6.1 testing. I tried reproducing this problem for
>>> 5 days with 6.1.42 kernel without your patches and I was not able to
>>> reproduce it.
>
> oops, I forgot that you need to backport this patch first to reporduce
> this problem:
>
> https://lore.kernel.org/all/20230529132037.2124527-2-yukuai1@huaweicloud.com/
>
> The patch fix the deadlock as well, but it introduce some regressions.

Ha, jinx :) I was about to email you that, with the testing over the
weekend, I isolated that change as the one that made it more difficult to
reproduce on 6.1, and that the original change must be reverted :)



>
> Thanks,
> Kuai
>
>>>
>>> It seems that 6.1 has some other code that prevents this from happening.
>>>
>>
>> I see that there are lots of patches for raid456 between 5.10 and 6.1,
>> however, I remember that I used to reporduce the deadlock after 6.1, and
>> it's true it's not easy to reporduce, see below:
>>
>> https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/
>>
>> My guess is that 6.1 is harder to reporduce than 5.10 due to some
>> changes inside raid456.
>>
>> By the way, raid10 had a similiar deadlock, and can be fixed the same
>> way, so it make sense to backport these patches.
>>
>> https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com
>>
>> Thanks,
>> Kuai
>>
>>
>>> On 5.10 I can reproduce it within minutes to an hour.
>>>
>>
>> .
>>
>

--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition [ In reply to ]
On 9/5/23 3:54 PM, Dragan Stancevic wrote:
> On 9/4/23 22:50, Yu Kuai wrote:
>> Hi,
>>
>> On 2023/08/30 9:36, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2023/08/29 4:32, Dragan Stancevic wrote:
>>>
>>>> Just a followup on 6.1 testing. I tried reproducing this problem for 5 days with 6.1.42 kernel without your patches and I was not able to reproduce it.
>>
>> oops, I forgot that you need to backport this patch first to reporduce
>> this problem:
>>
>> https://lore.kernel.org/all/20230529132037.2124527-2-yukuai1@huaweicloud.com/
>>
>> The patch fix the deadlock as well, but it introduce some regressions.

We've just got an unplanned lockup on a "check" to "idle" transition with 6.1.52 after a few hours on a backup server. For the last 2 1/2 years we used the patch I originally proposed [1] with multiple kernel versions. But this no longer seems to be valid, or maybe it's even destructive in combination with the other changes.

But I totally lost track of the further development. As I understood it, there are patches queued up in mainline which should fix the problem and might go into 6.1, too, but they have not landed there yet?

Can anyone give me exact references to the patches I'd need to apply to 6.1.52, so that I could probably fix my problem and also test those patches for you on production systems with a load which tends to run into the problem easily?

Thanks

Donald

[1]: https://lore.kernel.org/linux-raid/bc342de0-98d2-1733-39cd-cc1999777ff3@molgen.mpg.de/

> Ha, jinx :) I was about to email you that I isolated that change with the testing over the weekend that made it more difficult to reproduce in 6.1 and that the original change must be reverted :)
>
>
>
>>
>> Thanks,
>> Kuai
>>
>>>>
>>>> It seems that 6.1 has some other code that prevents this from happening.
>>>>
>>>
>>> I see that there are lots of patches for raid456 between 5.10 and 6.1,
>>> however, I remember that I used to reporduce the deadlock after 6.1, and
>>> it's true it's not easy to reporduce, see below:
>>>
>>> https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/
>>>
>>> My guess is that 6.1 is harder to reporduce than 5.10 due to some
>>> changes inside raid456.
>>>
>>> By the way, raid10 had a similiar deadlock, and can be fixed the same
>>> way, so it make sense to backport these patches.
>>>
>>> https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com
>>>
>>> Thanks,
>>> Kuai
>>>
>>>
>>>> On 5.10 I can reproduce it within minutes to an hour.
>>>>
>>>
>>> .
>>>
>>
>


--
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition [ In reply to ]
Hi Donald-

On 9/13/23 04:08, Donald Buczek wrote:
> On 9/5/23 3:54 PM, Dragan Stancevic wrote:
>> On 9/4/23 22:50, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2023/08/30 9:36, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> On 2023/08/29 4:32, Dragan Stancevic wrote:
>>>>
>>>>> Just a followup on 6.1 testing. I tried reproducing this problem for 5 days with 6.1.42 kernel without your patches and I was not able to reproduce it.
>>>
>>> oops, I forgot that you need to backport this patch first to reporduce
>>> this problem:
>>>
>>> https://lore.kernel.org/all/20230529132037.2124527-2-yukuai1@huaweicloud.com/
>>>
>>> The patch fix the deadlock as well, but it introduce some regressions.
>
> We've just got an unplanned lock up on "check" to "idle" transition with 6.1.52 after a few hours on a backup server. For the last 2 1/2 years we used the patch I originally proposed with multiple kernel versions [1]. But this no longer seems to be valid or maybe its even destructive in combination with the other changes.
>
> But I totally lost track of the further development. As I understood, there are patches queue up in mainline, which might go into 6.1, too, but have not landed there which should fix the problem?
>
> Can anyone give me exact references to the patches I'd need to apply to 6.1.52, so that I could probably fix my problem and also test the patches for you those on production systems with a load which tends to run into that problem easily?

Here is a list of changes for 6.1:

e5e9b9cb71a0 md: factor out a helper to wake up md_thread directly
f71209b1f21c md: enhance checking in md_check_recovery()
753260ed0b46 md: wake up 'resync_wait' at last in md_reap_sync_thread()
130443d60b1b md: refactor idle/frozen_sync_thread() to fix deadlock
6f56f0c4f124 md: add a mutex to synchronize idle and frozen in action_store()
64e5e09afc14 md: refactor action_store() for 'idle' and 'frozen'
a865b96c513b Revert "md: unlock mddev before reap sync_thread in action_store"

You can get them from the following tree:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
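In case it saves someone time, here is a rough sketch of how these could be
applied on top of a 6.1 stable tree. The remote name, branch name and the
bottom-up apply order are assumptions on my side, and conflicts may still
need manual resolution:

# Sketch only: cherry-pick the listed commits onto v6.1.52, oldest first
# (assuming the list above is in git-log order, i.e. newest first).
git remote add linux-next https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git fetch linux-next
git checkout -b md-deadlock-backport v6.1.52
for c in a865b96c513b 64e5e09afc14 6f56f0c4f124 130443d60b1b \
         753260ed0b46 f71209b1f21c e5e9b9cb71a0; do
    git cherry-pick -x "$c" || break    # stop here and resolve conflicts by hand
done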


>
> Thanks
>
> Donald
>
> [1]: https://lore.kernel.org/linux-raid/bc342de0-98d2-1733-39cd-cc1999777ff3@molgen.mpg.de/
>
>> Ha, jinx :) I was about to email you that I isolated that change with the testing over the weekend that made it more difficult to reproduce in 6.1 and that the original change must be reverted :)
>>
>>
>>
>>>
>>> Thanks,
>>> Kuai
>>>
>>>>>
>>>>> It seems that 6.1 has some other code that prevents this from happening.
>>>>>
>>>>
>>>> I see that there are lots of patches for raid456 between 5.10 and 6.1,
>>>> however, I remember that I used to reporduce the deadlock after 6.1, and
>>>> it's true it's not easy to reporduce, see below:
>>>>
>>>> https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/
>>>>
>>>> My guess is that 6.1 is harder to reporduce than 5.10 due to some
>>>> changes inside raid456.
>>>>
>>>> By the way, raid10 had a similiar deadlock, and can be fixed the same
>>>> way, so it make sense to backport these patches.
>>>>
>>>> https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com
>>>>
>>>> Thanks,
>>>> Kuai
>>>>
>>>>
>>>>> On 5.10 I can reproduce it within minutes to an hour.
>>>>>
>>>>
>>>> .
>>>>
>>>
>>
>
>

--
Peace can only come as a natural consequence
of universal enlightenment -Dr. Nikola Tesla
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition [ In reply to ]
On 9/13/23 16:16, Dragan Stancevic wrote:
> Hi Donald-
>
> On 9/13/23 04:08, Donald Buczek wrote:
>> On 9/5/23 3:54 PM, Dragan Stancevic wrote:
>>> On 9/4/23 22:50, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> On 2023/08/30 9:36, Yu Kuai wrote:
>>>>> Hi,
>>>>>
>>>>> On 2023/08/29 4:32, Dragan Stancevic wrote:
>>>>>
>>>>>> Just a followup on 6.1 testing. I tried reproducing this problem for 5 days with 6.1.42 kernel without your patches and I was not able to reproduce it.
>>>>
>>>> oops, I forgot that you need to backport this patch first to reporduce
>>>> this problem:
>>>>
>>>> https://lore.kernel.org/all/20230529132037.2124527-2-yukuai1@huaweicloud.com/
>>>>
>>>> The patch fix the deadlock as well, but it introduce some regressions.
>>
>> We've just got an unplanned lock up on "check" to "idle" transition with 6.1.52 after a few hours on a backup server. For the last 2 1/2 years we used the patch I originally proposed with multiple kernel versions [1]. But this no longer seems to be valid or maybe its even destructive in combination with the other changes.
>>
>> But I totally lost track of the further development. As I understood, there are patches queue up in mainline, which might go into 6.1, too, but have not landed there which should fix the problem?
>>
>> Can anyone give me exact references to the patches I'd need to apply to 6.1.52, so that I could probably fix my problem and also test the patches for you those on production systems with a load which tends to run into that problem easily?
>
> Here is a list of changes for 6.1:
>
> e5e9b9cb71a0 md: factor out a helper to wake up md_thread directly
> f71209b1f21c md: enhance checking in md_check_recovery()
> 753260ed0b46 md: wake up 'resync_wait' at last in md_reap_sync_thread()
> 130443d60b1b md: refactor idle/frozen_sync_thread() to fix deadlock
> 6f56f0c4f124 md: add a mutex to synchronize idle and frozen in action_store()
> 64e5e09afc14 md: refactor action_store() for 'idle' and 'frozen'
> a865b96c513b Revert "md: unlock mddev before reap sync_thread in action_store"

Thanks!

I've put these patches on v6.1.52. A few hours ago I started a script which transitions the three md devices of a very active backup server through idle->check->idle every 6 minutes. It has gone through ~400 iterations so far. No lock-ups yet.

LGTM !

Donald

buczek@done:~$ dmesg|grep "data-check of RAID array"|wc
393 2820 18864
buczek@done:~$ cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
md2 : active raid6 sdc[0] sdo[15] sdn[14] sdm[13] sdl[12] sdk[11] sdj[10] sdi[9] sdh[8] sdg[7] sdf[6] sde[5] sdd[4] sdr[3] sdq[2] sdp[1]
109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
[=========>...........] check = 47.1% (3681799128/7813894144) finish=671.8min speed=102496K/sec
bitmap: 0/59 pages [0KB], 65536KB chunk

md1 : active raid6 sdaa[0] sdz[15] sdy[14] sdx[13] sdw[12] sdv[11] sdu[10] sdt[16] sds[8] sdah[7] sdag[17] sdaf[5] sdae[4] sdad[3] sdac[2] sdab[1]
109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
[=======>.............] check = 38.5% (3009484896/7813894144) finish=811.0min speed=98720K/sec
bitmap: 0/59 pages [0KB], 65536KB chunk

md0 : active raid6 sdai[0] sdax[15] sdaw[16] sdav[13] sdau[12] sdat[11] sdas[10] sdar[9] sdaq[8] sdap[7] sdao[6] sdan[17] sdam[4] sdal[3] sdak[2] sdaj[1]
109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
[========>............] check = 42.3% (3311789940/7813894144) finish=911.9min speed=82272K/sec
bitmap: 6/59 pages [24KB], 65536KB chunk

unused devices: <none>
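
For reference, a minimal sketch of the kind of transition loop described
above; the device names, the split of the 6-minute cycle and the lack of
logging are assumptions, not the exact script used here:

#!/bin/bash
# Sketch: cycle md devices through check -> idle to stress the
# sync_action "check" to "idle" transition roughly every 6 minutes.
devices="md0 md1 md2"
while true; do
    for md in $devices; do
        echo check > /sys/devices/virtual/block/$md/md/sync_action
    done
    sleep 300                      # let the data-check run for a while
    for md in $devices; do
        # this write is what hangs when the deadlock triggers
        echo idle > /sys/devices/virtual/block/$md/md/sync_action
    done
    sleep 60
done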





>
> You can get them from the following tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>
>
>>
>> Thanks
>>
>>    Donald
>>
>> [1]: https://lore.kernel.org/linux-raid/bc342de0-98d2-1733-39cd-cc1999777ff3@molgen.mpg.de/
>>
>>> Ha, jinx :) I was about to email you that I isolated that change with the testing over the weekend that made it more difficult to reproduce in 6.1 and that the original change must be reverted :)
>>>
>>>
>>>
>>>>
>>>> Thanks,
>>>> Kuai
>>>>
>>>>>>
>>>>>> It seems that 6.1 has some other code that prevents this from happening.
>>>>>>
>>>>>
>>>>> I see that there are lots of patches for raid456 between 5.10 and 6.1,
>>>>> however, I remember that I used to reporduce the deadlock after 6.1, and
>>>>> it's true it's not easy to reporduce, see below:
>>>>>
>>>>> https://lore.kernel.org/linux-raid/e9067438-d713-f5f3-0d3d-9e6b0e9efa0e@huaweicloud.com/
>>>>>
>>>>> My guess is that 6.1 is harder to reporduce than 5.10 due to some
>>>>> changes inside raid456.
>>>>>
>>>>> By the way, raid10 had a similiar deadlock, and can be fixed the same
>>>>> way, so it make sense to backport these patches.
>>>>>
>>>>> https://lore.kernel.org/r/20230529132037.2124527-5-yukuai1@huaweicloud.com
>>>>>
>>>>> Thanks,
>>>>> Kuai
>>>>>
>>>>>
>>>>>> On 5.10 I can reproduce it within minutes to an hour.
>>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>>
>>
>>
>

--
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition [ In reply to ]
On 9/14/23 08:03, Donald Buczek wrote:
> On 9/13/23 16:16, Dragan Stancevic wrote:
>> Hi Donald-
>> [...]
>> Here is a list of changes for 6.1:
>>
>> e5e9b9cb71a0 md: factor out a helper to wake up md_thread directly
>> f71209b1f21c md: enhance checking in md_check_recovery()
>> 753260ed0b46 md: wake up 'resync_wait' at last in md_reap_sync_thread()
>> 130443d60b1b md: refactor idle/frozen_sync_thread() to fix deadlock
>> 6f56f0c4f124 md: add a mutex to synchronize idle and frozen in action_store()
>> 64e5e09afc14 md: refactor action_store() for 'idle' and 'frozen'
>> a865b96c513b Revert "md: unlock mddev before reap sync_thread in action_store"
>
> Thanks!
>
> I've put these patches on v6.1.52. I've started a script which transitions the three md-devices of a very active backup server through idle->check->idle every 6 minutes a few ours ago.  It went through ~400 iterations till now. No lock-ups so far.

Oh dear, looks like the deadlock problem is _not_ fixed with these patches.

We've had a lockup again after ~3 days of operation. Again, the `echo idle > $sys/md/sync_action` is hanging:

# # /proc/70554/task/70554: mdcheck.safe : /bin/bash /usr/bin/mdcheck.safe --continue --duration 06:00
# cat /proc/70554/task/70554/stack

[<0>] action_store+0x17f/0x390
[<0>] md_attr_store+0x83/0xf0
[<0>] kernfs_fop_write_iter+0x117/0x1b0
[<0>] vfs_write+0x2ce/0x400
[<0>] ksys_write+0x5f/0xe0
[<0>] do_syscall_64+0x43/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce

And everything else going to that specific raid (md0) is dead, too. No task is busy looping.

So as it looks now, we can't go from 5.15.X to 6.1.X as we would like to do. These patches don't fix the problem, and our own patch no longer works with 6.1. Unfortunately, this happened on a production system which I need to reboot and which is not available for further analysis. We'd need to reproduce the problem on a dedicated machine to really work on it.

Here's some more possibly interesting procfs output and some examples of tasks.

/sys/devices/virtual/block/md0/inflight : 0 3936

#/proc/mdstat

Personalities : [linear] [raid0] [raid1] [raid6] [raid5] [raid4] [multipath]
md1 : active raid6 sdae[0] sdad[15] sdac[14] sdab[13] sdaa[12] sdz[11] sdy[10] sdx[9] sdw[8] sdv[7] sdu[6] sdt[5] sds[4] sdah[3] sdag[2] sdaf[1]
109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
bitmap: 0/59 pages [0KB], 65536KB chunk

md0 : active raid6 sdc[0] sdr[17] sdq[16] sdp[13] sdo[12] sdn[11] sdm[10] sdl[9] sdk[8] sdj[7] sdi[6] sdh[5] sdg[4] sdf[3] sde[2] sdd[1]
109394518016 blocks super 1.2 level 6, 512k chunk, algorithm 2 [16/16] [UUUUUUUUUUUUUUUU]
[===>.................] check = 15.9% (1242830396/7813894144) finish=14788.4min speed=7405K/sec
bitmap: 53/59 pages [212KB], 65536KB chunk

unused devices: <none>

# # /proc/66024/task/66024: md0_resync :
# cat /proc/66024/task/66024/stack

[<0>] raid5_get_active_stripe+0x20f/0x4d0
[<0>] raid5_sync_request+0x38b/0x3b0
[<0>] md_do_sync.cold+0x40c/0x985
[<0>] md_thread+0xb1/0x160
[<0>] kthread+0xe7/0x110
[<0>] ret_from_fork+0x22/0x30

# # /proc/939/task/939: md0_raid6 :
# cat /proc/939/task/939/stack

[<0>] md_thread+0x12d/0x160
[<0>] kthread+0xe7/0x110
[<0>] ret_from_fork+0x22/0x30

# # /proc/1228/task/1228: xfsaild/md0 :
# cat /proc/1228/task/1228/stack

[<0>] raid5_get_active_stripe+0x20f/0x4d0
[<0>] raid5_make_request+0x24c/0x1170
[<0>] md_handle_request+0x131/0x220
[<0>] __submit_bio+0x89/0x130
[<0>] submit_bio_noacct_nocheck+0x160/0x360
[<0>] _xfs_buf_ioapply+0x26c/0x420
[<0>] __xfs_buf_submit+0x64/0x1d0
[<0>] xfs_buf_delwri_submit_buffers+0xc5/0x1e0
[<0>] xfsaild+0x2a0/0x880
[<0>] kthread+0xe7/0x110
[<0>] ret_from_fork+0x22/0x30

# # /proc/49747/task/49747: kworker/24:2+xfs-inodegc/md0 :
# cat /proc/49747/task/49747/stack

[<0>] xfs_buf_lock+0x35/0xf0
[<0>] xfs_buf_find_lock+0x45/0xf0
[<0>] xfs_buf_get_map+0x17d/0xa60
[<0>] xfs_buf_read_map+0x52/0x280
[<0>] xfs_trans_read_buf_map+0x115/0x350
[<0>] xfs_btree_read_buf_block.constprop.0+0x9a/0xd0
[<0>] xfs_btree_lookup_get_block+0x97/0x170
[<0>] xfs_btree_lookup+0xc4/0x4a0
[<0>] xfs_difree_finobt+0x62/0x250
[<0>] xfs_difree+0x130/0x1c0
[<0>] xfs_ifree+0x86/0x510
[<0>] xfs_inactive_ifree.isra.0+0xa2/0x1c0
[<0>] xfs_inactive+0xf8/0x170
[<0>] xfs_inodegc_worker+0x90/0x140
[<0>] process_one_work+0x1c7/0x3c0
[<0>] worker_thread+0x4d/0x3c0
[<0>] kthread+0xe7/0x110
[<0>] ret_from_fork+0x22/0x30

# # /proc/49844/task/49844: kworker/30:3+xfs-sync/md0 :
# cat /proc/49844/task/49844/stack

[<0>] __flush_workqueue+0x10e/0x390
[<0>] xlog_cil_push_now.isra.0+0x25/0x90
[<0>] xlog_cil_force_seq+0x7c/0x240
[<0>] xfs_log_force+0x83/0x240
[<0>] xfs_log_worker+0x3b/0xd0
[<0>] process_one_work+0x1c7/0x3c0
[<0>] worker_thread+0x4d/0x3c0
[<0>] kthread+0xe7/0x110
[<0>] ret_from_fork+0x22/0x30


# # /proc/52646/task/52646: kworker/u263:2+xfs-cil/md0 :
# cat /proc/52646/task/52646/stack

[<0>] raid5_get_active_stripe+0x20f/0x4d0
[<0>] raid5_make_request+0x24c/0x1170
[<0>] md_handle_request+0x131/0x220
[<0>] __submit_bio+0x89/0x130
[<0>] submit_bio_noacct_nocheck+0x160/0x360
[<0>] xlog_state_release_iclog+0xf6/0x1d0
[<0>] xlog_write_get_more_iclog_space+0x79/0xf0
[<0>] xlog_write+0x334/0x3b0
[<0>] xlog_cil_push_work+0x501/0x740
[<0>] process_one_work+0x1c7/0x3c0
[<0>] worker_thread+0x4d/0x3c0
[<0>] kthread+0xe7/0x110
[<0>] ret_from_fork+0x22/0x30

# # /proc/52753/task/52753: rm : rm -rf /project/pbackup_gone/data/C8029/home_Cyang/home_Cyang:202306011248:C3019.BEING_DELETED
# cat /proc/52753/task/52753/stack

[<0>] xfs_buf_lock+0x35/0xf0
[<0>] xfs_buf_find_lock+0x45/0xf0
[<0>] xfs_buf_get_map+0x17d/0xa60
[<0>] xfs_buf_read_map+0x52/0x280
[<0>] xfs_trans_read_buf_map+0x115/0x350
[<0>] xfs_read_agi+0x98/0x140
[<0>] xfs_iunlink+0x63/0x1f0
[<0>] xfs_remove+0x280/0x3a0
[<0>] xfs_vn_unlink+0x53/0xa0
[<0>] vfs_rmdir.part.0+0x5e/0x1e0
[<0>] do_rmdir+0x15c/0x1c0
[<0>] __x64_sys_unlinkat+0x4b/0x60
[<0>] do_syscall_64+0x43/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x64/0xce

Best
Donald

--
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition [ In reply to ]
On 9/17/23 10:55, Donald Buczek wrote:
> On 9/14/23 08:03, Donald Buczek wrote:
>> On 9/13/23 16:16, Dragan Stancevic wrote:
>>> Hi Donald-
>>> [...]
>>> Here is a list of changes for 6.1:
>>>
>>> e5e9b9cb71a0 md: factor out a helper to wake up md_thread directly
>>> f71209b1f21c md: enhance checking in md_check_recovery()
>>> 753260ed0b46 md: wake up 'resync_wait' at last in md_reap_sync_thread()
>>> 130443d60b1b md: refactor idle/frozen_sync_thread() to fix deadlock
>>> 6f56f0c4f124 md: add a mutex to synchronize idle and frozen in action_store()
>>> 64e5e09afc14 md: refactor action_store() for 'idle' and 'frozen'
>>> a865b96c513b Revert "md: unlock mddev before reap sync_thread in action_store"
>>
>> Thanks!
>>
>> I've put these patches on v6.1.52. I've started a script which transitions the three md-devices of a very active backup server through idle->check->idle every 6 minutes a few ours ago.  It went through ~400 iterations till now. No lock-ups so far.
>
> Oh dear, looks like the deadlock problem is _not_fixed with these patches.

Some more info after another incident:

- We've hit the deadlock with 5.15.131 (so it is NOT introduced by any of the above patches)
- The symptoms are not exactly the same as with the original year-old problem. Differences:
- - mdX_raid6 is NOT busy looping
- - /sys/devices/virtual/block/mdX/md/array_state says "active" not "write pending"
- - `echo active > /sys/devices/virtual/block/mdX/md/array_state` does not resolve the deadlock
- - After hours in the deadlock state the system resumed operation when a script of mine read(!) lots of sysfs files.
- But in both cases, `echo idle > /sys/devices/virtual/block/mdX/md/sync_action` hangs, as does all I/O on the raid.
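
(For completeness, the checks referred to above boil down to something like the following; mdX stands for the affected array and has to be substituted.)

# Quick state checks while the array appears stuck
cat /sys/devices/virtual/block/mdX/md/array_state   # reported "active", not "write pending"
cat /sys/devices/virtual/block/mdX/md/sync_action
cat /sys/devices/virtual/block/mdX/inflight
echo active > /sys/devices/virtual/block/mdX/md/array_state   # did not help this time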

The fact that we didn't hit the problem for many months on 5.15.94 might hint that it was introduced between 5.15.94 and 5.15.131.

We'll try to reproduce the problem on a test machine for analysis, but this may take time (vacation imminent for one...).

But it's not like these patches caused the problem. And maybe they _did_ fix the original problem, as we didn't hit that one.

Best

Donald

--
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition [ In reply to ]
Hi,

On 2023/09/24 22:35, Donald Buczek wrote:
> On 9/17/23 10:55, Donald Buczek wrote:
>> On 9/14/23 08:03, Donald Buczek wrote:
>>> On 9/13/23 16:16, Dragan Stancevic wrote:
>>>> Hi Donald-
>>>> [...]
>>>> Here is a list of changes for 6.1:
>>>>
>>>> e5e9b9cb71a0 md: factor out a helper to wake up md_thread directly
>>>> f71209b1f21c md: enhance checking in md_check_recovery()
>>>> 753260ed0b46 md: wake up 'resync_wait' at last in md_reap_sync_thread()
>>>> 130443d60b1b md: refactor idle/frozen_sync_thread() to fix deadlock
>>>> 6f56f0c4f124 md: add a mutex to synchronize idle and frozen in
>>>> action_store()
>>>> 64e5e09afc14 md: refactor action_store() for 'idle' and 'frozen'
>>>> a865b96c513b Revert "md: unlock mddev before reap sync_thread in
>>>> action_store"
>>>
>>> Thanks!
>>>
>>> I've put these patches on v6.1.52. I've started a script which
>>> transitions the three md-devices of a very active backup server
>>> through idle->check->idle every 6 minutes a few ours ago.  It went
>>> through ~400 iterations till now. No lock-ups so far.
>>
>> Oh dear, looks like the deadlock problem is _not_fixed with these
>> patches.
>
> Some more info after another incident:
>
> - We've hit the deadlock with 5.15.131 (so it is NOT introduced by any
> of the above patches)
> - The symptoms are not exactly the same as with the original year-old
> problem. Differences:
> - - mdX_raid6 is NOT busy looping
> - - /sys/devices/virtual/block/mdX/md/array_state says "active" not
> "write pending"
> - - `echo active > /sys/devices/virtual/block/mdX/md/array_state` does
> not resolve the deadlock
> - - After hours in the deadlock state the system resumed operation when
> a script of mine read(!) lots of sysfs files.
> - But in both cases, `echo idle >
> /sys/devices/virtual/block/mdX/md/sync_action` hangs as does all I/O
> operation on the raid.
>
> The fact that we didn't hit the problem for many month on 5.15.94 might
> hint that it was introduced between 5.15.94 and 5.15.131
>
> We'll try to reproduce the problem on a test machine for analysis, but
> this make take time (vacation imminent for one...).
>
> But its not like these patches caused the problem. Any maybe they _did_
> fix the original problem, as we didn't hit that one.

Sorry for the late reply; yes, this looks like a different problem. I'm
pretty confident that the original problem is fixed, since echo
idle/frozen no longer holds the lock 'reconfig_mutex' while waiting for
sync_thread to be done.

I'll check patches between 5.15.94 and 5.15.131.

Thanks,
Kuai

>
> Best
>
>   Donald
>
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition [ In reply to ]
On 9/25/23 03:11, Yu Kuai wrote:
> Hi,
>
> On 2023/09/24 22:35, Donald Buczek wrote:
>> On 9/17/23 10:55, Donald Buczek wrote:
>>> On 9/14/23 08:03, Donald Buczek wrote:
>>>> On 9/13/23 16:16, Dragan Stancevic wrote:
>>>>> Hi Donald-
>>>>> [...]
>>>>> Here is a list of changes for 6.1:
>>>>>
>>>>> e5e9b9cb71a0 md: factor out a helper to wake up md_thread directly
>>>>> f71209b1f21c md: enhance checking in md_check_recovery()
>>>>> 753260ed0b46 md: wake up 'resync_wait' at last in md_reap_sync_thread()
>>>>> 130443d60b1b md: refactor idle/frozen_sync_thread() to fix deadlock
>>>>> 6f56f0c4f124 md: add a mutex to synchronize idle and frozen in action_store()
>>>>> 64e5e09afc14 md: refactor action_store() for 'idle' and 'frozen'
>>>>> a865b96c513b Revert "md: unlock mddev before reap sync_thread in action_store"
>>>>
>>>> Thanks!
>>>>
>>>> I've put these patches on v6.1.52. I've started a script which transitions the three md-devices of a very active backup server through idle->check->idle every 6 minutes a few ours ago.  It went through ~400 iterations till now. No lock-ups so far.
>>>
>>> Oh dear, looks like the deadlock problem is _not_fixed with these patches.
>>
>> Some more info after another incident:
>>
>> - We've hit the deadlock with 5.15.131 (so it is NOT introduced by any of the above patches)
>> - The symptoms are not exactly the same as with the original year-old problem. Differences:
>> - - mdX_raid6 is NOT busy looping
>> - - /sys/devices/virtual/block/mdX/md/array_state says "active" not "write pending"
>> - - `echo active > /sys/devices/virtual/block/mdX/md/array_state` does not resolve the deadlock
>> - - After hours in the deadlock state the system resumed operation when a script of mine read(!) lots of sysfs files.
>> - But in both cases, `echo idle > /sys/devices/virtual/block/mdX/md/sync_action` hangs as does all I/O operation on the raid.
>>
>> The fact that we didn't hit the problem for many month on 5.15.94 might hint that it was introduced between 5.15.94 and 5.15.131
>>
>> We'll try to reproduce the problem on a test machine for analysis, but this make take time (vacation imminent for one...).
>>
>> But its not like these patches caused the problem. Any maybe they _did_ fix the original problem, as we didn't hit that one.
>
> Sorry for the late reply, yes, this looks like a different problem. I'm
> pretty confident that the orignal problem is fixed since that echo
> idle/frozen doesn't hold the lock 'reconfig_mutex' to wait for
> sync_thread to be done.
>
> I'll check patches between 5.15.94 and 5.15.131.

We've got another event today. Some more information to save you work. I'm sorry, this comes dripping in, but as I said, currently we can't reproduce it and hit it only on production machines, where we have limited time to analyze:

* In the last two events, "echo idle > /sys/devices/virtual/block/mdX/md/sync_action" was not even executing. So it is not the trigger, but was a random victim when it happened the first time. This deceived me into believing this is some variation of the old problem.

* It's not filesystem-related: yesterday `blkid -o value -s LABEL /dev/md1` was hanging, too, and today, for example, `df`.

* /sys/devices/virtual/block/md0/inflight today was (frozen at) "2 579"

* iotop showed no disk activity (on the raid) at all. Only a single member device had activity from time to time (usually after ~30 seconds, but sometimes after a few seconds) with usually 1-4 tps, but sometimes more, max 136 tps.

* As I said, I use a script to take a snapshot of various /sys and /proc information and running this script resolved the deadlock twice.

* The stack traces of mdX_raid6 of the hanging raid recorded in the two events were

[<0>] md_bitmap_unplug.part.0+0xce/0x100
[<0>] raid5d+0xe4/0x5a0
[<0>] md_thread+0xab/0x160
[<0>] kthread+0x127/0x150
[<0>] ret_from_fork+0x22/0x30

and

[<0>] md_super_wait+0x72/0xa0
[<0>] md_bitmap_unplug.part.0+0xce/0x100
[<0>] raid5d+0xe4/0x5a0
[<0>] md_thread+0xab/0x160
[<0>] kthread+0x127/0x150
[<0>] ret_from_fork+0x22/0x30

But note that these were probably taken after the previous commands in the script had already unfrozen the system. Today I manually looked at the stack while the system was still frozen, and it was just

[<0>] md_thread+0x122/0x160
[<0>] kthread+0x127/0x150
[<0>] ret_from_fork+0x22/0x30

* Because I knew that my script seems to unblock the system, I ran it slowly line by line to see what actually unfreezes it. There is one loop which takes "comm", "cmdline" and "stack" of all threads:

for task in /proc/*/task/*; do
    echo "# # $task: $(cat $task/comm) : $(cat $task/cmdline | xargs -0 echo)"
    cmd cat $task/stack
done

I've added a few "read" calls to single-step it. Unfortunately, when it came to the 64 nfsd threads, I got a bit impatient and hit "return" faster than I should have, and when the unfreeze happened, I couldn't say exactly where it was triggered. But it must have been somewhere in this tail:

# # /proc/1299/task/1299: nfsd

[<0>] svc_recv+0x7a7/0x8c0 [sunrpc]
[<0>] nfsd+0xd6/0x140 [nfsd]
[<0>] kthread+0x127/0x150
[<0>] ret_from_fork+0x22/0x30

# # /proc/13/task/13: ksoftirqd/0

[<0>] smpboot_thread_fn+0xf3/0x140
[<0>] kthread+0x127/0x150
[<0>] ret_from_fork+0x22/0x30

# # /proc/130/task/130: cpuhp/22

[<0>] smpboot_thread_fn+0xf3/0x140
[<0>] kthread+0x127/0x150
[<0>] ret_from_fork+0x22/0x30

# # /proc/1300/task/1300: nfsd

[<0>] svc_recv+0x7a7/0x8c0 [sunrpc]
[<0>] nfsd+0xd6/0x140 [nfsd]
[<0>] kthread+0x127/0x150
[<0>] ret_from_fork+0x22/0x30

## (3 more repetitions of other nfsd threads with exactly the same stack skipped here) ##

So it appears that possibly a `cat /proc/PID/stack` of a "ksoftirqd" or (maybe) a "cpuhp" thread unblocks the system. "nfsd" seems unlikely, as there shouldn't be and wasn't anything NFS-mounted from this system.
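
For the next event, a small sketch of how one could single-step through only those thread types to narrow down which read unblocks the system; the thread-name matching is an assumption:

#!/bin/bash
# Sketch: dump /proc/<pid>/stack of ksoftirqd and cpuhp threads one at a time.
for comm_file in /proc/[0-9]*/comm; do
    pid_dir=${comm_file%/comm}
    name=$(cat "$comm_file" 2>/dev/null)
    case "$name" in
        ksoftirqd/*|cpuhp/*)
            echo "### $pid_dir ($name)"
            cat "$pid_dir/stack"
            read -r -p "press return for the next thread... "
            ;;
    esac
done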

Conclusion: This is probably not related to mdraid at all and might be a problem in the block layer or some infrastructure subsystem. Do you agree?

Best

Donald
--
Donald Buczek
buczek@molgen.mpg.de
Tel: +49 30 8413 1433
Re: md_raid: mdX_raid6 looping after sync_action "check" to "idle" transition [ In reply to ]
Hi,

On 2023/09/25 17:11, Donald Buczek wrote:
> On 9/25/23 03:11, Yu Kuai wrote:
>> Hi,
>>
>> ? 2023/09/24 22:35, Donald Buczek ??:
>>> On 9/17/23 10:55, Donald Buczek wrote:
>>>> On 9/14/23 08:03, Donald Buczek wrote:
>>>>> On 9/13/23 16:16, Dragan Stancevic wrote:
>>>>>> Hi Donald-
>>>>>> [...]
>>>>>> Here is a list of changes for 6.1:
>>>>>>
>>>>>> e5e9b9cb71a0 md: factor out a helper to wake up md_thread directly
>>>>>> f71209b1f21c md: enhance checking in md_check_recovery()
>>>>>> 753260ed0b46 md: wake up 'resync_wait' at last in
>>>>>> md_reap_sync_thread()
>>>>>> 130443d60b1b md: refactor idle/frozen_sync_thread() to fix deadlock
>>>>>> 6f56f0c4f124 md: add a mutex to synchronize idle and frozen in
>>>>>> action_store()
>>>>>> 64e5e09afc14 md: refactor action_store() for 'idle' and 'frozen'
>>>>>> a865b96c513b Revert "md: unlock mddev before reap sync_thread in
>>>>>> action_store"
>>>>>
>>>>> Thanks!
>>>>>
>>>>> I've put these patches on v6.1.52. I've started a script which
>>>>> transitions the three md-devices of a very active backup server
>>>>> through idle->check->idle every 6 minutes a few ours ago.  It went
>>>>> through ~400 iterations till now. No lock-ups so far.
>>>>
>>>> Oh dear, looks like the deadlock problem is _not_fixed with these
>>>> patches.
>>>
>>> Some more info after another incident:
>>>
>>> - We've hit the deadlock with 5.15.131 (so it is NOT introduced by
>>> any of the above patches)
>>> - The symptoms are not exactly the same as with the original year-old
>>> problem. Differences:
>>> - - mdX_raid6 is NOT busy looping
>>> - - /sys/devices/virtual/block/mdX/md/array_state says "active" not
>>> "write pending"
>>> - - `echo active > /sys/devices/virtual/block/mdX/md/array_state`
>>> does not resolve the deadlock
>>> - - After hours in the deadlock state the system resumed operation
>>> when a script of mine read(!) lots of sysfs files.
>>> - But in both cases, `echo idle >
>>> /sys/devices/virtual/block/mdX/md/sync_action` hangs as does all I/O
>>> operation on the raid.
>>>
>>> The fact that we didn't hit the problem for many month on 5.15.94
>>> might hint that it was introduced between 5.15.94 and 5.15.131
>>>
>>> We'll try to reproduce the problem on a test machine for analysis,
>>> but this make take time (vacation imminent for one...).
>>>
>>> But its not like these patches caused the problem. Any maybe they
>>> _did_ fix the original problem, as we didn't hit that one.
>>
>> Sorry for the late reply, yes, this looks like a different problem. I'm
>> pretty confident that the orignal problem is fixed since that echo
>> idle/frozen doesn't hold the lock 'reconfig_mutex' to wait for
>> sync_thread to be done.
>>
>> I'll check patches between 5.15.94 and 5.15.131.
>
> We've got another event today. Some more information to save you work.
> I'm sorry, this comes dripping in, but as I said, currently we can't
> reproduce it and hit it on production machines only, where we have
> limited time to analyze:

There is a way to clarify whether I/O is stuck in the underlying disks.

Once the problem is triggered and there is no disk activity:

cat /sys/kernel/debug/block/[disk]/hctx*/sched_tags | grep busy
cat /sys/kernel/debug/block/[disk]/hctx*/tags | grep busy

If busy is not 0, it means that I/O is stuck in the underlying disk, and
this problem is not related to raid; otherwise raid didn't issue any I/O
to the underlying disks and this problem is related to raid.
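
Something like the following could run that check over all member disks of
the affected array in one go; the array name is only an example, and
debugfs is assumed to be mounted under /sys/kernel/debug:

# Sketch: check blk-mq tag usage for every member disk of md0
md=md0
for slave in /sys/block/$md/slaves/*; do
    disk=$(basename "$slave")
    echo "### $disk"
    grep -H busy /sys/kernel/debug/block/$disk/hctx*/sched_tags 2>/dev/null
    grep -H busy /sys/kernel/debug/block/$disk/hctx*/tags 2>/dev/null
done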

>
> * In the last two events, "echo idle >
> sys/devices/virtual/block/mdX/md/sync_action" was not even executing.
> This is not a trigger, but was a random victim when it happened the
> first time. This deceived me to believe this is some variation of the
> old problem.
>
> * It's not filesystem related, yesterday `blkid -o value -s LABEL
> /dev/md1` was hanging, too, and today, for example, `df`.
>
> * /sys/devices/virtual/block/md0/inflight today was (frozen at) "2
> 579"
>
> * iotop showed no disk activity (on the raid) at all. Only a single
> member device had activity from time to time (usually after ~30 seconds,
> but sometimes after a few seconds) with usually 1-4 tps, but sometimes
> more, max 136 tps.
>
> * As I said, I use a script to take a snapshot of various /sys and /proc
> information and running this script resolved the deadlock twice.
>
> * The recorded stack traces of mdX_raid6 of the hanging raid recorded in
> the two events were
>
>     [<0>] md_bitmap_unplug.part.0+0xce/0x100
>     [<0>] raid5d+0xe4/0x5a0
>     [<0>] md_thread+0xab/0x160
>     [<0>] kthread+0x127/0x150
>     [<0>] ret_from_fork+0x22/0x30
>
> and
>
>     [<0>] md_super_wait+0x72/0xa0
>     [<0>] md_bitmap_unplug.part.0+0xce/0x100
>     [<0>] raid5d+0xe4/0x5a0
>     [<0>] md_thread+0xab/0x160
>     [<0>] kthread+0x127/0x150
>     [<0>] ret_from_fork+0x22/0x30

The above stack shows that raid issued bitmap I/O to the underlying disks
and is waiting for that I/O to be done. Unless bitmap I/O is broken in
raid, this problem should not be related to raid; the debugfs files above
can help clarify this.

Thanks,
Kuai

>
> But note, that these probably were taken after the previous commands in
> the script already unfroze the system. Today I've manually looked at the
> stack while the system was still frozen, and it was just
>
>     [<0>] md_thread+0x122/0x160
>     [<0>] kthread+0x127/0x150
>     [<0>] ret_from_fork+0x22/0x30
>
> * Because I knew that my script seems to unblock the system, I've run it
> slowly line by line to see what actually unfreezes the system. There is
> one loop which takes "comm" "cmdline" and "stack" of all threads:
>
>     for task in /proc/*/task/*; do
>         echo  "# # $task: $(cat $task/comm) : $(cat $task/cmdline |
> xargs -0 echo)"
>         cmd cat $task/stack
>     done
>
> I've added a few "read" to single-step it. Unfortunately, when it came
> to the 64 nfsd threads, I've got a bit impatient and hit "return" faster
> then I should have and when the unfreeze happened, I couldn't say
> exactly were it was triggered. But it must have been somewhere in this
> tail:
>
> # # /proc/1299/task/1299: nfsd
>
> [<0>] svc_recv+0x7a7/0x8c0 [sunrpc]
> [<0>] nfsd+0xd6/0x140 [nfsd]
> [<0>] kthread+0x127/0x150
> [<0>] ret_from_fork+0x22/0x30
>
> # # /proc/13/task/13: ksoftirqd/0
>
> [<0>] smpboot_thread_fn+0xf3/0x140
> [<0>] kthread+0x127/0x150
> [<0>] ret_from_fork+0x22/0x30
>
> # # /proc/130/task/130: cpuhp/22
>
> [<0>] smpboot_thread_fn+0xf3/0x140
> [<0>] kthread+0x127/0x150
> [<0>] ret_from_fork+0x22/0x30
>
> # # /proc/1300/task/1300: nfsd
>
> [<0>] svc_recv+0x7a7/0x8c0 [sunrpc]
> [<0>] nfsd+0xd6/0x140 [nfsd]
> [<0>] kthread+0x127/0x150
> [<0>] ret_from_fork+0x22/0x30
>
> ## (3 more repetitions of other nfsd threads which exactly the same
> stack skipped here ##
>
> So it appears, that possibly a cat /proc/PID/stack of a "ksoftirqd" or a
> (maybe) a "cpuhp" thread unblocks the system. "nfsd" seems unlikely, as
> there shouldn't and wasn't anything nfs-mounted from this system.
>
> Conclusion: This is probably not related to mdraid at all and might be a
> problem of the block or some infrastructure subsystem. Do you agree?
>
> Best
>
>   Donald
