[PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

linmiaohe at huawei

Jul 29, 2021, 6:18 AM

Since percpu_charge_mutex is only used inside drain_all_stock(), we can
narrow the scope of percpu_charge_mutex by moving it here.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6580c2381a3e..a03e24e57cd9 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2050,7 +2050,6 @@ struct memcg_stock_pcp {
 #define FLUSHING_CACHED_CHARGE	0
 };
 static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
-static DEFINE_MUTEX(percpu_charge_mutex);

 #ifdef CONFIG_MEMCG_KMEM
 static void drain_obj_stock(struct obj_stock *stock);
@@ -2209,6 +2208,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
  */
 static void drain_all_stock(struct mem_cgroup *root_memcg)
 {
+	static DEFINE_MUTEX(percpu_charge_mutex);
 	int cpu, curcpu;

 	/* If someone's already draining, avoid adding running more workers. */
--
2.23.0

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

songmuchun at bytedance

Jul 29, 2021, 8:15 PM

On Thu, Jul 29, 2021 at 8:58 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> Since percpu_charge_mutex is only used inside drain_all_stock(), we can
> narrow the scope of percpu_charge_mutex by moving it here.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

LGTM.

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

Jul 29, 2021, 8:15 PM

On Thu, Jul 29, 2021 at 08:57:52PM +0800, Miaohe Lin wrote:
> Since percpu_charge_mutex is only used inside drain_all_stock(), we can
> narrow the scope of percpu_charge_mutex by moving it here.
>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
> mm/memcontrol.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6580c2381a3e..a03e24e57cd9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2050,7 +2050,6 @@ struct memcg_stock_pcp {
> #define FLUSHING_CACHED_CHARGE 0
> };
> static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> -static DEFINE_MUTEX(percpu_charge_mutex);
>
> #ifdef CONFIG_MEMCG_KMEM
> static void drain_obj_stock(struct obj_stock *stock);
> @@ -2209,6 +2208,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> */
> static void drain_all_stock(struct mem_cgroup *root_memcg)
> {
> + static DEFINE_MUTEX(percpu_charge_mutex);
> int cpu, curcpu;

It's considered a good practice to protect data instead of code paths. After
the proposed change it becomes obvious that the opposite is done here: the mutex
is used to prevent a simultaneous execution of the code of the drain_all_stock()
function.

Actually we don't need a mutex here: nobody ever sleeps on it. So I'd replace
it with a simple atomic variable or even a single bitfield. Then the change will
be better justified, IMO.

Thanks!
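
A minimal userspace sketch of the idea above, assuming C11 <stdatomic.h> in place of the kernel's atomic API: a single atomic flag lets the first caller in and makes every concurrent caller back off, which is all the mutex is being used for here. The names draining, drain_gate_enter() and drain_gate_exit() are hypothetical and do not appear in mm/memcontrol.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag: true while one caller is inside the drain path. */
static atomic_bool draining = false;

/* Returns true if the caller may drain, false if someone else already is. */
static bool drain_gate_enter(void)
{
	/* atomic_exchange() stores true and returns the previous value. */
	return !atomic_exchange(&draining, true);
}

/* Must only be called by the caller whose drain_gate_enter() returned true. */
static void drain_gate_exit(void)
{
	assert(atomic_load(&draining));
	atomic_store(&draining, false);
}
```

Nobody ever waits on the flag, so no sleeping primitive is involved: a caller that loses the race simply returns.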

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

Jul 30, 2021, 12:15 AM

On Thu 29-07-21 20:57:52, Miaohe Lin wrote:
> Since percpu_charge_mutex is only used inside drain_all_stock(), we can
> narrow the scope of percpu_charge_mutex by moving it here.

Makes sense, and this is usually my preference as well. We used to have
another caller back then, so I couldn't.

> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
> mm/memcontrol.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6580c2381a3e..a03e24e57cd9 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2050,7 +2050,6 @@ struct memcg_stock_pcp {
> #define FLUSHING_CACHED_CHARGE 0
> };
> static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> -static DEFINE_MUTEX(percpu_charge_mutex);
>
> #ifdef CONFIG_MEMCG_KMEM
> static void drain_obj_stock(struct obj_stock *stock);
> @@ -2209,6 +2208,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> */
> static void drain_all_stock(struct mem_cgroup *root_memcg)
> {
> + static DEFINE_MUTEX(percpu_charge_mutex);
> int cpu, curcpu;
>
> /* If someone's already draining, avoid adding running more workers. */
> --
> 2.23.0

--
Michal Hocko
SUSE Labs

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

Jul 30, 2021, 12:15 AM

On Thu 29-07-21 20:06:45, Roman Gushchin wrote:
> On Thu, Jul 29, 2021 at 08:57:52PM +0800, Miaohe Lin wrote:
> > Since percpu_charge_mutex is only used inside drain_all_stock(), we can
> > narrow the scope of percpu_charge_mutex by moving it here.
> >
> > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> > ---
> > mm/memcontrol.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 6580c2381a3e..a03e24e57cd9 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2050,7 +2050,6 @@ struct memcg_stock_pcp {
> > #define FLUSHING_CACHED_CHARGE 0
> > };
> > static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> > -static DEFINE_MUTEX(percpu_charge_mutex);
> >
> > #ifdef CONFIG_MEMCG_KMEM
> > static void drain_obj_stock(struct obj_stock *stock);
> > @@ -2209,6 +2208,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> > */
> > static void drain_all_stock(struct mem_cgroup *root_memcg)
> > {
> > + static DEFINE_MUTEX(percpu_charge_mutex);
> > int cpu, curcpu;
>
> It's considered a good practice to protect data instead of code paths. After
> the proposed change it becomes obvious that the opposite is done here: the mutex
> is used to prevent a simultaneous execution of the code of the drain_all_stock()
> function.

The purpose of the lock was indeed to orchestrate callers more than any
data structure consistency.

> Actually we don't need a mutex here: nobody ever sleeps on it. So I'd replace
> it with a simple atomic variable or even a single bitfield. Then the change will
> be better justified, IMO.

Yes, the mutex can be replaced by an atomic in a follow-up patch.
--
Michal Hocko
SUSE Labs

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

linmiaohe at huawei

Jul 31, 2021, 12:15 AM

On 2021/7/30 14:50, Michal Hocko wrote:
> On Thu 29-07-21 20:06:45, Roman Gushchin wrote:
>> On Thu, Jul 29, 2021 at 08:57:52PM +0800, Miaohe Lin wrote:
>>> Since percpu_charge_mutex is only used inside drain_all_stock(), we can
>>> narrow the scope of percpu_charge_mutex by moving it here.
>>>
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>> mm/memcontrol.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index 6580c2381a3e..a03e24e57cd9 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -2050,7 +2050,6 @@ struct memcg_stock_pcp {
>>> #define FLUSHING_CACHED_CHARGE 0
>>> };
>>> static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
>>> -static DEFINE_MUTEX(percpu_charge_mutex);
>>>
>>> #ifdef CONFIG_MEMCG_KMEM
>>> static void drain_obj_stock(struct obj_stock *stock);
>>> @@ -2209,6 +2208,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>>> */
>>> static void drain_all_stock(struct mem_cgroup *root_memcg)
>>> {
>>> + static DEFINE_MUTEX(percpu_charge_mutex);
>>> int cpu, curcpu;
>>
>> It's considered a good practice to protect data instead of code paths. After
>> the proposed change it becomes obvious that the opposite is done here: the mutex
>> is used to prevent a simultaneous execution of the code of the drain_all_stock()
>> function.
>
> The purpose of the lock was indeed to orchestrate callers more than any
> data structure consistency.
>
>> Actually we don't need a mutex here: nobody ever sleeps on it. So I'd replace
>> it with a simple atomic variable or even a single bitfield. Then the change will
>> be better justified, IMO.
>
> Yes, mutex can be replaced by an atomic in a follow up patch.
>

Thanks to both of you. It's a really good suggestion. Do you mean something like the below?

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 616d1a72ece3..508a96e80980 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2208,11 +2208,11 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
  */
 static void drain_all_stock(struct mem_cgroup *root_memcg)
 {
-	static DEFINE_MUTEX(percpu_charge_mutex);
 	int cpu, curcpu;
+	static atomic_t drain_all_stocks = ATOMIC_INIT(-1);

 	/* If someone's already draining, avoid adding running more workers. */
-	if (!mutex_trylock(&percpu_charge_mutex))
+	if (!atomic_inc_not_zero(&drain_all_stocks))
 		return;
 	/*
 	 * Notify other cpus that system-wide "drain" is running
@@ -2244,7 +2244,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 		}
 	}
 	put_cpu();
-	mutex_unlock(&percpu_charge_mutex);
+	atomic_dec(&drain_all_stocks);
 }

 static int memcg_hotplug_cpu_dead(unsigned int cpu)

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

Aug 2, 2021, 12:15 AM

On Sat 31-07-21 10:29:52, Miaohe Lin wrote:
> On 2021/7/30 14:50, Michal Hocko wrote:
> > On Thu 29-07-21 20:06:45, Roman Gushchin wrote:
> >> On Thu, Jul 29, 2021 at 08:57:52PM +0800, Miaohe Lin wrote:
> >>> Since percpu_charge_mutex is only used inside drain_all_stock(), we can
> >>> narrow the scope of percpu_charge_mutex by moving it here.
> >>>
> >>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >>> ---
> >>> mm/memcontrol.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >>> index 6580c2381a3e..a03e24e57cd9 100644
> >>> --- a/mm/memcontrol.c
> >>> +++ b/mm/memcontrol.c
> >>> @@ -2050,7 +2050,6 @@ struct memcg_stock_pcp {
> >>> #define FLUSHING_CACHED_CHARGE 0
> >>> };
> >>> static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> >>> -static DEFINE_MUTEX(percpu_charge_mutex);
> >>>
> >>> #ifdef CONFIG_MEMCG_KMEM
> >>> static void drain_obj_stock(struct obj_stock *stock);
> >>> @@ -2209,6 +2208,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> >>> */
> >>> static void drain_all_stock(struct mem_cgroup *root_memcg)
> >>> {
> >>> + static DEFINE_MUTEX(percpu_charge_mutex);
> >>> int cpu, curcpu;
> >>
> >> It's considered a good practice to protect data instead of code paths. After
> >> the proposed change it becomes obvious that the opposite is done here: the mutex
> >> is used to prevent a simultaneous execution of the code of the drain_all_stock()
> >> function.
> >
> > The purpose of the lock was indeed to orchestrate callers more than any
> > data structure consistency.
> >
> >> Actually we don't need a mutex here: nobody ever sleeps on it. So I'd replace
> >> it with a simple atomic variable or even a single bitfield. Then the change will
> >> be better justified, IMO.
> >
> > Yes, mutex can be replaced by an atomic in a follow up patch.
> >
>
> Thanks for both of you. It's a really good suggestion. What do you mean is something like below?
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 616d1a72ece3..508a96e80980 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2208,11 +2208,11 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> */
> static void drain_all_stock(struct mem_cgroup *root_memcg)
> {
> - static DEFINE_MUTEX(percpu_charge_mutex);
> int cpu, curcpu;
> + static atomic_t drain_all_stocks = ATOMIC_INIT(-1);
> /* If someone's already draining, avoid adding running more workers. */
> - if (!mutex_trylock(&percpu_charge_mutex))
> + if (!atomic_inc_not_zero(&drain_all_stocks))
> return;
> /*
> * Notify other cpus that system-wide "drain" is running
> @@ -2244,7 +2244,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
> }
> }
> put_cpu();
> - mutex_unlock(&percpu_charge_mutex);
> + atomic_dec(&drain_all_stocks);

Yes this would work. I would just s@drain_all_stocks@drainers@ or
something similar to better express the intention.

> }
>
> static int memcg_hotplug_cpu_dead(unsigned int cpu)

--
Michal Hocko
SUSE Labs
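
To see why the (-1, 0) scheme quoted above works, here is a rough userspace approximation using C11 atomics. inc_not_zero() is a hand-written stand-in for the kernel's atomic_inc_not_zero(), not the kernel implementation, and drain_enter()/drain_exit() are hypothetical helper names. The counter is named drainers, as suggested in this reply.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Approximation of atomic_inc_not_zero(): increment unless the value is 0. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* -1: nobody is draining, 0: a drain is in progress. */
static atomic_int drainers = -1;

static bool drain_enter(void)
{
	return inc_not_zero(&drainers);
}

/* Only the caller that won drain_enter() may call this. */
static void drain_exit(void)
{
	assert(atomic_load(&drainers) == 0);
	atomic_fetch_sub(&drainers, 1);
}
```

The first caller moves the counter from -1 to 0 and proceeds. Any later caller sees 0, so the increment is refused and it returns early. The final decrement restores -1.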

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

linmiaohe at huawei

Aug 2, 2021, 6:15 AM

On 2021/8/2 14:49, Michal Hocko wrote:
> On Sat 31-07-21 10:29:52, Miaohe Lin wrote:
>> On 2021/7/30 14:50, Michal Hocko wrote:
>>> On Thu 29-07-21 20:06:45, Roman Gushchin wrote:
>>>> On Thu, Jul 29, 2021 at 08:57:52PM +0800, Miaohe Lin wrote:
>>>>> Since percpu_charge_mutex is only used inside drain_all_stock(), we can
>>>>> narrow the scope of percpu_charge_mutex by moving it here.
>>>>>
>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>>> ---
>>>>> mm/memcontrol.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>>>> index 6580c2381a3e..a03e24e57cd9 100644
>>>>> --- a/mm/memcontrol.c
>>>>> +++ b/mm/memcontrol.c
>>>>> @@ -2050,7 +2050,6 @@ struct memcg_stock_pcp {
>>>>> #define FLUSHING_CACHED_CHARGE 0
>>>>> };
>>>>> static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
>>>>> -static DEFINE_MUTEX(percpu_charge_mutex);
>>>>>
>>>>> #ifdef CONFIG_MEMCG_KMEM
>>>>> static void drain_obj_stock(struct obj_stock *stock);
>>>>> @@ -2209,6 +2208,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>>>>> */
>>>>> static void drain_all_stock(struct mem_cgroup *root_memcg)
>>>>> {
>>>>> + static DEFINE_MUTEX(percpu_charge_mutex);
>>>>> int cpu, curcpu;
>>>>
>>>> It's considered a good practice to protect data instead of code paths. After
>>>> the proposed change it becomes obvious that the opposite is done here: the mutex
>>>> is used to prevent a simultaneous execution of the code of the drain_all_stock()
>>>> function.
>>>
>>> The purpose of the lock was indeed to orchestrate callers more than any
>>> data structure consistency.
>>>
>>>> Actually we don't need a mutex here: nobody ever sleeps on it. So I'd replace
>>>> it with a simple atomic variable or even a single bitfield. Then the change will
>>>> be better justified, IMO.
>>>
>>> Yes, mutex can be replaced by an atomic in a follow up patch.
>>>
>>
>> Thanks for both of you. It's a really good suggestion. What do you mean is something like below?
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 616d1a72ece3..508a96e80980 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2208,11 +2208,11 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>> */
>> static void drain_all_stock(struct mem_cgroup *root_memcg)
>> {
>> - static DEFINE_MUTEX(percpu_charge_mutex);
>> int cpu, curcpu;
>> + static atomic_t drain_all_stocks = ATOMIC_INIT(-1);
>> /* If someone's already draining, avoid adding running more workers. */
>> - if (!mutex_trylock(&percpu_charge_mutex))
>> + if (!atomic_inc_not_zero(&drain_all_stocks))
>> return;
>> /*
>> * Notify other cpus that system-wide "drain" is running
>> @@ -2244,7 +2244,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
>> }
>> }
>> put_cpu();
>> - mutex_unlock(&percpu_charge_mutex);
>> + atomic_dec(&drain_all_stocks);
>
> Yes this would work. I would just s@drain_all_stocks@drainers@ or
> something similar to better express the intention.
>

Sounds good. Will do it in v2. Many thanks.

>> }
>>
>> static int memcg_hotplug_cpu_dead(unsigned int cpu)
>

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

Aug 2, 2021, 10:15 PM

On Sat, Jul 31, 2021 at 10:29:52AM +0800, Miaohe Lin wrote:
> On 2021/7/30 14:50, Michal Hocko wrote:
> > On Thu 29-07-21 20:06:45, Roman Gushchin wrote:
> >> On Thu, Jul 29, 2021 at 08:57:52PM +0800, Miaohe Lin wrote:
> >>> Since percpu_charge_mutex is only used inside drain_all_stock(), we can
> >>> narrow the scope of percpu_charge_mutex by moving it here.
> >>>
> >>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >>> ---
> >>> mm/memcontrol.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >>> index 6580c2381a3e..a03e24e57cd9 100644
> >>> --- a/mm/memcontrol.c
> >>> +++ b/mm/memcontrol.c
> >>> @@ -2050,7 +2050,6 @@ struct memcg_stock_pcp {
> >>> #define FLUSHING_CACHED_CHARGE 0
> >>> };
> >>> static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> >>> -static DEFINE_MUTEX(percpu_charge_mutex);
> >>>
> >>> #ifdef CONFIG_MEMCG_KMEM
> >>> static void drain_obj_stock(struct obj_stock *stock);
> >>> @@ -2209,6 +2208,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> >>> */
> >>> static void drain_all_stock(struct mem_cgroup *root_memcg)
> >>> {
> >>> + static DEFINE_MUTEX(percpu_charge_mutex);
> >>> int cpu, curcpu;
> >>
> >> It's considered a good practice to protect data instead of code paths. After
> >> the proposed change it becomes obvious that the opposite is done here: the mutex
> >> is used to prevent a simultaneous execution of the code of the drain_all_stock()
> >> function.
> >
> > The purpose of the lock was indeed to orchestrate callers more than any
> > data structure consistency.
> >
> >> Actually we don't need a mutex here: nobody ever sleeps on it. So I'd replace
> >> it with a simple atomic variable or even a single bitfield. Then the change will
> >> be better justified, IMO.
> >
> > Yes, mutex can be replaced by an atomic in a follow up patch.
> >
>
> Thanks for both of you. It's a really good suggestion. What do you mean is something like below?
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 616d1a72ece3..508a96e80980 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2208,11 +2208,11 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> */
> static void drain_all_stock(struct mem_cgroup *root_memcg)
> {
> - static DEFINE_MUTEX(percpu_charge_mutex);
> int cpu, curcpu;
> + static atomic_t drain_all_stocks = ATOMIC_INIT(-1);
>
> /* If someone's already draining, avoid adding running more workers. */
> - if (!mutex_trylock(&percpu_charge_mutex))
> + if (!atomic_inc_not_zero(&drain_all_stocks))
> return;

It should work, but why not a simple atomic_cmpxchg(&drain_all_stocks, 0, 1) and
initialize it to 0? Maybe it's just my preference, but IMO (0, 1) is easier
to understand than (-1, 0) here. Not a strong opinion though, up to you.

Thanks!
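
The (0, 1) variant suggested above can be sketched in userspace C11 as follows. This is an illustration only: the kernel's atomic_cmpxchg() returns the old value, whereas C11's atomic_compare_exchange_strong() returns a bool, and the helper names drainer_try_enter() and drainer_exit() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0: nobody is draining, 1: a drain is in progress. */
static atomic_int drainer = 0;

/* Succeeds only for the caller that flips the value from 0 to 1. */
static bool drainer_try_enter(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&drainer, &expected, 1);
}

/* Only the caller that won drainer_try_enter() may call this. */
static void drainer_exit(void)
{
	assert(atomic_load(&drainer) == 1);
	atomic_store(&drainer, 0);
}
```

Compared with the (-1, 0) counter, the two states read directly as "free" and "taken".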

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

linmiaohe at huawei

Aug 3, 2021, 12:15 AM

On 2021/8/3 11:40, Roman Gushchin wrote:
> On Sat, Jul 31, 2021 at 10:29:52AM +0800, Miaohe Lin wrote:
>> On 2021/7/30 14:50, Michal Hocko wrote:
>>> On Thu 29-07-21 20:06:45, Roman Gushchin wrote:
>>>> On Thu, Jul 29, 2021 at 08:57:52PM +0800, Miaohe Lin wrote:
>>>>> Since percpu_charge_mutex is only used inside drain_all_stock(), we can
>>>>> narrow the scope of percpu_charge_mutex by moving it here.
>>>>>
>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>>> ---
>>>>> mm/memcontrol.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>>>> index 6580c2381a3e..a03e24e57cd9 100644
>>>>> --- a/mm/memcontrol.c
>>>>> +++ b/mm/memcontrol.c
>>>>> @@ -2050,7 +2050,6 @@ struct memcg_stock_pcp {
>>>>> #define FLUSHING_CACHED_CHARGE 0
>>>>> };
>>>>> static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
>>>>> -static DEFINE_MUTEX(percpu_charge_mutex);
>>>>>
>>>>> #ifdef CONFIG_MEMCG_KMEM
>>>>> static void drain_obj_stock(struct obj_stock *stock);
>>>>> @@ -2209,6 +2208,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>>>>> */
>>>>> static void drain_all_stock(struct mem_cgroup *root_memcg)
>>>>> {
>>>>> + static DEFINE_MUTEX(percpu_charge_mutex);
>>>>> int cpu, curcpu;
>>>>
>>>> It's considered a good practice to protect data instead of code paths. After
>>>> the proposed change it becomes obvious that the opposite is done here: the mutex
>>>> is used to prevent a simultaneous execution of the code of the drain_all_stock()
>>>> function.
>>>
>>> The purpose of the lock was indeed to orchestrate callers more than any
>>> data structure consistency.
>>>
>>>> Actually we don't need a mutex here: nobody ever sleeps on it. So I'd replace
>>>> it with a simple atomic variable or even a single bitfield. Then the change will
>>>> be better justified, IMO.
>>>
>>> Yes, mutex can be replaced by an atomic in a follow up patch.
>>>
>>
>> Thanks for both of you. It's a really good suggestion. What do you mean is something like below?
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 616d1a72ece3..508a96e80980 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2208,11 +2208,11 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>> */
>> static void drain_all_stock(struct mem_cgroup *root_memcg)
>> {
>> - static DEFINE_MUTEX(percpu_charge_mutex);
>> int cpu, curcpu;
>> + static atomic_t drain_all_stocks = ATOMIC_INIT(-1);
>>
>> /* If someone's already draining, avoid adding running more workers. */
>> - if (!mutex_trylock(&percpu_charge_mutex))
>> + if (!atomic_inc_not_zero(&drain_all_stocks))
>> return;
>
> It should work, but why not a simple atomic_cmpxchg(&drain_all_stocks, 0, 1) and
> initialize it to 0? Maybe it's just my preference, but IMO (0, 1) is easier
> to understand than (-1, 0) here. Not a strong opinion though, up to you.
>

I think this would improve readability. Do you mean something like the below?

Many thanks.

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 616d1a72ece3..6210b1124929 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2208,11 +2208,11 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
  */
 static void drain_all_stock(struct mem_cgroup *root_memcg)
 {
-	static DEFINE_MUTEX(percpu_charge_mutex);
 	int cpu, curcpu;
+	static atomic_t drainer = ATOMIC_INIT(0);

 	/* If someone's already draining, avoid adding running more workers. */
-	if (!mutex_trylock(&percpu_charge_mutex))
+	if (atomic_cmpxchg(&drainer, 0, 1) != 0)
 		return;
 	/*
 	 * Notify other cpus that system-wide "drain" is running
@@ -2244,7 +2244,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 		}
 	}
 	put_cpu();
-	mutex_unlock(&percpu_charge_mutex);
+	atomic_set(&drainer, 0);
 }

> Thanks!
> .
>

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

Aug 3, 2021, 12:15 AM

On Tue 03-08-21 14:29:13, Miaohe Lin wrote:
[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 616d1a72ece3..6210b1124929 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2208,11 +2208,11 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> */
> static void drain_all_stock(struct mem_cgroup *root_memcg)
> {
> - static DEFINE_MUTEX(percpu_charge_mutex);
> int cpu, curcpu;
> + static atomic_t drainer = ATOMIC_INIT(0);
>
> /* If someone's already draining, avoid adding running more workers. */
> - if (!mutex_trylock(&percpu_charge_mutex))
> + if (atomic_cmpxchg(&drainer, 0, 1) != 0)
> return;
> /*
> * Notify other cpus that system-wide "drain" is running
> @@ -2244,7 +2244,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
> }
> }
> put_cpu();
> - mutex_unlock(&percpu_charge_mutex);
> + atomic_set(&drainer, 0);

atomic_set() doesn't imply a memory barrier, IIRC. Is this safe?

--
Michal Hocko
SUSE Labs

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

Aug 3, 2021, 12:15 AM

I’d go with atomic_dec().

> On Aug 3, 2021, at 00:11, Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 03-08-21 14:29:13, Miaohe Lin wrote:
> [...]
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 616d1a72ece3..6210b1124929 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2208,11 +2208,11 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>> */
>> static void drain_all_stock(struct mem_cgroup *root_memcg)
>> {
>> - static DEFINE_MUTEX(percpu_charge_mutex);
>> int cpu, curcpu;
>> + static atomic_t drainer = ATOMIC_INIT(0);
>>
>> /* If someone's already draining, avoid adding running more workers. */
>> - if (!mutex_trylock(&percpu_charge_mutex))
>> + if (atomic_cmpxchg(&drainer, 0, 1) != 0)
>> return;
>> /*
>> * Notify other cpus that system-wide "drain" is running
>> @@ -2244,7 +2244,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
>> }
>> }
>> put_cpu();
>> - mutex_unlock(&percpu_charge_mutex);
>> + atomic_set(&drainer, 0);
>
> atomic_set doesn't imply memory barrier IIRC. Is this safe?
>
> --
> Michal Hocko
> SUSE Labs

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

Aug 3, 2021, 1:05 AM

On Tue 03-08-21 07:13:35, Roman Gushchin wrote:
> I’d go with atomic_dec().

That doesn't imply a memory barrier either. You would need
atomic_dec_return() or some other explicit barrier, IIRC.
--
Michal Hocko
SUSE Labs
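
The distinction drawn above can be illustrated with a small userspace sketch. It assumes C11 memory orders as the nearest analogue of the kernel's rules, which is an approximation and not an exact equivalence: the kernel's atomic_dec() is unordered, which memory_order_relaxed roughly models, while atomic_dec_return() is fully ordered, which memory_order_seq_cst only approximates. The names gate_count, gate_dec_unordered() and gate_dec_return_ordered() are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical counter, starting at 1 purely for illustration. */
static atomic_int gate_count = 1;

/* Analogue of atomic_dec(): atomic, but orders nothing around it. */
static void gate_dec_unordered(void)
{
	atomic_fetch_sub_explicit(&gate_count, 1, memory_order_relaxed);
}

/* Analogue of atomic_dec_return(): ordered, and returns the new value. */
static int gate_dec_return_ordered(void)
{
	/* fetch_sub returns the old value, so subtract 1 to get the new one. */
	return atomic_fetch_sub_explicit(&gate_count, 1,
					 memory_order_seq_cst) - 1;
}
```

Both functions decrement atomically. They differ only in whether the surrounding loads and stores may be reordered across the decrement.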

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

songmuchun at bytedance

Aug 3, 2021, 4:15 AM

On Tue, Aug 3, 2021 at 2:29 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2021/8/3 11:40, Roman Gushchin wrote:
> > On Sat, Jul 31, 2021 at 10:29:52AM +0800, Miaohe Lin wrote:
> >> On 2021/7/30 14:50, Michal Hocko wrote:
> >>> On Thu 29-07-21 20:06:45, Roman Gushchin wrote:
> >>>> On Thu, Jul 29, 2021 at 08:57:52PM +0800, Miaohe Lin wrote:
> >>>>> Since percpu_charge_mutex is only used inside drain_all_stock(), we can
> >>>>> narrow the scope of percpu_charge_mutex by moving it here.
> >>>>>
> >>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >>>>> ---
> >>>>> mm/memcontrol.c | 2 +-
> >>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >>>>> index 6580c2381a3e..a03e24e57cd9 100644
> >>>>> --- a/mm/memcontrol.c
> >>>>> +++ b/mm/memcontrol.c
> >>>>> @@ -2050,7 +2050,6 @@ struct memcg_stock_pcp {
> >>>>> #define FLUSHING_CACHED_CHARGE 0
> >>>>> };
> >>>>> static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> >>>>> -static DEFINE_MUTEX(percpu_charge_mutex);
> >>>>>
> >>>>> #ifdef CONFIG_MEMCG_KMEM
> >>>>> static void drain_obj_stock(struct obj_stock *stock);
> >>>>> @@ -2209,6 +2208,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> >>>>> */
> >>>>> static void drain_all_stock(struct mem_cgroup *root_memcg)
> >>>>> {
> >>>>> + static DEFINE_MUTEX(percpu_charge_mutex);
> >>>>> int cpu, curcpu;
> >>>>
> >>>> It's considered a good practice to protect data instead of code paths. After
> >>>> the proposed change it becomes obvious that the opposite is done here: the mutex
> >>>> is used to prevent a simultaneous execution of the code of the drain_all_stock()
> >>>> function.
> >>>
> >>> The purpose of the lock was indeed to orchestrate callers more than any
> >>> data structure consistency.
> >>>
> >>>> Actually we don't need a mutex here: nobody ever sleeps on it. So I'd replace
> >>>> it with a simple atomic variable or even a single bitfield. Then the change will
> >>>> be better justified, IMO.
> >>>
> >>> Yes, mutex can be replaced by an atomic in a follow up patch.
> >>>
> >>
> >> Thanks for both of you. It's a really good suggestion. What do you mean is something like below?
> >>
> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> index 616d1a72ece3..508a96e80980 100644
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> >> @@ -2208,11 +2208,11 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> >> */
> >> static void drain_all_stock(struct mem_cgroup *root_memcg)
> >> {
> >> - static DEFINE_MUTEX(percpu_charge_mutex);
> >> int cpu, curcpu;
> >> + static atomic_t drain_all_stocks = ATOMIC_INIT(-1);
> >>
> >> /* If someone's already draining, avoid adding running more workers. */
> >> - if (!mutex_trylock(&percpu_charge_mutex))
> >> + if (!atomic_inc_not_zero(&drain_all_stocks))
> >> return;
> >
> > It should work, but why not a simple atomic_cmpxchg(&drain_all_stocks, 0, 1) and
> > initialize it to 0? Maybe it's just my preference, but IMO (0, 1) is easier
> > to understand than (-1, 0) here. Not a strong opinion though, up to you.
> >
>
> I think this would improve the readability. What you mean is something like below ?
>
> Many thanks.
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 616d1a72ece3..6210b1124929 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2208,11 +2208,11 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> */
> static void drain_all_stock(struct mem_cgroup *root_memcg)
> {
> - static DEFINE_MUTEX(percpu_charge_mutex);
> int cpu, curcpu;
> + static atomic_t drainer = ATOMIC_INIT(0);
>
> /* If someone's already draining, avoid adding running more workers. */
> - if (!mutex_trylock(&percpu_charge_mutex))
> + if (atomic_cmpxchg(&drainer, 0, 1) != 0)

I'd like to use atomic_cmpxchg_acquire() here.

> return;
> /*
> * Notify other cpus that system-wide "drain" is running
> @@ -2244,7 +2244,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
> }
> }
> put_cpu();
> - mutex_unlock(&percpu_charge_mutex);
> + atomic_set(&drainer, 0);

So use atomic_set_release() here to pair with
atomic_cmpxchg_acquire().

Thanks.

> }
>
> > Thanks!
> > .
> >
>
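
The acquire/release pairing proposed in this reply can be sketched in userspace C11 as below. This assumes memory_order_acquire and memory_order_release as stand-ins for atomic_cmpxchg_acquire() and atomic_set_release(). The names drainer_gate, drained_total, gate_try_acquire(), gate_release() and drain_once() are hypothetical, and the plain counter merely represents whatever work the gate serializes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int drainer_gate = 0;	/* 0: free, 1: taken */
static int drained_total;		/* plain data touched only by the gate holder */

/* Acquire on success: later accesses cannot move before taking the gate. */
static bool gate_try_acquire(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&drainer_gate,
			&expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

/* Release: earlier accesses are visible before the gate reads as free. */
static void gate_release(void)
{
	assert(atomic_load(&drainer_gate) == 1);
	atomic_store_explicit(&drainer_gate, 0, memory_order_release);
}

/* Returns false without doing any work if another caller holds the gate. */
static bool drain_once(int pages)
{
	if (!gate_try_acquire())
		return false;
	drained_total += pages;
	gate_release();
	return true;
}
```

With this pairing, the next caller to take the gate is guaranteed to observe everything the previous holder wrote before releasing it.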

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex

linmiaohe at huawei

Aug 3, 2021, 4:15 AM

On 2021/8/3 17:33, Muchun Song wrote:
> On Tue, Aug 3, 2021 at 2:29 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2021/8/3 11:40, Roman Gushchin wrote:
>>> On Sat, Jul 31, 2021 at 10:29:52AM +0800, Miaohe Lin wrote:
>>>> On 2021/7/30 14:50, Michal Hocko wrote:
>>>>> On Thu 29-07-21 20:06:45, Roman Gushchin wrote:
>>>>>> On Thu, Jul 29, 2021 at 08:57:52PM +0800, Miaohe Lin wrote:
>>>>>>> Since percpu_charge_mutex is only used inside drain_all_stock(), we can
>>>>>>> narrow the scope of percpu_charge_mutex by moving it here.
>>>>>>>
>>>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>>>>> ---
>>>>>>> mm/memcontrol.c | 2 +-
>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>>>>>> index 6580c2381a3e..a03e24e57cd9 100644
>>>>>>> --- a/mm/memcontrol.c
>>>>>>> +++ b/mm/memcontrol.c
>>>>>>> @@ -2050,7 +2050,6 @@ struct memcg_stock_pcp {
>>>>>>> #define FLUSHING_CACHED_CHARGE 0
>>>>>>> };
>>>>>>> static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
>>>>>>> -static DEFINE_MUTEX(percpu_charge_mutex);
>>>>>>>
>>>>>>> #ifdef CONFIG_MEMCG_KMEM
>>>>>>> static void drain_obj_stock(struct obj_stock *stock);
>>>>>>> @@ -2209,6 +2208,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>>>>>>> */
>>>>>>> static void drain_all_stock(struct mem_cgroup *root_memcg)
>>>>>>> {
>>>>>>> + static DEFINE_MUTEX(percpu_charge_mutex);
>>>>>>> int cpu, curcpu;
>>>>>>
>>>>>> It's considered a good practice to protect data instead of code paths. After
>>>>>> the proposed change it becomes obvious that the opposite is done here: the mutex
>>>>>> is used to prevent a simultaneous execution of the code of the drain_all_stock()
>>>>>> function.
>>>>>
>>>>> The purpose of the lock was indeed to orchestrate callers more than any
>>>>> data structure consistency.
>>>>>
>>>>>> Actually we don't need a mutex here: nobody ever sleeps on it. So I'd replace
>>>>>> it with a simple atomic variable or even a single bitfield. Then the change will
>>>>>> be better justified, IMO.
>>>>>
>>>>> Yes, mutex can be replaced by an atomic in a follow up patch.
>>>>>
>>>>
>>>> Thanks for both of you. It's a really good suggestion. What do you mean is something like below?
>>>>
>>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>>> index 616d1a72ece3..508a96e80980 100644
>>>> --- a/mm/memcontrol.c
>>>> +++ b/mm/memcontrol.c
>>>> @@ -2208,11 +2208,11 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>>>> */
>>>> static void drain_all_stock(struct mem_cgroup *root_memcg)
>>>> {
>>>> - static DEFINE_MUTEX(percpu_charge_mutex);
>>>> int cpu, curcpu;
>>>> + static atomic_t drain_all_stocks = ATOMIC_INIT(-1);
>>>>
>>>> /* If someone's already draining, avoid adding running more workers. */
>>>> - if (!mutex_trylock(&percpu_charge_mutex))
>>>> + if (!atomic_inc_not_zero(&drain_all_stocks))
>>>> return;
>>>
>>> It should work, but why not a simple atomic_cmpxchg(&drain_all_stocks, 0, 1) and
>>> initialize it to 0? Maybe it's just my preference, but IMO (0, 1) is easier
>>> to understand than (-1, 0) here. Not a strong opinion though, up to you.
>>>
>>
>> I think this would improve the readability. What you mean is something like below ?
>>
>> Many thanks.
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 616d1a72ece3..6210b1124929 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -2208,11 +2208,11 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
>> */
>> static void drain_all_stock(struct mem_cgroup *root_memcg)
>> {
>> - static DEFINE_MUTEX(percpu_charge_mutex);
>> int cpu, curcpu;
>> + static atomic_t drainer = ATOMIC_INIT(0);
>>
>> /* If someone's already draining, avoid adding running more workers. */
>> - if (!mutex_trylock(&percpu_charge_mutex))
>> + if (atomic_cmpxchg(&drainer, 0, 1) != 0)
>
> I'd like to use atomic_cmpxchg_acquire() here.
>
>> return;
>> /*
>> * Notify other cpus that system-wide "drain" is running
>> @@ -2244,7 +2244,7 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
>> }
>> }
>> put_cpu();
>> - mutex_unlock(&percpu_charge_mutex);
>> + atomic_set(&drainer, 0);
>
> So use atomic_set_release() here to cooperate with
> atomic_cmpxchg_acquire().

I think this will work well. Many thanks!

>
> Thanks.
>
>> }
>>
>>> Thanks!
>
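
For readers following along, here is a minimal userspace sketch of the acquire/release pairing suggested above. It assumes C11 <stdatomic.h> as a stand-in for the kernel's atomic_t API; the helper names drain_try_begin() and drain_end() are hypothetical and do not appear in mm/memcontrol.c.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for "static atomic_t drainer = ATOMIC_INIT(0)". */
static atomic_int drainer = 0;

/*
 * Userspace analogue of "atomic_cmpxchg_acquire(&drainer, 0, 1) == 0".
 * Returns true only for the caller that flips the flag from 0 to 1;
 * every other concurrent caller gets false and should simply return.
 */
bool drain_try_begin(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&drainer, &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/*
 * Userspace analogue of "atomic_set_release(&drainer, 0)". The release
 * store pairs with the acquire in drain_try_begin(), so the next winner
 * observes everything the previous drainer wrote.
 */
void drain_end(void)
{
	atomic_store_explicit(&drainer, 0, memory_order_release);
}
```

A caller would wrap the drain loop as "if (!drain_try_begin()) return; ... drain_end();", which mirrors the mutex_trylock()/mutex_unlock() pair that the diff replaces.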

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex [ In reply to ]

hannes at cmpxchg

Aug 3, 2021, 8:15 AM

Post #16 of 18 (434 views)

On Fri, Jul 30, 2021 at 08:50:02AM +0200, Michal Hocko wrote:
> On Thu 29-07-21 20:06:45, Roman Gushchin wrote:
> > On Thu, Jul 29, 2021 at 08:57:52PM +0800, Miaohe Lin wrote:
> > > Since percpu_charge_mutex is only used inside drain_all_stock(), we can
> > > narrow the scope of percpu_charge_mutex by moving it here.
> > >
> > > Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> > > ---
> > > mm/memcontrol.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 6580c2381a3e..a03e24e57cd9 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -2050,7 +2050,6 @@ struct memcg_stock_pcp {
> > > #define FLUSHING_CACHED_CHARGE 0
> > > };
> > > static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);
> > > -static DEFINE_MUTEX(percpu_charge_mutex);
> > >
> > > #ifdef CONFIG_MEMCG_KMEM
> > > static void drain_obj_stock(struct obj_stock *stock);
> > > @@ -2209,6 +2208,7 @@ static void refill_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
> > > */
> > > static void drain_all_stock(struct mem_cgroup *root_memcg)
> > > {
> > > + static DEFINE_MUTEX(percpu_charge_mutex);
> > > int cpu, curcpu;
> >
> > It's considered a good practice to protect data instead of code paths. After
> > the proposed change it becomes obvious that the opposite is done here: the mutex
> > is used to prevent a simultaneous execution of the code of the drain_all_stock()
> > function.
>
> The purpose of the lock was indeed to orchestrate callers more than any
> data structure consistency.

It doesn't seem like we need the lock at all.

The comment says it's so we don't spawn more workers when flushing is
already underway. But a work item cannot be queued again while it is
still pending - if it were just about that, we'd needlessly duplicate
the test_and_set_bit(WORK_STRUCT_PENDING_BIT) in queue_work_on().

git history shows we tried to remove it once:

commit 8521fc50d433507a7cdc96bec280f9e5888a54cc
Author: Michal Hocko <mhocko@suse.cz>
Date: Tue Jul 26 16:08:29 2011 -0700

memcg: get rid of percpu_charge_mutex lock

but it turned out that the lock did in fact protect a data structure:
the stock itself. Specifically stock->cached:

commit 9f50fad65b87a8776ae989ca059ad6c17925dfc3
Author: Michal Hocko <mhocko@suse.cz>
Date: Tue Aug 9 11:56:26 2011 +0200

Revert "memcg: get rid of percpu_charge_mutex lock"

This reverts commit 8521fc50d433507a7cdc96bec280f9e5888a54cc.

The patch incorrectly assumes that using atomic FLUSHING_CACHED_CHARGE
bit operations is sufficient but that is not true. Johannes Weiner has
reported a crash during parallel memory cgroup removal:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
Oops: 0000 [#1] PREEMPT SMP
Pid: 19677, comm: rmdir Tainted: G W 3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3
RIP: 0010:[<ffffffff81083b70>] css_is_ancestor+0x20/0x70
RSP: 0018:ffff880077b09c88 EFLAGS: 00010202
Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310)
Call Trace:
[<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40
[<ffffffff810feccf>] drain_all_stock+0x11f/0x170
[<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0
[<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20
[<ffffffff81080559>] cgroup_rmdir+0xb9/0x500
[<ffffffff81114d26>] vfs_rmdir+0x86/0xe0
[<ffffffff81114e7b>] do_rmdir+0xfb/0x110
[<ffffffff81114ea6>] sys_rmdir+0x16/0x20
[<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b

We are crashing because we try to dereference cached memcg when we are
checking whether we should wait for draining on the cache. The cache is
already cleaned up, though.

There is also a theoretical chance that the cached memcg gets freed
between we test for the FLUSHING_CACHED_CHARGE and dereference it in
mem_cgroup_same_or_subtree:

CPU0                    CPU1                    CPU2
mem=stock->cached
stock->cached=NULL
                        clear_bit
                                                test_and_set_bit
test_bit()              ...
<preempted>             mem_cgroup_destroy
use after free

The percpu_charge_mutex protected from this race because sync draining
is exclusive.

It is safer to revert now and come up with a more parallel
implementation later.

I didn't remember this one at all!

However, when you look at the codebase from back then, there was no
rcu-protection for memcg lifetime, and drain_stock() didn't double
check stock->cached inside the work. Hence the crash during a race.

The drain code is different now: drain_local_stock() disables IRQs
which holds up rcu, and then calls drain_stock() and drain_obj_stock()
which both check stock->cached one more time before the deref.

With workqueue managing concurrency, and rcu ensuring memcg lifetime
during the drain, this lock indeed seems unnecessary now.

Unless I'm missing something, it should just be removed instead.
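
The two properties relied on above (a pending work item is not queued twice, and the drain re-checks stock->cached before touching it) can be sketched in userspace C. This is only an illustration under stated assumptions: struct stock, stock_queue_drain() and stock_drain_work() are hypothetical names, an atomic_bool plays the role of WORK_STRUCT_PENDING_BIT, and the sketch models the logic single-threaded rather than the kernel's per-cpu, IRQs-off execution.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct memcg_stock_pcp. */
struct stock {
	atomic_bool pending;	/* plays the role of WORK_STRUCT_PENDING_BIT */
	void *cached;		/* plays the role of stock->cached */
	unsigned int nr_pages;	/* pages held in the stock */
	unsigned int drains;	/* how many times a real drain happened */
};

/*
 * Analogue of the test_and_set_bit() in queue_work_on(): returns true
 * only for the caller that sets the pending flag, so a second request
 * while the work is still pending is a no-op.
 */
bool stock_queue_drain(struct stock *s)
{
	return !atomic_exchange_explicit(&s->pending, true,
					 memory_order_acq_rel);
}

/*
 * Analogue of the work callback: the pending flag is cleared before the
 * body runs, and cached is checked again before it is used, so a stock
 * that was already emptied is left alone.
 */
void stock_drain_work(struct stock *s)
{
	atomic_store_explicit(&s->pending, false, memory_order_release);

	if (s->cached) {
		s->nr_pages = 0;
		s->cached = NULL;
		s->drains++;
	}
}
```

Calling stock_queue_drain() twice in a row succeeds only the first time, and a second stock_drain_work() on an already drained stock does nothing, which is the behaviour the argument for dropping the lock depends on.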

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex [ In reply to ]

Aug 4, 2021, 2:15 AM

Post #17 of 18 (434 views)

On Tue 03-08-21 10:15:36, Johannes Weiner wrote:
[...]
> git history shows we tried to remove it once:
>
> commit 8521fc50d433507a7cdc96bec280f9e5888a54cc
> Author: Michal Hocko <mhocko@suse.cz>
> Date: Tue Jul 26 16:08:29 2011 -0700
>
> memcg: get rid of percpu_charge_mutex lock
>
> but it turned out that the lock did in fact protect a data structure:
> the stock itself. Specifically stock->cached:
>
> commit 9f50fad65b87a8776ae989ca059ad6c17925dfc3
> Author: Michal Hocko <mhocko@suse.cz>
> Date: Tue Aug 9 11:56:26 2011 +0200
>
> Revert "memcg: get rid of percpu_charge_mutex lock"
>
> This reverts commit 8521fc50d433507a7cdc96bec280f9e5888a54cc.
>
> The patch incorrectly assumes that using atomic FLUSHING_CACHED_CHARGE
> bit operations is sufficient but that is not true. Johannes Weiner has
> reported a crash during parallel memory cgroup removal:
>
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
> Oops: 0000 [#1] PREEMPT SMP
> Pid: 19677, comm: rmdir Tainted: G W 3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3
> RIP: 0010:[<ffffffff81083b70>] css_is_ancestor+0x20/0x70
> RSP: 0018:ffff880077b09c88 EFLAGS: 00010202
> Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310)
> Call Trace:
> [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40
> [<ffffffff810feccf>] drain_all_stock+0x11f/0x170
> [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0
> [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20
> [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500
> [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0
> [<ffffffff81114e7b>] do_rmdir+0xfb/0x110
> [<ffffffff81114ea6>] sys_rmdir+0x16/0x20
> [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b
>
> We are crashing because we try to dereference cached memcg when we are
> checking whether we should wait for draining on the cache. The cache is
> already cleaned up, though.
>
> There is also a theoretical chance that the cached memcg gets freed
> between we test for the FLUSHING_CACHED_CHARGE and dereference it in
> mem_cgroup_same_or_subtree:
>
> CPU0 CPU1 CPU2
> mem=stock->cached
> stock->cached=NULL
> clear_bit
> test_and_set_bit
> test_bit() ...
> <preempted> mem_cgroup_destroy
> use after free
>
> The percpu_charge_mutex protected from this race because sync draining
> is exclusive.
>
> It is safer to revert now and come up with a more parallel
> implementation later.
>
> I didn't remember this one at all!

Me neither. Thanks for looking that up!

> However, when you look at the codebase from back then, there was no
> rcu-protection for memcg lifetime, and drain_stock() didn't double
> check stock->cached inside the work. Hence the crash during a race.
>
> The drain code is different now: drain_local_stock() disables IRQs
> which holds up rcu, and then calls drain_stock() and drain_obj_stock()
> which both check stock->cached one more time before the deref.
>
> With workqueue managing concurrency, and rcu ensuring memcg lifetime
> during the drain, this lock indeed seems unnecessary now.
>
> Unless I'm missing something, it should just be removed instead.

I do not think you are missing anything. We can drop the lock and
simplify the code. The above information would be great to have in the
changelog.

Thanks!
--
Michal Hocko
SUSE Labs

Re: [PATCH 2/5] mm, memcg: narrow the scope of percpu_charge_mutex [ In reply to ]

linmiaohe at huawei

Aug 4, 2021, 8:15 PM

Post #18 of 18 (434 views)

On 2021/8/4 16:20, Michal Hocko wrote:
> On Tue 03-08-21 10:15:36, Johannes Weiner wrote:
> [...]
>> git history shows we tried to remove it once:
>>
>> commit 8521fc50d433507a7cdc96bec280f9e5888a54cc
>> Author: Michal Hocko <mhocko@suse.cz>
>> Date: Tue Jul 26 16:08:29 2011 -0700
>>
>> memcg: get rid of percpu_charge_mutex lock
>>
>> but it turned out that the lock did in fact protect a data structure:
>> the stock itself. Specifically stock->cached:
>>
>> commit 9f50fad65b87a8776ae989ca059ad6c17925dfc3
>> Author: Michal Hocko <mhocko@suse.cz>
>> Date: Tue Aug 9 11:56:26 2011 +0200
>>
>> Revert "memcg: get rid of percpu_charge_mutex lock"
>>
>> This reverts commit 8521fc50d433507a7cdc96bec280f9e5888a54cc.
>>
>> The patch incorrectly assumes that using atomic FLUSHING_CACHED_CHARGE
>> bit operations is sufficient but that is not true. Johannes Weiner has
>> reported a crash during parallel memory cgroup removal:
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
>> IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
>> Oops: 0000 [#1] PREEMPT SMP
>> Pid: 19677, comm: rmdir Tainted: G W 3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3
>> RIP: 0010:[<ffffffff81083b70>] css_is_ancestor+0x20/0x70
>> RSP: 0018:ffff880077b09c88 EFLAGS: 00010202
>> Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310)
>> Call Trace:
>> [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40
>> [<ffffffff810feccf>] drain_all_stock+0x11f/0x170
>> [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0
>> [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20
>> [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500
>> [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0
>> [<ffffffff81114e7b>] do_rmdir+0xfb/0x110
>> [<ffffffff81114ea6>] sys_rmdir+0x16/0x20
>> [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b
>>
>> We are crashing because we try to dereference cached memcg when we are
>> checking whether we should wait for draining on the cache. The cache is
>> already cleaned up, though.
>>
>> There is also a theoretical chance that the cached memcg gets freed
>> between we test for the FLUSHING_CACHED_CHARGE and dereference it in
>> mem_cgroup_same_or_subtree:
>>
>> CPU0 CPU1 CPU2
>> mem=stock->cached
>> stock->cached=NULL
>> clear_bit
>> test_and_set_bit
>> test_bit() ...
>> <preempted> mem_cgroup_destroy
>> use after free
>>
>> The percpu_charge_mutex protected from this race because sync draining
>> is exclusive.
>>
>> It is safer to revert now and come up with a more parallel
>> implementation later.
>>
>> I didn't remember this one at all!
>
> Me neither. Thanks for looking that up!
>
>> However, when you look at the codebase from back then, there was no
>> rcu-protection for memcg lifetime, and drain_stock() didn't double
>> check stock->cached inside the work. Hence the crash during a race.
>>
>> The drain code is different now: drain_local_stock() disables IRQs
>> which holds up rcu, and then calls drain_stock() and drain_obj_stock()
>> which both check stock->cached one more time before the deref.
>>
>> With workqueue managing concurrency, and rcu ensuring memcg lifetime
>> during the drain, this lock indeed seems unnecessary now.
>>
>> Unless I'm missing something, it should just be removed instead.
>
> I do not think you are missing anything. We can drop the lock and
> simplify the code. The above information would be great to have in the
> changelog.
>

Am I supposed to revert this, put the above information in the changelog, and
add Suggested-by tags for both of you?

Many thanks.

> Thanks!
>