
[PATCH 4/4] rcu/nocb: Make shrinker to iterate only NOCB CPUs
Callbacks can only be queued as lazy on NOCB CPUs, therefore iterating
over the NOCB mask is enough for both counting and scanning. Just lock
the mostly uncontended barrier mutex on counting as well in order to
keep rcu_nocb_mask stable.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/rcu/tree_nocb.h | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index a3dc7465b0b2..185c0c9a60d4 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1319,13 +1319,21 @@ lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
int cpu;
unsigned long count = 0;

+ if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
+ return 0;
+
+ /* Protect rcu_nocb_mask against concurrent (de-)offloading. */
+ mutex_lock(&rcu_state.barrier_mutex);
+
/* Snapshot count of all CPUs */
- for_each_possible_cpu(cpu) {
+ for_each_cpu(cpu, rcu_nocb_mask) {
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);

count += READ_ONCE(rdp->lazy_len);
}

+ mutex_unlock(&rcu_state.barrier_mutex);
+
return count ? count : SHRINK_EMPTY;
}

@@ -1336,6 +1344,8 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
unsigned long flags;
unsigned long count = 0;

+ if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
+ return 0;
/*
* Protect against concurrent (de-)offloading. Otherwise nocb locking
* may be ignored or imbalanced.
@@ -1343,7 +1353,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
mutex_lock(&rcu_state.barrier_mutex);

/* Snapshot count of all CPUs */
- for_each_possible_cpu(cpu) {
+ for_each_cpu(cpu, rcu_nocb_mask) {
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
int _count;

--
2.34.1
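
The effect of switching the iterator is easiest to see outside the kernel. The following stand-alone sketch mimics the two walks on a hypothetical 8-CPU box where only CPUs 2 and 5 are NOCB; the mask, the per-CPU lazy counts, and all names are invented for illustration and are not the real rcu_nocb_mask plumbing:

#include <stdio.h>

#define NR_CPUS 8

int main(void)
{
	unsigned char nocb_mask = (1u << 2) | (1u << 5);          /* stand-in for rcu_nocb_mask */
	unsigned long lazy_len[NR_CPUS] = { 0, 0, 7, 0, 0, 3, 0, 0 };
	unsigned long count = 0;
	int cpu;

	/* Old behaviour: walk every possible CPU, most of them pointlessly. */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		count += lazy_len[cpu];
	printf("possible-cpu walk: %lu lazy callbacks, %d iterations\n", count, NR_CPUS);

	/* New behaviour: walk only the CPUs whose bit is set in the NOCB mask. */
	count = 0;
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (nocb_mask & (1u << cpu))
			count += lazy_len[cpu];
	printf("nocb-mask walk: %lu lazy callbacks, 2 useful iterations\n", count);

	return 0;
}

Both walks arrive at the same total; the masked walk simply skips the CPUs that can never hold lazy callbacks, which is the point of the change.
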
Re: [PATCH 4/4] rcu/nocb: Make shrinker to iterate only NOCB CPUs
On Wed, Mar 22, 2023 at 08:44:56PM +0100, Frederic Weisbecker wrote:
> Callbacks can only be queued as lazy on NOCB CPUs, therefore iterating
> over the NOCB mask is enough for both counting and scanning. Just lock
> the mostly uncontended barrier mutex on counting as well in order to
> keep rcu_nocb_mask stable.
>

Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

- Joel


> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
> kernel/rcu/tree_nocb.h | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index a3dc7465b0b2..185c0c9a60d4 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1319,13 +1319,21 @@ lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> int cpu;
> unsigned long count = 0;
>
> + if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
> + return 0;
> +
> + /* Protect rcu_nocb_mask against concurrent (de-)offloading. */
> + mutex_lock(&rcu_state.barrier_mutex);
> +
> /* Snapshot count of all CPUs */
> - for_each_possible_cpu(cpu) {
> + for_each_cpu(cpu, rcu_nocb_mask) {
> struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>
> count += READ_ONCE(rdp->lazy_len);
> }
>
> + mutex_unlock(&rcu_state.barrier_mutex);
> +
> return count ? count : SHRINK_EMPTY;
> }
>
> @@ -1336,6 +1344,8 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> unsigned long flags;
> unsigned long count = 0;
>
> + if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
> + return 0;
> /*
> * Protect against concurrent (de-)offloading. Otherwise nocb locking
> * may be ignored or imbalanced.
> @@ -1343,7 +1353,7 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> mutex_lock(&rcu_state.barrier_mutex);
>
> /* Snapshot count of all CPUs */
> - for_each_possible_cpu(cpu) {
> + for_each_cpu(cpu, rcu_nocb_mask) {
> struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> int _count;
>
> --
> 2.34.1
>
[PATCH 4/4] rcu/nocb: Make shrinker to iterate only NOCB CPUs
Callbacks can only be queued as lazy on NOCB CPUs, therefore iterating
over the NOCB mask is enough for both counting and scanning. Just lock
the mostly uncontended barrier mutex on counting as well in order to
keep rcu_nocb_mask stable.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
kernel/rcu/tree_nocb.h | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index dfa9c10d6727..43229d2b0c44 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1319,13 +1319,22 @@ lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
int cpu;
unsigned long count = 0;

+ if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
+ return 0;
+
+ /* Protect rcu_nocb_mask against concurrent (de-)offloading. */
+ if (!mutex_trylock(&rcu_state.barrier_mutex))
+ return 0;
+
/* Snapshot count of all CPUs */
- for_each_possible_cpu(cpu) {
+ for_each_cpu(cpu, rcu_nocb_mask) {
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);

count += READ_ONCE(rdp->lazy_len);
}

+ mutex_unlock(&rcu_state.barrier_mutex);
+
return count ? count : SHRINK_EMPTY;
}

@@ -1336,6 +1345,8 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
unsigned long flags;
unsigned long count = 0;

+ if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
+ return 0;
/*
* Protect against concurrent (de-)offloading. Otherwise nocb locking
* may be ignored or imbalanced.
@@ -1351,11 +1362,11 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
}

/* Snapshot count of all CPUs */
- for_each_possible_cpu(cpu) {
+ for_each_cpu(cpu, rcu_nocb_mask) {
struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
int _count;

- if (!rcu_rdp_is_offloaded(rdp))
+ if (WARN_ON_ONCE(!rcu_rdp_is_offloaded(rdp)))
continue;

if (!READ_ONCE(rdp->lazy_len))
--
2.34.1
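
Compared with the first posting above, this version backs off with mutex_trylock() on the count side instead of sleeping on the barrier mutex, and the scan loop now warns if a CPU present in rcu_nocb_mask is unexpectedly not offloaded. Stitching the count-side hunk together, the function would read roughly as follows; this is a reconstruction from the diff (the signature is taken from the @@ context line and the return type inferred from SHRINK_EMPTY), not a verbatim copy of the tree:

static unsigned long
lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
{
	int cpu;
	unsigned long count = 0;

	if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
		return 0;

	/* Protect rcu_nocb_mask against concurrent (de-)offloading. */
	if (!mutex_trylock(&rcu_state.barrier_mutex))
		return 0;

	/* Snapshot count of all CPUs */
	for_each_cpu(cpu, rcu_nocb_mask) {
		struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);

		count += READ_ONCE(rdp->lazy_len);
	}

	mutex_unlock(&rcu_state.barrier_mutex);

	return count ? count : SHRINK_EMPTY;
}
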
Re: [PATCH 4/4] rcu/nocb: Make shrinker to iterate only NOCB CPUs
On Wed, Mar 29, 2023 at 06:02:03PM +0200, Frederic Weisbecker wrote:
> Callbacks can only be queued as lazy on NOCB CPUs, therefore iterating
> over the NOCB mask is enough for both counting and scanning. Just lock
> the mostly uncontended barrier mutex on counting as well in order to
> keep rcu_nocb_mask stable.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Looks plausible. ;-)

What are you doing to test this? For that matter, what should rcutorture
be doing to test this? My guess is that the current callback flooding in
rcu_torture_fwd_prog_cr() should do the trick, but figured I should ask.

Thanx, Paul

> ---
> kernel/rcu/tree_nocb.h | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index dfa9c10d6727..43229d2b0c44 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1319,13 +1319,22 @@ lazy_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
> int cpu;
> unsigned long count = 0;
>
> + if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
> + return 0;
> +
> + /* Protect rcu_nocb_mask against concurrent (de-)offloading. */
> + if (!mutex_trylock(&rcu_state.barrier_mutex))
> + return 0;
> +
> /* Snapshot count of all CPUs */
> - for_each_possible_cpu(cpu) {
> + for_each_cpu(cpu, rcu_nocb_mask) {
> struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>
> count += READ_ONCE(rdp->lazy_len);
> }
>
> + mutex_unlock(&rcu_state.barrier_mutex);
> +
> return count ? count : SHRINK_EMPTY;
> }
>
> @@ -1336,6 +1345,8 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> unsigned long flags;
> unsigned long count = 0;
>
> + if (WARN_ON_ONCE(!cpumask_available(rcu_nocb_mask)))
> + return 0;
> /*
> * Protect against concurrent (de-)offloading. Otherwise nocb locking
> * may be ignored or imbalanced.
> @@ -1351,11 +1362,11 @@ lazy_rcu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
> }
>
> /* Snapshot count of all CPUs */
> - for_each_possible_cpu(cpu) {
> + for_each_cpu(cpu, rcu_nocb_mask) {
> struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> int _count;
>
> - if (!rcu_rdp_is_offloaded(rdp))
> + if (WARN_ON_ONCE(!rcu_rdp_is_offloaded(rdp)))
> continue;
>
> if (!READ_ONCE(rdp->lazy_len))
> --
> 2.34.1
>
Re: [PATCH 4/4] rcu/nocb: Make shrinker to iterate only NOCB CPUs
On Wed, Mar 29, 2023 at 01:58:06PM -0700, Paul E. McKenney wrote:
> On Wed, Mar 29, 2023 at 06:02:03PM +0200, Frederic Weisbecker wrote:
> > Callbacks can only be queued as lazy on NOCB CPUs, therefore iterating
> > over the NOCB mask is enough for both counting and scanning. Just lock
> > the mostly uncontended barrier mutex on counting as well in order to
> > keep rcu_nocb_mask stable.
> >
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>
> Looks plausible. ;-)
>
> What are you doing to test this? For that matter, what should rcutorture
> be doing to test this? My guess is that the current callback flooding in
> rcu_torture_fwd_prog_cr() should do the trick, but figured I should ask.

All I did was to trigger these shrinker callbacks through debugfs
(https://docs.kernel.org/admin-guide/mm/shrinker_debugfs.html)

But rcutorture isn't testing it because:

- No torture config has CONFIG_RCU_LAZY
- rcutorture doesn't do any lazy call_rcu() (always calls hurry for the
main RCU flavour).

And I suspect rcutorture isn't ready for accepting the lazy delay, that would
require some special treatment.

Thanks.
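
For anyone wanting to reproduce that, the shrinker debugfs interface exposes one directory per registered shrinker containing a "count" file and a "scan" file. The directory name used below (something like rcu-lazy-<id>) and the "<node-id> <nr-to-scan>" write format come from the linked shrinker_debugfs document rather than from this thread, so treat this as a sketch and check "ls /sys/kernel/debug/shrinker" on the running kernel (as root, with debugfs mounted) before relying on it:

#include <stdio.h>

int main(void)
{
	/*
	 * Path is an assumption: adjust the suffix to whatever
	 * /sys/kernel/debug/shrinker actually lists on this kernel.
	 */
	const char *dir = "/sys/kernel/debug/shrinker/rcu-lazy-0";
	char path[256], line[256];
	FILE *f;

	/* "count" reports the objects (here: lazy callbacks) seen per NUMA node. */
	snprintf(path, sizeof(path), "%s/count", dir);
	f = fopen(path, "r");
	if (f) {
		while (fgets(line, sizeof(line), f))
			fputs(line, stdout);
		fclose(f);
	}

	/* Ask the shrinker to scan up to 128 objects on node 0. */
	snprintf(path, sizeof(path), "%s/scan", dir);
	f = fopen(path, "w");
	if (f) {
		fputs("0 128\n", f);
		fclose(f);
	}
	return 0;
}
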
Re: [PATCH 4/4] rcu/nocb: Make shrinker to iterate only NOCB CPUs
On Wed, Mar 29, 2023 at 11:35:36PM +0200, Frederic Weisbecker wrote:
> On Wed, Mar 29, 2023 at 01:58:06PM -0700, Paul E. McKenney wrote:
> > On Wed, Mar 29, 2023 at 06:02:03PM +0200, Frederic Weisbecker wrote:
> > > Callbacks can only be queued as lazy on NOCB CPUs, therefore iterating
> > > over the NOCB mask is enough for both counting and scanning. Just lock
> > > the mostly uncontended barrier mutex on counting as well in order to
> > > keep rcu_nocb_mask stable.
> > >
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> >
> > Looks plausible. ;-)
> >
> > What are you doing to test this? For that matter, what should rcutorture
> > be doing to test this? My guess is that the current callback flooding in
> > rcu_torture_fwd_prog_cr() should do the trick, but figured I should ask.
>
> All I did was to trigger these shrinker callbacks through debugfs
> (https://docs.kernel.org/admin-guide/mm/shrinker_debugfs.html)
>
> But rcutorture isn't testing it because:
>
> - No torture config has CONFIG_RCU_LAZY
> - rcutorture doesn't do any lazy call_rcu() (always calls hurry for the
> main RCU flavour).
>
> And I suspect rcutorture isn't ready for accepting the lazy delay, that would
> require some special treatment.

All fair points!

And yes, any non-lazy callback would delazify everything, so as it
is currently constituted, it would not be testing very much of the
lazy-callback state space.

Thanx, Paul
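
The distinction Paul is pointing at lives in the call_rcu()/call_rcu_hurry() pair: with CONFIG_RCU_LAZY, a plain call_rcu() on an offloaded (NOCB) CPU may be batched lazily, while call_rcu_hurry() requests a grace period without the lazy delay and, as noted above, delazifies whatever is already queued on that CPU. A minimal kernel-side sketch, with hypothetical struct and function names chosen only for illustration:

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
	int payload;
	struct rcu_head rh;
};

static void foo_free_cb(struct rcu_head *rhp)
{
	kfree(container_of(rhp, struct foo, rh));
}

static void foo_release(struct foo *lazy_ok, struct foo *urgent)
{
	/* Fine to defer: may sit in the lazy bypass until the flush delay expires. */
	call_rcu(&lazy_ok->rh, foo_free_cb);

	/*
	 * Memory-pressure or latency-sensitive path: no lazy delay, and enqueueing
	 * a non-lazy callback also delazifies the callbacks already queued here.
	 */
	call_rcu_hurry(&urgent->rh, foo_free_cb);
}

Since rcutorture's main RCU flavour always uses the hurried variant, every flood immediately drains the lazy state, which is why the current tests say little about this shrinker path.
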