Mailing List Archive: [PATCH 1 of 6 v2] xen: sched_credit: improve picking up the idlal CPU for a VCPU

Dec 11, 2012, 6:52 PM

Post #1 of 9 (272 views)

In _csched_cpu_pick() we try to select the best possible CPU for
running a VCPU, considering the characteristics of the underlying
hardware (i.e., how many threads, cores, sockets, and how busy they
are). What we want is "the idle execution vehicle with the most
idling neighbours in its grouping".
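
As a rough illustration of that goal, the sketch below counts the idle CPUs in each candidate's group and then applies the same comparison that appears in the last hunk of the patch. It is a simplified model and not Xen code: `cpu_mask_t`, `mask_weight()` and `prefer_nxt()` are hypothetical names, and a 32-bit word stands in for `cpumask_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: one bit per physical CPU. */
typedef uint32_t cpu_mask_t;

/* Number of set bits in the mask, i.e. its "weight". */
static int mask_weight(cpu_mask_t m)
{
    int n = 0;

    for ( ; m; m &= m - 1 )
        n++;
    return n;
}

/*
 * Decide whether the pick should move from the group of `cpu` to the
 * group of `nxt`, given the idle CPUs and each candidate's group mask.
 */
static bool prefer_nxt(cpu_mask_t idlers, cpu_mask_t cpu_group,
                       cpu_mask_t nxt_group, int migrate_factor,
                       bool smt_power_savings)
{
    int nr_idlers_cpu = mask_weight(idlers & cpu_group);
    int nr_idlers_nxt = mask_weight(idlers & nxt_group);

    assert(migrate_factor >= 1);

    /* Power savings consolidates work; otherwise spread it out. */
    return smt_power_savings ? nr_idlers_cpu > nr_idlers_nxt
                             : nr_idlers_cpu * migrate_factor < nr_idlers_nxt;
}
```

With one idler in the first group and four in the second, `prefer_nxt()` moves the pick when `migrate_factor` is 1 and power savings is off, and stays put when `migrate_factor` is 4.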

In order to achieve it, we select a CPU from the VCPU's affinity,
giving preference to its current processor if possible, as the basis
for the comparison with all the other CPUs. Problem is, to discount
the VCPU itself when computing this "idleness" (in an attempt to be
fair wrt its current processor), we arbitrarily and unconditionally
consider that selected CPU as idle, even when it is not the case,
for instance:
1. If the CPU is not the one where the VCPU is running (perhaps due
to the affinity being changed);
2. The CPU is where the VCPU is running, but it has other VCPUs in
its runq, so it won't go idle even if the VCPU in question goes.

This is exemplified in the trace below:

] 3.466115364 x|------|------| d10v1 22005(2:2:5) 3 [ a 1 8 ]
... ... ...
3.466122856 x|------|------| d10v1 runstate_change d10v1 running->offline
3.466123046 x|------|------| d?v? runstate_change d32767v0 runnable->running
... ... ...
] 3.466126887 x|------|------| d32767v0 28004(2:8:4) 3 [ a 1 8 ]

The 22005(...) line (the first line) means _csched_cpu_pick() was called on
VCPU 1 of domain 10, while it is running on CPU 0, and it chose CPU 8,
which is busy ('|'), even though there are plenty of idle CPUs. That is
because, as a consequence of changing the VCPU affinity, CPU 8 was
chosen as the basis for the comparison, and therefore considered idle
(its bit gets unconditionally set in the bitmask representing the idle
CPUs). The 28004(...) line means the VCPU is woken up and queued on CPU 8's
runq, where it waits for a context switch or a migration, in order to
be able to execute.

This change fixes things by only considering the "guessed" CPU idle if
the VCPU in question is both running there and is its only runnable
VCPU.
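
The difference between the old and the new behaviour can be sketched with a small stand-alone model. Everything here is an assumption made for illustration: `struct pcpu_model`, `idlers_old()` and `idlers_new()` are hypothetical, VCPUs are plain integer ids, and a 32-bit word replaces the real idlers cpumask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: one bit per physical CPU. */
typedef uint32_t cpu_mask_t;

/* Simplified per-CPU state used only by this sketch. */
struct pcpu_model {
    int curr_vcpu;  /* id of the VCPU running here, -1 if the CPU is idle */
    int nr_queued;  /* other runnable VCPUs waiting in this CPU's runq */
};

/* Old behaviour: the candidate CPU is always added to the idlers. */
static cpu_mask_t idlers_old(cpu_mask_t idlers, int cpu)
{
    assert(cpu >= 0 && cpu < 32);
    return idlers | ((cpu_mask_t)1 << cpu);
}

/*
 * Patched behaviour: add the candidate only if `vcpu` is what runs
 * there and nothing else is waiting, so the CPU would really go idle
 * once `vcpu` leaves.
 */
static cpu_mask_t idlers_new(cpu_mask_t idlers, int cpu, int vcpu,
                             const struct pcpu_model *pcpus)
{
    assert(cpu >= 0 && cpu < 32);
    if ( pcpus[cpu].curr_vcpu == vcpu && pcpus[cpu].nr_queued == 0 )
        idlers |= (cpu_mask_t)1 << cpu;
    return idlers;
}
```

In a setup resembling the trace (the VCPU runs on CPU 0 and CPU 8 is busy with something else), `idlers_old()` marks CPU 8 as idle while `idlers_new()` leaves its bit clear.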

While at it, rename the two variables (within _csched_cpu_pick()) that
count the number of idlers for `cpu' and `nxt' to `nr_idlers_cpu' and
`nr_idlers_nxt', which makes their job a little more evident than the
current `weight_*' names do.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -59,6 +59,8 @@
 #define CSCHED_VCPU(_vcpu) ((struct csched_vcpu *) (_vcpu)->sched_priv)
 #define CSCHED_DOM(_dom) ((struct csched_dom *) (_dom)->sched_priv)
 #define RUNQ(_cpu) (&(CSCHED_PCPU(_cpu)->runq))
+/* Is the first element of _cpu's runq its idle vcpu? */
+#define IS_RUNQ_IDLE(_cpu) (is_idle_vcpu(__runq_elem(RUNQ(_cpu)->next)->vcpu))

 /*
@@ -479,9 +481,14 @@ static int
      * distinct cores first and guarantees we don't do something stupid
      * like run two VCPUs on co-hyperthreads while there are idle cores
      * or sockets.
+     *
+     * Notice that, when computing the "idleness" of cpu, we may want to
+     * discount vc. That is, iff vc is the currently running and the only
+     * runnable vcpu on cpu, we add cpu to the idlers.
      */
     cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
-    cpumask_set_cpu(cpu, &idlers);
+    if ( current_on_cpu(cpu) == vc && IS_RUNQ_IDLE(cpu) )
+        cpumask_set_cpu(cpu, &idlers);
     cpumask_and(&cpus, &cpus, &idlers);
     cpumask_clear_cpu(cpu, &cpus);

@@ -489,7 +496,7 @@ static int
     {
         cpumask_t cpu_idlers;
         cpumask_t nxt_idlers;
-        int nxt, weight_cpu, weight_nxt;
+        int nxt, nr_idlers_cpu, nr_idlers_nxt;
         int migrate_factor;

         nxt = cpumask_cycle(cpu, &cpus);
@@ -513,12 +520,12 @@ static int
             cpumask_and(&nxt_idlers, &idlers, per_cpu(cpu_core_mask, nxt));
         }

-        weight_cpu = cpumask_weight(&cpu_idlers);
-        weight_nxt = cpumask_weight(&nxt_idlers);
+        nr_idlers_cpu = cpumask_weight(&cpu_idlers);
+        nr_idlers_nxt = cpumask_weight(&nxt_idlers);
         /* smt_power_savings: consolidate work rather than spreading it */
         if ( sched_smt_power_savings ?
-             weight_cpu > weight_nxt :
-             weight_cpu * migrate_factor < weight_nxt )
+             nr_idlers_cpu > nr_idlers_nxt :
+             nr_idlers_cpu * migrate_factor < nr_idlers_nxt )
         {
             cpumask_and(&nxt_idlers, &cpus, &nxt_idlers);
             spc = CSCHED_PCPU(nxt);
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -396,6 +396,9 @@ extern struct vcpu *idle_vcpu[NR_CPUS];
 #define is_idle_domain(d) ((d)->domain_id == DOMID_IDLE)
 #define is_idle_vcpu(v) (is_idle_domain((v)->domain))

+#define current_on_cpu(_c) \
+    ( (per_cpu(schedule_data, _c).curr) )
+
 #define DOMAIN_DESTROYED (1<<31) /* assumes atomic_t is >= 32 bits */
 #define put_domain(_d) \
     if ( atomic_dec_and_test(&(_d)->refcnt) ) domain_destroy(_d)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [PATCH 1 of 6 v2] xen: sched_credit: improve picking up the idlal CPU for a VCPU

JBeulich at suse

Dec 12, 2012, 2:04 AM

Post #2 of 9 (260 views)

>>> On 12.12.12 at 03:52, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -59,6 +59,8 @@
> #define CSCHED_VCPU(_vcpu) ((struct csched_vcpu *) (_vcpu)->sched_priv)
> #define CSCHED_DOM(_dom) ((struct csched_dom *) (_dom)->sched_priv)
> #define RUNQ(_cpu) (&(CSCHED_PCPU(_cpu)->runq))
> +/* Is the first element of _cpu's runq its idle vcpu? */
> +#define IS_RUNQ_IDLE(_cpu) (is_idle_vcpu(__runq_elem(RUNQ(_cpu)->next)->vcpu))
>
>
> /*
> @@ -479,9 +481,14 @@ static int
> * distinct cores first and guarantees we don't do something stupid
> * like run two VCPUs on co-hyperthreads while there are idle cores
> * or sockets.
> + *
> + * Notice that, when computing the "idleness" of cpu, we may want to
> + * discount vc. That is, iff vc is the currently running and the only
> + * runnable vcpu on cpu, we add cpu to the idlers.
> */
> cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
> - cpumask_set_cpu(cpu, &idlers);
> + if ( current_on_cpu(cpu) == vc && IS_RUNQ_IDLE(cpu) )
> + cpumask_set_cpu(cpu, &idlers);
> cpumask_and(&cpus, &cpus, &idlers);
> cpumask_clear_cpu(cpu, &cpus);
>
> @@ -489,7 +496,7 @@ static int
> {
> cpumask_t cpu_idlers;
> cpumask_t nxt_idlers;
> - int nxt, weight_cpu, weight_nxt;
> + int nxt, nr_idlers_cpu, nr_idlers_nxt;
> int migrate_factor;
>
> nxt = cpumask_cycle(cpu, &cpus);
> @@ -513,12 +520,12 @@ static int
> cpumask_and(&nxt_idlers, &idlers, per_cpu(cpu_core_mask, nxt));
> }
>
> - weight_cpu = cpumask_weight(&cpu_idlers);
> - weight_nxt = cpumask_weight(&nxt_idlers);
> + nr_idlers_cpu = cpumask_weight(&cpu_idlers);
> + nr_idlers_nxt = cpumask_weight(&nxt_idlers);
> /* smt_power_savings: consolidate work rather than spreading it */
> if ( sched_smt_power_savings ?
> - weight_cpu > weight_nxt :
> - weight_cpu * migrate_factor < weight_nxt )
> + nr_idlers_cpu > nr_idlers_nxt :
> + nr_idlers_cpu * migrate_factor < nr_idlers_nxt )
> {
> cpumask_and(&nxt_idlers, &cpus, &nxt_idlers);
> spc = CSCHED_PCPU(nxt);

Despite you mentioning this in the description, these last two hunks
are, afaict, only renaming variables (and that's even debatable, as
the current names aren't really misleading imo), and hence I don't
think they belong in a patch that clearly has the potential for causing
(performance) regressions.

That said - I don't think it will (and even more, I'm agreeable to the
change done).

> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -396,6 +396,9 @@ extern struct vcpu *idle_vcpu[NR_CPUS];
> #define is_idle_domain(d) ((d)->domain_id == DOMID_IDLE)
> #define is_idle_vcpu(v) (is_idle_domain((v)->domain))
>
> +#define current_on_cpu(_c) \
> + ( (per_cpu(schedule_data, _c).curr) )
> +

This, imo, really belongs in sched-if.h.

Plus - what's the point of double parentheses, when in fact none
at all would be needed?
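
To illustrate the point about parentheses, here is a minimal stand-alone sketch. It assumes a toy `per_cpu()` that simply indexes an array and a `schedule_data` element reduced to one integer field; neither matches Xen's real definitions, and the two macro names with the `_posted` and `_plain` suffixes are made up for the comparison.

```c
#include <assert.h>

/* Hypothetical stand-ins for Xen's per-CPU scheduler data. */
#define NR_MODEL_CPUS 4
struct schedule_data_model {
    int curr;  /* id of the VCPU currently running on this CPU */
};
static struct schedule_data_model schedule_data[NR_MODEL_CPUS];
#define per_cpu(var, cpu) (var[(cpu)])

/* As posted: two redundant layers of parentheses. */
#define current_on_cpu_posted(_c) ( (per_cpu(schedule_data, _c).curr) )
/* As suggested in the review: no outer parentheses at all. */
#define current_on_cpu_plain(c) per_cpu(schedule_data, c).curr

/* Store a VCPU id through the plain macro and read it back both ways. */
static int both_agree(int cpu, int vcpu_id)
{
    assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
    current_on_cpu_plain(cpu) = vcpu_id;  /* the expansion is an lvalue */
    return current_on_cpu_posted(cpu) == vcpu_id
        && current_on_cpu_plain(cpu) == vcpu_id;
}
```

The plain form is safe in this model because the expansion ends in a member access, which binds more tightly than any operator the surrounding expression could apply to it.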

And finally, why "_c" and not just "c"?

Jan

> #define DOMAIN_DESTROYED (1<<31) /* assumes atomic_t is >= 32 bits */
> #define put_domain(_d) \
> if ( atomic_dec_and_test(&(_d)->refcnt) ) domain_destroy(_d)

Re: [PATCH 1 of 6 v2] xen: sched_credit: improve picking up the idlal CPU for a VCPU

dario.faggioli at citrix

Dec 12, 2012, 2:19 AM

Post #3 of 9 (261 views)

On Wed, 2012-12-12 at 10:04 +0000, Jan Beulich wrote:
> > - weight_cpu = cpumask_weight(&cpu_idlers);
> > - weight_nxt = cpumask_weight(&nxt_idlers);
> > + nr_idlers_cpu = cpumask_weight(&cpu_idlers);
> > + nr_idlers_nxt = cpumask_weight(&nxt_idlers);
> > /* smt_power_savings: consolidate work rather than spreading it */
> > if ( sched_smt_power_savings ?
> > - weight_cpu > weight_nxt :
> > - weight_cpu * migrate_factor < weight_nxt )
> > + nr_idlers_cpu > nr_idlers_nxt :
> > + nr_idlers_cpu * migrate_factor < nr_idlers_nxt )
> > {
> > cpumask_and(&nxt_idlers, &cpus, &nxt_idlers);
> > spc = CSCHED_PCPU(nxt);
>
> Despite you mentioning this in the description, these last two hunks
> are, afaict, only renaming variables (and that's even debatable, as
> the current names aren't really misleading imo), and hence I don't
> think they belong in a patch that clearly has the potential for causing
> (performance) regressions.
>
Ok, I think I can live with the current names too... Just a matter of
taste. :-)

> That said - I don't think it will (and even more, I'm agreeable to the
> change done).
>
It has been benchmarked, together with the next change, and the results
are in the changelog of 2/6. Numbers there show that the combination of
those two changes is much more an improvement than anything else, at
least for the workloads I considered (which include sysbench and
specjbb2005).

Anyway, I think I see your point, and I can either move the rename
somewhere else or kill it entirely.

> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -396,6 +396,9 @@ extern struct vcpu *idle_vcpu[NR_CPUS];
> > #define is_idle_domain(d) ((d)->domain_id == DOMID_IDLE)
> > #define is_idle_vcpu(v) (is_idle_domain((v)->domain))
> >
> > +#define current_on_cpu(_c) \
> > + ( (per_cpu(schedule_data, _c).curr) )
> > +
>
> This, imo, really belongs in sched-if.h.
>
Ok.

> Plus - what's the point of double parentheses, when in fact none
> at all would be needed?
>
> And finally, why "_c" and not just "c"?
>
Nothing particular, just "personal macro style", I guess, which I can
convert to what you ask and resend.

Thanks,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Re: [PATCH 1 of 6 v2] xen: sched_credit: improve picking up the idlal CPU for a VCPU

JBeulich at suse

Dec 12, 2012, 2:30 AM

Post #4 of 9 (263 views)

>>> On 12.12.12 at 11:19, Dario Faggioli <dario.faggioli@citrix.com> wrote:
> On Wed, 2012-12-12 at 10:04 +0000, Jan Beulich wrote:
>> Despite you mentioning this in the description, these last two hunks
>> are, afaict, only renaming variables (and that's even debatable, as
>> the current names aren't really misleading imo), and hence I don't
>> think they belong in a patch that clearly has the potential for causing
>> (performance) regressions.
>>
> Ok, I think I can live with the current names too... Just a matter of
> taste. :-)
>
>> That said - I don't think it will (and even more, I'm agreeable to the
>> change done).
>>
> It has been benchmarked, together with the next change, and the results
> are in the changelog of 2/6. Numbers there show that the combination of
> those two changes is much more an improvement than anything else, at
> least for the workloads I considered (which include sysbench and
> specjbb2005).
>
> Anyway, I think I see your point, and I can either move the rename
> somewhere else or kill it entirely.

Yes please; I'll leave it to George to decide upon an eventual
separate renaming patch.

Btw., when you resend, can you please also fix the subject, so
grepping the changeset titles for "idle" would actually hit on this
change?

Thanks, Jan

Re: [PATCH 1 of 6 v2] xen: sched_credit: improve picking up the idlal CPU for a VCPU

dario.faggioli at citrix

Dec 12, 2012, 2:38 AM

Post #5 of 9 (261 views)

On Wed, 2012-12-12 at 10:30 +0000, Jan Beulich wrote:
> > Anyway, I think I see your point, and I can either move the rename
> > somewhere else or kill it entirely.
>
> Yes please; I'll leave it to George to decide upon an eventual
> separate renaming patch.
>
Ok.

> Btw., when you resend, can you please also fix the subject, so
> grepping the changeset titles for "idle" would actually hit on this
> change?
>
Oops! My bad, sorry for that. I sure will.

Dario

Re: [PATCH 1 of 6 v2] xen: sched_credit: improve picking up the idlal CPU for a VCPU

george.dunlap at eu

Dec 14, 2012, 11:16 AM

Post #6 of 9 (254 views)

On 12/12/12 02:52, Dario Faggioli wrote:
> In _csched_cpu_pick() we try to select the best possible CPU for
> running a VCPU, considering the characteristics of the underlying
> hardware (i.e., how many threads, cores, sockets, and how busy they
> are). What we want is "the idle execution vehicle with the most
> idling neighbours in its grouping".
>
> In order to achieve it, we select a CPU from the VCPU's affinity,
> giving preference to its current processor if possible, as the basis
> for the comparison with all the other CPUs. Problem is, to discount
> the VCPU itself when computing this "idleness" (in an attempt to be
> fair wrt its current processor), we arbitrarily and unconditionally
> consider that selected CPU as idle, even when it is not the case,
> for instance:
> 1. If the CPU is not the one where the VCPU is running (perhaps due
> to the affinity being changed);
> 2. The CPU is where the VCPU is running, but it has other VCPUs in
> its runq, so it won't go idle even if the VCPU in question goes.

Good catch -- thanks. Comments below.

> diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c
> +++ b/xen/common/sched_credit.c
> @@ -59,6 +59,8 @@
> #define CSCHED_VCPU(_vcpu) ((struct csched_vcpu *) (_vcpu)->sched_priv)
> #define CSCHED_DOM(_dom) ((struct csched_dom *) (_dom)->sched_priv)
> #define RUNQ(_cpu) (&(CSCHED_PCPU(_cpu)->runq))
> +/* Is the first element of _cpu's runq its idle vcpu? */
> +#define IS_RUNQ_IDLE(_cpu) (is_idle_vcpu(__runq_elem(RUNQ(_cpu)->next)->vcpu))
>
>
> /*
> @@ -479,9 +481,14 @@ static int
> * distinct cores first and guarantees we don't do something stupid
> * like run two VCPUs on co-hyperthreads while there are idle cores
> * or sockets.
> + *
> + * Notice that, when computing the "idleness" of cpu, we may want to
> + * discount vc. That is, iff vc is the currently running and the only
> + * runnable vcpu on cpu, we add cpu to the idlers.
> */
> cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
> - cpumask_set_cpu(cpu, &idlers);
> + if ( current_on_cpu(cpu) == vc && IS_RUNQ_IDLE(cpu) )
> + cpumask_set_cpu(cpu, &idlers);

Why bother with this whole "current_on_cpu()" thing, when you can just
look at vc->processor? I.e.:

if ( cpu == vc->processor && IS_RUNQ_IDLE(cpu) )
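
The suggested condition can be exercised in a small stand-alone model. This is only a sketch under stated assumptions: `struct vcpu_model`, `struct runq_model`, `runq_is_idle()` and `counts_as_idle()` are hypothetical, and the runqueue is reduced to its first element, which is the only thing the `IS_RUNQ_IDLE()` comment says the macro looks at. It does not claim that `vc->processor` and `current_on_cpu()` are interchangeable in every situation in Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified VCPU: just what the condition needs. */
struct vcpu_model {
    int processor;  /* CPU this VCPU is assigned to */
    bool is_idle;   /* true for a CPU's idle VCPU */
};

/* Runqueue reduced to its first element (never NULL in this model). */
struct runq_model {
    const struct vcpu_model *first;
};

/* Model of IS_RUNQ_IDLE(): is the head of the runq the idle VCPU? */
static bool runq_is_idle(const struct runq_model *rq)
{
    assert(rq != NULL && rq->first != NULL);
    return rq->first->is_idle;
}

/* The condition as suggested: vc belongs to cpu and nothing else waits. */
static bool counts_as_idle(int cpu, const struct vcpu_model *vc,
                           const struct runq_model *runqs)
{
    return cpu == vc->processor && runq_is_idle(&runqs[cpu]);
}
```

In this model a CPU counts as idle for `vc` only when `vc` is assigned to it and its runq head is the idle VCPU; a different CPU, or a runq headed by another runnable VCPU, fails the test.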

> cpumask_and(&cpus, &cpus, &idlers);
> cpumask_clear_cpu(cpu, &cpus);
>
> @@ -489,7 +496,7 @@ static int
> {
> cpumask_t cpu_idlers;
> cpumask_t nxt_idlers;
> - int nxt, weight_cpu, weight_nxt;
> + int nxt, nr_idlers_cpu, nr_idlers_nxt;

I think Jan is right, this probably should be a separate patch.

-George

Re: [PATCH 1 of 6 v2] xen: sched_credit: improve picking up the idlal CPU for a VCPU

george.dunlap at eu

Dec 14, 2012, 11:50 AM

Post #7 of 9 (250 views)

On 12/12/12 10:04, Jan Beulich wrote:
>>>> On 12.12.12 at 03:52, Dario Faggioli <dario.faggioli@citrix.com> wrote:
>> --- a/xen/common/sched_credit.c
>> +++ b/xen/common/sched_credit.c
>> @@ -59,6 +59,8 @@
>> #define CSCHED_VCPU(_vcpu) ((struct csched_vcpu *) (_vcpu)->sched_priv)
>> #define CSCHED_DOM(_dom) ((struct csched_dom *) (_dom)->sched_priv)
>> #define RUNQ(_cpu) (&(CSCHED_PCPU(_cpu)->runq))
>> +/* Is the first element of _cpu's runq its idle vcpu? */
>> +#define IS_RUNQ_IDLE(_cpu) (is_idle_vcpu(__runq_elem(RUNQ(_cpu)->next)->vcpu))
>>
>>
>> /*
>> @@ -479,9 +481,14 @@ static int
>> * distinct cores first and guarantees we don't do something stupid
>> * like run two VCPUs on co-hyperthreads while there are idle cores
>> * or sockets.
>> + *
>> + * Notice that, when computing the "idleness" of cpu, we may want to
>> + * discount vc. That is, iff vc is the currently running and the only
>> + * runnable vcpu on cpu, we add cpu to the idlers.
>> */
>> cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
>> - cpumask_set_cpu(cpu, &idlers);
>> + if ( current_on_cpu(cpu) == vc && IS_RUNQ_IDLE(cpu) )
>> + cpumask_set_cpu(cpu, &idlers);
>> cpumask_and(&cpus, &cpus, &idlers);
>> cpumask_clear_cpu(cpu, &cpus);
>>
>> @@ -489,7 +496,7 @@ static int
>> {
>> cpumask_t cpu_idlers;
>> cpumask_t nxt_idlers;
>> - int nxt, weight_cpu, weight_nxt;
>> + int nxt, nr_idlers_cpu, nr_idlers_nxt;
>> int migrate_factor;
>>
>> nxt = cpumask_cycle(cpu, &cpus);
>> @@ -513,12 +520,12 @@ static int
>> cpumask_and(&nxt_idlers, &idlers, per_cpu(cpu_core_mask, nxt));
>> }
>>
>> - weight_cpu = cpumask_weight(&cpu_idlers);
>> - weight_nxt = cpumask_weight(&nxt_idlers);
>> + nr_idlers_cpu = cpumask_weight(&cpu_idlers);
>> + nr_idlers_nxt = cpumask_weight(&nxt_idlers);
>> /* smt_power_savings: consolidate work rather than spreading it */
>> if ( sched_smt_power_savings ?
>> - weight_cpu > weight_nxt :
>> - weight_cpu * migrate_factor < weight_nxt )
>> + nr_idlers_cpu > nr_idlers_nxt :
>> + nr_idlers_cpu * migrate_factor < nr_idlers_nxt )
>> {
>> cpumask_and(&nxt_idlers, &cpus, &nxt_idlers);
>> spc = CSCHED_PCPU(nxt);
> Despite you mentioning this in the description, these last two hunks
> are, afaict, only renaming variables (and that's even debatable, as
> the current names aren't really misleading imo), and hence I don't
> think they belong in a patch that clearly has the potential for causing
> (performance) regressions.
>
> That said - I don't think it will (and even more, I'm agreeable to the
> change done).
>
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -396,6 +396,9 @@ extern struct vcpu *idle_vcpu[NR_CPUS];
>> #define is_idle_domain(d) ((d)->domain_id == DOMID_IDLE)
>> #define is_idle_vcpu(v) (is_idle_domain((v)->domain))
>>
>> +#define current_on_cpu(_c) \
>> + ( (per_cpu(schedule_data, _c).curr) )
>> +
> This, imo, really belongs in sched-if.h.

Hmm, it looks like there are a number of things that could live in
either sched-if.h or sched.h; but I think this one probably most closely
links with things like vcpu_is_runnable() and cpu_is_haltable(), both of
which are in sched.h; so sched.h is where I'd put it.

> Plus - what's the point of double parentheses, when in fact none
> at all would be needed?
>
> And finally, why "_c" and not just "c"?

I think the underscore is pretty standard in macros.

There's certainly no need for double parentheses though.

-George

Re: [PATCH 1 of 6 v2] xen: sched_credit: improve picking up the idlal CPU for a VCPU

JBeulich at suse

Dec 17, 2012, 12:35 AM

Post #8 of 9 (251 views)

>>> On 14.12.12 at 20:50, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> On 12/12/12 10:04, Jan Beulich wrote:
>>>>> On 12.12.12 at 03:52, Dario Faggioli <dario.faggioli@citrix.com> wrote:
>>> --- a/xen/common/sched_credit.c
>>> +++ b/xen/common/sched_credit.c
>>> @@ -59,6 +59,8 @@
>>> #define CSCHED_VCPU(_vcpu) ((struct csched_vcpu *) (_vcpu)->sched_priv)
>>> #define CSCHED_DOM(_dom) ((struct csched_dom *) (_dom)->sched_priv)
>>> #define RUNQ(_cpu) (&(CSCHED_PCPU(_cpu)->runq))
>>> +/* Is the first element of _cpu's runq its idle vcpu? */
>>> +#define IS_RUNQ_IDLE(_cpu) (is_idle_vcpu(__runq_elem(RUNQ(_cpu)->next)->vcpu))
>>>
>>>
>>> /*
>>> @@ -479,9 +481,14 @@ static int
>>> * distinct cores first and guarantees we don't do something stupid
>>> * like run two VCPUs on co-hyperthreads while there are idle cores
>>> * or sockets.
>>> + *
>>> + * Notice that, when computing the "idleness" of cpu, we may want to
>>> + * discount vc. That is, iff vc is the currently running and the only
>>> + * runnable vcpu on cpu, we add cpu to the idlers.
>>> */
>>> cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers);
>>> - cpumask_set_cpu(cpu, &idlers);
>>> + if ( current_on_cpu(cpu) == vc && IS_RUNQ_IDLE(cpu) )
>>> + cpumask_set_cpu(cpu, &idlers);
>>> cpumask_and(&cpus, &cpus, &idlers);
>>> cpumask_clear_cpu(cpu, &cpus);
>>>
>>> @@ -489,7 +496,7 @@ static int
>>> {
>>> cpumask_t cpu_idlers;
>>> cpumask_t nxt_idlers;
>>> - int nxt, weight_cpu, weight_nxt;
>>> + int nxt, nr_idlers_cpu, nr_idlers_nxt;
>>> int migrate_factor;
>>>
>>> nxt = cpumask_cycle(cpu, &cpus);
>>> @@ -513,12 +520,12 @@ static int
>>> cpumask_and(&nxt_idlers, &idlers, per_cpu(cpu_core_mask, nxt));
>>> }
>>>
>>> - weight_cpu = cpumask_weight(&cpu_idlers);
>>> - weight_nxt = cpumask_weight(&nxt_idlers);
>>> + nr_idlers_cpu = cpumask_weight(&cpu_idlers);
>>> + nr_idlers_nxt = cpumask_weight(&nxt_idlers);
>>> /* smt_power_savings: consolidate work rather than spreading it */
>>> if ( sched_smt_power_savings ?
>>> - weight_cpu > weight_nxt :
>>> - weight_cpu * migrate_factor < weight_nxt )
>>> + nr_idlers_cpu > nr_idlers_nxt :
>>> + nr_idlers_cpu * migrate_factor < nr_idlers_nxt )
>>> {
>>> cpumask_and(&nxt_idlers, &cpus, &nxt_idlers);
>>> spc = CSCHED_PCPU(nxt);
>> Despite you mentioning this in the description, these last two hunks
>> are, afaict, only renaming variables (and that's even debatable, as
>> the current names aren't really misleading imo), and hence I don't
>> think they belong in a patch that clearly has the potential for causing
>> (performance) regressions.
>>
>> That said - I don't think it will (and even more, I'm agreeable to the
>> change done).
>>
>>> --- a/xen/include/xen/sched.h
>>> +++ b/xen/include/xen/sched.h
>>> @@ -396,6 +396,9 @@ extern struct vcpu *idle_vcpu[NR_CPUS];
>>> #define is_idle_domain(d) ((d)->domain_id == DOMID_IDLE)
>>> #define is_idle_vcpu(v) (is_idle_domain((v)->domain))
>>>
>>> +#define current_on_cpu(_c) \
>>> + ( (per_cpu(schedule_data, _c).curr) )
>>> +
>> This, imo, really belongs in sched-if.h.
>
> Hmm, it looks like there are a number of things that could live in
> either sched-if.h or sched.h; but I think this one probably most closely
> links with things like vcpu_is_runnable() and cpu_is_haltable(), both of
> which are in sched.h; so sched.h is where I'd put it.

Any use of schedule_data, the type of which is declared in
sched-if.h, should be in sched-if.h - someone only including
sched.h can't make use of it anyway (and it's intended to be
used by scheduler code, i.e. shouldn't be visible to other
code).

>> Plus - what's the point of double parentheses, when in fact none
>> at all would be needed?
>>
>> And finally, why "_c" and not just "c"?
>
> I think the underscore is pretty standard in macros.

It's bad practice imo; I have always understood this as
questionable attempts of people to avoid name clashes (which
is understandable only for variables declared locally inside a
macro definition).
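
The distinction drawn here, between macro parameters and variables declared inside a macro body, can be shown with a short sketch. The macros `TWICE()` and `SWAP_INT()` and the two helper functions are hypothetical examples, not taken from Xen.

```c
#include <assert.h>

/*
 * Parameter names are substituted textually before compilation, so a
 * plain `c` cannot clash with a caller's variable that is also named c.
 */
#define TWICE(c) ((c) + (c))

/*
 * A variable declared inside the macro body is different. If it were
 * called `tmp`, SWAP_INT(tmp, y) would expand to `int tmp = (tmp);`
 * and shadow the caller's variable, hence the distinctive local name.
 */
#define SWAP_INT(a, b) \
    do { int swap_tmp_ = (a); (a) = (b); (b) = swap_tmp_; } while (0)

/* The argument is named exactly like the macro parameter. */
static int twice_of_c(int c)
{
    return TWICE(c);
}

/* Swap two values where one is called `tmp`; return the new `tmp`. */
static int first_after_swap(int tmp, int y)
{
    int old_tmp = tmp;

    SWAP_INT(tmp, y);
    assert(y == old_tmp);
    return tmp;
}
```

Both helpers behave as expected, which matches the argument that only macro-local declarations, and not macro parameters, need a name chosen to avoid clashes.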

Jan

Re: [PATCH 1 of 6 v2] xen: sched_credit: improve picking up the idlal CPU for a VCPU

dario.faggioli at citrix

Dec 17, 2012, 6:36 AM

Post #9 of 9 (251 views)

On Mon, 2012-12-17 at 08:35 +0000, Jan Beulich wrote:
> >>> On 14.12.12 at 20:50, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> > On 12/12/12 10:04, Jan Beulich wrote:
> >> This, imo, really belongs in sched-if.h.
> >
> > Hmm, it looks like there are a number of things that could live in
> > either sched-if.h or sched.h; but I think this one probably most closely
> > links with things like vcpu_is_runnable() and cpu_is_haltable(), both of
> > which are in sched.h; so sched.h is where I'd put it.
>
> Any use of schedule_data, the type of which is declared in
> sched-if.h, should be in sched-if.h - someone only including
> sched.h can't make use of it anyway (and it's intended to be
> used by scheduler code, i.e. shouldn't be visible to other
> code).
>
Ok, I find this argument quite convincing, so I'll put the macro in
sched-if.h.

Thanks and Regards,
Dario
