Mailing List Archive

atropos scheduler broken
Hi everyone,

I've been playing around with the atropos scheduler for the last couple
of days, and I'm quite convinced that it *does not* enforce the soft
real-time guarantees. Maybe I'm using the wrong parameters, so let me
describe my experiment:

o first I create 2 VMs -- VM1 and VM2
o then I change their atropos params as follows:
$ xm atropos 1 10 100 1 1
$ xm atropos 2 70 100 1 1
Ideally, this should guarantee that VM1 gets 10ns of CPU time every
100ns, and VM2 gets 70ns every 100ns, and that any left over CPU time
will be shared between the 2.

o after this I write a simple program that computes fibonacci numbers
using naive recursion to eat away all the CPU, and loops around
indefinitely. Programs in both VMs are identical, and I start them
within seconds of each other.

o I take a reading from xm list a few seconds after the programs start,
as my baseline:
CPU-TIME
VM1 1 63 0 ----- 173.5 9601
VM2 2 63 0 ----- 10.9 9602

o Thereafter I take readings every few seconds. The absolute values
of CPU time are not that important; what matters is whether the *rate*
at which CPU time increases in the two VMs reflects the atropos
scheduling.

Here are some of the subsequent readings:
VM1 1 63 0 ----- 178.0 9601
VM2 2 63 0 ----- 15.4 9602

VM1 1 63 0 ----- 216.9 9601
VM2 2 63 0 ----- 54.1 9602

VM1 1 63 0 ----- 308.4 9601
VM2 2 63 0 ----- 145.3 9602

VM1 1 63 0 ----- 428.4 9601
VM2 2 63 0 ----- 265.1 9602

As can be seen, the CPU times of both VMs are increasing at almost
*identical* rates. If the atropos params were working, VM2's CPU time
should have been increasing a lot faster.

Has anyone had this problem before? I'll start looking at the code,
but since I'm not familiar with Xen's scheduling code, it might be a
while. In the meantime, if anyone has any pointers, that would be great.

TIA
--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel
Re: atropos scheduler broken
>I've been playing around with the atropos scheduler last couple of
>days, and I'm quite convinced that it *does not* enforce the soft real
>time guarantees.

It is quite possible our current implementation is buggy; we've
not gotten around to extensive testing in the recent past.

> Maybe I'm using the wrong parameters or something, so let me describe
> my experiment:
>
>o first I create 2 VMs -- VM1 and VM2
>o then I change their atropos params as follows:
>$ xm atropos 1 10 100 1 1
>$ xm atropos 2 70 100 1 1
>Ideally, this should guarantee that VM1 gets 10ns of CPU time every
>100ns, and VM2 gets 70ns every 100ns, and that any left over CPU time
>will be shared between the 2.

Well, your parameters are somewhat aggressive: although times are
specified in nanoseconds, this is for precision rather than to allow
10ns slices and 100ns periods (which would translate into at least
10 million context switches a second). x86 CPUs don't really turn
corners that fast, so this is a considerable overhead.

Atropos doesn't work if it's in overload (>= 100%), where the load
includes both the allocated slices and all the overhead for context
switching, running through the scheduler, and certain IRQ handling.

Your latency values are also rather aggressive: 1ns means that if
a domain blocks for any reason (e.g. to do I/O), then when it unblocks
its new period will start at most 1ns after the current pass through
the scheduler. There's a small modification in the current implementation
which means this may not bite quite as hard as it could, but even so
any domain waiting more than 100ns for something could cause an immediate
re-entry into the scheduler after unblocking.

One simple thing to try is to scale your scheduling parameters to
something more reasonable; e.g.

$ xm atropos 1 10000 100000 50000 1
$ xm atropos 2 70000 100000 50000 1

Let us know how well this works -- if this is also broken, then we
have a real bug.

cheers,

S.


p.s. you're not running on SMP, are you? If so, the domains will be
on different CPUs, and hence the x (xtratime) flag will cause each of
them to get approximately the same allocation, just as you observed.
Re: atropos scheduler broken
> >o first I create 2 VMs -- VM1 and VM2
> >o then I change their atropos params as follows:
> >$ xm atropos 1 10 100 1 1
> >$ xm atropos 2 70 100 1 1
> >Ideally, this should guarantee that VM1 gets 10ns of CPU time every
> >100ns, and VM2 gets 70ns every 100ns, and that any left over CPU time
> >will be shared between the 2.


You might find the following program useful while testing the
scheduler. It prints the amount of CPU it's getting once a
second. Atropos was working fine for CPU-bound domains a few
months back, but showed some fairly odd behaviour for I/O-intensive
domains. Because no one has been using it, it's probably rotted a
bit. The original algorithm (used in the Nemesis OS) worked just
fine, so this is just an implementation issue.

Ian

/******************************************************************************
 * slurp.c
 *
 * Slurps spare CPU cycles and prints a percentage estimate every second.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


/* rpcc: get full 64-bit Pentium TSC value */
static __inline__ unsigned long long int rpcc(void)
{
    unsigned int __h, __l;
    __asm__ __volatile__ ("rdtsc" :"=a" (__l), "=d" (__h));
    return (((unsigned long long)__h) << 32) + __l;
}


/*
 * find_cpu_speed:
 *  Interrogates /proc/cpuinfo for the processor clock speed.
 *
 * Returns: speed of processor in MHz, rounded down to nearest whole MHz.
 */
#define MAX_LINE_LEN 50
int find_cpu_speed(void)
{
    FILE *f;
    char s[MAX_LINE_LEN], *a, *b;

    if ( (f = fopen("/proc/cpuinfo", "r")) == NULL ) goto out;

    while ( fgets(s, MAX_LINE_LEN, f) )
    {
        if ( strstr(s, "cpu MHz") )
        {
            /* Find the start of the speed value, and stop at the dec point. */
            if ( !(a=strpbrk(s,"0123456789")) || !(b=strpbrk(a,".")) ) break;
            *b = '\0';
            fclose(f);
            return(atoi(a));
        }
    }

    /* No parsable "cpu MHz" line was found. */
    fclose(f);

 out:
    fprintf(stderr, "find_cpu_speed: error parsing /proc/cpuinfo for cpu MHz\n");
    exit(1);
}


int main(void)
{
    int mhz;
    /* volatile, so an optimising compiler cannot delete the delay loop below */
    volatile int i;

    /*
     * no_preempt_estimate is our estimate, in clock cycles, of how long it
     * takes to execute one iteration of the main loop when we aren't
     * preempted. 50000 cycles is an overestimate, which we want because:
     *  (a) On the first pass through the loop, diff will be almost 0,
     *      which will knock the estimate down to <40000 immediately.
     *  (b) It's safer to approach real value from above than from below --
     *      note that this algorithm is unstable if n_p_e gets too small!
     */
    unsigned int no_preempt_estimate = 50000;

    /*
     * prev  = timestamp on previous iteration;
     * this  = timestamp on this iteration;
     * diff  = difference between the above two stamps;
     * start = timestamp when we last printed CPU % estimate;
     */
    unsigned long long int prev, this, diff, start;

    /*
     * preempt_time = approx. cycles we've been preempted for since last stats
     * display.
     */
    unsigned long long int preempt_time = 0;

    /* Required in order to print intermediate results at fixed period. */
    mhz = find_cpu_speed();
    printf("CPU speed = %d MHz\n", mhz);

    start = prev = rpcc();

    for ( ; ; )
    {
        /*
         * By looping for a while here we hope to reduce the effect of getting
         * preempted in the critical "timestamp swapping" section of the loop.
         * In addition, it should ensure that 'no_preempt_estimate' stays
         * reasonably large, which helps keep this algorithm stable.
         */
        for ( i = 0; i < 10000; i++ );

        /*
         * The critical bit! Getting preempted here will shaft us a bit,
         * but the loop above should make this a rare occurrence.
         */
        this = rpcc();
        diff = this - prev;
        prev = this;

        /* i.e. if ( diff > 1.5 * no_preempt_estimate ) */
        if ( diff > no_preempt_estimate + (no_preempt_estimate>>1) )
        {
            /* We were probably preempted for a while. */
            preempt_time += diff - no_preempt_estimate;
        }
        else
        {
            /*
             * Looks like we weren't preempted -- update our time estimate:
             * New estimate = 0.75*old_est + 0.25*curr_diff
             */
            no_preempt_estimate =
                (no_preempt_estimate>>1) + (no_preempt_estimate>>2) +
                (diff>>2);
        }

        /* Dump CPU time every second (the CPU runs mhz cycles per microsecond). */
        if ( (this - start) / mhz > 1000000 )
        {
            /* %08x expects an unsigned int: print the low 32 bits of the TSC. */
            printf("Slurped %.2f%% CPU, TSC %08x\n",
                   100.0*((this-start-preempt_time)/((double)this-start)),
                   (unsigned int)this);
            start = this;
            preempt_time = 0;
        }
    }

    return(0);
}
Re: atropos scheduler broken
> It is quite possible our current implementation is bugged -- we've
> not gotten around to extensive testing in the recent past.

AFAIK it doesn't behave quite correctly. There's some difficult-to-spot bug
somewhere in the code; it may well turn out to be a small tweak once tracked down.

This currently looks unlikely to be fixed for 2.0 but hopefully will be fully
operational in 2.1.

> p.s. you're not running on SMP are you? if so, the domains will be
> on different CPUs and hence the x flag will cause each of them
> to get approximately the same allocation, just as you observed.

That's also a good point: Xen effectively runs a uniprocessor scheduler ON
EACH CPU, with load balancing across CPUs achieved by CPU pinning in domain
configs or using xm. If you have one domain on each CPU with the xtratime
flag set, they'll get all the CPU they want...

HTH,
Mark
Re: atropos scheduler broken
Hi everyone,

Thanks for the replies. Here's an update:

I used Ian's slurp program, with the params suggested by Steven.
Actually I had myself been thinking about the small values that I had
been using, but I was not sure what kind of impact they would have.

Here are the params I used (kind of an extreme case, but I just wanted
to be sure that if there was *some* change, I would be able to see
it):
xm atropos 1 10000 200000 50000 0
xm atropos 2 150000 200000 50000 0

With these changes, here's a snippet of slurp's output from both VMs
(VM2 was started a few seconds after VM1):

VM1:
CPU speed = 498 MHz
Slurped 90.72% CPU, TSC 6f70c573
Slurped 97.76% CPU, TSC 8d20a825
Slurped 98.77% CPU, TSC aacff8d9
Slurped 99.14% CPU, TSC c87f42c0
Slurped 99.36% CPU, TSC e62ef069
Slurped 98.22% CPU, TSC 03ded2ca
Slurped 98.71% CPU, TSC 218de63c
Slurped 76.88% CPU, TSC 3f40d8a3
Slurped 46.75% CPU, TSC 5cf03633
Slurped 39.86% CPU, TSC 7ab377bc
Slurped 47.18% CPU, TSC 986e75c5
Slurped 59.25% CPU, TSC b61ddba2
Slurped 51.54% CPU, TSC d3ccf714

VM2:
Slurped 53.26% CPU, TSC 52e34564
Slurped 55.14% CPU, TSC 70ae5af6
Slurped 57.19% CPU, TSC 8e5d809e
Slurped 42.62% CPU, TSC ac0cbb65
Slurped 42.80% CPU, TSC c9bc96d9
Slurped 56.01% CPU, TSC e7766b1c
Slurped 54.60% CPU, TSC 0530e391
Slurped 57.15% CPU, TSC 22e0003b
Slurped 56.18% CPU, TSC 40a1e234
Slurped 57.09% CPU, TSC 5e50c733
Slurped 56.75% CPU, TSC 7c125064
Slurped 55.62% CPU, TSC 99c1f384
Slurped 59.20% CPU, TSC b77143db
Slurped 47.35% CPU, TSC d5365862
Slurped 37.21% CPU, TSC f2e61a4e
Slurped 54.03% CPU, TSC 1095a58f
Slurped 59.79% CPU, TSC 2e675a34

Observations:
o When VM2 is not running, VM1 effectively gets *all* the CPU, even
though the xtratime bit is set to 0.
o When VM2 starts running, they appear to get roughly equal CPU.
There doesn't seem to be any 'atropos' scheduling happening.

So how should one go about debugging Xen?
--
Diwaker Gupta
http://resolute.ucsd.edu/diwaker