Mailing List Archive

xenstat_domain_cpu_ns() occasionally returns a huge value
Hi,

I was writing a little utility to dump out domain CPU times and I
noticed that occasionally xenstat_domain_cpu_ns() returns an
erroneous huge value like 9223488034477457013.

Attached is a small test program that just requests every domain's
CPU time in a tight loop; it received such a result after less than
3 minutes running in dom0 of a host with only dom0 and two other PV
domains running:

$ make cpu_time_test
cc -Wall -lxenstat -lyajl -Wl,-rpath,/usr/lib/xen-4.12/lib -L/usr/lib/xen-4.12/lib cpu_time_test.c -o cpu_time_test
$ sudo time ./cpu_time_test
Got a weird CPU time 9223488108.867305 >100 years (cpu_ns=9223488108867304818)
Command exited with non-zero status 1
84.07user 41.90system 2:40.20elapsed 78%CPU (0avgtext+0avgdata 39780maxresident)k
0inputs+0outputs (0major+9541minor)pagefaults 0swaps

The erroneous results are always somewhere above 922xxxxxxxxxxxxxxxx
nanoseconds (some 285 years of CPU time if it were genuine!). Then
the next reading will be normal. Very occasionally I've seen two in
a row. I see this on both 4.12 and 4.10.
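
The attached test program is not reproduced in this archive. As a rough sketch only, a plausibility check of the kind its output suggests ("Got a weird CPU time ... >100 years") might look like the following. The function names, the constants and the 365-day year are assumptions made for illustration; none of them come from libxenstat or from the original program.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: 100 years of CPU time in nanoseconds,
 * counting 365-day years. This fits comfortably in 64 bits (~3.15e18). */
#define NS_PER_SEC        1000000000ULL
#define MAX_PLAUSIBLE_NS  (100ULL * 365ULL * 24ULL * 3600ULL * NS_PER_SEC)

/* Convert a CPU time in nanoseconds to seconds for display. */
static inline double cpu_ns_to_seconds(uint64_t cpu_ns)
{
    return (double)cpu_ns / 1e9;
}

/* Return true if the reading is larger than any domain could really
 * have accumulated (more than 100 years of CPU time). */
static inline bool cpu_ns_is_implausible(uint64_t cpu_ns)
{
    return cpu_ns > MAX_PLAUSIBLE_NS;
}
```

In a polling loop, each value returned by xenstat_domain_cpu_ns() would be passed to cpu_ns_is_implausible(); the reading 9223488108867304818 shown above is flagged, while a reading of a few days' worth of nanoseconds is not.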

My C is very rusty so I've probably made a simple error and don't
want to bother xen-devel with it; can someone familiar with using
the xenstat interface please tell me what I've done wrong here?

Thanks,
Andy
Re: xenstat_domain_cpu_ns() occasionally returns a huge value
On 06.10.19 07:19, Andy Smith wrote:
> Hi,
>
> I was writing a little utility to dump out domain CPU times and I
> noticed that occasionally xenstat_domain_cpu_ns() returns an
> erroneous huge value like 9223488034477457013.

I believe chances are rather high this is the bug which was corrected
recently with Xen commit f28c4c4c10bdacb.

Andy, you can easily avoid that problem by removing the highest bit
of the runtime value, e.g.

correct_value = reported_runtime & ~(1ULL << 63);
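
A minimal sketch of this workaround wrapped in a helper, assuming the reading is held in a uint64_t. The function name is hypothetical and is not part of libxenstat.

```c
#include <stdint.h>

/* Hypothetical helper, not part of libxenstat: clear bit 63 of a
 * reported runtime (in nanoseconds). Values below 2^63 are returned
 * unchanged, so it is safe to apply to every reading. */
static inline uint64_t clear_runtime_top_bit(uint64_t reported_runtime)
{
    return reported_runtime & ~(1ULL << 63);
}
```

Applied to the reading 9223488108867304818 from the first message, this gives 116072012529010 ns, i.e. about 116072 seconds (roughly 1.3 days) of CPU time, which is a believable figure for a running domain.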

Jan, I think that patch should be included in stable versions.


Juergen

_______________________________________________
Xen-users mailing list
Xen-users@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-users
Re: xenstat_domain_cpu_ns() occasionally returns a huge value
On 06.10.2019 11:01, Jürgen Groß wrote:
> I believe chances are rather high this is the bug which was corrected
> recently with Xen commit f28c4c4c10bdacb.
>
> Andy, you can easily avoid that problem by removing the highest bit
> of the runtime value, e.g.
>
> correct_value = reported_runtime & ~(1ULL << 63);
>
> Jan, I think that patch should be included in stable versions.

I have it queued already; I've merely been waiting for it to pass the
push gate to master.

Jan