Mailing List Archive

[Bug report] Security issue in "xl vcpu-set"
Hi,

"xl vcpu-set" is commonly used to hotplug/unhotplug vCPUs of an SMP VM.
However, the current implementation of this command makes the driver
domain vulnerable to denial-of-service attack: in certain cases, this command
consumes too many CPU cycles in dom0, adversely affecting dom0's other
tasks (e.g., IO processing, monitoring, etc.)

[An illustrative example]
Say, with a Linux PV guest called "vm01", when vm01 just boots or reboots
(e.g., in its grub period), if dom0 issues "xl vcpu-set vm01 xxx" at this
moment, the following will happen:
(1) "xl vcpu-set" hangs, until vm01 has loaded its kernel successfully.
(2) in dom0, "oxenstored" consumes 100% of a single core.

So, if a malicious guest purposely stays in its grub period (or other
purposely designed phases, as long as it does not load its kernel),
"xl vcpu-set" will hang _forever_, and dom0 has to sacrifice one core.

[Affected versions]
This problem has been there since libxl was introduced in Xen 4.1 release.

[Implementation problem]
"xl vcpu-set" involves a "loop" as follows (retry_transaction):
--------------
static int libxl__set_vcpuonline_xenstore(...)
{
... ...
retry_transaction:
t = xs_transaction_start(CTX->xsh);
for (i = 0; i <= info.vcpu_max_id; i++)
libxl__xs_write(gc, t,
libxl__sprintf(gc, "%s/cpu/%u/availability", dompath, i),
"%s", libxl_bitmap_test(cpumap, i) ? "online" : "offline");
if (!xs_transaction_end(CTX->xsh, t, 0)) {
if (errno == EAGAIN)
goto retry_transaction;
}
... ...
}
--------------

[Possible solution]
In principle, the correctness of "xl vcpu-set" should _not_ depend on any
guest's behaviors.
One possible fix might be like this: if xs_transaction_end does not succeed,
either ignore it or retry for a pre-defined timeout period (rather
than forever).

[Suggestions]
I find that the loop method like "retry_transaction" is used at several places
in libxl. You probably want to revisit the potential negative effect
it brings.

Please take a look and help confirm my reported bug. Thanks.

(Cc'd to two authors listed in libxl.c)

Luwei Cheng
Department of Computer Science
The University of Hong Kong

_______________________________________________
Xen-api mailing list
Xen-api@lists.xen.org
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
Re: [Bug report] Security issue in "xl vcpu-set" [ In reply to ]
Since this was copied to xen-api@ it is now public, so redirecting to
the correct list (xen-devel@). I kept xen-api since oxenstored might be
involved. I dropped Vincent since he is no longer involved in libxl
development.

On Fri, 2015-05-29 at 13:44 +0800, lwcheng@cs.hku.hk wrote:
> Hi,
>
> "xl vcpu-set" is commonly used to hotplug/unhotplug vCPUs of an SMP VM.
> However, the current implementation of this command makes the driver
> domain vulnerable to denial-of-service attack: in certain cases, this command
> consumes too many CPU cycles in dom0, adversely affecting dom0's other
> tasks (e.g., IO processing, monitoring, etc.)
>
> [An illustrative example]
> Say, with a Linux PV guest called "vm01", when vm01 just boots or reboots
> (e.g., in its grub period)

Do you mean pygrub or pvgrub here?

> , if dom0 issues "xl vcpu-set vm01 xxx" at this
> moment, the following will happen:
> (1) "xl vcpu-set" hangs, until vm01 has loaded its kernel successfully.
> (2) in dom0, "oxenstored" consumes 100% of a single core.

It's not clear to me why this should relate to the status of the guest,
AFAIK there is no reason for a xenstore transaction to be affected by
whether or not the guest has loaded its kernel.

Certainly if it is spinning forever there is a bug somewhere, but I
don't think it relates to the use of a transaction in this way.

Ian.

> So, if a malicious guest purposely stays in its grub period (or other
> purposely designed phases, as long as it does not load its kernel),
> "xl vcpu-set" will hang _forever_, and dom0 has to sacrifice one core.
>
> [Affected versions]
> This problem has been there since libxl was introduced in Xen 4.1 release.
>
> [Implementation problem]
> "xl vcpu-set" involves a "loop" as follows (retry_transaction):
> --------------
> static int libxl__set_vcpuonline_xenstore(...)
> {
> ... ...
> retry_transaction:
> t = xs_transaction_start(CTX->xsh);
> for (i = 0; i <= info.vcpu_max_id; i++)
> libxl__xs_write(gc, t,
> libxl__sprintf(gc, "%s/cpu/%u/availability", dompath, i),
> "%s", libxl_bitmap_test(cpumap, i) ? "online" : "offline");
> if (!xs_transaction_end(CTX->xsh, t, 0)) {
> if (errno == EAGAIN)
> goto retry_transaction;
> }
> ... ...
> }
> --------------
>
> [Possible solution]
> In principle, the correctness of "xl vcpu-set" should _not_ depend on any
> guest's behaviors.
> One possible fix might be like this: if xs_transaction_end does not succeed,
> either ignore it or retry for a pre-defined timeout period (rather
> than forever).
>
> [Suggestions]
> I find that the loop method like "retry_transaction" is used at several places
> in libxl. You probably want to revisit the potential negative effect
> it brings.
>
> Please take a look and help confirm my reported bug. Thanks.
>
> (Cc'd to two authors listed in libxl.c)
>
> Luwei Cheng
> Department of Computer Science
> The University of Hong Kong



_______________________________________________
Xen-api mailing list
Xen-api@lists.xen.org
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
Re: [Bug report] Security issue in "xl vcpu-set" [ In reply to ]
Hi Ian,

Thanks for your reply. Please read my inline reply to your questions.

Quoting Ian Campbell <ian.campbell@citrix.com>:
> Since this was copied to xen-api@ it is now public, so redirecting to
> the correct list (xen-devel@). I kept xen-api since oxenstored might be
> involved. I dropped Vincent since he is no longer involved in libxl
> development.
>
> On Fri, 2015-05-29 at 13:44 +0800, lwcheng@cs.hku.hk wrote:
>> Hi,
>>
>> "xl vcpu-set" is commonly used to hotplug/unhotplug vCPUs of an SMP VM.
>> However, the current implementation of this command makes the driver
>> domain vulnerable to denial-of-service attack: in certain cases,
>> this command
>> consumes too many CPU cycles in dom0, adversely affecting dom0's other
>> tasks (e.g., IO processing, monitoring, etc.)
>>
>> [An illustrative example]
>> Say, with a Linux PV guest called "vm01", when vm01 just boots or reboots
>> (e.g., in its grub period)
>
> Do you mean pygrub or pvgrub here?

My VM uses pygrub: Xen-4.5.0 + Linux 3.14.35 (for both dom0 and domU).

>
>> , if dom0 issues "xl vcpu-set vm01 xxx" at this
>> moment, the following will happen:
>> (1) "xl vcpu-set" hangs, until vm01 has loaded its kernel successfully.
>> (2) in dom0, "oxenstored" consumes 100% of a single core.
>
> It's not clear to me why this should relate to the status of the guest,
> AFAIK there is no reason for a xenstore transaction to be affected by
> whether or not the guest has loaded its kernel.
>
> Certainly if it is spinning forever there is a bug somewhere, but I
> don't think it relates to the use of a transaction in this way.

You may check /var/log/xenstored-access.log: when "xl vcpu-set" hangs,
xenstore keeps writing to "/local/domain/xx/cpu/xx/availability",
indicating that it is looping in retry_transaction.


>
> Ian.
>
>> So, if a malicious guest purposely stays in its grub period (or other
>> purposely designed phases, as long as it does not load its kernel),
>> "xl vcpu-set" will hang _forever_, and dom0 has to sacrifice one core.
>>
>> [Affected versions]
>> This problem has been there since libxl was introduced in Xen 4.1 release.
>>
>> [Implementation problem]
>> "xl vcpu-set" involves a "loop" as follows (retry_transaction):
>> --------------
>> static int libxl__set_vcpuonline_xenstore(...)
>> {
>> ... ...
>> retry_transaction:
>> t = xs_transaction_start(CTX->xsh);
>> for (i = 0; i <= info.vcpu_max_id; i++)
>> libxl__xs_write(gc, t,
>> libxl__sprintf(gc, "%s/cpu/%u/availability", dompath, i),
>> "%s", libxl_bitmap_test(cpumap, i) ? "online" : "offline");
>> if (!xs_transaction_end(CTX->xsh, t, 0)) {
>> if (errno == EAGAIN)
>> goto retry_transaction;
>> }
>> ... ...
>> }
>> --------------
>>
>> [Possible solution]
>> In principle, the correctness of "xl vcpu-set" should _not_ depend on any
>> guest's behaviors.
>> One possible fix might be like this: if xs_transaction_end does not succeed,
>> either ignore it or retry for a pre-defined timeout period (rather
>> than forever).
>>
>> [Suggestions]
>> I find that the loop method like "retry_transaction" is used at
>> several places
>> in libxl. You probably want to revisit the potential negative effect
>> it brings.
>>
>> Please take a look and help confirm my reported bug. Thanks.
>>
>> (Cc'd to two authors listed in libxl.c)
>>
>> Luwei Cheng
>> Department of Computer Science
>> The University of Hong Kong
>
>



_______________________________________________
Xen-api mailing list
Xen-api@lists.xen.org
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
Re: [Bug report] Security issue in "xl vcpu-set" [ In reply to ]
Hi,

Wonder if there is any follow-ups from the relevant developers:
(1) are you able to reproduce the "spinning" behavior of "xl vcpu-set"?
(2) if yes, can you confirm that it is due to looping with
"retry_transaction"?

Best,
Luwei


Quoting lwcheng@cs.hku.hk:

> Hi Ian,
>
> Thanks for your reply. Please read my inline reply to your questions.
>
> Quoting Ian Campbell <ian.campbell@citrix.com>:
>> Since this was copied to xen-api@ it is now public, so redirecting to
>> the correct list (xen-devel@). I kept xen-api since oxenstored might be
>> involved. I dropped Vincent since he is no longer involved in libxl
>> development.
>>
>> On Fri, 2015-05-29 at 13:44 +0800, lwcheng@cs.hku.hk wrote:
>>> Hi,
>>>
>>> "xl vcpu-set" is commonly used to hotplug/unhotplug vCPUs of an SMP VM.
>>> However, the current implementation of this command makes the driver
>>> domain vulnerable to denial-of-service attack: in certain cases,
>>> this command
>>> consumes too many CPU cycles in dom0, adversely affecting dom0's other
>>> tasks (e.g., IO processing, monitoring, etc.)
>>>
>>> [An illustrative example]
>>> Say, with a Linux PV guest called "vm01", when vm01 just boots or reboots
>>> (e.g., in its grub period)
>>
>> Do you mean pygrub or pvgrub here?
>
> My VM uses pygrub: Xen-4.5.0 + Linux 3.14.35 (for both dom0 and domU).
>
>>
>>> , if dom0 issues "xl vcpu-set vm01 xxx" at this
>>> moment, the following will happen:
>>> (1) "xl vcpu-set" hangs, until vm01 has loaded its kernel successfully.
>>> (2) in dom0, "oxenstored" consumes 100% of a single core.
>>
>> It's not clear to me why this should relate to the status of the guest,
>> AFAIK there is no reason for a xenstore transaction to be affected by
>> whether or not the guest has loaded its kernel.
>>
>> Certainly if it is spinning forever there is a bug somewhere, but I
>> don't think it relates to the use of a transaction in this way.
>
> You may check /var/log/xenstored-access.log: when "xl vcpu-set" hangs,
> xenstore keeps writing to "/local/domain/xx/cpu/xx/availability",
> indicating that it is looping in retry_transaction.
>
>
>>
>> Ian.
>>
>>> So, if a malicious guest purposely stays in its grub period (or other
>>> purposely designed phases, as long as it does not load its kernel),
>>> "xl vcpu-set" will hang _forever_, and dom0 has to sacrifice one core.
>>>
>>> [Affected versions]
>>> This problem has been there since libxl was introduced in Xen 4.1 release.
>>>
>>> [Implementation problem]
>>> "xl vcpu-set" involves a "loop" as follows (retry_transaction):
>>> --------------
>>> static int libxl__set_vcpuonline_xenstore(...)
>>> {
>>> ... ...
>>> retry_transaction:
>>> t = xs_transaction_start(CTX->xsh);
>>> for (i = 0; i <= info.vcpu_max_id; i++)
>>> libxl__xs_write(gc, t,
>>> libxl__sprintf(gc, "%s/cpu/%u/availability", dompath, i),
>>> "%s", libxl_bitmap_test(cpumap, i) ? "online" : "offline");
>>> if (!xs_transaction_end(CTX->xsh, t, 0)) {
>>> if (errno == EAGAIN)
>>> goto retry_transaction;
>>> }
>>> ... ...
>>> }
>>> --------------
>>>
>>> [Possible solution]
>>> In principle, the correctness of "xl vcpu-set" should _not_ depend on any
>>> guest's behaviors.
>>> One possible fix might be like this: if xs_transaction_end does
>>> not succeed,
>>> either ignore it or retry for a pre-defined timeout period (rather
>>> than forever).
>>>
>>> [Suggestions]
>>> I find that the loop method like "retry_transaction" is used at
>>> several places
>>> in libxl. You probably want to revisit the potential negative effect
>>> it brings.
>>>
>>> Please take a look and help confirm my reported bug. Thanks.
>>>
>>> (Cc'd to two authors listed in libxl.c)
>>>
>>> Luwei Cheng
>>> Department of Computer Science
>>> The University of Hong Kong
>>
>>
>
>
>



_______________________________________________
Xen-api mailing list
Xen-api@lists.xen.org
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api
Re: [Bug report] Security issue in "xl vcpu-set" [ In reply to ]
(redirecting to xen-devel as I originally intended)

On Wed, 2015-06-03 at 13:02 +0800, lwcheng@cs.hku.hk wrote:
> Hi,
>
> Wonder if there is any follow-ups from the relevant developers:
> (1) are you able to reproduce the "spinning" behavior of "xl vcpu-set"?

I am with Xen 4.5, but not with xen-unstable AFAICT.

> (2) if yes, can you confirm that it is due to looping with
> "retry_transaction"?

I don't think so. You are _supposed_ to retry failed transactions in
this way, it's how they work.

The issue is that the transaction is failing repeatedly in such a
manner. This might be due to a lack of error checking within the loop,
or it could possibly be a bug in the xenstore daemon.

Ian.

>
> Best,
> Luwei
>
>
> Quoting lwcheng@cs.hku.hk:
>
> > Hi Ian,
> >
> > Thanks for your reply. Please read my inline reply to your questions.
> >
> > Quoting Ian Campbell <ian.campbell@citrix.com>:
> >> Since this was copied to xen-api@ it is now public, so redirecting to
> >> the correct list (xen-devel@). I kept xen-api since oxenstored might be
> >> involved. I dropped Vincent since he is no longer involved in libxl
> >> development.
> >>
> >> On Fri, 2015-05-29 at 13:44 +0800, lwcheng@cs.hku.hk wrote:
> >>> Hi,
> >>>
> >>> "xl vcpu-set" is commonly used to hotplug/unhotplug vCPUs of an SMP VM.
> >>> However, the current implementation of this command makes the driver
> >>> domain vulnerable to denial-of-service attack: in certain cases,
> >>> this command
> >>> consumes too many CPU cycles in dom0, adversely affecting dom0's other
> >>> tasks (e.g., IO processing, monitoring, etc.)
> >>>
> >>> [An illustrative example]
> >>> Say, with a Linux PV guest called "vm01", when vm01 just boots or reboots
> >>> (e.g., in its grub period)
> >>
> >> Do you mean pygrub or pvgrub here?
> >
> > My VM uses pygrub: Xen-4.5.0 + Linux 3.14.35 (for both dom0 and domU).
> >
> >>
> >>> , if dom0 issues "xl vcpu-set vm01 xxx" at this
> >>> moment, the following will happen:
> >>> (1) "xl vcpu-set" hangs, until vm01 has loaded its kernel successfully.
> >>> (2) in dom0, "oxenstored" consumes 100% of a single core.
> >>
> >> It's not clear to me why this should relate to the status of the guest,
> >> AFAIK there is no reason for a xenstore transaction to be affected by
> >> whether or not the guest has loaded its kernel.
> >>
> >> Certainly if it is spinning forever there is a bug somewhere, but I
> >> don't think it relates to the use of a transaction in this way.
> >
> > You may check /var/log/xenstored-access.log: when "xl vcpu-set" hangs,
> > xenstore keeps writing to "/local/domain/xx/cpu/xx/availability",
> > indicating that it is looping in retry_transaction.
> >
> >
> >>
> >> Ian.
> >>
> >>> So, if a malicious guest purposely stays in its grub period (or other
> >>> purposely designed phases, as long as it does not load its kernel),
> >>> "xl vcpu-set" will hang _forever_, and dom0 has to sacrifice one core.
> >>>
> >>> [Affected versions]
> >>> This problem has been there since libxl was introduced in Xen 4.1 release.
> >>>
> >>> [Implementation problem]
> >>> "xl vcpu-set" involves a "loop" as follows (retry_transaction):
> >>> --------------
> >>> static int libxl__set_vcpuonline_xenstore(...)
> >>> {
> >>> ... ...
> >>> retry_transaction:
> >>> t = xs_transaction_start(CTX->xsh);
> >>> for (i = 0; i <= info.vcpu_max_id; i++)
> >>> libxl__xs_write(gc, t,
> >>> libxl__sprintf(gc, "%s/cpu/%u/availability", dompath, i),
> >>> "%s", libxl_bitmap_test(cpumap, i) ? "online" : "offline");
> >>> if (!xs_transaction_end(CTX->xsh, t, 0)) {
> >>> if (errno == EAGAIN)
> >>> goto retry_transaction;
> >>> }
> >>> ... ...
> >>> }
> >>> --------------
> >>>
> >>> [Possible solution]
> >>> In principle, the correctness of "xl vcpu-set" should _not_ depend on any
> >>> guest's behaviors.
> >>> One possible fix might be like this: if xs_transaction_end does
> >>> not succeed,
> >>> either ignore it or retry for a pre-defined timeout period (rather
> >>> than forever).
> >>>
> >>> [Suggestions]
> >>> I find that the loop method like "retry_transaction" is used at
> >>> several places
> >>> in libxl. You probably want to revisit the potential negative effect
> >>> it brings.
> >>>
> >>> Please take a look and help confirm my reported bug. Thanks.
> >>>
> >>> (Cc'd to two authors listed in libxl.c)
> >>>
> >>> Luwei Cheng
> >>> Department of Computer Science
> >>> The University of Hong Kong
> >>
> >>
> >
> >
> >
>
>



_______________________________________________
Xen-api mailing list
Xen-api@lists.xen.org
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api