Mailing List Archive

oxenstored performance issue when starting VMs in parallel
Hello,

We are developing a solution based on Xen 4.13 that is constantly
creating / destroying VMs.

To summarize our lifecycle :

- xl restore vmX
- xl cd-insert ....
- We do our stuff for ~ 2 minutes
- xl destroy vmX

So our VMs have a lifetime of approximately 2 minutes.

The number of VMs we run in parallel depends on the underlying server.

We are seeing the issue on our largest server, which runs 30 HVM VMs
in parallel.

On this server oxenstored is constantly running at 100% CPU usage and
some VMs are almost stuck or unresponsive.

This is not a hardware issue: 72 Xeon cores, 160 GB of memory and a
very fast I/O subsystem.
Everything else is running smoothly on the server.

What we see in xenstore-access.log is that the number of WATCH
events matches the number of currently running VMs.

So, for example, a single WRITE is followed by around 30 watch events:

[20200918T15:15:18.045Z] A41354 write
/local/domain/0/backend/qdisk/1311/5632
[20200918T15:15:18.046Z] A41248 w event
backend/qdisk/1311/5632 38ed11d9-9a38-4022-ad75-7c571d4886ed
[20200918T15:15:18.046Z] A41257 w event
backend/qdisk/1311/5632 98fa91b8-e88b-4667-9813-d95196257288
[20200918T15:15:18.046Z] A40648 w event
backend/qdisk/1311/5632 e6fd9a35-61ec-4750-93eb-999fb7f662fc
[20200918T15:15:18.046Z] A40542 w event
backend/qdisk/1311/5632 6a39c858-2fd4-46e4-a810-485a41328f8c
[20200918T15:15:18.046Z] A41141 w event
backend/qdisk/1311/5632 8762d552-b4b4-41ef-a2aa-23700f790ea2
[20200918T15:15:18.046Z] A41310 w event
backend/qdisk/1311/5632 4dc2a9ae-6388-4b0c-9c98-df3c897a832f
[20200918T15:15:18.046Z] A40660 w event
backend/qdisk/1311/5632 6abf244d-5939-4540-b176-4ec7d14b392c
[20200918T15:15:18.046Z] A41347 w event
backend/qdisk/1311/5632 ecb93157-9929-43e2-8ed4-f5e78ab2f37d
[20200918T15:15:18.046Z] A41015 w event
backend/qdisk/1311/5632 a1fec49f-e7cc-4059-87d3-ce43f386746e
[20200918T15:15:18.046Z] A41167 w event
backend/qdisk/1311/5632 e9419014-9fd2-47c0-b79d-30f99d9530d6
[20200918T15:15:18.046Z] A41100 w event
backend/qdisk/1311/5632 a2754a91-ecd6-4b6b-87ea-b68db8b888df
[20200918T15:15:18.046Z] A41147 w event
backend/qdisk/1311/5632 176a1c3c-add7-4710-a7ee-3b5548d7a56a
[20200918T15:15:18.046Z] A41305 w event
backend/qdisk/1311/5632 afe7933b-c92d-4403-8d6c-2e530558c937
[20200918T15:15:18.046Z] A40616 w event
backend/qdisk/1311/5632 35fa45e0-21e8-4666-825b-0c3d629f378d
[20200918T15:15:18.046Z] A40951 w event
backend/qdisk/1311/5632 230eb42f-d700-46ce-af61-89242847a978
[20200918T15:15:18.046Z] A40567 w event
backend/qdisk/1311/5632 39cc7ffb-5045-4120-beb7-778073927c93
[20200918T15:15:18.046Z] A41363 w event
backend/qdisk/1311/5632 9e42e74a-80fb-46e8-81f2-718628bf70f6
[20200918T15:15:18.046Z] A40740 w event
backend/qdisk/1311/5632 1a64af31-fee6-45be-b8d8-c98baa5e162f
[20200918T15:15:18.046Z] A40632 w event
backend/qdisk/1311/5632 466ef522-cb76-4117-8e93-42471897c353
[20200918T15:15:18.046Z] A41319 w event
backend/qdisk/1311/5632 19ea986b-e303-4180-b833-c691b2b32819
[20200918T15:15:18.046Z] A40677 w event
backend/qdisk/1311/5632 fb01629a-033b-41d6-8349-cec82e570238
[20200918T15:15:18.046Z] A41152 w event
backend/qdisk/1311/5632 84ce9e29-a5cc-42a1-a47b-497b95767885
[20200918T15:15:18.047Z] A41233 w event
backend/qdisk/1311/5632 ea944ad3-3af6-4688-8076-db1eac25d8e9
[20200918T15:15:18.047Z] A41069 w event
backend/qdisk/1311/5632 ce57e169-e1ea-4fb5-b97f-23e651f49d79
[20200918T15:15:18.047Z] A41287 w event
backend/qdisk/1311/5632 d31110c8-ae0b-4b9d-b71f-aa2985addd1a
[20200918T15:15:18.047Z] A40683 w event
backend/qdisk/1311/5632 f0e4b0a0-fad0-4bb7-b01e-b8a31107ba3d
[20200918T15:15:18.047Z] A41177 w event
backend/qdisk/1311/5632 9ff80e49-4cca-4ec9-901a-d30198104f29
[20200918T15:15:18.047Z] D0 w event
backend/qdisk/1311/5632 FFFFFFFF8276B520
[20200918T15:15:18.047Z] A40513 w event
backend/qdisk/1311/5632 d35a9a42-c15e-492c-a70d-d8b20bafec8f
[20200918T15:15:18.047Z] A41354 w event
backend/qdisk/1311/5632 e4456ca4-70f4-4afc-9ba1-4a1cfd74c8e6

We are not sure this is the root cause of the issue, but it is the
only real difference we can see in the log.

We don't understand why the number of WATCH events is related to the
number of concurrently running VMs.
A watch should be registered for, and fired only for, the relevant
domain ID, so a write to a specific node path should trigger only one
watch event, not 30 as in our case.

Any ideas / comments ?

Thanks

Jerome Leseinne
Re: oxenstored performance issue when starting VMs in parallel
On Mon, 21 Sep 2020 at 15:10, jerome leseinne
(<jerome.leseinne@gmail.com>) wrote:

> Hello,
>
> We are developing a solution based on Xen 4.13 who is constantly
> creating / destroying VMs.
>
> To summarize our lifecycle :
>
> - xl restore vmX
> - xl cd-insert ....
> - We do our stuff for ~ 2 minutes
> - xl destroy vmX
>
> So our VMs have a life of approximately 2 minutes.
>
> The number of VMs we ran in parallel depends on the underlying server.
>
> We are seeing the issue with our larger server who is running 30 VMs
> (HVM) in parallel.
>
> On this server oxenstored is constantly running at 100% cpu usage and
> some VMs are almost stucked or unresponsive.
>
> This is not an hardware issue, 72 xeon cores, 160 GB of memory and
> very fast I/O subsystem.
> Everything else is running smoothly on the server.
>
> what we witness in the xenstore-access.log is that the number of WATCH
> event is matching the number of currently running VMs
>
> so for example for a single WRITE event is followed by around 30 watch
> events :
>
> [20200918T15:15:18.045Z] A41354 write
> /local/domain/0/backend/qdisk/1311/5632
> [... ~30 quoted "w event" lines trimmed; see the full log in the original message above ...]
>
> We are not sure this is the root cause of the issue but this is the
> only real difference we can see in the log.
>
> We don't understand why the number of WATCH events is related to the
> number of concurrent running VM.
> A watch event should be registered and only fired for the current
> domain ID, so a write for a specific node path should only trigger one
> watch event and not 30 in our case.
>
> Any ideas / comments ?
>
> Thanks
>
> Jerome Leseinne
>
>
Jerome,
we are experiencing very similar issues on Xen 4.12.3 (Debian 10.4) with a
similar setup (128 GB RAM, 48 cores). In our case we start and stop dozens
of HVM VMs in parallel, restoring each from a memory save file and
automatically analyzing a piece of software's behaviour inside the guest
for a few minutes.

Any ideas/comments for improving oxenstored performance would be very
welcome.
Re: oxenstored performance issue when starting VMs in parallel
We tested on the latest 4.14; same issue.

We tried an oxenstored replacement, lixs: https://github.com/cnplab/lixs

This basically solves the problem: no more 100% CPU (only a few
spikes), and all the VMs are responsive!

One problem though: everything works fine except during "xl
destroy", where xl complains that it cannot delete the VIF interface.
The leaked VIFs accumulate and, after a few hours / days, lead to dom0
complaining about network interfaces and having to be rebooted....

So lixs is not a solution, and it has not been under active
maintenance/evolution for 4 years.
A supported Xen solution/workaround would be better...

Jerome

On Mon, 21 Sep 2020 at 17:25, Fanny Dwargee <fdwargee6@gmail.com> wrote:
>
>
>
> On Mon, 21 Sep 2020 at 15:10, jerome leseinne (<jerome.leseinne@gmail.com>) wrote:
>>
>> Hello,
>>
>> We are developing a solution based on Xen 4.13 who is constantly
>> creating / destroying VMs.
>>
>> To summarize our lifecycle :
>>
>> - xl restore vmX
>> - xl cd-insert ....
>> - We do our stuff for ~ 2 minutes
>> - xl destroy vmX
>>
>> So our VMs have a life of approximately 2 minutes.
>>
>> The number of VMs we ran in parallel depends on the underlying server.
>>
>> We are seeing the issue with our larger server who is running 30 VMs
>> (HVM) in parallel.
>>
>> On this server oxenstored is constantly running at 100% cpu usage and
>> some VMs are almost stucked or unresponsive.
>>
>> This is not an hardware issue, 72 xeon cores, 160 GB of memory and
>> very fast I/O subsystem.
>> Everything else is running smoothly on the server.
>>
>> what we witness in the xenstore-access.log is that the number of WATCH
>> event is matching the number of currently running VMs
>>
>> so for example for a single WRITE event is followed by around 30 watch events :
>>
>> [20200918T15:15:18.045Z] A41354 write
>> /local/domain/0/backend/qdisk/1311/5632
>> [... quoted "w event" lines trimmed; see the full log in the original message above ...]
>>
>> We are not sure this is the root cause of the issue but this is the
>> only real difference we can see in the log.
>>
>> We don't understand why the number of WATCH events is related to the
>> number of concurrent running VM.
>> A watch event should be registered and only fired for the current
>> domain ID, so a write for a specific node path should only trigger one
>> watch event and not 30 in our case.
>>
>> Any ideas / comments ?
>>
>> Thanks
>>
>> Jerome Leseinne
>>
>
> Jerome,
> we are experiencing very similar issues in Xen v4.12.3 (Debian 10.4) with a similar setup (128GB RAM, 48 cores), in our case we start and stop dozens of HVM VMs in parallel using restore from a memory saved file and analyzing automatically a software behaviour inside the guest during a few minutes.
>
> Any ideas/comments for improving the oxenstore performance will be very welcome.
>
Re: oxenstored performance issue when starting VMs in parallel
On 21/09/2020 14:05, jerome leseinne wrote:
> Hello,

Hello,

I am only CCing Edwin, who is working on OXenStored. Hopefully, he will
be able to give you some pointers.

> We are developing a solution based on Xen 4.13 who is constantly
> creating / destroying VMs.
>
> To summarize our lifecycle :
>
> - xl restore vmX
> - xl cd-insert ....
> - We do our stuff for ~ 2 minutes
> - xl destroy vmX
>
> So our VMs have a life of approximately 2 minutes.
>
> The number of VMs we ran in parallel depends on the underlying server.
>
> We are seeing the issue with our larger server who is running 30 VMs
> (HVM) in parallel.
>
> On this server oxenstored is constantly running at 100% cpu usage and
> some VMs are almost stucked or unresponsive.
>
> This is not an hardware issue, 72 xeon cores, 160 GB of memory and
> very fast I/O subsystem.
> Everything else is running smoothly on the server.
>
> what we witness in the xenstore-access.log is that the number of WATCH
> event is matching the number of currently running VMs
>
> so for example for a single WRITE event is followed by around 30 watch events :
>
> [20200918T15:15:18.045Z] A41354 write
> /local/domain/0/backend/qdisk/1311/5632
> [... quoted "w event" lines trimmed; see the full log in the original message above ...]
>
> We are not sure this is the root cause of the issue but this is the
> only real difference we can see in the log.
>
> We don't understand why the number of WATCH events is related to the
> number of concurrent running VM.
> A watch event should be registered and only fired for the current
> domain ID, so a write for a specific node path should only trigger one
> watch event and not 30 in our case.
>
> Any ideas / comments ?

Cheers,

--
Julien Grall
Re: oxenstored performance issue when starting VMs in parallel
On Mon, 2020-09-21 at 17:40 +0100, Julien Grall wrote:
>
> On 21/09/2020 14:05, jerome leseinne wrote:
> > Hello,
>
> Hello,
>
> I am only CCing Edwin who is working on OXenStored. Hopefully, he
> will
> be able to give you some pointers.
>
> > We are developing a solution based on Xen 4.13 who is constantly
> > creating / destroying VMs.
> >
> > To summarize our lifecycle :
> >
> > - xl restore vmX
> > - xl cd-insert ....
> > - We do our stuff for ~ 2 minutes
> > - xl destroy vmX
> >
> > So our VMs have a life of approximately 2 minutes.
> >
> > The number of VMs we ran in parallel depends on the underlying
> > server.
> >
> > We are seeing the issue with our larger server who is running 30
> > VMs
> > (HVM) in parallel.
> >
> > On this server oxenstored is constantly running at 100% cpu usage
> > and
> > some VMs are almost stucked or unresponsive.
> >
> > This is not an hardware issue, 72 xeon cores, 160 GB of memory and
> > very fast I/O subsystem.
> > Everything else is running smoothly on the server.
> >
> > what we witness in the xenstore-access.log is that the number of
> > WATCH
> > event is matching the number of currently running VMs
> >
> > so for example for a single WRITE event is followed by around 30
> > watch events :
> >
> > [20200918T15:15:18.045Z] A41354 write
> > /local/domain/0/backend/qdisk/1311/5632
> > [20200918T15:15:18.046Z] A41248 w event
> > backend/qdisk/1311/5632 38ed11d9-9a38-4022-ad75-7c571d4886ed
> > [20200918T15:15:18.046Z] A41257 w event
> > backend/qdisk/1311/5632 98fa91b8-e88b-4667-9813-d95196257288
> > [20200918T15:15:18.046Z] A40648 w event
> > backend/qdisk/1311/5632 e6fd9a35-61ec-4750-93eb-999fb7f662fc
> > [20200918T15:15:18.046Z] A40542 w event

Hi,

Each A<id> here refers to a different unix domain socket connection. They
have all registered a watch on this key (or on a parent of it).

To find out what key they are watching, you can look for the watch
registration:
grep 'A41248 .*watch' /var/log/xenstored-access.log

You can find all the xenstore clients connected to oxenstored by looking
at the numbers in the port column here:
ss -a --unix -p|grep xenstored
Then look up those numbers, e.g. assuming you see 22269:
ss -a --unix -p|grep 22269

This should tell you what the processes are.

I have a suspicion that it is related to this code in qemu:

    xenbus->backend_watch =
        xen_bus_add_watch(xenbus, "", /* domain root node */
                          "backend", xen_bus_backend_changed,
                          &local_err);

If I read the code right, this seems to set up a watch on
/local/domain/0/backend, so if you have N qemus running (one for each
disk) then you get N^2 watch events fired in total.
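
To make the difference concrete, here is a minimal standalone watcher
(a sketch using the public libxenstore API from xenstore.h, not qemu's
internal helpers; the per-domain qdisk path is just an example). A
client watching the broad path wakes up for every backend write of
every domain, while one watching a single domain's subtree only wakes
up for that domain:

    /* watch-demo.c -- illustrative sketch only, not production code.
     * Build (roughly): gcc watch-demo.c -lxenstore -o watch-demo
     * With no argument it registers the same kind of broad watch qemu
     * uses; with a domid argument it watches only that domain's qdisk
     * subtree. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <xenstore.h>

    int main(int argc, char **argv)
    {
        char path[64];

        if (argc > 1)
            snprintf(path, sizeof(path),
                     "/local/domain/0/backend/qdisk/%s", argv[1]);
        else
            snprintf(path, sizeof(path), "/local/domain/0/backend");

        struct xs_handle *xsh = xs_open(0);
        if (!xsh || !xs_watch(xsh, path, "demo-token")) {
            perror("xenstore");
            return 1;
        }

        for (;;) {
            unsigned int num;
            /* Blocks until a watch fires; returns { path, token }. */
            char **ev = xs_read_watch(xsh, &num);
            if (!ev)
                break;
            printf("watch fired: %s (token %s)\n",
                   ev[XS_WATCH_PATH], ev[XS_WATCH_TOKEN]);
            free(ev);
        }

        xs_close(xsh);
        return 0;
    }

Running several copies of the broad variant should reproduce the
one-write, many-events pattern visible in the log above.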

Best regards,
--Edwin

> > [... remainder of the quoted log trimmed; see the full log in the original message above ...]
> >
> > We are not sure this is the root cause of the issue but this is the
> > only real difference we can see in the log.
> >
> > We don't understand why the number of WATCH events is related to
> > the
> > number of concurrent running VM.
> > A watch event should be registered and only fired for the current
> > domain ID, so a write for a specific node path should only trigger
> > one
> > watch event and not 30 in our case.
> >
> > Any ideas / comments ?
>
> Cheers,
>
Re: oxenstored performance issue when starting VMs in parallel
Hi,

Edwin, you rock! This call in qemu is indeed the culprit!
I have disabled this xen_bus_add_watch call and re-run the tests on our
big server:

   - oxenstored is now between 10% and 20% CPU usage (previously it was
at 100% all the time)
   - All our VMs are responsive
   - All our VMs start in less than 10 seconds (before the fix some VMs
could take more than one minute to be fully up)
   - Dom0 is more responsive

Disabling the watch may not be the ideal solution (I'll let the qemu
experts comment on this and the possible side effects), but in our
scenario and usage it fixes the problem and dramatically boosts
performance.
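
For reference, the change amounts to skipping the registration quoted
earlier in the thread; a rough sketch (the exact file/function in qemu
and whether backend_watch needs any further handling depend on the
qemu version, so treat this as illustrative rather than the exact
patch we applied):

    /* Illustrative sketch only: the overly broad watch on the whole
     * backend root, compiled out. */
    #if 0
        xenbus->backend_watch =
            xen_bus_add_watch(xenbus, "", /* domain root node */
                              "backend", xen_bus_backend_changed,
                              &local_err);
    #endif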

So far we haven't seen any side effects: all the xl commands are OK, the
VMs are fully functional, no devices leak (like network VIFs, for
example), and once all the VMs are down a call to xenstore-ls shows that
the store is indeed empty (dom0 excluded).

We will continue with additional testing and stress runs, but in any
case a huge thanks to you and Julien for your help on this issue!

Jerome

On Mon, 21 Sep 2020 at 18:57, Edwin Torok <edvin.torok@citrix.com> wrote:
>
> On Mon, 2020-09-21 at 17:40 +0100, Julien Grall wrote:
> >
> > On 21/09/2020 14:05, jerome leseinne wrote:
> > > Hello,
> >
> > Hello,
> >
> > I am only CCing Edwin who is working on OXenStored. Hopefully, he
> > will
> > be able to give you some pointers.
> >
> > > We are developing a solution based on Xen 4.13 who is constantly
> > > creating / destroying VMs.
> > >
> > > To summarize our lifecycle :
> > >
> > > - xl restore vmX
> > > - xl cd-insert ....
> > > - We do our stuff for ~ 2 minutes
> > > - xl destroy vmX
> > >
> > > So our VMs have a life of approximately 2 minutes.
> > >
> > > The number of VMs we ran in parallel depends on the underlying
> > > server.
> > >
> > > We are seeing the issue with our larger server who is running 30
> > > VMs
> > > (HVM) in parallel.
> > >
> > > On this server oxenstored is constantly running at 100% cpu usage
> > > and
> > > some VMs are almost stucked or unresponsive.
> > >
> > > This is not an hardware issue, 72 xeon cores, 160 GB of memory and
> > > very fast I/O subsystem.
> > > Everything else is running smoothly on the server.
> > >
> > > what we witness in the xenstore-access.log is that the number of
> > > WATCH
> > > event is matching the number of currently running VMs
> > >
> > > so for example for a single WRITE event is followed by around 30
> > > watch events :
> > >
> > > [20200918T15:15:18.045Z] A41354 write
> > > /local/domain/0/backend/qdisk/1311/5632
> > > [20200918T15:15:18.046Z] A41248 w event
> > > backend/qdisk/1311/5632 38ed11d9-9a38-4022-ad75-7c571d4886ed
> > > [20200918T15:15:18.046Z] A41257 w event
> > > backend/qdisk/1311/5632 98fa91b8-e88b-4667-9813-d95196257288
> > > [20200918T15:15:18.046Z] A40648 w event
> > > backend/qdisk/1311/5632 e6fd9a35-61ec-4750-93eb-999fb7f662fc
> > > [20200918T15:15:18.046Z] A40542 w event
>
> Hi,
>
> All the A<id> here point to another unix domain socket. They have all
> registered a watch on this key (or a parent).
>
> To find out what key they are watching you can look for the watch
> registration:
> grep 'A41248 .*watch' /var/log/xenstored-access.log
>
> You can find out all xenstore clients connected to oxenstored, find all
> the numbers in the port column here:
> ss -a --unix -p|grep xenstored
> Then look up those numbers, e.g. assuming you see 22269:
> ss -a --unix -p|grep 22269
>
> This should tell you what the processes are.
>
> I have a suspicion that it is related to this code in qemu:
>
> xenbus->backend_watch =
> xen_bus_add_watch(xenbus, "", /* domain root node */
> "backend", xen_bus_backend_changed,
> &local_err);
>
> If I read the code right this seems to set up a watch on
> /local/domain/0/backend, so if you have N qemus running (one for each
> disk) then you get N^2 watch events fired in total.
>
> Best regards,
> --Edwin
>
> > > [... remainder of the quoted log trimmed; see the full log in the original message above ...]
> > >
> > > We are not sure this is the root cause of the issue but this is the
> > > only real difference we can see in the log.
> > >
> > > We don't understand why the number of WATCH events is related to
> > > the
> > > number of concurrent running VM.
> > > A watch event should be registered and only fired for the current
> > > domain ID, so a write for a specific node path should only trigger
> > > one
> > > watch event and not 30 in our case.
> > >
> > > Any ideas / comments ?
> >
> > Cheers,
> >
Re: oxenstored performance issue when starting VMs in parallel
On Tue, 2020-09-22 at 15:17 +0200, jerome leseinne wrote:
> Hi,
>
> Edwin you rock ! This call in qemu is effectively the culprit !
> I have disabled this xen_bus_add_watch call and re-run test on our
> big server:
>
> - oxenstored is now between 10% to 20% CPU usage (previously was
> 100% all the time)
> - All our VMs are responsive
> - All our VM start in less than 10 seconds (before the fix some VMs
> could take more than one minute to be fully up
> - Dom0 is more responsive
>
> Disabling the watch may not be the ideal solution ( I let the qemu
> experts answer this and the possible side effects),

Hi,

I have CC-ed the Qemu maintainers of the Xen code; please see this
discussion about scalability issues with the backend watching code in
qemu 4.1+.

I think the scalability issue is due to this code in qemu, which causes
an instance of qemu to see watches for all devices (even those
belonging to other qemu instances), such that adding a single device
causes a watch event to be fired on each of the N instances of qemu:

    xenbus->backend_watch =
        xen_bus_add_watch(xenbus, "", /* domain root node */
                          "backend", xen_bus_backend_changed,
                          &local_err);

I can understand that for backwards compatibility you might need this
code, but is there a way that an up-to-date (xl) toolstack could tell
qemu what it needs to look at (e.g. via QMP, or other keys in xenstore)
instead of relying on an overly broad watch?

Best regards,
--Edwin


> but in our
> scenario and usage this fixes the problem and dramatically boosts the
> performance.
>
> So far we haven't seen any side effect, all the xl orders are ok, the
> VMs are fully functional, no devices leak (like network vif for
> exemple) and once all the VMs are down a call to xenstore-ls show
> that
> the store is indeed empty (dom0 excluded)
>
> We will continue additional testing and stress but in all cases a
> huge thanks to you and Julien for your help on this issue !
>
> Jerome
>
> On Mon, 21 Sep 2020 at 18:57, Edwin Torok <edvin.torok@citrix.com>
> wrote:
> > On Mon, 2020-09-21 at 17:40 +0100, Julien Grall wrote:
> > >
> > > On 21/09/2020 14:05, jerome leseinne wrote:
> > > > Hello,
> > >
> > > Hello,
> > >
> > > I am only CCing Edwin who is working on OXenStored. Hopefully, he
> > > will
> > > be able to give you some pointers.
> > >
> > > > We are developing a solution based on Xen 4.13 who is
> > > > constantly
> > > > creating / destroying VMs.
> > > >
> > > > To summarize our lifecycle :
> > > >
> > > > - xl restore vmX
> > > > - xl cd-insert ....
> > > > - We do our stuff for ~ 2 minutes
> > > > - xl destroy vmX
> > > >
> > > > So our VMs have a life of approximately 2 minutes.
> > > >
> > > > The number of VMs we ran in parallel depends on the underlying
> > > > server.
> > > >
> > > > We are seeing the issue with our larger server who is running
> > > > 30
> > > > VMs
> > > > (HVM) in parallel.
> > > >
> > > > On this server oxenstored is constantly running at 100% cpu
> > > > usage
> > > > and
> > > > some VMs are almost stucked or unresponsive.
> > > >
> > > > This is not an hardware issue, 72 xeon cores, 160 GB of memory
> > > > and
> > > > very fast I/O subsystem.
> > > > Everything else is running smoothly on the server.
> > > >
> > > > what we witness in the xenstore-access.log is that the number
> > > > of
> > > > WATCH
> > > > event is matching the number of currently running VMs
> > > >
> > > > so for example for a single WRITE event is followed by around
> > > > 30
> > > > watch events :
> > > >
> > > > [20200918T15:15:18.045Z] A41354 write
> > > > /local/domain/0/backend/qdisk/1311/5632
> > > > [20200918T15:15:18.046Z] A41248 w event
> > > > backend/qdisk/1311/5632 38ed11d9-9a38-4022-ad75-7c571d4886ed
> > > > [20200918T15:15:18.046Z] A41257 w event
> > > > backend/qdisk/1311/5632 98fa91b8-e88b-4667-9813-d95196257288
> > > > [20200918T15:15:18.046Z] A40648 w event
> > > > backend/qdisk/1311/5632 e6fd9a35-61ec-4750-93eb-999fb7f662fc
> > > > [20200918T15:15:18.046Z] A40542 w event
> >
> > Hi,
> >
> > All the A<id> here point to another unix domain socket. They have
> > all
> > registered a watch on this key (or a parent).
> >
> > To find out what key they are watching you can look for the watch
> > registration:
> > grep 'A41248 .*watch' /var/log/xenstored-access.log
> >
> > You can find out all xenstore clients connected to oxenstored, find
> > all
> > the numbers in the port column here:
> > ss -a --unix -p|grep xenstored
> > Then look up those numbers, e.g. assuming you see 22269:
> > ss -a --unix -p|grep 22269
> >
> > This should tell you what the processes are.
> >
> > I have a suspicion that it is related to this code in qemu:
> >
> > xenbus->backend_watch =
> > xen_bus_add_watch(xenbus, "", /* domain root node */
> > "backend", xen_bus_backend_changed,
> > &local_err);
> >
> > If I read the code right this seems to set up a watch on
> > /local/domain/0/backend, so if you have N qemus running (one for
> > each
> > disk) then you get N^2 watch events fired in total.
> >
> > Best regards,
> > --Edwin
> >
> > > > [... remainder of the quoted log trimmed; see the full log in the original message above ...]
> > > >
> > > > We are not sure this is the root cause of the issue but this is
> > > > the
> > > > only real difference we can see in the log.
> > > >
> > > > We don't understand why the number of WATCH events is related
> > > > to
> > > > the
> > > > number of concurrent running VM.
> > > > A watch event should be registered and only fired for the
> > > > current
> > > > domain ID, so a write for a specific node path should only
> > > > trigger
> > > > one
> > > > watch event and not 30 in our case.
> > > >
> > > > Any ideas / comments ?
> > >
> > > Cheers,
> > >
RE: oxenstored performance issue when starting VMs in parallel
> -----Original Message-----
> From: Edwin Torok <edvin.torok@citrix.com>
> Sent: 22 September 2020 14:29
> To: sstabellini@kernel.org; Anthony Perard <anthony.perard@citrix.com>; xen-
> devel@lists.xenproject.org; paul@xen.org
> Cc: xen-users@lists.xenproject.org; jerome.leseinne@gmail.com; julien@xen.org
> Subject: Re: oxenstored performance issue when starting VMs in parallel
>
> On Tue, 2020-09-22 at 15:17 +0200, jerome leseinne wrote:
> > Hi,
> >
> > Edwin you rock ! This call in qemu is effectively the culprit !
> > I have disabled this xen_bus_add_watch call and re-run test on our
> > big server:
> >
> > - oxenstored is now between 10% to 20% CPU usage (previously was
> > 100% all the time)
> > - All our VMs are responsive
> > - All our VM start in less than 10 seconds (before the fix some VMs
> > could take more than one minute to be fully up
> > - Dom0 is more responsive
> >
> > Disabling the watch may not be the ideal solution ( I let the qemu
> > experts answer this and the possible side effects),
>
> Hi,
>
> CC-ed Qemu maintainer of Xen code, please see this discussion about
> scalability issues with the backend watching code in qemu 4.1+.
>
> I think the scalability issue is due to this code in qemu, which causes
> an instance of qemu to see watches from all devices (even those
> belonging to other qemu instances), such that adding a single device
> causes N watches to be fired on each N instances of qemu:
> xenbus->backend_watch =
> xen_bus_add_watch(xenbus, "", /* domain root node */
> "backend", xen_bus_backend_changed,
> &local_err);
>
> I can understand that for backwards compatibility you might need this
> code, but is there a way that an up-to-date (xl) toolstack could tell
> qemu what it needs to look at (e.g. via QMP, or other keys in xenstore)
> instead of relying on an overly broad watch?

I think this could be made more efficient. The call to "module_call_init(MODULE_INIT_XEN_BACKEND)" just prior to this watch registers the backends that do auto-creation, so we could register individual watches for the various backend types instead of this single one.
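
As a rough sketch of that idea (the type names and the loop below are
purely illustrative; the real set of types would come from the
backends registered via module_call_init(MODULE_INIT_XEN_BACKEND), and
the node/key split simply mirrors the call quoted above):

    /* Illustrative sketch only: one watch per auto-created backend
     * type instead of a single watch on the whole backend root.  The
     * returned watch handles would need to be kept somewhere so they
     * can be removed again at unrealize time; that is omitted here. */
    static const char *const backend_types[] = { "qdisk", "qnic" };
    unsigned int i;

    for (i = 0; i < ARRAY_SIZE(backend_types); i++) {
        xen_bus_add_watch(xenbus, "backend", backend_types[i],
                          xen_bus_backend_changed, &local_err);
        if (local_err) {
            break;
        }
    }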

Paul

>
> Best regards,
> --Edwin
>
>
> > but in our
> > scenario and usage this fixes the problem and dramatically boosts the
> > performance.
> >
> > So far we haven't seen any side effect, all the xl orders are ok, the
> > VMs are fully functional, no devices leak (like network vif for
> > exemple) and once all the VMs are down a call to xenstore-ls show
> > that
> > the store is indeed empty (dom0 excluded)
> >
> > We will continue additional testing and stress but in all cases a
> > huge thanks to you and Julien for your help on this issue !
> >
> > Jerome
> >
> > On Mon, 21 Sep 2020 at 18:57, Edwin Torok <edvin.torok@citrix.com>
> > wrote:
> > > On Mon, 2020-09-21 at 17:40 +0100, Julien Grall wrote:
> > > >
> > > > On 21/09/2020 14:05, jerome leseinne wrote:
> > > > > Hello,
> > > >
> > > > Hello,
> > > >
> > > > I am only CCing Edwin who is working on OXenStored. Hopefully, he
> > > > will
> > > > be able to give you some pointers.
> > > >
> > > > > We are developing a solution based on Xen 4.13 who is
> > > > > constantly
> > > > > creating / destroying VMs.
> > > > >
> > > > > To summarize our lifecycle :
> > > > >
> > > > > - xl restore vmX
> > > > > - xl cd-insert ....
> > > > > - We do our stuff for ~ 2 minutes
> > > > > - xl destroy vmX
> > > > >
> > > > > So our VMs have a life of approximately 2 minutes.
> > > > >
> > > > > The number of VMs we ran in parallel depends on the underlying
> > > > > server.
> > > > >
> > > > > We are seeing the issue with our larger server who is running
> > > > > 30
> > > > > VMs
> > > > > (HVM) in parallel.
> > > > >
> > > > > On this server oxenstored is constantly running at 100% cpu
> > > > > usage
> > > > > and
> > > > > some VMs are almost stucked or unresponsive.
> > > > >
> > > > > This is not an hardware issue, 72 xeon cores, 160 GB of memory
> > > > > and
> > > > > very fast I/O subsystem.
> > > > > Everything else is running smoothly on the server.
> > > > >
> > > > > what we witness in the xenstore-access.log is that the number
> > > > > of
> > > > > WATCH
> > > > > event is matching the number of currently running VMs
> > > > >
> > > > > so for example for a single WRITE event is followed by around
> > > > > 30
> > > > > watch events :
> > > > >
> > > > > [20200918T15:15:18.045Z] A41354 write
> > > > > /local/domain/0/backend/qdisk/1311/5632
> > > > > [20200918T15:15:18.046Z] A41248 w event
> > > > > backend/qdisk/1311/5632 38ed11d9-9a38-4022-ad75-7c571d4886ed
> > > > > [20200918T15:15:18.046Z] A41257 w event
> > > > > backend/qdisk/1311/5632 98fa91b8-e88b-4667-9813-d95196257288
> > > > > [20200918T15:15:18.046Z] A40648 w event
> > > > > backend/qdisk/1311/5632 e6fd9a35-61ec-4750-93eb-999fb7f662fc
> > > > > [20200918T15:15:18.046Z] A40542 w event
> > >
> > > Hi,
> > >
> > > All the A<id> here point to another unix domain socket. They have
> > > all
> > > registered a watch on this key (or a parent).
> > >
> > > To find out what key they are watching you can look for the watch
> > > registration:
> > > grep 'A41248 .*watch' /var/log/xenstored-access.log
> > >
> > > You can find out all xenstore clients connected to oxenstored, find
> > > all
> > > the numbers in the port column here:
> > > ss -a --unix -p|grep xenstored
> > > Then look up those numbers, e.g. assuming you see 22269:
> > > ss -a --unix -p|grep 22269
> > >
> > > This should tell you what the processes are.
> > >
> > > I have a suspicion that it is related to this code in qemu:
> > >
> > > xenbus->backend_watch =
> > > xen_bus_add_watch(xenbus, "", /* domain root node */
> > > "backend", xen_bus_backend_changed,
> > > &local_err);
> > >
> > > If I read the code right this seems to set up a watch on
> > > /local/domain/0/backend, so if you have N qemus running (one for
> > > each
> > > disk) then you get N^2 watch events fired in total.
> > >
> > > Best regards,
> > > --Edwin
> > >
> > > > > [... remainder of the quoted log trimmed; see the full log in the original message above ...]
> > > > >
> > > > > We are not sure this is the root cause of the issue but this is
> > > > > the
> > > > > only real difference we can see in the log.
> > > > >
> > > > > We don't understand why the number of WATCH events is related
> > > > > to
> > > > > the
> > > > > number of concurrent running VM.
> > > > > A watch event should be registered and only fired for the
> > > > > current
> > > > > domain ID, so a write for a specific node path should only
> > > > > trigger
> > > > > one
> > > > > watch event and not 30 in our case.
> > > > >
> > > > > Any ideas / comments ?
> > > >
> > > > Cheers,
> > > >
Re: oxenstored performance issue when starting VMs in parallel
On 22.09.20 15:42, Paul Durrant wrote:
>> -----Original Message-----
>> From: Edwin Torok <edvin.torok@citrix.com>
>> Sent: 22 September 2020 14:29
>> To: sstabellini@kernel.org; Anthony Perard <anthony.perard@citrix.com>; xen-
>> devel@lists.xenproject.org; paul@xen.org
>> Cc: xen-users@lists.xenproject.org; jerome.leseinne@gmail.com; julien@xen.org
>> Subject: Re: oxenstored performance issue when starting VMs in parallel
>>
>> On Tue, 2020-09-22 at 15:17 +0200, jerome leseinne wrote:
>>> Hi,
>>>
>>> Edwin you rock ! This call in qemu is effectively the culprit !
>>> I have disabled this xen_bus_add_watch call and re-run test on our
>>> big server:
>>>
>>> - oxenstored is now between 10% to 20% CPU usage (previously was
>>> 100% all the time)
>>> - All our VMs are responsive
>>> - All our VM start in less than 10 seconds (before the fix some VMs
>>> could take more than one minute to be fully up
>>> - Dom0 is more responsive
>>>
>>> Disabling the watch may not be the ideal solution ( I let the qemu
>>> experts answer this and the possible side effects),
>>
>> Hi,
>>
>> CC-ed Qemu maintainer of Xen code, please see this discussion about
>> scalability issues with the backend watching code in qemu 4.1+.
>>
>> I think the scalability issue is due to this code in qemu, which causes
>> an instance of qemu to see watches from all devices (even those
>> belonging to other qemu instances), such that adding a single device
>> causes N watches to be fired on each N instances of qemu:
>> xenbus->backend_watch =
>> xen_bus_add_watch(xenbus, "", /* domain root node */
>> "backend", xen_bus_backend_changed,
>> &local_err);
>>
>> I can understand that for backwards compatibility you might need this
>> code, but is there a way that an up-to-date (xl) toolstack could tell
>> qemu what it needs to look at (e.g. via QMP, or other keys in xenstore)
>> instead of relying on an overly broad watch?
>
> I think this could be made more efficient. The call to "module_call_init(MODULE_INIT_XEN_BACKEND)" just prior to this watch will register backends that do auto-creation so we could register individual watches for the various backend types instead of this single one.

The watch should be at guest-domain level, e.g. on:

/local/domain/0/backend/vbd/5

We have one qemu process per guest, after all.
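
As a sketch, something like the following at the registration site
(how the guest domid reaches this code and the node/key split are
assumptions on my part, mirroring the call quoted above):

    /* Illustrative sketch only: scope the watch to this qemu's guest
     * domain, so a backend write for one guest no longer wakes every
     * other qemu instance.  "frontend_id" stands in for however the
     * guest domid is obtained in the real code. */
    char domid[16];

    snprintf(domid, sizeof(domid), "%u", frontend_id);
    xenbus->backend_watch =
        xen_bus_add_watch(xenbus, "backend/vbd", domid,
                          xen_bus_backend_changed, &local_err);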


Juergen
RE: oxenstored performance issue when starting VMs in parallel
> -----Original Message-----
> From: Jürgen Groß <jgross@suse.com>
> Sent: 22 September 2020 15:18
> To: paul@xen.org; 'Edwin Torok' <edvin.torok@citrix.com>; sstabellini@kernel.org; 'Anthony Perard'
> <anthony.perard@citrix.com>; xen-devel@lists.xenproject.org
> Cc: xen-users@lists.xenproject.org; jerome.leseinne@gmail.com; julien@xen.org
> Subject: Re: oxenstored performance issue when starting VMs in parallel
>
> On 22.09.20 15:42, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Edwin Torok <edvin.torok@citrix.com>
> >> Sent: 22 September 2020 14:29
> >> To: sstabellini@kernel.org; Anthony Perard <anthony.perard@citrix.com>; xen-
> >> devel@lists.xenproject.org; paul@xen.org
> >> Cc: xen-users@lists.xenproject.org; jerome.leseinne@gmail.com; julien@xen.org
> >> Subject: Re: oxenstored performance issue when starting VMs in parallel
> >>
> >> On Tue, 2020-09-22 at 15:17 +0200, jerome leseinne wrote:
> >>> Hi,
> >>>
> >>> Edwin you rock ! This call in qemu is effectively the culprit !
> >>> I have disabled this xen_bus_add_watch call and re-run test on our
> >>> big server:
> >>>
> >>> - oxenstored is now between 10% to 20% CPU usage (previously was
> >>> 100% all the time)
> >>> - All our VMs are responsive
> >>> - All our VM start in less than 10 seconds (before the fix some VMs
> >>> could take more than one minute to be fully up
> >>> - Dom0 is more responsive
> >>>
> >>> Disabling the watch may not be the ideal solution ( I let the qemu
> >>> experts answer this and the possible side effects),
> >>
> >> Hi,
> >>
> >> CC-ed Qemu maintainer of Xen code, please see this discussion about
> >> scalability issues with the backend watching code in qemu 4.1+.
> >>
> >> I think the scalability issue is due to this code in qemu, which causes
> >> an instance of qemu to see watches from all devices (even those
> >> belonging to other qemu instances), such that adding a single device
> >> causes N watches to be fired on each N instances of qemu:
> >> xenbus->backend_watch =
> >> xen_bus_add_watch(xenbus, "", /* domain root node */
> >> "backend", xen_bus_backend_changed,
> >> &local_err);
> >>
> >> I can understand that for backwards compatibility you might need this
> >> code, but is there a way that an up-to-date (xl) toolstack could tell
> >> qemu what it needs to look at (e.g. via QMP, or other keys in xenstore)
> >> instead of relying on an overly broad watch?
> >
> > I think this could be made more efficient. The call to "module_call_init(MODULE_INIT_XEN_BACKEND)"
> just prior to this watch will register backends that do auto-creation so we could register individual
> watches for the various backend types instead of this single one.
>
> The watch should be on guest domain level, e.g. for:
>
> /local/domain/0/backend/vbd/5
>
> We have one qemu process per guest, after all.
>

I'll see if I can spin a patch this afternoon.

Paul

>
> Juergen