
storagedriver domain limited to 61 "shares" - Xen 4.16.6 pre2
Hi all,

I am running into an issue with my storagedomain.
I have the HBAs assigned to the domain,
Added "driver_domain=1" to the config.

And am accessing filesystems on this domain from other domains.

This works as expected.

However, I am only able to assign 61 filesystems to other domains. As soon as I
attempt to assign a 62nd filesystem, it fails.

I am completely at a loss where this limit of 61 is coming from and am unable
to find anything with over 2 days of google searches.

I also noticed that "xl devd" isn't actually running on my storagedomain
anymore. It used to run in a previous version, but as the whole environment
was booting successfully I never bothered to keep checking.

I do remember that it was running in the past. Currently, I think the init-
script is stopping because "/proc/xen/capabilities" is empty.
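
(If I understand correctly, that file only contains "control_d" in dom0 and is
empty in any domU, so an init-script gating on it will always bail out in a
driver domain. A quick check from inside the storagedomain:

cat /proc/xen/capabilities

If that is indeed the cause, starting the backend daemon manually with
"xl devd" inside the storagedomain should work as a stopgap.)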

Please let me know which config-files are needed to troubleshoot this.

Many thanks,

Joost Roeleveld
Re: storagedriver domain limited to 61 "shares" - Xen 4.16.6 pre2
On 11.02.24 17:21, J. Roeleveld wrote:
> Hi all,
>
> I am running into an issue with my storagedomain.
> I have the HBAs assigned to the domain,
> Added "driver_domain=1" to the config.
>
> And am accessing filesystems on this domain from other domains.
>
> This works as expected.
>
> However, I am only able to assign 61 filesystems to other domains. As soon as I
> attempt to assign a 62nd filesystem, it fails.
>
> I am completely at a loss where this limit of 61 is coming from and am unable
> to find anything with over 2 days of google searches.
>
> I also noticed that "xl devd" isn't actually running on my storagedomain
> anymore. It used to run in a previous version, but as the whole environment
> was booting successfully I never bothered to keep checking.
>
> I do remember that it was running in the past. Currently, I think the init-
> script is stopping because "/proc/xen/capabilities" is empty.
>
> Please let me know which config-files are needed to troubleshoot this.

My first guess would be that the driver domain is limited by the max number
of Xenstore nodes it is allowed to own.

You can raise the default of 1000 nodes per domain (how to do that depends
on the Xenstore type you are running, xenstored or oxenstored).
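
For example (a sketch; file locations differ between distros): with the C
xenstored the quota is a command line option, typically passed via
XENSTORED_ARGS in /etc/default/xencommons or /etc/sysconfig/xencommons:

XENSTORED_ARGS="-E 10000"

With oxenstored it is a setting in /etc/xen/oxenstored.conf (default 1000):

quota-maxentity = 10000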


Juergen
Re: storagedriver domain limited to 61 "shares" - Xen 4.16.6 pre2
On Monday, February 12, 2024 7:40:39 AM CET Juergen Gross wrote:
> On 11.02.24 17:21, J. Roeleveld wrote:
>
> > Hi all,
> >
> > I am running into an issue with my storagedomain.
> > I have the HBAs assigned to the domain,
> > Added "driver_domain=1" to the config.
> >
> > And am accessing filesystems on this domain from other domains.
> >
> > This works as expected.
> >
> > However, I am only able to assign 61 filesystems to other domains. As soon
> > as I attempt to assign a 62nd filesystem, it fails.
> >
> > I am completely at a loss where this limit of 61 is coming from and am
> > unable to find anything with over 2 days of google searches.
> >
> > I also noticed that "xl devd" isn't actually running on my storagedomain
> > anymore. It used to run in a previous version, but as the whole
> > environment was booting successfully I never bothered to keep checking.
> >
> > I do remember that it was running in the past. Currently, I think the
> > init-script is stopping because "/proc/xen/capabilities" is empty.
> >
> > Please let me know which config-files are needed to troubleshoot this.
>
>
> My first guess would be that the driver domain is limited by the max number
> of Xenstore nodes it is allowed to own.
>
> You can raise the default of 1000 nodes per domain (how to do that depends
> on the Xenstore type you are running, xenstored or oxenstored).

Hi Juergen,

Many thanks for this. I forgot I already raised this in the past to 10000 on
the host, and the current number of entries is quite close to this limit:
# xenstore-ls | wc -l
9590

I will increase this and try again when I have a chance to reboot the host.

Is there any benefit of using "oxenstored" over "xenstored"? I currently use
"xenstored" and, apart from this, don't seem to have any issues.

--
Joost
Re: storagedriver domain limited to 61 "shares" - Xen 4.16.6 pre2
On 12.02.24 07:57, J. Roeleveld wrote:
> On Monday, February 12, 2024 7:40:39 AM CET Juergen Gross wrote:
>> On 11.02.24 17:21, J. Roeleveld wrote:
>>
>>> Hi all,
>>>
>>> I am running into an issue with my storagedomain.
>>> I have the HBAs assigned to the domain,
>>> Added "driver_domain=1" to the config.
>>>
>>> And am accessing filesystems on this domain from other domains.
>>>
>>> This works as expected.
>>>
>>> However, I am only able to assign 61 filesystems to other domains. As soon
>>> as I attempt to assign a 62nd filesystem, it fails.
>>>
>>> I am completely at a loss where this limit of 61 is coming from and am
>>> unable to find anything with over 2 days of google searches.
>>>
>>> I also noticed that "xl devd" isn't actually running on my storagedomain
>>> anymore. It used to run in a previous version, but as the whole
>>> environment was booting successfully I never bothered to keep checking.
>>>
>>> I do remember that it was running in the past. Currently, I think the
>>> init-script is stopping because "/proc/xen/capabilities" is empty.
>>>
>>> Please let me know which config-files are needed to troubleshoot this.
>>
>>
>> My first guess would be that the driver domain is limited by the max number
>> of Xenstore nodes it is allowed to own.
>>
>> You can raise the default of 1000 nodes per domain (how to do that depends
>> on the Xenstore type you are running, xenstored or oxenstored).
>
> Hi Juergen,
>
> Many thanks for this. I forgot I already raised this in the past to 10000 on
> the host, and the current number of entries is quite close to this limit:
> # xenstore-ls | wc -l
> 9590
>
> I will increase this and try again when I have a chance to reboot the host.

Which version of Xen are you running?

In case it is not too old (>4.15), you could try to restart xenstored via
live update giving it the new parameters. Note that this feature is still
"Tech Preview", but there are no issues I'm aware of. In dom0 you'd do:

xenstore-control live-update -c '<new parameters>' /usr/sbin/xenstored
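
For example, to raise the entry quota to 10000 on the fly (an illustrative
value; the parameters are the same ones xenstored accepts at startup):

xenstore-control live-update -c '-E 10000' /usr/sbin/xenstored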

> Is there any benefit of using "oxenstored" over "xenstored"? I currently use
> "xenstored" and, apart from this, don't seem to have any issues.

I'm biased, as I'm the maintainer of "xenstored". :-)

BTW, oxenstored has the same default of 1000 nodes per domain.


Juergen
Re: storagedriver domain limited to 61 "shares" - Xen 4.16.6 pre2
On Monday, February 12, 2024 8:09:35 AM CET Juergen Gross wrote:
> On 12.02.24 07:57, J. Roeleveld wrote:
>
> > On Monday, February 12, 2024 7:40:39 AM CET Juergen Gross wrote:
> >
> >> On 11.02.24 17:21, J. Roeleveld wrote:
> >>
> >>
> >> You can raise the default of 1000 nodes per domain (how to do that
> >> depends
> >> on the Xenstore type you are running, xenstored or oxenstored).
> >
> >
> > Hi Juergen,
> >
> > Many thanks for this. I forgot I already raised this in the past to 10000
> > on the host, and the current number of entries is quite close to this limit:
> > # xenstore-ls | wc -l
> > 9590
> >
> > I will increase this and try again when I have a chance to reboot the
> > host.
>
> Which version of Xen are you running?

I am currently using 4.16.6_pre2

> In case it is not too old (>4.15), you could try to restart xenstored via
> live update giving it the new parameters. Note that this feature is still
> "Tech Preview", but there are no issues I'm aware of. In dom0 you'd do:
>
> xenstore-control live-update -c '<new parameters>' /usr/sbin/xenstored

Considering that restarting VMs fails once I hit this issue, I prefer to
test when I am able to reboot the host anyway. :)

> > Is there any benefit of using "oxenstored" over "xenstored"? I currently
> > use "xenstored" and, apart from this, don't seem to have any issues.
>
> I'm biased, as I'm the maintainer of "xenstored". :-)

Understandable. The reason I am asking is that "oxenstored" is new to me; I
hadn't heard about it until now. I was wondering whether it is going to
replace "xenstored" eventually.

> BTW, oxenstored has the same default of 1000 nodes per domain.

There is one thing that confuses me about this.

I notice the limit is applied to the entire host, not per domain.
The entire xenstore-ls output gives me 9k+ entries. This is with 15 domUs.
On average, I should stay below 1k nodes per domain.

Unless "domain" in this context is not a Virtual Machine ?

--
Joost
Re: storagedriver domain limited to 61 "shares" - Xen 4.16.6 pre2
On 12.02.24 08:40, J. Roeleveld wrote:
> On Monday, February 12, 2024 8:09:35 AM CET Juergen Gross wrote:
>> On 12.02.24 07:57, J. Roeleveld wrote:
>>
>>> On Monday, February 12, 2024 7:40:39 AM CET Juergen Gross wrote:
>>>
>>>> On 11.02.24 17:21, J. Roeleveld wrote:
>>>>
>>>>
>>>> You can raise the default of 1000 nodes per domain (how to do that
>>>> depends
>>>> on the Xenstore type you are running, xenstored or oxenstored).
>>>
>>>
>>> Hi Juergen,
>>>
>>> Many thanks for this. I forgot I already raised this in the past to 10000
>>> on the host, and the current number of entries is quite close to this limit:
>>> # xenstore-ls | wc -l
>>> 9590
>>>
>>> I will increase this and try again when I have a chance to reboot the
>>> host.
>>
>> Which version of Xen are you running?
>
> I am currently using 4.16.6_pre2
>
>> In case it is not too old (>4.15), you could try to restart xenstored via
>> live update giving it the new parameters. Note that this feature is still
>> "Tech Preview", but there are no issues I'm aware of. In dom0 you'd do:
>>
>> xenstore-control live-update -c '<new parameters>' /usr/sbin/xenstored
>
> Considering that restarting VMs fails once I hit this issue, I prefer to
> test when I am able to reboot the host anyway. :)
>
>>> Is there any benefit of using "oxenstored" over "xenstored"? I currently
>>> use "xenstored" and, apart from this, don't seem to have any issues.
>>
>> I'm biased, as I'm the maintainer of "xenstored". :-)
>
> Understandable. The reason I am asking is that "oxenstored" is new to me; I
> hadn't heard about it until now. I was wondering whether it is going to
> replace "xenstored" eventually.
>
>> BTW, oxenstored has the same default of 1000 nodes per domain.
>
> There is one thing that confuses me about this.
>
> I notice the limit is applied to the entire host, not per domain.

The limit is a global one that is applied to each domain (except dom0), but
there are plans to make all Xenstore quotas configurable per domain.

> The entire xenstore-ls output gives me 9k+ entries. This is with 15 domUs.
> On average, I should stay below 1k nodes per domain.

"On average" doesn't really help here. :-)

> Unless "domain" in this context is not a virtual machine?

It is.


Juergen
Re: storagedriver domain limited to 61 "shares" - Xen 4.16.6 pre2
On Monday, February 12, 2024 9:02:39 AM CET Juergen Gross wrote:
> On 12.02.24 08:40, J. Roeleveld wrote:
>
> > On Monday, February 12, 2024 8:09:35 AM CET Juergen Gross wrote:

> > There is one thing that confuses me about this.
> >
> > I notice the limit is applied to the entire host, not per domain.
>
> The limit is a global one that is applied to each domain (except dom0), but
> there are plans to make all Xenstore quotas configurable per domain.

IOW, all domains added together?

> > The entire xenstore-ls output gives me 9k+ entries. This is with 15
> > domUs. On average, I should stay below 1k nodes per domain.
>
> "On average" doesn't really help here. :-)

I did a quick check:
# for i in {0..15} ; do xenstore-ls /local/domain/${i} | wc -l ; done
2131 <--- dom0
190
1999 <--- storagedomain
668
258
239
158
158
315
315
142
158
158
201
201
130

I currently use "-E 10000" (i.e. 10k).

If the limit is per domain, I would assume 10k should be more than enough.
However, it looks like the limit is global.

To me, the wording is confusing:
-E, --entry-nb <nb> limit the number of entries per domain

With the largest domain being the storagedomain (as expected) at around 2k,
I would assume (based on the wording) that I could add about 5x as many
entries. Going from 61 to 68 devices should not cause a 5x increase for
the storagedomain.

--
Joost
Re: storagedriver domain limited to 61 "shares" - Xen 4.16.6 pre2
On Monday, February 12, 2024 7:40:39 AM CET Juergen Gross wrote:
> On 11.02.24 17:21, J. Roeleveld wrote:
>
> > Hi all,
> >
> > I am running into an issue with my storagedomain.
> > I have the HBAs assigned to the domain,
> > Added "driver_domain=1" to the config.
> >
> > And am accessing filesystems on this domain from other domains.
> >
> > This works as expected.
> >
> > However, I am only able to assign 61 filesystems to other domains. As soon
> > as I attempt to assign a 62nd filesystem, it fails.
> >
> > I am completely at a loss where this limit of 61 is coming from and am
> > unable to find anything with over 2 days of google searches.
> >
> > I also noticed that "xl devd" isn't actually running on my storagedomain
> > anymore. It used to run in a previous version, but as the whole
> > environment was booting successfully I never bothered to keep checking.
> >
> > I do remember that it was running in the past. Currently, I think the
> > init-script is stopping because "/proc/xen/capabilities" is empty.
> >
> > Please let me know which config-files are needed to troubleshoot this.
>
>
> My first guess would be that the driver domain is limited by the max number
> of Xenstore nodes it is allowed to own.
>
> You can raise the default of 1000 nodes per domain (how to do that depends
> on the Xenstore type you are running, xenstored or oxenstored).

Hi Juergen,

I just tested with -E set to 15000 (15k) but am still seeing the same issue.

On the storagedomain, I see the following in dmesg:
===
[58855.383841] vbd vbd-16-51729: 7 adding watch on /local/domain/16/device/vbd/51729/state
[58855.384545] vbd vbd-16-51729: xenbus: watch_otherend on backend/vbd/16/51729 failed.
[58855.384548] vbd: probe of vbd-16-51729 failed with error -7
===

followed by several like:
===
[58855.407134] vbd vbd-16-51730: 7 adding watch on backend/vbd/16/51730/physical-device
[58855.408205] xen-blkback: xen_blkbk_probe failed
[58855.408242] vbd vbd-16-51730: 7 xenbus_dev_probe on backend/vbd/16/51730
[58855.408405] vbd: probe of vbd-16-51730 failed with error -7
===

The part "adding watch" makes me wonder about the option:
-W, --watch-nb <nb> limit the number of watches per domain,

From the code, I find the default is 128. Is there any way to find out how many
are currently set on my system and what a good amount would be?
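
(A back-of-the-envelope guess: the dmesg output above suggests roughly two
watches per vbd, one on the frontend state node and one on physical-device.
61 devices would then use about 122 watches, and with the handful of watches
the toolstack itself holds, that lands almost exactly on the 128 default,
which would explain the magic number 61.)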

I am considering trying 256 for this.
Is there anything I need to be aware of before making this change?

--
Joost
Re: storagedriver domain limited to 61 "shares" - Xen 4.16.6 pre2
On 12.02.24 10:22, J. Roeleveld wrote:
> On Monday, February 12, 2024 7:40:39 AM CET Juergen Gross wrote:
>> On 11.02.24 17:21, J. Roeleveld wrote:
>>
>>> Hi all,
>>>
>>> I am running into an issue with my storagedomain.
>>> I have the HBAs assigned to the domain,
>>> Added "driver_domain=1" to the config.
>>>
>>> And am accessing filesystems on this domain from other domains.
>>>
>>> This works as expected.
>>>
>>> However, I am only able to assign 61 filesystems to other domains. As soon
>>> as I attempt to assign a 62nd filesystem, it fails.
>>>
>>> I am completely at a loss where this limit of 61 is coming from and am
>>> unable to find anything with over 2 days of google searches.
>>>
>>> I also noticed that "xl devd" isn't actually running on my storagedomain
>>> anymore. It used to run in a previous version, but as the whole
>>> environment was booting successfully I never bothered to keep checking.
>>>
>>> I do remember that it was running in the past. Currently, I think the
>>> init-script is stopping because "/proc/xen/capabilities" is empty.
>>>
>>> Please let me know which config-files are needed to troubleshoot this.
>>
>>
>> My first guess would be that the driver domain is limited by the max number
>> of Xenstore nodes it is allowed to own.
>>
>> You can raise the default of 1000 nodes per domain (how to do that depends
>> on the Xenstore type you are running, xenstored or oxenstored).
>
> Hi Juergen,
>
> I just tested with -E set to 15000 (15k) but am still seeing the same issue.
>
> On the storagedomain, I see the following in dmesg:
> ===
> [58855.383841] vbd vbd-16-51729: 7 adding watch on /local/domain/16/device/vbd/51729/state
> [58855.384545] vbd vbd-16-51729: xenbus: watch_otherend on backend/vbd/16/51729 failed.
> [58855.384548] vbd: probe of vbd-16-51729 failed with error -7
> ===
>
> followed by several like:
> ===
> [58855.407134] vbd vbd-16-51730: 7 adding watch on backend/vbd/16/51730/physical-device
> [58855.408205] xen-blkback: xen_blkbk_probe failed
> [58855.408242] vbd vbd-16-51730: 7 xenbus_dev_probe on backend/vbd/16/51730
> [58855.408405] vbd: probe of vbd-16-51730 failed with error -7
> ===
>
> The part "adding watch" makes me wonder about the option:
> -W, --watch-nb <nb> limit the number of watches per domain,
>
> From the code, I find the default is 128. Is there any way to find out how many
> are currently set on my system and what a good amount would be?

Oh, the relevant pieces have been backported due to several XSAs for
Xenstore.

You should be able to issue:

xenstore-control quota

for showing the current settings, and

xenstore-control quota set watches 256

for changing the current quota value for watches. The same should work for
the other quotas.

xenstore-control quota <domid>

shows the resources currently used by a specific domain.
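
So, for example, assuming the storagedomain is domid 2 (as your xenstore-ls
loop earlier suggests), you could compare the watch count reported by

xenstore-control quota 2

against the limit reported by

xenstore-control quota

and then raise it with "xenstore-control quota set watches 256".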

> I am considering trying 256 for this.
> Is there anything I need to be aware of before making this change?

Using the above commands: no.


Juergen