Mailing List Archive

User domain starts with a crash loop when memory configured is above 500GB
Hi everybody,

I have a server with 760GB of RAM. Only domain 0 is running on it, with
16GB of RAM assigned to it.

Here is the configuration for my user domain:

name = "node01"
kernel = "/boot/vmlinuz-5.15.0-82-generic"
root = "/dev/xvda"
memory = 614400
maxmem = 614400
vcpus = 32
maxvcpus = 32
disk = ['file:/vserver/images/node01.img,xvda,w']
vif = ['bridge=virbr0,mac=00:16:3e:01:01:02']
iommu = "soft"
swiotlb = "force"
pci_permissive = 1
pci = ['0000:3e:00.0','0000:3f:00.0','0000:40:00.0','0000:41:00.0','0000:b1:00.0','0000:b2:00.0']

nics = 1
dhcp = "off"
ip = "192.168.122.15"
netmask = "255.255.255.0"
gateway = "192.168.122.1"
hostname = "node01"

extra="3"

When I try to start the domain, it spins in a crash loop with the following
error messages:

[ 6864.140170] WARNING: CPU: 2 PID: 266 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x197/0x200
[ 6864.140183] Modules linked in:
[ 6864.140190] CPU: 2 PID: 266 Comm: xen-balloon Tainted: G D W 5.15.0-82-generic #91-Ubuntu
[ 6864.140203] RIP: e030:xen_mc_flush+0x197/0x200
[ 6864.140212] Code: 77 65 89 c0 48 c1 e0 05 48 05 00 20 00 81 ff d0 0f 1f 00 49 89 45 18 48 85 c0 0f 89 17 ff ff ff 45 8b 4d 00 41 bf 01 00 00 00 <0f> 0b 48 c7 c7 f0 8e 5b 82 44 89 ca 44 89 fe 45 31 f6 65 8b 0d e8
[ 6864.140234] RSP: e02b:ffffc90041027b88 EFLAGS: 00010002
[ 6864.140243] RAX: 0000000000000001 RBX: 0000000000000040 RCX: 0000000000000000
[ 6864.140253] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff89009809e310
[ 6864.140264] RBP: ffffc90041027bb8 R08: ffff888168dc0000 R09: 0000000000000002
[ 6864.140275] R10: 0000000000000200 R11: ffff8900980b7690 R12: 0000000000000000
[ 6864.140286] R13: ffff89009809e300 R14: 0000000000000002 R15: 0000000000000001
[ 6864.140303] FS: 0000000000000000(0000) GS:ffff890098080000(0000) knlGS:0000000000000000
[ 6864.140315] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6864.140324] CR2: 0000000000000000 CR3: 0000000002e10000 CR4: 0000000000050660
[ 6864.140339] Call Trace:
[ 6864.140344] <TASK>
[ 6864.140349] ? __raw_callee_save_xen_make_pte+0x15/0x27
[ 6864.140359] xen_mc_issue+0x61/0x80
[ 6864.140367] xen_alloc_pte+0xd8/0x290
[ 6864.140376] pmd_populate_kernel.constprop.0+0x4b/0xa0
[ 6864.140387] vmemmap_pmd_populate+0x69/0x79
[ 6864.140395] vmemmap_populate_basepages+0x68/0xb3
[ 6864.140405] vmemmap_populate+0x2a/0xa9
[ 6864.140412] __populate_section_memmap+0x3c/0x57
[ 6864.140422] sparse_add_section+0x12b/0x1dc
[ 6864.140431] __add_pages+0xac/0x150
[ 6864.140440] add_pages+0x17/0x70
[ 6864.140447] arch_add_memory+0x45/0x60
[ 6864.140455] add_memory_resource+0x12c/0x320
[ 6864.140467] reserve_additional_memory+0x10f/0x160
[ 6864.140476] balloon_thread+0x337/0x500
[ 6864.140483] ? wait_woken+0x70/0x70
[ 6864.140492] ? reserve_additional_memory+0x160/0x160
[ 6864.140501] kthread+0x127/0x150
[ 6864.140509] ? set_kthread_struct+0x50/0x50
[ 6864.140518] ret_from_fork+0x1f/0x30
[ 6864.140528] </TASK>
[ 6864.140533] ---[ end trace 3bca9737718a46b2 ]---
[ 6864.140541] 1 of 2 multicall(s) failed: cpu 2
[ 6864.140549] call 2: op=26 arg=[ffff89009809eb10] result=-22
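
The last two log lines report that one hypercall in a batch of two failed. As a sketch, a hypothetical decoder for such lines: the op number follows Xen's public hypercall numbering (26 is `__HYPERVISOR_mmuext_op`, the MMU extended-operation call) and the result is a negative errno value (-22 is EINVAL). Only a handful of hypercall numbers are included, and the errno name is looked up with Python's `errno` module, which matches the Linux numbering when run on Linux.

```python
import errno

# A small subset of x86 hypercall numbers from Xen's public header
# xen/include/public/xen.h (only a few entries, for illustration).
HYPERCALL_NAMES = {
    1: "mmu_update",
    12: "memory_op",
    13: "multicall",
    14: "update_va_mapping",
    26: "mmuext_op",
}


def decode_multicall_failure(op: int, result: int) -> str:
    """Translate an 'op=N ... result=R' pair printed by xen_mc_flush."""
    name = HYPERCALL_NAMES.get(op, f"hypercall {op}")
    if result < 0:
        # Negative results are errno values; map the number to its name.
        err = errno.errorcode.get(-result, str(result))
    else:
        err = str(result)
    return f"{name} failed with {err}"
```

For the line above, `decode_multicall_failure(26, -22)` gives `"mmuext_op failed with EINVAL"`. This only names the failing operation; it does not by itself explain why Xen rejected it.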

Any suggestions as to what I am doing wrong? There should be plenty of RAM
to start a 600GB domain, and I can start a 500GB user domain with no problem.
Thank you in advance for your help and suggestions.

# xl info | grep memory
total_memory : 785055
free_memory : 143839
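
Both `memory`/`maxmem` in the config and the `xl info` fields are in MiB, so the sizes in the subject line can be checked with a little arithmetic. A minimal Python sketch that converts the figures to GiB and compares each domain size with 512 GiB. The 512 GiB threshold and the assumption that the working 500GB domain was configured as `memory = 512000` are illustrative only; neither is stated in the post.

```python
# Figures copied from the post; the 512 GiB threshold is an assumption
# used only for comparison.
MIB_PER_GIB = 1024
LIMIT_MIB = 512 * MIB_PER_GIB  # 512 GiB expressed in MiB (524288)


def to_gib(mib: int) -> float:
    """Convert a MiB figure (as used by xl.cfg and `xl info`) to GiB."""
    return mib / MIB_PER_GIB


def exceeds_limit(memory_mib: int, limit_mib: int = LIMIT_MIB) -> bool:
    """True if a domain's memory setting is above the given limit."""
    return memory_mib > limit_mib


if __name__ == "__main__":
    # 614400 is from the config; 512000 is the assumed setting of the
    # 500GB domain that boots fine.
    for memory_mib in (614400, 512000):
        print(f"memory = {memory_mib} MiB -> {to_gib(memory_mib):.1f} GiB, "
              f"above 512 GiB: {exceeds_limit(memory_mib)}")
    # Host figures from `xl info`.
    print(f"total_memory: {to_gib(785055):.1f} GiB")
    print(f"free_memory:  {to_gib(143839):.1f} GiB")
```

With these numbers, 614400 MiB is exactly 600 GiB, above 512 GiB (524288 MiB), while 512000 MiB is 500 GiB, below it.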

Re: User domain starts with a crash loop when memory configured is above 500GB
On 18.09.23 17:34, Robert Polasek wrote:
> Any suggestion what I am doing wrong? There should be plenty of RAM to start
> 600GB domain. I can start  user domain with 500GB no problem. Thank you in
> advance for your help and suggestions.

I think your kernel has been configured with CONFIG_XEN_512GB.
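
One way to confirm this guess is to inspect the build configuration of the guest kernel. Because the domain boots `/boot/vmlinuz-5.15.0-82-generic` directly, the matching config is assumed here to be Ubuntu's usual `/boot/config-5.15.0-82-generic`; the helper names are hypothetical. A disabled option appears as `# CONFIG_XEN_512GB is not set`, which the pattern deliberately does not match.

```python
import re
from pathlib import Path

# Matches an enabled option line, i.e. exactly "CONFIG_XEN_512GB=y".
_OPTION_RE = re.compile(r"^CONFIG_XEN_512GB=y[ \t]*$", re.MULTILINE)


def has_xen_512gb(config_text: str) -> bool:
    """Return True if the kernel build config enables CONFIG_XEN_512GB."""
    return _OPTION_RE.search(config_text) is not None


def check_installed_kernel(release: str) -> bool:
    """Check /boot/config-<release> (assumed Ubuntu location) for the option."""
    path = Path("/boot") / f"config-{release}"
    return has_xen_512gb(path.read_text())
```

Usage on the host would be `check_installed_kernel("5.15.0-82-generic")`; a plain `grep CONFIG_XEN_512GB` on the same file gives the same answer from a shell.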

You should try to add "xen_512gb_limit=0" to your guest's command line.
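
In this config the guest command line is extended through the `extra` option (currently `extra="3"`), so the parameter would go there, giving `extra="3 xen_512gb_limit=0"`. A minimal Python sketch of that edit, with a hypothetical helper name; it only handles the double-quoted form used in this config and returns the text unchanged if the parameter is already present.

```python
import re

# Matches a double-quoted xl.cfg "extra" line, e.g. extra="3" or extra = "3 quiet".
# Group 2 captures the current value between the quotes.
_EXTRA_RE = re.compile(r'^([ \t]*extra[ \t]*=[ \t]*")([^"]*)(")', re.MULTILINE)


def add_guest_param(cfg_text: str, param: str = "xen_512gb_limit=0") -> str:
    """Return cfg_text with `param` appended to the guest kernel command line.

    The text is returned unchanged if the parameter is already present; if
    there is no `extra` line at all, a new one is appended.
    """
    match = _EXTRA_RE.search(cfg_text)
    if match is None:
        # No extra= line: add one, keeping the file newline-terminated.
        sep = "" if not cfg_text or cfg_text.endswith("\n") else "\n"
        return f'{cfg_text}{sep}extra = "{param}"\n'
    current = match.group(2)
    if param in current.split():
        return cfg_text  # already there, nothing to do
    new_value = f"{current} {param}".strip()
    return cfg_text[:match.start(2)] + new_value + cfg_text[match.end(2):]
```

Applied to the config above, the `extra="3"` line becomes `extra="3 xen_512gb_limit=0"` and every other line is left as it was.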

Even if this fixes your boot issue, the guest shouldn't show the error
you are seeing.


Juergen

Re[2]: User domain starts with a crash loop when memory configured is above 500GB
Thank you, Juergen, for the tip. It was dead on. After making that change
I am able to boot user domains larger than 500GB, and the kernel crash
messages went away as well.

Cheers,
Robert


------ Original Message ------
From "Juergen Gross" <jgross@suse.com>
To "Robert Polasek" <polasekr@gmail.com>; xen-users@lists.xenproject.org
Date 2023-09-19 10:35:24
Subject Re: User domain starts with a crash loop when memory configured is above 500GB
