
xc: error: xc_machphys_mfn_list: 83 != 129 when suspending 32GB PV DomU
Hi,

I have a 256GB host and run a 32GB 64-bit PV domain (SLES 11) on it.
When I try to suspend the domain, xc barfs with:

xc: error: xc_machphys_mfn_list: 83 != 129: Internal error
xc: error: xc_get_m2p_mfns (0 = Success): Internal error
xc: error: Failed to map live M2P table (0 = Success): Internal error

At first, since dom0 is 32-bit, I suspected the compat layer. However,
the hypercall in xen/arch/x86/x86_64/compat/mm.c compat_arch_memory_op()
seems to agree with the numbers:

(XEN) compat_arch_memory_op returned 0 (nr_extents = 83, max_extents = 129)

From this I conclude that everything is working OK at the hypercall
layer. However, looking at the code in compat_arch_memory_op(), it
appears to be failing due to some arcane limit of the compat
subsystem. The following code, which establishes the variable 'limit',
is causing the loop to exit early:

limit = (unsigned long)(compat_machine_to_phys_mapping +
                        min_t(unsigned long, max_page,
                              MACH2PHYS_COMPAT_NR_ENTRIES(current->domain)));
if ( limit > RDWR_COMPAT_MPT_VIRT_END )
    limit = RDWR_COMPAT_MPT_VIRT_END;

for ( i = 0, v = RDWR_COMPAT_MPT_VIRT_START, last_mfn = 0;
      (i != xmml.max_extents) && (v < limit);
      i++, v += 1 << L2_PAGETABLE_SHIFT )
{
    /* do stuff */
}
xmml.nr_extents = i;

Further debugging reveals the variables are set as such:
(XEN) compat_machine_to_phys_mapping = 18446606377058041856
(XEN) max_page = 67272704
(XEN) MACH2PHYS_COMPAT_NR_ENTRIES(current->domain) = 43515904
(XEN) RDWR_COMPAT_MPT_VIRT_START = 18446606377058041856
(XEN) RDWR_COMPAT_MPT_VIRT_END = 18446606378131783680
(XEN) limit = 18446606377232105472, (1 << L2_PAGETABLE_SHIFT) = 2097152

Could it be that the compat mach-to-phys conversion table size of 1GB is
too small? Or that there exists some other arbitrary limit on the size
of domains that can be suspended [when using a 32-bit dom0]?

Gianni


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: xc: error: xc_machphys_mfn_list: 83 != 129 when suspending 32GB PV DomU
On 11/03/2011 18:52, "Gianni Tedesco" <gianni.tedesco@citrix.com> wrote:

> Further debugging reveals the variables are set as such:
> (XEN) compat_machine_to_phys_mapping = 18446606377058041856
> (XEN) max_page = 67272704
> (XEN) MACH2PHYS_COMPAT_NR_ENTRIES(current->domain) = 43515904
> (XEN) RDWR_COMPAT_MPT_VIRT_START = 18446606377058041856
> (XEN) RDWR_COMPAT_MPT_VIRT_END = 18446606378131783680
> (XEN) limit = 18446606377232105472, (1 << L2_PAGETABLE_SHIFT) = 2097152
>
> Could it be that the compat mach-to-phys conversion table size of 1GB is
> too small?

It is insufficient to cover all of the system's memory. The reason for the
limit is that a 1GB M2P table is all that is reasonable to map into a 32-bit
domain's address space while still leaving space for the guest's own
mappings.

We could make the compat M2P bigger solely for mapping by dom0 when doing
save/restore? However, you'd likely still soon hit the mmap limit for the
save/restore process in dom0. It might extend the lifetime of your 32-bit
dom0 a bit longer, though; long enough for the lightweight HVM container
for PV guests work to get checked in and improve our 64-bit PV guest
performance.

-- Keir

> Or that there exists some other arbitrary limit on the size
> of domains that can be suspended [when using a 32bit dom0] ?



Re: xc: error: xc_machphys_mfn_list: 83 != 129 when suspending 32GB PV DomU
On Fri, 2011-03-11 at 19:21 +0000, Keir Fraser wrote:
> On 11/03/2011 18:52, "Gianni Tedesco" <gianni.tedesco@citrix.com> wrote:
>
> > Further debugging reveals the variables are set as such:
> > (XEN) compat_machine_to_phys_mapping = 18446606377058041856
> > (XEN) max_page = 67272704
> > (XEN) MACH2PHYS_COMPAT_NR_ENTRIES(current->domain) = 43515904
> > (XEN) RDWR_COMPAT_MPT_VIRT_START = 18446606377058041856
> > (XEN) RDWR_COMPAT_MPT_VIRT_END = 18446606378131783680
> > (XEN) limit = 18446606377232105472, (1 << L2_PAGETABLE_SHIFT) = 2097152
> >
> > Could it be that the compat mach-to-phys conversion table size of 1GB is
> > too small?
>
> It is insufficient to cover all of the system's memory. The reason for the
> limit is that a 1GB M2P table is all that is reasonable to map into a 32-bit
> domain's address space while still leaving space for the guest's own
> mappings.

The compat M2P actually mapped into the guest isn't 1GB; 1GB would be
the entire kernel mapping, with no room for anything else. Also, 1GB of
M2P is enough to cover 1TB of host memory, so I don't think it's too
small at the moment. Is the limit here not MACH2PHYS_COMPAT_NR_ENTRIES?
(In the above, limit == compat_machine_to_phys_mapping + ~160M.)

IIRC the size of the M2P which is mapped into a PAE guest is normally
capped at ~160M (the total size of the hypervisor hole for a PAE guest
running on a PAE hypervisor). 160M is enough M2P for 160G of host
address space which would explain why this is seen on a 256GB host but
not a 128GB one.

The limit on the size of the M2P is adjustable; in particular, for dom0 I
think it would be reasonable to allow it to expand to, e.g., 256M
without too much cause for concern.

Obviously this hole eats into the 1GB kernel mapping, so you don't want
it to grow too much bigger, and in the long run something better would
be needed, but this would probably allow you to support 256GB without
too much trouble in the short term, other than slightly reducing the
amount of lowmem the system sees (which might be an issue if you've
chosen dom0_mem on that basis...)

The lower limit is set by the kernel in its XEN_ELFNOTE_HV_START_LOW ELF
note (set in arch/x86/kernel/head_32-xen.S), which is picked up in
xen/arch/x86/domain_build.c:construct_dom0(). NB: this might be the
first time this functionality has been used in anger to increase the M2P
space (I think it is actively used to shrink it on hosts with <160G).

Another alternative, which would allow large hosts without needing to
expand the dom0 M2P, would be to provide interfaces that allow the tools
to map specific portions of the host M2P so the tools can build
themselves a mapcache-style thing. The M2P which needs to be accessed to
perform a migration of an individual guest is likely to be much smaller
than the M2P for the whole of host RAM, so even using 256M-512M of guest
user-mode address space (allowing for 256GB-512GB of host address space)
would likely let you map the bits you need without excessive churn (aka
performance hit) in the mapping. A given userspace process has 3G
of address space to play with so it can take the hit of increasing the
M2P mapcache size far easier than the kernel can. Hrm, maybe you don't
even need a map cache thing -- just a way to allow a userspace process
to map more M2P than the kernel can... (which might be as simple as
removing the limit clamp based on MACH2PHYS_COMPAT_NR_ENTRIES in the
compat layer?)

Ian.


