Mailing List Archive

[Paging] 1GB pages in PV DomU
Hi,

I need to map large portions (say 64GB) of mini-os virtual address space of
a PV DomU to 1GB machine pages on top of xen-unstable (processor has all
required capabilities and resources).

When creating a 10GB domain, I can find sets of 4KB machine pages in
start_info.mfn_list that can be grouped to build a 1GB page, but they appear
in "descending" order. So when I build my domain page table
(arch/x86/mm.c:build_pagetable), I make sure I use 512 adjacent pages that
can be 1GB aligned and try to mark the corresponding L3 table.
In short, it looks so ugly that I am sure I am not doing this the right
way.
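
The search described above can be sketched roughly as follows. This is only an illustration under stated assumptions: find_desc_superpage_run is a hypothetical helper, not a mini-os or Xen function, and it assumes mfn_list is a plain array indexed by PFN. It looks for a window of entries whose MFNs descend by one and whose lowest MFN is aligned to the window size (262144 frames of 4KB for a 1GB page, 512 for a 2MB page).

```c
#include <assert.h>
#include <stddef.h>

#define PAGES_PER_2MB (1UL << (21 - 12))   /* 512 4KB frames */
#define PAGES_PER_1GB (1UL << (30 - 12))   /* 262144 4KB frames */

/*
 * Hypothetical helper (not part of mini-os): scan mfn_list[0..nr) for the
 * first window of `count` entries whose MFNs decrease by exactly 1 per
 * entry and whose lowest MFN is a multiple of `count`.
 * `count` must be a power of two.
 * Returns the index of the first entry of the window, or -1 if none exists.
 */
long find_desc_superpage_run(const unsigned long *mfn_list, size_t nr,
                             size_t count)
{
    size_t run = 1;   /* length of the descending run ending at index i */

    assert(count != 0 && (count & (count - 1)) == 0);

    for (size_t i = 0; i < nr; i++) {
        if (i > 0 && mfn_list[i] + 1 == mfn_list[i - 1])
            run++;
        else
            run = 1;

        /* mfn_list[i] is the lowest MFN of the candidate window. */
        if (run >= count && mfn_list[i] % count == 0)
            return (long)(i + 1 - count);
    }
    return -1;
}
```

Finding such a window only shows that the machine frames are contiguous and aligned; whether Xen accepts a superpage mapping over them is a separate question.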

Could you point me to best practice for this?

Cordially,

François-Frédéric

Context/Rationale:
I am porting a special-purpose OS designed to handle traffic from multiple
10Gbps ports from "bare metal" to Xen.
Some applications use data sets of up to 50GB. The most precious
resource is the TLB entries for "huge" pages.
As a matter of fact, Intel processors have plenty of TLB entries for 4KB
pages but only a handful for huge (either 1GB or 2MB) pages.
So the most efficient organization in this case is to use only 4KB and 1GB
pages, no 2MB pages.
And the reason to have it "virtualized" (I would rather say "partitioned")
next to a standard Linux is to be able to communicate with "any"
Dom0-capable Linux.
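
To put the TLB argument in numbers, here is a small sketch that counts how many pages (and therefore how many distinct translations) are needed to cover the 50GB data set mentioned above at each page size. It is plain arithmetic, not a measurement; pages_needed is a name chosen for this example, and actual TLB capacities depend on the processor model.

```c
#include <stdint.h>

/* Number of pages of `page_size` bytes needed to cover `bytes`, rounded up. */
uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```

For a 50GB data set this gives 50 pages of 1GB, 25600 pages of 2MB, or 13107200 pages of 4KB.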



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Paging] 1GB pages in PV DomU
>>> On 09.01.13 at 16:25, François-Frédéric Ozog<ff@ozog.com> wrote:
> I need to map large portions (say 64GB) of mini-os virtual address space of
> a PV DomU to 1GB machine pages on top of xen-unstable (processor has all
> required capabilities and resources).
>
> When creating a 10GB domain, I can find sets of machine 4KB pages in
> start_info.mfn_list that can be grouped to build 1GB page but they appear in
> "descending" order. So wen I build my domain page table
> (arch/x86/mm.c:build_pagetable), I make sure I use 512 adjacent pages that
> can be 1GB aligned and try to mark the corresponding L3 table.

For one, 512 pages sum up to only 2MB, not 1GB.

Next, without enhancing the hypervisor to support this, the
biggest mappings you can create in a PV guest are 2MB (and
all you should need is to enable the support on the Xen command
line and in the guest config file(s), for those guests where you
need it). Then the domain builder should be populating the physical
address space with contiguous 2MB chunks.
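
As an illustration of what a 2MB mapping looks like at the page-table level, here is a minimal sketch that composes an L2 entry with the PSE bit set. The bit positions are the architectural x86-64 ones; the helper name make_l2_superpage_entry is made up for this example, and in a PV guest the entry would still have to be accepted by Xen, which is not shown here.

```c
#include <stdint.h>

#define PTE_PRESENT    (1ULL << 0)
#define PTE_RW         (1ULL << 1)
#define PTE_PSE        (1ULL << 7)   /* entry maps a large page directly */
#define FRAMES_PER_2MB 512ULL        /* 2MB / 4KB */

/*
 * Hypothetical helper: build a present, writable L2 entry that maps the
 * 2MB machine region starting at frame `mfn`.
 * Returns 0 (a non-present entry) if `mfn` is not 2MB aligned.
 */
uint64_t make_l2_superpage_entry(uint64_t mfn)
{
    if (mfn % FRAMES_PER_2MB != 0)
        return 0;
    return (mfn << 12) | PTE_PRESENT | PTE_RW | PTE_PSE;
}
```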

Jan

Re: [Paging] 1GB pages in PV DomU
Well, allowsuperpage=1 is present on the Xen command line. dom0_mem is 4GB and autoballooning is disabled.
I found references online to a superpages=1 setting for the domain config file, but I can't find this parameter in the xl parsing code, so I guess it is not correct.

Here is an output of the console:
Xen Minimal OS!
start_info: 000000000146b000(VA)
nr_pages: 0x280000
shared_inf: 0xa24ee000(MA)
pt_base: 000000000146e000(VA)
nr_pt_frames: 0xf
mfn_list: 000000000006b000(VA)
mod_start: 0x0(VA)
mod_len: 0
flags: 0x0
cmd_line:
stack: 000000000002a560-000000000004a560
MM: Init
_text: 0000000000000000(VA)
_etext: 0000000000017c1e(VA)
_erodata: 000000000001d000(VA)
_edata: 000000000001d480(VA)
stack start: 000000000002a560(VA)
_end: 000000000006aa7c(VA)
start_pfn: 1480
max_pfn: 280000
Mapping memory range 0x1800000 - 0x280000000
setting 0000000000000000-000000000001d000 readonly
skipped 0000000000001000
MM: Initialise page allocator for 287d000(287d000)-280000000(280000000)
MM: done
Demand map pfns at 280001000-2280001000.
Initialising timer interface
Initialising console ... done.
gnttab_table mapped at 0000000280001000.
Initialising scheduler
Thread "Idle": pointer: 0x3cd8070, stack: 0x3ce0000
Thread "xenstore": pointer: 0x3cd80d0, stack: 0x3cf0000
xenbus initialised on irq 1 mfn 0x271fcb
Thread "shutdown": pointer: 0x3cd8130, stack: 0x3d00000
Dummy main: start_info=000000000004a560

The following is the analysis of the virtual address space I get in mini-os, with no attempt to group pages:

Guest virtual address -> Machine address space
0x000000000000 - 0x000000000fff -> not mapped
0x000000001000 - 0x000000001fff -> 0x0000a24ee000 - 0x0000a24eefff (in 1 4KB-pages)
0x000000002000 - 0x000000002fff -> 0x00027dd0c000 - 0x00027dd0cfff (in 1 4KB-pages)
0x000000003000 - 0x000000003fff -> 0x00025fdac000 - 0x00025fdacfff (in 1 4KB-pages)
0x000000004000 - 0x000000005fff -> 0x0002752e2000 - 0x0002752e5fff (in 2 4KB-pages desc)
0x000000006000 - 0x000000007fff -> 0x0002712e4000 - 0x0002712e7fff (in 2 4KB-pages desc)
0x000000008000 - 0x000000009fff -> 0x00027facc000 - 0x00027facffff (in 2 4KB-pages desc)
0x00000000a000 - 0x00000000bfff -> 0x00027cdac000 - 0x00027cdaffff (in 2 4KB-pages desc)
0x00000000c000 - 0x00000000dfff -> 0x00027d9ec000 - 0x00027d9effff (in 2 4KB-pages desc)
0x00000000e000 - 0x00000000ffff -> 0x00027e0ac000 - 0x00027e0affff (in 2 4KB-pages desc)
0x000000010000 - 0x000000011fff -> 0x00025fcac000 - 0x00025fcaffff (in 2 4KB-pages desc)
0x000000012000 - 0x000000013fff -> 0x00027e7ac000 - 0x00027e7affff (in 2 4KB-pages desc)
0x000000014000 - 0x000000015fff -> 0x00027c8ac000 - 0x00027c8affff (in 2 4KB-pages desc)
0x000000016000 - 0x000000017fff -> 0x00027e40c000 - 0x00027e40ffff (in 2 4KB-pages desc)
0x000000018000 - 0x000000019fff -> 0x00027e2ac000 - 0x00027e2affff (in 2 4KB-pages desc)
0x00000001a000 - 0x00000001bfff -> 0x00027f6ac000 - 0x00027f6affff (in 2 4KB-pages desc)
0x00000001c000 - 0x00000001dfff -> 0x00027ecac000 - 0x00027ecaffff (in 2 4KB-pages desc)
0x00000001e000 - 0x00000001ffff -> 0x00027c10c000 - 0x00027c10ffff (in 2 4KB-pages desc)
0x000000020000 - 0x000000021fff -> 0x00027c0f4000 - 0x00027c0f7fff (in 2 4KB-pages desc)
0x000000022000 - 0x000000023fff -> 0x00027c19c000 - 0x00027c19ffff (in 2 4KB-pages desc)
0x000000024000 - 0x000000025fff -> 0x00027c184000 - 0x00027c187fff (in 2 4KB-pages desc)
0x000000026000 - 0x000000027fff -> 0x00027c16c000 - 0x00027c16ffff (in 2 4KB-pages desc)
0x000000028000 - 0x000000029fff -> 0x00027c14c000 - 0x00027c14ffff (in 2 4KB-pages desc)
0x00000002a000 - 0x00000002bfff -> 0x00027c134000 - 0x00027c137fff (in 2 4KB-pages desc)
0x00000002c000 - 0x00000002dfff -> 0x00027c39c000 - 0x00027c39ffff (in 2 4KB-pages desc)
0x00000002e000 - 0x00000002ffff -> 0x00027c384000 - 0x00027c387fff (in 2 4KB-pages desc)
0x000000030000 - 0x000000031fff -> 0x00027c36c000 - 0x00027c36ffff (in 2 4KB-pages desc)
0x000000032000 - 0x000000033fff -> 0x00027c35c000 - 0x00027c35ffff (in 2 4KB-pages desc)
0x000000034000 - 0x000000037fff -> 0x000260c46000 - 0x000260c4bfff (in 4 4KB-pages desc)
0x000000038000 - 0x00000003bfff -> 0x0002752de000 - 0x0002752e3fff (in 4 4KB-pages desc)
0x00000003c000 - 0x00000003ffff -> 0x00027fb26000 - 0x00027fb2bfff (in 4 4KB-pages desc)
0x000000040000 - 0x000000043fff -> 0x00027ce06000 - 0x00027ce0bfff (in 4 4KB-pages desc)
0x000000044000 - 0x000000047fff -> 0x00027dd06000 - 0x00027dd0bfff (in 4 4KB-pages desc)
0x000000048000 - 0x00000004bfff -> 0x00027e1a6000 - 0x00027e1abfff (in 4 4KB-pages desc)
0x00000004c000 - 0x00000004ffff -> 0x00025fda6000 - 0x00025fdabfff (in 4 4KB-pages desc)
0x000000050000 - 0x000000053fff -> 0x00027e8a6000 - 0x00027e8abfff (in 4 4KB-pages desc)
0x000000054000 - 0x000000057fff -> 0x00027caa6000 - 0x00027caabfff (in 4 4KB-pages desc)
0x000000058000 - 0x00000005bfff -> 0x00027e3e6000 - 0x00027e3ebfff (in 4 4KB-pages desc)
0x00000005c000 - 0x00000005ffff -> 0x00027e6a6000 - 0x00027e6abfff (in 4 4KB-pages desc)
0x000000060000 - 0x000000063fff -> 0x00027f8a6000 - 0x00027f8abfff (in 4 4KB-pages desc)
0x000000064000 - 0x000000067fff -> 0x00027f0a6000 - 0x00027f0abfff (in 4 4KB-pages desc)
0x000000068000 - 0x00000006bfff -> 0x00027c0fe000 - 0x00027c103fff (in 4 4KB-pages desc)
0x00000006c000 - 0x00000006ffff -> 0x00027c1a6000 - 0x00027c1abfff (in 4 4KB-pages desc)
0x000000070000 - 0x000000073fff -> 0x00027c18e000 - 0x00027c193fff (in 4 4KB-pages desc)
0x000000074000 - 0x000000077fff -> 0x00027c176000 - 0x00027c17bfff (in 4 4KB-pages desc)
0x000000078000 - 0x00000007bfff -> 0x00027c376000 - 0x00027c37bfff (in 4 4KB-pages desc)
0x00000007c000 - 0x00000007ffff -> 0x00027c35e000 - 0x00027c363fff (in 4 4KB-pages desc)
0x000000080000 - 0x000000083fff -> 0x00027c15e000 - 0x00027c163fff (in 4 4KB-pages desc)
0x000000084000 - 0x000000087fff -> 0x00027c13e000 - 0x00027c143fff (in 4 4KB-pages desc)
0x000000088000 - 0x00000008bfff -> 0x00027c3a6000 - 0x00027c3abfff (in 4 4KB-pages desc)
0x00000008c000 - 0x00000008ffff -> 0x00027c38e000 - 0x00027c393fff (in 4 4KB-pages desc)
0x000000090000 - 0x000000097fff -> 0x00025fd9e000 - 0x00025fda7fff (in 8 4KB-pages desc)
0x000000098000 - 0x00000009ffff -> 0x00027e89e000 - 0x00027e8a7fff (in 8 4KB-pages desc)
0x0000000a0000 - 0x0000000a7fff -> 0x00027ca9e000 - 0x00027caa7fff (in 8 4KB-pages desc)
0x0000000a8000 - 0x0000000affff -> 0x00027e3de000 - 0x00027e3e7fff (in 8 4KB-pages desc)
0x0000000b0000 - 0x0000000b7fff -> 0x00027e69e000 - 0x00027e6a7fff (in 8 4KB-pages desc)
0x0000000b8000 - 0x0000000bffff -> 0x00027f89e000 - 0x00027f8a7fff (in 8 4KB-pages desc)
0x0000000c0000 - 0x0000000c7fff -> 0x00027f09e000 - 0x00027f0a7fff (in 8 4KB-pages desc)
0x0000000c8000 - 0x0000000cffff -> 0x00027c0f6000 - 0x00027c0fffff (in 8 4KB-pages desc)
0x0000000d0000 - 0x0000000d7fff -> 0x00027c19e000 - 0x00027c1a7fff (in 8 4KB-pages desc)
0x0000000d8000 - 0x0000000dffff -> 0x00027c186000 - 0x00027c18ffff (in 8 4KB-pages desc)
0x0000000e0000 - 0x0000000e7fff -> 0x00027c16e000 - 0x00027c177fff (in 8 4KB-pages desc)
0x0000000e8000 - 0x0000000effff -> 0x00027c136000 - 0x00027c13ffff (in 8 4KB-pages desc)
0x0000000f0000 - 0x0000000f7fff -> 0x00027c39e000 - 0x00027c3a7fff (in 8 4KB-pages desc)
0x0000000f8000 - 0x0000000fffff -> 0x00027c386000 - 0x00027c38ffff (in 8 4KB-pages desc)
0x000000100000 - 0x000000107fff -> 0x00027c36e000 - 0x00027c377fff (in 8 4KB-pages desc)
0x000000108000 - 0x000000117fff -> 0x0002712ee000 - 0x0002712fffff (in 16 4KB-pages desc)
0x000000118000 - 0x000000127fff -> 0x00027face000 - 0x00027fadffff (in 16 4KB-pages desc)
0x000000128000 - 0x000000137fff -> 0x00027cdae000 - 0x00027cdbffff (in 16 4KB-pages desc)
0x000000138000 - 0x000000147fff -> 0x00027d9ee000 - 0x00027d9fffff (in 16 4KB-pages desc)
0x000000148000 - 0x000000157fff -> 0x00027e0ae000 - 0x00027e0bffff (in 16 4KB-pages desc)
0x000000158000 - 0x000000167fff -> 0x00025fcae000 - 0x00025fcbffff (in 16 4KB-pages desc)
0x000000168000 - 0x000000177fff -> 0x00027e7ae000 - 0x00027e7bffff (in 16 4KB-pages desc)
0x000000178000 - 0x000000187fff -> 0x00027c8ae000 - 0x00027c8bffff (in 16 4KB-pages desc)
0x000000188000 - 0x000000197fff -> 0x00027e40e000 - 0x00027e41ffff (in 16 4KB-pages desc)
0x000000198000 - 0x0000001a7fff -> 0x00027e2ae000 - 0x00027e2bffff (in 16 4KB-pages desc)
0x0000001a8000 - 0x0000001b7fff -> 0x00027f6ae000 - 0x00027f6bffff (in 16 4KB-pages desc)
0x0000001b8000 - 0x0000001c7fff -> 0x00027ecae000 - 0x00027ecbffff (in 16 4KB-pages desc)
0x0000001c8000 - 0x0000001d7fff -> 0x00027c14e000 - 0x00027c15ffff (in 16 4KB-pages desc)
0x0000001d8000 - 0x000000237fff -> 0x00027e2fe000 - 0x00027e35ffff (in 96 4KB-pages desc)
0x000000238000 - 0x000000437fff -> 0x000274ffe000 - 0x0002751fffff (in 512 4KB-pages desc)
0x000000438000 - 0x000000837fff -> 0x0002713fe000 - 0x0002717fffff (in 1024 4KB-pages desc)
0x000000838000 - 0x000001437fff -> 0x00025fffe000 - 0x000260bfffff (in 3072 4KB-pages desc)
0x000001438000 - 0x000001c37fff -> 0x0002717fe000 - 0x000271ffffff (in 2048 4KB-pages desc)
0x000001c38000 - 0x000004c37fff -> 0x000271ffe000 - 0x000274ffffff (in 12288 4KB-pages desc)
0x000004c38000 - 0x000044c37fff -> 0x0002ffffe000 - 0x00033fffffff (in 262144 4KB-pages desc)
0x000044c38000 - 0x000084c37fff -> 0x00033fffe000 - 0x00037fffffff (in 262144 4KB-pages desc)
0x000084c38000 - 0x0000c4c37fff -> 0x00037fffe000 - 0x0003bfffffff (in 262144 4KB-pages desc)
0x0000c4c38000 - 0x000104c37fff -> 0x0003bfffe000 - 0x0003ffffffff (in 262144 4KB-pages desc)
0x000104c38000 - 0x000144c37fff -> 0x0001ffffe000 - 0x00023fffffff (in 262144 4KB-pages desc)
0x000144c38000 - 0x000184c37fff -> 0x0000ffffe000 - 0x00013fffffff (in 262144 4KB-pages desc)
0x000184c38000 - 0x0001c4c37fff -> 0x00013fffe000 - 0x00017fffffff (in 262144 4KB-pages desc)
0x0001c4c38000 - 0x000204c37fff -> 0x00017fffe000 - 0x0001bfffffff (in 262144 4KB-pages desc)
0x000204c38000 - 0x000244c37fff -> 0x0001bfffe000 - 0x0001ffffffff (in 262144 4KB-pages desc)
0x000244c38000 - 0x000244c44fff -> 0x0000a24de000 - 0x0000a24ecfff (in 13 4KB-pages desc)
0x000244c45000 - 0x000244c64fff -> 0x0000a271e000 - 0x0000a273ffff (in 32 4KB-pages desc)
0x000244c65000 - 0x000244c84fff -> 0x0000a24be000 - 0x0000a24dffff (in 32 4KB-pages desc)
0x000244c85000 - 0x000244cc4fff -> 0x0000a273e000 - 0x0000a277ffff (in 64 4KB-pages desc)
0x000244cc5000 - 0x000244d04fff -> 0x0000a247e000 - 0x0000a24bffff (in 64 4KB-pages desc)
0x000244d05000 - 0x000244d84fff -> 0x0000a277e000 - 0x0000a27fffff (in 128 4KB-pages desc)
0x000244d85000 - 0x00027fffffff -> 0x000067203000 - 0x0000a247ffff (in 242299 4KB-pages desc)
0x000280000000 - 0x000280000fff -> not mapped
0x000280001000 - 0x000280001fff -> 0x00027e7ad000 - 0x00027e7adfff (in 1 4KB-pages)
0x000280002000 - 0x000280002fff -> 0x00027c385000 - 0x00027c385fff (in 1 4KB-pages)
0x000280003000 - 0x000280003fff -> 0x00027c37c000 - 0x00027c37cfff (in 1 4KB-pages)
0x000280004000 - 0x000280004fff -> 0x00027c36d000 - 0x00027c36dfff (in 1 4KB-pages)
0x000280005000 - 0x7fffffffffff -> not mapped

PS: Since I recently downgraded the tests to 2MB, I had the value 512 in mind, but I did use 256K (262144) 4KB pages for the 1GB tests.
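
For reference, the two frame counts can be checked with a few lines of C. This is only a sketch of the arithmetic; the macro and function names are chosen for this example.

```c
#include <stdint.h>

#define SIZE_4KB (1ULL << 12)
#define SIZE_2MB (1ULL << 21)
#define SIZE_1GB (1ULL << 30)

/* Number of 4KB frames backing one mapping of `mapping_size` bytes. */
uint64_t frames_per_mapping(uint64_t mapping_size)
{
    return mapping_size / SIZE_4KB;
}
```

A 2MB mapping covers 512 frames and a 1GB mapping covers 512 * 512 = 262144 frames.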

-----Original Message-----
From: Jan Beulich [mailto:JBeulich@suse.com]
Sent: Wednesday, 9 January 2013 16:36
To: François-Frédéric Ozog
Cc: xen-devel@lists.xen.org
Subject: Re: [Xen-devel] [Paging] 1GB pages in PV DomU

>>> On 09.01.13 at 16:25, François-Frédéric Ozog<ff@ozog.com> wrote:
> I need to map large portions (say 64GB) of mini-os virtual address
> space of a PV DomU to 1GB machine pages on top of xen-unstable
> (processor has all required capabilities and resources).
>
> When creating a 10GB domain, I can find sets of machine 4KB pages in
> start_info.mfn_list that can be grouped to build 1GB page but they
> appear in "descending" order. So wen I build my domain page table
> (arch/x86/mm.c:build_pagetable), I make sure I use 512 adjacent pages
> that can be 1GB aligned and try to mark the corresponding L3 table.

For one, 512 pages sum up to only 2MB, not 1GB.

Next, without enhancing the hypervisor to support this, the biggest mappings you can create in a PV guest are 2MB (and all you should need is to enable the support on the Xen command line and in the guest config file(s), for those guests where you need it). Then the domain builder should be populating the physical address space with contiguous 2MB chunks.

Jan

