Mailing List Archive

[PATCH v3] xen block backend.
The following patch implements the Xen block backend driver for
upstream Linux. This is the host-side counterpart to the frontend driver
in drivers/block/xen-blkfront.c. The PV protocol is also implemented by
frontend drivers in other OSes, such as the BSDs and even Windows.
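
For reviewers not familiar with the ring protocol, a request slot looks
roughly like the sketch below. It is paraphrased from Xen's public
io/blkif.h header rather than copied from the include/xen/blkif.h added
by this patch, so treat the exact typedefs, field names and the segment
limit as illustrative:

#include <stdint.h>

typedef uint32_t grant_ref_t;     /* grant table reference               */
typedef uint16_t blkif_vdev_t;    /* virtual device handle               */
typedef uint64_t blkif_sector_t;  /* sector number on the virtual disk   */

#define BLKIF_MAX_SEGMENTS_PER_REQUEST 11

struct blkif_request_segment {
    grant_ref_t gref;        /* grant reference to the guest I/O page    */
    uint8_t     first_sect;  /* first sector of the page to transfer     */
    uint8_t     last_sect;   /* last sector of the page to transfer      */
};

struct blkif_request {
    uint8_t        operation;     /* BLKIF_OP_READ, BLKIF_OP_WRITE, ...  */
    uint8_t        nr_segments;   /* number of entries in seg[]          */
    blkif_vdev_t   handle;        /* which virtual disk to operate on    */
    uint64_t       id;            /* guest cookie, echoed in the response*/
    blkif_sector_t sector_number; /* start sector of the request         */
    struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
};

The frontend places requests like this on a shared ring; the backend
maps the referenced grant pages, issues the I/O against the real block
device and writes a response carrying the same id back onto the ring.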

This driver has a long history as an out-of-tree driver, but I am
submitting it here as a single monolithic patch to aid review. Once it
has been reviewed and is considered suitable for merging, could we
perhaps merge the equivalent git branch instead, which preserves much
of that history?

The patch is based on the driver from the xen.git pvops kernel tree but
has been put through the checkpatch.pl wringer plus several manual
cleanup passes. It has also been moved from drivers/xen/blkback to
drivers/block/xen-blkback.

The git tree of the broken out driver is available at:

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/xen-blkback-v3

Note that this driver depends on a couple of fixes to the Xen backend
subsystem. Those are available in the devel/backend.base.v3 branch and
are intended for the 2.6.40 merge window.

drivers/block/Kconfig | 8 +
drivers/block/Makefile | 1 +
drivers/block/xen-blkback/Makefile | 3 +
drivers/block/xen-blkback/blkback.c | 828 +++++++++++++++++++++++++++++++++++
drivers/block/xen-blkback/common.h | 128 ++++++
drivers/block/xen-blkback/xenbus.c | 764 ++++++++++++++++++++++++++++++++
include/xen/blkif.h | 122 +++++
7 files changed, 1854 insertions(+), 0 deletions(-)

Re: [PATCH v3] xen block backend.
This should sit in userspace. The last time this issue was discussed,
Stefano said the qemu Xen disk backend is just as fast as this kernel
code. And that's with a codebase that isn't even very optimized yet.

So clear NAK for adding all this mess to the kernel.

Re: Re: [PATCH v3] xen block backend.
On Thu, 2011-04-21 at 04:37 +0100, Christoph Hellwig wrote:
> This should sit in userspace. The last time this issue was discussed,
> Stefano said the qemu Xen disk backend is just as fast as this kernel
> code. And that's with a codebase that isn't even very optimized yet.

Stefano was comparing qdisk to blktap. This patch is blkback, which is
a completely in-kernel driver that exports raw block devices to guests,
so it's very useful in conjunction with LVM, iSCSI, etc. The last
measurement I heard was that qdisk was around 15% slower than
blkback.

By contrast, blktap has a userspace component, so it's not all that
surprising that it turns out to be roughly equivalent to qdisk. (Bear in
mind that Stefano's tests were rough-and-ready initial tests; not that
anyone expects a more thorough benchmarking treatment to really
change the result.) Nobody I know of thinks blktap should go upstream,
since, as you say, there is no reason not to punt the kernel-side part
into userspace too.

Ian.

> So clear NAK for adding all this mess to the kernel.



Re: Re: [PATCH v3] xen block backend.
On Thu, 2011-04-21 at 08:28 +0100, Ian Campbell wrote:
> By contrast, blktap has a userspace component, so it's not all that
> surprising that it turns out to be roughly equivalent to qdisk. (Bear in
> mind that Stefano's tests were rough-and-ready initial tests; not that
> anyone expects a more thorough benchmarking treatment to really
> change the result.) Nobody I know of thinks blktap should go upstream,
> since, as you say, there is no reason not to punt the kernel-side part
> into userspace too.

BTW, about the only nice property that blktap, as it currently stands,
has over this plan is that it exports an actual block device from vhd,
qcow, etc. files (in some sense blktap is a loopback driver for complex
disk image file formats). It occasionally turns out to be quite useful
to be able to mount such files, even on non-virtualisation systems (in
its current incarnation blktap has no dependency on Xen).

Having removed the kernel component (or switched to qdisk), we will
probably end up running a blkfront to provide such block devices (sadly
Xen-dependent) or, more likely, putting something like an NBD server
into the userspace process.

Ian.


Re: Re: [PATCH v3] xen block backend.
On Thu, Apr 21, 2011 at 08:28:45AM +0100, Ian Campbell wrote:
> On Thu, 2011-04-21 at 04:37 +0100, Christoph Hellwig wrote:
> > This should sit in userspace. The last time this issue was discussed,
> > Stefano said the qemu Xen disk backend is just as fast as this kernel
> > code. And that's with a codebase that isn't even very optimized yet.
>
> Stefano was comparing qdisk to blktap. This patch is blkback, which is
> a completely in-kernel driver that exports raw block devices to guests,
> so it's very useful in conjunction with LVM, iSCSI, etc. The last
> measurement I heard was that qdisk was around 15% slower than
> blkback.

Please show real numbers on why adding this to kernel space is required.


Re: Re: [PATCH v3] xen block backend.
On Thu, Apr 21, 2011 at 09:03:23AM +0100, Ian Campbell wrote:
> BTW, about the only nice property that blktap, as it currently stands,
> has over this plan is that it exports an actual block device from vhd,
> qcow, etc. files (in some sense blktap is a loopback driver for complex
> disk image file formats). It occasionally turns out to be quite useful
> to be able to mount such files, even on non-virtualisation systems (in
> its current incarnation blktap has no dependency on Xen).

You can already do that using qemu-nbd today. In most cases the image
format support in qemu is much better than in the various Xen trees
anyway, with vhd being the only one that looks potentially better in
Xen.


Re: Re: [PATCH v3] xen block backend.
On Thu, 2011-04-21 at 09:06 +0100, Christoph Hellwig wrote:
> On Thu, Apr 21, 2011 at 09:03:23AM +0100, Ian Campbell wrote:
> > BTW, about the only nice property that blktap, as it currently stands,
> > has over this plan is that it exports an actual block device from vhd,
> > qcow, etc. files (in some sense blktap is a loopback driver for complex
> > disk image file formats). It occasionally turns out to be quite useful
> > to be able to mount such files, even on non-virtualisation systems (in
> > its current incarnation blktap has no dependency on Xen).
>
> You can already do that using qemu-nbd today.

Good to know, I'd had a vague feeling this was possible but hadn't
looked into how.

> In most cases the image format support in qemu is much better than in the various Xen trees
> anyway, with vhd being the only one that looks potentially better in Xen.

That's about what I reckon too.

Ian.


Re: Re: [PATCH v3] xen block backend.
On Thu, Apr 21, 2011 at 04:04:12AM -0400, Christoph Hellwig wrote:
> On Thu, Apr 21, 2011 at 08:28:45AM +0100, Ian Campbell wrote:
> > On Thu, 2011-04-21 at 04:37 +0100, Christoph Hellwig wrote:
> > > This should sit in userspace. The last time this issue was discussed,
> > > Stefano said the qemu Xen disk backend is just as fast as this kernel
> > > code. And that's with a codebase that isn't even very optimized yet.
> >
> > Stefano was comparing qdisk to blktap. This patch is blkback, which is
> > a completely in-kernel driver that exports raw block devices to guests,
> > so it's very useful in conjunction with LVM, iSCSI, etc. The last
> > measurement I heard was that qdisk was around 15% slower than
> > blkback.
>
> Please show real numbers on why adding this to kernel space is required.

First off, many thanks go out to Alyssa Wilk and Vivek Goyal.

Alyssa for cluing me in on the CPU-bound problem (on the first machine I
was doing the testing on I hit the CPU ceiling and got quite skewed
results). Vivek for helping me figure out why the kernel blkback was
performing so poorly when a READ request got added to the stream of
WRITEs with the CFQ scheduler (I had not set REQ_SYNC on the WRITE
requests).
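
(For the curious, the CFQ fix boils down to tagging the backend's writes
as synchronous when the bio is submitted. The snippet below is a minimal
sketch of the idea against a roughly 2.6.39-era block API; the helper
name and its arguments are hypothetical, not the actual xen-blkback
code.)

#include <linux/fs.h>
#include <linux/bio.h>

/* Hypothetical helper: submit a bio built for a guest request. Without
 * REQ_SYNC, CFQ treats the backend's WRITEs as background writeback and
 * parks them as soon as a READ shows up in the stream, which is what
 * made blkback look so slow in the earlier runs. */
static void submit_guest_bio(struct bio *bio, int operation)
{
	if (operation == WRITE)
		submit_bio(WRITE | REQ_SYNC, bio);
	else
		submit_bio(READ, bio);
}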

The setup is as follows:

iSCSI target - running Linux v2.6.39-rc4 with TCM LIO-4.1 patches (which
provide iSCSI and Fibre Channel target support) [1]. I export a 10GB
RAM disk over a 1Gb network connection.

iSCSI initiator - Sandy Bridge i3-2100 3.1GHz w/8GB, running v2.6.39-rc4
with pv-ops patches [2]. Either 32-bit or 64-bit, with Xen-unstable
(c/s 23246) and Xen QEMU (e073e69457b4d99b6da0b6536296e3498f7f6599) with
one patch to enable aio [3]. The upstream QEMU version is quite close to
this one (it has a bug fix in it). Memory for both Dom0 and DomU is
limited to 2GB. I boot off PXE and run everything from the ramdisk.

The kernel/initramfs that I am using for this testing is the same
throughout and is based off VirtualIron's build system [4].

There are two tests; each is run three times.

The first is random writes of 64K across the disk with four threads
doing this pounding. The results are in the 'randw-bw.png' file.

The second is based off IOMeter - it does random reads (20%) and writes
(80%) with various block sizes, from 512 bytes up to 64K, with two
threads doing it. The results are in the 'iometer-bw.png' file.

Also attached are the 'write' and 'iometer' fio files I used.

The guest config files are quite simple. They look as follows:

kernel="/mnt/lab/latest/vmlinuz"
ramdisk="/mnt/lab/latest/initramfs.cpio.gz"
extra="console=hvc0 debug earlyprintk=xenboot"
memory=2048
maxmem=2048
vcpus=2
name="phy-xvda"
on_crash="preserve"
vif = [ 'bridge=switch' ]
vfb = [ 'vnc=1, vnclisten=0.0.0.0,vncunused=1']
disk = [ 'phy:/dev/sdb,xvda,w']

or to use QEMU qdisk:

kernel="/mnt/lab/latest/vmlinuz"
ramdisk="/mnt/lab/latest/initramfs.cpio.gz"
extra="console=hvc0 debug earlyprintk=xenboot"
memory=2048
maxmem=2048
vcpus=2
name="qdisk-xvda"
on_crash="preserve"
vif = [ 'bridge=switch' ]
vfb = [ 'vnc=1, vnclisten=0.0.0.0,vncunused=1']
disk = [ 'file:/dev/sdb,xvda,w']

/dev/sdb is naturally the LIO TCM RAMDISK.

[1]: git://git.kernel.org/pub/scm/linux/kernel/git/nab/lio-core-2.6.git #lio-4.1
[2]: git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git #devel/next-2.6.39
[3]: http://darnok.org/xen/qdisk_vs_blkback_v3.1/qemu-enable-aio.patch
[4]: git://xenbits.xensource.com/xentesttools/bootstrap.git
Re: Re: [PATCH v3] xen block backend.
On Wed, Apr 27, 2011 at 06:06:34PM -0400, Konrad Rzeszutek Wilk wrote:
> On Thu, Apr 21, 2011 at 04:04:12AM -0400, Christoph Hellwig wrote:
> > On Thu, Apr 21, 2011 at 08:28:45AM +0100, Ian Campbell wrote:
> > > On Thu, 2011-04-21 at 04:37 +0100, Christoph Hellwig wrote:
> > > > This should sit in userspace. The last time this issue was discussed,
> > > > Stefano said the qemu Xen disk backend is just as fast as this kernel
> > > > code. And that's with a codebase that isn't even very optimized yet.
> > >
> > > Stefano was comparing qdisk to blktap. This patch is blkback, which is
> > > a completely in-kernel driver that exports raw block devices to guests,
> > > so it's very useful in conjunction with LVM, iSCSI, etc. The last
> > > measurement I heard was that qdisk was around 15% slower than
> > > blkback.
> >
> > Please show real numbers on why adding this to kernel space is required.
>
> First off, many thanks go out to Alyssa Wilk and Vivek Goyal.
>
> Alyssa for cluing me in on the CPU-bound problem (on the first machine I
> was doing the testing on I hit the CPU ceiling and got quite skewed
> results). Vivek for helping me figure out why the kernel blkback was
> performing so poorly when a READ request got added to the stream of
> WRITEs with the CFQ scheduler (I had not set REQ_SYNC on the WRITE
> requests).
>
> The setup is as follows:
>
> iSCSI target - running Linux v2.6.39-rc4 with TCM LIO-4.1 patches (which
> provide iSCSI and Fibre Channel target support) [1]. I export a 10GB
> RAM disk over a 1Gb network connection.
>
> iSCSI initiator - Sandy Bridge i3-2100 3.1GHz w/8GB, running v2.6.39-rc4
> with pv-ops patches [2]. Either 32-bit or 64-bit, with Xen-unstable
> (c/s 23246) and Xen QEMU (e073e69457b4d99b6da0b6536296e3498f7f6599) with
> one patch to enable aio [3]. The upstream QEMU version is quite close to
> this one (it has a bug fix in it). Memory for both Dom0 and DomU is
> limited to 2GB. I boot off PXE and run everything from the ramdisk.
>
> The kernel/initramfs that I am using for this testing is the same
> throughout and is based off VirtualIron's build system [4].
>
> There are two tests; each is run three times.
>
> The first is random writes of 64K across the disk with four threads
> doing this pounding. The results are in the 'randw-bw.png' file.
>
> The second is based off IOMeter - it does random reads (20%) and writes
> (80%) with various block sizes, from 512 bytes up to 64K, with two
> threads doing it. The results are in the 'iometer-bw.png' file.
>

A summary for those who don't bother checking the attachments :)

The xen-blkback (kernel) backend seems to perform a lot better
than the qemu qdisk (userspace) backend.

CPU usage is also lower with the kernel backend driver.
Detailed numbers are in the attachments in Konrad's previous email.

-- Pasi

