Mailing List Archive

Re: [patch] Add support for barriers to blk{back,front} drivers.
Ian Pratt wrote:
>
>> It's about correctness, not about performance. I talked
>> with Jens Axboe about it a while ago. Barriers are
>> *required* for journaling filesystems to work reliably.
>
> I just don't believe that. If the underlying device doesn't support
> barriers Linux should just stop issuing requests until it has the
> completion notification back for *all* the outstanding IOs to the device,
> and then start issuing the new batch of IOs.

That is what actually happens, but that isn't enough in all cases.

> I'd be incredibly surprised if this is not what Linux does today for
> devices that don't support barriers. [NB: you still have the issue of
> disks that support write caching and lie and send the completion before
> data has hit the disk, but there's nothing you can do about that from
> the OS]

The write-caching case is exactly the problematic one. If you turn off
the disk's write caching you are fine, I think. With write caching
enabled, journaling is safe only together with barriers. The kernel can
either use request tagging, so the drive itself makes sure the write
order is correct, or explicitly ask the drive to flush its write cache,
depending on what the hardware supports.
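
For illustration, a minimal sketch of what that looks like from the
filesystem's side on a 2.6-era kernel (from memory, bio setup and
error handling elided): a write submitted with the barrier bit set
lets the driver pick ordered tags or an explicit cache flush,
whichever the hardware supports.

#include <linux/bio.h>
#include <linux/fs.h>

static void submit_barrier_write(struct bio *bio)
{
        /* WRITE_BARRIER is WRITE plus the BIO_RW_BARRIER bit; the
         * driver maps it to tagged ordering or a cache flush. */
        submit_bio(WRITE_BARRIER, bio);
        /* If the device can do neither, the bio completes with
         * -EOPNOTSUPP. */
}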

> I'd certainly be interested to see some benchmarks with and without the
> barrier support in blkfront/back.

I'll go run some ...

cheers,
Gerd

--
Gerd Hoffmann <kraxel@suse.de>
http://www.suse.de/~kraxel/julika-dora.jpeg

Re: [patch] Add support for barriers to blk{back,front} drivers.
Gerd Hoffmann wrote:
> Ian Pratt wrote:

>> I'd certainly be interested to see some benchmarks with and without the
>> barrier support in blkfront/back.
>
> I'll go run some ...

I did some bonnie++ runs, each test three times. The complete set of
results, along with the scripts I used to run everything, is at
http://www.suse.de/~kraxel/bb/. Handle the scripts with care, there is
a mkfs call in there ;)

I've picked two numbers (sequential block writes, sequential file
creation) which should involve a bunch of barrier requests for journal
updates.

The machine is an Intel devel box with hyperthreading, 1 GB of memory
and a SATA disk. I used the ext3 filesystem, freshly created on an LVM
volume, mounted with barrier={0,1} and data=journal (to stress
journaling and barriers ...).

Legend:
wce0 - write cache enable = off (hdparm -W0 /dev/sda)
wce1 - write cache enable = on (hdparm -W1 /dev/sda)
barrier0 - barriers off (mount -o barrier=0)
barrier1 - barriers on

Here are the dom0 results:

name                  blkout  create
------------------------------------
dom0-wce0-barrier1     12614    1841
                       12598     975
                       12575    1102
dom0-wce0-barrier0     12656    1320
                       12569    1629
                       12669    1313
dom0-wce1-barrier1     18457    2801
                       18395    3316
                       18560    2948
dom0-wce1-barrier0     19224    3190
                       18880    3079
                       19121    3555

With write caching disabled, barriers don't make a big difference; it's
in the noise. Not surprising. With write caching enabled, barriers make
things a bit slower. But keep in mind that journaling filesystems can
NOT work reliably without barriers when write caching is enabled. The
small performance penalty is simply the price you have to pay for
filesystem integrity. The other way to make sure journaling works
correctly is to disable write caching, and as you can see that costs
even more performance.

domU results (guest with 256 MB of memory; separate table because the
results are not directly comparable with the dom0 ones):

name                  blkout  create
------------------------------------
domU-wce0-barrier1      9247     866
                        9167    1265
                        9591    1131
domU-wce0-barrier0      9413     882
                        9414    1082
                        9113     942
domU-wce1-barrier1     21134    4428
                       19782    3507
                       19813    3810
domU-wce1-barrier0     20065    3411
                       20451    4342
                       20312    4672

Looks similar, but a bit noisier, so there is no clear difference
visible between barriers on and off ...

cheers,
Gerd

--
Gerd Hoffmann <kraxel@suse.de>
http://www.suse.de/~kraxel/julika-dora.jpeg

Re: [patch] Add support for barriers to blk{back,front} drivers.
Ian Pratt wrote:
> What's the current logic for determining whether the backend advertises
> barrier support, and whether the frontend chooses to use it?

Backend: "feature-barrier" node in xenstore. It means the backend
understands the new BLKIF_OP_WRITE_BARRIER operation. The node can be
either 1 or 0, depending on whenever the underlying block device
actually supports barriers or not. Initially it is '1' unconditionally,
the only way to figure whenever the underlying block device supports
barriers is to actually submit one and see if it works. If a barrier
write fails with -EOPNOTSUPP the backend changes the node to '0'.
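
Roughly like this (a sketch only; the helper name and the
blkif->be->dev plumbing are assumptions in blkback style, not the
literal patch):

#include <xen/xenbus.h>

static int barrier_write_status(blkif_t *blkif, int error)
{
        if (error == -EOPNOTSUPP) {
                /* The underlying device can't do barriers after all:
                 * withdraw the advertisement for future consumers ... */
                xenbus_printf(XBT_NIL, blkif->be->dev->nodename,
                              "feature-barrier", "%d", 0);
                /* ... and report the new code to this frontend. */
                return BLKIF_RSP_EOPNOTSUPP;
        }
        return error ? BLKIF_RSP_ERROR : BLKIF_RSP_OKAY;
}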

The error is also propagated to the frontend so it knows barriers don't
work (and can in turn propagate that up to the filesystem driver); the
new BLKIF_RSP_EOPNOTSUPP error code is needed for that.

Frontend: it simply submits barrier writes ;) The backend makes sure
the new error code is used for barrier writes only (it should never
happen for normal writes anyway), so old frontends which don't know
about barriers (and thus never submit barrier write requests) should
never see the new error code.
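
On the frontend side the mapping back to a Linux errno is trivial; a
hypothetical sketch (the function name is made up for illustration):

static int blkif_status_to_errno(int status)
{
        switch (status) {
        case BLKIF_RSP_OKAY:
                return 0;
        case BLKIF_RSP_EOPNOTSUPP:
                return -EOPNOTSUPP;     /* barrier writes only */
        default:
                return -EIO;
        }
}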

> I guess the backend could always advertise it providing it did the
> appropriate queue drain whenever it encountered one if the underlying
> block device didn't support barriers.

The filesystems do some best-effort handling when barriers are not
available anyway (which works ok for the non-write-caching case). IMO
the best way to handle non-working barriers is to simply let the
filesystems know, which is exactly what the patch implements: it passes
through the capabilities of the underlying block device to the domU
instead of trying to fake something which isn't there.

cheers,
Gerd

--
Gerd Hoffmann <kraxel@suse.de>
http://www.suse.de/~kraxel/julika-dora.jpeg

Re: [patch] Add support for barriers to blk{back,front} drivers.
Hi,

> What block devices currently support barriers? I assume device mapper
> does (if the underlying devices do), but loop presumably doesn't.

ide does, sata does (for ide at least it also depends on disk
capabilities), and lvm on top of these does too.

loop doesn't; I have a patch in the queue though:
http://www.suse.de/~kraxel/patches/kraxel-unstable-zweiblum-hg11870-quilt/loop-barrier.diff

>> The error is also propagated to the frontend so it knows
>> barriers don't work (and can in turn propagate that up to the
>> filesystem driver), the new BLKIF_RSP_EOPNOTSUPP error code
>> is needed for that.
>
> Do you do a probe at backend initialisation time to avoid telling the
> frontend that barriers are supported and then having to tell it they
> are not the first time it tries to use one? Although Linux may be
> happy with that, it's not a clean interface.

Well, there is no way around that. The only way to probe this reliably
is to actually submit a barrier request to the hardware and see if it
works ok (some buggy drives claim to support the required features but
choke when such a request is actually submitted).

There are likely cases where it is possible to figure this out in
advance and avoid the "submit a barrier request and see it fail"
roundtrip. I haven't investigated that. I'd prefer to have *one* code
path only; if you have two, the unlikely one tends to get much less
tested ...

cheers,
Gerd

--
Gerd Hoffmann <kraxel@suse.de>
http://www.suse.de/~kraxel/julika-dora.jpeg

Re: [patch] Add support for barriers to blk{back,front} drivers.
Hi,

ping. What is the status of this? Any remaining issues to be solved?

cheers,
Gerd

--
Gerd Hoffmann <kraxel@suse.de>
http://www.suse.de/~kraxel/julika-dora.jpeg

RE: [patch] Add support for barriers to blk{back,front} drivers.
> -----Original Message-----
> From: Gerd Hoffmann [mailto:kraxel@suse.de]
> Sent: 08 November 2006 15:38
> To: Gerd Hoffmann
> Cc: Ian Pratt; xen-devel@lists.xensource.com; ian.pratt@cl.cam.ac.uk
> Subject: Re: [Xen-devel] [patch] Add support for barriers to
> blk{back,front} drivers.
>
>
> ping. What is the status of this? Any remaining issues to be solved?

I'd like to hear from the Solaris and BSD folks whether the Linux
notion of a barrier is stronger than that used by those OSes (I know at
least one OS that treats barriers as being between requests rather than
implicitly on both sides of a request, as in Linux) and whether they
feel that this would actually make any practical performance
difference.

It would also be good to know whether the other OSes could cope with
Linux's odd way of determining whether barriers are supported (i.e.
send requests assuming they are and get failures if not).

Ian


Re: [patch] Add support for barriers to blk{back,front} drivers.
What happens (even on native Linux) if you have, say, a RAID array in which
some of the discs support barriers and others don't? For example, you have a
mirror set where one of the discs supports barriers but the others don't:
what status gets returned? Or you have a RAID-0 set, so that some barrier
writes can succeed and others can't: must the filesystem handle
barrier_write failures at any time?

-- Keir

On 8/11/06 3:38 pm, "Gerd Hoffmann" <kraxel@suse.de> wrote:

> Hi,
>
> ping. What is the status of this? Any remaining issues to be solved?
>
> cheers,
> Gerd



Re: [patch] Add support for barriers to blk{back,front} drivers.
Keir Fraser wrote:
> What happens (even on native Linux) if you have, say, a RAID array in which
> some of the discs support barriers and others don't?

The raid0 driver doesn't support barriers in the first place. Not sure
about the other raid drivers.

> Or you have a RAID-0 set, so that some barrier
> writes can succeed and others can't:

I think that can never happen. To maintain correct barrier semantics
for the complete raid0 set, it is not enough to simply pass the barrier
request down to the disk which happens to hold the block in question;
you have to somehow synchronize the disks as well.
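
For illustration, the raid0 request path of that era simply fails
barrier bios up front rather than trying to synchronize the member
disks (paraphrased from drivers/md/raid0.c from memory, not verbatim):

static int raid0_make_request(request_queue_t *q, struct bio *bio)
{
        if (unlikely(bio_barrier(bio))) {
                /* no barrier support: fail the bio immediately */
                bio_endio(bio, bio->bi_size, -EOPNOTSUPP);
                return 0;
        }
        /* ... normal striping logic ... */
        return 0;
}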

> must the filesystem handle
> barrier_write failures at any time?

The Linux filesystems do. The usual way to handle the error is to
submit the request again as a normal request and to turn off the
barrier flag so the filesystem driver won't use barrier requests again
(i.e. enter fallback mode). It doesn't matter whether that happens on
the very first or on any subsequent barrier write request.
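
Roughly what the JBD commit code does (simplified from memory, not
verbatim): write the commit block as a barrier, and on -EOPNOTSUPP
disable barriers for the journal and resubmit it as a plain write.

#include <linux/jbd.h>
#include <linux/buffer_head.h>

static int write_commit_block(journal_t *journal, struct buffer_head *bh)
{
        int ret;

        set_buffer_ordered(bh);         /* mark as barrier write */
        ret = sync_dirty_buffer(bh);
        if (ret == -EOPNOTSUPP) {
                /* barriers don't work: enter fallback mode ... */
                journal->j_flags &= ~JFS_BARRIER;
                clear_buffer_ordered(bh);
                /* ... and resubmit as a normal write */
                set_buffer_uptodate(bh);
                set_buffer_dirty(bh);
                ret = sync_dirty_buffer(bh);
        }
        return ret;
}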

As far as I know barriers are always an all-or-nothing thing: either
they work or they don't. The only problem is that there is no reliable
way to figure that out without actually submitting a barrier request.
I don't like it much either, but the existing hardware forces us to
handle it that way. Life sucks sometimes ...

cheers,
Gerd

--
Gerd Hoffmann <kraxel@suse.de>
http://www.suse.de/~kraxel/julika-dora.jpeg

Re: [patch] Add support for barriers to blk{back,front} drivers.
On 9/11/06 12:48, "Gerd Hoffmann" <kraxel@suse.de> wrote:

>> What happens (even on native Linux) if you have, say, a RAID array in which
>> some of the discs support barriers and others don't?
>
> The raid0 driver doesn't support barriers in the first place. Not sure
> about the other raid drivers.

Does this mean journalling filesystems cannot run reliably on top of RAID-0?
That sounds a bit concerning!

What about when running on top of LVM, where an LV is stitched together
from bits of various PVs? Some may support barriers, some may not.
Running on top of LVM is the default for most distros, so surely it
must have a story on write barriers?

-- Keir



Re: [patch] Add support for barriers to blk{back,front} drivers.
Keir Fraser wrote:
> On 9/11/06 12:48, "Gerd Hoffmann" <kraxel@suse.de> wrote:
>
>>> What happens (even on native Linux) if you have, say, a RAID array in which
>>> some of the discs support barriers and others don't?
>> The raid0 driver doesn't support barriers in the first place. Not sure
>> about the other raid drivers.
>
> Does this mean journalling filesystems cannot run reliably on top of RAID-0?
> That sounds a bit concerning!

Without barrier support available, Linux filesystems fall back to
simply waiting until the requests (which would have been submitted as
barrier requests) are finished before submitting the next ones, to make
sure the ordering is fine. That works ok as long as the disk doesn't do
write caching, so you had better turn write caching off in that case.
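
A sketch of that fallback (hypothetical helper, buffer-head API from
memory): drain all outstanding journal writes before submitting the
commit record. The ordering only reaches stable media if the drive's
write cache is off, as noted above.

#include <linux/buffer_head.h>

static void commit_without_barriers(struct buffer_head **bhs, int nr,
                                    struct buffer_head *commit_bh)
{
        int i;

        for (i = 0; i < nr; i++)
                submit_bh(WRITE, bhs[i]);
        for (i = 0; i < nr; i++)
                wait_on_buffer(bhs[i]);         /* drain */
        /* all journal blocks completed; now write the commit record */
        submit_bh(WRITE, commit_bh);
        wait_on_buffer(commit_bh);
}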

> What about when running on top of LVM, where an LV is stitched together from
> bits of various PVs. Some may support barriers, some may not. Running on top
> of LVM is default for most distros, so surely it must have a story on write
> barriers?

lvm works ok for me; almost all my domU disks are on lvm. I have just
one disk in the system though.

I'm not sure what happens with multiple PVs. I'd expect barriers to
work fine as long as your logical volume is not spread over multiple
physical volumes and the underlying physical volume can handle
barriers.

cheers,

Gerd

--
Gerd Hoffmann <kraxel@suse.de>
http://www.suse.de/~kraxel/julika-dora.jpeg
