Mailing List Archive

[PATCH 2 of 3] interface: Flesh out the BLKIF_OP_DISCARD description
# HG changeset patch
# User Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
# Date 1318457227 14400
# Node ID 15c2d70dbac3e31c2d74b6700e1bb5f8a7d8256e
# Parent 88b7814df143169a1cf946a9881ae2ecea9693bd
interface: Flesh out the BLKIF_OP_DISCARD description.

We flesh out details on what is expected of 'feature-discard' and
what are some of the extra parameters that the frontend can read
from the backend. Those extra parameters are: discard-alignment
and discard-granularity.

Acked-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff -r 88b7814df143 -r 15c2d70dbac3 xen/include/public/io/blkif.h
--- a/xen/include/public/io/blkif.h Wed Oct 12 18:07:04 2011 -0400
+++ b/xen/include/public/io/blkif.h Wed Oct 12 18:07:07 2011 -0400
@@ -83,22 +83,42 @@
#define BLKIF_OP_RESERVED_1 4
/*
* Recognised only if "feature-discard" is present in backend xenbus info.
- * The "feature-discard" node contains a boolean indicating whether discard
- * requests are likely to succeed or fail. Either way, a discard request
+ * The "feature-discard" node contains a boolean indicating whether trim
+ * (ATA) or unmap (SCSI) - conveniently called discard requests are likely
+ * to succeed or fail. Either way, a discard request
* may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by
* the underlying block-device hardware. The boolean simply indicates whether
* or not it is worthwhile for the frontend to attempt discard requests.
* If a backend does not recognise BLKIF_OP_DISCARD, it should *not*
* create the "feature-discard" node!
- *
+ *
* Discard operation is a request for the underlying block device to mark
- * extents to be erased. Discard operations are passed with sector_number as the
+ * extents to be erased. However, discard does not guarantee that the blocks
+ * will be erased from the device - it is just a hint to the device
+ * controller that these blocks are no longer in use. What the device
+ * controller does with that information is left to the controller.
+ * Discard operations are passed with sector_number as the
* sector index to begin discard operations at and nr_sectors as the number of
* sectors to be discarded. The specified sectors should be discarded if the
- * underlying block device supports discard operations, or a BLKIF_RSP_EOPNOTSUPP
- * should be returned. More information about discard operations at:
+ * underlying block device supports trim (ATA) or unmap (SCSI) operations,
+ * or a BLKIF_RSP_EOPNOTSUPP should be returned.
+ * More information about trim/unmap operations at:
* http://t13.org/Documents/UploadedDocuments/docs2008/
* e07154r6-Data_Set_Management_Proposal_for_ATA-ACS2.doc
+ * http://www.seagate.com/staticfiles/support/disc/manuals/
+ * Interface%20manuals/100293068c.pdf
+ * The backend can optionally provide two extra XenBus attributes to
+ * further optimize the discard functionality:
+ * 'discard-alignment' - Devices that support discard functionality may
+ * internally allocate space in units that are bigger than the exported
+ * logical block size. The discard-alignment parameter indicates how many bytes
+ * the beginning of the partition is offset from the internal allocation unit's
+ * natural alignment.
+ * 'discard-granularity' - Devices that support discard functionality may
+ * internally allocate space using units that are bigger than the logical block
+ * size. The discard-granularity parameter indicates the size of the internal
+ * allocation unit in bytes if reported by the device. Otherwise the
+ * discard-granularity will be set to match the device's physical block size.
*/
#define BLKIF_OP_DISCARD 5
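
For illustration only, and not part of the patch: a minimal sketch of how a frontend might act on the rules described in the comment above. It only attempts discard when the "feature-discard" node exists and is true, and it stops trying once the backend answers BLKIF_RSP_EOPNOTSUPP. The struct fe_discard_state and both fe_discard_* helpers are hypothetical names invented for this sketch; only the BLKIF_RSP_* values come from blkif.h.

```c
#include <stdbool.h>

/* Response codes as defined in xen/include/public/io/blkif.h. */
#define BLKIF_RSP_EOPNOTSUPP (-2)
#define BLKIF_RSP_ERROR      (-1)
#define BLKIF_RSP_OKAY       0

/* Hypothetical frontend-side state; not part of the Xen interface. */
struct fe_discard_state {
    bool enabled; /* is it worthwhile to send BLKIF_OP_DISCARD? */
};

/*
 * node_present: the backend created the "feature-discard" node.
 * node_value:   the boolean stored in that node.
 * A missing node means the backend does not recognise the operation.
 */
static void fe_discard_init(struct fe_discard_state *st,
                            bool node_present, bool node_value)
{
    st->enabled = node_present && node_value;
}

/*
 * Handle the status of a completed discard request.
 * Returns true if the request succeeded. EOPNOTSUPP can arrive at any
 * time, so the frontend stops issuing discards once it sees one.
 */
static bool fe_discard_complete(struct fe_discard_state *st, int status)
{
    if (status == BLKIF_RSP_EOPNOTSUPP) {
        st->enabled = false;
        return false;
    }
    return status == BLKIF_RSP_OKAY;
}
```

A caller would check st.enabled before queueing a discard and pass each response status to fe_discard_complete(). A plain BLKIF_RSP_ERROR leaves discard enabled in this sketch, which is a design choice rather than something the interface mandates.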




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: [PATCH 2 of 3] interface: Flesh out the BLKIF_OP_DISCARD description
Thanks for splitting these out.

On Wed, 2011-10-12 at 23:12 +0100, Konrad Rzeszutek Wilk wrote:
[...]
> + * The backend can optionally provide two extra XenBus attributes to
> + * further optimize the discard functionality:
> + * 'discard-alignment' - Devices that support discard functionality may
> + * internally allocate space in units that are bigger than the exported
> + * logical block size. The discard-alignment parameter indicates how many bytes
> + * the beginning of the partition is offset from the internal allocation unit's
> + * natural alignment.

So this is to account for the case where a physical device can discard
e.g. 128K blocks at a time but the VBD (a better term than "partition"
in the context, I think) starts at e.g. offset 64K within that
underlying device?

Does this mean that the virtual device can discard the first 64K (and
thereafter in 128K chunks), or that it cannot because that would overlap
the first 64K of that block which belongs to something else? Or that it
can try but it may or may not succeed. What about if the secure flag is
set?

Could we simplify and say that blkback won't expose discard support
unless the underlying block device is correctly aligned for it? i.e.
encourage people to align their underlying storage correctly? Presumably
doing that has other benefits?

> + * 'discard-granularity' - Devices that support discard functionality may
> + * internally allocate space using units that are bigger than the logical block
> + * size. The discard-granularity parameter indicates the size of the internal
> + * allocation unit in bytes if reported by the device. Otherwise the
> + * discard-granularity will be set to match the device's physical block size.

This is effectively the minimum size you can discard? (modulo the
sub-block at the front arising from discard-alignment).

Presumably the granularity-sized blocks are self-aligned to that same
size? (again modulo the sub-block at the beginning).

Would there be any benefit to having both these numbers in logical-block
sized units instead of bytes? The rest of the interface typically uses
sectors/segments.

Ian.

> */
> #define BLKIF_OP_DISCARD 5
>
>
>



Re: [PATCH 2 of 3] interface: Flesh out the BLKIF_OP_DISCARD description
On Thu, Oct 13, 2011 at 09:00:07AM +0100, Ian Campbell wrote:
> Thanks for splitting these out.
>
> On Wed, 2011-10-12 at 23:12 +0100, Konrad Rzeszutek Wilk wrote:
> [...]
> > + * The backend can optionally provide two extra XenBus attributes to
> > + * further optimize the discard functionality:
> > + * 'discard-alignment' - Devices that support discard functionality may
> > + * internally allocate space in units that are bigger than the exported
> > + * logical block size. The discard-alignment parameter indicates how many bytes
> > + * the beginning of the partition is offset from the internal allocation unit's
> > + * natural alignment.
>

[note: I copied the Documentation/ABI/testing/sysfs-block contents]

> So this is to account for the case where a physical device can discard
> e.g. 128K blocks at a time but the VBD (a better term than "partition"
> in the context, I think) starts at e.g. offset 64K within that
> underlying device?

Yes. And the tools, such as 'fdisk/gparted' can take advantage of that
and create the partitions^H^H^VBDs at the proper spots.

>
> Does this mean that the virtual device can discard the first 64K (and
> thereafter in 128K chunks), or that it cannot because that would overlap
> the first 64K of that block which belongs to something else? Or that it
> can try but it may or may not succeed. What about if the secure flag is
> set?

They are all "best try, but we might fail."
>
> Could we simplify and say that blkback won't expose discard support
> unless the underlying block device is correctly aligned for it? i.e.

I am not sure how we would do that. The discard support works for
full devices, not LVMs, not partitions. So if the user does not
set up the partitions correctly, it will try to discard but not do a very
good job.

The current way that Linux reports that the alignment is off is
by exporting the discard-alignment flag as -1 if it is improperly aligned
(/sys/block/sda/discard_alignment).

> encourage people to align their underlying storage correctly? Presumably
> doing that has other benefits?

It does that automatically if the user uses the newer tools
like parted/fdisk.
>
> > + * 'discard-granularity' - Devices that support discard functionality may
> > + * internally allocate space using units that are bigger than the logical block
> > + * size. The discard-granularity parameter indicates the size of the internal
> > + * allocation unit in bytes if reported by the device. Otherwise the
> > + * discard-granularity will be set to match the device's physical block size.
>
> This is effectively the minimum size you can discard? (modulo the
> sub-block at the front arising from discard-alignment).

Yes.
>
> Presumably the granularity-sized blocks are self-aligned to that same
> size? (again modulo the sub-block at the beginning).

Yes.

>
> Would there be any benefit to having both these numbers in logical-block
> sized units instead of bytes? The rest of the interface typically uses
> sectors/segments.

Uhh, I would prefer not to - as we would have to convert those values
back to bytes when providing them to the block API. And the backend would
have to convert from bytes to sectors/segments again.

But this got me thinking - I don't think we actually figure out the
correct block size. Meaning we just hard-code 512. But then I am not
sure what Linux is doing either:

scsi 2:0:0:0: Direct-Access ATA INTEL SSDSA2M080 2CV1 PQ: 0 ANSI: 5
sd 2:0:0:0: [sda] 156301488 512-byte logical blocks: (80.0 GB/74.5 GiB)
sd 2:0:0:0: Attached scsi generic sg0 type 0
scsi 3:0:0:0: Direct-Access ATA ST3250410AS 3.AA PQ: 0 ANSI: 5
sd 3:0:0:0: [sdb] 488397168 512-byte logical blocks: (250 GB/232 GiB)
sd 3:0:0:0: Attached scsi generic sg1 type 0

And logical_block_size is 512, discard_granularity is 512, and
discard_alignment is zero.
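
To make the 128K/64K case discussed above concrete, here is a minimal sketch, not taken from blkback or blkfront, of trimming a byte range down to whole allocation units and converting the result to sectors. It assumes that allocation-unit boundaries fall at byte offsets b (relative to the start of the VBD) where b % granularity == alignment, and that a sector is the hard-coded 512 bytes mentioned above. struct discard_params, discard_trim_range() and bytes_to_sectors() are hypothetical names used only here.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* assumption: the hard-coded 512-byte sector */

/* Hypothetical holder for the two optional backend attributes (bytes). */
struct discard_params {
    uint64_t granularity; /* 'discard-granularity' */
    uint64_t alignment;   /* 'discard-alignment'   */
};

static inline uint64_t bytes_to_sectors(uint64_t bytes)
{
    return bytes / SECTOR_SIZE;
}

/*
 * Shrink [start, start + len) to the largest sub-range that begins and
 * ends on an allocation-unit boundary. Returns false when the range
 * does not cover even one whole unit. Overflow of start + len is not
 * checked in this sketch.
 */
static bool discard_trim_range(const struct discard_params *p,
                               uint64_t start, uint64_t len,
                               uint64_t *out_start, uint64_t *out_len)
{
    uint64_t gran = p->granularity ? p->granularity : SECTOR_SIZE;
    uint64_t align = p->alignment % gran;
    uint64_t end = start + len;
    uint64_t rem = start % gran;
    uint64_t first, last;

    /* Round start up to the next boundary (b % gran == align). */
    if (rem <= align)
        first = start + (align - rem);
    else
        first = start + (gran - rem) + align;

    /* Round end down to the previous boundary. */
    if (end < align)
        return false;
    last = end - ((end - align) % gran);

    if (last <= first)
        return false;

    *out_start = first;
    *out_len = last - first;
    return true;
}
```

With granularity 131072 and alignment 65536, a request covering the first 512K of the VBD is trimmed to start at byte 65536 (sector 128) and span 393216 bytes, that is three whole 128K units. The leading 64K sub-block is left to the "best try, but we might fail" behaviour described above.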

Re: [PATCH 2 of 3] interface: Flesh out the BLKIF_OP_DISCARD description
On Thu, Oct 13, 2011 at 09:00:07AM +0100, Ian Campbell wrote:
> Thanks for splitting these out.
>
> On Wed, 2011-10-12 at 23:12 +0100, Konrad Rzeszutek Wilk wrote:
> [...]
> > + * The backend can optionally provide two extra XenBus attributes to
> > + * further optimize the discard functionality:
> > + * 'discard-alignment' - Devices that support discard functionality may
> > + * internally allocate space in units that are bigger than the exported
> > + * logical block size. The discard-alignment parameter indicates how many bytes
> > + * the beginning of the partition is offset from the internal allocation unit's
> > + * natural alignment.
>
> So this is to account for the case where a physical device can discard
> e.g. 128K blocks at a time but the VBD (a better term than "partition"
> in the context, I think) starts at e.g. offset 64K within that
> underlying device?
>
> Does this mean that the virtual device can discard the first 64K (and
> thereafter in 128K chunks), or that it cannot because that would overlap

[edit: I don't think I answered this question]
Yes.
> the first 64K of that block which belongs to something else? Or that it
> can try but it may or may not succeed. What about if the secure flag is

[PATCH 2 of 3] interface: Flesh out the BLKIF_OP_DISCARD description
# HG changeset patch
# User Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
# Date 1318457227 14400
# Node ID 14793d6c4adb38cc57a6d55a8907a18d5ca18634
# Parent 92e266ce9c1c1600998aeacee2996b8de2a8743e
interface: Flesh out the BLKIF_OP_DISCARD description.

We flesh out details on what is expected of 'feature-discard' and
what are some of the extra parameters that the frontend can read
from the backend. Those extra parameters are: discard-alignment
and discard-granularity.

Acked-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff -r 92e266ce9c1c -r 14793d6c4adb xen/include/public/io/blkif.h
--- a/xen/include/public/io/blkif.h Wed Oct 26 17:00:51 2011 -0400
+++ b/xen/include/public/io/blkif.h Wed Oct 12 18:07:07 2011 -0400
@@ -83,22 +83,43 @@
#define BLKIF_OP_RESERVED_1 4
/*
* Recognised only if "feature-discard" is present in backend xenbus info.
- * The "feature-discard" node contains a boolean indicating whether discard
- * requests are likely to succeed or fail. Either way, a discard request
+ * The "feature-discard" node contains a boolean indicating whether trim
+ * (ATA) or unmap (SCSI) - conveniently called discard requests are likely
+ * to succeed or fail. Either way, a discard request
* may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by
* the underlying block-device hardware. The boolean simply indicates whether
* or not it is worthwhile for the frontend to attempt discard requests.
* If a backend does not recognise BLKIF_OP_DISCARD, it should *not*
* create the "feature-discard" node!
- *
+ *
* Discard operation is a request for the underlying block device to mark
- * extents to be erased. Discard operations are passed with sector_number as the
+ * extents to be erased. However, discard does not guarantee that the blocks
+ * will be erased from the device - it is just a hint to the device
+ * controller that these blocks are no longer in use. What the device
+ * controller does with that information is left to the controller.
+ * Discard operations are passed with sector_number as the
* sector index to begin discard operations at and nr_sectors as the number of
* sectors to be discarded. The specified sectors should be discarded if the
- * underlying block device supports discard operations, or a BLKIF_RSP_EOPNOTSUPP
- * should be returned. More information about discard operations at:
+ * underlying block device supports trim (ATA) or unmap (SCSI) operations,
+ * or a BLKIF_RSP_EOPNOTSUPP should be returned.
+ * More information about trim/unmap operations at:
* http://t13.org/Documents/UploadedDocuments/docs2008/
* e07154r6-Data_Set_Management_Proposal_for_ATA-ACS2.doc
+ * http://www.seagate.com/staticfiles/support/disc/manuals/
+ * Interface%20manuals/100293068c.pdf
+ * The backend can optionally provide these extra XenBus attributes to
+ * further optimize the discard functionality:
+ * 'discard-alignment' - Devices that support discard functionality may
+ * internally allocate space in units that are bigger than the exported
+ * logical block size. The discard-alignment parameter indicates how many bytes
+ * the beginning of the partition is offset from the internal allocation unit's
+ * natural alignment. Do not confuse this with natural disk alignment offset.
+ * 'discard-granularity' - Devices that support discard functionality may
+ * internally allocate space using units that are bigger than the logical block
+ * size. The discard-granularity parameter indicates the size of the internal
+ * allocation unit in bytes if reported by the device. Otherwise the
+ * discard-granularity will be set to match the device's physical block size.
+ * It is the minimum size you can discard.
*/
#define BLKIF_OP_DISCARD 5



