Mailing List Archive

Should raw I/O be added to the kernel?
I'm not a kernel hacker, but have been following this discussion with
some interest. This sounds like a clear philosophical distinction that
has some merit. Why should raw I/O functionality be added to the kernel
when device I/O can be better implemented in other functions of the OS?
Many of us benefit from the fact that Linux is faster and more stable than
other OS's which employ raw I/O.
I am reminded of the catch-22 that results from loading products like
Direct-X to a M$ environment. Enabling direct access to hardware often
results in instability and system crashes, which anyone who loads the
above product on a Windows 9x box can testify to after a program in
userspace brings the whole system down, ad infinitum, ad nauseam.
We don't want to sacrifice the distinctions that make Linux better than
other OS's for the sake of specific user needs; it makes sense to me that
certain types of development may take longer [i.e. the development of an
appropriate high-performance filesystem for Video I/O] but may be more
successful than they would have been had raw I/O been introduced into the
mainstream kernel. Then we would have applications crashing the system
as currently happens with many other OS's.
On Tue, 15 Dec 1998, Linus Torvalds wrote:
>
>
> On Sat, 12 Dec 1998, Alan Cox wrote:
> >
> > Frame capture into a file is uninteresting for serious video people because
> > a) we have this toy computer 2Gig file limit[1] and b) performance is a colossal
> > issue. a) is easy to work around [apple worked around it nicely with Qt,
> > intel forgot] b) is a problem. Video people really don't care about throwing
> > an 80Gig FCAL array at a video problem. No file system who cares - saves fscking
>
> Alan, let me clue you in: we're going to be living in the 21st century in
> not too long.
>
> We're going to have mom-and-pop users who want to capture their
> grandchildren in HDTV on their computer from their camera. And yes, their
> computers _are_ going to be able to handle it, and the 2G file limit won't
> be there.
>
> Anybody who seriously thinks that raw device access is worthwhile had
> better think again. It's not. It's a special case thing that will never be
> acceptable to any real target audience.
>
> Right now you might be able to do it on current hardware only with raw
> device access, but designing for it is a piece of shit design.
>
> > We need memory locking for stuff like video capture DMA. The demonstration
> > HDTV chipsets want capture to DMA 1600x1200x24bit data to memory targets.
> > The existing bttv sick 'vmalloc and look the other way' approach isnt as
> > good as locking pages for this (Disk I/O issues aside)
>
> Umm.. That's what I said. You can lock down pages _easily_ in the page
> cache. You just increment their usage count.
>
> And that has absolutely NOTHING to do with raw device access. What you do
> is you make "sendfile()" work for the "copy to page cache" case too.
>
> Linus
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/
>
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
Re: Should raw I/O be added to the kernel?
> other OS's for the sake of specific user needs; it makes sense to me that
> certain types of development may take longer [i.e. the development of an
> appropriate high-performance filesystem for Video I/O] but may be more
> successful than they would have been had raw I/O been introduced into the
> mainstream kernel. Then we would have applications crashing the system
> as currently happens with many other OS's.
The point of the raw I/O (be it sendfile, O_DIRECT or otherwise) is to provide
hooks that allow programs to safely access I/O without some of the existing
overhead in specific cases where the program knows best.
It's not about giving direct access to the hardware. O_DIRECT says "This isn't
worth caching, and please get it here efficiently". It doesn't say "excuse me
I want to take over the disk controller".
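A minimal sketch of what "this isn't worth caching" looks like from user
space, assuming a Linux system where Python exposes os.O_DIRECT. The helper
name write_uncached and the 4096-byte block size are illustrative
assumptions, not anything taken from the kernel. The request still goes
through the normal open()/write() path and the driver; the program only
states a caching preference. If the filesystem refuses O_DIRECT, the sketch
falls back to an ordinary buffered write.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; the real requirement depends on the device


def write_uncached(path, payload):
    """Write one zero-padded block, bypassing the page cache when possible.

    Returns True if O_DIRECT was used, False if the sketch fell back to an
    ordinary buffered write. Short writes are not handled in this sketch.
    """
    if len(payload) > BLOCK:
        raise ValueError("this sketch handles a single block only")
    buf = mmap.mmap(-1, BLOCK)  # anonymous mappings are page-aligned
    try:
        buf[:len(payload)] = payload  # the rest of the block stays zeroed
        base = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
        direct = getattr(os, "O_DIRECT", 0)  # absent on some platforms
        attempts = [base | direct, base] if direct else [base]
        for flags in attempts:
            try:
                fd = os.open(path, flags, 0o600)
            except OSError:
                continue  # e.g. the filesystem refuses O_DIRECT
            try:
                os.write(fd, buf)
                return flags != base
            except OSError:
                continue  # e.g. alignment rejected; try the next variant
            finally:
                os.close(fd)
        raise OSError("could not write " + path)
    finally:
        buf.close()
```

Note that nothing here touches the disk controller: the only difference
from a normal write is one extra flag and an aligned buffer.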
Alan
Re: Should raw I/O be added to the kernel?
On Wed, 16 Dec 1998, Glendon Gross wrote:
> I am reminded of the catch-22 that results from loading products like
> Direct-X to a M$ environment. Enabling direct access to hardware often
> results in instability and system crashes, which anyone who loads the
> above product on a Windows 9x box can testify to after a program in
> userspace brings the whole system down, ad infinitum, ad nauseam.
Raw IO isn't direct access to hardware, it's direct access to the hardware
_driver_. A user program with permission to access a raw device won't be
able to cause system instability any more than a user program with
permission to access a file. Indeed, the file access case has many more
layers of code to go through.
The classic example of allowing userspace direct access to hardware is X -
another lovely can of worms.
Chris
Re: Should raw I/O be added to the kernel?
On Fri, 18 Dec 1998, Chris Evans wrote:
>
> Raw IO isn't direct access to hardware, it's direct access to the hardware
> _driver_. A user program with permission to access a raw device won't be
> able to cause system instability any more than a user program with
> permission to access a file. Indeed, the file access case has many more
> layers of code to go through.
It's a classic mistake to think that this means that direct driver access
is faster.
The thing is, raw IO not only means avoiding the filesystem overhead,
it _also_ means that the kernel no longer has any control over the
requests. And that in turn means that suddenly the driver has to take
cases into account that could never happen before.
Basically, raw disk IO is _not_ necessarily noticeably faster at all. The
filesystem layer is fairly lightweight and optimized.
The only argument for raw disk IO is the caching policy issue, and I very
strongly agree with Ingo that we should consider it as such, and not get
dragged into the rathole of thinking that raw devices are somehow worth it
for some other reason.
Linus
Re: Should raw I/O be added to the kernel?
On Fri, 18 Dec 1998, Linus Torvalds wrote:
> The only argument for raw disk IO is the caching policy issue, and I very
> strongly agree with Ingo that we should consider it as such, and not get
> dragged into the rathole of thinking that raw devices are somehow worth it
> for some other reason.
Raw-io is basically a solution to a problem. As with most solutions there
are advantages and disadvantages. One of the main advantages I see is that
it is a by now well known solution to one set of problems. Perhaps someone
with a deep understanding of the kernel and the problems should take a
step back and look at the problems instead of the various solutions
proposed.
The database problem is one I find the most interesting. Using raw io
basically means we cannot find an abstracted way of describing what we
require of the system, so we do everything by ourselves. Of course, that way
anything can be done.
Peter
--
Peter Svensson ! Pgp key available by finger, fingerprint:
<petersv@psv.nu> ! 8A E9 20 98 C1 FF 43 E3 07 FD B9 0A 80 72 70 AF
<petersv@df.lth.se> !
------------------------------------------------------------------------
Remember, Luke, your source will be with you... always...
Re: Should raw I/O be added to the kernel?
Linus Torvalds <torvalds@transmeta.com> writes:
>The only argument for raw disk IO is the caching policy issue
Exactly right. Without raw disk IO, you couldn't guarantee a database
to always be in a consistent state on the disk (i.e. it isn't very
desirable to have a committed transaction in a cache when the pc loses
power). It makes it impossible to recover the database. And what
good is having a transaction oriented database on Linux in a
production environment if you can't be guaranteed to recover it after
a crash?
From my days at Unify a decade ago, you had to use raw disk IO because
fsync() wasn't guaranteed to have flushed the cache completely to the
disk before it returned. Of course, I'll be the first to admit that I
haven't checked out the kernel source to see if this is true for
Linux. But, I'm sure that someone will so that the database part of
this thread can be put to rest.
--
Forte International, P.O. Box 1412, Ridgecrest, CA 93556-1412
Ronald Cole <ronald@forte-intl.com> Phone: (760) 499-9142
President, CEO Fax: (760) 499-9152
My PGP fingerprint: 15 6E C7 91 5F AF 17 C4 24 93 CB 6B EB 38 B5 E5
Re: Should raw I/O be added to the kernel?
Well,
I believe the CodaFS managed to get past this problem using
the RVM .. Recoverable Virtual Memory. Shrug.
On Sun, Dec 20, 1998 at 10:56:48PM -0800, Ronald Cole wrote:
> Linus Torvalds <torvalds@transmeta.com> writes:
> >The only argument for raw disk IO is the caching policy issue
>
> Exactly right. Without raw disk IO, you couldn't guarantee a database
> to always be in a consistent state on the disk (i.e. it isn't very
> desirable to have a committed transaction in a cache when the pc loses
> power). It makes it impossible to recover the database. And what
> good is having a transaction oriented database on Linux in a
> production environment if you can't be guaranteed to recover it after
> a crash?
>
> From my days at Unify a decade ago, you had to use raw disk IO because
> fsync() wasn't guaranteed to have flushed the cache completely to the
> disk before it returned. Of course, I'll be the first to admit that I
> haven't checked out the kernel source to see if this is true for
> Linux. But, I'm sure that someone will so that the database part of
> this thread can be put to rest.
>
> --
> Forte International, P.O. Box 1412, Ridgecrest, CA 93556-1412
> Ronald Cole <ronald@forte-intl.com> Phone: (760) 499-9142
> President, CEO Fax: (760) 499-9152
> My PGP fingerprint: 15 6E C7 91 5F AF 17 C4 24 93 CB 6B EB 38 B5 E5
>
--
"Reality is what you can get away with!"
++Robert Anton Wilson
Major'Trips'
E-Mail : shadow@cyberwizards.com || major@jimco-fwt.com
Re: Should raw I/O be added to the kernel?
On Sun, Dec 20, 1998 at 10:56:48PM -0800, Ronald Cole wrote:
> Linus Torvalds <torvalds@transmeta.com> writes:
> >The only argument for raw disk IO is the caching policy issue
>
> Exactly right. Without raw disk IO, you couldn't guarantee a database
> to always be in a consistent state on the disk (i.e. it isn't very
> desirable to have a committed transaction in a cache when the pc loses
> power). It makes it impossible to recover the database. And what
So you better not use any drive or controller that does any buffering or
reordering. Don't confuse mechanism and policy.
---------------------------------
Victor Yodaiken
Department of Computer Science
New Mexico Institute of Mining and Technology
Socorro NM 87801
Homepage http://www.cs.nmt.edu/~yodaiken
PowerPC Linux page http://linuxppc.cs.nmt.edu
Real-Time Page http://rtlinux.org
Re: Should raw I/O be added to the kernel?
On Sun, 20 Dec 1998, Ronald Cole wrote:
> Linus Torvalds <torvalds@transmeta.com> writes:
> >The only argument for raw disk IO is the caching policy issue
>
> Exactly right. Without raw disk IO, you couldn't guarantee a database
> to always be in a consistent state on the disk (i.e. it isn't very
> desirable to have a committed transaction in a cache when the pc loses
> power). It makes it impossible to recover the database. And what
> good is having a transaction oriented database on Linux in a
> production environment if you can't be guaranteed to recover it after
> a crash?
If you want performance, then you also want to queue several IOs to the
device at a time whenever possible. Now imagine that you get an IO error
on some IO currently being processed by the device. In SCSI you will get
an ACA condition if the device supports it. If you really want to stay
consistent, then you want to try to recover from this error.
Generally, the application does not know at which level (driver level,
controller level, device level, ...) each pending IO is currently queued.
Having a consistent view of the ordering of IOs from the application is
generally not possible with existing systems in case of IO error.
On the other hand, hard drives may have the QUEUE ALGORITHM MODIFIER set
to 1. That means that the application client does not require the device
to take care of ordering for pending SIMPLE TAGGED IOs that overlap.
Database managers probably only queue IOs that do not conflict, as do most
kernels. And hard disks may have write caching enabled without the
application (the database manager) knowing about it ...
BTW, I could also mention RAID arrays and especially software RAID, but
this would take too long ...
Could you tell us how existing database managers guarantee consistency
in case of IO error?
> From my days at Unify a decade ago, you had to use raw disk IO because
> fsync() wasn't guaranteed to have flushed the cache completely to the
> disk before it returned. Of course, I'll be the first to admit that I
> haven't checked out the kernel source to see if this is true for
> Linux. But, I'm sure that someone will so that the database part of
> this thread can be put to rest.
The fact that the THING you are raw-ioing :) returns good status does not
guarantee that the data has been successfully written to the media when
some kind of write caching is performed by the raw-ioed THING. You may get
deferred errors in such a situation if some error occurs and you have
several IOs queued to the THING at this time.
You wrote:
> Exactly right. Without raw disk IO, you couldn't guarantee a database
> to always be in a consistent state on the disk (i.e. it isn't very
My reply is that even with raw disk IO, it is not that easy to _really_
guarantee such consistency without being _really_ aware of what may
_really_ happen with _real_ IOs, and that _real_ IO system services
are generally too poor to allow full control over what _really_ happens
with IOs.
I would be glad if Linux provided DIRECT IO from user space. This would
be helpful, IMO, for some applications that would benefit from it. But
the concept of RAW disk IO is extremely _poor_ in my opinion, and at the
moment it is only useful for implementing file systems in user-land, as
it seems database managers do.
One of the reasons Linux does not have raw IO is that Linux does not use
the KVA as a virtual mapping for supplying virtually contiguous buffers to
drivers, but constructs scatterlists directly from kernel buffers and
supplies scatterlists to drivers. And under Linux, only the initial kernel
memory mapping is IOable (e.g. it allows use of virt_to_bus() address
translation to provide memory bus addresses to IO controllers for DMA).
This design choice made raw IO far from simple to implement under Linux,
despite the fact that it seemed trivial to implement on some other
O/Ses.
Regards,
Gerard.
Re: Should raw I/O be added to the kernel?
   From: Ronald Cole <ronald@forte-intl.com>
   Date: Sun, 20 Dec 1998 22:56:48 -0800 (PST)
   From my days at Unify a decade ago, you had to use raw disk IO because
   fsync() wasn't guaranteed to have flushed the cache completely to the
   disk before it returned. Of course, I'll be the first to admit that I
   haven't checked out the kernel source to see if this is true for
   Linux. But, I'm sure that someone will so that the database part of
   this thread can be put to rest.
As far as I know, POSIX has always required that fsync() return when the
cache was flushed to disk, and I didn't think there were any fsync()
implementations that didn't follow this rule.
There were OS's that didn't have fsync(), however, and historically
sync() was never guaranteed to have completed its work before
returning. Perhaps that's what you were thinking of?
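A minimal sketch of the distinction, assuming a POSIX system and Python's
os module; the helper name durable_write is hypothetical. The write is only
reported as done after fsync() on that descriptor has returned, which is
the behaviour described above. A bare sync() call would not give the caller
the same per-file guarantee on the historical systems mentioned.

```python
import os


def durable_write(path, data):
    """Write data to path and return only after fsync() has succeeded."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may be a short write
            view = view[written:]
        # Blocks until the kernel reports this file's data as written out.
        # An OSError raised here means the data must not be treated as safe.
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(data)
```

Whether the drive itself has put the data on the media is a separate
question that fsync() alone does not answer.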
- Ted
Re: Should raw I/O be added to the kernel?
"Theodore Y. Ts'o" <tytso@MIT.EDU> sayeth:
: As far as I know, POSIX has always required that fsync() return when the
: cache was flushed to disk, and I didn't think there were any fsync()
: implementations that didn't follow this rule.
Part of the problem is that OS implementations have sometimes forgotten to
sync associated metadata as part of fsync(). Given that people may have
been bitten by this in the past, they don't always trust fsync() to do the
right thing.
In the raw I/O case, the application is managing the meta data so it
can do whatever it wants.
Re: Should raw I/O be added to the kernel?
On Mon, 21 Dec 1998, Larry McVoy wrote:
> "Theodore Y. Ts'o" <tytso@MIT.EDU> sayeth:
> : As far as I know, POSIX has always required that fsync() return when the
> : cache was flushed to disk, and I didn't think there were any fsync()
> : implementations that didn't follow this rule.
>
> Part of the problem is that OS implementations have sometimes forgotten to
> sync associated metadata as part of fsync(). Given that people may have
> been bitten by this in the past, they don't always trust fsync() to do the
> right thing.
>
> In the raw I/O case, the application is managing the meta data so it
> can do whatever it wants.
And it doesn't really matter in a number of implementations. For
example, ConvexOS has a patchable variable (defaults to 'true') which
allows all synchronous operations to actually be B_ASYNC.
Re: Should raw I/O be added to the kernel?
On Mon, Dec 21, 1998 at 03:06:26PM -0800, Larry McVoy wrote:
> Part of the problem is that OS implementations have sometimes
> forgotten to sync associated metadata as part of fsync(). Given
> that people may have been bitten by this in the past, they don't
> always trust fsync() to do the right thing.
There is code out there that assumes fsync() flushes all the
meta-data too (I think this is a FFS/BSDism more than anything).
To the best of my knowledge, nothing requires meta-data to be flushed
upon fsync, and it isn't -- applications wanting this behavior will
have to do other things to ensure this.
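One common form of those "other things" is to fsync() the containing
directory as well as the file, so that the new directory entry is flushed
too. This is a sketch under the assumption of a POSIX system where a
directory can be opened read-only and passed to fsync(); the helper name
create_durably is hypothetical.

```python
import os


def create_durably(dirpath, name, data):
    """Create dirpath/name with data, then flush both file and directory."""
    path = os.path.join(dirpath, name)
    with open(path, "xb") as f:  # "x": fail if the file already exists
        f.write(data)
        f.flush()                # push Python's buffer to the kernel
        os.fsync(f.fileno())     # flush the file's data
    dirfd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(dirfd)          # flush the directory entry for the new file
    finally:
        os.close(dirfd)
    return path
```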
-cw
Re: Should raw I/O be added to the kernel?
Hi,
On Mon, 21 Dec 1998 22:33:06 +0100 (MET), Gerard Roudier
<groudier@club-internet.fr> said:
> If you want performance, then you also want to queue several IOs to the
> device at a time, each time it is possible. Imagine now, that you got an
> IO error on some IO currently being processed by the device. ...
>> Exactly right. Without raw disk IO, you couldn't guarantee a database
>> to always be in a consistent state on the disk (i.e. it isn't very
> My reply is that even with raw disk IO, it is not that easy to _really_
> guarantee such a consistency without being _really_ aware of what may
> _really_ happen with _real_ IOs, and that _real_ IO system services
> are generally too poor to allow full control on what _really_ happens
> with IOs.
Yes you can. The way these applications work is to write all of the
data for a commit, and to wait for it to complete before finally writing
a commit record. That way, provided the OS does not acknowledge the
write before it actually reaches the disk, the application can guarantee
enough about the write ordering to be sure about data consistency. Raw
IO, O_SYNC and fsync() provide enough support for the application to get
this right, but fsync() does not provide guaranteed IO error
notification so is not sufficient in the presence of errors.
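The ordering described here can be sketched as follows, assuming an
append-only log file and fsync() as the "wait for it to complete" step. The
record layout (length-prefixed records, a COMMIT marker with a count and a
CRC) and the name commit are illustrative assumptions, not the format of
any real database.

```python
import os
import struct
import zlib

COMMIT_MAGIC = b"COMMIT"


def commit(log_path, records):
    """Append records, wait for them to be durable, then append a commit record."""
    with open(log_path, "ab") as log:
        for rec in records:
            log.write(struct.pack("<I", len(rec)) + rec)
        log.flush()
        os.fsync(log.fileno())  # step 1: all data must be written first
        body = b"".join(records)
        log.write(COMMIT_MAGIC
                  + struct.pack("<II", len(records), zlib.crc32(body)))
        log.flush()
        os.fsync(log.fileno())  # step 2: only now is the transaction committed
```

If either fsync() raises OSError the exception propagates and the caller
must treat the transaction as not committed.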
--Stephen
Re: Should raw I/O be added to the kernel?
On Thu, 21 Jan 1999, Stephen C. Tweedie wrote:
> Hi,
>
> On Mon, 21 Dec 1998 22:33:06 +0100 (MET), Gerard Roudier
> <groudier@club-internet.fr> said:
>
> > If you want performance, then you also want to queue several IOs to the
> > device at a time, each time it is possible. Imagine now, that you got an
> > IO error on some IO currently being processed by the device. ...
>
> >> Exactly right. Without raw disk IO, you couldn't guarantee a database
> >> to always be in a consistent state on the disk (i.e. it isn't very
>
> > My reply is that even with raw disk IO, it is not that easy to _really_
> > guarantee such a consistency without being _really_ aware of what may
> > _really_ happen with _real_ IOs, and that _real_ IO system services
> > are generally too poor to allow full control on what _really_ happens
> > with IOs.
>
> Yes you can. The way these applications work is to write all of the
Providing integrity using only synchronous IO semantics is so costly for
performance that I do not even want to think about such an approach for a
second. By 'full control' I meant full control over the actual IO ordering
requirements needed to maintain integrity and consistency. I am thinking
of an IO sub-system that lets the application state its 'ordering'
requirements for the real IOs. For example, I think of logical IO services
that accept ordering attributes like those available in SCSI physical IO
management, e.g. SIMPLE, ORDERED and HEAD of QUEUE. This would allow the
logical IO layers in the O/S and the device to implement optimizations
_and_ also to provide the ordering required by the application.
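A toy model of such ordering attributes, purely as an illustration: the
Tag enum and the schedule function are hypothetical, and the semantics are
simplified assumptions (SIMPLE requests may be reordered freely between
barriers, an ORDERED request acts as a barrier, HEAD_OF_QUEUE requests run
first with the newest in front). It is not a description of how any real
SCSI device or kernel behaves.

```python
from enum import Enum


class Tag(Enum):
    SIMPLE = "simple"
    ORDERED = "ordered"
    HEAD_OF_QUEUE = "head of queue"


def schedule(requests):
    """requests: list of (Tag, block) in submission order.

    Returns one legal execution order under the simplified rules above.
    """
    head = [r for r in requests if r[0] is Tag.HEAD_OF_QUEUE]
    rest = [r for r in requests if r[0] is not Tag.HEAD_OF_QUEUE]
    out = list(reversed(head))  # assumption: newest head-of-queue goes first
    batch = []
    for req in rest:
        if req[0] is Tag.ORDERED:
            # Everything submitted before the barrier runs before it;
            # within the batch the layer is free to sort by block number.
            out.extend(sorted(batch, key=lambda r: r[1]))
            batch = []
            out.append(req)
        else:
            batch.append(req)
    out.extend(sorted(batch, key=lambda r: r[1]))
    return out
```

The point of the model is that the application states only the constraint
it needs (the ORDERED barrier), and the lower layers stay free to optimize
everything else.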
> data for a commit, and to wait for it to complete before finally writing
> a commit record. That way, provided the OS does not acknowledge the
An IO that has been completed by a device may still be in some device
buffer. If you want a guarantee that the data is really written to the
media, then you also need to give the application control, via the
logical IO layer, over the buffering of the physical data involved in
the logical operations the application has asked for.
> write before it actually reaches the disk, the application can guarantee
> enough about the write ordering to be sure about data consistency. Raw
> IO, O_SYNC and fsync() provide enough support for the application to get
> this right, but fsync() does not provide guaranteed IO error
> notification so is not sufficient in the presence of errors.
Raw IO, O_SYNC and 'but fsync() ;-)' are what is common at the moment on
most UNIX systems for guaranteeing the consistency of data systems. I
understand that one may want to implement boring commercial applications
using those semantics and that it may work well enough for most
applications.
Current database managers, for example, want kernels to be mere wrappers
around physical IO services and memory management, because they want to
have portable code in user land and they have poorly optimized their
software using the commonly available SYNC/RAW IO services. If their
portability issues, probably motivated by some market domination strategy,
lead to such abominations, then I am not interested in their stuff.
Obviously, I think that any O/S should provide some direct IO services,
since there are applications that really need these services in order
to work correctly, or that will make good use of this feature.
But I strongly disagree with the assertion that Raw IO or sync IO is
some panacea for dealing with data consistency. It is just what some
dinosaurs, old-minded or $$-interested people want us to believe, in my
opinion.
Regards,
Gerard.

Re: Should raw I/O be added to the kernel?
Followup to: <199901211735.RAA04836@dax.scot.redhat.com>
By author: "Stephen C. Tweedie" <sct@redhat.com>
In newsgroup: linux.dev.kernel
>
> Yes you can. The way these applications work is to write all of the
> data for a commit, and to wait for it to complete before finally writing
> a commit record. That way, provided the OS does not acknowledge the
> write before it actually reaches the disk, the application can guarantee
> enough about the write ordering to be sure about data consistency. Raw
> IO, O_SYNC and fsync() provide enough support for the application to get
> this right, but fsync() does not provide guaranteed IO error
> notification so is not sufficient in the presence of errors.
>
Actually, it is better than that: as long as the commit record cannot
pass the data records, you can roll back to a consistent state.
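A sketch of the recovery side of that argument, using a hypothetical
line-oriented log ("DATA <payload>" and "COMMIT <count>" lines) chosen only
for illustration. Anything not covered by a matching commit record is
rolled back, which is safe precisely because a commit record is assumed
never to reach the disk ahead of its data records.

```python
def recover(log_bytes):
    """Return the payloads of all fully committed transactions."""
    committed = []
    pending = []
    lines = log_bytes.split(b"\n")
    # The last element is either empty (log ended with a newline) or a
    # torn, partially written line; in both cases it is ignored.
    for line in lines[:-1]:
        kind, _, rest = line.partition(b" ")
        if kind == b"DATA":
            pending.append(rest)
        elif kind == b"COMMIT" and rest == str(len(pending)).encode():
            committed.extend(pending)
            pending = []
        else:
            break  # corrupt or unexpected entry: stop and roll back
    return committed
```

Data records left in pending when the scan ends are simply dropped, which
is the roll back to the last consistent state.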
-hpa
--
PGP: 2047/2A960705 BA 03 D3 2C 14 A8 A8 BD 1E DF FE 69 EE 35 BD 74
See http://www.zytor.com/~hpa/ for web page and full PGP public key
I am Bahá'í -- ask me about it or see http://www.bahai.org/
"To love another person is to see the face of God." -- Les Misérables
Re: Should raw I/O be added to the kernel?
In article <199901211735.RAA04836@dax.scot.redhat.com>,
Stephen C. Tweedie <sct@redhat.com> wrote:
>On Mon, 21 Dec 1998 22:33:06 +0100 (MET), Gerard Roudier
><groudier@club-internet.fr> said:
>> If you want performance, then you also want to queue several IOs to the
>> device at a time, each time it is possible. Imagine now, that you got an
>> IO error on some IO currently being processed by the device. ...
>
>>> Exactly right. Without raw disk IO, you couldn't guarantee a database
>>> to always be in a consistent state on the disk (i.e. it isn't very
>
>> My reply is that even with raw disk IO, it is not that easy to _really_
>> guarantee such a consistency without being _really_ aware of what may
>> _really_ happen with _real_ IOs, and that _real_ IO system services
>> are generally too poor to allow full control on what _really_ happens
>> with IOs.
>
>Yes you can. The way these applications work is to write all of the
No you can't. Suppose you send a bunch of raw writes to a SCSI disk drive.
OK, so the SCSI disk drive queues them in its embedded cache RAM and tells
the host CPU to send more data. Then the power fails before the SCSI drive
can flush its embedded cache.
Oops.
The software (kernel or application or system configuration/install
program) has to know how to tell the drive that it is not to use its
write cache *at all*, or if it is to use it, to use it to perform writes
in *strict* compliance with the order given, and to *not* acknowledge
anything until all writes have been completed. And of course those
settings had better not vanish if the SCSI bus is reset due to say heat
problems.
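The failure mode can be shown with a toy model of a drive, written purely
as an illustration. ToyDrive and its methods are hypothetical; real drives
are configured through their own mode pages and commands, which this
sketch does not attempt to represent.

```python
class ToyDrive:
    """A drive with an optional volatile write cache."""

    def __init__(self, write_cache=True):
        self.write_cache = write_cache
        self.media = {}  # survives power loss
        self.cache = {}  # lost on power loss

    def write(self, block, data):
        """Return True as soon as the drive acknowledges the write."""
        if self.write_cache:
            self.cache[block] = data  # acknowledged, but not yet on media
        else:
            self.media[block] = data  # acknowledged only once on media
        return True

    def flush(self):
        """Move everything in the cache to the media."""
        self.media.update(self.cache)
        self.cache.clear()

    def power_fail(self):
        """Whatever was only in the cache is gone."""
        self.cache.clear()
```

With the cache enabled, a data write and a commit write can both be
acknowledged and both be lost, so the host's careful ordering proves
nothing unless it also controls or flushes the cache.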
There's the possibility of external RAID devices that will undo all that
work for you by doing buffering and ACK's by themselves, then turning
around to talk to disks with data in cache.
Some of the more exotic storage types will actually put several disk
blocks in jeopardy when rewriting only one or two, in the name of
enhanced error correction capability or some such thing. If the kernel
orders a 512-byte sector to be updated to the hardware, will the drive
overwrite only that sector, or will it rewrite the entire track with
new Reed-Solomon codes or some such thing? In other words, if the power
fails, is one sector trashed or an entire cylinder?
And of course we assume here that the disk won't lose its marbles if
hardware failure occurs during write; what if a power problem starts a head
moving across the disk platter with the write head engaged?
Oops.
I won't mention hard disk firmware bugs. <shiver> Let's not go there.
Assuming you can herd these cats, then everything you say is true.
But that's a big assumption and it's not something you can say without
carefully reading the labels on all the boxes. Raw disk I/O is probably
necessary but definitely not sufficient.
--
Zygo Blaxell (with a name like that, who needs a nick?)
Linux Engineer (my favorite official job title so far)
Corel Corporation (whose opinions sometimes differ from those shown above)
zygob@corel.ca (also zblaxell@furryterror.org)
Re: Should raw I/O be added to the kernel?
   From: uixjjji1@umail.furryterror.org (Zygo Blaxell)
   Date: 21 Jan 1999 23:38:15 -0500
   Then the power fails before the SCSI drive
   can flush its embedded cache.
   Oops.
Oops, but only for the drive manufacturer. Last I checked, nearly all
SCSI disk manufacturers that used a cache design where this would
matter have a mechanism which keeps enough charge around that, in
the event of a power loss to the disk, the cache will be flushed in
time within some large margin of error.
Later,
David S. Miller
davem@dm.cobaltmicro.com
Re: Should raw I/O be added to the kernel?
Hi,
On Thu, 21 Jan 1999 21:39:26 +0100 (MET), Gerard Roudier
<groudier@club-internet.fr> said:
> Providing integrity using only synchronous IO semantics is so costly for
> performance that I do not even want to think about such an approach for a
> second.
Interesting to hear you say this, since that is _precisely_ what you get
if you are running a large DB like Oracle or Informix on raw devices.
All raw devices are _necessarily_ synchronous, and yet these devices are
suggested as a means of improving performance.
> By 'full control' I meant full control over the actual IO ordering
> requirements needed to maintain integrity and consistency. I am thinking
> of an IO sub-system that lets the application state its 'ordering'
> requirements for the real IOs.
Absolutely, there's no doubt about this. The only problem is that the
Unix API does not allow it. The nearest we can get is asynchronous IO
(using posix.4 libaio) along with synchronous status returns: in other
words, the writes are asynchronous but the completion status is not
returned until the data has positively hit disk.
--Stephen
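
To make the "asynchronous writes, synchronous status" idea concrete, here is a minimal sketch. It is not the POSIX AIO interface mentioned above: it uses a thread pool as a stand-in, and the names `submit_writes` and `_write_durably` are made up for illustration. The caller queues several writes without blocking, but each completion status becomes available only after `fsync()` has returned for that write.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _write_durably(path, offset, data):
    """Write data at offset, then block until the kernel reports it flushed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        written = os.pwrite(fd, data, offset)  # positioned write, no shared seek pointer
        os.fsync(fd)                           # completion is withheld until this returns
        return written
    finally:
        os.close(fd)


def submit_writes(executor, path, chunks):
    """Queue every (offset, data) pair and return the futures immediately.

    The submission is asynchronous; future.result() is the synchronous
    status return and raises OSError if the write or the flush failed.
    """
    return [executor.submit(_write_durably, path, off, data) for off, data in chunks]
```

A caller would collect `future.result()` for every queued write before treating any of them as stable. Note that this gives no ordering between the queued writes, which is exactly the limitation being discussed.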
Re: Should raw I/O be added to the kernel? [ In reply to ]
Hi,
"Stephen C. Tweedie" wrote:
> Interesting to hear you say this, since that is _precisely_ what you get
> if you are running a large DB like Oracle or Informix on raw devices.
> All raw devices are _necessarily_ synchronous, and yet these devices are
> suggested as a means of improving performance.
NO! Not to improve performance but to improve recoverability of databases.
One will get 2 - 3 times better performance on a filesystem but we can't
guarantee that the write has been completed.
Jason
--
Jason Froebe, Technical Support Engineer
Sybase, Inc. (jfroebe@sybase.com)
Re: Should raw I/O be added to the kernel? [ In reply to ]
I would really like raw I/O to make it into the Linux kernel. Not for
consistency on a raw partition, but for speed through a filesystem.
I can get ~40 MB/s off of an 8-way striped FC array through GFS (on Linux).
On an SGI O2, I can get ~80 MB/s with the same array when using Direct I/O
(their version of raw I/O). I believe a big part of the difference in speeds
is the memory copy from the buffer cache to user memory.
I would really like a good interface that would allow a filesystem to bypass
the buffer cache on big file data.
Ken Preslan
(GFS web page: http://gfs.lcse.umn.edu)
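
For readers unfamiliar with what "bypassing the buffer cache" looks like from user space, here is a rough sketch using the `O_DIRECT` open flag found on later Linux systems. This is an assumption for illustration, not the SGI Direct I/O interface or anything GFS provided at the time. The 4096-byte `BLOCK` size and the function name `read_direct` are also assumptions; real alignment requirements depend on the device and filesystem.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment unit; the real requirement varies by device


def _read_into(path, flags, buf):
    """Open path with the given flags and issue one read into buf."""
    fd = os.open(path, flags)
    with os.fdopen(fd, "rb", buffering=0) as f:  # unbuffered: no extra Python-side copy
        return f.readinto(buf)


def read_direct(path, length):
    """Read up to `length` bytes (a multiple of BLOCK), bypassing the cache if possible."""
    if length <= 0 or length % BLOCK:
        raise ValueError("length must be a positive multiple of BLOCK")
    buf = mmap.mmap(-1, length)  # anonymous mapping, so the buffer is page-aligned
    try:
        try:
            # O_DIRECT asks the kernel to transfer straight into our buffer.
            n = _read_into(path, os.O_RDONLY | getattr(os, "O_DIRECT", 0), buf)
        except OSError:
            # Some filesystems refuse O_DIRECT; fall back to a cached read.
            n = _read_into(path, os.O_RDONLY, buf)
        return buf[:n]
    finally:
        buf.close()
```

The sketch issues a single read and may return fewer bytes than requested. The point is only that the transfer targets an aligned user buffer rather than being copied out of the buffer cache.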
>
> Hi,
>
> On Mon, 21 Dec 1998 22:33:06 +0100 (MET), Gerard Roudier
> <groudier@club-internet.fr> said:
>
> > If you want performances, then you also want to queue several IOs to the
> > device at a time, each time it is possible. Imagine now, that you got an
> > IO error on some IO currently being processed by the device. ...
>
> >> Exactly right. Without raw disk IO, you couldn't guarantee a database
> >> to always be in a consistent state on the disk (i.e. it isn't very
>
> > My reply is that even with raw disk IO, it is not that easy to _really_
> > guarantee such a consistency without being _really_ aware of what may
> > _really_ happen with _real_ IOs, and that _real_ IO system services
> > are generally too poor to allow full control on what _really_ happens
> > with IOs.
>
> Yes you can. The way these applications work is to write all of the
> data for a commit, and to wait for it to complete before finally writing
> a commit record. That way, provided the OS does not acknowledge the
> write before it actually reaches the disk, the application can guarantee
> enough about the write ordering to be sure about data consistency. Raw
> IO, O_SYNC and fsync() provide enough support for the application to get
> this right, but fsync() does not provide guaranteed IO error
> notification so is not sufficient in the presence of errors.
>
> --Stephen
Re: Should raw I/O be added to the kernel? [ In reply to ]
On Fri, Jan 22, 1999 at 05:31:53PM -0600, Jason Froebe wrote:
> NO! Not to improve performance but to improve recoverability of
> databases. One will get 2 - 3 times better performance on a
> filesystem but we can't guarantee that the write has been
> completed.
Yes you can -- either using O_SYNC or fsync(). Sure, the data is still
copied to the buffer cache, so in theory it could be faster if it were
direct, but either of the above will guarantee (as far as the kernel can)
that the data has been written to persistent storage.
-cw
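
A minimal sketch of the two options named above, assuming a Unix-like system. The function names are made up for illustration. Both variants still go through the buffer cache; they differ only in when the caller waits for the data to reach the device.

```python
import os


def write_all(fd, data):
    """Loop until every byte is handed to the kernel (write() may be partial)."""
    view = memoryview(data)
    while len(view) > 0:
        view = view[os.write(fd, view):]


def write_with_fsync(path, data):
    """Buffered write, then one explicit flush at the end."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        write_all(fd, data)  # may only have reached the buffer cache so far
        os.fsync(fd)         # blocks until the kernel reports the data written
    finally:
        os.close(fd)


def write_with_o_sync(path, data):
    """Every write() blocks until the kernel reports the data written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        write_all(fd, data)
    finally:
        os.close(fd)
```

Either way the guarantee is only "as far as the kernel can" tell: a drive that acknowledges writes while they are still in a volatile cache defeats both.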
Re: Should raw I/O be added to the kernel? [ In reply to ]
Jason Froebe wrote:
>
> Hi,
>
> "Stephen C. Tweedie" wrote:
>
> > Interesting to hear you say this, since that is _precisely_ what you get
> > if you are running a large DB like Oracle or Informix on raw devices.
> > All raw devices are _necessarily_ synchronous, and yet these devices are
> > suggested as a means of improving performance.
>
> NO! Not to improve performance but to improve recoverability of databases.
> One will get 2 - 3 times better performance on a filesystem but we can't
> guarantee that the write has been completed.
>
#include <std_disclaimer.h>
Being log-based, Oracle is happy enough to have its redo logs on raw devices;
the database files can stay on a filesystem. However, the combination of
Oracle's own buffer cache management and raw devices usually gives better
performance than filesystem buffering, because the LRU management of the
database's own buffer cache isn't impartial: knowing that certain data blocks
come from a full table scan means that these blocks do not go on the MRU end
of the LRU list. Writing is of course another matter, but you'll probably
find that databases are written less often than they're read :)
Also, very large volumes of data simply shouldn't be cached. When your EMC
box deals with the monthly rollup of 20GB of tables, it is clear that the
usually lovely 4GB RAID cache is something you would want to bypass.
Tuning depends :)
--alessandro <asuardi@uninetcom.it> <asuardi@it.oracle.com>
Linux 2.0.36/2.2.0-final glibc-2.0.7-29 gcc-2.8.1 binutils-2.9.1.0.19a
"I hate bugs which disappear just as soon as you start trying to
narrow things down." -- Stephen Tweedie
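
The "LRU isn't impartial" point can be shown with a toy cache. This is only a sketch of the general idea, not Oracle's actual buffer cache; the class name `BufferCache` and the `from_scan` flag are made up for illustration. Blocks loaded by a full table scan are parked at the LRU end, so they are the first to be evicted and do not push out hot blocks.

```python
from collections import OrderedDict


class BufferCache:
    """Toy block cache: the first entry is the LRU end, the last is the MRU end."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, block_id, load, from_scan=False):
        """Return the block, calling load(block_id) on a miss."""
        if block_id in self.blocks:
            if not from_scan:
                self.blocks.move_to_end(block_id)  # normal hit: promote to MRU end
            return self.blocks[block_id]
        data = load(block_id)
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)        # evict from the LRU end
        self.blocks[block_id] = data               # new entry lands at the MRU end
        if from_scan:
            # Scan blocks are unlikely to be reused soon: demote to the LRU end.
            self.blocks.move_to_end(block_id, last=False)
        return data
```

With a capacity of 3 and two hot blocks already cached, a ten-block scan with `from_scan=True` leaves both hot blocks in place, while the same scan with `from_scan=False` evicts them. A generic filesystem cache cannot make this distinction because it does not know why a block was read.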
Re: Should raw I/O be added to the kernel? [ In reply to ]
Hi,
On 22 Jan 1999 02:19:17 GMT, hpa@transmeta.com (H. Peter Anvin) said:
> Actually, it is better than that: as long as the commit record cannot
> pass the data records, you can roll back to a consistent state.
Absolutely, but that's not the point: the point is that the Unix API
does not allow applications to specify such a partial order on IO. Raw
IO, O_SYNC and friends, and fsync() are about all an application can
use.
--Stephen
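
Since the API offers no way to express "the commit record must not pass the data", the application has to enforce a full wait instead. Here is a minimal sketch of that pattern, assuming a Unix-like system. The record layout (a `CMIT` tag plus length and CRC32) and the function names are invented for illustration and do not come from any particular database.

```python
import os
import struct
import zlib

COMMIT_MAGIC = b"CMIT"  # hypothetical record tag
RECORD_SIZE = 12        # 4-byte tag + 4-byte length + 4-byte CRC32


def _write_and_sync(path, flags, data):
    """Write all of data, then block until the kernel reports it on disk."""
    fd = os.open(path, flags, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:
            view = view[os.write(fd, view):]
        os.fsync(fd)  # raises OSError if the flush is reported as failed
    finally:
        os.close(fd)


def commit(data_path, log_path, payload):
    # Step 1: write the data and wait for it to complete.
    _write_and_sync(data_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, payload)
    # Step 2: only after step 1 has returned, write the commit record.
    record = COMMIT_MAGIC + struct.pack("<II", len(payload), zlib.crc32(payload))
    _write_and_sync(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, record)


def is_committed(data_path, log_path):
    """Recovery check: the data counts only if a matching commit record exists."""
    try:
        with open(log_path, "rb") as f:
            log = f.read()
        with open(data_path, "rb") as f:
            payload = f.read()
    except FileNotFoundError:
        return False
    if len(log) < RECORD_SIZE or log[-RECORD_SIZE:-8] != COMMIT_MAGIC:
        return False
    length, crc = struct.unpack("<II", log[-8:])
    return length == len(payload) and crc == zlib.crc32(payload)
```

The wait in step 1 is stronger than strictly needed: a partial order between the two writes would be enough, but that cannot be requested through this interface.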
Re: Should raw I/O be added to the kernel? [ In reply to ]
Hi,
On 21 Jan 1999 23:38:15 -0500, uixjjji1@umail.furryterror.org (Zygo
Blaxell) said:
>> Yes you can. The way these applications work is to write all of the
> No you can't. Suppose you send a bunch of raw writes to a SCSI disk
> drive. OK, so the SCSI disk drive queues them in its embedded cache
> RAM and tells the host CPU to send more data. Then the power fails
> before the SCSI drive can flush its embedded cache.
> Oops.
Fine. If your disk hardware tells the host that it has completed an
update to oxide and the update is still volatile, you have a broken
disk: send it back. No enterprise-class databases support such
hardware. For writeback caching to be supported, the cache _must_ be
battery-backed or (in advanced multi-controller redundant storage
cabinets) multipowered and multiported. That is a fundamental storage
architecture issue, nothing at all to do with the host O/S.
> There's the possibility of external RAID devices that will undo all that
> work for you by doing buffering and ACK's by themselves, then turning
> around to talk to disks with data in cache.
Not on ANY decent storage systems.
--Stephen