Mailing List Archive

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Thu, 2003-07-03 at 00:56, Andi Kleen wrote:
> > 1. We allocate pages in reverse order so most merges can't occur
>
> I added a printk and I get quite a lot of merges during bootup
> with normal IDE.
>
> (sometimes 12+ segments)
That's merging adjacent blocks with non-adjacent page targets using the
IOMMU, right? I was doing merging without an IOMMU, which is a little
different and turns out to be a waste of CPU.
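For context, here is a rough sketch of the two merge tests being contrasted.
These helpers are illustrative only, not the actual 2.6 block layer macros,
though struct bio_vec and bvec_to_phys() are real <linux/bio.h> definitions.

#include <linux/bio.h>

/* Physical merging: the two buffers are adjacent in physical memory,
 * so the combined segment needs no IOMMU help at all. */
static inline int phys_mergeable(struct bio_vec *a, struct bio_vec *b)
{
        return bvec_to_phys(a) + a->bv_len == bvec_to_phys(b);
}

/* Virtual merging: the buffers may be physically scattered; they only
 * have to line up with the IOMMU's mapping granularity (the "vmerge
 * boundary"), because the IOMMU can map them back-to-back in bus
 * address space. */
static inline int virt_mergeable(struct bio_vec *a, struct bio_vec *b,
                                 unsigned long boundary)
{
        unsigned long mask = boundary - 1;

        return (((bvec_to_phys(a) + a->bv_len) | bvec_to_phys(b)) & mask) == 0;
}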

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Thu, Jul 03, 2003 at 09:26:29PM +0100, Alan Cox wrote:
> On Thu, 2003-07-03 at 00:56, Andi Kleen wrote:
> > > 1. We allocate pages in reverse order so most merges can't occur
> >
> > I added a printk and I get quite a lot of merges during bootup
> > with normal IDE.
> >
> > (sometimes 12+ segments)
>
> That's merging adjacent blocks with non-adjacent page targets using the
> IOMMU, right? I was doing merging without an IOMMU, which is a little
Yep.
> different and turns out to be a waste of CPU
Understandable. Especially when memory fragments after some uptime.
But of course it doesn't help much in practice, because all the interesting
block devices support DAC anyway, and the IOMMU is disabled for those.
Also, it's likely cheaper to just submit more segments than to pay the IOMMU
overhead (at least for sane devices; if not, it may be worth artificially
limiting the device's DMA mask to force the IOMMU on IA64 and x86-64).
-Andi

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Thu, Jul 03, 2003 at 11:24:15PM +0200, Andi Kleen wrote:
...
> Also, it's likely cheaper to just submit more segments than to pay the IOMMU
> overhead
It depends on the device. If something like an 8237A is mastering the DMA
cycles, then the CPU cost of merging is relatively cheap. If sending the SG
list is just a sequence of MMIO-space writes, then passing the raw list is
cheaper. The ZX1 and PA-RISC IOMMUs clearly add some overhead, both in CPU
utilization (managing the IOMMU) and DMA latency (occasional IOMMU TLB
misses).
...
> (at least for sane devices; if not, it may be worth artificially
> limiting the device's DMA mask to force the IOMMU on IA64 and x86-64)
Agreed. We are only doing that until the BIO code and the IOMMU code can
agree on how merging works without requiring the IOMMU.
thanks,
grant

Re: [RFC] block layer support for DMA IOMMU bypass mode II
From: Andi Kleen <ak@suse.de>
Date: Thu, 3 Jul 2003 23:24:15 +0200
But of course it doesn't help much in practice, because all the interesting
block devices support DAC anyway, and the IOMMU is disabled for those.

Platform dependent. SAC DMA transfers are faster on sparc64, so
we only allow the device to successfully set a 32-bit DMA mask.
And actually, I would recommend that other platforms with an IOMMU do
this too (unless there is some other reason not to), since virtual
merging causes fewer scatter-gather entries to be used in the device
and thus you can stuff more requests into it.
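In driver terms the policy davem describes comes down to something like the
following probe-time sketch (the function and flow are illustrative, not
taken from any particular driver):

#include <linux/pci.h>

/* Sketch: settle for SAC (32-bit) addressing even if the device could
 * do DAC, so DMA stays behind the IOMMU and sg entries can be
 * virtually merged.  On sparc64 a 64-bit mask would be refused anyway. */
static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        if (pci_enable_device(pdev))
                return -EIO;

        if (pci_set_dma_mask(pdev, 0xffffffffULL))
                return -EIO;    /* platform cannot even do 32-bit DMA */

        /* ... rest of device setup ... */
        return 0;
}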

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Mon, 07 Jul 2003 19:14:38 -0700 (PDT)
"David S. Miller" <davem@redhat.com> wrote:
> From: Andi Kleen <ak@suse.de>
> Date: Thu, 3 Jul 2003 23:24:15 +0200
>
> But of course it doesn't help much in practice, because all the interesting
> block devices support DAC anyway, and the IOMMU is disabled for those.
>
> Platform dependent. SAC DMA transfers are faster on sparc64, so
> we only allow the device to successfully set a 32-bit DMA mask.
>
> And actually, I would recommend that other platforms with an IOMMU do
> this too (unless there is some other reason not to), since virtual
> merging causes fewer scatter-gather entries to be used in the device
> and thus you can stuff more requests into it.
Do you know a common PCI block device that would benefit from this (performs significantly
better with short sg lists)? It would be interesting to test.
I don't want to use the IOMMU in production for SAC on AMD64, because
on some of the boxes the available IOMMU area is quite small. For example,
the single-processor boxes typically have only a 128MB aperture set up,
which means the IOMMU hole is only 64MB (the other 64MB is for AGP). And
some of them do not even have a BIOS option to enlarge it (I can allocate
a bigger one myself, but it costs memory). The boxes with more than 4GB of
memory at least typically support enlarging it.
Overflow is typically deadly, because the API does not allow proper
error handling and most drivers don't check for it. That's especially
risky for block devices: while pci_map_sg can at least return an error,
not everybody checks for it, and when you get an overflow, the next
superblock write with such an unchecked error will destroy the file
system.
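The check being skipped is only a couple of lines where a driver bothers to
write it; a minimal sketch of the driver-side check (variable names
illustrative):

/* pci_map_sg() returns the number of mapped sg entries; 0 means the
 * mapping failed (e.g. the IOMMU aperture overflowed).  Failing the
 * request here is what prevents the corrupting write described above. */
int nents = pci_map_sg(pdev, sglist, nsegs, PCI_DMA_TODEVICE);
if (nents == 0)
        return -ENOMEM;         /* fail or requeue the request */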
Also, networking tests have shown that it costs around 10% in performance.
These are old numbers, and some optimizations have been done since then,
so it may be better now.

-Andi

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Tue, Jul 08, 2003 at 09:34:27PM +0200, Andi Kleen wrote:
> Overflow is typically deadly, because the API does not allow proper
> error handling and most drivers don't check for it. That's especially
> risky for block devices: while pci_map_sg can at least return an error,
> not everybody checks for it, and when you get an overflow, the next
> superblock write with such an unchecked error will destroy the file
> system.
Personally, I've always thought we were kidding ourselves by not doing
the error checking you describe. From my somewhat-narrow perspective of
network drivers and the libata storage driver, you have to deal with
atomic allocations _anyway_ ... so why not make sure IOMMU overflow
properly fails at the pci_map_foo level?
Jeff

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Tue, 8 Jul 2003 15:47:44 -0400
Jeff Garzik <jgarzik@pobox.com> wrote:
> Personally, I've always thought we were kidding ourselves by not doing
> the error checking you describe. From my somewhat-narrow perspective of
> network drivers and the libata storage driver, you have to deal with
> atomic allocations _anyway_ ... so why not make sure IOMMU overflow
> properly fails at the pci_map_foo level?
pci_map_single currently has no defined error return, but if you could
persuade all your fellow driver writers to fix their drivers to check
this, I'm sure things would be better.
(On AMD64 the check could be trivially implemented as a macro, because
errors always return a well-defined address.)
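Presumably something along these lines; the sentinel symbol below is an
assumption, the point being only that on AMD64 every failure collapses to
one well-known address:

/* Hypothetical sketch of the macro Andi describes. */
extern dma_addr_t bad_dma_address;              /* assumed sentinel */
#define pci_mapping_failed(addr)        ((addr) == bad_dma_address)

/* A driver would then write: */
dma_addr_t dma = pci_map_single(pdev, buf, len, PCI_DMA_TODEVICE);
if (pci_mapping_failed(dma))
        goto unmap_and_fail;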
-Andi

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Tue, Jul 08, 2003 at 03:47:44PM -0400, Jeff Garzik wrote:
> Personally, I've always thought we were kidding ourselves by not doing
> the error checking you describe.
Amen. When I pointed this out a few years back, it was made clear that this
was a design choice: not providing a return value was simpler for driver
writers. I agree it's simpler and don't pretend to know what's best
for other driver writers. Sounds like a topic for 2.7, though.
grant

Re: [RFC] block layer support for DMA IOMMU bypass mode II
From: Andi Kleen <ak@suse.de>
Date: Tue, 8 Jul 2003 21:34:27 +0200
Do you know a common PCI block device that would benefit from this
(performs significantly better with short sg lists)? It would be
interesting to test.

10% to 15% on sym53c8xx devices found on sparc64 boxes.

Re: [RFC] block layer support for DMA IOMMU bypass mode II
From: Grant Grundler <grundler@parisc-linux.org>
Date: Tue, 8 Jul 2003 16:25:45 -0600
On Tue, Jul 08, 2003 at 03:04:33PM -0700, David S. Miller wrote:
> Do you know a common PCI block device that would benefit from this
> (performs significantly better with short sg lists)? It would be
> interesting to test.
>
> 10% to 15% on sym53c8xx devices found on sparc64 boxes.

Which workload?
dbench-type stuff, but that's a hard thing to test these days with
the block I/O schedulers changing so much. Try to keep that part
constant in the with-vs-without BIO_VMERGE != 0 testing :)

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Tue, Jul 08, 2003 at 03:04:33PM -0700, David S. Miller wrote:
> Do you know a common PCI block device that would benefit from this
> (performs significantly better with short sg lists)? It would be
> interesting to test.
>
> 10% to 15% on sym53c8xx devices found on sparc64 boxes.
Which workload?
I'd like to test this on our ia64 boxes.
thanks,
grant

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Tue, 08 Jul 2003 15:23:14 -0700 (PDT)
"David S. Miller" <davem@redhat.com> wrote:
> From: Grant Grundler <grundler@parisc-linux.org>
> Date: Tue, 8 Jul 2003 16:25:45 -0600
>
> On Tue, Jul 08, 2003 at 03:04:33PM -0700, David S. Miller wrote:
> > Do you know a common PCI block device that would benefit from this
> > (performs significantly better with short sg lists)? It would be
> > interesting to test.
> >
> > 10% to 15% on sym53c8xx devices found on sparc64 boxes.
>
> Which workload?
>
> dbench-type stuff, but that's a hard thing to test these days with
> the block I/O schedulers changing so much. Try to keep that part
> constant in the with-vs-without BIO_VMERGE != 0 testing :)
With MPT Fusion and the reaim "new dbase" load it seems to be slightly
faster with forced IOMMU merging on Opteron, but the differences are
quite small (~4%) and could be measurement error.
-Andi

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Tue, Jul 08, 2003 at 03:23:14PM -0700, David S. Miller wrote:
> From: Grant Grundler <grundler@parisc-linux.org>
> Date: Tue, 8 Jul 2003 16:25:45 -0600
>
> On Tue, Jul 08, 2003 at 03:04:33PM -0700, David S. Miller wrote:
> > Do you know a common PCI block device that would benefit from this
> > (performs significantly better with short sg lists)? It would be
> > interesting to test.
> >
> > 10% to 15% on sym53c8xx devices found on sparc64 boxes.
>
> Which workload?
>
> dbench-type stuff,
Without more specific guidance, dbench looks like a load of crap.
"dbench 50" is claiming 850MB/s throughput on a system with 1 disk @
U320 and 2 disks on a separate 40 MB/s bus (Ultra Wide SE SCSI). More details
are appended below. I'll try again with lmbench or bonnie.
Andi, if you could pass me details about the "reaim new dbase" test (i.e. how
many devices I need, where to get it), I could make time to try that in
the next couple of weeks.
> but that's a hard thing to test these days with
> the block I/O schedulers changing so much. Try to keep that part
> constant in the with-vs-without BIO_VMERGE != 0 testing :)
yes - James Bottomley was asking for the same info.
thanks,
grant

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Tue, Jul 08, 2003 at 03:23:14PM -0700, David S. Miller wrote:
> dbench type stuff,
realizing dbench is blissfully ignorant of the system (2GB RAM),
for grins I ran "dbench 500" to see what would happen. The throughput
rate dbench reported continued to decline to ~20MB/s. This is about what
I would expect for one disk on a 40MB/s SCSI bus.
Then dbench started spewing errors:
...
(7) ERROR: handle 13781 was not found
(6) open clients/client428 failed for handle 13781 (No such file or directory)
(7) ERROR: handle 13781 was not found
(6) open clients/client423 failed for handle 13781 (No such file or directory)
(7) ERROR: handle 13781 was not found
(6) open clients/client48 failed for handle 13781 (No such file or directory)
(7) ERROR: handle 13781 was not found
(6) open clients/client55 failed for handle 13781 (No such file or directory)
(7) ERROR: handle 13781 was not found
(6) open clients/client419 failed for handle 13781 (No such file or directory)
(7) ERROR: handle 13781 was not found
(6) open clients/client415 failed for handle 13781 (No such file or directory)
...
write failed on handle 13783
write failed on handle 13707
write failed on handle 13808
write failed on handle 13117
write failed on handle 13850
write failed on handle 14000
write failed on handle 13767
write failed on handle 13787
...
NFC what that's all about. sorry - I have to punt on digging deeper.
I really need more guidance on
(a) how much memory I should be testing with
(b) how many spindles would be useful (I've got ~15 on each box)
(c) how to tell dbench to use the FS mounted on the target disks.
I've attached the iommu stats in case anyone finds that useful.
grant
[text/plain attachment: dbench-zx1-01]

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Wed, Jul 23 2003, Grant Grundler wrote:
> On Tue, Jul 08, 2003 at 03:23:14PM -0700, David S. Miller wrote:
> > dbench type stuff,
>
> realizing dbench is blissfully ignorant of the system (2GB RAM),
> for grins I ran "dbench 500" to see what would happen. The throughput
> rate dbench reported continued to decline to ~20MB/s. This is about what
> I would expect for one disk on a 40MB/s SCSI bus.
>
> Then dbench started spewing errors:
> ..
> (7) ERROR: handle 13781 was not found
> (6) open clients/client428 failed for handle 13781 (No such file or directory)
> (7) ERROR: handle 13781 was not found
> (6) open clients/client423 failed for handle 13781 (No such file or directory)
> (7) ERROR: handle 13781 was not found
> (6) open clients/client48 failed for handle 13781 (No such file or directory)
> (7) ERROR: handle 13781 was not found
> (6) open clients/client55 failed for handle 13781 (No such file or directory)
> (7) ERROR: handle 13781 was not found
> (6) open clients/client419 failed for handle 13781 (No such file or directory)
> (7) ERROR: handle 13781 was not found
> (6) open clients/client415 failed for handle 13781 (No such file or directory)
> ..
> write failed on handle 13783
> write failed on handle 13707
> write failed on handle 13808
> write failed on handle 13117
> write failed on handle 13850
> write failed on handle 14000
> write failed on handle 13767
> write failed on handle 13787
> ..
>
> NFC what that's all about. sorry - I have to punt on digging deeper.
You are running out of disk space, most likely :-)
> I really need more guidance on
> (a) how much memory I should be testing with
With 2G of RAM, you need lots of clients. Would be much saner to just
boot with 256M, or something like that.
> (b) how many spindles would be useful (I've got ~15 on each box)
> (c) how to tell dbench to use the FS mounted on the target disks.
>
> I've attached the iommu stats in case anyone finds that useful.
To be honest, I don't think dbench is terribly useful for this. It often
suffers from the butterfly effect, so the small improvements virtual
merging should show will most likely be lost in the noise.
--
Jens Axboe

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Wed, 23 Jul 2003 05:40:06 -0600
Grant Grundler <grundler@parisc-linux.org> wrote:
>
> Andi, if you could pass me details about the "reaim new dbase" (ie how
> many devices I need, where to get it) I could make time to try that in
> the next couple of weeks.
Download reaim from sourceforge
Use the workfile.new_dbase test
Run it with 100-500 users (reaim -f workfile... -s 100 -e 500 -i 100)
I tested with ext3 on a single SCSI disk.
-Andi

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Mon, Jul 28, 2003 at 01:15:13PM +0200, Andi Kleen wrote:
> Download reaim from sourceforge
http://lwn.net/Articles/20733/
"(couldn't think of a better name, sorry)"
I was happy when "apt-get install reaim" just worked... *sigh*
But figured out "reaim" != "re-aim-7".
debian doesn't know anything about re-aim-7. :^(
http://sourceforge.org/projects/re-aim-7
We're Sorry.
The SourceForge.net Website is currently down for maintenance.
We will be back shortly
willy mentioned it's on OSDL too. Will look for that next.
> Use the workfile.new_dbase test
> Run it with 100-500 users (reaim -f workfile... -s 100 -e 500 -i 100)
> I tested with ext3 on a single SCSI disk.
thanks Andi - hopefully I can generate results this afternoon
when I've got connectivity again.
grant

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Mon, Jul 28, 2003 at 01:15:13PM +0200, Andi Kleen wrote:
> Run it with 100-500 users (reaim -f workfile... -s 100 -e 500 -i 100)
> I tested with ext3 on a single SCSI disk.
andi, davem, jens,
sorry for the long delay. Here's the data for ZX1 using a U320 HBA
(LSI 53c1030) and an ST373453LC disk (running U160, IIRC).
If you need more runs on this config, please ask.
Executive summary: < 1% difference for this config.
I'd still like to try a 53c1010 but don't have any installed right now.
I suspect the 53c1010 is a lot less efficient at retrieving SG lists
and will see a bigger difference in perf.
One minor issue when starting re-aim-7, but not during the run:
reaim(343): floating-point assist fault at ip 4000000000017a81, isr 0000020000000008
reaim(343): floating-point assist fault at ip 4000000000017a61, isr 0000020000000008
For the record, I retrieved source from:
http://umn.dl.sourceforge.net/sourceforge/re-aim-7/reaim-0.1.8.tar.gz
This should be renamed to osdl-aim-7 (or something like that).
The name space collision is unfortunate and annoying.
hth,
grant
#define BIO_VMERGE_BOUNDARY 0 /* (ia64_max_iommu_merge_mask + 1) */
iota:/mnt# reaim -f /mnt/usr/local/share/reaim/workfile.new_dbase -s100 -e 500 -i 100
REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime
Num Parent Child Child Jobs per Jobs/min/ Std_dev Std_dev JTI
Forked Time SysTime UTime Minute Child Time Percent
100 110.25 12.67 206.35 5605.19 56.05 3.48 3.29 96
200 219.07 25.56 411.93 5642.06 28.21 7.83 3.73 96
300 327.59 38.23 615.83 5659.46 18.86 12.79 4.09 95
400 436.66 51.19 821.30 5661.19 14.15 18.34 4.42 95
500 548.21 65.76 1029.15 5636.54 11.27 23.34 4.47 95
Max Jobs per Minute 5661.19
iota:/mnt#
#define BIO_VMERGE_BOUNDARY (ia64_max_iommu_merge_mask + 1)
iota:/mnt# PATH=$PATH:/mnt/usr/local/bin
iota:/mnt# reaim -f /mnt/usr/local/share/reaim/workfile.new_dbase -s100 -e 500 -i 100
REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime
Num Parent Child Child Jobs per Jobs/min/ Std_dev Std_dev JTI
Forked Time SysTime UTime Minute Child Time Percent
100 108.72 12.47 203.78 5684.17 56.84 4.46 4.32 95
200 217.64 25.59 408.90 5679.16 28.40 8.65 4.16 95
300 326.48 37.88 613.62 5678.81 18.93 13.80 4.44 95
400 434.87 50.53 817.64 5684.46 14.21 17.40 4.18 95
500 544.69 65.23 1022.92 5672.92 11.35 21.53 4.12 95
Max Jobs per Minute 5684.46

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Mon, Jul 28, 2003 at 01:15:13PM +0200, Andi Kleen wrote:
> Run it with 100-500 users (reaim -f workfile... -s 100 -e 500 -i 100)
jejb was wondering if 4k pages would cause different behaviors because
of file system vs page size (4k vs 16k). ia64 uses 16k by default.
I've rebuilt the kernel with 4k page size and VMERGE != 0.
The substantially worse performance feels like a rat hole because
of 4x pressure on CPU TLB.
Ideally, we need a workload to test BIO code without a file system.
Any suggestions?
grant
iota:/mnt# reaim -f /mnt/usr/local/share/reaim/workfile.new_dbase -s100 -e 500 -i 100
REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime
Num Parent Child Child Jobs per Jobs/min/ Std_dev Std_dev JTI
Forked Time SysTime UTime Minute Child Time Percent
100 118.90 21.17 214.03 5197.78 51.98 4.52 3.98 96
200 236.75 42.54 429.94 5220.63 26.10 9.43 4.16 95
300 354.94 64.47 644.80 5223.34 17.41 14.47 4.27 95
400 474.50 87.01 861.09 5209.66 13.02 24.76 5.59 94
500 594.26 109.80 1077.00 5199.78 10.40 25.36 4.49 95
Max Jobs per Minute 5223.34
iota:/mnt#

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Tue, 29 Jul 2003 22:42:56 -0600
Grant Grundler <grundler@parisc-linux.org> wrote:
> On Mon, Jul 28, 2003 at 01:15:13PM +0200, Andi Kleen wrote:
> > Run it with 100-500 users (reaim -f workfile... -s 100 -e 500 -i 100)
>
> jejb was wondering if 4k pages would cause different behaviors because
> of file system vs page size (4k vs 16k). ia64 uses 16k by default.
> I've rebuilt the kernel with 4k page size and VMERGE != 0.
> The substantially worse performance feels like a rat hole because
> of 4x pressure on CPU TLB.
Make an ext2 filesystem with 16K blocks :-)

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Tue, Jul 29, 2003 at 09:51:18PM -0700, David S. Miller wrote:
> Make an ext2 filesystem with 16K blocks :-)
heh - right. I thought you were going to tell me I needed to
install DIMMs that support 4k pages :^)
grant

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Tue, 2003-07-29 at 23:42, Grant Grundler wrote:
> On Mon, Jul 28, 2003 at 01:15:13PM +0200, Andi Kleen wrote:
> > Run it with 100-500 users (reaim -f workfile... -s 100 -e 500 -i 100)
>
> jejb was wondering if 4k pages would cause different behaviors because
> of file system vs page size (4k vs 16k). ia64 uses 16k by default.
> I've rebuilt the kernel with 4k page size and VMERGE != 0.
> The substantially worse performance feels like a rat hole because
> of 4x pressure on CPU TLB.
OK, I admit it, it was a rat hole. Provided reaim uses large files, we
should only get block<->page fragmentation at the edges, and obviously,
reaim has to use large files otherwise it's not testing the virtual
merging properly...
James

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Tue, Jul 29, 2003 at 09:51:18PM -0700, David S. Miller wrote:
> Make an ext2 filesystem with 16K blocks :-)
Executive summary:
looks the same as previous 4k block/16k page w/VMERGE enabled.
davem, I thought you were joking... I've submitted a one-liner to
Ted Ts'o to increase EXT2_MAX_BLOCK_LOG_SIZE to 64k.
kudos to willy for quickly digging this up.
16k block size Works For Me (tm).
appended are the re-aim-7 results for 16k page/block on ext2.
grant
iota:/mnt# PATH=$PATH:/mnt/usr/local/bin
iota:/mnt# reaim -f /mnt/usr/local/share/reaim/workfile.new_dbase -s100 -e 500 -i 100
REAIM Workload
Times are in seconds - Child times from tms.cstime and tms.cutime
Num Parent Child Child Jobs per Jobs/min/ Std_dev Std_dev JTI
Forked Time SysTime UTime Minute Child Time Percent
100 108.62 12.46 203.84 5689.35 56.89 4.59 4.46 95
200 217.51 25.29 408.11 5682.60 28.41 9.06 4.36 95
300 325.57 38.05 612.20 5694.63 18.98 12.01 3.85 96
400 434.89 50.67 817.90 5684.16 14.21 15.60 3.75 96
500 545.89 65.74 1024.75 5660.51 11.32 29.45 5.72 94
Max Jobs per Minute 5694.63
iota:/mnt#

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Wed, Jul 30, 2003 at 10:02:50AM -0600, Grant Grundler wrote:
> On Tue, Jul 29, 2003 at 09:51:18PM -0700, David S. Miller wrote:
> > Make an ext2 filesystem with 16K blocks :-)
>
> Executive summary:
> looks the same as previous 4k block/16k page w/VMERGE enabled.
>
> davem, I thought you were joking... I've submitted a one-liner to
> Ted Ts'o to increase EXT2_MAX_BLOCK_LOG_SIZE to 64k.
> kudos to willy for quickly digging this up.
> 16k block size Works For Me (tm).
>
> appended are the re-aim-7 results for 16k page/block on ext2.
The differences were greater with the MPT Fusion driver; maybe it has
more overhead, or your I/O subsystem is significantly different.
-Andi

Re: [RFC] block layer support for DMA IOMMU bypass mode II
On Wed, 2003-07-30 at 11:36, Andi Kleen wrote:
> The differences were greater with the MPT Fusion driver; maybe it has
> more overhead, or your I/O subsystem is significantly different.
By and large, these results are more like what I expect.
As I've said before, getting SG tables to work efficiently is a core
part of getting an I/O board to function.
There are two places vmerging can help:
1. Reducing the size of the SG table
2. Increasing the length of the I/O for devices with fixed (but small)
SG table lengths.
However, it's important to remember that vmerging comes virtually for
free in the BIO layer, so the only added cost is the programming of the
IOMMU. This isn't an issue on SPARC, PA-RISC, and the like, where IOMMU
programming is required to do I/O anyway; but it may be something the
IOMMU-optional architectures (like IA-64 and AMD-64) should consider,
which is where I came in with the IOMMU bypass patch.
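A made-up worked example of both points:

/* Four physically scattered 4 KB pages:
 *
 *   phys pages:   0x10000  0x34000  0x51000  0x72000
 *   raw sg list:  4 entries of 4 KB each
 *
 * An IOMMU can map all four to one contiguous bus range, so the
 * device instead sees:
 *
 *   bus: 0xf0000, len 16 KB  ->  1 sg entry
 *
 * Point 1: the SG table shrinks (4 entries -> 1).
 * Point 2: a controller limited to, say, 16 SG entries can now carry
 * a request up to 16x larger before it must be split.
 */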
James