Mailing List Archive

[PATCH] TTM DMA pool v2.2 or [GIT PULL] (stable/ttm.dma_pool.v2.3) for 3.3
I am not sure what the right way to patches in Dave tree is for Linux 3.3, so I
am posting the patches and also providing the means of doing a git pull.

The git tree is:

git pull git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/ttm.dma_pool.v2.3


Oh, and Thomas, should I add your Ack on the patches as well? I think it was an implied
Ack, but I do not want to presume. If so, I can respin this with your Ack shortly.

and it has since v2.1: https://lwn.net/Articles/463815/
- Fixed bugs/mistakes pointed out by Jerome
- Added Review-by: Jereme Glisse
Since v2.0: [not posted]
- Redid the registration/override to be tightly integrated with the
'struct ttm_backend_func' per Thomas's suggestion.
Since v1.9: [not posted]
- Performance improvements - it was doing O(n^2) instead of O(n) on certain
workloads.
Since v1.8: [lwn.net/Articles/458724/]
- Removed swiotlb_enabled and used swiotlb_nr_tbl.
- Added callback for changing cache types.
Since v1.7: [https://lkml.org/lkml/2011/8/30/460]
- Fixed checking the DMA address in radeon/nouveau code.
Since v1: [http://lwn.net/Articles/456246/]
- Ran it through the gauntlet of SubmitChecklist and fixed issues
- Made radeon/nouveau driver set coherent_dma (which is required for dmapool)

[.. and this is what I said in v1 post]:

Way back in January this patchset:
http://lists.freedesktop.org/archives/dri-devel/2011-January/006905.html
was merged in, but pieces of it had to be reverted b/c they did not
work properly under PowerPC, ARM, and when swapping out pages to disk.

After a bit of discussion on the mailing list
http://marc.info/?i=4D769726.2030307@shipmail.org I started working on it, but
got waylaid by other things .. and finally I am able to post the RFC patches.

There was a lot of discussion about it and I am not sure if I captured
everybody's thoughts - if I did not - that is _not_ intentional - it has just
been quite some time..

Anyhow .. the patches explore what the "lib/dmapool.c" does - which is to have a
DMA pool that the device has associated with. I kind of married that code
along with drivers/gpu/drm/ttm/ttm_page_alloc.c to create a TTM DMA pool code.
The end result is DMA pool with extra features: can do write-combine, uncached,
writeback (and tracks them and sets back to WB when freed); tracks "cached"
pages that don't really need to be returned to a pool; and hooks up to
the shrinker code so that the pools can be shrunk.

If you guys think this set of patches make sense - my future plans were
1) Get this in large crowd of testing .. and if it works for a kernel release
2) to move a bulk of this in the lib/dmapool.c (I spoke with Matthew Wilcox
about it and he is OK as long as I don't introduce performance regressions).

In regards to testing, I've been running them non-stop for the last two months.
(and found some issues which I've fixed up) - and been quite happy with how
they work.

Michel (thanks!) took a spin of the patches on his PowerPC and they did not
cause any regressions (wheew).

The patches are also located in a git tree:

git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/ttm.dma_pool.v2.3

Konrad Rzeszutek Wilk (11):
swiotlb: Expose swiotlb_nr_tlb function to modules
nouveau/radeon: Set coherent DMA mask
ttm/radeon/nouveau: Check the DMA address from TTM against known value.
ttm: Wrap ttm_[put|get]_pages and extract GFP_* and caching states from 'struct ttm_tt'
ttm: Get rid of temporary scaffolding
ttm/driver: Expand ttm_backend_func to include two overrides for TTM page pool.
ttm: Do not set the ttm->be to NULL before calling the TTM page pool to free pages.
ttm: Provide DMA aware TTM page pool code.
ttm: Add 'no_dma' parameter to turn the TTM DMA pool off during runtime.
nouveau/ttm/dma: Enable the TTM DMA pool if device can only do 32-bit DMA.
radeon/ttm/dma: Enable the TTM DMA pool if the device can only do 32-bit.

drivers/gpu/drm/nouveau/nouveau_debugfs.c | 1 +
drivers/gpu/drm/nouveau/nouveau_mem.c | 5 +
drivers/gpu/drm/nouveau/nouveau_sgdma.c | 8 +-
drivers/gpu/drm/radeon/radeon_device.c | 6 +
drivers/gpu/drm/radeon/radeon_gart.c | 4 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 19 +-
drivers/gpu/drm/ttm/Makefile | 3 +
drivers/gpu/drm/ttm/ttm_memory.c | 5 +
drivers/gpu/drm/ttm/ttm_page_alloc.c | 108 ++-
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 1409 +++++++++++++++++++++++++++++
drivers/gpu/drm/ttm/ttm_tt.c | 21 +-
drivers/xen/swiotlb-xen.c | 2 +-
include/drm/ttm/ttm_bo_driver.h | 31 +
include/drm/ttm/ttm_page_alloc.h | 53 +-
include/linux/swiotlb.h | 2 +-
lib/swiotlb.c | 5 +-
16 files changed, 1600 insertions(+), 82 deletions(-)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: [PATCH] TTM DMA pool v2.2 or [GIT PULL] (stable/ttm.dma_pool.v2.3) for 3.3 [ In reply to ]
On Tue, Nov 01, 2011 at 02:47:21PM -0400, Konrad Rzeszutek Wilk wrote:
> I am not sure what the right way to patches in Dave tree is for Linux 3.3, so I
> am posting the patches and also providing the means of doing a git pull.
>
> The git tree is:
>
> git pull git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/ttm.dma_pool.v2.3
>
>
> Oh, and Thomas, should I add your Ack on the patches as well? I think it was an implied
> Ack, but I do not want to presume. If so, I can respin this with your Ack shortly.
>
> and it has since v2.1: https://lwn.net/Articles/463815/
> - Fixed bugs/mistakes pointed out by Jerome
> - Added Review-by: Jereme Glisse
> Since v2.0: [not posted]
> - Redid the registration/override to be tightly integrated with the
> 'struct ttm_backend_func' per Thomas's suggestion.
> Since v1.9: [not posted]
> - Performance improvements - it was doing O(n^2) instead of O(n) on certain
> workloads.
> Since v1.8: [lwn.net/Articles/458724/]
> - Removed swiotlb_enabled and used swiotlb_nr_tbl.
> - Added callback for changing cache types.
> Since v1.7: [https://lkml.org/lkml/2011/8/30/460]
> - Fixed checking the DMA address in radeon/nouveau code.
> Since v1: [http://lwn.net/Articles/456246/]
> - Ran it through the gauntlet of SubmitChecklist and fixed issues
> - Made radeon/nouveau driver set coherent_dma (which is required for dmapool)
>
> [.. and this is what I said in v1 post]:
>
> Way back in January this patchset:
> http://lists.freedesktop.org/archives/dri-devel/2011-January/006905.html
> was merged in, but pieces of it had to be reverted b/c they did not
> work properly under PowerPC, ARM, and when swapping out pages to disk.
>
> After a bit of discussion on the mailing list
> http://marc.info/?i=4D769726.2030307@shipmail.org I started working on it, but
> got waylaid by other things .. and finally I am able to post the RFC patches.
>
> There was a lot of discussion about it and I am not sure if I captured
> everybody's thoughts - if I did not - that is _not_ intentional - it has just
> been quite some time..
>
> Anyhow .. the patches explore what the "lib/dmapool.c" does - which is to have a
> DMA pool that the device has associated with. I kind of married that code
> along with drivers/gpu/drm/ttm/ttm_page_alloc.c to create a TTM DMA pool code.
> The end result is DMA pool with extra features: can do write-combine, uncached,
> writeback (and tracks them and sets back to WB when freed); tracks "cached"
> pages that don't really need to be returned to a pool; and hooks up to
> the shrinker code so that the pools can be shrunk.
>
> If you guys think this set of patches make sense - my future plans were
> 1) Get this in large crowd of testing .. and if it works for a kernel release
> 2) to move a bulk of this in the lib/dmapool.c (I spoke with Matthew Wilcox
> about it and he is OK as long as I don't introduce performance regressions).
>
> In regards to testing, I've been running them non-stop for the last two months.
> (and found some issues which I've fixed up) - and been quite happy with how
> they work.
>
> Michel (thanks!) took a spin of the patches on his PowerPC and they did not
> cause any regressions (wheew).
>
> The patches are also located in a git tree:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/ttm.dma_pool.v2.3
>

On what hw did you tested ? With and without xen ? Here radeon
that doesn't need dma32 doesn't work when forcing swiotlb which
kind of expected i guess. Should we expose if swiotlb is enabled
forced so we use dma pool in such case ?

Cheers,
Jerome

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: [PATCH] TTM DMA pool v2.2 or [GIT PULL] (stable/ttm.dma_pool.v2.3) for 3.3 [ In reply to ]
On Fri, Nov 04, 2011 at 02:44:53PM -0400, Konrad Rzeszutek Wilk wrote:
> > > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/ttm.dma_pool.v2.3
> > >
> >
> > On what hw did you tested ? With and without xen ? Here radeon
>
> On AMD and Intel. And with both Nvidia and Radeon cards.
> 64-bit cards (I have a patch where I forced the 64-bit card to use
> the TTM DMA pool code to test) and 32-bit cards (ATI ES1000)
>
> On baremetal and Xen. Um, Fedora Core 16 as distro.
>
> Oh, and I also tried PPC (Power Mac 4) but could not get it to boot
> the 3.1 kernel. Something with the LILO grub loader did not work.
>
> > that doesn't need dma32 doesn't work when forcing swiotlb which
> > kind of expected i guess. Should we expose if swiotlb is enabled
>
> You did 'swiotlb=force' ?
> > forced so we use dma pool in such case ?

Issue is that when booted without force swiotlb_nr_tlb still return
positive thus we endup using the dma pool path.

Cheers,
Jerome

> Hm, it shoudl have enabled itself. The swiotlb_nr_tlb would return some
> contents and we would.. Oh, you mean you did a 64-bit card _and_
> did swiotlb=force. And since the rdev->dma32 was set to zero it
> did _not_ use the TTM DMA pool.
>
> Right. I did not do it initially just so that I could limit the scope
> in case I messed up something in the code. But the code has the
> 'no_dma' parameter, so it can easily turn off the DMA TTM code.
>
> So, to answer your question - sure, we can ignore the rdev_dma32 and
> just use the the swiotlb_nr_tlb to check.
>
> BTW, thank you for taking a spin with these patches and rebasing them
> on top of yours. I am going to start testing them and reviewing the
> latest batch you sent on Monday.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: [PATCH] TTM DMA pool v2.2 or [GIT PULL] (stable/ttm.dma_pool.v2.3) for 3.3 [ In reply to ]
> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/ttm.dma_pool.v2.3
> >
>
> On what hw did you tested ? With and without xen ? Here radeon

On AMD and Intel. And with both Nvidia and Radeon cards.
64-bit cards (I have a patch where I forced the 64-bit card to use
the TTM DMA pool code to test) and 32-bit cards (ATI ES1000)

On baremetal and Xen. Um, Fedora Core 16 as distro.

Oh, and I also tried PPC (Power Mac 4) but could not get it to boot
the 3.1 kernel. Something with the LILO grub loader did not work.

> that doesn't need dma32 doesn't work when forcing swiotlb which
> kind of expected i guess. Should we expose if swiotlb is enabled

You did 'swiotlb=force' ?
> forced so we use dma pool in such case ?

Hm, it shoudl have enabled itself. The swiotlb_nr_tlb would return some
contents and we would.. Oh, you mean you did a 64-bit card _and_
did swiotlb=force. And since the rdev->dma32 was set to zero it
did _not_ use the TTM DMA pool.

Right. I did not do it initially just so that I could limit the scope
in case I messed up something in the code. But the code has the
'no_dma' parameter, so it can easily turn off the DMA TTM code.

So, to answer your question - sure, we can ignore the rdev_dma32 and
just use the the swiotlb_nr_tlb to check.

BTW, thank you for taking a spin with these patches and rebasing them
on top of yours. I am going to start testing them and reviewing the
latest batch you sent on Monday.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: [PATCH] TTM DMA pool v2.2 or [GIT PULL] (stable/ttm.dma_pool.v2.3) for 3.3 [ In reply to ]
On Fri, Nov 04, 2011 at 03:24:51PM -0400, Jerome Glisse wrote:
> On Fri, Nov 04, 2011 at 02:44:53PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/ttm.dma_pool.v2.3
> > > >
> > >
> > > On what hw did you tested ? With and without xen ? Here radeon
> >
> > On AMD and Intel. And with both Nvidia and Radeon cards.
> > 64-bit cards (I have a patch where I forced the 64-bit card to use
> > the TTM DMA pool code to test) and 32-bit cards (ATI ES1000)
> >
> > On baremetal and Xen. Um, Fedora Core 16 as distro.
> >
> > Oh, and I also tried PPC (Power Mac 4) but could not get it to boot
> > the 3.1 kernel. Something with the LILO grub loader did not work.
> >
> > > that doesn't need dma32 doesn't work when forcing swiotlb which
> > > kind of expected i guess. Should we expose if swiotlb is enabled
> >
> > You did 'swiotlb=force' ?
> > > forced so we use dma pool in such case ?
>
> Issue is that when booted without force swiotlb_nr_tlb still return
> positive thus we endup using the dma pool path.

Did " Using software bounce buffering for IO (SWIOTLB)" or "software IO TLB at"
show up in the dmesg output? You might have to run it with 'debug loglevel=8'?
Presumarily yes, since the code "swiotlb: Expose.." sets
io_tlb_nslabs = 0

if there is no need for it. And since io_tlb_nslabs is set, then the code
did start.

Some AMD boxes with the AMD-Vi chipset enable the SWIOTLB b/c not all of
the PCI devices are on the IOMMU chipset path. The Intel VT-d does the same
thing.

Hmm, I think the reason for those devices to turn SWIOTLB on is that they
are not comfortable handling 32-bit devices, and the TTM DMA pool code would
only turn itself on when the radeon|nouveau was 32-bit _and_ SWIOTLB was enabled.

.. Testing the patches you posted on my AMD box.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: [PATCH] TTM DMA pool v2.2 or [GIT PULL] (stable/ttm.dma_pool.v2.3) for 3.3 [ In reply to ]
On Fri, Nov 04, 2011 at 03:41:03PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 04, 2011 at 03:24:51PM -0400, Jerome Glisse wrote:
> > On Fri, Nov 04, 2011 at 02:44:53PM -0400, Konrad Rzeszutek Wilk wrote:
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/ttm.dma_pool.v2.3
> > > > >
> > > >
> > > > On what hw did you tested ? With and without xen ? Here radeon
> > >
> > > On AMD and Intel. And with both Nvidia and Radeon cards.
> > > 64-bit cards (I have a patch where I forced the 64-bit card to use
> > > the TTM DMA pool code to test) and 32-bit cards (ATI ES1000)
> > >
> > > On baremetal and Xen. Um, Fedora Core 16 as distro.
> > >
> > > Oh, and I also tried PPC (Power Mac 4) but could not get it to boot
> > > the 3.1 kernel. Something with the LILO grub loader did not work.
> > >
> > > > that doesn't need dma32 doesn't work when forcing swiotlb which
> > > > kind of expected i guess. Should we expose if swiotlb is enabled
> > >
> > > You did 'swiotlb=force' ?
> > > > forced so we use dma pool in such case ?
> >
> > Issue is that when booted without force swiotlb_nr_tlb still return
> > positive thus we endup using the dma pool path.
>
> Did " Using software bounce buffering for IO (SWIOTLB)" or "software IO TLB at"
> show up in the dmesg output? You might have to run it with 'debug loglevel=8'?
> Presumarily yes, since the code "swiotlb: Expose.." sets
> io_tlb_nslabs = 0
>
> if there is no need for it. And since io_tlb_nslabs is set, then the code
> did start.
>
> Some AMD boxes with the AMD-Vi chipset enable the SWIOTLB b/c not all of
> the PCI devices are on the IOMMU chipset path. The Intel VT-d does the same
> thing.
>
> Hmm, I think the reason for those devices to turn SWIOTLB on is that they
> are not comfortable handling 32-bit devices, and the TTM DMA pool code would
> only turn itself on when the radeon|nouveau was 32-bit _and_ SWIOTLB was enabled.
>
> .. Testing the patches you posted on my AMD box.

Yes, swiotlb was enabled (bounce buffer message) thing is when force is
not set usual ttm memory page allocation works properly ie on pci map
no bounce buffer is use, but when force is set it does use a bounce
This is due to :
if (dma_capable(dev, dev_addr, size) && !swiotlb_force)
in swiotlb_map_page, i am not sure how much the force argument is
usefull.

Anyway i did few benchmark and it seems that the dma allocator isn't
slower than the other page allocator. But this is limited testing
(openarena, nexuiz running on composited desktop with firefox).

Cheers,
Jerome

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: [PATCH] TTM DMA pool v2.2 or [GIT PULL] (stable/ttm.dma_pool.v2.3) for 3.3 [ In reply to ]
. snip..
> > Some AMD boxes with the AMD-Vi chipset enable the SWIOTLB b/c not all of
> > the PCI devices are on the IOMMU chipset path. The Intel VT-d does the same
> > thing.
> >
> > Hmm, I think the reason for those devices to turn SWIOTLB on is that they
> > are not comfortable handling 32-bit devices, and the TTM DMA pool code would
> > only turn itself on when the radeon|nouveau was 32-bit _and_ SWIOTLB was enabled.
> >
> > .. Testing the patches you posted on my AMD box.
>
> Yes, swiotlb was enabled (bounce buffer message) thing is when force is
> not set usual ttm memory page allocation works properly ie on pci map
> no bounce buffer is use, but when force is set it does use a bounce
> This is due to :
> if (dma_capable(dev, dev_addr, size) && !swiotlb_force)
> in swiotlb_map_page, i am not sure how much the force argument is
> usefull.
>
> Anyway i did few benchmark and it seems that the dma allocator isn't
> slower than the other page allocator. But this is limited testing
> (openarena, nexuiz running on composited desktop with firefox).

Hehe. I also run it with perf record to see it before and after.
Albeit with 'swiotlb=force' _everything_ is going throught the bounce buffer
- including your network packets/USB mouse/etc.

This little patch makes it easier to switch between the TTM DMA and
the default one without using the big hammer approach of 'swiotlb=force':

>From d60930d9b515036268cdf9d9a3d4181bb458ac5c Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri, 4 Nov 2011 16:53:34 -0400
Subject: [PATCH] ttm:dma: Add 'ttm_dma' module to radeon and nouveau to force
enable the TTM DMA

. irregardless of whether the device is restricted to DMA32. This
patch is for testing purposes.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
drivers/gpu/drm/radeon/radeon.h | 1 +
drivers/gpu/drm/radeon/radeon_drv.c | 4 ++++
drivers/gpu/drm/radeon/radeon_ttm.c | 6 +++---
3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 63257ba..9cae9e2 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -94,6 +94,7 @@ extern int radeon_disp_priority;
extern int radeon_hw_i2c;
extern int radeon_pcie_gen2;

+extern int radeon_ttm_dma;
/*
* Copy from radeon_drv.h so we don't have to include both and have conflicting
* symbol;
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index e71d2ed..ff81748 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -118,6 +118,10 @@ int radeon_audio = 0;
int radeon_disp_priority = 0;
int radeon_hw_i2c = 0;
int radeon_pcie_gen2 = 0;
+int radeon_ttm_dma = 0;
+
+MODULE_PARM_DESC(ttm_dma, "Enable TTM DMA page pool always irregardless of DMA32 flag");
+module_param_named(ttm_dma, radeon_ttm_dma, int, 0444);

MODULE_PARM_DESC(no_wb, "Disable AGP writeback for scratch registers");
module_param_named(no_wb, radeon_no_wb, int, 0444);
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 82119a5..0dc0048 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -593,7 +593,7 @@ static int radeon_ttm_tt_populate(struct ttm_tt *ttm)
rdev = radeon_get_rdev(ttm->bdev);

#ifdef CONFIG_SWIOTLB
- if (rdev->need_dma32 && swiotlb_nr_tbl()) {
+ if ((rdev->need_dma32 && swiotlb_nr_tbl()) || radeon_ttm_dma) {
return ttm_dma_populate(ttm, rdev->dev);
}
#endif
@@ -628,7 +628,7 @@ static void radeon_ttm_tt_unpopulate(struct ttm_tt *ttm)
rdev = radeon_get_rdev(ttm->bdev);

#ifdef CONFIG_SWIOTLB
- if (rdev->need_dma32 && swiotlb_nr_tbl()) {
+ if ((rdev->need_dma32 && swiotlb_nr_tbl()) || radeon_ttm_dma) {
ttm_dma_unpopulate(ttm, rdev->dev);
return;
}
@@ -858,7 +858,7 @@ static int radeon_ttm_debugfs_init(struct radeon_device *rdev)
radeon_mem_types_list[i].driver_features = 0;
radeon_mem_types_list[i++].data = NULL;
#ifdef CONFIG_SWIOTLB
- if (rdev->need_dma32 && swiotlb_nr_tbl()) {
+ if ((rdev->need_dma32 && swiotlb_nr_tbl()) || radeon_ttm_dma) {
sprintf(radeon_mem_types_names[i], "ttm_dma_page_pool");
radeon_mem_types_list[i].name = radeon_mem_types_names[i];
radeon_mem_types_list[i].show = &ttm_dma_page_alloc_debugfs;
--
1.7.7.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel