piledriver/trinity cpu/apu hardware tester needed, bug 445053
Bug #445053 deals with the new USE=fma flag in sci-libs/fftw-3.3.3
(~amd64). This flag enables upstream's new-for-that-version fma
instruction set optimizations, but the problem is that there are two
different fma instruction sets, fma3 and fma4. The Wikipedia article
explains the difference, history, etc, in some detail.

Bug URL: https://bugs.gentoo.org/show_bug.cgi?id=445053

fma on wikipedia: http://en.wikipedia.org/wiki/FMA_instruction_set


So when I went to do my update, I saw the new USE flag. I have an AMD
bdver1 (Bulldozer) with fma4, but the USE flag is just fma (no number
appended), so I was confused, started looking into things, and then
filed that bug.

I've now actually tested USE=fma on my bdver1 (fma4) hardware, with both
the ebuild's "small" tests and a manually run "make bigtest" in all three
subdirs (single/double/long-double) created as part of the build process.
All tests passed, so USE=fma seems to work reliably on fma4 hardware.
What we do NOT yet know for sure is whether it works reliably on fma3
hardware, so we now need someone with fma3 hardware to check there as
well.

According to the Wikipedia article, Intel will support fma3 in hardware
to be released in 2013, so AFAIK there's no released Intel hardware with
fma support at all yet. Still, anyone with a current (definitely this
year's) Intel cpu/apu is welcome to check /proc/cpuinfo and see, and to
run the tests if they have it.

The newest AMD hardware should already have fma support, but it could
be fma3 or fma4 depending on the CPU.

Bulldozer (-march=bdver1 in gcc) chips, released in late 2011, should
have fma4 listed in /proc/cpuinfo, as I do here. That's what I tested
with USE=fma here, with all tests I ran passing.

The new Piledriver CPUs and Trinity APUs, however (I believe
-march=bdver2, but am not positive on that), are supposed to support
fma3. /proc/cpuinfo should list that as plain fma (the kernel names the
fma3 flag simply fma, and the fma4 flag fma4). That's what still needs
testing.
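
For anyone unsure which variant they have, here is a minimal shell sketch
for checking. It assumes the Linux flag names fma (fma3) and fma4; the
function name fma_variants is made up for this example and is not part
of any tool.

```shell
# Print which FMA variants a cpuinfo-style file advertises, one per line.
# Assumption: Linux lists FMA3 as the bare flag "fma" and FMA4 as "fma4".
fma_variants() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first "flags" line, split it into one flag per line,
    # and keep only exact matches for fma or fma4.
    grep '^flags' "$cpuinfo" | head -n 1 \
        | tr ' \t' '\n\n' | grep -x -E 'fma4?' | sort -u
}

# Only query the live system where /proc/cpuinfo is readable.
if [ -r /proc/cpuinfo ]; then
    fma_variants
fi
```

No output means neither flag is listed. A line reading fma means the
machine is a candidate for the fma3 testing asked for here.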

So, anyone with that hardware, could you at least set USE=fma and run
ebuild ... test on sci-libs/fftw-3.3.3, then report the results in the
bug? Based on my results, the whole build and test (the ebuild runs make
smalltest for all three subdirs) should take only five minutes or so. It
was about three here, including the configure and build, though my
PORTAGE_TMPDIR is on tmpfs, so it might take a bit longer for those with
it on a spinning hard drive.
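
As a rough sketch, the ebuild run could be wrapped something like this.
EBUILD_PATH and run_fma_test are names made up for this example; point
EBUILD_PATH at the fftw-3.3.3 ebuild in your own tree. USE is simply
passed in through the environment, on top of whatever make.conf sets.

```shell
# Run the fftw ebuild's test phase with USE=fma added.
# EBUILD_PATH is a placeholder for this sketch; set it yourself.
run_fma_test() {
    if [ -z "${EBUILD_PATH:-}" ] || [ ! -f "$EBUILD_PATH" ]; then
        echo "EBUILD_PATH does not point at an ebuild file" >&2
        return 2
    fi
    # "clean" drops any stale work dir first; "test" then runs every
    # phase up to and including the ebuild's test phase.
    USE="fma" ebuild "$EBUILD_PATH" clean test
}
```

With EBUILD_PATH exported, running "time run_fma_test" also gives a run
time to compare against the three minutes or so seen here.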

Ideally, once the ebuild test passes, you'd also manually cd into the
work dir, source the environment file to get the portage build
environment, and run emake bigtest in all three subdirs. The ebuild
loops through the subdirs to run smalltest; you can do the same for
bigtest, or cd into each and run the tests manually. That will take
rather longer: perhaps an hour or so for the single subdir, maybe two
hours for the double subdir, and the same or longer for long-double.
However, the tests don't make very efficient use of the CPU, so if you
have a quad-core or better, which is likely with Piledriver anyway, you
could probably run the tests for all three subdirs in parallel and still
have CPU left to run other things.
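
The parallel run could be sketched along these lines. This is not what
the ebuild does, just one way to loop over the subdirs by hand. The
function name bigtest_all and the bigtest.log file name are made up for
this example, and MAKE defaults to plain make; inside the sourced portage
environment you would presumably set MAKE=emake.

```shell
# Run "<make> bigtest" in each given build directory in parallel and
# report PASS/FAIL per directory. Directory names must not contain
# whitespace. MAKE is overridable (assumption: make or emake).
bigtest_all() {
    make_cmd="${MAKE:-make}"
    pids=""
    for dir in "$@"; do
        # Each run logs to bigtest.log inside its own directory.
        ( cd "$dir" && $make_cmd bigtest > bigtest.log 2>&1 ) &
        pids="$pids $!:$dir"
    done
    status=0
    for entry in $pids; do
        pid="${entry%%:*}"
        dir="${entry#*:}"
        # wait returns the exit status of that background job.
        if wait "$pid"; then
            echo "PASS $dir"
        else
            echo "FAIL $dir"
            status=1
        fi
    done
    return $status
}
```

From the directory holding the three build subdirs, that would be called
as "bigtest_all single double long-double", adjusting the names to
whatever the work dir actually contains.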

If it passes (e)make smalltest (in the ebuild test phase) and the manual
(e)make bigtest, for all three subdirs, with USE=fma, on an fma3 hardware
system, it should be safe to change the USE flag description to say it
can be used for either fma3 or fma4 hardware. If not, then since it does
seem to work on my fma4 hardware, perhaps the flag should be changed to
fma4.

So any help testing fma3 hardware would definitely be appreciated. Please
report results on the bug. Anyone with fma4 hardware can double-check my
results as well, but it does seem to work here.

Thanks. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman