Mailing List Archive

Optimizing performance
Hi all,

I was wondering if there are any sane ways to optimize the performance
of a Gentoo system.
Overoptimization (the well known "-O9 -fomgomg" CFLAGS etc.) tends to
make things unstable, which is of course not what we want. The "easy"
way out would be buying faster hardware, but that is usually not an
option ;-)

So ... what can be done to get the stable maximum out of your hardware?

In my experience (x86 centric - do other arches have different
"problems"?) the following is stable, but not necessarily the optimum:
- don't overtweak CFLAGS. "-O2 -march=$your_cpu_family" seems to be on
average the best, -O3 is often slower and can cause bugs
- don't do anything with ASFLAGS, LDFLAGS. This causes weird random
breakage (e.g. LDFLAGS="-O1" causes prelink to fail with "absurd"
errors) and doesn't give a noticeable performance boost
- check that all IDE disks use DMA mode, otherwise they are limited to
~16M/s with a huge CPU usage penalty. Sometimes (application-specific)
increasing the readahead with hdparm gives a huge throughput boost.
- kernel tweaks like preempt may increase the responsiveness of the
system, but often reduce throughput and may have unexpected side effects
like random audio stutter as well as random kernel crashes ;-)
- kernel tweaks like setting swappiness or using a different I/O
scheduler (CFQ, deadline) should help, but I'm not aware of any "real"
benchmarks except microbenchmarks (can create 1M files 10% faster!!!!! -
yes, but how does it behave with a normal workload?)
- using a "smarter" filesystem can dramatically improve performance at
the potential cost of reliability. As data on FS reliability is hard to
find from unbiased sources this becomes a religious issue ... migrating
from ext3 to reiserfs makes "emerge sync" very much faster, but is
reiserfs sustainable?

Are there any application-specific tweaks (e.g. "use the prefork MPM
with apache2")? What is known to break things, what has usually
beneficial behaviour? Are there any useful benchmarks that show the
performance difference between different settings?

Thanks for your input,

Patrick
--
Stand still, and let the rest of the universe move
Re: Optimizing performance
Patrick Lauer wrote:
> Hi all,
>
> I was wondering if there are any sane ways to optimize the performance
> of a Gentoo system.
> Overoptimization (the well known "-O9 -fomgomg" CFLAGS etc.) tends to
> make things unstable, which is of course not what we want. The "easy"
> way out would be buying faster hardware, but that is usually not an
> option ;-)

Some upstreams, mostly media-related ones but also less obvious ones like
MySQL, use and test their apps with high optimization levels.
As a result, some of these apps tend to be _more_ stable with those high
optimizations.
If I recall correctly, Ned Ludd (solar) did some work on per-package
CFLAGS. I don't know what the intent of that work was, but integrating
/etc/portage/package.env support into portage and letting package
maintainers _suggest_ optimal C*FLAGS might increase both stability and
performance.
However, this would require _a lot_ of manpower and add possibly
unmanageable complexity at every stage of a package's life, from writing
the ebuild to the final stabilization.

>
> So ... what can be done to get the stable maximum out of your hardware?
>
> In my experience (x86 centric - do other arches have different
> "problems"?) the following is stable, but not necessarily the optimum:
> - don't overtweak CFLAGS. "-O2 -march=$your_cpu_family" seems to be on
> average the best, -O3 is often slower and can cause bugs

see ^^^

> - don't do anything with ASFLAGS, LDFLAGS. This causes weird random
> breakage (e.g. LDFLAGS="-O1" causes prelink to fail with "absurd"
> errors) and doesn't give a noticeable performance boost

see ^^^

> - check that all IDE disks use DMA mode, otherwise they are limited to
> ~16M/s with a huge CPU usage penalty. Sometimes (application-specific)
> increasing the readahead with hdparm gives a huge throughput boost.

Having more than one disk or a lot of memory adds some very interesting
options: think RAID 0 (striping), or tmpfs for working data that doesn't
need a backup, e.g. $PORTDIR, /var/tmp ...

> - kernel tweaks like preempt may increase the responsiveness of the
> system, but often reduce throughput and may have unexpected sideeffects
> like random audio stutter as well as random kernel crashes ;-)

I've found that preemption with the kernel's new default of 250Hz is
suitable for most needs; however, no server here has preemption enabled ;-)

> - kernel tweaks like setting swappiness or using a different I/O
> scheduler (CFQ, deadline) should help, but I'm not aware of any "real"
> benchmarks except microbenchmarks (can create 1M files 10% faster!!!!! -
> yes, but how does it behave with a normal workload?)

What is a normal workload? Define it, and creating tests should not be
that difficult. There are also apps that can help with profiling; I'm
thinking of bootchart, sysprof, memprof, valgrind ... strace

> - using a "smarter" filesystem can dramatically improve performance at
> the potential cost of reliability. As data on FS reliability is hard to
> find from unbiased sources this becomes a religious issue ... migrating
> from ext3 to reiserfs makes "emerge sync" extremely much faster, but is
> reiserfs sustainable?

reiserfs is sustainable, at least for 99.999% of uses; the last reiserfs
bug I hit here, under very high load (and with a degraded raid5), dates
back 4 years.
However, upstream is going down the route of reiser4, which is much more
complex and much more unstable (the latest problems were in 2.6.14).
Additionally, no devs in Gentoo support (or will support?) it, and the
patch for grub is still not in place, I think.

>
> Are there any application-specific tweaks (e.g. "use the prefork MPM
> with apache2")? What is known to break things, what has usually
> beneficial behaviour? Are there any useful benchmarks that show the
> performance difference between different settings?

Isn't there "ab" [1] for apache testing?

Cheers,
Francesco Riosa

[1] http://httpd.apache.org/docs/2.0/programs/ab.html
--
gentoo-dev@gentoo.org mailing list
Re: Optimizing performance
On Thu, 2005-12-15 at 13:48 +0100, Patrick Lauer wrote:
> - don't overtweak CFLAGS. "-O2 -march=$your_cpu_family" seems to be on
> average the best, -O3 is often slower and can cause bugs

-O2 -march=$your_cpu_family -pipe -fomit-frame-pointer

-pipe
    Use pipes rather than temporary files for communication between
    the various stages of compilation. This fails to work on some
    systems where the assembler is unable to read from a pipe; but
    the GNU assembler has no trouble.

-O also turns on -fomit-frame-pointer on machines where doing so does
not interfere with debugging.

(However, x86 is not one of these machines, so if you are not a developer
doing debugging, you can turn it on yourself for a slight additional speed
increase.)

-fomit-frame-pointer
    Don't keep the frame pointer in a register for functions that
    don't need one. This avoids the instructions to save, set up and
    restore frame pointers; it also makes an extra register
    available in many functions.
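
Put together, the flags discussed above might look like this in
/etc/make.conf. This is only a sketch: athlon-xp is a placeholder for
whatever -march value matches your CPU, and reusing CFLAGS for CXXFLAGS is a
common convention rather than something stated in this thread.

```shell
# /etc/make.conf fragment (sketch; make.conf uses shell-style assignments)
# -march=athlon-xp is an assumed example value, substitute your own CPU type.
CFLAGS="-O2 -march=athlon-xp -pipe -fomit-frame-pointer"
# Assumption: use the same flags for C++ code.
CXXFLAGS="${CFLAGS}"
```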

> - don't do anything with ASFLAGS, LDFLAGS. This causes weird random
> breakage (e.g. LDFLAGS="-O1" causes prelink to fail with "absurd"
> errors) and doesn't give a noticeable performance boost

Correct.

Also, running prelink can improve speed at the cost of disk space.

> - check that all IDE disks use DMA mode, otherwise they are limited to
> ~16M/s with a huge CPU usage penalty. Sometimes (application-specific)
> increasing the readahead with hdparm gives a huge throughput boost.

I typically use the same hdparm settings as listed in the Handbook:

disc0_args="-d1 -A1 -m16 -u1 -a64 -c1"
cdrom0_args="-d1 -c1"
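
For reference, the flags in those settings break down as follows, going by
hdparm(8). The small helper after them is a hypothetical addition (not from
the Handbook) that checks the output of "hdparm -d" for the DMA state; it
assumes the usual "using_dma = 1 (on)" output format.

```shell
# /etc/conf.d/hdparm settings from above, annotated per hdparm(8):
#   -d1   enable DMA
#   -A1   enable the drive's read-lookahead feature
#   -m16  transfer up to 16 sectors per interrupt (multiple sector I/O)
#   -u1   unmask other interrupts while a disk interrupt is processed
#   -a64  set filesystem readahead to 64 sectors
#   -c1   enable 32-bit I/O support
disc0_args="-d1 -A1 -m16 -u1 -a64 -c1"
cdrom0_args="-d1 -c1"

# Hypothetical helper: succeeds if `hdparm -d` output on stdin reports DMA on.
# Assumes a line of the form " using_dma    =  1 (on)".
dma_is_on() {
    grep -q 'using_dma.*(on)'
}

# Usage (as root, /dev/hda is an example device):
#   hdparm -d /dev/hda | dma_is_on || echo "DMA is off"
```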

> - kernel tweaks like preempt may increase the responsiveness of the
> system, but often reduce throughput and may have unexpected sideeffects
> like random audio stutter as well as random kernel crashes ;-)

This is especially true on non-x86 architectures.

> - kernel tweaks like setting swappiness or using a different I/O
> scheduler (CFQ, deadline) should help, but I'm not aware of any "real"
> benchmarks except microbenchmarks (can create 1M files 10% faster!!!!! -
> yes, but how does it behave with a normal workload?)

CFQ is much worse for a desktop system. I tend to like deadline for
playing games. These can probably make a bit more difference than a new
-fomg-itsofast-and-broken-math added to CFLAGS.
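
To see which scheduler is actually in use before comparing them: on 2.6
kernels the file /sys/block/<dev>/queue/scheduler lists the available
schedulers with the active one in square brackets. A minimal sketch of
reading that; the function names are made up for illustration.

```shell
# Print the active I/O scheduler from a line in the format used by
# /sys/block/<dev>/queue/scheduler, e.g. "noop anticipatory deadline [cfq]".
# Prints nothing if no bracketed entry is present.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Show the active scheduler for every block device that exposes one.
show_schedulers() {
    for f in /sys/block/*/queue/scheduler; do
        [ -r "$f" ] || continue
        dev=${f#/sys/block/}
        dev=${dev%%/*}
        printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
    done
}

# Usage:
#   show_schedulers
# Switching at runtime (as root, hda is an example device):
#   echo deadline > /sys/block/hda/queue/scheduler
```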

> - using a "smarter" filesystem can dramatically improve performance at
> the potential cost of reliability. As data on FS reliability is hard to
> find from unbiased sources this becomes a religious issue ... migrating
> from ext3 to reiserfs makes "emerge sync" extremely much faster, but is
> reiserfs sustainable?

Well, reiserfs 3 isn't so bad on architectures where it doesn't vomit
all over itself immediately. Also, reiserfs loses much of its luster if
you're running ext3 with dir_index. There was a tip in the GWN about
turning on dir_index on an already formatted file system. If formatting
a new one, just use mkfs.ext2 -j -O dir_index /dev/$whatever to create
your file system.
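
The GWN tip itself is not quoted in this thread. The commonly documented
route for an existing ext2/ext3 file system is tune2fs to set the feature,
followed by an e2fsck run to index the directories that already exist. The
sketch below only prints those commands; dir_index_commands is a made-up
helper and /dev/hda3 is an example device.

```shell
# Print (do not run) the commands commonly used to enable dir_index on an
# existing, unmounted ext2/ext3 file system. $1 is the device.
dir_index_commands() {
    dev=$1
    # Turn the feature flag on; by itself this only affects new directories.
    printf 'tune2fs -O dir_index %s\n' "$dev"
    # -D optimizes (re-indexes) existing directories, -f forces a full check.
    printf 'e2fsck -fD %s\n' "$dev"
}

# Usage: review the output, then run it as root with the file system unmounted:
#   dir_index_commands /dev/hda3
```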

> Are there any application-specific tweaks (e.g. "use the prefork MPM
> with apache2")? What is known to break things, what has usually
> beneficial behaviour? Are there any useful benchmarks that show the
> performance difference between different settings?

Well, turning on SBA and Fast Writes on Nvidia always helps. As for
benchmarks, I think the issue is it depends entirely on usage. Having
something that is 30% faster on paper isn't very useful if you never do
it the way the benchmark does. I wish I had more numbers/examples here,
but there isn't really much in the way of decent benchmarks published
and readily available. Hopefully some other people will know of more of
them than I do.

--
Chris Gianelloni
Release Engineering - Strategic Lead
x86 Architecture Team
Games - Developer
Gentoo Linux
Re: Optimizing performance
On Thursday 15 December 2005 14:43, Francesco Riosa wrote:
> Some upstreams, mostly media related but also unsuspectable like MySQL,
> use and test their apps with high optimizations.
Not exactly true.. many media-related upstreams force "ricing" flags
(-fomg-so-fast) on packages, but that does not really mean it's more stable
that way... actually, xine-lib proved to be way more stable while *not* using
ricing CFLAGS.
The actual problem there is that many packages have code that breaks if you
remove those flags, so for example xine-lib has to force a few flags on to
work fine (on x86).

--
Diego "Flameeyes" Pettenò - http://dev.gentoo.org/~flameeyes/
Gentoo/ALT lead, Gentoo/FreeBSD, Video, AMD64, Sound, PAM, KDE
Re: Re: Optimizing performance
On Thu, 2005-12-15 at 07:43 -0700, Duncan wrote:
> > I was wondering if there are any sane ways to optimize the performance
> > of a Gentoo system.
> This really belongs on user, or perhaps on the appropriate purposed list,
> desktop or hardened or whatever, not on devel. That said, some
> comments... (I can't resist. <g>)
-user has the risk of many "use teh -fomglol flag, it si teh fast0r" ;-)
hardened doesn't have much to do with performance (although I'd be
interested what impact - if any - the different security features have!)

> > - don't overtweak CFLAGS. "-O2 -march=$your_cpu_family" seems to be on
> > average the best, -O3 is often slower and can cause bugs
>
> A lot of folks don't realize the effect of cache memory on optimizations.
> I'll be brief here, but particularly for things like the kernel that stay
> in memory, -Os can at times work wonders, because it means more of the
> working set stays in a cache closer to the CPU, and the additional speed
> in retrieving that code far outweighs the compromises made to
> optimizations to shrink it to size. Conversely, media streaming or
> encoding apps are constantly throwing out old data and fetching new data,
> and the optimizations are often more effective for them, so they work
> better with -O2 or even -O3.
I've not seen any substantial benefits from -Os over -O2.
Also the size difference is quite small - ~5M on a "normal" install iirc

> There have been occasional problems with -Os, generally because it isn't
> used as much and gets less testing, so earlier in a gcc cycle series.
> However, I run -Os here (amd64) by default, and haven't seen any issues
> that went away if I reverted to -O2, over the couple years I've been
> running Gentoo.
I've seen some reproducible breakage, e.g. KDE doesn't like it at all
> (Actually, that has been the case, even when I've edited
> ebuilds to remove their stripflags calls and the like. Glibc and xorg
> both stripflags including -Os. xorg seemed to benefit here from -Os after
> I removed the stripflags call, while glibc worked but seemed slower. Note
> that editing ebuilds means if it breaks, you get to keep the pieces!)
... which is exactly what I wanted to avoid. Ricing for the sake of it is boring ;-)

> For gcc, -pipe doesn't improve program optimization, but will make
> compiling faster. -fomit-frame-pointer makes smaller applications if
> you aren't debugging. Those are both common enough to be fairly safe.
agreed
> -frename-registers and -fweb may also be useful. (-fweb ceases to be so on
> gcc4, however, because it is implemented differently.) -funit-at-a-time
> (new to gcc-3.4, so don't try it with gcc-3.3) may also be worth looking
> into, altho it's already enabled by -Os. These latter flags are less
> commonly used, however, thus less well tested, and may therefore cause
> very occasional problems. (-funit-at-a-time was known to do so early in
> the 3.4 cycle, but those issues should have been long ago dealt with by
> now.) I consider those /reasonably/ conservative, and it's what I run.
> If I were running a server, however, I'd probably only run -O2 and the
> first two (-pipe and -fomit-frame-pointer).
On a server you'd not use -fomit-frame-pointer, to keep debuggability, I think.
> Do some research on -Os, in any case. It could be well worth your time.
from my (limited) experience it isn't, especially on CPUs with larger caches

> This suggestion does involve hardware, but not a real heavy cost, and the
> performance boost may be worth it.
That's usually not an option :-)

> Consider running a RAID system. I
> recently switched to RAID, a four-disk setup, raid1/mirrored for /boot,
> raid6 (for redundancy) for most of the system, raid0/striped (for speed)
> for /tmp, the portage dir, etc, stuff that was either temporary anyway, or
> could easily be redownloaded. (Swap can also be striped, set equal
> partitions on each disk and set equal priority for them in fstab.) I was
> very pleasantly surprised at how much of a difference it made!
Yes. 4-disk raid5 delivers amazing performance with minimal CPU overhead (~10% @1Ghz)
But 4 disks at 100Euro + controller (100 Eur) is more than the price of
a "new" system for most people.
> If you have
> onboard SATA and are buying new disks so can buy SATA anyway (my case),
> that should do just fine, as SATA runs a dedicated channel to each
> drive anyway. SCSI is a higher cost option, ruled out here, but SATA
> works very nicely, certainly so for me.
SCSI does deliver better performance, but at a prohibitive cost for "average" users.

> Again, a reasonable new-hardware suggestion. When purchasing a new system
> or considering an upgrade, more memory is often the most effective
> optimization you can make (with the raid suggestion above very close to
> it).
"The only thing better than a large engine is a larger engine" ;-)
Depending on workload 4G does wonders, but again - prohibitive for the
normal user.

> Slower CPU and more memory, up to a gig or so, is almost always
> better than the reverse, because hard drive access is WAYYY slower than
> even cheap/slow memory. At a gig of memory, running with swap disabled is
> actually a practical option,
but if you're investing anyway keep 1G per disk for swap just in
case ;-)
> altho it might not be faster and there are a
> certain memory zone management considerations. Usual X/KDE desktop usage
> will run perhaps a third of a gig. That means half to 2/3 gig for cache,
> which is "comfortable".
Agreed, although I wonder why we need so much memory in the first
place ...

> Naturally, if you take the RAID suggestion above,
> this one isn't quite as critical, because drive latency will be lower so
> reliance on swap isn't as painful, and a big cache not nearly as critical
> to good performance.
latency is the same, but concurrent accesses can happen, thus throughput
increases.
Still memory > * ...

> A gig to two gig can still be useful, but the
> cost/performance tradeoff isn't as good, and the money will likely be
> better spent elsewhere.
No. The only thing better than memory is more memory ;-)

> I run reiserfs here on everything. However, some don't consider it
> extremely stable. I keep second-copy partitions as backups of stuff I
> want to ensure is safe, for that reason and others (fat-finger deleting,
> anyone?).
Backups are independent of drive speed ;-)
> Bottom line, reiserfs is certainly safe "enough", if you have a
> decent backup system in place, and you follow it regularly, as you should.
> I can't see how anyone can reasonably disagree with that, filesystem
> religious zealousy or not.
In my experience it is as "safe" as ext3 and XFS, meaning it can go down, but usually just works.

> As I said, I run reiserfs for everything here, but I also have backup
> images of stuff I know I want to keep.
Always backup, what if your disk(s) die?
I've seen 6 out of 10 disks in a RAID die within a few hours ...

So while not completely related to software tweaks thanks for the
hardware upgrade info ;-)

Patrick
--
Stand still, and let the rest of the universe move
Re: Re: Optimizing performance
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Patrick Lauer wrote:
| On Thu, 2005-12-15 at 07:43 -0700, Duncan wrote:
|>This really belongs on user, or perhaps on the appropriate purposed list,
|>desktop or hardened or whatever, not on devel. That said, some
|>comments... (I can't resist. <g>)
|
| -user has the risk of many "use teh -fomglol flag, it si teh fast0r" ;-)
| hardened doesn't have much to do with performance (although I'd be
| interested what impact - if any - the different security features have!)

From http://www.gentoo.org/main/en/lists.xml --

gentoo-performance Discussions about improving the performance of Gentoo

Although it was a bit quiet last time I was subscribed to it.

Thanks,
Donnie
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFDoZC3XVaO67S1rtsRAvIhAKDOq5aL7mSWi4sv1Qvmvn/woVLKMwCgp5xG
ap+Fg5bDbSF9ZcvGnS7ysuY=
=0vj5
-----END PGP SIGNATURE-----
Re: Optimizing performance
On Thu, 2005-12-15 at 13:48 +0100, Patrick Lauer wrote:
>
> I was wondering if there are any sane ways to optimize the performance
> of a Gentoo system.

for package in $system_packages; do
    profile_application $package
    eliminate_bottlenecks $package
    submit_patch_upstream $package
done

Plus, you can fix bugs while you're at it! ;)

Re: Optimizing performance
On Thu, 2005-12-15 at 14:43 +0100, Francesco Riosa wrote:
> having more than one disk or a lot of memory add very interesting
> addition, read raid 0 (stripe) or tmpfs for working data that does'nt
> need a backup fex: $PORTIR, /var/tmp ...
tmpfs has miserable performance when larger than RAM iirc - you'd need >5G for openoffice :-)
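
A rough sanity check along those lines, before putting a build directory on
tmpfs: compare the size you want against MemTotal in /proc/meminfo. This is
only a sketch; fits_in_ram is a made-up helper, and the 2 GB size and the
/var/tmp/portage mount point are example values.

```shell
# Succeeds if a tmpfs of $1 kB is no larger than the MemTotal value found in
# /proc/meminfo-style text on stdin. Fails if MemTotal is missing.
fits_in_ram() {
    awk -v want="$1" '
        /^MemTotal:/ { found = 1; ok = ($2 >= want) }
        END { exit !(found && ok) }
    '
}

# Usage (as root; 2097152 kB = 2 GB):
#   fits_in_ram 2097152 < /proc/meminfo &&
#       mount -t tmpfs -o size=2g tmpfs /var/tmp/portage
```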

> I've found that preemption with the new standard 250Hz of the kernel is
> suitable for mostly needs, however no server here has preemption enabled ;-)
My system still manages to run a DVD at a load of ~8, so from my point of view that is not a problem
(2Ghz Athlon ... one of the "faster" machines I'd say as many people
still use ~500Mhz)

What causes more problems are packages that become slow on update - e.g.
gtk+ 2.8 is _really_ slow (takes a few seconds to redraw apps that took
<1sec with 2.6 ... :-( )
> what is a normal workload ? Define it and creating tests should not be
> so difficult, then there are apps that can help to profiling, thinking
> at bootchart, sysproof, memproof, valgrind ... strace
I guess then you'd have to split into server / desktop / ...


> reiserfs is sustainable, at least for 99.999% of uses, last reiserfs bug
> on very high load (and with degraded raid5) is dated 4 years ago here.
> However upstream is going to the route of reiser4, much more complex,
> and much more unstable, latest problems where in 2.6.14, additionally no
> devs in gentoo are (will?) support it the patch for grub it's still not
> in place I think.
reiser4 is "new and untested", I'd keep away from it until it has shown its reliability.
Also in my (limited) testing it is relatively slow (about the same speed
as reiser3)

> > Are there any application-specific tweaks (e.g. "use the prefork MPM
> > with apache2")? What is known to break things, what has usually
> > beneficial behaviour? Are there any useful benchmarks that show the
> > performance difference between different settings?
> is'n there "ab" [1] for apache testing ?
Yes, but that's apache specific and is quite hard to use correctly.
(but very nice for slashdotting simulation ;-) )

Patrick
--
Stand still, and let the rest of the universe move
Re: Re: Optimizing performance
On Thursday 15 December 2005 16:43, Patrick Lauer wrote:
> [talking about -Os if I'm right]
> I've seen some reproducable breakage, e.g. KDE doesn't like it at all
Actually, I'm running KDE with -Os right now...

--
Diego "Flameeyes" Pettenò - http://dev.gentoo.org/~flameeyes/
Gentoo/ALT lead, Gentoo/FreeBSD, Video, AMD64, Sound, PAM, KDE
Re: Optimizing performance
On 12/15/05, Patrick Lauer <patrick@gentoo.org> wrote:
> > > Are there any application-specific tweaks (e.g. "use the prefork MPM
> > > with apache2")? [...]
> > is'n there "ab" [1] for apache testing ?
> Yes, but that's apache specific and is quite hard to use correctly.
Isn't that what you asked?

Re: Re: Optimizing performance
Patrick Lauer wrote:
> -user has the risk of many "use teh -fomglol flag, it si teh fast0r" ;-)
> hardened doesn't have much to do with performance (although I'd be
> interested what impact - if any - the different security features have!)

Freshly typed (but worked on for a few months):
http://www.pjvenda.org/linux/doc/pax-performance/
Re: Optimizing performance
On Thu, Dec 15, 2005 at 09:13:34AM -0500, Chris Gianelloni wrote:
> CFQ is much worse for a desktop system. I tend to like deadline for
> playing games. These can probably make a bit more difference than a new
> -fomg-itsofast-and-broken-math added to CFLAGS.

That's funny, I switched from the default to CFQ on my notebook, which has
a rather slow disk, and it feels much better, especially when
recovering from suspend to disk, which swaps out a lot. It's possible
it decreases overall performance, but it may feel faster sometimes.

> There was a tip in the GWN about
> turning on dir_index on an already formatted file system. If formatting
> a new one, just use mkfs.ext2 -j -O dir_index /dev/$whatever to create
> your file system.

Good thing you remind me of that. As a new ext3 convert (I happily
used reiser3 for years before), are any problems to be expected by doing
so? Afaics it turns on B-trees, which should have no impact on the
safety of my data, right? Just want to make sure; I'd rather use a
slightly slower file system than risk data loss.

cheers,
Wernfried

--
Wernfried Haas (amne) - amne at gentoo dot org
Gentoo Forums: http://forums.gentoo.org
IRC: #gentoo-forums on freenode - email: forum-mods at gentoo dot org
Re: Optimizing performance
On Thursday 15 December 2005 04:48, Patrick Lauer wrote:
> Hi all,
>
> I was wondering if there are any sane ways to optimize the performance
> of a Gentoo system.
> Overoptimization (the well known "-O9 -fomgomg" CFLAGS etc.) tends to
> make things unstable, which is of course not what we want. The "easy"
> way out would be buying faster hardware, but that is usually not an
> option ;-)
>
> So ... what can be done to get the stable maximum out of your hardware?
This should be obvious, but don't USE=debug globally. Last time I did that, it
made my Athlon64 3400+ with 1G of RAM feel like the 300MHz PII with 192M of
RAM I have.
--
#
# electronerd, the electronerdian from electronerdia
#
Re: Optimizing performance
Wernfried Haas wrote:
> On Thu, Dec 15, 2005 at 09:13:34AM -0500, Chris Gianelloni wrote:
>>There was a tip in the GWN about
>>turning on dir_index on an already formatted file system. If formatting
>>a new one, just use mkfs.ext2 -j -O dir_index /dev/$whatever to create
>>your file system.
>
>
> Good thing you remind me of that. As a new ext3 convert (i happily
> used reiser3 for years before), any problems to be expected by doing
> so? Afaics it turns on B-trees which should have no impact on the
> safety of my data, right? Just want to make sure, i rather use a
> slightly slower file system than risking data loss.
>
> cheers,
> Wernfried
>

I've been using it on ext3 for about a year and it has never lost any
data. It also seems to speed up emerge --sync about as much as using
reiser3 does.
Re: Re: Optimizing performance
On Thursday 15 December 2005 16:50, Donnie Berkholz wrote:
> Patrick Lauer wrote:
> | On Thu, 2005-12-15 at 07:43 -0700, Duncan wrote:
> |>This really belongs on user, or perhaps on the appropriate purposed list,
> |>desktop or hardened or whatever, not on devel. That said, some
> |>comments... (I can't resist. <g>)
> |
> | -user has the risk of many "use teh -fomglol flag, it si teh fast0r" ;-)
> | hardened doesn't have much to do with performance (although I'd be
> | interested what impact - if any - the different security features have!)
>
> From http://www.gentoo.org/main/en/lists.xml --
>
> gentoo-performance Discussions about improving the performance of Gentoo
>
> Although it was a bit quiet last time I was subscribed to it.

I still am. The last message is from last August, and that was about someone
wanting to unsubscribe the wrong way. The last proper message was from July
4th.

Paul

--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net
Re: Optimizing performance
On Thursday 15 December 2005 19:38, John Myers wrote:
> On Thursday 15 December 2005 04:48, Patrick Lauer wrote:
> > Hi all,
> >
> > I was wondering if there are any sane ways to optimize the performance
> > of a Gentoo system.
> > Overoptimization (the well known "-O9 -fomgomg" CFLAGS etc.) tends to
> > make things unstable, which is of course not what we want. The "easy"
> > way out would be buying faster hardware, but that is usually not an
> > option ;-)
> >
> > So ... what can be done to get the stable maximum out of your hardware?
>
> This should be obvious, but don't USE=debug globally. Last time I did that,
> it made my Athlon64 3400+ with 1G of RAM feel like the 300MHz PII with 192M
> of RAM I have.

Just to add: this is not so much related to debugging information in the
library files (which gdb can use). That information is never loaded from disk,
so it is not that much of a speed issue (esp. if it is split out). It is,
however, related to the debug USE flag enabling various kinds of debugging
checks, output and whatnot in software. Those checks are useful for debugging,
but are normally disabled because of the performance hit they carry.

Paul

--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net
Re: Re: Optimizing performance
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Paul de Vrieze wrote:
| On Thursday 15 December 2005 16:50, Donnie Berkholz wrote:
|>gentoo-performance Discussions about improving the performance of Gentoo
|>
|>Although it was a bit quiet last time I was subscribed to it.
|
|
| I still am. The last message is from last august. And that was about someone
| wanting to unsubscribe the wrong way. The last proper message was from July
| 4th.

That clearly means everybody thinks Gentoo's performance is great and
feels no need to discuss it. =)

Donnie
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQFDrDWsXVaO67S1rtsRAt/oAKCBfnZ+RN+rit/SIy8PgMxf8fagIACePVaX
km6UtHuIbO3jqm2r4EVmcJA=
=wIaL
-----END PGP SIGNATURE-----
Re: Re: Optimizing performance
On Fri, 2005-12-23 at 09:36 -0800, Donnie Berkholz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Paul de Vrieze wrote:
> | On Thursday 15 December 2005 16:50, Donnie Berkholz wrote:
> |>gentoo-performance Discussions about improving the performance of Gentoo
> |>
> |>Although it was a bit quiet last time I was subscribed to it.
> |
> |
> | I still am. The last message is from last august. And that was about someone
> | wanting to unsubscribe the wrong way. The last proper message was from July
> | 4th.
>
> That clearly means everybody thinks Gentoo's performance is great and
> feels no need to discuss it. =)
It is discussed on gentoo-user, although many times with
less-than-practical solutions.

-Lares
--
Lares Moreau <lares.moreau@gmail.com> | LRU: 400755 http://counter.li.org
lares/irc.freenode.net |
Gentoo x86 Arch Tester | ::0 Alberta, Canada
Public Key: 0D46BB6E @ subkeys.pgp.net | Encrypted Mail Preferred
Key fingerprint = 0CA3 E40D F897 7709 3628 C5D4 7D94 483E 0D46 BB6E
Re: Optimizing performance
On Friday 23 December 2005 18:35, Paul de Vrieze wrote:
> Just to add. This is not so much related to debugging information in the
> library files (what gdb can use). That information never makes it from disk
> so is not that much of a speed issue (esp. if it is split out).
Actually, if the binaries are not stripped, they consume more memory.
With splitdebug the issue is unseen (I'm happily using it with -g3 for
everything now..)
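
The setup described here would look roughly like this in /etc/make.conf. A
sketch only: the -O2 -pipe part is assumed, and use_has_flag is a
hypothetical helper for checking that "debug" has not crept into the global
USE flags (it expects the flags to be separated by single spaces).

```shell
# /etc/make.conf fragment (sketch)
CFLAGS="-O2 -pipe -g3"     # -g3 as mentioned above; -O2 -pipe is assumed
FEATURES="splitdebug"      # keep debug info in separate files, not in the binaries

# Hypothetical helper: succeeds if flag $2 appears in the USE string $1.
use_has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: use_has_flag "$USE" debug && echo "USE=debug is set globally"
```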

--
Diego "Flameeyes" Pettenò - http://dev.gentoo.org/~flameeyes/
Gentoo/ALT lead, Gentoo/FreeBSD, Video, AMD64, Sound, PAM, KDE
Re: Optimizing performance
On Friday 23 December 2005 15:52, Diego 'Flameeyes' Pettenò wrote:
> On Friday 23 December 2005 18:35, Paul de Vrieze wrote:
> > Just to add. This is not so much related to debugging information in the
> > library files (what gdb can use). That information never makes it from
> > disk so is not that much of a speed issue (esp. if it is split out).
>
> Actually, if the binaries are not stripped, they consume more memory.
> With splitdebug the issue is unseen (I'm happily using it with -g3 for
> everything now..)
But, as pauldv said, you still shouldn't USE=debug if you want speed
--
#
# electronerd, the electronerdian from electronerdia
#
Re: Optimizing performance
On Sat, 24 Dec 2005 00:52:46 +0100
"Diego 'Flameeyes' Pettenò" <flameeyes@gentoo.org> wrote:

> Actually, if the binaries are not stripped, they consume more memory.

I'm still convinced this is untrue (apart from disk space). Debug
symbols are not part of the executable view. The kernel & loader map
PT_LOAD segments, which do not include the debug symbols. Indeed, debug
sections don't have a load address, so the loader wouldn't know where to
put them even if it tried. Compare and contrast the output of 'readelf
-l' (which shows the program headers - in particular look at the
PT_LOAD segments) and 'readelf -S' (which shows all sections).

If anyone can point me to code in the kernel or loader that maps debug
symbol sections I'm sure many would be interested.
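
One way to make that comparison concrete: in 'readelf -S' output, the
.debug_* sections normally lack the A (alloc) flag and have address 0, and
they do not appear in the section-to-segment mapping printed by 'readelf -l'.
A minimal sketch that pulls the debug section names out of 'readelf -S'
output; debug_sections is a made-up helper name and /bin/ls is just an
example binary.

```shell
# Read `readelf -S` output on stdin and print the unique .debug_* section
# names found in it, one per line.
debug_sections() {
    tr -s ' ' '\n' | grep '^\.debug_' | sort -u
}

# Usage:
#   readelf -S /bin/ls | debug_sections   # debug sections present in the file
#   readelf -l /bin/ls | grep LOAD        # the segments that actually get mapped
```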

--
Kevin F. Quinn

--
gentoo-dev@gentoo.org mailing list
Re: Optimizing performance [ In reply to ]
On Saturday 24 December 2005 12:37, Kevin F. Quinn wrote:
> I'm still convinced this is untrue (apart from disk space).
IIRC it was solar who said some time ago that executables are mmapped before the
sections to load are loaded.
And when I was using non-stripped binaries, I had less free memory than I have
now with splitdebug binaries... might be a coincidence, but I wouldn't bet on
that.

--
Diego "Flameeyes" Pettenò - http://dev.gentoo.org/~flameeyes/
Gentoo/ALT lead, Gentoo/FreeBSD, Video, AMD64, Sound, PAM, KDE
Re: Optimizing performance [ In reply to ]
On Saturday 24 December 2005 00:52, Diego 'Flameeyes' Pettenò wrote:
> On Friday 23 December 2005 18:35, Paul de Vrieze wrote:
> > Just to add. This is not so much related to debugging information in the
> > library files (what gdb can use). That information never makes it from
> > disk so is not that much of a speed issue (esp. if it is split out).
>
> Actually, if the binaries are not stripped, they consume more memory.
> With splitdebug the issue is unseen (I'm happily using it with -g3 for
> everything now..)

Debug info shouldn't be loaded into memory. Or is it? I agree though that
splitting them out is probably better for memory use.

Paul

--
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net
Re: Optimizing performance [ In reply to ]
Patrick Lauer posted <1134650885.4634.57.camel@localhost>, excerpted
below, on Thu, 15 Dec 2005 13:48:05 +0100:

> I was wondering if there are any sane ways to optimize the performance
> of a Gentoo system.

This really belongs on user, or perhaps on the appropriate special-purpose
list, desktop or hardened or whatever, not on devel. That said, some
comments... (I can't resist. <g>)

> Overoptimization (the well known "-O9 -fomgomg" CFLAGS etc.) tends to
> make things unstable, which is of course not what we want. The "easy"
> way out would be buying faster hardware, but that is usually not an
> option ;-)
>
> So ... what can be done to get the stable maximum out of your hardware?
>
> In my experience (x86 centric - do other arches have different
> "problems"?) the following is stable, but not necessarily the optimum:

The general rules are the same, but there are architectural differences
that often change the details. I /think/ it was MIPS that has extremely
slow i/o (I saw that mentioned in the split-kde-ebuilds debate, they said
it could cause compile times to double -- a big thing for something as big
as KDE). x86 (32-bit) has a relatively small number of CPU registers,
compared to most other archs (amd64 in 64-bit mode increased the number
dramatically, tho it's the same for 32-bit mode for compatibility
reasons), and this has a big effect on register use strategy.

That said, in the general case, the -march switch normally chooses pretty
good defaults for the target arch. Modifying them a whole lot from that,
other than to cover special cases, or with the general -Ox optimization
switches, is therefore often counterproductive and/or problematic.

> - don't overtweak CFLAGS. "-O2 -march=$your_cpu_family" seems to be on
> average the best, -O3 is often slower and can cause bugs

A lot of folks don't realize the effect of cache memory on optimizations.
I'll be brief here, but particularly for things like the kernel that stay
in memory, -Os can at times work wonders, because it means more of the
working set stays in a cache closer to the CPU, and the additional speed
in retrieving that code far outweighs the compromises made to
optimizations to shrink it to size. Conversely, media streaming or
encoding apps are constantly throwing out old data and fetching new data,
and the optimizations are often more effective for them, so they work
better with -O2 or even -O3.

There have been occasional problems with -Os, generally because it isn't
used as much and gets less testing, particularly early in a gcc release
series.
However, I run -Os here (amd64) by default, and haven't seen any issues
that went away if I reverted to -O2, over the couple years I've been
running Gentoo. (Actually, that has been the case, even when I've edited
ebuilds to remove their strip-flags calls and the like. The glibc and
xorg ebuilds both call strip-flags, which drops -Os. xorg seemed to
benefit here from -Os after I removed the strip-flags call, while glibc
worked but seemed slower. Note that editing ebuilds means if it breaks,
you get to keep the pieces!)

For gcc, -pipe doesn't improve program optimization, but will make
compiling faster. -fomit-frame-pointer makes smaller applications if
you aren't debugging. Those are both common enough to be fairly safe.
-frename-registers and -fweb may also be useful. (-fweb ceases to be so on
gcc4, however, because it is implemented differently.) -funit-at-a-time
(new to gcc-3.4, so don't try it with gcc-3.3) may also be worth looking
into, altho it's already enabled by -Os. These latter flags are less
commonly used, however, thus less well tested, and may therefore cause
very occasional problems. (-funit-at-a-time was known to do so early in
the 3.4 cycle, but those issues should have been long ago dealt with by
now.) I consider those /reasonably/ conservative, and it's what I run.
If I were running a server, however, I'd probably only run -O2 and the
first two (-pipe and -fomit-frame-pointer).

Do some research on -Os, in any case. It could be well worth your time.

> - check that all IDE disks use DMA mode, otherwise they are limited to
> ~16M/s with a huge CPU usage penalty. Sometimes (application-specific)
> increasing the readahead with hdparm gives a huge throughput boost.

This suggestion does involve hardware, but not a real heavy cost, and the
performance boost may be worth it. Consider running a RAID system. I
recently switched to RAID, a four-disk setup, raid1/mirrored for /boot,
raid6 (for redundancy) for most of the system, raid0/striped (for speed)
for /tmp, the portage dir, etc, stuff that was either temporary anyway, or
could easily be redownloaded. (Swap can also be striped, set equal
partitions on each disk and set equal priority for them in fstab.) I was
very pleasantly surprised at how much of a difference it made!

Cost, as I said, is reasonable, particularly if you have disks lying
around or can buy them used. Even buying say three 80-gig drives and
doing what I did only with a raid5 is reasonable, at the price of hard
drives these days. Unfortunately, if your board is still PATA, you can
only run a single disk per IDE channel or it bogs down, so you may need to
buy a PCI IDE expansion board which will add to the cost. If you have
onboard SATA and are buying new disks so can buy SATA anyway (my case),
that should do just fine, as SATA runs a dedicated channel to each
drive anyway. SCSI is a higher cost option, ruled out here, but SATA
works very nicely, certainly so for me.
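
To put numbers on the capacity cost of the redundancy, here is a small
Python sketch of the usual usable-capacity arithmetic. It assumes all
member disks are the same size and uses the conventional minimum disk
counts for each level; the function names and the 80-gig figure for the
four-disk case are only illustrative.

```python
def usable_disks(level, disks):
    """Disks' worth of usable space in an array of equal-size members."""
    if level == 0:      # striping, no redundancy
        minimum, usable = 2, disks
    elif level == 1:    # mirroring, every member holds the same data
        minimum, usable = 2, 1
    elif level == 5:    # one disk's worth of parity
        minimum, usable = 3, disks - 1
    elif level == 6:    # two disks' worth of parity
        minimum, usable = 4, disks - 2
    else:
        raise ValueError("unsupported RAID level: %r" % (level,))
    if disks < minimum:
        raise ValueError("RAID %d needs at least %d disks" % (level, minimum))
    return usable


def usable_capacity(level, disks, size_gb):
    """Usable capacity in the same unit as size_gb."""
    return usable_disks(level, disks) * size_gb
```

With three 80-gig drives in raid5, usable_capacity(5, 3, 80) gives 160
gig. Four equal partitions in raid6 also give two partitions' worth of
space, while the same four striped as raid0 give all four, which is why
the striped areas are best kept for data that can be regenerated.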

> - kernel tweaks like setting swappiness or using a different I/O
> scheduler (CFQ, deadline) should help, but I'm not aware of any "real"
> benchmarks

Again, a reasonable new-hardware suggestion. When purchasing a new system
or considering an upgrade, more memory is often the most effective
optimization you can make (with the raid suggestion above very close to
it). Slower CPU and more memory, up to a gig or so, is almost always
better than the reverse, because hard drive access is WAYYY slower than
even cheap/slow memory. At a gig of memory, running with swap disabled is
actually a practical option, altho it might not be faster and there are
certain memory zone management considerations. Usual X/KDE desktop usage
will run perhaps a third of a gig. That means half to 2/3 gig for cache,
which is "comfortable". Naturally, if you take the RAID suggestion above,
this one isn't quite as critical, because drive latency will be lower so
reliance on swap isn't as painful, and a big cache not nearly as critical
to good performance. A gig to two gig can still be useful, but the
cost/performance tradeoff isn't as good, and the money will likely be
better spent elsewhere.

Note that with a gig of memory and a striped swap, I have swappiness upped
to 100 to force the most unused app memory to swap, and I literally can't
tell when it starts swapping at all, except by watching the used swap
graph on ksysguard. None at all of the slowdowns I had previously
associated with swapping, back when I had a single drive and a half-gig of
memory.
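
If you'd rather have numbers than a ksysguard graph, the same
information is in /proc/meminfo. The following Python sketch parses
text in that format and reports how much of memory is page cache and
how much swap is in use. The field names (MemTotal, Cached, SwapTotal,
SwapFree) are the kernel's; the function names are invented for this
example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def summarize(info):
    """Return the cache share of total memory and the swap in use (kB)."""
    return {
        "cache_fraction": info["Cached"] / info["MemTotal"],
        "swap_used_kb": info["SwapTotal"] - info["SwapFree"],
    }
```

On a live system, summarize(parse_meminfo(open("/proc/meminfo").read()))
gives the current figures, and the swappiness setting itself can be
read from /proc/sys/vm/swappiness.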

> - using a "smarter" filesystem can dramatically improve performance at
> the potential cost of reliability. As data on FS reliability is hard to
> find from unbiased sources this becomes a religious issue ... migrating
> from ext3 to reiserfs makes "emerge sync" extremely much faster, but is
> reiserfs sustainable?

I run reiserfs here on everything. However, some don't consider it
extremely stable. I keep second-copy partitions as backups of stuff I
want to ensure is safe, for that reason and others (fat-finger deleting,
anyone?). Bottom line, reiserfs is certainly safe "enough", if you have a
decent backup system in place, and you follow it regularly, as you should.
I can't see how anyone can reasonably disagree with that, filesystem
religious zealotry or not.

In any case, note that you can simply redownload your portage tree anyway,
and with the speed and size benefits of reiserfs (size only if you don't
mount with notail), even the ones least likely to trust the
integrity of reiserfs should see the benefit of putting your portage tree
on it. /tmp and/or /var/tmp may equally benefit, for the same reasons. An
exception might be if you regularly put huge files (700 meg CD and
multi-gig DVD images to burn, would be one example) on the partition. In
that case, jfs or xfs (don't remember which, but one's optimized for large
files) might be preferable.

As I said, I run reiserfs for everything here, but I also have backup
images of stuff I know I want to keep.

> Are there any application-specific tweaks

As I mentioned, -O3 is often best for multimedia stuff,
encoders/decoders/streamers and the like, while -O2, or often, -Os, is
better for most things.


--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html


Re: Re: Optimizing performance [ In reply to ]
Diego 'Flameeyes' Pettenò posted
<200512151704.00376@enterprise.flameeyes.is-a-geek.org>, excerpted below,
on Thu, 15 Dec 2005 17:03:59 +0100:

> On Thursday 15 December 2005 16:43, Patrick Lauer wrote:
>> [talking about -Os if I'm right]
>> I've seen some reproducible breakage, e.g. KDE doesn't like it at all
> Actually, I'm running KDE with -Os right now...

Same here, and I've been running it with -Os for over a year, IIRC, even
when it was supposedly causing issues. Of course, I'm on amd64, which
might have something to do with it, if the issues were x86(32) only.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html

