Mailing List Archive

Possible performance issue?
I notice that /usr (/dev/sda2) is at 96% full. ext2 fragments badly
once a filesystem gets above a certain percentage full, and that can
cause some pretty bad performance problems. Once it has fragmented,
it is difficult to get it back to a contiguous state.
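
A minimal sketch of checking how full a partition is, so the problem can be caught before it reaches this point. The 90% threshold and the function names are assumptions for illustration, not values from this thread.

```python
import shutil

def percent_used(total_bytes, used_bytes):
    """Return used space as a percentage of total, rounded to one decimal."""
    return round(100.0 * used_bytes / total_bytes, 1)

def check_partition(path, threshold=90.0):
    """Return (percent, over_threshold) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    pct = percent_used(usage.total, usage.used)
    return pct, pct >= threshold
```

Calling check_partition("/usr") gives the figure for the partition discussed here. The number may differ slightly from what df prints, since df accounts for reserved blocks differently.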

There are defrag programs, but they are fairly scary. The only other
way to get it back to normal is to back everything up, mkfs, and restore
it.

Perhaps somebody can remove a bunch of the packages that are installed
that we don't use?

--
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN
Re: Possible performance issue?
> (Nick Reinking <nick@twoevils.org>):
> I notice that /usr (/dev/sda2) is at 96%...
> Perhaps somebody can remove a bunch of the packages that
> are installed that we don't use?

The server is very clean in terms of software. The big culprits
for disk usage are MySQL's InnoDB transaction data (currently a
single 10 GB file!), and logfiles from MySQL and Apache.

I don't know enough about MySQL to know how to limit that data
and what the impact might be. I do think we could lighten the
load on Apache log files considerably now, to save both disk
space and gain some performance. For instance, we logged user
agents and referrers to get some stats, but I don't think we
really need that anymore.
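
To get a feel for how much space the referrer and user-agent fields take, here is a rough sketch that strips them from existing log lines. It assumes the logs use Apache's "combined" layout, where those two fields are the last two quoted strings on each line. The thread does not say which LogFormat is actually in use, and the function names are made up for this example.

```python
import re

# Matches a "combined"-style line whose last two fields are the quoted
# referrer and user agent; group 1 is everything before them.
_COMBINED_TAIL = re.compile(r'^(.*) "(?:[^"\\]|\\.)*" "(?:[^"\\]|\\.)*"$')

def to_common(line):
    """Drop the trailing referrer and user-agent fields, if present."""
    match = _COMBINED_TAIL.match(line)
    return match.group(1) if match else line

def bytes_saved(lines):
    """Total characters removed across an iterable of log lines."""
    return sum(len(line) - len(to_common(line)) for line in lines)
```

Running bytes_saved over a day's log (with trailing newlines stripped) gives an estimate of what dropping those fields would save per day.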

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
Re: Possible performance issue?
On Tue, Apr 29, 2003 at 05:07:24PM -0500, Lee Daniel Crocker wrote:
> > (Nick Reinking <nick@twoevils.org>):
> > I notice that /usr (/dev/sda2) is at 96%...
> > Perhaps somebody can remove a bunch of the packages that
> > are installed that we don't use?
>
> The server is very clean in terms of software. The big culprits
> for disk usage are MySQL's InnoDB transaction data (currently a
> single 10 GB file!), and logfiles from MySQL and Apache.
>
> I don't know enough about MySQL to know how to limit that data
> and what the impact might be. I do think we could lighten the
> load on Apache log files considerably now, to save both disk
> space and gain some performance. For instance, we logged user
> agents and referrers to get some stats, but I don't think we
> really need that anymore.

Yeah, that's what I was thinking. Putting the Apache logs on another
drive, or turning off the access_logs altogether, would probably be quite
helpful, at the cost of statistics (since we don't have a second drive).

Of course, it doesn't really matter since Apache will be on a different
machine soon enough - but we should still get /usr utilization down,
else we're going to fragment the heck out of that partition.

--
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN
Re: Possible performance issue?
On Tue, 29 Apr 2003, Lee Daniel Crocker wrote:
> > (Nick Reinking <nick@twoevils.org>):
> > I notice that /usr (/dev/sda2) is at 96%...
> > Perhaps somebody can remove a bunch of the packages that
> > are installed that we don't use?
>
> The server is very clean in terms of software. The big culprits
> for disk usage are MySQL's InnoDB transaction data (currently a
> single 10 GB file!), and logfiles from MySQL and Apache.

InnoDB keeps most of its goodies in that one big file, which can expand
but cannot contract. Certain operations (like altering the table
structure) involve making a complete duplicate of the database, altering
it, then replacing the old one; so it's taking up nearly twice the space
it actually _needs_ on a regular basis. On the plus side, it gives us room
to grow. :)

There are also the www-bin.### files, which are the binary log. These track
changes made to the database, and are rotated at 1 gigabyte or when the
server is restarted. These are mainly useful for database replication,
which we don't do _yet_ but will do in the future. For now, I just
periodically delete the old ones. It can be disabled somehow, but we'll
likely want them in the future so I've not bothered.
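
The periodic cleanup described above could be sketched roughly as follows. This assumes the ### part of www-bin.### is a numeric sequence number, which the message does not spell out, and the function name and the default of keeping two files are hypothetical. The function only picks which files to remove; it deletes nothing itself.

```python
import re

# Assumed naming: "www-bin." followed by a numeric sequence number.
_BINLOG = re.compile(r'^www-bin\.(\d+)$')

def binlogs_to_remove(filenames, keep=2):
    """Return binary log names to delete, oldest first, keeping the newest `keep`."""
    numbered = []
    for name in filenames:
        match = _BINLOG.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    numbered.sort()  # lowest sequence number (oldest) first
    if keep <= 0:
        return [name for _, name in numbered]
    return [name for _, name in numbered[:-keep]]
```

Once replication is in use, removing files by hand like this is riskier, since a replica may still need an older log. MySQL also has a PURGE statement for binary logs that may be the safer route at that point.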

Now, here's the space used by the actual wiki files under
/usr/local/apache:

2017344 htdocs
387056 logs
302384 htdocs-fr
183700 htdocs-sv
172480 htdocs-de
133512 htdocs-meta
129772 htdocs-eo
103748 htdocs-pl
98588 htdocs-es
97832 htdocs-ja
88128 htdocs-nl
71080 htdocs-da
31276 htdocs-zh
30896 htdocs-test
17036 htdocs-wiktionary
10868 htdocs-ko
7788 htdocs-ru
6784 htdocs-cs
4056 htdocs-bs
3256 htdocs-ms
3020 htdocs-el
2960 htdocs-tr
2788 htdocs-sh
2788 htdocs-ml
2772 htdocs-sr
2772 htdocs-hr
2740 htdocs-sep11

These include the php files, uploaded images, backup tarballs, webalizer
stuff, and TeX-generated images. I've deleted saved log files from prior
to one week ago (and those that are retained are gzipped).
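
The retention rule just described (drop saved logs older than a week, gzip the ones that are kept) can be sketched as a small planning function. The names here are assumptions for illustration, and the function only decides what to do; it does not touch any files.

```python
import time

WEEK_SECONDS = 7 * 24 * 60 * 60

def plan_log_cleanup(files, now=None, max_age=WEEK_SECONDS):
    """files: iterable of (name, mtime_seconds). Returns (to_delete, to_gzip)."""
    now = time.time() if now is None else now
    to_delete, to_gzip = [], []
    for name, mtime in files:
        if now - mtime > max_age:
            to_delete.append(name)   # older than the retention window
        elif not name.endswith(".gz"):
            to_gzip.append(name)     # kept, but not yet compressed
    return to_delete, to_gzip
```

Only saved (rotated) logs should be passed in. The log file Apache is currently writing to should be left out of the list.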

Further breakdown on the English wiki:
1306996 tarballs
440728 upload
120176 stats
92972 tmp
28472 math
16796 images
4444 w
... some other small smidgens of files...
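
A listing like the ones above can be reproduced with a short script. This is only a sketch: it sums apparent file sizes in kilobytes, whereas a tool like du counts allocated disk blocks, so the numbers will not match exactly. The function names are made up for this example.

```python
import os

def tree_size_kb(path):
    """Sum of file sizes under `path`, in whole kilobytes (apparent size)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):  # skip symlinks, count real files only
                total += os.path.getsize(full)
    return total // 1024

def largest_subdirs(root):
    """Return [(size_kb, name), ...] for each direct subdirectory, largest first."""
    sizes = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((tree_size_kb(entry.path), entry.name))
    return sorted(sizes, reverse=True)
```

Calling largest_subdirs("/usr/local/apache") would give the top-level breakdown, and calling it again on one of the htdocs directories gives the per-wiki detail.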

> I do think we could lighten the load on Apache log files considerably
> now, to save both disk space and gain some performance. For instance,
> we logged user agents and referrers to get some stats, but I don't
> think we really need that anymore.

Oh, I think it's quite useful to get that information, otherwise I
wouldn't know about *($%@^&*$%@# Grub.

Anyway, I cleaned out a few things and moved some of the older tarballs
over to the archives in the home partition, and we're down to 85% usage on
/usr.

-- brion vibber (brion @ pobox.com)
Re: Possible performance issue?
Brion Vibber <vibber@aludra.usc.edu> wrote in
news:Pine.GSO.4.33.0304291740490.16252-100000@aludra.usc.edu:

> Oh, I think it's quite useful to get that information, otherwise I
> wouldn't know about *($%@^&*$%@# Grub.
>
> Anyway, I cleaned out a few things and moved some of the older
> tarballs over to the archives in the home partition, and we're down to
> 85% usage on /usr.
>
> -- brion vibber (brion @ pobox.com)

I also like stats. If possible, I would like more stats. Right now there
are only the Top 30 of 10972 Total Referrers. I would like to have the top
250. Same with the Top 30 of 12904 Total URLs. I would like the top 500.

Now there is no registration of the Top of Total Countries.

--
Contact: giskart AT wikipedia.be
Ook een artikeltje schrijven? WikipediaNL, de vrije GNU/FDL encyclopedie
http://www.wikipedia.be
Re: Re: Possible performance issue?
On Wed, 30 Apr 2003, Giskart wrote:
> I also like stats. If possible, I would like more stats. Right now there
> are only the Top 30 of 10972 Total Referrers. I would like to have the top
> 250. Same with the Top 30 of 12904 Total URLs. I would like the top 500.

Well, at the least we should merge the fifty variants of google. :)
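
Merging the referrer variants could look something like the sketch below. It treats any referrer whose hostname contains the label "google" as one entry. That matching rule, and the function names, are assumptions for illustration. The stats package may well have its own grouping option that would be a better fit.

```python
from collections import Counter
from urllib.parse import urlsplit

def referrer_key(url, merged_names=("google",)):
    """Collapse hosts like www.google.de and google.co.uk into one key."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    for name in merged_names:
        if name in labels:
            return name
    return host or url  # fall back to the raw value when there is no host

def merge_referrers(counts):
    """counts: mapping of referrer URL -> hits. Returns a Counter keyed by site."""
    merged = Counter()
    for url, hits in counts.items():
        merged[referrer_key(url)] += hits
    return merged
```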

> Now there is no registration of the Top of Total Countries.

Yeah, I had to disable that because it took ALL DAY to do the reverse DNS
lookups. Hypothetically we could have Apache do lookups as connections
happen and store the hostnames in the log, but that would lead to a
performance hit during busy times...
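
One middle ground, sketched here as an assumption and not something that is set up on the server: keep doing the lookups offline, but cache them so each distinct address is resolved only once per run. The function names are hypothetical. The default lookup uses socket.gethostbyaddr and falls back to the bare IP when resolution fails.

```python
import socket

def make_cached_resolver(lookup=None):
    """Return resolve(ip) that calls `lookup` at most once per address."""
    if lookup is None:
        def lookup(ip):
            try:
                return socket.gethostbyaddr(ip)[0]
            except OSError:  # socket.herror and socket.gaierror are subclasses
                return ip
    cache = {}

    def resolve(ip):
        if ip not in cache:
            cache[ip] = lookup(ip)
        return cache[ip]

    return resolve
```

How much this helps depends on how often the same addresses repeat in the log. Slow or timing-out lookups for the remaining addresses would still dominate the run time.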

-- brion vibber (brion @ pobox.com)
Re: Re: Possible performance issue?
On Wed, Apr 30, 2003 at 10:01:33PM +0000, Giskart wrote:
> Brion Vibber <vibber@aludra.usc.edu> wrote in
> news:Pine.GSO.4.33.0304291740490.16252-100000@aludra.usc.edu:
>
> > Oh, I think it's quite useful to get that information, otherwise I
> > wouldn't know about *($%@^&*$%@# Grub.
> >
> > Anyway, I cleaned out a few things and moved some of the older
> > tarballs over to the archives in the home partition, and we're down to
> > 85% usage on /usr.
> >
> > -- brion vibber (brion @ pobox.com)
>
> I also like stats. If possible, I would like more stats. Right now there
> are only the Top 30 of 10972 Total Referrers. I would like to have the top
> 250. Same with the Top 30 of 12904 Total URLs. I would like the top 500.
>
> Now there is no registration of the Top of Total Countries.

I imagine it will matter much less with the new server, but it would
still be a good idea to add a second drive that is only used for logs
and database dumps. That would help quite a bit with the high I/O
levels on the primary disk.

--
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN