Mailing List Archive

oom-killer keeps killing big un-tars
Greetings,

Several times now, I've been untarring some files in my dom0 and they've
been killed by the oom-killer. I originally thought it might be because
they were not in the foreground as I normally use screen. I just tried
again without screen and it was killed again.

I compiled Xen from a nightly snapshot that I downloaded a couple of
weeks ago.

xen root # ls -al xen-2.0-testing-src.tgz
-rw-r--r-- 1 root root 2432727 Apr 29 08:29 xen-2.0-testing-src.tgz

The system is a dual processor P2 300 Dell Workstation. It has only
128MB RAM total. I've allocated 42MB RAM to dom0. dom0 also mounts a
256MB swap partition.

Each time I've reviewed the /var/log/messages, there had been plenty of
free swap space when the process was killed.

I'm attaching sections from /var/log/messages where the process is
killed. Please let me know what else would be useful.
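
For anyone collecting the same evidence, a minimal sketch of pulling the oom-killer events out of the log might look like this. It assumes the kernel lines contain the strings "oom-killer" or "Out of Memory" and that grep supports -A (GNU grep does). The function names oom_events and oom_count are made up for illustration.

```shell
# Print every oom-killer event in a syslog file, with trailing context.
# The log path defaults to /var/log/messages; pass another path as $1.
oom_events() {
    log="${1:-/var/log/messages}"
    # -i: case-insensitive match; -A 20: also print the 20 lines after each
    # hit, which is where the kernel's memory summary usually follows.
    grep -i -A 20 -E 'oom-killer|out of memory' "$log"
}

# Count the lines that report an out-of-memory kill.
oom_count() {
    grep -i -c -E 'out of memory' "${1:-/var/log/messages}"
}

# Usage (as root, if the log is not world-readable):
#   oom_events /var/log/messages | less
#   oom_count  /var/log/messages
```

Comparing the timestamps of these events with what free reported at the same moment shows whether swap really was unused when the kill happened.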

What should my next step be?

--
Andrew Thompson
http://aktzero.com/
Re: oom-killer keeps killing big un-tars
Are you running any guest domains? What are you using for the other domains'
filesystems? Things like LVM snapshots and (especially) loopback devices
over NFS can use lots of (non-swappable) kernel memory in dom0 and cause OOM
conditions quite easily.
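
One way to watch for this is sketched below, on the assumption that dom0's /proc/meminfo exposes a Slab field (2.6 kernels generally do). Slab is kernel memory and is never swapped out, so it can grow while SwapFree stays high. The function name mem_snapshot is hypothetical.

```shell
# Print the /proc/meminfo fields relevant to an OOM with free swap.
# Pass an alternative file as $1 to inspect a saved copy.
mem_snapshot() {
    grep -E '^(MemTotal|MemFree|SwapTotal|SwapFree|Slab):' "${1:-/proc/meminfo}"
}

# Usage: run it in a loop in a second terminal while the untar is going,
# for example:
#   while sleep 5; do date; mem_snapshot; done
```

If MemFree falls and Slab climbs while SwapFree barely moves, the pressure is coming from memory that swap cannot relieve.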

Cheers,
Mark

On Wednesday 18 May 2005 16:43, Andrew Thompson wrote:
> Greetings,
>
> Several times now, I've been untarring some files in my dom0 and they've
> been killed by the oom-killer. I originally thought it might be because
> they were not in the foreground as I normally use screen. I just tried
> again without screen and it was killed again.
>
> I compiled Xen from a nightly snapshot that I downloaded a couple of
> weeks ago.
>
> xen root # ls -al xen-2.0-testing-src.tgz
> -rw-r--r-- 1 root root 2432727 Apr 29 08:29 xen-2.0-testing-src.tgz
>
> The system is a dual processor P2 300 Dell Workstation. It has only
> 128MB RAM total. I've allocated 42MB RAM to dom0. dom0 also mounts a
> 256MB swap partition.
>
> Each time I've reviewed the /var/log/messages, there had been plenty of
> free swap space when the process was killed.
>
> I'm attaching sections from /var/log/messages where the process is
> killed. Please let me know what else would be useful.
>
> What should my next step be?

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Re: oom-killer keeps killing big un-tars
Mark Williamson wrote:
> Are you running any guest domains? What are you using for the other domains'
> filesystems? Things like LVM snapshots and (especially) loopback devices
> over NFS can use lots of (non-swappable) kernel memory in dom0 and cause OOM
> conditions quite easily.

I currently have one guest domain that is happily running in another
30-40MB of RAM. (Repartitioned my drive, so I'm trying to untar gentoo
stage3 and portage.)

All filesystems are currently living on standard extended partitions of
a SCSI hard drive.

I do not have NFS (or hardly any other apps) running in dom0.

Now it appears that sshd has locked up or was killed so I can no longer
connect to the instance from here. I'll have to sit down at the machine
later tonight to do any more debugging. (I had Xen kill something
important in dom0 one time that broke my whole access to the console and
ssh of dom0.)

--
Andrew Thompson
http://aktzero.com/

Re: oom-killer keeps killing big un-tars
On Wed, May 18, 2005 at 11:43:36AM -0400, Andrew Thompson wrote:
>Several times now, I've been untarring some files in my dom0 and they've
>been killed by the oom-killer. I originally thought it might be because

Hm, can you reproduce this with kernel 2.6.7?

Without using xen I hit the same bug. The oom-killer was changed in
2.6.8 IIRC. On a "normal" pc with 512 MB RAM (and lots of swap) I
could hit the oom-killer creating a big tar archive (> 60GB
uncompressed) or even running samba3 and apache2 (for subversion).
Free and top didn't show any signs of running out of memory.
So 2.6.8 and 2.6.9 were quite unusable for me. In the newer kernels
the oom-killer seems to be less aggressive (at least on my
system). Google shows that you can tweak your vm settings with
sysctl.
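
A minimal sketch of inspecting those settings before changing anything. The tunables named in the usage comment (swappiness, min_free_kbytes, overcommit_memory) are an assumption about which ones are worth looking at, since the thread does not say. show_vm and the VM_DIR override are hypothetical helpers.

```shell
# Show current values of VM tunables by reading /proc/sys/vm directly,
# which is what `sysctl vm.<name>` reports. Names that do not exist on
# the running kernel are reported rather than treated as an error.
show_vm() {
    dir="${VM_DIR:-/proc/sys/vm}"   # VM_DIR exists only to ease testing
    for name in "$@"; do
        if [ -r "$dir/$name" ]; then
            printf 'vm.%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf 'vm.%s: not available on this kernel\n' "$name"
        fi
    done
}

# Usage:
#   show_vm swappiness min_free_kbytes overcommit_memory
# Changing a value (as root) would then be something like:
#   sysctl -w vm.min_free_kbytes=<value>
```

Noting the old values first makes it easy to revert if a change makes things worse.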

Back to xen:
When I wanted to test xen I tried to build a new SuSE domain. Dom0 was
running Debian as well as another domain. Neither had much to do.
Since SuSE seems to lack a brilliant tool like debootstrap, I copied
an existing SuSE installation to an image file, transferred the image to
Dom0, mounted it via loopback and tried to copy the files to a new
partition (reiserfs). Without much success, because the oom-killer
reproducibly got in my way (sometimes killing the copy
process, sometimes the sshd). When I doubled the memory for Dom0 (I
think from 64 to 128 MB), the oom-killer stopped.
So it seems that with new kernels you need a lot of RAM for some
operations, even if top or free report enough free memory.
Maybe it has to do with mount options for reiserfs, which was the
target fs for all oom-killer actions.

You seem to have reiserfs, too. Could you mount it with data=writeback
and try again? This is the fastest (and in some way unsafest) option.
Your write performance will increase a lot. I normally use
data=journal, the safest and slowest option.
Maybe someone else can tell whether the journalling code is unswappable
and can trigger the oom-killer.

Shade and sweet water!

Stephan

--
| Stephan Seitz E-Mail: Nur-Ab-Sal@gmx.de |
| WWW: http://fsing.rootsland.net/~stse/ |
| PGP Public Keys: http://fsing.rootsland.net/~stse/pgp.html |
Re: Re: oom-killer keeps killing big un-tars
Please excuse the other email, brain fart...

Stephan Seitz wrote:
<snip> (I'll respond to the snipped sections after testing.)

> Maybe it has to do with mount options for reiserfs, which was the
> target fs for all oom-killer actions.
>
> You seem to have reiserfs, too. Could you mount it with data=writeback
> and try again? This is the fastest (and in some way unsafest) option.
> Your write performance will increase very much. I'm normally using
> data=journal, the safest and slowest option.
> Maybe some other guys can tell, if journalling code is unswappable and
> can lead to oom-killer.


When you say "unsafest", that's only for the partition that's being
mounted in that fashion, correct?

So, if I only mount this drive I'm trying to untar this file on with
data=writeback, I won't be unwittingly putting my other partitions at
risk will I? (I'm hoping that this option can be set for a specific
partition and not for the entire OS.)

--
Andrew Thompson
http://aktzero.com/

Re: Re: oom-killer keeps killing big un-tars
Stephan Seitz wrote:
> On Wed, May 18, 2005 at 11:43:36AM -0400, Andrew Thompson wrote:
>
>> Several times now, I've been untarring some files in my dom0 and
>> they've been killed by the oom-killer. I originally thought it might
>> be because
>
> Hm, can you reproduce this with kernel 2.6.7?

I'm sort of confused. I'm currently running 2.6.11. Are you asking me to
go back to 2.6.7 and try it?

May 18 00:38:49 xen Linux version 2.6.11-xen0 (root@xen) (gcc version
3.3.5-20050130 (Gentoo Linux 3.3.5.20050130-r1, ssp-3.3.5.20050130-1,
pie-8.7.7.1)) #1 Fri Apr 29 11:15:24 EDT 2005

> Without using xen I hit the same bug. The oom-killer was changed in
> 2.6.8 IIRC. On a "normal" pc with 512 MB RAM (and lots of swap) I
> could hit the oom-killer creating a big tar archive (> 60GB
> uncompressed) or even running samba3 and apache2 (for subversion).
> Free and top didn't show any signs for running out of memory.
> So 2.6.8 and 2.6.9 were quite unusable for me. In the newer kernels
> the oom-killer seems to be not so aggressive anymore (at least on my
> system). Google shows, that you can tweak your vm settings with
> sysctl.

I believe I can still boot the original Gentoo (non-xen) kernel. I will
try and reproduce it there.

--
Andrew Thompson
http://aktzero.com/

Re: oom-killer keeps killing big un-tars
On Wed, May 18, 2005 at 04:18:00PM -0400, Andrew Thompson wrote:
>When you say "unsafest", that's only for the partition that's being
>mounted in that fashion, correct?

Yes. As a mount option it only affects the partition that uses it.
And unsafe is not so bad, either. data=writeback only guarantees correct
metadata, not correct data. I believe this is the normal behaviour for
reiserfs in 2.4 without external patches. I don't know in which 2.6
kernel data=ordered and data=journal were introduced, or in which
kernel data=ordered became the default.

>So, if I only mount this drive I'm trying to untar this file on with
>data=writeback, I won't be unwittingly putting my other partitions at
>risk will I? (I'm hoping that this option can be set for a specific

Yes, you can put it in the options field in fstab (data=writeback) or
give it on the command line: mount device mountpoint -o data=writeback.
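
Spelled out, the two forms might look like the sketch below. The device /dev/sda5 and the mount point /mnt/gentoo are hypothetical placeholders, and the filesystem is assumed to be reiserfs, which has not been confirmed in this thread. The snippet only builds and prints the lines so they can be reviewed before anything is run as root.

```shell
# Hypothetical device and mount point; substitute your own.
DEV=/dev/sda5
MNT=/mnt/gentoo

# One-off mount command. The option applies only to $DEV, not to any
# other mounted partition.
mount_cmd="mount -t reiserfs -o data=writeback $DEV $MNT"

# Equivalent /etc/fstab line:
# device, mount point, type, options, dump, fsck pass.
fstab_line="$DEV $MNT reiserfs data=writeback 0 0"

# Print both for review; run the first as root, or add the second to
# /etc/fstab by hand.
printf '%s\n' "$mount_cmd" "$fstab_line"
```

After mounting, the line for the mount point in /proc/mounts shows which options are in effect.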

Shade and sweet water!

Stephan

--
| Stephan Seitz E-Mail: Nur-Ab-Sal@gmx.de |
| WWW: http://fsing.rootsland.net/~stse/ |
| PGP Public Keys: http://fsing.rootsland.net/~stse/pgp.html |
Re: oom-killer keeps killing big un-tars
On Wed, May 18, 2005 at 04:31:31PM -0400, Andrew Thompson wrote:
>>Hm, can you reproduce this with kernel 2.6.7?
>I'm sort of confused. I'm currently running 2.6.11. Are you asking me to
>go back to 2.6.7 and try it?

Yes, exactly. Since the changes were introduced in 2.6.8, you will get
the "old" behaviour of the oom-killer in 2.6.7.

Shade and sweet water!

Stephan

--
| Stephan Seitz E-Mail: Nur-Ab-Sal@gmx.de |
| WWW: http://fsing.rootsland.net/~stse/ |
| PGP Public Keys: http://fsing.rootsland.net/~stse/pgp.html |