[PATCH] Re: MM deadlock [was: Re: arca-vm-8...]
Hi,
On Wed, 13 Jan 1999 14:45:09 +0100 (CET), Andrea Arcangeli
<andrea@e-mind.com> said:
> On Tue, 12 Jan 1999, Rik van Riel wrote:
>> IIRC this facility was in the original swapin readahead
>> implementation. That only leaves the question who removed
>> it and why :))
> There's another thing I completely disagree with, and that I just removed
> here: the alignment of the offset field. I see no point in going back
> instead of only doing real read-ahead.
> Maybe I am missing something?
Yes, very much so.
When paging in binaries, you often have locality of reference in both
directions --- a set of functions compiled from a single source file
will occupy adjacent pages in VM, but you are as likely to call a
function at the end of the region first as one at the beginning. It
is very common to get backwards locality as a result.
The big advantage of doing aligned clusters for read-in is twofold: first,
it means that you get as much of a readahead advantage for these backwards
access patterns as for forward accesses. Second, it means that you are
reading in complete tiles which are guaranteed to have no gaps between
them, so any two accesses in adjacent tiles are sufficient to read in the
complete set of nearby pages: it avoids having to do yet another IO to
fill in the few pages missed by a strictly forward-looking readahead
function.
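Roughly, the aligned clustering amounts to something like this (just a
sketch, not the actual kernel code: CLUSTER_PAGES and read_swap_page_async()
are made-up names for illustration):

#define CLUSTER_PAGES 16	/* pages per readahead tile (illustrative) */

static void swapin_readahead_cluster(unsigned long offset)
{
	/* Round the faulting offset _down_ to a tile boundary, so the
	 * cluster covers pages both before and after the fault, and
	 * adjacent tiles cover the area with no gaps between them. */
	unsigned long start = offset & ~(unsigned long)(CLUSTER_PAGES - 1);
	unsigned long i;

	for (i = 0; i < CLUSTER_PAGES; i++) {
		if (start + i == offset)
			continue;	/* the faulting page is read synchronously */
		read_swap_page_async(start + i);	/* hypothetical async read */
	}
}
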
> + /* don't block on I/O for doing readahead -arca */
> + atomic_read(&nr_async_pages) > pager_daemon.max_async_pages)
> 		return;
I think this is the wrong solution: far better to do the patch below,
which simply exempts reads from nr_async_pages altogether. I
originally added nr_async_pages to serve two functions: to allow
kswapd to determine how much memory it was already in the process of
freeing, and to act as a throttle on the number of write IOs submitted
when swapping.
We don't need a similar throttling action for reads, because every
place where we do VM readahead, each readahead IO cluster is followed
by a synchronous read on one page. We don't throttle the async
readaheads on normal file IO, for example.
--Stephen
----------------------------------------------------------------
--- mm/page_io.c~	Mon Dec 28 21:56:29 1998
+++ mm/page_io.c	Tue Jan 12 16:45:55 1999
@@ -58,7 +58,8 @@
 	}
 
 	/* Don't allow too many pending pages in flight.. */
-	if (atomic_read(&nr_async_pages) > pager_daemon.swap_cluster)
+	if (rw == WRITE &&
+	    atomic_read(&nr_async_pages) > pager_daemon.swap_cluster)
 		wait = 1;
 
 	p = &swap_info[type];
@@ -170,7 +171,7 @@
 		atomic_dec(&page->count);
 		return;
 	}
-	if (!wait) {
+	if (rw == WRITE && !wait) {
 		set_bit(PG_decr_after, &page->flags);
 		atomic_inc(&nr_async_pages);
 	}

Re: [PATCH] Re: MM deadlock [was: Re: arca-vm-8...]
On Wed, 13 Jan 1999, Stephen C. Tweedie wrote:
> I think this is the wrong solution: far better to do the patch below,
> which simply exempts reads from nr_async_pages altogether. I
> originally added nr_async_pages to serve two functions: to allow
> kswapd to determine how much memory it was already in the process of
> freeing, and to act as a throttle on the number of write IOs submitted
> when swapping.
>
> We don't need a similar throttling action for reads, because every
> place where we do VM readahead, each readahead IO cluster is followed
> by a synchronous read on one page. We don't throttle the async
> readaheads on normal file IO, for example.
Note that we don't need nr_async_pages at all. Here, when the nr_async_pages
limit is low, it is only a bottleneck for swapout performance. I have not
removed it (because it could be useful to decrease swapout I/O if somebody
needs this strange feature), but I have added a pager_daemon.max_async_pages
and set it to something like 256. Now I check nr_async_pages against the new
max_async_pages.
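For illustration only, the check now looks roughly like this (a sketch
following the description above, not the exact patch; the struct layout
and the values are assumed):

/* Sketch only: illustrative struct and defaults, not the exact kernel
 * definitions.  nr_async_pages is the kernel's existing counter of
 * asynchronous swap pages in flight. */
struct pager_daemon {
	unsigned int swap_cluster;	/* pages per swapout cluster */
	unsigned int max_async_pages;	/* cap on async swapout pages in flight */
} pager_daemon = { 32, 256 };

static int swapout_should_wait(void)
{
	/* Wait only once more than max_async_pages of swapout I/O is
	 * already queued, instead of throttling at the much smaller
	 * per-call swap_cluster size. */
	return atomic_read(&nr_async_pages) > pager_daemon.max_async_pages;
}
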
I _guess_ (not checked) that the _only_ reason Steve saw arca-vm-16 improve
so much when changing SWAP_CLUSTER_MAX to 512 instead of 32 is the removal
of the nr_async_pages bottleneck.
Andrea Arcangeli

Re: [PATCH] Re: MM deadlock [was: Re: arca-vm-8...]
Hi,
On Wed, 13 Jan 1999 19:52:03 +0100 (CET), Andrea Arcangeli
<andrea@e-mind.com> said:
> Note that we don't need nr_async_pages at all. Here, when the nr_async_pages
> limit is low, it is only a bottleneck for swapout performance. I have not
> removed it (because it could be useful to decrease swapout I/O if somebody
> needs this strange feature), but I have added a pager_daemon.max_async_pages
> and set it to something like 256. Now I check nr_async_pages against the new
> max_async_pages.
The problem is that if you do this, it is easy for the swapper to
generate huge amounts of async IO without actually freeing any real
memory: there's a question of balancing the amount of free memory we
have available right now with the amount which we are in the process of
freeing. Setting the nr_async_pages bound to 256 just makes the swapper
keen to send a whole 1MB of memory out to disk at a time, which is a bit
steep on an 8MB box.
--Stephen

Re: [PATCH] Re: MM deadlock [was: Re: arca-vm-8...]
On Wed, 13 Jan 1999, Stephen C. Tweedie wrote:
>
> The problem is that if you do this, it is easy for the swapper to
> generate huge amounts of async IO without actually freeing any real
> memory: there's a question of balancing the amount of free memory we
> have available right now with the amount which we are in the process of
> freeing. Setting the nr_async_pages bound to 256 just makes the swapper
> keen to send a whole 1MB of memory out to disk at a time, which is a bit
> steep on an 8MB box.
Note that this should be much less of a problem with the current swapout
strategies, but yes, basically we definitely do want to have _some_ way of
maintaining a sane "maximum number of pages in flight" thing.
The right solution may be to do the check in some other place, rather than
fairly deep inside the swap logic.
It's not a big deal, I suspect.
Anyway, there's a real pre7 out there now, and it doesn't change a lot of
the issues discussed here. I wanted to get something stable and working. I
still need to get the recursive semaphore thing (or other approach) done,
but basically I think we're at 2.2.0 already apart from that issue, and
that we can continue this discussion as an "occasional tweaks" thing.
Linus