[PATCH 00/03] Unmapped: Separate unmapped and mapped pages
Unmapped patches - Use two LRU:s per zone.

These patches break out the per-zone LRU into two separate LRU:s - one for
mapped pages and one for unmapped pages. The patches also introduce guarantee
support, which allows the user to set what percentage of each node's pages
should be kept in memory for mapped or unmapped pages. This guarantee
makes it possible to adjust the VM behaviour depending on the workload.

Reasons behind the LRU separation:

- Avoid unnecessary page scanning.
The current VM implementation rotates mapped pages on the active list
until the number of mapped pages is high enough to start unmapping and paging out.
By using two LRU:s we can avoid this scanning and shrink/rotate unmapped
pages only, not touching mapped pages until the threshold is reached.

- Make it possible to adjust the VM behaviour.
In some cases the user might want to guarantee that a certain number of
pages is kept in memory, overriding the standard behaviour. Separating
pages into mapped and unmapped LRU:s makes such guarantees possible with
low overhead (see the sketch below).
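
A rough sketch of what the per-zone data ends up looking like (the names
here are illustrative, not the exact fields in the patches):

    /* Sketch: one two-list LRU instance per page class. */
    struct lru {
            struct list_head        active_list;
            struct list_head        inactive_list;
            unsigned long           nr_active;
            unsigned long           nr_inactive;
    };

    /*
     * struct zone then carries two instances instead of one list pair:
     *      struct lru mapped;      - pages with pte references
     *      struct lru unmapped;    - page-cache pages without pte references
     */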

I've performed many tests on a Dual PIII machine while varying the amount of
RAM available. Kernel compiles on a 64MB configuration get a small speedup,
while other configurations and workloads seem unaffected.

Apply on top of 2.6.16-rc5.

Comments?

/ magnus
Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
Magnus Damm wrote:
> Unmapped patches - Use two LRU:s per zone.
> [...]
>
> Apply on top of 2.6.16-rc5.
>
> Comments?
>

I did something similar a while back which I called split active lists.
I think it is a good idea in general and I did see fairly large speedups
with heavy swapping kbuilds, but nobody else seemed to want it :P

So you split the inactive list as well - that's going to be a bit of a
change in behaviour and I'm not sure whether you gain anything.

I don't think PageMapped is a very good name for the flag.

I test mapped lazily. Much better way to go IMO.

I had further patches that got rid of reclaim_mapped completely while
I was there. That heuristic is based on crazy metrics that basically
change meaning completely whenever the memory configuration of the
system changes, or when the reclaim algorithms change slightly.

--
SUSE Labs, Novell Inc.
Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
On 3/10/06, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Magnus Damm wrote:
> > Unmapped patches - Use two LRU:s per zone.
> > [...]
>
> I did something similar a while back which I called split active lists.
> I think it is a good idea in general and I did see fairly large speedups
> with heavy swapping kbuilds, but nobody else seemed to want it :P

I want it if it helps you! =)

I don't see why both mapped and unmapped pages should be kept on the
same list at all actually, especially with the reclaim_mapped
threshold used today. The current solution is to scan through lots of
mapped pages on the active list if the threshold is not reached. I
think avoiding this scanning can improve performance.

The single LRU solution today keeps mapped pages on the active list,
but always moves unmapped pages from the active list to the inactive
list. I would say that that solution is pretty different from having
two individual LRU:s with two lists each.

> So you split the inactive list as well - that's going to be a bit of a
> change in behaviour and I'm not sure whether you gain anything.

Well, other parts of the VM still use lru_cache_add_active for some
mapped pages, so anonymous pages will mostly be in the active list on
the mapped LRU. My plan with using two full LRU:s is to provide two
separate LRU instances that individually will act as two-list LRU:s.
So active mapped pages should actually end up on the active list,
while seldom used mapped pages should be on the inactive list.

Also, I think it makes sense to separate mapped from unmapped because
mapped pages need the young-bits in the PTE:s cleared to track usage,
but unmapped-page activity goes through mark_page_accessed(). So mapped
pages need to be scanned, while unmapped pages could, say, be moved to
the head of a list without any scanning. I'm not sure that is a win
though.
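
Something like this sketch is what I have in mind for the unmapped side
(a hypothetical helper - it assumes a zone->unmapped.active_list field
as in the patches and the zone's lru_lock held by the caller, so not
code from the patches):

    /*
     * Sketch: promote a referenced unmapped page to the head of its
     * active list instead of relying on LRU scanning to notice it.
     */
    void unmapped_lru_touch(struct zone *zone, struct page *page)
    {
            if (PageLRU(page) && !page_mapped(page))
                    list_move(&page->lru, &zone->unmapped.active_list);
    }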

> I don't think PageMapped is a very good name for the flag.

Yeah, it's a bit confusing to both have PageMapped() and page_mapped().

> I test mapped lazily. Much better way to go IMO.

I will have a look at your patch to see how you handle things.

> I had further patches that got rid of reclaim_mapped completely while
> I was there. That heuristic is based on crazy metrics that basically
> change meaning completely whenever the memory configuration of the
> system changes, or when the reclaim algorithms change slightly.

It is not very NUMA aware either, right?

I think there are many interesting things that could be improved
in the vmscan code, but I'm trying to change as little as possible for
now.

Thanks for the comments,

/ magnus
Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
> Apply on top of 2.6.16-rc5.
>
> Comments?


my big worry with a split LRU is: how do you keep fairness and balance
between those LRUs? This is one of the things that made the 2.4 VM suck
really badly, so I really wouldn't want this back...

Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
On Fri, 2006-03-10 at 12:44 +0900, Magnus Damm wrote:
> Unmapped patches - Use two LRU:s per zone.
> [...]
>
> Apply on top of 2.6.16-rc5.
>
> Comments?

I'm not convinced of special casing mapped pages, nor of tunable knobs.
I've been working on implementing some page replacement algorithms that
have neither.

Breaking the LRU in two like this breaks the page ordering, which makes
it possible for pages to stay resident even though they have much less
activity than pages that do get reclaimed.

I have a serious regression somewhere, but will post as soon as we've
managed to track it down.

If you're interested, the work can be found here:
http://programming.kicks-ass.net/kernel-patches/page-replace/


--
Peter Zijlstra <a.p.zijlstra@chello.nl>

Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
On 3/10/06, Arjan van de Ven <arjan@infradead.org> wrote:
> > Apply on top of 2.6.16-rc5.
> >
> > Comments?
>
>
> my big worry with a split LRU is: how do you keep fairness and balance
> between those LRUs? This is one of the things that made the 2.4 VM suck
> really badly, so I really wouldn't want this back...

Yeah, I agree this is important. I think linux-2.4 tried to keep the
LRU list lengths in a certain way (maybe 2/3 of all pages active, 1/3
inactive). In 2.6 there is no such thing; instead, the number of pages
scanned is related to the current scanning priority.

My current code just extends this idea, which basically means that
there is currently no fixed relation between how many pages sit in each
LRU. The LRU with the largest number of pages will be shrunk/rotated
first. On top of that sit the guarantee logic and the reclaim_mapped
threshold, i.e. the unmapped LRU will be shrunk first by default.
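
In pseudo-C, the selection boils down to something like this (a sketch
with hypothetical helpers lru_size() and below_guarantee(), not the
literal patch code):

    /* Sketch: pick which LRU to put reclaim pressure on. */
    static struct lru *pick_lru(struct zone *zone, int reclaim_mapped)
    {
            /* The guarantees trump everything else. */
            if (below_guarantee(zone, &zone->unmapped))
                    return &zone->mapped;
            if (below_guarantee(zone, &zone->mapped))
                    return &zone->unmapped;
            /* Unmapped pages are shrunk first by default... */
            if (!reclaim_mapped)
                    return &zone->unmapped;
            /* ...otherwise shrink whichever LRU is currently larger. */
            return lru_size(&zone->unmapped) >= lru_size(&zone->mapped) ?
                    &zone->unmapped : &zone->mapped;
    }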

The current balancing code plays around with nr_scan_active and
nr_scan_inactive, but I'm not entirely sure why that logic is there.
If anyone can explain the reason behind that code I'd be happy to hear
it.

Thanks,

/ magnus
Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
On 3/10/06, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Fri, 2006-03-10 at 12:44 +0900, Magnus Damm wrote:
> > Unmapped patches - Use two LRU:s per zone.
> > [...]
>
> I'm not convinced of special casing mapped pages, nor of tunable knobs.

I think it makes sense to treat mapped pages separately because only
mapped pages require clearing of young-bits in pte:s. The logic for
unmapped pages could be driven entirely from mark_page_accessed(), no
scanning required. At least in my head that is. =)

Also, what might be an optimal page replacement policy for
unmapped pages might be suboptimal for mapped pages.

> I've been working on implementing some page replacement algorithms that
> have neither.

Yeah, I know that. =) I think your ClockPRO work looks very promising.
I would really like to see a better page replacement policy than
LRU merged.

> Breaking the LRU in two like this breaks the page ordering, which makes
> it possible for pages to stay resident even though they have much less
> activity than pages that do get reclaimed.

Yes, true. But this happens already with a per-zone LRU. LRU pages
that happen to end up in the DMA zone will probably stay there a
longer time than pages in the normal zone. That does not mean it is
right to break the page ordering, though; I'm just saying it happens
already, and the oldest piece of data in the global system will not be
reclaimed first - instead there are priorities, such as unmapped pages
being reclaimed before mapped ones. (I strongly feel that there
should be per-node LRU:s, but that's another story.)

> I have a serious regression somewhere, but will post as soon as we've
> managed to track it down.
>
> If you're interested, the work can be found here:
> http://programming.kicks-ass.net/kernel-patches/page-replace/

I'm definitely interested, but I also believe that the page reclaim
code is hairy as hell, and that complicated changes to the "stable"
2.6-tree are hard to merge. So I see my work as a first step (or just
something that starts a discussion if no one is interested), hoping
that in the end a page replacement policy implementation such as yours
will be accepted.

Thanks!

/ magnus
Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
On Fri, 2006-03-10 at 14:19 +0100, Magnus Damm wrote:
> On 3/10/06, Arjan van de Ven <arjan@infradead.org> wrote:
> > > Apply on top of 2.6.16-rc5.
> > >
> > > Comments?
> >
> >
> > my big worry with a split LRU is: how do you keep fairness and balance
> > between those LRUs? This is one of the things that made the 2.4 VM suck
> > really badly, so I really wouldn't want this back...
>
> Yeah, I agree this is important. I think linux-2.4 tried to keep the
> LRU list lengths in a certain way (maybe 2/3 of all pages active, 1/3
> inactive).

not really

> My current code just extends this idea, which basically means that
> there is currently no fixed relation between how many pages sit in each
> LRU. The LRU with the largest number of pages will be shrunk/rotated
> first. On top of that sit the guarantee logic and the reclaim_mapped
> threshold, i.e. the unmapped LRU will be shrunk first by default.

That sounds wrong; you lose history this way. There is NO reason to
shrink only the unmapped LRU and not the mapped one. At minimum you
always need to pressure both. How you pressure (absolute versus
percentage) is an interesting question, but to me there is no doubt that
you always need to pressure both, and "equally" by some measure of equal.


Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
On Fri, 10 Mar 2006, Magnus Damm wrote:

> Unmapped patches - Use two LRU:s per zone.

Note that if this is done then the default case of zone_reclaim becomes
trivial to deal with and we can get rid of the zone_reclaim_interval.

However, I have not looked at the rest yet.

Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
On 3/10/06, Arjan van de Ven <arjan@infradead.org> wrote:
> On Fri, 2006-03-10 at 14:19 +0100, Magnus Damm wrote:
> > My current code just extends this idea, which basically means that
> > there is currently no fixed relation between how many pages sit in each
> > LRU. [...]
>
> That sounds wrong; you lose history this way. There is NO reason to
> shrink only the unmapped LRU and not the mapped one. At minimum you
> always need to pressure both. How you pressure (absolute versus
> percentage) is an interesting question, but to me there is no doubt that
> you always need to pressure both, and "equally" by some measure of equal.

Regarding whether shrinking the unmapped LRU only is bad or not: in the
vanilla version of refill_inactive_zone(), if reclaim_mapped is false
then mapped pages are rotated on the active list without the
young-bits getting cleared in the PTE:s. I would say this is very
similar to leaving the pages on the mapped active list alone as long
as reclaim_mapped is false in the dual LRU case. Do you agree?
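
From memory, the relevant loop in the vanilla refill_inactive_zone()
looks roughly like this (paraphrased from 2.6.16, so the details may be
slightly off):

    while (!list_empty(&l_hold)) {
            page = lru_to_page(&l_hold);
            list_del(&page->lru);
            if (page_mapped(page)) {
                    if (!reclaim_mapped ||
                        (total_swap_pages == 0 && PageAnon(page)) ||
                        page_referenced(page, 0)) {
                            /*
                             * With !reclaim_mapped the mapped page goes
                             * straight back to the active list, and
                             * page_referenced() - which is what clears
                             * the pte young-bits - is never even called.
                             */
                            list_add(&page->lru, &l_active);
                            continue;
                    }
            }
            list_add(&page->lru, &l_inactive);
    }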

Also, regarding losing history: do you mean that the order of the pages
is not kept? If so, then I think the refill_inactive_zone() behaviour
above shows that the order of the pages is not kept today either. But
yes, keeping the order is probably a good idea.

It would be interesting to hear what you mean by "pressure"; do you
mean that both the active list and the inactive list are scanned?

Many thanks,

/ magnus
Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
On 3/11/06, Christoph Lameter <clameter@sgi.com> wrote:
> On Fri, 10 Mar 2006, Magnus Damm wrote:
>
> > Unmapped patches - Use two LRU:s per zone.
>
> Note that if this is done then the default case of zone_reclaim becomes
> trivial to deal with and we can get rid of the zone_reclaim_interval.

That's a good thing, right? =)

> However, I have not looked at the rest yet.

Please do. I'd like to hear what you think about it.

Thanks,

/ magnus
Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
On Fri, 2006-03-10 at 14:19 +0100, Magnus Damm wrote:
> On 3/10/06, Arjan van de Ven <arjan@infradead.org> wrote:
> > my big worry with a split LRU is: how do you keep fairness and balance
> > between those LRUs? [...]
>
> Yeah, I agree this is important. I think linux-2.4 tried to keep the
> LRU list lengths in a certain way (maybe 2/3 of all pages active, 1/3
> inactive). In 2.6 there is no such thing; instead, the number of pages
> scanned is related to the current scanning priority.

This sounds wrong; the active and inactive lists are balanced to a 1:1
ratio. This happens because the scan speed is directly proportional
to the size of the list. Hence the largest list will shrink fastest -
this gives a natural balance toward equal sizes.
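
As a rough example: with a 3000-page active list and a 1000-page
inactive list, a pass at a given priority scans three times as many
active pages as inactive ones, so the active list loses pages (to the
inactive list) three times as fast as the inactive list loses pages to
reclaim - pushing the two sizes back toward 1:1.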

Peter


Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
On Fri, 2006-03-10 at 14:38 +0100, Magnus Damm wrote:
> On 3/10/06, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:

> > Breaking the LRU in two like this breaks the page ordering, which makes
> > it possible for pages to stay resident even though they have much less
> > activity than pages that do get reclaimed.
>
> Yes, true. But this happens already with a per-zone LRU. LRU pages
> that happen to end up in the DMA zone will probably stay there a
> longer time than pages in the normal zone. That does not mean it is
> > right to break the page ordering, though; I'm just saying it happens
> > already, and the oldest piece of data in the global system will not be
> > reclaimed first - instead there are priorities, such as unmapped pages
> > being reclaimed before mapped ones. (I strongly feel that there
> > should be per-node LRU:s, but that's another story.)

If reclaim works right* there is equal per-page pressure on each zone
(total pressure proportional to zone size) and hence each page will have
an equal life expectancy.

(*) this is of course not possible for all workloads; however,
balance_pgdat and the page allocator take pains to make it as true as
possible.

Peter

Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
On 3/12/06, Peter Zijlstra <peter@programming.kicks-ass.net> wrote:
> On Fri, 2006-03-10 at 14:19 +0100, Magnus Damm wrote:
> > [...]
> > Yeah, I agree this is important. I think linux-2.4 tried to keep the
> > LRU list lengths in a certain way (maybe 2/3 of all pages active, 1/3
> > inactive). In 2.6 there is no such thing; instead, the number of pages
> > scanned is related to the current scanning priority.
>
> This sounds wrong; the active and inactive lists are balanced to a 1:1
> ratio. This happens because the scan speed is directly proportional
> to the size of the list. Hence the largest list will shrink fastest -
> this gives a natural balance toward equal sizes.

Yes, you are explaining the current 2.6 behaviour much better. Also,
some balancing logic with nr_scan_active/nr_scan_inactive is present
in the code today. I'm not entirely sure about the purpose of that
code.

Thanks,

/ magnus
Re: [PATCH 00/03] Unmapped: Separate unmapped and mapped pages
On 3/12/06, Peter Zijlstra <peter@programming.kicks-ass.net> wrote:
> On Fri, 2006-03-10 at 14:38 +0100, Magnus Damm wrote:
> > On 3/10/06, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > > Breaking the LRU in two like this breaks the page ordering [...]
> >
> > Yes, true. But this happens already with a per-zone LRU. [...]
>
> If reclaim works right* there is equal per-page pressure on each zone
> (total pressure proportional to zone size) and hence each page will have
> an equal life expectancy.
>
> (*) this is of course not possible for all workloads; however,
> balance_pgdat and the page allocator take pains to make it as true as
> possible.

In shrink_zone(), there is +1 logic that adds at least one to
nr_scan_active/nr_scan_inactive, and resets them to zero when they
have reached sc->swap_cluster_max (32 or higher in some cases).

So nr_scan_active/nr_scan_inactive will in most cases be 16
(SWAP_CLUSTER_MAX / 2), regardless of the size of the zone. So a
total of 256 calls to shrink_zone() on a zone with 4096 pages will
likely scan through 100% of the pages on both LRU lists, while 256
calls to shrink_zone() on a zone with, say, 8192 pages will result in
around 50% of the pages on the lists being scanned.

Maybe not entirely true, but the bottom line is that the +1 logic will
scan through smaller zones faster than large ones.
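
For reference, the accounting I mean looks roughly like this
(paraphrased from 2.6.16 shrink_zone() from memory, so the details may
be slightly off):

    zone->nr_scan_active += (zone->nr_active >> priority) + 1;
    nr_active = zone->nr_scan_active;
    if (nr_active >= sc->swap_cluster_max)
            zone->nr_scan_active = 0;       /* scan a batch now */
    else
            nr_active = 0;                  /* defer until enough accumulates */

    /* The same pattern is used for nr_scan_inactive. For a small zone
     * the (nr_active >> priority) term rounds to almost nothing, so
     * the +1 dominates and the zone gets scanned disproportionately
     * often relative to its size. */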

/ magnus