Mailing List Archive

Bug: early_pfn_in_nid() called when not early
After a lot of debugging in spufs, I found that a crash that we encountered
on Cell actually was caused by a change in the memory management.

The patch that caused it is archived in http://lkml.org/lkml/2006/11/1/43,
and this one has been discussed back and forth, but I fear that the current
version may be broken for all setups that do memory hotplug with sparsemen
and NUMA, at least on powerpc.

What happens exactly is that the spufs code tries to register the memory
area owned by the SPU as hotplug memory in order to get page structs (we
probably shouldn't do it that way, but that's a separate discussion).

memmap_init_zone now calls early_pfn_valid() and early_pfn_in_nid()
in order to determine if the page struct should be initialized. This
is wrong for two reasons:

- early_pfn_in_nid checks the early_node_map variable to determine
to which node the hot plugged memory belongs. However, the new
memory never was part of the early_node_map to start with, so
it incorrectly returns node zero, and then fails to initialize
the page struct if we were trying to add it to a nonzero node.
This is probably not a problem for pseries, but it is for cell.

- both early_pfn_{in,to}_nid and early_node_map are in the __init
section and may already have been freed at the time we are calling
memmap_init_zone().

The patch below is not a suggested fix that I want to get into mainline
(checking slab_is_available is the wrong here), but it is a quick fix
that you should apply if you want to run a recent (post-2.6.18) kernel
on the IBM QS20 blade. I'm sorry for not having reported this earlier,
but we were always trying to find the problem in my own code...

Arnd <><

--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -1962,7 +1962,8 @@ void __meminit memmap_init_zone(unsigned
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
if (!early_pfn_valid(pfn))
continue;
- if (!early_pfn_in_nid(pfn, nid))
+ if (!slab_is_available() &&
+ !early_pfn_in_nid(pfn, nid))
continue;
page = pfn_to_page(pfn);
set_page_links(page, zone, nid, pfn);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Bug: early_pfn_in_nid() called when not early [ In reply to ]
On Wed, Dec 13, 2006 at 07:20:57PM +0100, Arnd Bergmann wrote:
> After a lot of debugging in spufs, I found that a crash that we encountered
> on Cell actually was caused by a change in the memory management.
>
> The patch that caused it is archived in http://lkml.org/lkml/2006/11/1/43,
> and this one has been discussed back and forth, but I fear that the current
> version may be broken for all setups that do memory hotplug with sparsemen
> and NUMA, at least on powerpc.

I believe you are correct. At least the memory hotplug code for powerpc
is currently broken (caused crash!).

> - both early_pfn_{in,to}_nid and early_node_map are in the __init
> section and may already have been freed at the time we are calling
> memmap_init_zone().

Well that is the root of the problem for powerpc. I believe that
__meminit attribute on memmap_init_zone() has no definition if
CONFIG_MEMORY_HOTPLUG is defined. This is so that it can be called
after boot. But, this also implies that memmap_init_zone() can not
call any routines in the __init section.

> The patch below is not a suggested fix that I want to get into mainline
> (checking slab_is_available is the wrong here), but it is a quick fix
> that you should apply if you want to run a recent (post-2.6.18) kernel
> on the IBM QS20 blade. I'm sorry for not having reported this earlier,
> but we were always trying to find the problem in my own code...

Thanks for the debug work! Just curious if you really need
CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? Can you get
those types of memory layouts? If not, an easy/immediate fix for you
might be to simply turn off the option.

> --- linux-2.6.orig/mm/page_alloc.c
> +++ linux-2.6/mm/page_alloc.c
> @@ -1962,7 +1962,8 @@ void __meminit memmap_init_zone(unsigned
> for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> if (!early_pfn_valid(pfn))
> continue;
> - if (!early_pfn_in_nid(pfn, nid))
> + if (!slab_is_available() &&
> + !early_pfn_in_nid(pfn, nid))
> continue;
> page = pfn_to_page(pfn);
> set_page_links(page, zone, nid, pfn);

I know you don't recommend this as a fix, but it has the interesting
quality of doing exactly what we want for powerpc. When
slab_is_available() we are performing a 'memory add' operation
and there is no need to do the 'pfn_in_nid' check. We know that
the range of added pages will all be on the same (passed) nid.

--
Mike
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Bug: early_pfn_in_nid() called when not early [ In reply to ]
> Thanks for the debug work! Just curious if you really need
> CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? Can you get
> those types of memory layouts? If not, an easy/immediate fix for you
> might be to simply turn off the option.

Yes, we need that for some pSeries boxes.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Bug: early_pfn_in_nid() called when not early [ In reply to ]
On Thursday 14 December 2006 00:17, Mike Kravetz wrote:
> Thanks for the debug work!  Just curious if you really need
> CONFIG_NODES_SPAN_OTHER_NODES defined for your platform?  Can you get
> those types of memory layouts?  If not, an easy/immediate fix for you
> might be to simply turn off the option.

We want to be able to run the same kernel as pseries, e.g. on
distributions that ship only one kernel. Cell should be able to
live without CONFIG_NODES_SPAN_OTHER_NODES, but that does not solve
the problem for us.

> > --- linux-2.6.orig/mm/page_alloc.c
> > +++ linux-2.6/mm/page_alloc.c
> > @@ -1962,7 +1962,8 @@ void __meminit memmap_init_zone(unsigned
> >       for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> >               if (!early_pfn_valid(pfn))
> >                       continue;
> > -             if (!early_pfn_in_nid(pfn, nid))
> > +             if (!slab_is_available() &&
> > +                 !early_pfn_in_nid(pfn, nid))
> >                       continue;
> >               page = pfn_to_page(pfn);
> >               set_page_links(page, zone, nid, pfn);
>
> I know you don't recommend this as a fix, but it has the interesting
> quality of doing exactly what we want for powerpc.  When
> slab_is_available() we are performing a 'memory add' operation
> and there is no need to do the 'pfn_in_nid' check.  We know that
> the range of added pages will all be on the same (passed) nid.

Right, that was at least my intention, though I wasn't sure
if pseries can live without the check.

Arnd <><
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Bug: early_pfn_in_nid() called when not early [ In reply to ]
Mike Kravetz writes:

> Thanks for the debug work! Just curious if you really need
> CONFIG_NODES_SPAN_OTHER_NODES defined for your platform? Can you get
> those types of memory layouts? If not, an easy/immediate fix for you
> might be to simply turn off the option.

We really need CONFIG_NODES_SPAN_OTHER_NODES for pSeries. Since we
can build a single kernel binary that runs on both Cell and pSeries,
the Cell code needs to be able to work with that option turned on.

Paul.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/