[RFC] killing the NR_IRQS arrays.
Looking at irq handling in the kernel from a generic perspective I
see two problems.

- There is a huge number of possible interrupt sources, but in
practice very few of them are used. So we need a large
irq_desc[NR_IRQS] array that mostly goes unused. If we try for
tighter packing we get into all kinds of weird issues with irq
remapping, confusing human beings and sometimes the code.

Even with a large enough NR_IRQS we still get weird issues of
allocating and freeing elements in the array, which is just needless
complexity.

- When dealing with interrupts we have no universal value that means
we don't have an irq. Inside the arch code we have to do
something different than in drivers because 0 is a valid interrupt, and
even at the level of drivers there are cases where the variable is
declared int irq and negative numbers are used.

So I propose we remove all assumptions from the code that we actually
have an array of irqs. That will allow for irq_desc to be dynamically
allocated instead of statically allocated saving memory and reducing
kernel complexity.
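To make the proposal concrete, here is a minimal userspace model (not kernel code) of descriptors that are allocated only when a source actually exists, with NULL as the single "no irq" value so that 0 can remain a valid interrupt number. The helpers irq_desc_alloc(), irq_desc_free() and irq_desc_count(), and the fields shown, are hypothetical illustrations, not part of genirq.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified descriptor: only what the sketch needs. */
struct irq_desc {
	unsigned int nr;        /* number shown to users, e.g. /proc/interrupts */
	struct irq_desc *next;  /* all live descriptors; no fixed-size array */
};

static struct irq_desc *irq_desc_list;

/* Allocate a descriptor only when an interrupt source actually exists.
 * Returns NULL on failure; NULL is also the universal "no irq" value,
 * so 0 stays a perfectly valid interrupt number. */
struct irq_desc *irq_desc_alloc(unsigned int nr)
{
	struct irq_desc *desc = calloc(1, sizeof(*desc));

	if (!desc)
		return NULL;
	desc->nr = nr;
	desc->next = irq_desc_list;
	irq_desc_list = desc;
	return desc;
}

/* Unlink and free a descriptor, e.g. when an MSI vector goes away.
 * Passing the "no irq" value (NULL) is a caller bug in this sketch. */
void irq_desc_free(struct irq_desc *desc)
{
	struct irq_desc **pp;

	assert(desc != NULL);
	for (pp = &irq_desc_list; *pp; pp = &(*pp)->next) {
		if (*pp == desc) {
			*pp = desc->next;
			free(desc);
			return;
		}
	}
}

/* Memory now scales with the sources in use, not with NR_IRQS. */
size_t irq_desc_count(void)
{
	size_t n = 0;
	const struct irq_desc *d;

	for (d = irq_desc_list; d; d = d->next)
		n++;
	return n;
}
```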

To do this, I believe, will require a s/unsigned int irq/struct irq_desc *irq/
throughout the entire kernel. Getting the arch specific code and the
generic kernel infrastructure fixed and ready for that change looks
like a pain but pretty doable.

Getting the drivers changed actually looks to be pretty straightforward;
it will just be a very large mechanical change. We change the
type of variables where appropriate and every once in a while
introduce an irq_nr(irq) to get the actual irq number for the places
that care (ISA or print statements).
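A sketch of what the mechanical driver change might look like, assuming struct irq_desc becomes an opaque handle and irq_nr() is the accessor described above. struct my_dev and my_dev_describe() are hypothetical, and the one-field descriptor exists only so the example compiles on its own.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical opaque descriptor; drivers never index an array with it. */
struct irq_desc {
	unsigned int nr;
};

/* Hypothetical accessor: the only way a driver gets at the number,
 * needed just for ISA-style configuration and for messages. */
static inline unsigned int irq_nr(const struct irq_desc *irq)
{
	return irq->nr;
}

/* Before the change: struct my_dev { const char *name; unsigned int irq; };
 * After s/unsigned int irq/struct irq_desc *irq/: */
struct my_dev {
	const char *name;
	struct irq_desc *irq;   /* NULL means "no interrupt assigned" */
};

/* Formats the kind of message a driver prints at probe time.
 * Returns what snprintf() returns (the length of the full message). */
int my_dev_describe(const struct my_dev *dev, char *buf, size_t len)
{
	assert(buf != NULL);
	if (!dev->irq)
		return snprintf(buf, len, "%s: no irq", dev->name);
	return snprintf(buf, len, "%s: using irq %u", dev->name,
			irq_nr(dev->irq));
}
```

Note that irq number 0 prints normally here; only a NULL pointer means "no irq".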

Beyond that I did a quick test compile with just the interrupt.h and
pci.h changed and big chunks of the drivers compiled without errors.
Other drivers had more issues that mostly looked like they had an
internal irq number variable that needed updating.

I think we can make this change fairly smoothly if, before the code is
merged into Linus's tree, we have a patchset prepared with all of the
core infrastructure changes and a best effort at all of the driver
changes. Then early in some merge window we merge the patchset and
fix up the drivers that were missed.

Andrew, Linus: if you think this is a horrible idea, I clearly cannot
pull this off; but if not, I will start working up patches for 2.6.22
or, more likely, 2.6.23.

I expect the most it makes sense to aim for in 2.6.22 is the genirq
changes, so that the internal arch code is passing struct irq_desc
everywhere internally.

Hopefully with this I can get the irq code into a shape where you don't
have to have been staring at the code for years to make sense of it.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC] killing the NR_IRQS arrays.
> I expect the most it makes sense to aim for in 2.6.22 is the genirq
> changes, so that the internal arch code is passing struct irq_desc
> everywhere internally.

Are there any lifetime issues with passing pointers around?
e.g. what happens on APIC hot-unplug etc.? We don't necessarily
support that yet, but for a big interface change it should
probably be kept in mind from the start.

-Andi
Re: [RFC] killing the NR_IRQS arrays.
* Eric W. Biederman <ebiederm@xmission.com> wrote:

> So I propose we remove all assumptions from the code that we actually
> have an array of irqs. That will allow for irq_desc to be dynamically
> allocated instead of statically allocated saving memory and reducing
> kernel complexity.

hm. I'd suggest doing this without changing request_irq() - and then we
could avoid the 'massive, every driver affected' change, right?

i.e. because we'll (have to) have an nr_to_desc() and desc_to_nr()
mapping facility anyway, let's just not change the driver APIs massively.
There don't seem to be that many drivers that assume that irq_desc[] is
an array - are there?

otherwise, in terms of the irqchips infrastructure and the API between
genirq and the irqchip arch-level drivers, this change makes quite a bit
of sense i think.

or am i missing something fundamental?

Ingo
Re: [RFC] killing the NR_IRQS arrays.
Ingo Molnar <mingo@elte.hu> writes:

> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
>> So I propose we remove all assumptions from the code that we actually
>> have an array of irqs. That will allow for irq_desc to be dynamically
>> allocated instead of statically allocated saving memory and reducing
>> kernel complexity.
>
> hm. I'd suggest doing this without changing request_irq() - and then we
> could avoid the 'massive, every driver affected' change, right?

It is a different aspect of the problem. But we have significant
problematic inconsistencies in what drivers are doing. I know at
least one driver put an irq into an unsigned char, and passed it
to user space that way.

So I think the driver change is very much worth doing, because a
pointer is a token that is much harder to abuse than an unsigned
int, where you think you know how it works and so can take some
liberties.

> i.e. because we'll (have to) have an nr_to_desc() and desc_to_nr()
> mapping facility anyway, let's just not change the driver APIs massively.
> There don't seem to be that many drivers that assume that irq_desc[] is
> an array - are there?

We will have to have desc_to_nr(). I don't know about nr_to_desc().
Even if we do, nr_to_desc() will probably just be a linked list walk.

There are a lot of drivers and other pieces of the kernel that don't
believe an irq is an unsigned int, and just using an unsigned int
makes killing the array an expensive operation because operations
go from O(1) to O(N). Now that isn't something anyone on a small
machine is likely to care about (N < 32). I have no problem
staggering the change. But I see a lot of benefit in going the whole
way.
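A sketch of the cost asymmetry described above, assuming descriptors live on a simple singly linked list once the array is gone: nr_to_desc() becomes an O(N) walk, while desc_to_nr() stays O(1) because the number is just a field. The list layout and function bodies are illustrative assumptions, not existing kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor kept on a list instead of in irq_desc[]. */
struct irq_desc {
	unsigned int nr;
	struct irq_desc *next;
};

/* With no index to compute, finding a descriptor by number means
 * walking every live descriptor: O(N) instead of irq_desc[nr]'s O(1). */
struct irq_desc *nr_to_desc(struct irq_desc *head, unsigned int nr)
{
	for (; head; head = head->next)
		if (head->nr == nr)
			return head;
	return NULL;   /* no such irq */
}

/* The reverse direction stays O(1): the number is just a field. */
unsigned int desc_to_nr(const struct irq_desc *desc)
{
	assert(desc != NULL);
	return desc->nr;
}
```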

> otherwise, in terms of the irqchips infrastructure and the API between
> genirq and the irqchip arch-level drivers, this change makes quite a bit
> of sense i think.

Sounds good, and that is certainly the level to start.

Eric



Re: [RFC] killing the NR_IRQS arrays.
Andi Kleen <ak@suse.de> writes:

>> I expect the most it makes sense to aim for in 2.6.22 is the genirq
>> changes, so that the internal arch code is passing struct irq_desc
>> everywhere internally.
>
> Are there any lifetime issues with passing pointers around?
> e.g. what happens on APIC hot-unplug etc.? We don't necessarily
> support that yet, but for a big interface change it should
> probably be kept in mind from the start.

Ouch. Let's consider the case of PCI device (using MSIs) hot-unplug.
We theoretically support that case today, but I'm not certain we
account for it.

The only real issue (I can imagine) would come from something that is
not part of the device driver using the irq, as the device and
everything associated with it should have the same lifetime rules.
(You can't unplug an ioapic without unplugging the device it is
connected to).

So the things to consider would be things like /proc/interrupts and
/proc/irq. I think we already have some kind of revoke in place when
the irq goes away so it probably makes sense just to make that revoke
solid and immediate.

So I can't imagine any real lifetime issues that would cause us
problems with a pointer.

Eric
Re: [RFC] killing the NR_IRQS arrays.
Ingo Molnar <mingo@elte.hu> writes:
>
> or am i missing something fundamental?

One piece.

At the driver level this is not a big scary change.
This is just a change with widespread effect.

It should be no worse than enabling a very revealing new compiler
warning.

Every fix should be purely mechanical. There should be no need at
all to think to get it right (unless things are broken today and we
just don't see it).

Yes, typos and the like will happen. There will be issues. But 99%
of them will be code that doesn't compile, for an obvious reason.

Eric
Re: [RFC] killing the NR_IRQS arrays.
Eric W. Biederman wrote:
> So I propose we remove all assumptions from the code that we actually
> have an array of irqs. That will allow for irq_desc to be dynamically
> allocated instead of statically allocated saving memory and reducing
> kernel complexity.
>

Sounds good to me. In Xen we have 1024 event channels which we need to
map down into a smaller irq. Aside from the complexity of maintaining a
mapping table, that's not a huge issue for now, but when we start
exposing pci devices to guests it all becomes more complex. The ideal
for us is to simply use event channel == irq, which this would allow.

J
Re: [RFC] killing the NR_IRQS arrays.
Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Eric W. Biederman wrote:
>> So I propose we remove all assumptions from the code that we actually
>> have an array of irqs. That will allow for irq_desc to be dynamically
>> allocated instead of statically allocated saving memory and reducing
>> kernel complexity.
>>
>
> Sounds good to me. In Xen we have 1024 event channels which we need to
> map down into a smaller irq. Aside from the complexity of maintaining a
> mapping table, that's not a huge issue for now, but when we start
> exposing pci devices to guests it all becomes more complex. The ideal
> for us is to simply use event channel == irq, which this would allow.

Well, you shouldn't need to wait; just run a kernel with NR_IRQS >= 1024.
1024 is a stretch but it isn't too bad. There are already x86 boxes that have
more pins on their ioapics than that. So x86_64, and with this latest
round of patches from Len Brown and me, i386 should be able to support that.

On the other hand, 1024 looks extremely limiting for exposing PCI devices.
If someone gets serious about pushing what is legal with MSI-X you may be
in trouble, as a single device is allowed to have 4096 interrupts. Not
that I can think of a user for so many but...

Eric
Re: [RFC] killing the NR_IRQS arrays.
Eric W. Biederman wrote:
> Well, you shouldn't need to wait; just run a kernel with NR_IRQS >= 1024.
> 1024 is a stretch but it isn't too bad. There are already x86 boxes that have
> more pins on their ioapics than that. So x86_64, and with this latest
> round of patches from Len Brown and me, i386 should be able to support that.
>

Early Xen patches did just that, but there was general criticism about
the memory use. And in the paravirt_ops world, a large compile-time
static allocation is not really acceptable if it's only needed by Xen.
But, hey, if you're OK with it I'll submit the patch ;)

> On the other hand, 1024 looks extremely limiting for exposing PCI devices.
> If someone gets serious about pushing what is legal with MSI-X you may be
> in trouble, as a single device is allowed to have 4096 interrupts. Not
> that I can think of a user for so many but...
>

No, I think we'll burn that bridge when we come to it.

J
Re: [RFC] killing the NR_IRQS arrays.
On Friday 16 February 2007 13:10, Eric W. Biederman wrote:
> To do this, I believe, will require a s/unsigned int irq/struct irq_desc *irq/
> throughout the entire kernel.  Getting the arch specific code and the
> generic kernel infrastructure fixed and ready for that change looks
> like a pain but pretty doable.

We did something like this a few years back on the s390 architecture, which
happens to be lucky enough not to share any interrupt based drivers with
any of the other architectures.

It helped a lot on s390, and I think the change will be beneficial on others
as well, e.g. powerpc already uses 'virtual' interrupt numbers to collapse
the large (sparse) range of interrupt numbers into 512 unique numbers. This
could easily be avoided if there was simply an array of irq_desc structures
per interrupt controller.

However, I also think we should maintain the old interface, and introduce
a new one to deal only with those cases that benefit from it (MSI, Xen,
powerpc VIO, ...). This means one subsystem can be converted at a time.

I don't think there is a point converting the legacy ISA interrupts to
a different interface, as the concept of IRQ numbers is part of the
subsystem itself (if you want to call ISA a subsystem...).

For PCI, it makes a lot more sense to use something else, considering
that PCI interrupts are defined as 'pins' instead of 'lines': an
interrupt pin is defined per slot while the line is per bus, and in a
system with multiple PCI buses the line is still not necessarily
unique.

One interface I could imagine for PCI devices would be

/* generic functions */
int request_irq_desc(struct irq_desc *desc, irq_handler_t handler,
		     unsigned long irqflags, const char *devname, void *dev_id);
int free_irq_desc(struct irq_desc *desc, void *dev_id);

/* legacy functions, matching the existing request_irq()/free_irq()
 * prototypes so unconverted drivers keep working */
int request_irq(unsigned int irq, irq_handler_t handler,
		unsigned long irqflags, const char *devname, void *dev_id)
{
	return request_irq_desc(lookup_irq_desc(irq), handler, irqflags,
				devname, dev_id);
}

void free_irq(unsigned int irq, void *dev_id)
{
	free_irq_desc(lookup_irq_desc(irq), dev_id);
}

/* pci specific */
struct irq_desc *pci_request_irq(struct pci_dev *dev, int pin,
				 irq_handler_t handler)
{
	struct irq_desc *desc = pci_lookup_irq(dev, pin);
	int ret;

	if (!desc)
		return NULL;

	/* PCI interrupt lines may be shared between devices */
	ret = request_irq_desc(desc, handler, IRQF_SHARED,
			       dev->dev.bus_id, dev);
	if (ret < 0)
		return NULL;
	return desc;
}

int pci_free_irq(struct pci_dev *dev, int pin)
{
	return free_irq_desc(pci_lookup_irq(dev, pin), dev);
}

Now I don't know enough about MSI yet, but I could imagine
that something along these lines would work as well, and we
could simply require all drivers that want to support MSI
to use the new interfaces.

Arnd <><
Re: [RFC] killing the NR_IRQS arrays.
On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote:
> On Friday 16 February 2007 13:10, Eric W. Biederman wrote:
> > To do this, I believe, will require a s/unsigned int irq/struct irq_desc *irq/
> > throughout the entire kernel.  Getting the arch specific code and the
> > generic kernel infrastructure fixed and ready for that change looks
> > like a pain but pretty doable.
>
> We did something like this a few years back on the s390 architecture, which
> happens to be lucky enough not to share any interrupt based drivers with
> any of the other architectures.

What you're proposing is looking similar to a proposal I put forward some
4 years ago, but was rejected. Maybe times have changed and there's a
need for it now.

Message attached.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
Re: [RFC] killing the NR_IRQS arrays.
On Friday 16 February 2007 20:52, Russell King wrote:
> On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote:
> > We did something like this a few years back on the s390 architecture, which
> > happens to be lucky enough not to share any interrupt based drivers with
> > any of the other architectures.
>
> What you're proposing is looking similar to a proposal I put forward some
> 4 years ago, but was rejected.  Maybe times have changed and there's a
> need for it now.

Yes, I think times have changed, with the increased popularity of MSI
and paravirtualized devices. A few points on your old proposal though:

- Doing it per architecture no longer sounds feasible, I think it would
need to be done per subsystem so that the drivers can be adapted to
a new interface, and most drivers are used across multiple architectures.
- struct irq sounds much more fitting than struct irq_desc
- creating new irq_foo() functions to replace foo_irq() also sounds right.
- I don't see the point in splitting request_irq into irq_request and
irq_register.
- doing subsystem specific abstractions ideally allows the drivers to
not even need to worry about the irq pointer, significantly simplifying
the interface for register/unregister.

Arnd <><
Re: [RFC] killing the NR_IRQS arrays.
On Fri, Feb 16, 2007 at 09:43:24PM +0100, Arnd Bergmann wrote:
> On Friday 16 February 2007 20:52, Russell King wrote:
> > On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote:
> > > We did something like this a few years back on the s390 architecture, which
> > > happens to be lucky enough not to share any interrupt based drivers with
> > > any of the other architectures.
> >
> > What you're proposing is looking similar to a proposal I put forward some
> > 4 years ago, but was rejected.  Maybe times have changed and there's a
> > need for it now.
>
> Yes, I think times have changed, with the increased popularity of MSI
> and paravirtualized devices. A few points on your old proposal though:
>
> - Doing it per architecture no longer sounds feasible, I think it would
> need to be done per subsystem so that the drivers can be adapted to
> a new interface, and most drivers are used across multiple architectures.
> - struct irq sounds much more fitting than struct irq_desc
> - creating new irq_foo() functions to replace foo_irq() also sounds right.
> - doing subsystem specific abstractions ideally allows the drivers to
> not even need to worry about the irq pointer, significantly simplifying
> the interface for register/unregister.

I agree with your points above, except for:

> - I don't see the point in splitting request_irq into irq_request and
> irq_register.

This was to work around those scenarios where you want to mark an IRQ
resource as being in use prior to actually using it in much the same
way as is done with IO ports.

I've come across hardware where you need to claim the interrupt with
the controller masked, configure the device generating the interrupt
appropriately, and only then unmask it. Otherwise you end up spinning.

To work around that, we've had to introduce additional flags into the
genirq subsystem - IRQF_NOAUTOEN - whereas separating the "obtain"
from the "start using" bit of request_irq would've made this
unnecessary.

Another example where this (was|still is) used is the IDE code, but
that's probably been cleaned up in some way now.

There's nothing wrong with keeping a combined "request_irq" for the
common case though.
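A minimal userspace model of the two-phase scheme described here, reusing the irq_request()/irq_register() names mentioned earlier in the thread; their semantics in this sketch (reserve while masked, then unmask) are an assumption drawn from this description, not the actual old patch. irq_fire() is a hypothetical stand-in for the hardware raising the line.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*irq_handler_fn)(void *dev_id);

/* Hypothetical model of one interrupt line. */
struct irq {
	bool claimed;            /* reserved, like a claimed I/O port region */
	bool masked;             /* masked at the interrupt controller */
	irq_handler_fn handler;
	void *dev_id;
};

/* Phase 1: reserve the line and install the handler, but keep it masked. */
int irq_request(struct irq *irq, irq_handler_fn handler, void *dev_id)
{
	if (irq->claimed)
		return -EBUSY;
	irq->claimed = true;
	irq->masked = true;
	irq->handler = handler;
	irq->dev_id = dev_id;
	return 0;
}

/* Phase 2: once the device has been configured, let interrupts through. */
void irq_register(struct irq *irq)
{
	assert(irq->claimed);
	irq->masked = false;
}

/* Combined call for the common case. */
int irq_request_and_register(struct irq *irq, irq_handler_fn handler,
			     void *dev_id)
{
	int ret = irq_request(irq, handler, dev_id);

	if (ret == 0)
		irq_register(irq);
	return ret;
}

/* Simulated hardware event: delivered only if claimed and unmasked. */
bool irq_fire(struct irq *irq)
{
	if (!irq->claimed || irq->masked)
		return false;
	irq->handler(irq->dev_id);
	return true;
}

/* Example handler: counts deliveries in the int that dev_id points to. */
static void count_handler(void *dev_id)
{
	int *count = dev_id;

	(*count)++;
}
```

In this model a driver calls irq_request(), puts its device into a quiet state, and only then calls irq_register(); an interrupt raised in between is simply not delivered, which is the spinning case the split is meant to avoid.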

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
Re: [RFC] killing the NR_IRQS arrays.
On Fri, 2007-02-16 at 05:10 -0700, Eric W. Biederman wrote:

> Getting the drivers changed actually looks to be pretty straightforward;
> it will just be a very large mechanical change. We change the
> type of variables where appropriate and every once in a while
> introduce an irq_nr(irq) to get the actual irq number for the places
> that care (ISA or print statements).

Dunno about that irq_nr thingy. If we go that way, I'd be tempted to
remove the number completely from the "public" side of irq_desc... or
not.

On powerpc, we have this remapped thingy because we completely separate
the linux "virtual" interrupt domain from the physical numbering domains
of each PIC. Your change would turn the linux virtual domain into
pointers, removing the need for an array and associated limitations,
which is nice.

So to a given irq_desc / irq "virtual" number today, I match a pair: a HW
number (a special typedef currently defined as an
unsigned long) and a pointer to the irq "host" (which is the entity that
defines a HW number domain).

That means that you can have multiple hosts and a given HW number can
exist multiple times, once per host.

Do you think the irq_hwnumber_t thingy I have should then be generalized
and put into the irq_desc? I would need an additional void * pointer to
the irq host as well (it's not a 1:1 relationship to an irq chip, and it
needs to be accessed by generic code).

Having the HW number be clearly specific to a "domain controller" also
makes a lot of sense in the embedded field, with lots of cascaded
interrupt controllers. It avoids having to play all sorts of tricks to
assign ranges of numbers to various controllers in the system. Only the
local number on a given controller matters, the rest is dynamically
assigned.

Another option would be to have the irq_desc be created by the arch and
"embedded" in a larger data structure, in which case the HW number would
be part of the private part of that data structure. Though I suppose
that could be a problem with ISA...

I suspect that for backward compatibility, we will need to keep
something (optionally maybe via CONFIG_*) for ISA/legacy interrupts.
That is, a 16-entry irq_desc* array, so we can go from a legacy IRQ
number to an irq_desc on platforms that have legacy/ISA crap floating
around.

On powerpc, what I do is that I always reserve entries 0...15 of my
remapping array in such a way that linux virtual irq 0 is always
reserved, and 1...15 are only ever assigned to legacy interrupts if they
exist in the system, or left unassigned if they don't.
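A toy model of the numbering policy just described, using a small fixed table purely for brevity: virq 0 is never handed out (so it can mean "no irq"), 1..15 are only used for legacy interrupts, and everything else is allocated from 16 upwards. The function names are hypothetical, not the powerpc API.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VIRQS	64	/* arbitrary, small table for the sketch */
#define NUM_ISA_IRQS	16	/* legacy range 0..15 */

static bool virq_used[NR_VIRQS];

/* Reset the table; virq 0 is permanently reserved to mean "no irq". */
void virq_init(void)
{
	for (unsigned int i = 0; i < NR_VIRQS; i++)
		virq_used[i] = false;
	virq_used[0] = true;
}

/* Legacy interrupt n (1..15) always gets virq n, and only if it exists.
 * Returns 0 (never a valid virq) on failure. */
unsigned int virq_alloc_legacy(unsigned int isa_irq)
{
	if (isa_irq == 0 || isa_irq >= NUM_ISA_IRQS || virq_used[isa_irq])
		return 0;
	virq_used[isa_irq] = true;
	return isa_irq;
}

/* Non-legacy sources are allocated from above the legacy range. */
unsigned int virq_alloc(void)
{
	for (unsigned int i = NUM_ISA_IRQS; i < NR_VIRQS; i++) {
		if (!virq_used[i]) {
			virq_used[i] = true;
			return i;
		}
	}
	return 0;
}

void virq_free(unsigned int virq)
{
	assert(virq != 0 && virq < NR_VIRQS);
	virq_used[virq] = false;
}
```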

> I think we can make this change fairly smoothly if, before the code is
> merged into Linus's tree, we have a patchset prepared with all of the
> core infrastructure changes and a best effort at all of the driver
> changes. Then early in some merge window we merge the patchset and
> fix up the drivers that were missed.

As long as we do things properly and not with a big "DESIGNED FOR x86"
hack in the middle that makes it hard for everybody else, I agree.

Cheers,
Ben.

Re: [RFC] killing the NR_IRQS arrays.
On Fri, 2007-02-16 at 13:41 +0100, Ingo Molnar wrote:
> * Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> > So I propose we remove all assumptions from the code that we actually
> > have an array of irqs. That will allow for irq_desc to be dynamically
> > allocated instead of statically allocated saving memory and reducing
> > kernel complexity.
>
> hm. I'd suggest doing this without changing request_irq() - and then we
> could avoid the 'massive, every driver affected' change, right?
>
> i.e. because we'll (have to) have an nr_to_desc() and desc_to_nr()
> mapping facility anyway, let's just not change the driver APIs massively.
> There don't seem to be that many drivers that assume that irq_desc[] is
> an array - are there?
>
> otherwise, in terms of the irqchips infrastructure and the API between
> genirq and the irqchip arch-level drivers, this change makes quite a bit
> of sense i think.
>
> or am i missing something fundamental?

Well, I don't want to see anything like desc_to_nr / nr_to_desc unless
the number in question is a virtual number. That is, there is no way we
should go that way and keep passing a HW number through request_irq.
That would just be a total nightmare for powerpc and sparc at least.

What we can do is generalize the powerpc virtual irq scheme though. You
can see the implementation in arch/powerpc/kernel/irq.c starting from
the definition of irq_alloc_host() though for some stupid reason, I've
put all the documentation in include/asm-powerpc/irq.h so you might want
to start there.

Once the IRQ numbers are virtualized, it becomes easier to slowly
migrate things to use irq_desc_t * while still having a virtual number
available.

Once everything has been migrated, we can then get rid of the virtual
numbers completely except maybe for an optional 16-entry array for
legacy cruft.

Ben.


Re: [RFC] killing the NR_IRQS arrays.
> > Rather than having the job of rewriting this code during 2.6, I'd much
> > prefer to get something sorted, even if it is ARM only before 2.6.
> >
> > I believe that there are some common problems with the existing API
> > which have been hinted at over the last few days, such as large
> > NR_IRQS. As such, I think it would be a good idea to try to thrash
> > this issue out and get something which everyone is happy with.
> >
> > Additionally, I've added Alan's "reserve then hook" idea to the API;
> > I seem to remember there is a case in IDE which needs something like
> > this.

You might want to have a look at the powerpc API with its remapping
capabilities. It's very nice for handling multiple domain spaces. It
might be of some use for you.

I like your proposed API, I think that's where we want to go in the long
run.

Ben.


Re: [RFC] killing the NR_IRQS arrays.
On Friday 16 February 2007 23:37, Benjamin Herrenschmidt wrote:
> You might want to have a look at the powerpc API with its remapping
> capabilities. It's very nice for handling multiple domain spaces. It
> might be of some use for you.

I don't consider the powerpc virtual IRQs a solution for the problem.
While I believe you did the right thing for powerpc with generalizing
this over all its platforms, it really isn't more than a workaround
for the problem that we can't deal well with the static irq_desc
array.

When that problem is now getting worse on other architectures, we
should try to get it right on all of them, rather than spreading
the workaround further.

Arnd <><
Re: [RFC] killing the NR_IRQS arrays.
On Sat, 2007-02-17 at 02:37 +0100, Arnd Bergmann wrote:
> On Friday 16 February 2007 23:37, Benjamin Herrenschmidt wrote:
> > You might want to have a look at the powerpc API with its remapping
> > capabilities. It's very nice for handling multiple domain spaces. It
> > might be of some use for you.
>
> I don't consider the powerpc virtual IRQs a solution for the problem.
> While I believe you did the right thing for powerpc with generalizing
> this over all its platforms, it really isn't more than a workaround
> for the problem that we can't deal well with the static irq_desc
> array.

It's not a solution per se, though it contains elements of a solution, like
the reverse mapping, which I use to map HW numbers to virtual irqs but
which can trivially be adapted to map HW numbers to irq_desc pointers.

Among other things, I want to make sure that we don't end up with just
putting an irq number in a field of the irq_desc and have half of the
drivers peek at it and assume we can convert between irq_desc* and
number in arbitrary ways.

The HW irq number should be as opaque as possible to the world
outside of the PIC code and/or arch code that assigns them. That's an
area where the powerpc and/or sparc code might be of use.


> When that problem is now getting worse on other architectures, we
> should try to get it right on all of them, rather than spreading
> the workaround further.

Yes, but I'd like aspects of my remapping work to be included in
whatever we come up with, namely having the new irq_desc either hide
the underlying HW number, or at least make it very clear
that it's an opaque token and not guaranteed to be unique across
multiple PICs in the system.

In addition, if we remove the numbers, archs will need basically the
exact same services provided by the powerpc irq core for reverse mapping
(going from a HW irq number on a given PIC back to an irq_desc *).

Either using a linear array for simple PICs or a radix tree for
platforms with very big interrupt numbers (BTW. I think we have lockless
radix trees nowadays, I can remove the spinlocks to protect it in the
powerpc remapper).
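A sketch of the simple case, a linear reverse map for a PIC with small, dense HW numbers; the struct and function names are hypothetical. For sparse or very large numbering spaces the radix tree mentioned above would replace the flat array; that variant is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor carrying its PIC-local HW number. */
struct irq_desc {
	unsigned long hwirq;
};

/* Per-PIC reverse map: map[hwirq] gives the descriptor directly. */
struct linear_revmap {
	size_t size;
	struct irq_desc **map;
};

int revmap_init(struct linear_revmap *rm, size_t size)
{
	rm->map = calloc(size, sizeof(*rm->map));   /* all entries start NULL */
	if (!rm->map)
		return -1;
	rm->size = size;
	return 0;
}

void revmap_set(struct linear_revmap *rm, struct irq_desc *desc)
{
	assert(desc->hwirq < rm->size);
	rm->map[desc->hwirq] = desc;
}

/* Dispatch path: HW number in, descriptor out.
 * Out-of-range or unmapped numbers yield NULL ("no irq"). */
struct irq_desc *revmap_find(const struct linear_revmap *rm,
			     unsigned long hwirq)
{
	if (hwirq >= rm->size)
		return NULL;
	return rm->map[hwirq];
}

void revmap_destroy(struct linear_revmap *rm)
{
	free(rm->map);
	rm->map = NULL;
	rm->size = 0;
}
```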

Ben.


Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Fri, 2007-02-16 at 05:10 -0700, Eric W. Biederman wrote:
>
>> Getting the drivers changed actually looks to be pretty straightforward;
>> it will just be a very large mechanical change. We change the
>> type of variables where appropriate and every once in a while
>> introduce an irq_nr(irq) to get the actual irq number for the places
>> that care (ISA or print statements).
>
> Dunno about that irq_nr thingy. If we go that way, I'd be tempted to
> remove the number completely from the "public" side of irq_desc... or
> not.

When dealing with users and userspace for /proc/interrupts, /proc/irq
and the like, we need a way to talk about irqs. Currently we use the
interrupt number for that, and we are likely to break the user/kernel
interface if we don't preserve it. Debugging would tend to suck
if we couldn't print out the number of the irq a driver has
been assigned and trace it through various data structures.

For hardware that is not hot-pluggable or auto-discoverable, I think we
will need the irq number to talk about the ISA number as well.

So I don't see a way that we can get rid of a number completely, but
it should be of much less significance.
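Here is a minimal sketch of what a converted driver might look like. It assumes a hypothetical reduced `struct irq_desc` that keeps a software number purely for reporting, plus a hypothetical `irq_nr()` accessor and `struct my_device`. The driver holds a descriptor pointer and asks for the number only when it prints something.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, reduced descriptor: the number is kept for users only. */
struct irq_desc {
	unsigned int irq;   /* arbitrary software number, e.g. for /proc */
	const char *name;
};

/* Hypothetical accessor: the one place drivers get at the number. */
static inline unsigned int irq_nr(const struct irq_desc *desc)
{
	assert(desc != NULL);
	return desc->irq;
}

/* Hypothetical driver state after s/unsigned int irq/struct irq_desc *irq/. */
struct my_device {
	struct irq_desc *irq;
};

/* Format a log line; only here does the driver care about the number. */
static int my_device_describe(const struct my_device *dev, char *buf,
			      size_t len)
{
	return snprintf(buf, len, "mydev: using irq %u (%s)",
			irq_nr(dev->irq), dev->irq->name);
}
```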

> On powerpc, we have this remapped thingy because we completely separate
> the linux "virtual" interrupt domain from the physical numbering domains
> of each PIC. Your change would turn the linux virtual domain into
> pointers, removing the need for an array and associated limitations,
> which is nice.
>
> So to a given irq_desc / irq "virtual" number today, I match a pair HW
> number (which is a special typedef which is currently defined as an
> unsigned long) and a pointer to the irq "host" (which is the entity that
> define a HW number domain).
>
> That means that you can have multiple hosts and a given HW number can
> exist multiple times, once per host.
>
> Do you think the irq_hwnumber_t thingy I have should then be generalized
> and put into the irq_desc ? I would need an additional void * pointer to
> the irq host as well (it's not a 1:1 relationship to an irq chip and
> need to be accessed by generic code).

Having taken a little time to digest roughly what the concept
is, I think I can finally answer this one.

No. I don't think we should make your irq_hwnumber_t thingy general
because it is not general. I don't understand why you need it to be
an unsigned long; that still puzzles me. But for the rest, it actually
appears that ppc has a simpler model to deal with.

I don't think I can actually describe x86 hardware in your hwnumber_t
world, although I can approximate it.

In non-legacy mode, at the top of the tree I have a network of cooperating
irq controllers. Next to each cpu there is a lapic that catches
interrupt packets, and below that I have interrupt controllers that
throw interrupt packets. In the network of cooperating interrupt
controllers, an interrupt packet has a destination address that looks
like (cpu#, vector#), where cpu# is currently 8 bits and slowly
growing, and vector# is a fixed 8 bits.

The interrupt controllers that throw those packets have a fixed
number of irq slots, usually 24 or so. Each slot (referred to in the
code as a pin) can be programmed with the (cpu#, vector#) packet it
throws when an interrupt occurs, including an option to vary the cpu#
among a set of cpus.

So, to be frank, to handle this model properly I need:
#define NR_IRQS (NR_CPUS*256)

There is enough flexibility in this model that hardware vendors
have not found a need to cascade interrupt controllers.
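To make the sizing concrete, here is a minimal sketch of a flat table with one slot per possible (cpu#, vector#) packet. `NR_CPUS` is given a hypothetical value of 8. `vector_to_desc[]`, `packet_index()`, `bind_vector()` and `dispatch()` are hypothetical names. No real local APIC or I/O APIC programming is modelled.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS     8     /* hypothetical build-time CPU limit */
#define NR_VECTORS  256   /* vector# is a fixed 8 bits */
#define NR_IRQS     (NR_CPUS * NR_VECTORS)

struct irq_desc {
	int pin;   /* I/O APIC pin programmed to throw this packet */
};

/* Hypothetical flat table: one slot per possible (cpu#, vector#) packet. */
static struct irq_desc *vector_to_desc[NR_IRQS];

/* Flat index of a (cpu#, vector#) destination. */
static size_t packet_index(unsigned int cpu, unsigned int vector)
{
	assert(cpu < NR_CPUS && vector < NR_VECTORS);
	return (size_t)cpu * NR_VECTORS + vector;
}

/* Setup time: remember which descriptor a pin's packet belongs to. */
static void bind_vector(unsigned int cpu, unsigned int vector,
			struct irq_desc *desc)
{
	vector_to_desc[packet_index(cpu, vector)] = desc;
}

/* Interrupt entry: the receiving cpu knows its own cpu# and the vector. */
static struct irq_desc *dispatch(unsigned int cpu, unsigned int vector)
{
	return vector_to_desc[packet_index(cpu, vector)];
}
```

A pin that may vary its cpu# among a set of cpus would be bound once per cpu in that set. This is one reason the table scales with NR_CPUS.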


> Having the HW number be clearly specific to a "domain controller" makes
> also a lot of sense in the embedded field with lots of cascaded
> interrupt controllers. It avoids having to play all sorts of tricks to
> assign ranges of numbers to various controllers in the system. Only the
> local number on a given controller matters, the rest is dynamically
> assigned.

Ben, I have no problem with a number that is specific to an irq
controller for dealing with the internal irq controller
implementations; heck, I think everyone has that to some degree.

The linux irq number will remain an arbitrary software number for
use by the linux system for talking about the source of the
interrupt.

Why, in a sparse address space, you would find it hard to allocate a
range of numbers to an irq controller that only has a fixed number of
irqs is something I don't understand, and I think it does a
disservice to your users. But that is all it is:
a quality-of-implementation issue. ia64 does the same foolish
thing.

The only times it really makes sense to me to let the irq number vary
arbitrarily are when things are truly dynamic, like with MSI, a
hypervisor, or hot plug interrupt controllers.
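Here is a minimal sketch of the range allocation Eric has in mind. It assumes a hypothetical bump allocator `alloc_irq_range()` and a hypothetical `struct irq_controller`, and it assumes that numbers 0..15 stay reserved for ISA. Each controller with a fixed pin count receives one contiguous block, so its Linux number is just base + pin.

```c
#include <assert.h>

#define NR_ISA_IRQS 16   /* 0..15 kept for ISA-style interrupts */

/* Hypothetical controller with a fixed number of input pins. */
struct irq_controller {
	unsigned int nr_pins;
	unsigned int irq_base;   /* first Linux irq number of this block */
};

/* Next unassigned Linux irq number; starts just past the ISA range. */
static unsigned int next_free_irq = NR_ISA_IRQS;

/* Give @ctrl one contiguous block covering all of its pins. */
static void alloc_irq_range(struct irq_controller *ctrl)
{
	ctrl->irq_base = next_free_irq;
	next_free_irq += ctrl->nr_pins;
}

/* Linux irq number for @pin on @ctrl. */
static unsigned int pin_to_irq(const struct irq_controller *ctrl,
			       unsigned int pin)
{
	assert(pin < ctrl->nr_pins);
	return ctrl->irq_base + pin;
}
```

With two 24-pin controllers, the first block is 16..39 and the second is 40..63.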

> Another option would be to have the irq_desc be created by the arch and
> "embedded" in a larger data structure, in which case the HW number would
> be part of the private part of that data structure. Though I suppose
> that could be a problem with ISA...

This is definitely what I intend to have the genirq code start allowing.
For all intents and purposes we already do this today.

> I suspect that for backward compatibility, we will need to keep
> something (optionally maybe via CONFIG_*) for ISA/legacy interrupts.
> That is a 16 entries irq_desc* array, so we can go from a legacy IRQ
> number to an irq_desc on platform that have legacy/ISA crap floating
> around.

Yes.

> On powerpc, what I do is that I always reserve entries 0...15 of my
> remapping array in such a way that linux virtual irq 0 is always
> reserved, and 1...15 are only ever assigned to legacy interrupts if they
> exist in the system, or left unassigned if they don't.

Yep. Once we are done you can remove the reservation of 0, and leave
0..15 assigned only to ISA-style interrupts if they are present in the
system.
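The 16-entry legacy table both sides agree on could be as simple as the following minimal sketch. `isa_irq_desc[]`, `isa_irq_register()` and `isa_irq_to_desc()` are hypothetical names. Slots stay NULL on systems that lack the corresponding ISA interrupt.

```c
#include <assert.h>
#include <stddef.h>

#define NR_ISA_IRQS 16

struct irq_desc {
	unsigned int irq;
};

/* Hypothetical: ISA irq 0..15 -> descriptor, NULL if not present. */
static struct irq_desc *isa_irq_desc[NR_ISA_IRQS];

/* Register the descriptor backing ISA interrupt @isa_irq. */
static void isa_irq_register(unsigned int isa_irq, struct irq_desc *desc)
{
	assert(isa_irq < NR_ISA_IRQS);
	isa_irq_desc[isa_irq] = desc;
}

/* Translate a legacy ISA number (e.g. from a config file) to a descriptor. */
static struct irq_desc *isa_irq_to_desc(unsigned int isa_irq)
{
	if (isa_irq >= NR_ISA_IRQS)
		return NULL;
	return isa_irq_desc[isa_irq];
}
```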

I really don't like the terms legacy or old/new when referring to
things, because today's hip/new is tomorrow's legacy, and
we have lots of generations of hardware.

If we want to throw the legacy term around, I hereby designate all
non-MSI-X interrupt controllers legacy.

>> I think we can make this change fairly smoothly if before the code is
>> merged into Linus's tree we have a patchset prepared with a all of the
>> core infrastructure changes and a best effort at all of the driver
>> changes. Then early some merge window we merge the patchset, and
>> fixup the drivers that were missed.
>
> As long as we do things properly and not with a big "DESIGNED FOR x86"
> hack in the middle that makes it hard for everybody else, I agree.

Sure, and I have the same issue with a big "DESIGNED FOR ppc" in the middle,
or "DESIGNED FOR arch/x". However the unfortunate truth is that the x86
has enough volume that frequently other architectures use some x86
hardware and thus get some of x86's warts. So anything that doesn't
cope with the x86's warts is frequently doomed to failure.

Eric

Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> In addition, if we remove the numbers, archs will need basically the
> exact same services provided by the powerpc irq core for reverse mapping
> (going from a HW irq number on a given PIC back to an irq_desc *).

Ben, you seem to be under the misapprehension that, except for the case
of ISA (0-15), the linux IRQ number is a hardware number. It is an
arbitrary software enumeration, and I think it has been that way for a
very long time.

> Either using a linear array for simple PICs or a radix tree for
> platforms with very big interrupt numbers (BTW. I think we have lockless
> radix trees nowadays, I can remove the spinlocks to protect it in the
> powerpc remapper).

I can only tell you that my impression of this last is that all the
world's not a PPC.

I have a version of the x86 code with a partial conversion done, and
I didn't need a reverse mapping. What you call the hardware interrupt
number never happens to be interesting to me after the system is set up.

I do suspect there is an interesting chunk of your ppc work that
makes sense as a library so other arches could use it.

Eric

Re: [RFC] killing the NR_IRQS arrays.
Russell King <rmk+lkml@arm.linux.org.uk> writes:

> On Fri, Feb 16, 2007 at 08:45:58PM +0100, Arnd Bergmann wrote:
>> On Friday 16 February 2007 13:10, Eric W. Biederman wrote:
>> > To do this I believe will require a s/unsigned int irq/struct irq_desc *irq/
>> > throughout the entire kernel.  Getting the arch specific code and the
>> > generic kernel infrastructure fixed and ready for that change looks
>> > like a pain but pretty doable.
>>
>> We did something like this a few years back on the s390 architecture, which
>> happens to be lucky enough not to share any interrupt based drivers with
>> any of the other architectures.
>
> What you're proposing is looking similar to a proposal I put forward some
> 4 years ago, but was rejected. Maybe times have changed and there's a
> need for it now.
>
> Message attached.
>
> --
> Russell King
> Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
> maintainer of:
>
> From: Russell King <rmk@arm.linux.org.uk>
> Subject: [RFC] IRQ API
> To: linux-arch@vger.kernel.org
> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
> Date: Sat, 07 Jun 2003 17:05:19 -0700
>
> Hi,
>
> I've recently received an updated development system from ARM Ltd,
> which has caused me to become concerned about whether the existing
> IRQ infrastructure for the ARM architecture is really up to the job
> of handling the developments which will occur over the 2.6 lifetime.
> Essentially we're going to be seeing the emergence of vectored
> interrupt controllers, and knowing hardware designers, they'll
> continue the practice of chaining interrupt controllers (which is
> pretty common on ARM today.) I have some hardware here today which
> has a vectored interrupt controller chained after two non-vectored
> controllers. This vectored interrupt controller is on an add-on
> card, and so has no fixed address space and no fixed IRQ numbering.
>
> Rather than having the job of rewriting this code during 2.6, I'd much
> prefer to get something sorted, even if it is ARM only before 2.6.
>
> I believe that there are some common problems with the existing API
> which have been hinted at over the last few days, such as large
> NR_IRQS. As such, I think it would be a good idea to try to thrash
> this issue out and get something which everyone is happy with.
>
> Additionally, I've added Alan's "reserve then hook" idea to the API;
> I seem to remember there is a case in IDE which needs something like
> this.
>
> Please note that what I am proposing is not to strip out the existing
> API between now and 2.7; what I am proposing is a structure for 2.7
> which can optionally be implemented by architectures and used in
> architecture specific drivers now if they feel they would benefit
> from it.
>
> Comments? (other than "wtf are you thinking about this so close to 2.6,
> are you mad" 8))
>
>
> Linux Interrupt API
> ===================
>
> Russell King <rmk@arm.linux.org.uk>
>
> The Linux Interrupt API provides a flexible mechanism to handle and
> control interrupts within the kernel. The design requirements for
> this API are:
>
> - must have as little overhead as possible for commodity hardware
> - must be easy and obvious to use
> - must allow complex multi-level interrupt implementations to exist
> transparently to device drivers
> - must be compatible with the existing API
>
> Essentially, this means that implementation of the existing API must
> be simple.
>
> ------------------------------------------------------------------------------
>
> The API.
> ========
>
> struct irq {
> /* architecture defined information */
> /* must not be dereferenced by drivers */
> /* eg, x86's irq_desc_t or sparc64's struct ino_bucket */
> };
>
> #define NO_IRQ <architecture-defined-int-constant>

When did you need a magic NO_IRQ constant in generic code?
One of the reasons I want to convert the drivers is so we can
kill the NO_IRQ nonsense.

As for struct irq instead of struct irq_desc, I really don't
care, although the C++ camp has not yet weighed in to mention
how that creates a namespace conflict for them.
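With a descriptor pointer, "no interrupt" needs no per-architecture constant, because NULL already means it, and irq 0 stops being special. The following is a minimal sketch. It uses a hypothetical `struct dev_stub` as a stand-in for a bus device and a hypothetical `dev_has_irq()` helper.

```c
#include <assert.h>
#include <stddef.h>

struct irq_desc {
	unsigned int irq;
};

/* Hypothetical stand-in for a bus device whose core assigns the irq. */
struct dev_stub {
	struct irq_desc *irq;   /* NULL when no interrupt was assigned */
};

/*
 * Before: "dev->irq == NO_IRQ", "== 0" or "< 0", depending on the
 * architecture and driver. After: one universal test.
 */
static int dev_has_irq(const struct dev_stub *dev)
{
	return dev->irq != NULL;
}
```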

> /**
> * irq_get - increment reference count on the IRQ descriptor
> * @irq: interrupt descriptor
> *
> * IRQ descriptor reference counting is mandatory for
> * implementations which provide dynamically allocated IRQ
> * descriptors. statically allocated IRQ descriptor
> * implementations may define these to be no-ops.
> */
> struct irq *irq_get(struct irq *irq);

> /**
> * irq_put - decrement reference count on IRQ descriptor
> * @irq: interrupt descriptor
> *
> * Decrement the reference counter in an IRQ descriptor.
> * If the reference counter drops to zero, the IRQ descriptor
> * will be freed.
> *
> * IRQ descriptor reference counting is mandatory for
> * implementations which provide dynamically allocated IRQ
> * descriptors. statically allocated IRQ descriptor
> * implementations may define these to be no-ops.
> */
> void irq_put(struct irq *irq);

We might need this. But I don't think we need reference counting in
the traditional sense. For all practical purposes we already have
dynamic irq allocation, and it hasn't proven necessary. I would
prefer to go to great lengths to avoid exposing that kind of
issue to driver code.
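For comparison, the reference counting in Russell's proposal could look roughly like this for dynamically allocated descriptors. This is a single-threaded sketch. The `refcount` field, `irq_alloc()` and the `irqs_freed` counter are hypothetical, and a kernel version would need atomic counts and locking.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical dynamically allocated descriptor. */
struct irq {
	unsigned int refcount;
};

static unsigned int irqs_freed;   /* for demonstration only */

/* Allocate a descriptor holding one reference for the caller. */
static struct irq *irq_alloc(void)
{
	struct irq *irq = malloc(sizeof(*irq));

	if (irq)
		irq->refcount = 1;
	return irq;
}

/* Take an extra reference; returns @irq for call chaining. */
static struct irq *irq_get(struct irq *irq)
{
	assert(irq->refcount > 0);
	irq->refcount++;
	return irq;
}

/* Drop a reference; the last put frees the descriptor. */
static void irq_put(struct irq *irq)
{
	assert(irq->refcount > 0);
	if (--irq->refcount == 0) {
		free(irq);
		irqs_freed++;
	}
}
```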


> Example backwards-compatible IRQ code.
> ======================================
>
> The following example code shows the expected back-compat code required
> for the x86 architecture. Since x86 uses a fixed table of interrupts,
> this is relatively straight forward and simple.

Well I just find that comment cute, in the current context :)

I actually don't like the idea of having two versions of the
infrastructure, or for that matter any driver-visible changes other
than the type change at the same time.

That makes the conversion much harder because you have to think about
the change. If I have to go and touch a huge percentage of the drivers,
I don't want to have to think; I don't want something I have to code
review. I want something that any first grader can do correctly, so that
when I am fatigued after having converted 1000 drivers I can still get it
right, and so that there is a chance I don't have to convert drivers.

But otherwise I agree we seem to be largely on the same wavelength.

I also don't like infrastructure changes that never get finished,
which is the big danger with providing a backwards compatibility
API. Some drivers just never adapt to what is new and better.

I think with a little care I can get 99% of the drivers compile-tested
by enabling everything in the kernel (allyesconfig?).

I guess when it comes to that I will have to see if I'm crazy or not.

Eric

Re: [RFC] killing the NR_IRQS arrays.
> No. I don't think we should make your irq_hwnumber_t thingy general
> because it is not general. I don't understand why you need it to be
> an unsigned long; that still puzzles me. But for the rest, it actually
> appears that ppc has a simpler model to deal with.

I think you might have misunderstood, because I do believe it's actually
very general :-)

Let me explain below.

> I don't think I can actually describe x86 hardware in your hwnumber_t
> world, although I can approximate it.

And I think it fits well...

> In non-legacy mode, at the top of the tree I have a network of cooperating
> irq controllers. Next to each cpu there is a lapic that catches
> interrupt packets, and below that I have interrupt controllers that
> throw interrupt packets. In the network of cooperating interrupt
> controllers, an interrupt packet has a destination address that looks
> like (cpu#, vector#), where cpu# is currently 8 bits and slowly
> growing, and vector# is a fixed 8 bits.
>
> The interrupt controllers that throw those packets have a fixed
> number of irq slots, usually 24 or so. Each slot (referred to in the
> code as a pin) can be programmed with the (cpu#, vector#) packet it
> throws when an interrupt occurs, including an option to vary the cpu#
> among a set of cpus.
>
> So, to be frank, to handle this model properly I need:
> #define NR_IRQS (NR_CPUS*256)
>
> There is enough flexibility in this model that hardware vendors
> have not found a need to cascade interrupt controllers.

This is roughly similar to the cell "toplevel" model, where interrupt
messages encode the source unit/node, target and class. The chip has an
interrupt "controller" (receiver of those messages) for each thread. In
the kernel, I use a "flat" model; that is, I create one host for all of
them, and my hardware numbers are made of a similar bit encoding of that
"routing" info.

That is, with a remapping model like mine, the x86 non-legacy situation
could be easily expressed by having one domain (I call them hosts in the
code) covering the whole fabric, with the hw number being your
(CPU << 16) | vector thing.
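Here is a minimal sketch of that encoding. It assumes hypothetical helpers `x86_hwirq()`, `hwirq_cpu()` and `hwirq_vector()`, and an `unsigned long` `irq_hw_number_t`. The whole fabric is a single host, and each (cpu, vector) destination becomes one HW number that a generic reverse map could then look up.

```c
#include <assert.h>

typedef unsigned long irq_hw_number_t;  /* as in the powerpc code */

/* Pack a (cpu, vector) destination: (cpu << 16) | vector; bits 8..15 unused. */
static irq_hw_number_t x86_hwirq(unsigned int cpu, unsigned int vector)
{
	assert(vector < 256);
	return ((irq_hw_number_t)cpu << 16) | vector;
}

/* Recover the destination cpu from a packed HW number. */
static unsigned int hwirq_cpu(irq_hw_number_t hw)
{
	return (unsigned int)(hw >> 16);
}

/* Recover the 8-bit vector from a packed HW number. */
static unsigned int hwirq_vector(irq_hw_number_t hw)
{
	return (unsigned int)(hw & 0xff);
}
```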

In addition (though you don't need that on x86), cell has an external
controller cascaded on one of those interrupts, and I use a separate
domain for it.

The reason my hwnumber thingy is a generic type is that I provide
generic functions to create a linux interrupt for a domain/number pair
and a generic mechanism to do the reverse mapping. That's where I think my
code might be of some use: with the "numbers" going away, pretty much
everybody will need a way to reverse map from HW numbers back to
irq_desc *.

I use an unsigned long because I needed to choose a type that would fit
the biggest number potentially used by an interrupt controller, and that
can be really big with some hypervisors, for which those numbers are
"tokens" that are potentially 64 bits wide.

> Ben, I have no problem with a number that is specific to an irq
> controller for dealing with the internal irq controller
> implementations; heck, I think everyone has that to some degree.
>
> The linux irq number will remain an arbitrary software number for
> use by the linux system for talking about the source of the
> interrupt.

So you do intend to keep the linux number, which is what I call the
"virtual interrupt" number on powerpc... I wouldn't have thought that
necessary except as a special case: an array of 16 entries for ISA
interrupts...

> Why, in a sparse address space, you would find it hard to allocate a
> range of numbers to an irq controller that only has a fixed number of
> irqs is something I don't understand, and I think it does a
> disservice to your users. But that is all it is:
> a quality-of-implementation issue. ia64 does the same foolish
> thing.

It would be fairly easy to change my powerpc code to pre-allocate a full
range for a given domain/PIC when initializing it, instead of doing the
"lazy" scattered allocation I do now, though I don't think it would bring
much. It's not possible for all PICs, though; for example, pSeries
needs to use the radix tree reverse mapper because of how large its HW
interrupt numbers can be.

I chose not to do it. In the long run, the only remotely meaningful way
to expose interrupts to users would be to -add- columns
to /proc/interrupts that provide the "host" and the HW number on that
host, though I'm not sure that wouldn't break some userland tools.

> The only times it really makes sense to me to let the irq number vary
> arbitrarily are when things are truly dynamic, like with MSI, a
> hypervisor, or hot plug interrupt controllers.

I don't understand why you would go to all that length to replace irq
numbers with irq_desc * and ... keep the numbers :-)

But again, as I said, this is in no way a fundamental limitation of the
powerpc code. It could be modified easily to allocate the whole range of
a given PIC that uses the "linear" remapping. It makes no sense for PICs
that use the "radix tree" remapping though.

> Sure, and I have the same issue with a big "DESIGNED FOR ppc" in the middle,
> or "DESIGNED FOR arch/x". However the unfortunate truth is that the x86
> has enough volume that frequently other architectures use some x86
> hardware and thus get some of x86's warts. So anything that doesn't
> cope with the x86's warts is frequently doomed to failure.

I fail to see how what I described would not apply nicely to x86...

Ben.


Re: [RFC] killing the NR_IRQS arrays.
On Sat, 2007-02-17 at 02:06 -0700, Eric W. Biederman wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
>
> > In addition, if we remove the numbers, archs will need basically the
> > exact same services provided by the powerpc irq core for reverse mapping
> > (going from a HW irq number on a given PIC back to an irq_desc *).
>
> Ben, you seem to be under the misapprehension that, except for the case
> of ISA (0-15), the linux IRQ number is a hardware number. It is an
> arbitrary software enumeration, and I think it has been that way for a
> very long time.

Did you actually mean "is not a hardware number" ? If not, then I don't
understand your sentence...

> I can only tell you that my impression of this last is that all the
> world's not a PPC.

Yeah and my grandmother is not the pope, thank you.

However, PowerPC is a good example because it has such a diversity of
very different hardware setups to deal with, ranging from multiple
layers of cascaded controllers all over the place, to interrupt
packets encoding vector/target (a bit like x86) on cell, to
hypervisors providing a single giant number space, etc.

Thus, it is extremely likely that something that works well for PowerPC
(or for ARM, for that matter, as it's probably as "colorful" an
environment as PowerPC) will end up being useful for others.

> I have a version of the x86 code with a partial conversion done, and
> I didn't need a reverse mapping. What you call the hardware interrupt
> number never happens to be interesting to me after the system is set up.

Because you have the ability to tell your PIC to give you your "linux"
interrupt number when actually sending the interrupt to the processor?
You need a way to get to the irq_desc * when getting an IRQ: either you
have a way to map HW numbers back to irq_desc * in software, or your HW
allows you to do it.

> I do suspect there is an interesting chunk of your ppc work that
> makes sense as a library so other arches could use it.

Guess what: one of the options in my code is to not instantiate a
remapper, for archs where it's not necessary. (We have that case, for
example, with iSeries, whose hypervisor can return the number we want for
an arbitrary interrupt.)

Now, I'm not saying we should take the PowerPC code and say "hey, here's
the new generic code".

I'm saying that if we're going to change the IRQ stuff that deeply, it
would be nice if we looked into some of the stuff I've done that I
believe would be of use for other archs (though you seem to imply that
it would be of no use on x86; good, still...).

I found it overall very useful to have a generic remapping core and have
cascaded PIC setups have a numbering domain local to a given PIC (pretty
much, a domain != an irq_chip) and I'm convinced it would make life
easier for archs with similar setups. The remapping core also shows its
usefulness on archs with very big interrupt numbers, like sparc or
pSeries ppc, and possibly others.

Now, I -do- have a problem with one aspect of your proposed design, which
is to keep the "linux" interrupt number in the generic irq_desc. I
think that defeats most of the purpose of moving away from those linux irq
numbers. If you do so, then I'll have to keep a separate remapping layer
and keep a mechanism for virtualizing linux numbers.

Ben.


Re: [RFC] killing the NR_IRQS arrays.
> > #define NO_IRQ <architecture-defined-int-constant>
>
> When did you need a magic NO_IRQ constant in generic code?
> One of the reasons I want to convert the drivers is so we can
> kill the NO_IRQ nonsense.
>
> As for struct irq instead of struct irq_desc, I really don't
> care, although the C++ camp has not yet weighed in to mention
> how that creates a namespace conflict for them.

Yeah, NO_IRQ would be NULL here...

What I do in the powerpc code is this: since IRQ HW numbers are defined
locally to a domain/PIC, when creating a new domain the PIC code passes
a value to use as an "illegal" value in that domain. It's not exposed
outside of the core, though; it's really only used to initialize the
remapping table with something before any interrupt on that PIC has been
mapped.
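Here is one plausible reading of that scheme, as a minimal sketch. `struct irq_domain_stub`, `domain_init()` and `slot_is_mapped()` are hypothetical, and the table size is arbitrary. The PIC supplies its own "illegal" HW number, so 0 remains usable as a real HW number in that domain.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long irq_hw_number_t;

#define NR_SLOTS 8   /* hypothetical table size */

/* Hypothetical domain: one PIC's numbering space. */
struct irq_domain_stub {
	irq_hw_number_t invalid_hwirq;      /* supplied by the PIC driver */
	irq_hw_number_t hwirq[NR_SLOTS];    /* slot -> HW number, or invalid */
};

/* Initialize a domain; every slot starts out holding the "illegal" value. */
static void domain_init(struct irq_domain_stub *d, irq_hw_number_t invalid)
{
	size_t i;

	d->invalid_hwirq = invalid;
	for (i = 0; i < NR_SLOTS; i++)
		d->hwirq[i] = invalid;
}

/* A slot is mapped once it holds anything other than the illegal value. */
static int slot_is_mapped(const struct irq_domain_stub *d, size_t slot)
{
	assert(slot < NR_SLOTS);
	return d->hwirq[slot] != d->invalid_hwirq;
}
```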

> We might need this. But I don't think we need reference counting in
> the traditional sense. For all practical purposes we already have
> dynamic irq allocation, and it hasn't proven necessary. I would
> prefer to go to great lengths to avoid exposing that kind of
> issue to driver code.

I think we do need proper refcounting, but I also think that most
drivers will not need to see it.

For example, a PCI driver will most probably just do something along the
lines of the existing request_irq(pdev->irq); the lifetime of pdev->irq
is managed by the PCI core.

Same goes with MSIs imho, the MSI core can manage the lifetime
transparently.

Ben.


Re: [RFC] killing the NR_IRQS arrays.
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

>
>> We might need this. But I don't think we need reference counting in
>> the traditional sense. For all practical purposes we already have
>> dynamic irq allocation, and it hasn't proven necessary. I would
>> prefer to go to great lengths to avoid exposing that kind of
>> issue to driver code.
>
> I think we do need proper refcounting, but I also think that most
> drivers will not need to see it.
>
> For example, a PCI driver will most probably just do something along the
> lines of the existing request_irq(pdev->irq); the lifetime of pdev->irq
> is managed by the PCI core.
>
> Same goes with MSIs imho, the MSI core can manage the lifetime
> transparently.

Yes. I'm optimistic that we won't find a case where refcounting will
be needed.

Eric