Mailing List Archive

Re: Lockups - lost interrupt
On Mon, 13 Sep 1999 yodaiken@chelm.cs.nmt.edu wrote:
> int3 for cli
> int4 for sti
> int5 for pushfl
> int6 for popfl
only int3 is a single-byte opcode.
-- mingo
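The size argument can be made concrete with a small sketch. It assumes the point of the reply is in-place patching: cli, sti, pushfl and popfl are each one byte long, so a replacement has to fit in one byte as well. The opcode bytes are the standard IA-32 encodings; the helper name `fits_in_place` and the `PROPOSED` table are hypothetical and only restate the mapping quoted above.

```python
# Standard IA-32 encodings. "int $n" is 0xCD followed by an 8-bit vector,
# while int3 has its own dedicated one-byte opcode 0xCC.
ENCODINGS = {
    "cli": bytes([0xFA]),
    "sti": bytes([0xFB]),
    "pushfl": bytes([0x9C]),
    "popfl": bytes([0x9D]),
    "int3": bytes([0xCC]),
    "int $4": bytes([0xCD, 0x04]),
    "int $5": bytes([0xCD, 0x05]),
    "int $6": bytes([0xCD, 0x06]),
}

# The mapping proposed in the quoted message (hypothetical table name).
PROPOSED = {"cli": "int3", "sti": "int $4", "pushfl": "int $5", "popfl": "int $6"}


def fits_in_place(original, replacement):
    """True if the replacement is no longer than the instruction it replaces."""
    return len(ENCODINGS[replacement]) <= len(ENCODINGS[original])


if __name__ == "__main__":
    for original, replacement in PROPOSED.items():
        verdict = "fits" if fits_in_place(original, replacement) else "too long"
        print(f"{original} -> {replacement}: {verdict}")
```

Under that assumption only the cli replacement can be patched over the original instruction without moving the surrounding code.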
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
Re: Lockups - lost interrupt
> On Sun, 12 Sep 1999 mingo@chiara.csoma.elte.hu wrote:
>
> >FYI, i'm working on a patch for 2.3 that adds the NMI oopser (optionally,
> >because 1) it doesnt work on all SMP boxes, and 2) it forces the timer irq
> >to the BP) to the 2.3 kernel. Looks like one of the most common uses of
> >IKD is lockup detection - the rest is mostly used by kernel hackers. I'll
>
> Yes. Also the print-eip patch may be useful to sort out lockups.
>
> >post it together with some other x86 APIC fixes and irq cleanups soon.
>
> If you agree I can merge it into the ikd patch when your new one will be
> ready. Also I think it's the time to port the IKD patch to 2.3.18ac1... ;)
>
> I have a question about debuggers. Is kdb GPL'd? If so and if Scott will
Yes, KDB is gpl'ed.
> agree I can merge it into the IKD patch while porting the current ikd
Please feel free.
> patch to 2.3.x. If I'll merge kdb I can remove the old debugger that
> doesn't work for people (SIGTRAP problem and I had not the time to go
> into that myself).
Thanks.
scott
>
> Andrea
>
Re: Lockups - lost interrupt
Wade Hampton wrote:
>
> Mike Black wrote:
> >
> > I'm installing the ikd patch on 2.2.12 with Mingo's 2.2.12 raid patch. With
> > some luck, it shouldn't take long to find the problem. I've got one script
> that takes about 10 minutes to lock up the machine (locks up in either UP or
> > SMP mode).
> Dumb question -- where can I get the ikd patch? I could put it on the
> Dell (the most repeatable crash) and see if I could find the problem.
Results so far on the Dell WS400 (dual PII/300):
1. stock kernel 2.2.12 with kdb 0.5 patch
2. installed ikd patch
   a) had to manually fix the Makefile and arch/i386/config.in (.rej files)
   b) setup for serial console
   c) setup for: Detect software lockups
                 Print %eip,
                 SMP-IOAPIC NMI SW watchdog,
                 IRQ 0
Serial console worked fine. Control-A allowed me to break on the
serial console. All seemed fine. Started X, sound, x11amp on the
soundblaster, a play loop on the crystal. Started mformat a: and a
dd if=junk of=/dev/fd0 loop on the floppy. After about 5 minutes
the system had not hung, so I started an NFS tar from another machine
of about 1GB of MP3 files. After about 5 minutes of this abuse,
the system hung. No response on the serial console, no oopses,
nothing.
As my display was on X, the "Print %eip" did not help....
During the dd'ing, I was getting:
floppy0: unexpected interrupt
floppy0: sensi repl[0]=80
Trial #2, same as above, but no floppy activity. Did a make xconfig
and it hung....
I am on trial #3.... The kernel is being built to NMI on IRQ 3 (ttyS1)...

> One of the Penguins (dual PIII) is running under load over the weekend
> on 2.2.11 (yes, I went backwards, but I did not have any crashes with
> 2.2.11). I'll let you know how well it does Monday AM.
This machine is still up and running fine. It was the machine doing the
massive NFS copy from the Dell and it is functioning without any errors.
I am continuing to load this machine on 2.2.11 and will keep you posted
if it crashes.
> >
> > There
> > are about three different cases here
> >
> > 1. VIA chipset bug - known, understood, non SMP
> > 2. A few triton boards - probably a hardware issue
> > 3. SMP - looks like a lock bug.
> >
> > Running the ikd patch is the best help here. I think it will show you a
> > spinlock deadlock. The trace from that should find the guilty party
If I ever get a trace.... Should I move the NMI to another IRQ, for
example the one from the serial console? This is an older PII motherboard!
Cheers,
--
W. Wade, Hampton <whampton@staffnet.com>
Re: Lockups - lost interrupt
Andrea Arcangeli wrote:
>
> On Mon, 13 Sep 1999, Wade Hampton wrote:
>
> >If I ever get a trace.... Should I move the NMI to another IRQ, for
> >example the one from the serial console? This is an older PII motherboard!
>
> Yes, you should move it to IRQ 0 (the default). Why were you using the
> serial irq?
I tried twice on IRQ 0, but never got the OOPS. I tried on IRQ 3 (serial)
and never booted. I tried on IRQ 1, but the keyboard appeared hung and
I never got the OOPS there either.
I am now able to lock up my Dell on a regular basis.
The only information I have been able to get so far
was the addresses via the %eip:
CPU1: c010ced0-4014 in do_IRQ (c010ce90)
CPU2: c010a712-2c56 in ret_from_sys_call (c010a70c)
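For readers who want to turn such addresses into "symbol + offset" by hand, here is a minimal sketch. It assumes a System.map-style table ("address type name" per line); the two-entry excerpt is hypothetical and contains only the two function start addresses quoted in the report, and the helper names are illustrative. The "-4014" and "-2c56" suffixes in the report are not interpreted.

```python
import bisect

# Hypothetical excerpt in System.map format. A real System.map has
# thousands of entries; these two start addresses come from the report.
SYSTEM_MAP = """\
c010a70c T ret_from_sys_call
c010ce90 T do_IRQ
"""


def load_symbols(text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        addr, _type, name = line.split()
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(eip, symbols):
    """Return (name, offset) of the last symbol starting at or before eip."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, eip) - 1
    if index < 0:
        return None  # eip lies below every known symbol
    start, name = symbols[index]
    return name, eip - start


if __name__ == "__main__":
    symbols = load_symbols(SYSTEM_MAP)
    for eip in (0xC010CED0, 0xC010A712):
        name, offset = resolve(eip, symbols)
        print(f"{eip:08x} = {name}+0x{offset:x}")
```

With the addresses above this gives do_IRQ+0x40 for the first CPU and an offset of 6 bytes into the second function.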
I am basically at a loss but am continuing my tests....
Could there be a problem between the ikd patch and the kdb patch?
Also, I am now using a serial console (ttyS1, 9600 baud).
I am going to remake my kernel from make clean.... reinstall it,
use the default NMI of 0, etc.
Any help or pointers would be appreciated!
--
W. Wade, Hampton <whampton@staffnet.com>
Re: Lockups - lost interrupt
On Mon, 13 Sep 1999, Wade Hampton wrote:
>I tried twice on IRQ 0, but never got the OOPS. I tried on IRQ 3
While you are running with IRQ 0 as the NMI source, try a `cat
/proc/interrupts`: you should see NMI irqs, and you should also see timer
interrupts _only_ on CPU0, marked as coming from the XT (8259) source. If you
get both NMI and timer irqs then the NMI-watchdog should work
correctly on your hardware.
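The check described above can be sketched as a small parser. This is only an illustration: the sample text is hypothetical, the exact /proc/interrupts layout differs between kernel versions (for example, whether the NMI row is per CPU), and the function names are made up for this sketch.

```python
# Hypothetical /proc/interrupts contents for a dual-CPU box.
SAMPLE = """\
           CPU0       CPU1
  0:     204507          0    XT-PIC  timer
  1:       1024       1100    IO-APIC-edge  keyboard
NMI:     204400     204400
"""


def parse_interrupts(text):
    """Map each row label to (per-CPU counts, description)."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row lists the CPUs
    rows = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        rows[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return rows


def watchdog_looks_sane(rows):
    """NMIs are arriving, and the timer hits only CPU0 via the XT (8259) PIC."""
    nmi_counts, _ = rows.get("NMI", ([], ""))
    timer_counts, timer_desc = rows.get("0", ([], ""))
    nmi_ok = any(count > 0 for count in nmi_counts)
    cpu0_only = (bool(timer_counts) and timer_counts[0] > 0
                 and all(count == 0 for count in timer_counts[1:]))
    return nmi_ok and cpu0_only and "XT" in timer_desc


if __name__ == "__main__":
    print(watchdog_looks_sane(parse_interrupts(SAMPLE)))
```

On a live system the same functions could be fed the contents of /proc/interrupts instead of the sample string.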
If it's working correctly and you get no oops on the screen, then probably
the local-APIC irq is still running on the CPUs. NOTE: the NMI oopser can
help _only_ if at least one CPU is locked up with irqs _disabled_. If
irqs are enabled on both CPUs then the NMI won't do anything. In that case
you don't need the NMI patch; you only need to enable the SYSRQ keys in
the kernel configuration and then press SYSRQ+P at lockup time to get the
interesting debugging information out of the kernel. You should write down
on paper the EIP addresses you get from SYSRQ+P. I am not sure if you
already tried the SYSRQ+P approach (I supposed so because you were already
using the NMI code).
Andrea
Re: Lockups - lost interrupt
On Sun, 12 Sep 1999 mingo@chiara.csoma.elte.hu wrote:
> FYI, i'm working on a patch for 2.3 that adds the NMI oopser (optionally,
> because 1) it doesnt work on all SMP boxes, and 2) it forces the timer irq
> to the BP) to the 2.3 kernel. Looks like one of the most common uses of
> IKD is lockup detection - the rest is mostly used by kernel hackers. I'll
> post it together with some other x86 APIC fixes and irq cleanups soon.
Why wouldn't the NMI oopser work on a given SMP system? And why does the
timer have to be delivered to the BP if these NMIs are exploited? I don't
see any reasons but I might be missing something.
Note that even if IRQ0 is not connected to any I/O APIC, the LINT0 line
is usually broadcasted and even if it's not, NMI may be distributed by the
catcher using "all excluding self" IPI. It is theoretically possible that
there exist i82489DX based systems that have the NMI output of local APICs
not connected to the NMI input of CPUs, but have you ever seen or heard of
such a brain-damaged system? It wouldn't be MPS-compliant, anyway.
I might add a modified NMI oopser to my pending APIC patches (I should
make them available tomorrow, BTW -- they are almost ready but I need to
perform some additional tests), if you don't object, that would handle all
legal cases of timer configuration. The current implementation included
in the ikd patch is somewhat simplistic, indeed.
I don't object to the oopser being optional, of course.
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: macro@ds2.pg.gda.pl, PGP key available +
Re: Lockups - lost interrupt
On Thu, 16 Sep 1999, Maciej W. Rozycki wrote:
> Why wouldn't the NMI oopser work on a given SMP system? And why does the
> timer have to be delivered to the BP if these NMIs are exploited? I don't
> see any reasons but I might be missing something.
>
> Note that even if IRQ0 is not connected to any I/O APIC, the LINT0 line
> is usually broadcasted and even if it's not, NMI may be distributed by the
> catcher using "all excluding self" IPI. It is theoretically possible that
> there exist i82489DX based systems that have the NMI output of local APICs
> not connected to the NMI input of CPUs, but have you ever seen or heard of
> such a brain-damaged system? It wouldn't be MPS-compliant, anyway.
the NMI input of the CPUs is turned off once the local APIC is turned on.
So the only input signals to the CPU are LINT0, LINT1 and the APIC bus.
LINT1 is historically the NMI signal of the motherboard, broadcasted to
all CPUs. We knew this pretty well. I'm right now experimenting with
configuring LINT0 as NMI as well, although i feel uneasy about this hack,
we do have legitimate cases of through-local-APIC interrupts - which would
thus become NMIs. It's also horribly dangerous because i have to ack the
8259A IRQ0 interrupt from within the NMI handler - shudders. And it
violates the MP standard. I've tried some other hacks as well, and there
is yet another trick to be tried: we can also set up the local APIC's
performance counter LVT into NMI mode, and switch on the performance
counter that counts timer cycles ... then we'll get periodic NMIs. I'm
pretty much dedicated now to get the NMI oopser done without impacting
IRQ0 distribution.
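As a rough illustration of what "setting an LVT entry into NMI mode" means at the register level, the sketch below composes a local APIC LVT value. It assumes the usual LVT layout from the Intel manuals (vector in bits 0-7, delivery mode in bits 8-10, mask in bit 16); the function name is hypothetical, and not every delivery mode is valid for every LVT entry or on every APIC implementation.

```python
# Delivery-mode field values for bits 8-10 of a local APIC LVT entry.
DELIVERY_MODE = {
    "Fixed": 0b000,
    "SMI": 0b010,
    "NMI": 0b100,
    "INIT": 0b101,
    "ExtINT": 0b111,
}
LVT_MASKED = 1 << 16  # mask bit: entry delivers nothing while set


def lvt_entry(mode, vector=0, masked=False):
    """Compose an LVT register value (hypothetical helper for illustration)."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    value = (DELIVERY_MODE[mode] << 8) | vector
    if masked:
        value |= LVT_MASKED
    return value


if __name__ == "__main__":
    # NMI delivery mode, unmasked; this sketch leaves the vector field at 0.
    print(hex(lvt_entry("NMI")))
    # ExtINT, as used for 8259A-style delivery through LINT0.
    print(hex(lvt_entry("ExtINT")))
```

The same encoding applies whether the value is written to the LINT0 entry or to the performance counter entry; which of the two is usable is exactly what the thread is debating.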
> I might add a modified NMI oopser to my pending APIC patches (I should
dont bother, i've already put the NMI oopser into my APIC tree (and
modified it), and hacked away on it. Please for now just send your timer
and 486 fixes and i'll merge them in - after that we can still look for
better ways of doing NMI-broadcasts.
> make them available tomorrow, BTW -- they are almost ready but I need to
> perform some additional tests), if you don't object, that would handle all
> legal cases of timer configuration. The current implementation included
> in the ikd patch is somewhat simplistic, indeed.
yep.
-- mingo
Re: Lockups - lost interrupt
On Thu, 16 Sep 1999 mingo@chiara.csoma.elte.hu wrote:
> the NMI input of the CPUs is turned off once the local APIC is turned on.
> So the only input signals to the CPU are LINT0, LINT1 and the APIC bus.
> LINT1 is historically the NMI signal of the motherboard, broadcasted to
> all CPUs. We knew this pretty well. I'm right now experimenting with
Not necessarily -- per MPS, NMI needs only to be delivered to the BSP.
Exact LINT0 and LINT1 routing should be described by the MP Configuration
Table (provided it's correct, sigh).
> configuring LINT0 as NMI as well, although i feel uneasy about this hack,
> we do have legitimate cases of through-local-APIC interrupts - which would
> thus become NMIs. It's also horribly dangerous because i have to ack the
> 8259A IRQ0 interrupt from within the NMI handler - shudders. And it
No, no, no. Although MPS does not explicitly forbid leaving IRQ lines
unconnected to I/O APICs, in real life the only IRQs that may be
unconnected are IRQ0 and IRQ13, due to legacy EISA chipsets. As we do not
care about EISA DMA chaining interrupts (do we?), we may set up IRQ0 as a
"through-8259A" interrupt, which is more reliable and needs no acking at
all. This is the Intel-recommended way of handling DMA chaining
interrupts, BTW -- see the 82489DX datasheet. In short, after merging in
my patches, no board should ever use ExtINTA interrupts when in the
symmetric mode.
In fact there is no difference between configuring LINT0 as Fixed and NMI
-- neither of them makes INTA cycles reach 8259As.
Of course, there might exist a board that would need ExtINTA interrupts
but it's pretty unlikely and even if it existed it would be an ancient
proprietary design and it really would not be able to receive periodic
NMIs in a standard manner. I doubt such boards exist. If you know of
one, let me know.
> violates the MP standard. I've tried some other hacks as well, and there
AFAIK, the MPS does not force any specific requirements on delivery modes
in the symmetric mode -- it does specify the routing of IRQ lines and the
AT compatibility. If you know of a specific paragraph of MPS that stands
in contradiction to my proposals, please name it!
> is yet another trick to be tried: we can also set up the local APIC's
> performance counter LVT into NMI mode, and switch on the performance
> counter that counts timer cycles ... then we'll get periodic NMIs. I'm
> pretty much dedicated now to get the NMI oopser done without impacting
> IRQ0 distribution.
How would you perform this? The PC does not provide a configurable
delivery mode -- it's always "Fixed". And even if it would it's
completely unportable -- it does not exist on i486 and Pentium systems at
all.
> dont bother, i've already put the NMI oopser into my APIC tree (and
> modified it), and hacked away on it. Please for now just send your timer
> and 486 fixes and i'll merge them in - after that we can still look for
> better ways of doing NMI-broadcasts.
Tomorrow, as I wrote. The oopser is not an absolute necessity -- it may
as well go into 2.5 -- I'd just like to look at it while I am at i386/SMP.
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: macro@ds2.pg.gda.pl, PGP key available +
Re: Lockups - lost interrupt
On Thu, 16 Sep 1999, Maciej W. Rozycki wrote:
> > configuring LINT0 as NMI as well, although i feel uneasy about this hack,
> > we do have legitimate cases of through-local-APIC interrupts - which would
> > thus become NMIs. It's also horribly dangerous because i have to ack the
> > 8259A IRQ0 interrupt from within the NMI handler - shudders. And it
>
> No, no, no. Although MPS does not explicitly forbid leaving IRQ lines
> unconnected to I/O APICs, in real life the only IRQs that may be
> unconnected are IRQ0 and IRQ13, due to legacy EISA chipsets. As we do not
... except for the fact that there _are_ motherboards which have no IRQ0
connected to the IOAPIC, and their mptable lies about this fact. This is
why we need mixed mode, or at least this is one of the reasons why we do
not (yet) want to kill LINT0-based interrupts.
> care about EISA DMA chaining interrupts (do we?), we may set up IRQ0 as a
> "through-8259A" interrupt, which is more reliable and needs no acking at
> all. [...]
what exactly do you mean here? To set up LINT0 as ExtINT and to unmask
IRQ0 in the 8259A, and to mask the IOAPIC pin? Or to set up the corresponding
IOAPIC routing entry as ExtINT? The latter is completely pointless - if
there is an IOAPIC pin for IRQ0 then we want that to be a LowPrio IRQ. If
we set it up through the local APIC's LINT0 pin, then we lose the ability
to route to multiple CPUs. (it makes no difference that the 8259A's INTR
output signal is driven to all CPUs: there is no mechanism to distribute
this between CPUs, so we'd get an interrupt on all CPUs at once and would
have to do some sort of software selection - clearly complex and suboptimal.)
> > is yet another trick to be tried: we can also set up the local APIC's
> > performance counter LVT into NMI mode, and switch on the performance
> > counter that counts timer cycles ... then we'll get periodic NMIs. I'm
> > pretty much dedicated now to get the NMI oopser done without impacting
> > IRQ0 distribution.
>
> How would you perform this? The PC does not provide a configurable
> delivery mode -- it's always "Fixed". And even if it would it's
no, delivery mode can be configured for PCINT.
> completely unportable -- it does not exist on i486 and Pentium systems at
> all.
and? The NMI oopser doesnt work on UP boxes either.
-- mingo
Re: Lockups - lost interrupt
On Thu, 16 Sep 1999 mingo@chiara.csoma.elte.hu wrote:
> ... except for the fact that there _are_ motherboards which have no IRQ0
> connected to the IOAPIC, and their mptable lies about this fact. This is
> why we need mixed mode, or at least this is one of the reasons why we do
> not (yet) want to kill LINT0-based interrupts.
I've already written to you about it -- my patch solves this issue no matter
what the MP table tries to report about IRQ0.
> > care about EISA DMA chaining interrupts (do we?), we may set up IRQ0 as a
> > "through-8259A" interrupt, which is more reliable and needs no acking at
> > all. [...]
>
> what exactly do you mean here? To set up LINT0 as ExtINT and to unmask
> IRQ0 in the 8259A, and to mask the IOAPIC pin? Or to set up the corresponding
Please see my patches against 2.3.13 -- the new version doesn't introduce
anything new in this matter -- just code rearrangements to fit the current
layout. Or wait till tomorrow.
> IOAPIC routing entry as ExtINT? The latter is completely pointless - if
> there is an IOAPIC pin for IRQ0 then we want that to be a LowPrio IRQ. If
Lowest priority can be done in the "through-8259A" mode and this is the
default operating mode being set by my patch when IRQ0 is not connected to
INTIN2.
> we set it up through the local APIC's LINT0 pin, then we lose the ability
> to route to multiple CPUs. (it makes no difference that the 8259A's INTR
> output signal is driven to all CPUs: there is no mechanism to distribute
> this between CPUs, so we'd get an interrupt on all CPUs at once and would
> have to do some sort of software selection - clearly complex and suboptimal.)
If the "through-8259A" mode fails (quite possible, as the output of the
master 8259A need not be connected to any APIC either in Virtual Wire or
in PIC mode), I propose to use a "local through-8259A" mode which sets the
timer interrupt as "Fixed" to one of the processors. It has the advantage of
not involving the ExtINTA trap (less circuitry means less chance for
errors, especially as the trap is a kind of a hardware hack) and getting
rid of INTA cycles on the external bus.
But instead of using a unicast "Fixed" delivery, we may use a broadcast
"NMI" (if the I/O APIC can be involved), or try to broadcast through LINT0
(which need not succeed). If neither of these can be set up, we may
fall back to unicast NMI delivery and distribute the NMI further using an IPI
(which is a single APIC write and is not an extreme performance hit).
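The fallback order proposed here, and the "single APIC write" for the IPI, can be sketched as follows. This is an illustration under assumptions, not the patch itself: the function names and the two boolean probes are hypothetical, and the constants follow the standard layout of the low dword of the local APIC interrupt command register (delivery mode in bits 8-10, level assert in bit 14, destination shorthand in bits 18-19), with names chosen for this sketch.

```python
# Fields of the low dword of the local APIC interrupt command register (ICR).
APIC_DM_NMI = 0x00400       # delivery mode 100b: NMI
APIC_INT_ASSERT = 0x04000   # level: assert
APIC_DEST_ALLBUT = 0xC0000  # destination shorthand 11b: all excluding self


def nmi_ipi_all_but_self():
    """ICR value whose single write sends an NMI to every other CPU."""
    return APIC_DEST_ALLBUT | APIC_INT_ASSERT | APIC_DM_NMI


def choose_nmi_delivery(ioapic_can_broadcast, lint0_broadcast_works):
    """Pick a delivery strategy in the order proposed in the message above.

    Both arguments are hypothetical probe results; how they would be
    determined on real hardware is outside this sketch.
    """
    if ioapic_can_broadcast:
        return "broadcast NMI via I/O APIC"
    if lint0_broadcast_works:
        return "broadcast NMI via LINT0"
    return "unicast NMI, then IPI to all excluding self"


if __name__ == "__main__":
    print(choose_nmi_delivery(False, False))
    print(hex(nmi_ipi_all_but_self()))
```

In the last case the CPU that catches the unicast NMI would write the composed value to its ICR once per watchdog tick, which is why the cost is described as a single APIC write.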
> > How would you perform this? The PC does not provide a configurable
> > delivery mode -- it's always "Fixed". And even if it would it's
>
> no, delivery mode can be configured for PCINT.
I recall it was hardwired. I'll look at the docs.
> > completely unportable -- it does not exist on i486 and Pentium systems at
> > all.
>
> and? The NMI oopser doesnt work on UP boxes either.
Well, for UP boxes it's mostly impossible to implement it (though EISA
systems used to have a second 8254 as a watchdog timer on NMI -- that
might be worthwhile to handle). But it's pretty easy to do this for SMP
systems in a compatible way. And what's more important, it would be
neither complicated nor time consuming.
--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: macro@ds2.pg.gda.pl, PGP key available +