Mailing List Archive

Mike Black wrote:
>
> I've been having some lockups on numerous versions of the 2.2 series
> (10,11,12) in both SMP and non-SMP mode.
I too have been having lockups and have posted here, with Penguin,
and with the redhat-list.
The machines are 2 Penguin dual PIII (network or load related),
and a Dell WS400 dual PII/300 (sound active then access the floppy).
>
> Usually the machine is just locked up hard (alt-sys-req does not work).
Same. The Dell has the kdb code (from SGI) and I can't access that
either.
>
> This last time though, some scsi timeouts, ide "lost interrupt" messages,
> and ethernet timeouts were showing up on the screen. I still could not do
> anything with alt-sys-req.
I saw some unexpected interrupts from the floppy drive, but it had
not locked up at that time -- it locked up later....
>
> When an IDE timeout occurred the IDE drive showed some activity followed by
> the "hda lost interrupt" message.
>
> It looks like the system was losing ALL interrupts which would also explain
> why the keyboard wasn't working.
>
> Does anybody have any idea what could be causing this? Is this a bad
> motherboard? CPU?
Since I have 3 systems doing it, others have also reported similar,
I think it could be kernel related....
--
W. Wade, Hampton <whampton@staffnet.com>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Re: Lockups - lost interrupt [ In reply to ]

alan at lxorguk

Sep 10, 1999, 1:53 PM

Post #3 of 35 (1228 views)

> > Does anybody have any idea what could be causing this? Is this a bad
> > motherboard? CPU?
> Since I have 3 systems doing it, others have also reported similar,
> I think it could be kernel related....
I'm still digging into this. I do have some ideas what may be involved. There
are about three different cases here
1. VIA chipset bug - known, understood, non SMP
2. A few triton boards - probably a hardware issue
3. SMP - looks like a lock bug.
Running the ikd patch is the best help here. I think it will show you a
spinlock deadlock. The trace from that should find the guilty party
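
As an illustration of the kind of condition a debugging patch can flag, here is a minimal user-space sketch of a spinlock that gives up after a fixed number of failed attempts instead of spinning forever. It is not taken from the ikd patch; the names `dbg_spinlock`, `dbg_spin_lock` and `SPIN_THRESHOLD` are hypothetical, and C11 atomics stand in for the kernel's lock primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical threshold: how many failed attempts count as "stuck". */
#define SPIN_THRESHOLD 1000000UL

/* Hypothetical debug lock; initialise with { ATOMIC_FLAG_INIT }. */
struct dbg_spinlock {
    atomic_flag locked;
};

/*
 * Try to take the lock. Returns true once acquired, or false if the
 * lock was still held after SPIN_THRESHOLD attempts, which a debugging
 * build could treat as a suspected deadlock and report with a trace.
 */
static bool dbg_spin_lock(struct dbg_spinlock *l)
{
    unsigned long spins = 0;

    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire)) {
        if (++spins >= SPIN_THRESHOLD)
            return false;   /* suspected deadlock */
    }
    return true;
}

/* Release the lock so another caller can take it. */
static void dbg_spin_unlock(struct dbg_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Taking the same lock twice from one thread without an unlock in between makes the second `dbg_spin_lock()` call return false rather than hang, which is the simplest form of the deadlock being discussed.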

Re: Lockups - lost interrupt [ In reply to ]

andre at suse

Sep 11, 1999, 12:21 AM

Post #4 of 35 (1237 views)

Alan, I know that something about the lost interrupts is directly related to
drive timings. There is a spinlock somewhere to be caught, but I cannot
find the beastie. Lob me that "ikd patch", please.
I can forcibly wreck the timing values and invoke the error.
Also, some of the quick drives must be forcibly slowed down on slower
chipsets. There are also a few chipsets that really get nasty because of
the extremely tight tolerances (well, almost zero margin).
Lastly, there are several drives that claim ATA-66 but time out, and there
is nothing to date to catch that and convert/update the transfer rates.
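
A minimal sketch of what "catch and convert" could look like, assuming a driver that counts timeouts per drive and steps down one Ultra DMA mode after a few of them (UDMA mode 4 is the ATA-66 rate). Everything here is hypothetical: `drive_state`, `note_timeout` and `MAX_TIMEOUTS` are illustration names, not part of the IDE driver.

```c
/* Hypothetical: timeouts tolerated before dropping to a slower mode. */
#define MAX_TIMEOUTS 3

/* Hypothetical per-drive bookkeeping. */
struct drive_state {
    int udma_mode;          /* current Ultra DMA mode, 4 = ATA-66 */
    unsigned int timeouts;  /* timeouts seen since the last mode change */
};

/*
 * Record one timeout. After MAX_TIMEOUTS of them the drive is moved
 * one mode down and the counter restarts. Mode 0 is the floor.
 * Returns the mode to program for the next transfer.
 */
static int note_timeout(struct drive_state *d)
{
    if (++d->timeouts >= MAX_TIMEOUTS && d->udma_mode > 0) {
        d->udma_mode--;
        d->timeouts = 0;
    }
    return d->udma_mode;
}
```

A real driver would also have to reprogram both the chipset and the drive for the new mode, which this sketch leaves out.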
With the huge cache buffers and dual-processor drives (yes, there is a
vendor out there throttling the drives with two onboard processors), the
fun is only beginning.
To get an idea of the variables....
Leading or lagging interrupts from the chipset.
Variability based on the size limit imposed on the chipset FIFO.
Command queuing on the drives is on this side of the horizon and moving
fast, i.e. SCSI workhorse power is coming to IDE.
There is more, but it is late and I am now only one kernel behind the
curve to catch up on patching.
OT, do you want the very latest back code for 2.2.13 that I have to finish
creating for 2.3.18/9?
Andre Hedrick
The Linux IDE guy
On Fri, 10 Sep 1999, Alan Cox wrote:
> > > Does anybody have any idea what could be causing this? Is this a bad
> > > motherboard? CPU?
> > Since I have 3 systems doing it, others have also reported similar,
> > I think it could be kernel related....
>
> I'm still digging into this. I do have some ideas what may be involved. There
> are about three different cases here
>
> 1. VIA chipset bug - known, understood, non SMP
> 2. A few triton boards - probably a hardware issue
> 3. SMP - looks like a lock bug.
>
> Running the ikd patch is the best help here. I think it will show you a
> spinlock deadlock. The trace from that should find the guilty party

Re: Lockups - lost interrupt [ In reply to ]

mblack at csihq

Sep 11, 1999, 4:36 PM

Post #5 of 35 (1228 views)

Running 2.2.12 with raid0145-19990824-2.2.11 and 2.2.12-ikd1.
I had one raid5 array resyncing and got an oops...fortunately I still had a
console open and could get the oops from dmesg. I see the restore_flags but
no save_flags call...could this be the problem?? The postprocessed source
for raid5.c looks like:
static void raid5_kfree_bh(struct stripe_head *sh, struct buffer_head *bh)
{
        unsigned long flags;

        (( flags )=__global_save_flags()) ;
        __global_cli() ;
        put_free_bh(sh, bh);
        __global_restore_flags( flags ) ;
}
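
In the postprocessed source above, the first statement in the body, `(( flags )=__global_save_flags())`, is the expanded save, so the save/restore pair is balanced. The pattern can be sketched in user space as follows. This is only a simulation: `sim_irq_enabled` stands in for the CPU interrupt flag, and all `sim_` names are hypothetical.

```c
#include <stdbool.h>

/* Simulated interrupt-enable state; stands in for the real CPU flag. */
static bool sim_irq_enabled = true;

/* Shared data the critical section protects (illustration only). */
static int sim_counter;

/* Return the current interrupt state so it can be restored later. */
static unsigned long sim_save_flags(void)
{
    return sim_irq_enabled ? 1UL : 0UL;
}

/* Disable (simulated) interrupts. */
static void sim_cli(void)
{
    sim_irq_enabled = false;
}

/* Put the interrupt state back to what sim_save_flags() returned. */
static void sim_restore_flags(unsigned long flags)
{
    sim_irq_enabled = (flags != 0);
}

/* Same shape as raid5_kfree_bh(): save, disable, work, restore. */
static void sim_critical_increment(void)
{
    unsigned long flags;

    flags = sim_save_flags();
    sim_cli();
    sim_counter++;
    sim_restore_flags(flags);
}
```

Restoring the saved value, rather than unconditionally re-enabling, is what lets the function be called safely from a context that already had interrupts disabled.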
Unable to handle kernel NULL pointer dereference at virtual address 00000000
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0002
CPU: 0
EIP: 0010:[<c0147cf7>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010006
eax: 00000001 ebx: cff2e000 ecx: 00000004 edx: c0284954
esi: cfb22600 edi: 00000001 ebp: cff2fe3c esp: cff2fe1c
ds: 0018 es: 0018 ss: 0018
Process raid5d (pid: 9, process nr: 10, stackpage=cff2f000)
Stack: cff2fe3c c023d8b1 cfb0fc00 cffa0800 cff2fe4c c0119fcb cfb0fc00 00000000
       cff2fe4c c010d574 cfb0fc00 cffd8078 cff2fe8c c01e6fc4 00000001 cfb22600
       cff2fe8c c01e6fbb cfb22600 cfb0fc00 cff2fe8c c01e6fb1 c31d6440 00000004
Call Trace: [<c023d8b1>] [<c0119fcb>] [<c010d574>] [<c01e6fc4>] [<c01e6fbb>] [<c01e6fb1>] [<c01e8306>]
       [<c01e82f1>] [<c01e8247>] [<c021c555>] [<c01e8568>] [<c01e8456>] [<c0109a39>] [<c0119ddf>] [<c01e92cc>]
       [<c01e93a2>] [<c01e92e3>] [<c01cd5e2>] [<c01cd5cc>] [<c01e0018>] [<c0109669>]
Code: c6 05 00 00 00 00 00 8b 5d e8 89 ec 5d c3 8d 76 00 55 89 e5
>>EIP; c0147cf7 <mcount+20b/21c> <=====
Trace; c023d8b1 <__udelay+49/50>
Trace; c0119fcb <__wake_up+13/7c>
Trace; c010d574 <__global_restore_flags+10/58>
Trace; c01e6fc4 <raid5_kfree_bh+38/44>
Trace; c01e6fbb <raid5_kfree_bh+2f/44>
Trace; c01e6fb1 <raid5_kfree_bh+25/44>
Trace; c01e8306 <complete_stripe+d2/19c>
Trace; c01e82f1 <complete_stripe+bd/19c>
Trace; c01e8247 <complete_stripe+13/19c>
Trace; c021c555 <requeue_sd_request+b4d/b5c>
Trace; c01e8568 <handle_stripe+128/c34>
Trace; c01e8456 <handle_stripe+16/c34>
Trace; c0109a39 <__switch_to+11/d0>
Trace; c0119ddf <schedule+1ff/3d8>
Trace; c01e92cc <unplug_devices+10/14>
Trace; c01e93a2 <raid5d+d2/12c>
Trace; c01e92e3 <raid5d+13/12c>
Trace; c01cd5e2 <md_thread+fa/1e0>
Trace; c01cd5cc <md_thread+e4/1e0>
Trace; c01e0018 <do_rw_disk+290/2a0>
Trace; c0109669 <kernel_thread+35/48>
Code; c0147cf7 <mcount+20b/21c>
00000000 <_EIP>:
Code; c0147cf7 <mcount+20b/21c> <=====
0: c6 05 00 00 00 00 00 movb $0x0,0x0 <=====
Code; c0147cfe <mcount+212/21c>
7: 8b 5d e8 movl 0xffffffe8(%ebp),%ebx
Code; c0147d01 <mcount+215/21c>
a: 89 ec movl %ebp,%esp
Code; c0147d03 <mcount+217/21c>
c: 5d popl %ebp
Code; c0147d04 <mcount+218/21c>
d: c3 ret
Code; c0147d05 <mcount+219/21c>
e: 8d 76 00 leal 0x0(%esi),%esi
Code; c0147d08 <mcount_internal+0/1ea>
11: 55 pushl %ebp
Code; c0147d09 <mcount_internal+1/1ea>
12: 89 e5 movl %esp,%ebp

Re: Lockups - lost interrupt [ In reply to ]

Sep 11, 1999, 5:34 PM

Post #6 of 35 (1226 views)

The Oops you saw is forced by the IKD debugging code and it's not a bug
(it's just to get a stack trace). But you must also have one other line
before the Oops that tells us the reason for the Oops. You didn't quote it.
Also, you probably want to disable everything in the IKD patch except the
NMI watchdog; if the NMI isn't helpful, then you can use other things in
the IKD patch. Actually I am not 100% sure the other debugging code is
completely reliable, as I haven't used it for some time. I'll have a
look...
Andrea

Re: Lockups - lost interrupt [ In reply to ]

whampton at staffnet

Sep 11, 1999, 5:40 PM

Post #7 of 35 (1228 views)

Mike Black wrote:
>
> I'm installing the ikd patch on 2.2.12 with Mingo's 2.2.12 raid patch. With
> some luck, it shouldn't take long to find the problem. I've got one script
> that takes about 10 minutes to lockup the machine (locks up in either UP or
> SMP mode).
Dumb question -- where can I get the ikd patch? I could put it on the
Dell (the most repeatable crash) and see if I could find the problem.
One of the Penguins (dual PIII) is running under load over the weekend
on 2.2.11 (yes, I went backwards, but I did not have any crashes with
2.2.11). I'll let you know how well it does Monday AM.
>
> There
> are about three different cases here
>
> 1. VIA chipset bug - known, understood, non SMP
> 2. A few triton boards - probably a hardware issue
> 3. SMP - looks like a lock bug.
>
> Running the ikd patch is the best help here. I think it will show you a
> spinlock deadlock. The trace from that should find the guilty party
My guess is 3 - a lock bug. Based on the fact that the Dell locks up
with the floppy and the others with NFS/network -- it could be in
multiple places (same code copied maybe)?
I've been at kid's soccer, piano, etc. today and won't be able to do
any tests until Monday -- the computers that are having problems are
at work. If you have any luck, please let me know!
Cheers,
--
W. Wade, Hampton <whampton@staffnet.com>

Re: Lockups - lost interrupt [ In reply to ]

Sep 11, 1999, 5:42 PM

Post #8 of 35 (1228 views)

On Sun, 12 Sep 1999, Andrea Arcangeli wrote:
>completely reliable, as I haven't used it for some time. I'll have a
Did you enable the SOFTLOCKUP feature? If so, disable it and enable only
the NMI watchdog. ;)
Andrea

Re: Lockups - lost interrupt [ In reply to ]

Sep 11, 1999, 6:08 PM

Post #9 of 35 (1226 views)

On Sat, 11 Sep 1999, Wade Hampton wrote:
>Dumb question -- where can I get the ikd patch? I could put it on the
ftp://ftp.suse.com/pub/people/andrea/kernel-patches/ikd/2.2.12-ikd1.gz
Right now enable _only_ the NMI oopser and wait for a stack trace.
Andrea

Re: Lockups - lost interrupt [ In reply to ]

alan at lxorguk

Sep 11, 1999, 6:15 PM

Post #10 of 35 (1224 views)

> My guess is 3 - a lock bug. Based on the fact that the Dell locks up
> with the floppy and the others with NFS/network -- it could be in
> multiple places (same code copied maybe)?
No. The only locks in the floppy code relating to sound are ISA DMA
locks. That wouldn't affect NFS/ethernet cards

Re: Lockups - lost interrupt [ In reply to ]

mblack at csihq

Sep 11, 1999, 7:15 PM

Post #11 of 35 (1226 views)

OK...didn't think this one line mattered:
Sep 11 11:54:55 medusa kernel: Deadlock threshold exceeded, forcing Oops.
Sep 11 11:54:55 medusa kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000000
Sep 11 11:54:55 medusa kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
Sep 11 11:54:55 medusa kernel: *pde = 00000000
Sep 11 11:54:55 medusa kernel: Oops: 0002
----- Original Message -----
From: Andrea Arcangeli <andrea@suse.de>
To: Mike Black <mblack@csihq.com>
Cc: Andre Hedrick <andre@suse.com>; Alan Cox <alan@lxorguk.ukuu.org.uk>;
Wade Hampton <whampton@staffnet.com>; <linux-kernel@vger.rutgers.edu>
Sent: Saturday, September 11, 1999 8:34 PM
Subject: Re: Lockups - lost interrupt
The Oops you saw is forced by the IKD debugging code and it's not a bug
(it's just to get a stack trace). But you must also have one other line
before the Oops that tells us the reason for the Oops. You didn't quote it.
Also, you probably want to disable everything in the IKD patch except the
NMI watchdog; if the NMI isn't helpful, then you can use other things in
the IKD patch. Actually I am not 100% sure the other debugging code is
completely reliable, as I haven't used it for some time. I'll have a
look...
Andrea

Re: Lockups - lost interrupt [ In reply to ]

Sep 12, 1999, 1:17 AM

Post #12 of 35 (1226 views)

On Sun, 12 Sep 1999, Andrea Arcangeli wrote:
> On Sat, 11 Sep 1999, Wade Hampton wrote:
>
> >Dumb question -- where can I get the ikd patch? I could put it on the
>
> ftp://ftp.suse.com/pub/people/andrea/kernel-patches/ikd/2.2.12-ikd1.gz
>
> Right now enable _only_ the NMI oopser and wait for a stack trace.
FYI, i'm working on a patch for 2.3 that adds the NMI oopser (optionally,
because 1) it doesn't work on all SMP boxes, and 2) it forces the timer irq
to the BP) to the 2.3 kernel. Looks like one of the most common uses of
IKD is lockup detection - the rest is mostly used by kernel hackers. I'll
post it together with some other x86 APIC fixes and irq cleanups soon.
-- mingo
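
The idea behind an NMI-driven lockup detector can be sketched in user space. A non-maskable interrupt still arrives on a CPU that is spinning with interrupts disabled, so its handler can check whether the ordinary timer tick count has moved since the last check. The sketch below only simulates that check; `watchdog_state`, `timer_tick`, `watchdog_check` and `WATCHDOG_MAX_STALLS` are hypothetical names, not the code in IKD or in the 2.3 patch.

```c
#include <stdbool.h>

/* Hypothetical: consecutive checks without progress before reporting. */
#define WATCHDOG_MAX_STALLS 5

/* Hypothetical per-CPU bookkeeping for the watchdog. */
struct watchdog_state {
    unsigned long last_ticks;  /* tick count seen at the previous check */
    unsigned int stalls;       /* consecutive checks with no new ticks */
};

/* Called from the (simulated) timer interrupt. */
static void timer_tick(unsigned long *ticks)
{
    (*ticks)++;
}

/*
 * Called from the (simulated) NMI. Returns true when the tick count
 * has not advanced for WATCHDOG_MAX_STALLS checks in a row, which is
 * the point where a real implementation would print a stack trace.
 */
static bool watchdog_check(struct watchdog_state *w, unsigned long ticks)
{
    if (ticks != w->last_ticks) {
        w->last_ticks = ticks;
        w->stalls = 0;
        return false;
    }
    return ++w->stalls >= WATCHDOG_MAX_STALLS;
}
```

As long as `timer_tick()` keeps being called between checks, `watchdog_check()` returns false. Once ticks stop, the fifth consecutive check reports the stall.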

Re: Lockups - lost interrupt [ In reply to ]

Sep 12, 1999, 5:51 AM

Post #13 of 35 (1228 views)

On Sat, 11 Sep 1999, Mike Black wrote:
>OK...didn't think this one line mattered:
>
>Sep 11 11:54:55 medusa kernel: Deadlock threshold exceeded, forcing Oops.
OK disable the SOFTLOCKUP detection. Thanks.
Andrea

Re: Lockups - lost interrupt [ In reply to ]

Sep 12, 1999, 5:59 AM

Post #14 of 35 (1228 views)

On Sun, 12 Sep 1999 mingo@chiara.csoma.elte.hu wrote:
>FYI, i'm working on a patch for 2.3 that adds the NMI oopser (optionally,
>because 1) it doesn't work on all SMP boxes, and 2) it forces the timer irq
>to the BP) to the 2.3 kernel. Looks like one of the most common uses of
>IKD is lockup detection - the rest is mostly used by kernel hackers. I'll
Yes. Also the print-eip patch may be useful to sort out lockups.
>post it together with some other x86 APIC fixes and irq cleanups soon.
If you agree, I can merge it into the ikd patch when your new one is
ready. Also, I think it's time to port the IKD patch to 2.3.18ac1... ;)
I have a question about debuggers. Is kdb GPL'd? If so, and if Scott
agrees, I can merge it into the IKD patch while porting the current ikd
patch to 2.3.x. If I merge kdb, I can remove the old debugger that
doesn't work for people (SIGTRAP problem, and I have not had the time to go
into that myself).
Andrea

Re: PCI patch for 2.3.18 [ In reply to ]

mj at ucw

Sep 12, 1999, 12:15 PM

Post #15 of 35 (1228 views)

Hi,
> o Don't link syscall.o and setup.o on the PC, they aren't used anyway.
>
> This is an unneeded change I really think. drivers/pci/pci.a is an
> archive, and the symbols+code won't be pulled into the kernel if they
> are unreferenced.
Yes, I know that, but I want to be sure it won't be pulled in on an i386
accidentally. Also, it saves a few seconds of compilation time, but
that probably doesn't matter.
Have a nice fortnight
--
Martin `MJ' Mares <mj@ucw.cz> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
"Outside of a dog, a book is man's best friend. Inside a dog, it's too dark to read."

Re: Lockups - lost interrupt [ In reply to ]

sdw at lig

Sep 12, 1999, 6:10 PM

Post #16 of 35 (1234 views)

Another datapoint:
I installed 2.2.12 on two servers, both Pentium Celeron 433/128K cache, 256MB
ram, no IDE, no floppy, no sound, text only console, on board Adaptec 2990
Ultra2Wide SCSI, 1 WD 18300 LVD Ultra2 Wide SCSI drive 18GB and onboard
Realtek 100-base-t ethernet. All of this is on the motherboard with a passive
backplane.
One ISA 3Com 515 ethernet card.
One server had no problem, the other locked up hard, requiring a manual fsck.
Thereafter it crashed and rebooted every few minutes.
Unfortunately these have no heads, being co-located, however I was able to
install 2.2.13Pre6 patch from Alan's directory and the problem disappeared
completely.
I do have the LVS (Linux Virtual Server/LinuxDirector) patch applied also.
It's working great except I'm having trouble keeping the loopback aliases from
generating ARP replies... (The subject of my next message.)
sdw
Mike Black wrote:
> I've been having some lockups on numerous versions of the 2.2 series
> (10,11,12) in both SMP and non-SMP mode.
>
> Usually the machine is just locked up hard (alt-sys-req does not work).
>
> This last time though, some scsi timeouts, ide "lost interrupt" messages,
> and ethernet timeouts were showing up on the screen. I still could not do
> anything with alt-sys-req.
>
> When an IDE timeout occurred the IDE drive showed some activity followed by
> the "hda lost interrupt" message.
>
> It looks like the system was losing ALL interrupts which would also explain
> why the keyboard wasn't working.
>
> Does anybody have any idea what could be causing this? Is this a bad
> motherboard? CPU?
>
> ________________________________________
> Michael D. Black Principal Engineer
> mblack@csi.cc 407-676-2923,x203
> http://www.csi.cc Computer Science Innovations
> http://www.csi.cc/~mike My home page
> FAX 407-676-2355
>
--
OptimaLogic - Finding Optimal Solutions Web/Crypto/OO/Unix/Comm/Video/DBMS
sdw@lig.net Stephen D. Williams Senior Consultant/Architect http://sdw.st
43392 Wayside Cir,Ashburn,VA 20147-4622 703-724-0118W 703-995-0407Fax 5Jan1999

Re: Lockups - lost interrupt [ In reply to ]

Sep 12, 1999, 7:55 PM

Post #17 of 35 (1228 views)

There are parts of the new irq.c that are not obviously there to support
RTLinux. Please don't chop em. Especially important is: the functions
in the low level handler structure do not invoke any spinlocks and
there are labels on the low level irq catch code that allow the RTL
module to patch to take control.
On Sun, Sep 12, 1999 at 10:17:04AM +0200, mingo@chiara.csoma.elte.hu wrote:
>
> On Sun, 12 Sep 1999, Andrea Arcangeli wrote:
>
> > On Sat, 11 Sep 1999, Wade Hampton wrote:
> >
> > >Dumb question -- where can I get the ikd patch? I could put it on the
> >
> > ftp://ftp.suse.com/pub/people/andrea/kernel-patches/ikd/2.2.12-ikd1.gz
> >
> > Right now enable _only_ the NMI oopser and wait for a stack trace.
>
> FYI, i'm working on a patch for 2.3 that adds the NMI oopser (optionally,
> because 1) it doesn't work on all SMP boxes, and 2) it forces the timer irq
> to the BP) to the 2.3 kernel. Looks like one of the most common uses of
> IKD is lockup detection - the rest is mostly used by kernel hackers. I'll
> post it together with some other x86 APIC fixes and irq cleanups soon.
>
> -- mingo
>
>

Re: Lockups - lost interrupt [ In reply to ]

Sep 12, 1999, 11:42 PM

Post #18 of 35 (1227 views)

On Sun, 12 Sep 1999 yodaiken@chelm.cs.nmt.edu wrote:
> There are parts of the new irq.c that are not obviously there to support
> RTLinux. Please don't chop em. Especially important is: the functions
> in the low level handler structure do not invoke any spinlocks and
> there are labels on the low level irq catch code that allow the RTL
> module to patch to take control.
i'm not touching the architecture part, that's i think pretty clean right
now. I meant minor stuff like moving the no_irq controller definition out
of i8259.c and the like. (no_irq_type is not really Intel-dependent. The
#if SMP thing in ack_none is just an expression of 'what should we do if
the vector is illegal', which is architecture dependent. But this doesn't
make the no_irq_type controller truly architecture-dependent.) Another
more generic thing i'm thinking about (not done yet), to move the vector
building defines near to every controller's source code section. This
makes the thing a little bit more modular. Not all controllers are truly
independent (there are obvious interactions between 8259A and the first
IOAPIC in the system), but this is not a problem. The APIC/IOAPIC code
OTOH has major modifications/fixes.
what labels do you mean?
-- mingo

Re: Lockups - lost interrupt [ In reply to ]

Sep 13, 1999, 6:12 AM

Post #19 of 35 (1228 views)

On Mon, Sep 13, 1999 at 08:42:20AM +0200, mingo@chiara.csoma.elte.hu wrote:
>
> On Sun, 12 Sep 1999 yodaiken@chelm.cs.nmt.edu wrote:
>
> > There are parts of the new irq.c that are not obviously there to support
> > RTLinux. Please don't chop em. Especially important is: the functions
> > in the low level handler structure do not invoke any spinlocks and
> > there are labels on the low level irq catch code that allow the RTL
> > module to patch to take control.
>
> i'm not touching the architecture part, that's i think pretty clean right
> now. I meant minor stuff like moving the no_irq controller definition out
> of i8259.c and the like. (no_irq_type is not really Intel-dependent. The
> #if SMP thing in ack_none is just an expression of 'what should we do if
> the vector is illegal', which is architecture dependent. But this doesn't
Yes. But it was not obvious to me what to do when it was illegal so
I left it there.
> make the no_irq_type controller truly architecture-dependent.) Another
> more generic thing i'm thinking about (not done yet), to move the vector
> building defines near to every controller's source code section. This
> makes the thing a little bit more modular. Not all controllers are truly
> independent (there are obvious interactions between 8259A and the first
> IOAPIC in the system), but this is not a problem. The APIC/IOAPIC code
> OTOH has major modifications/fixes.
>
> what labels do you mean?
If you look at the build irq macros, you will see that common irq
has a label on the line of code that does "call do_IRQ" (unless Linus
removed that later) and the other low level routines are similarly
labeled. RTLinux is going to be nearly totally modular, and the init
code in the rtl module will patch that code to call rtl_intercept
instead of do_IRQ. What's missing for me now is an ifdefed code
section that will fill in a data structure with the pointers
#ifdef RTL_CONFIG
struct rtl_code {
        void * call_common,do_irq,call_smpx,do_smpx ...
        irq_desc, ... }
        = initialize
#endif
Then there is a similar-sized ifdefed part in system.h
#ifdef RTL_CONFIG
struct irq_control { do_cli,do_sti .... }
#define __cli() irq_control.do_cli()
...
and the export in ksyms.
On insert, the rtl module will patch the code, make a copy
of the irq_desc handler list, replace irq_desc.handler in each irq
with a soft handler pointer, and change the irq_control structure
to point to soft cli and soft sti etc.
Anyway, that's the theory.
On module cleanup, the rtl module will unpatch and restore everything
to the original. Interestingly enough, lmbench shows no performance
consequences of replacing cli/sti with indirect function calls.
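
The indirection described above can be sketched as a small user-space program. This is a simulation under stated assumptions, not RTLinux code: the two boolean flags stand in for the hardware and soft interrupt state, `sketch_cli()`/`sketch_sti()` stand in for `__cli()`/`__sti()`, and `rtl_module_insert()`/`rtl_module_remove()` are hypothetical names for the module init and cleanup steps.

```c
#include <stdbool.h>

/* Simulated state: the hardware flag and the flag soft cli/sti toggle. */
static bool hw_irqs_on = true;
static bool soft_irqs_on = true;

/* "Hard" versions: stand-ins for the real cli/sti instructions. */
static void hard_cli(void) { hw_irqs_on = false; }
static void hard_sti(void) { hw_irqs_on = true; }

/* "Soft" versions: only set a memory value, hardware state untouched. */
static void soft_cli(void) { soft_irqs_on = false; }
static void soft_sti(void) { soft_irqs_on = true; }

/* The jump table from the post, reduced to two entries. */
struct irq_control {
    void (*do_cli)(void);
    void (*do_sti)(void);
};

static struct irq_control irq_control = { hard_cli, hard_sti };
static struct irq_control saved_control;

/* Stand-ins for __cli()/__sti() in a kernel built with the config option. */
#define sketch_cli() irq_control.do_cli()
#define sketch_sti() irq_control.do_sti()

/* On module insert: remember the original table, switch to soft versions. */
static void rtl_module_insert(void)
{
    saved_control = irq_control;
    irq_control.do_cli = soft_cli;
    irq_control.do_sti = soft_sti;
}

/* On module cleanup: restore the original table. */
static void rtl_module_remove(void)
{
    irq_control = saved_control;
}
```

After `rtl_module_insert()`, a call to `sketch_cli()` leaves the simulated hardware flag alone and only clears the soft flag. After `rtl_module_remove()`, the same call goes back to changing the hardware flag.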

Re: Lockups - lost interrupt [ In reply to ]

Sep 13, 1999, 6:51 AM

Post #20 of 35 (1235 views)

On Mon, Sep 13, 1999 at 04:39:32PM +0200, mingo@chiara.csoma.elte.hu wrote:
>
> On Mon, 13 Sep 1999 yodaiken@chelm.cs.nmt.edu wrote:
>
> > > what labels do you mean?
> >
> > If you look at the build irq macros, you will see that common irq
> > has a label on the line of code that does "call do_IRQ" [...]
>
> oh, ok, i see it.
>
> > #define __cli() irq_control.do_cli()
>
> i'm not sure whether this will ever be accepted into the main kernel -
> __cli()/__sti()/etc. right now is heavily used and inlined (mostly via
The price is paid only if you select RTL in config.
my idea is that system.h does
#define do_not_use_this_cli_directly() __asm__("cli")
#ifndef RTL_CONFIG
#define __cli() do_not_use_this_cli_directly()
...
#else
struct irq_control ...
Even with the indirect jump, Lmbench can't detect any performance
loss -- remember that cli and sti are not cheap instructions anyways.
RTL should actually show a slight gain, because in operation __cli and
__sti will be
        call x
        set memory value
        return
which is cheaper on a modern processor than __asm__("cli");
> spinlocks) and it's a single instruction. Maybe building a table of 'cli,
> sti, popfl, pushfl' addresses into a special section can do the trick
> without interfering with the 'normal' kernel? A single-instruction 'int 3'
> could be patched into those places, or something like that.
The "int" would cost too much in the rtl case. On
the other hand, I had thought of
a section. Not sure what the advantage would be. With the structure,
the compiler generates
        movl N+irq_desc,%eax
        call *%eax
>
> -- mingo

Re: Lockups - lost interrupt [ In reply to ]

Sep 13, 1999, 7:24 AM

Post #21 of 35 (1234 views)

On Mon, Sep 13, 1999 at 05:10:49PM +0200, mingo@chiara.csoma.elte.hu wrote:
>
> On Mon, 13 Sep 1999 yodaiken@chelm.cs.nmt.edu wrote:
>
> > > i'm not sure whether this will ever be accepted into the main kernel -
> > > __cli()/__sti()/etc. right now is heavily used and inlined (mostly via
> >
> > The price is paid only if you select RTL in config.
> > my idea is that system.h does
>
> oh, ok. I thought you want to do this 'runtime', by patching a pretty
> normal kernel dynamically.
I'll patch an rtl-configured kernel. The idea is that if you select rtl_config you
get calls via the patch table, and if you then insert the rtl module
the patch is made. If you don't want the possibility of rtl, you get
inlined cli/sti. Note that on other architectures there is not even
a bloat price since we don't have 8bit cli/sti instructions.
> > movl N+irq_desc,%eax
> > call *%eax
>
> if the int3 solution is implementable (it's a tough problem i think), the
> advantage would be a completely unaffected 'main kernel'. RTL could be
> switched on/off runtime. OTOH, people recompile kernels routinely anyway.
My feeling is that the cost is undetectable and that if you don't want
rtl you run standard code, if you do want, you run a jump table that
allows rtl to be turned on and off.
But I do like the idea of a section; I just don't know how to do it.
Wouldn't it compile out to similar code?
...
jmp 1f
.section cli_stuff
1: cli
.section text
...
>
> -- mingo

Re: Lockups - lost interrupt [ In reply to ]

Sep 13, 1999, 7:39 AM

Post #22 of 35 (1228 views)

On Mon, 13 Sep 1999 yodaiken@chelm.cs.nmt.edu wrote:
> > what labels do you mean?
>
> If you look at the build irq macros, you will see that common irq
> has a label on the line of code that does "call do_IRQ" [...]
oh, ok, i see it.
> #define __cli() irq_control.do_cli()
i'm not sure whether this will ever be accepted into the main kernel -
__cli()/__sti()/etc. right now is heavily used and inlined (mostly via
spinlocks) and it's a single instruction. Maybe building a table of 'cli,
sti, popfl, pushfl' addresses into a special section can do the trick
without interfering with the 'normal' kernel? A single-instruction 'int 3'
could be patched into those places, or something like that.
-- mingo

Re: Lockups - lost interrupt [ In reply to ]

Sep 13, 1999, 8:06 AM

Post #23 of 35 (1231 views)

On Mon, Sep 13, 1999 at 05:41:31PM +0200, mingo@chiara.csoma.elte.hu wrote:
> > My feeling is that the cost is undetectable [...]
>
> i understand what you mean, but Linux kernel's speed is a 'sum' of many
> such 'undetectable' improvements. You cannot remove any of those speedups
> just because the speedup is undetectable.
Sure. That's why it should be a config unless we can figure out
this table idea. I'll have to look at the exceptions code -- but I
think there are some jumps necessary.
>
> > [...] and that if you don't want
> > rtl you run standard code, if you do want, you run a jump table that
> > allows rtl to be turned on and off.
> >
> > But I do like the idea of a section; I just don't know how to do it.
> > Wouldn't it compile out to similar code?
> > ...
> > jmp 1f
> > .section cli_stuff
> > 1: cli
> > .section text
> > ...
>
> that's the hard part i think too. One way to do it is like for exceptions
> (check out how exceptions build their tables, Documentation/exception.txt)
> : patch int3 into the necessary places if RT is enabled (int3 [or
> equivalent] in this case is a full replacement for all 4 types of
> instructions, cli, sti, popfl and pushfl), then search the 'exception
> table' for the address. (the return address is pushed onto the stack by
int3 for cli
int4 for sti
int5 for pushfl
int6 for popfl
> int3) This search can be rather slow though, and that's the main problem i
> think. There is no cost to the main kernel, apart from the (presumably not
> very big) kernel-resident address-tables.
>
> -- mingo
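
The table lookup being discussed can be sketched as follows. If the table of patched addresses is kept sorted, the handler can find the entry for a trapping address with a binary search, which keeps the search cost logarithmic in the table size. This is only an illustration: `patch_entry`, `patched_op` and `lookup_patched_op` are hypothetical names, and the addresses in any test data are made up.

```c
#include <stddef.h>

/* Which instruction originally lived at a patched address. */
enum patched_op { OP_CLI, OP_STI, OP_PUSHFL, OP_POPFL, OP_NONE };

/* One row of the hypothetical address table. */
struct patch_entry {
    unsigned long addr;   /* address of the patched instruction */
    enum patched_op op;   /* the instruction that was replaced */
};

/*
 * Binary search over a table sorted by ascending addr.
 * Returns the original operation for addr, or OP_NONE if the
 * address is not in the table (i.e. the trap was not one of ours).
 */
static enum patched_op lookup_patched_op(const struct patch_entry *table,
                                         size_t n, unsigned long addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].addr == addr)
            return table[mid].op;
        if (table[mid].addr < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return OP_NONE;
}
```

Using a separate interrupt number per instruction type, as suggested in the reply above, would make the `op` field unnecessary, but the handler would still need some way to confirm that the trapping address is a patched site.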

Re: Lockups - lost interrupt [ In reply to ]

Sep 13, 1999, 8:10 AM

Post #24 of 35 (1233 views)

On Mon, 13 Sep 1999 yodaiken@chelm.cs.nmt.edu wrote:
> > i'm not sure whether this will ever be accepted into the main kernel -
> > __cli()/__sti()/etc. right now is heavily used and inlined (mostly via
>
> The price is paid only if you select RTL in config.
> my idea is that system.h does
oh, ok. I thought you want to do this 'runtime', by patching a pretty
normal kernel dynamically.
> Lmbench can't detect any performance loss -- remember that cli and sti
> are not cheap instructions anyways.
they are ~7 cycles, but the real cost is the slight kernel bloat (== more
cache footprint) caused by the inlined function call. Anyway, this is of
course not a problem for an optional thing, i thought you are trying to do
this runtime as well.
> > spinlocks) and it's a single instruction. Maybe building a table of 'cli,
> > sti, popfl, pushfl' addresses into a special section can do the trick
> > without interfering with the 'normal' kernel? A single-instruction 'int 3'
> > could be patched into those places, or something like that.
>
> The "int" would cost too much in the rtl case. On
the int is basically a function call if you do it on ring 0, but yes it's
more expensive than a normal function call.
> the other hand, I had thought of
> a section. Not sure what the advantage would be. With the structure,
> the compiler generates
>
> movl N+irq_desc,%eax
> call *%eax
if the int3 solution is implementable (it's a tough problem i think), the
advantage would be a completely unaffected 'main kernel'. RTL could be
switched on/off runtime. OTOH, people recompile kernels routinely anyway.
-- mingo

Re: Lockups - lost interrupt [ In reply to ]

Sep 13, 1999, 8:41 AM

Post #25 of 35 (1232 views)