Mailing List Archive

BUG: soft lockup detected on CPU#0! on 3.0.2-2
BUG: soft lockup detected on CPU#0!
Pid: 2213, comm: smbiod
EIP: 0061:[<f4990f2e>] CPU: 0
EIP is at smbiod+0x116/0x16d [smbfs]
EFLAGS: 00000246 Tainted: GF (2.6.16-xen-automount #1)
EAX: 00000000 EBX: f4996400 ECX: f2c99f68 EDX: f2c98000
ESI: f2c98000 EDI: c06f5780 EBP: f2c99fb8 DS: 007b ES: 007b
CR0: 8005003b CR2: b7f77000 CR3: 326e2000 CR4: 00000640
[<c0131bb3>] autoremove_wake_function+0x0/0x4b
[<c0104b5e>] ret_from_fork+0x6/0x10
[<c0131bb3>] autoremove_wake_function+0x0/0x4b
[<f4990e18>] smbiod+0x0/0x16d [smbfs]
[<c0102e6d>] kernel_thread_helper+0x5/0xb
smb_add_request: request [f26c6e80, mid=6567] timed out!
smb_lookup: find java/com failed, error=-5
smb_add_request: request [f26c6b80, mid=6566] timed out!
smb_add_request: request [f26c6080, mid=6568] timed out!



I get the same problem on -unstable. I can reliably reproduce the problem
whenever I start a particular set of Java programs that read files on the
Samba mount (smbmount pulls the files from a Windows PC).

This is a 3.0.2-2 2.6.16 kernel with a RHEL3 userland (with updated
module-init-tools).

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
RE: BUG: soft lockup detected on CPU#0! on 3.0.2-2
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Luke Crawford
> Sent: 01 September 2006 23:03
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
>
> BUG: soft lockup detected on CPU#0!
> Pid: 2213, comm: smbiod
> EIP: 0061:[<f4990f2e>] CPU: 0
> EIP is at smbiod+0x116/0x16d [smbfs]
> EFLAGS: 00000246 Tainted: GF (2.6.16-xen-automount #1)
> EAX: 00000000 EBX: f4996400 ECX: f2c99f68 EDX: f2c98000
> ESI: f2c98000 EDI: c06f5780 EBP: f2c99fb8 DS: 007b ES: 007b
> CR0: 8005003b CR2: b7f77000 CR3: 326e2000 CR4: 00000640
> [<c0131bb3>] autoremove_wake_function+0x0/0x4b
> [<c0104b5e>] ret_from_fork+0x6/0x10
> [<c0131bb3>] autoremove_wake_function+0x0/0x4b
> [<f4990e18>] smbiod+0x0/0x16d [smbfs]
> [<c0102e6d>] kernel_thread_helper+0x5/0xb
> smb_add_request: request [f26c6e80, mid=6567] timed out!
> smb_lookup: find java/com failed, error=-5
> smb_add_request: request [f26c6b80, mid=6566] timed out!
> smb_add_request: request [f26c6080, mid=6568] timed out!
>
> I get the same problem on -unstable. I can reliably reproduce the problem
> whenever I start a particular set of java programs that read files on the
> samba mount (the smbmount grabs the files off a windows PC)
>
> this is a 3.0.2-2 2.6.16 kernel with a RHEL3 userland (with updated
> module-init-tools)

I presume this is in a guest? As an experiment, try running it in dom0
and see what happens.

Are these SMP guests?
Are you sure the problem doesn't happen with native 2.6.16?

Thanks,
Ian





RE: BUG: soft lockup detected on CPU#0! on 3.0.2-2
On Tue, 5 Sep 2006, Ian Pratt wrote:
> I presume this is in a guest? As an experiment, try running it in dom0

I will try that on Tuesday. (The box is at a client's location, and I
don't have off-site access.)

> Are these SMP guests?

Yes, this is an SMP guest.

> Are you sure the problem doesn't happen with native 2.6.16?

No. I am sure the problem doesn't happen in native 2.4 with the RHEL3
patches, though.

RE: BUG: soft lockup detected on CPU#0! on 3.0.2-2
> > Are you sure the problem doesn't happen with native 2.6.16?
>
> No. I am sure the problem doesn't happen in native 2.4 with the RHEL3
> patches, though.

I'll wager this is a native problem. Smbfs is deprecated these days, so
you should probably be using cifs on modern kernels -- see
/sbin/mount.cifs

Ian

Re: BUG: soft lockup detected on CPU#0! on 3.0.2-2
On 5/9/06 12:16 am, "Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> wrote:

>>> Are you sure the problem doesn't happen with native 2.6.16?
>>
>> No. I am sure the problem doesn't happen in native 2.4 with the RHEL3
>> patches, though.
>
> I'll wager this is a native problem. Smbfs is deprecated these days, so
> you should probably be using cifs on modern kernels -- see
> /sbin/mount.cifs

3.0.2-2 doesn't include the fix to the SEDF scheduler to prevent domain0 from
taking all CPU time. Without that, domUs can be starved and hence trigger
the softlockup warning. The tip of the 3.0.2 repository is a much better
prospect, having lots of other bug fixes too.

-- Keir



Re: BUG: soft lockup detected on CPU#0! on 3.0.2-2
On Tue, 5 Sep 2006, Keir Fraser wrote:
> 3.0.2-2 doesn't include the fix to SEDF scheduler to prevent domain0 from
> taking all CPU time. Without that, domU's can be starved and hence trigger
> the softlockup warning. The tip of 3.0.2 repository is a much better
> prospect, having lots of other bug fixes too.


I installed 3-unstable, and was able to reproduce the problem.

RE: BUG: soft lockup detected on CPU#0! on 3.0.2-2
It appears that you are correct; simply changing the mount to cifs made
the problem go away. (That caused some permissions issues, but those were
easy enough to hack around.)

Thanks! Two guys have been working on this for a week, and the problem is
now solved and explained.


On Tue, 5 Sep 2006, Ian Pratt wrote:

> Date: Tue, 5 Sep 2006 00:16:50 +0100
> From: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk>
> To: Luke Crawford <lsc@prgmr.com>
> Cc: xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] BUG: soft lockup detected on CPU#0! on 3.0.2-2
>
>
>>> Are you sure the problem doesn't happen with native 2.6.16?
>>
>> No. I am sure the problem doesn't happen in native 2.4 with the RHEL3
>> patches, though.
>
> I'll wager this is a native problem. Smbfs is deprecated these days, so
> you should probably be using cifs on modern kernels -- see
> /sbin/mount.cifs
>
> Ian

Re: BUG: soft lockup detected on CPU#0! on 3.0.2-2
Upgrading from SMB to CIFS looked like it fixed the problem for around a
week. It's back, but nobody can reproduce it yet.

I had them on the 3.0.3 hg clone for a while, but as soon as they figured
out that the problem appeared to be fixed by the SMB->CIFS fix, they moved
back to 3.0.2-2. This error occurred on 3.0.2-2.

If I can get them to reproduce the error, I will test with 3.0-testing.

Now, this is a bug, right? This isn't just one of the VMs being heavily
used? I ask because the VM in question is running a massive Java app that
would stress the server even if I ran it native.

Pid: 23136, comm: csh
EIP: 0061:[<c0168c05>] CPU: 1
EIP is at generic_fillattr+0x6d/0xa4
EFLAGS: 00000202 Tainted: GF (2.6.16-xen-automount #1)
EAX: 0000448b EBX: 00000000 ECX: 0000448b EDX: 00000001
ESI: 00000000 EDI: d113e678 EBP: cc1a3f64 DS: 007b ES: 007b
CR0: 8005003b CR2: 08127000 CR3: 30abb000 CR4: 00000640
[<f4a37a66>] cifs_getattr+0x32/0x3a [cifs]
[<c0168e17>] vfs_fstat+0x33/0x44
[<c016949b>] sys_fstat64+0x18/0x36
[<c01541ff>] get_swap_page+0xbf/0x270
[<c015e900>] sys_open+0x27/0x2b
[<c0104be1>] syscall_call+0x7/0xb
CIFS VFS: No response for cmd 50 mid 46953

CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
\utils\pogo\home\cdc-ops\bin\mv_gamelogs

CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
\utils\pogo\home\cdc-ops\bin\mv_gamelogs

CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
\utils\pogo\home\cdc-ops\bin\mv_gamelogs

CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
\utils\pogo\home\cdc-ops\bin\mv_gamelogs
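
For anyone decoding these log lines, here is a minimal sketch. It assumes the
values printed by smbfs ("error=-5") and CIFS ("Error 0xffffff90") are negative
Linux errno codes, with the CIFS one shown as a 32-bit two's-complement integer
in hex. The helper name and the two-entry errno table are illustrative only and
are not part of any kernel tool.

```python
# Linux errno numbers (from the kernel's errno headers); only the two
# values that appear in this thread are listed here.
LINUX_ERRNO = {5: "EIO", 112: "EHOSTDOWN"}


def decode_kernel_error(value):
    """Reinterpret a 32-bit value as signed and name the errno, if known.

    Hypothetical helper: `value` is the number as printed in the log,
    e.g. 0xffffff90. Returns (signed_value, errno_name).
    """
    # Values with the top bit set are negative in two's complement.
    signed = value - (1 << 32) if value & 0x80000000 else value
    name = LINUX_ERRNO.get(-signed, "unknown")
    return signed, name


print(decode_kernel_error(0xffffff90))       # the CIFS VFS error above
print(decode_kernel_error(-5 & 0xffffffff))  # smbfs "error=-5" as 32 bits
```

Under that assumption, 0xffffff90 is -112 (EHOSTDOWN) and -5 is EIO. Both
would be consistent with the client giving up on a server that stopped
answering, which matches the "timed out" and "No response" messages.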



RE: BUG: soft lockup detected on CPU#0! on 3.0.2-2
> Upgrading from SMB to CIFS looked like it fixed the problem for around a
> week. It's back, but nobody can reproduce it yet.
>
> I had them on the 3.0.3 hg clone for a while, but as soon as they figured
> out that the problem appeared to be fixed by the SMB->CIFS fix, they moved
> back to 3.0.2-2. This error occurred on 3.0.2-2.
>
> If I can get them to reproduce the error, I will test with 3.0-testing
>
> Now, this is a bug, right? this isn't just one of the VMs being heavily
> used? because the VM in question is running a massive Java app that
> would stress the server even if I ran it native.

It's hard to tell for sure, but this doesn't look to me like a Xen
issue. It may be possible to reproduce it on an equivalent native kernel.
Was the guest still pingable or did it crash?

Ian


> Pid: 23136, comm: csh
> EIP: 0061:[<c0168c05>] CPU: 1
> EIP is at generic_fillattr+0x6d/0xa4
> EFLAGS: 00000202 Tainted: GF (2.6.16-xen-automount #1)
> EAX: 0000448b EBX: 00000000 ECX: 0000448b EDX: 00000001
> ESI: 00000000 EDI: d113e678 EBP: cc1a3f64 DS: 007b ES: 007b
> CR0: 8005003b CR2: 08127000 CR3: 30abb000 CR4: 00000640
> [<f4a37a66>] cifs_getattr+0x32/0x3a [cifs]
> [<c0168e17>] vfs_fstat+0x33/0x44
> [<c016949b>] sys_fstat64+0x18/0x36
> [<c01541ff>] get_swap_page+0xbf/0x270
> [<c015e900>] sys_open+0x27/0x2b
> [<c0104be1>] syscall_call+0x7/0xb
> CIFS VFS: No response for cmd 50 mid 46953
>
> CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
> \utils\pogo\home\cdc-ops\bin\mv_gamelogs
>
> CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
> \utils\pogo\home\cdc-ops\bin\mv_gamelogs
>
> CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
> \utils\pogo\home\cdc-ops\bin\mv_gamelogs
>
> CIFS VFS: Error 0xffffff90 on cifs_get_inode_info in lookup of\
> \utils\pogo\home\cdc-ops\bin\mv_gamelogs

RE: BUG: soft lockup detected on CPU#0! on 3.0.2-2
On Sat, 16 Sep 2006, Ian Pratt wrote:
>> Now, this is a bug, right? this isn't just one of the VMs being heavily
>> used? because the VM in question is running a massive Java app that
>> would stress the server even if I ran it native.
>
> It's hard to tell for sure, but this doesn't look to me like a xen
> issue. It may be possible to repro on an equivalent native kernel.
> Was the guest still pingable or did it crash?
>
> Ian

Completely unpingable. The console was also dead; nobody tried the Xen
console. (I just set up a better reboot procedure for my hosting company;
I need to set up something similar here so that we don't lose the data we
need to figure this out.)

Where should I start looking to find out exactly what "bug: soft lockup on
cpu0" means? The Linux source/docs, or the Xen source/docs?

Re: BUG: soft lockup detected on CPU#0! on 3.0.2-2
On 16/9/06 2:34 am, "Luke Crawford" <lsc@prgmr.com> wrote:

> completely unpingable. console was also dead, nobody tried the xen
> console. (I just set up a better reboot procedure for my hosting company;
> I need to set up something similar here so that we don't lose the data we
> need to figure this out.)
>
> Where should I start looking to find out exactly what "bug: soft lockup on
> cpu0" means? linux source/docs? or Xen source/docs?

The watchdog code runs a kernel thread on every CPU. This is supposed to
wake up every second and update a per-CPU counter. A hook from the timer
interrupt checks the per-CPU counter and prints a softlockup warning if the
counter is not updated for 10 seconds.

3.0.2-2 is known to be susceptible to softlockups because the Xen scheduler
will starve domains to run domain0. It's not clear if that's what is
happening here, but you need to repro on tip of xen-3.0-testing to find out
one way or the other. Because of the number of bug fixes since 3.0.2-3 we
don't recommend running any old releases of 3.0.2.

-- Keir
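
The watchdog mechanism described above can be sketched as a toy simulation.
This is an illustrative model only, not the kernel's code: the class and
function names are made up, time is counted in whole seconds, and the
"starved" window stands in for any reason the per-CPU watchdog thread does not
get to run (for example a vCPU that is not being scheduled).

```python
class SoftLockupDetector:
    """Toy model of one CPU's soft-lockup check (not the kernel's code)."""

    THRESHOLD = 10  # seconds without an update before warning, as described

    def __init__(self, cpu):
        self.cpu = cpu
        self.last_touch = 0  # per-CPU "counter": when the watchdog thread last ran

    def watchdog_thread_ran(self, now):
        """Called whenever the per-CPU watchdog thread gets to run."""
        self.last_touch = now

    def timer_tick(self, now):
        """Called from the simulated timer interrupt; returns a warning or None."""
        if now - self.last_touch > self.THRESHOLD:
            return "BUG: soft lockup detected on CPU#%d!" % self.cpu
        return None


def simulate(starved_from, starved_until, duration=30):
    """Run one-second ticks; the watchdog thread cannot run while starved."""
    det = SoftLockupDetector(cpu=0)
    warnings = []
    for now in range(duration):
        if not (starved_from <= now < starved_until):
            det.watchdog_thread_ran(now)
        msg = det.timer_tick(now)
        if msg:
            warnings.append((now, msg))
    return warnings


print(simulate(5, 8))   # short stall: no warning
print(simulate(5, 20))  # long stall: warnings once the threshold is passed
```

The point of the sketch is that the warning says only that the watchdog thread
did not run for a while. It does not say why, so a busy-looping kernel path
and a starved guest vCPU look the same from inside the guest.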



Re: BUG: soft lockup detected on CPU#0! on 3.0.2-2
Keir Fraser <Keir.Fraser <at> cl.cam.ac.uk> writes:

>
> On 16/9/06 2:34 am, "Luke Crawford" <lsc <at> prgmr.com> wrote:
>
> > completely unpingable. console was also dead, nobody tried the xen
> > console. (I just set up a better reboot procedure for my hosting company;
> > I need to set up something similar here so that we don't lose the data we
> > need to figure this out.)
> >
> > Where should I start looking to find out exactly what "bug: soft lockup on
> > cpu0" means? linux source/docs? or Xen source/docs?
>
> The watchdog code runs a kernel thread on every CPU. This is supposed to
> wake up every second and update a per-CPU counter. A hook from the timer
> interrupt checks the per-CPU counter and prints a softlockup warning if the
> counter is not updated for 10 seconds.
>
> 3.0.2-2 is known to be susceptible to softlockups because the Xen scheduler
> will starve domains to run domain0. It's not clear if that's what is
> happening here, but you need to repro on tip of xen-3.0-testing to find out
> one way or the other. Because of the number of bug fixes since 3.0.2-3 we
> don't recommend running any old releases of 3.0.2.
>
> -- Keir
>


I also get soft lockup warnings in my Xen domU. I'd really love to be able to
determine the source of the error(s) and perhaps fix them myself. I'm not a
kernel hacker and my C is rather flaky, but can anyone point me to some docs on
how to interpret data from:

Pausing... 5<3>BUG: soft lockup detected on CPU#0!

Pid: 1, comm: init
EIP: 0061:[<c0107c64>] CPU: 0
EIP is at delay_tsc+0x14/0x20
EFLAGS: 00000287 Not tainted (2.6.16-xen #1)
EAX: 79d31a46 EBX: 000c74e4 ECX: 79c788c9 EDX: 00004616
ESI: 00000005 EDI: c0112520 EBP: bff6c010 DS: 007b ES: 007b
CR0: 8005003b CR2: 431ea00c CR3: 003e3000 CR4: 00000640
[<c011264d>] do_fixup_4gb_segment+0x12d/0x160
[<c0113fa0>] do_page_fault+0x4a0/0x7ac
[<c03de17c>] icmp_init+0xdc/0x110
[<c03de284>] inet_init+0x74/0x380
[<c03de17c>] icmp_init+0xdc/0x110
[<c0105243>] error_code+0x2b/0x30
Continuing...

Are there any local docs, tools or dirs (on my machine) or URLs that anyone can
point me to?

- jm
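
As a starting point for reading such a trace, here is a minimal sketch that
splits the call-trace lines into their parts. It assumes the usual
"[<address>] symbol+0xoffset/0xsize [module]" layout seen in the dumps in this
thread; the regular expression and function name are illustrative, not taken
from any kernel tool.

```python
import re

# [<c011264d>] do_fixup_4gb_segment+0x12d/0x160
# [<f4a37a66>] cifs_getattr+0x32/0x3a [cifs]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frame(line):
    """Return the fields of one call-trace line, or None if it is not one."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),      # address found on the stack
        "symbol": m.group("symbol"),           # function containing that address
        "offset": int(m.group("offset"), 16),  # bytes past the function start
        "size": int(m.group("size"), 16),      # total length of the function
        "module": m.group("module"),           # module name, None for built-in code
    }


frame = parse_frame("[<c011264d>] do_fixup_4gb_segment+0x12d/0x160")
print(frame["symbol"], hex(frame["addr"] - frame["offset"]))
```

Each entry reads as "this address lies offset bytes into a function that is
size bytes long", so subtracting the offset from the address gives the start
of the function (0xc0112520 for the example line). The "EIP is at" line uses
the same symbol+offset/size notation for the instruction that was executing
when the warning fired.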



RE: Re: BUG: soft lockup detected on CPU#0! on 3.0.2-2
I also hit this bug when booting a Linux guest OS in a VMX domain.
However, if I boot the Linux guest with the kernel parameter "quiet", it boots successfully.

Best Regards
Larry


-----Original Message-----
From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of john maclean
Sent: 02 November 2006 22:50
To: xen-devel@lists.xensource.com
Subject: [Xen-devel] Re: BUG: soft lockup detected on CPU#0! on 3.0.2-2

