[BUG] cpu hotplug vs scheduler
I'm testing host cpu hotplug with kvm. Basically running 7 guests on a
4 core machine, offlining and onlining host cpus at random. Eventually
I hit this:

[4298303.496645] Booting processor 3/7 ip 6000
[4298303.506116] Initializing CPU#3
[4298303.506116] Calibrating delay using timer specific routine.. 5319.66 BogoMIPS (lpj=2659833)
[4298303.506116] CPU: L1 I cache: 32K, L1 D cache: 32K
[4298303.506116] CPU: L2 cache: 4096K
[4298303.506116] CPU: Physical Processor ID: 3
[4298303.506116] CPU: Processor Core ID: 1
[4298303.506116] x86 PAT enabled: cpu 3, old 0x7040600070406, new 0x7010600070106
[4298303.582937] CPU3: Intel(R) Xeon(R) CPU 5150 @ 2.66GHz stepping 06
[4298303.585087] checking TSC synchronization [CPU#0 -> CPU#3]: passed.
[4298303.707287] Switched to high resolution mode on CPU 3
[4298303.712943] kvm: enabling virtualization on CPU3
[4298303.713955] CPU0 attaching sched-domain:
[4298303.713901] BUG: unable to handle kernel NULL pointer dereference at 0000000000000158
[4298303.713901] IP: [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c
[4298303.713901] PGD 0
[4298303.713901] Oops: 0000 [1] PREEMPT SMP
[4298303.713901] CPU 3
[4298303.713901] Modules linked in: kvm_intel kvm netconsole autofs4 nfs lockd nfs_acl sunrpc bridge llc acpi_cpufreq backlight sg e1000 button serio_raw rtc_cmos rtc_core rtc_lib ata_piix dm_snapshot dm_mod ahci libata dock sd_mod scsi_mod [last unloaded: kvm]
[4298303.713901] Pid: 15115, comm: migration/3 Not tainted 2.6.26-rc2 #723
[4298303.713901] RIP: 0010:[<ffffffff8022e722>] [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c
[4298303.713901] RSP: 0018:ffff81004fdfbe20 EFLAGS: 00010046
[4298303.713901] RAX: 0000000000000000 RBX: ffff81000103df80 RCX: 0000000000000000
[4298303.713901] RDX: ffff81000103e038 RSI: 000000003b9aca00 RDI: ffff81000103df00
[4298303.713901] RBP: ffff81004fdfbe40 R08: ffff81004fdfbdd0 R09: ffff81000103a0a0
[4298303.713901] R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000000
[4298303.713901] R13: ffff81000103df00 R14: ffffffff8060a140 R15: 0000000000000003
[4298303.713901] FS: 0000000000000000(0000) GS:ffff81007f806a80(0000) knlGS:0000000000000000
[4298303.713901] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[4298303.713901] CR2: 0000000000000158 CR3: 0000000000201000 CR4: 00000000000026a0
[4298303.713901] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4298303.713901] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[4298303.713901] Process migration/3 (pid: 15115, threadinfo ffff81004fdfa000, task ffff81003f5f0700)
[4298303.713901] Stack: ffff81004fdfbe40 ffff81000103df00 ffffffff804543c0 ffff810020985bf8
[4298303.713901] ffff81004fdfbee0 ffffffff804373fe ffff81004fdfbe80 ffffffff8023060a
[4298303.713901] ffff81000103df00 ffff81003f5f0700 0000000000000000 ffff81003f5f0a60
[4298303.713901] Call Trace:
[4298303.713901] [<ffffffff804373fe>] schedule+0x414/0x6ab
[4298303.713901] [<ffffffff8023060a>] ? hrtick_set+0x9d/0xe8
[4298303.713901] [<ffffffff8043772f>] ? thread_return+0x9a/0xbf
[4298303.713901] [<ffffffff80231652>] migration_thread+0x185/0x22d
[4298303.713901] [<ffffffff802314cd>] ? migration_thread+0x0/0x22d
[4298303.713901] [<ffffffff8024afe6>] kthread+0x49/0x77
[4298303.713901] [<ffffffff8020d228>] child_rip+0xa/0x12
[4298303.713901] [<ffffffff8024af9d>] ? kthread+0x0/0x77
[4298303.713901] [<ffffffff8020d21e>] ? child_rip+0x0/0x12
[4298303.713901]
[4298303.713901]
[4298303.713901] Code: c0 74 28 48 8b 7b 58 4c 8d 60 f0 48 85 ff 74 10 4c 89 e6 e8 df cc ff ff 85 c0 75 04 4c 8b 63 58 4c 89 e6 48 89 df e8 4a e5 ff ff <49> 8b 9c 24 58 01 00 00 48 85 db 75 bf 49 83 ec 38 4c 89 ef 4c
[4298303.713901] RIP [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c

This seems to be the assignment to cfs_rq after pick_next_entity().
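
[ For reference: as the debug patches later in this thread show, the
  relevant loop in pick_next_task_fair() (kernel/sched_fair.c) is
  roughly the following -- a sketch, not the verbatim 2.6.26-rc2 source:

	do {
		se = pick_next_entity(cfs_rq);	/* apparently returns NULL here */
		cfs_rq = group_cfs_rq(se);	/* reads se->my_q; with se == NULL this
						 * faults at a small offset, consistent
						 * with CR2 == 0x158 above */
	} while (cfs_rq);

  i.e. the cfs_rq apparently claims runnable entities while
  pick_next_entity() hands back none. ]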

I'm running kvm.git, which is currently 2.6.26-rc2 plus a few kvm
patches. It could be kvm's fault, but it doesn't appear so from the traces.

--
error compiling committee.c: too many arguments to function

Re: [BUG] cpu hotplug vs scheduler
Avi Kivity wrote:
> I'm testing host cpu hotplug with kvm. Basically running 7 guests on
> a 4 core machine, offlining and onlining host cpus at random.
> Eventually I hit this:
>

[snip oops]

> This seems to be the assignment to cfs_rq after pick_next_entity().
>
> I'm running kvm.git, which is currently 2.6.26-rc2 plus a few kvm
> patches. It could be kvm's fault, but it doesn't appear so from the
> traces.
>

I get the same oops running with the kvm modules unloaded. None of
kvm.git's changes are in core code, so this is definitely a mainline
problem.

It took about 40 minutes to reproduce; 1200 cpus onlined and 1200 cpus
offlined.
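
[ For reference, a minimal sketch of the kind of random offline/online
  toggling described above, using the standard sysfs hotplug interface
  (/sys/devices/system/cpu/cpuN/online). The cpu range, delay and the
  program itself are illustrative only, not the actual script used in
  this report:

	/* hotplug-stress.c: randomly offline/online cpus 1..3; cpu 0 stays up */
	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	int main(void)
	{
		char path[64];
		FILE *f;

		srand(getpid());
		for (;;) {
			int cpu = 1 + rand() % 3;

			snprintf(path, sizeof(path),
				 "/sys/devices/system/cpu/cpu%d/online", cpu);
			f = fopen(path, "w");
			if (!f)
				continue;
			fprintf(f, "%d\n", rand() % 2);	/* 0 = offline, 1 = online */
			fclose(f);
			usleep(100 * 1000);	/* 100 ms between toggles */
		}
		return 0;
	}
 ]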

--
error compiling committee.c: too many arguments to function

Re: [BUG] cpu hotplug vs scheduler
On Tue, May 13, 2008 at 06:33:00PM +0300, Avi Kivity wrote:
> Avi Kivity wrote:
>> I'm testing host cpu hotplug with kvm. Basically running 7 guests on
>> a 4 core machine, offlining and onlining host cpus at random.
>> Eventually I hit this:
>>
>
> [snip oops]
>
>> This seems to be the assignment to cfs_rq after pick_next_entity().
>>
>> I'm running kvm.git, which is currently 2.6.26-rc2 plus a few kvm
>> patches. It could be kvm's fault, but it doesn't appear so from the
>> traces.
>>
>
> I get the same oops running with the kvm modules unloaded. None of
> kvm.git's changes are in core code, so this is definitely a mainline
> problem.
>
> It took about 40 minutes to reproduce; 1200 cpus onlined and 1200 cpus
> offlined.

FWIW, this happens on s390 as well (without kvm support), but until now
I haven't had time to look into it. I didn't report it here yet, since we
had plenty of cpu hotplug changes in arch code, so I suspected a bug there.
That doesn't seem to be the case.

It takes only a few minutes to trigger the bug here.

Re: [BUG] cpu hotplug vs scheduler
Hi,

> [ ... ]
>
> [4298303.713901] Call Trace:
> [4298303.713901] [<ffffffff804373fe>] schedule+0x414/0x6ab
> [4298303.713901] [<ffffffff8023060a>] ? hrtick_set+0x9d/0xe8
> [4298303.713901] [<ffffffff8043772f>] ? thread_return+0x9a/0xbf
> [4298303.713901] [<ffffffff80231652>] migration_thread+0x185/0x22d
> [4298303.713901] [<ffffffff802314cd>] ? migration_thread+0x0/0x22d
> [4298303.713901] [<ffffffff8024afe6>] kthread+0x49/0x77
> [4298303.713901] [<ffffffff8020d228>] child_rip+0xa/0x12
> [4298303.713901] [<ffffffff8024af9d>] ? kthread+0x0/0x77
> [4298303.713901] [<ffffffff8020d21e>] ? child_rip+0x0/0x12
> [4298303.713901]
> [4298303.713901]
> [4298303.713901] Code: c0 74 28 48 8b 7b 58 4c 8d 60 f0 48 85 ff 74 10 4c
> 89 e6 e8 df cc ff ff 85 c0 75 04 4c 8b 63 58 4c 89 e6 48 89 df e8 4a e5 ff
> ff <49> 8b 9c 24 58 01 00 00 48 85 db 75 bf 49 83 ec 38 4c 89 ef 4c
> [4298303.713901] RIP [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c
>
> This seems to be the assignment to cfs_rq after pick_next_entity().

[ cc'ed a few folks. ]


So the cfs tree likely gets out of sync. I presume it won't be
reproducible with the CONFIG_SCHED_GROUP options disabled.

Anyway, would you try one of these debug patches? (Not sure about the
workability of the second one, though :-/)

Let's check what the values of cfs_rq->load.weight and cfs_rq->nr_running are.

thanks in advance,

(non-whitespace-damaged versions are enclosed)

---
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1291,6 +1291,12 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
 
 	do {
 		se = pick_next_entity(cfs_rq);
+
+		if (unlikely(!se))
+			printk(KERN_ERR "BUG: se == NULL but nr_running (%ld), load (%ld),"
+				" rq-nr_running (%ld), rq-load (%ld)\n",
+				cfs_rq->nr_running, cfs_rq->load.weight, rq->nr_running, rq->load.weight);
+
 		cfs_rq = group_cfs_rq(se);
 	} while (cfs_rq);


---

--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -1280,6 +1280,8 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p)
 	resched_task(curr);
 }
 
+static void sysrq_sched_debug_show(void);
+
 static struct task_struct *pick_next_task_fair(struct rq *rq)
 {
 	struct task_struct *p;
@@ -1291,6 +1293,10 @@ static struct task_struct *pick_next_task_fair(struct rq *rq)
 
 	do {
 		se = pick_next_entity(cfs_rq);
+
+		if (unlikely(!se))
+			sysrq_sched_debug_show();
+
 		cfs_rq = group_cfs_rq(se);
 	} while (cfs_rq);


---

--
Best regards,
Dmitry Adamushko

Re: [BUG] cpu hotplug vs scheduler
Dmitry Adamushko wrote:
> Hi,
>
>
>> [ ... ]
>>
>> [4298303.713901] Call Trace:
>> [4298303.713901] [<ffffffff804373fe>] schedule+0x414/0x6ab
>> [4298303.713901] [<ffffffff8023060a>] ? hrtick_set+0x9d/0xe8
>> [4298303.713901] [<ffffffff8043772f>] ? thread_return+0x9a/0xbf
>> [4298303.713901] [<ffffffff80231652>] migration_thread+0x185/0x22d
>> [4298303.713901] [<ffffffff802314cd>] ? migration_thread+0x0/0x22d
>> [4298303.713901] [<ffffffff8024afe6>] kthread+0x49/0x77
>> [4298303.713901] [<ffffffff8020d228>] child_rip+0xa/0x12
>> [4298303.713901] [<ffffffff8024af9d>] ? kthread+0x0/0x77
>> [4298303.713901] [<ffffffff8020d21e>] ? child_rip+0x0/0x12
>> [4298303.713901]
>> [4298303.713901]
>> [4298303.713901] Code: c0 74 28 48 8b 7b 58 4c 8d 60 f0 48 85 ff 74 10 4c
>> 89 e6 e8 df cc ff ff 85 c0 75 04 4c 8b 63 58 4c 89 e6 48 89 df e8 4a e5 ff
>> ff <49> 8b 9c 24 58 01 00 00 48 85 db 75 bf 49 83 ec 38 4c 89 ef 4c
>> [4298303.713901] RIP [<ffffffff8022e722>] pick_next_task_fair+0x55/0x7c
>>
>> This seems to be the assignment to cfs_rq after pick_next_entity().
>>
>
> [ cc'ed a few folks. ]
>
>
> So the cfs tree likely gets out of sync. I presume it won't be
> reproducible with the CONFIG_SCHED_GROUP options disabled.
>
> Anyway, would you try one of these debug patches? (Not sure about the
> workability of the second one, though :-/)
>
> Let's check what the values of cfs_rq->load.weight and cfs_rq->nr_running are.
>
>

Got this for the first patch:

[4302727.615522] Booting processor 3/7 ip 6000
[4302727.625923] Initializing CPU#3
[4302727.625923] Calibrating delay using timer specific routine.. 5319.76 BogoMIPS (lpj=2659883)
[4302727.625923] CPU: L1 I cache: 32K, L1 D cache: 32K
[4302727.625923] CPU: L2 cache: 4096K
[4302727.625923] CPU: Physical Processor ID: 3
[4302727.625923] CPU: Processor Core ID: 1
[4302727.625923] x86 PAT enabled: cpu 3, old 0x7040600070406, new 0x7010600070106
[4302727.692484] CPU3: Intel(R) Xeon(R) CPU 5150 @ 2.66GHz stepping 06
[4302727.694236] checking TSC synchronization [CPU#1 -> CPU#3]: passed.
[4302727.824185] Switched to high resolution mode on CPU 3
[4302727.859184] kvm: enabling virtualization on CPU3
[4302727.859714] Sched Debug Version: v0.07, 2.6.26-rc2 #726
[4302727.859714] now at 6918576.148656 msecs
[4302727.859714] .sysctl_sched_latency : 60.000000
[4302727.859714] .sysctl_sched_min_granularity : 12.000000
[4302727.859714] .sysctl_sched_wakeup_granularity : 30.000000
[4302727.859714] .sysctl_sched_child_runs_first : 0.000001
[4302727.860191] .sysctl_sched_features : 895
[4302727.860191]
[4302727.860191] cpu#0, 2659.999 MHz
[4302727.860191] .nr_running : 2
[4302727.860191] .load : 841
[4302727.860191] .nr_switches : 3427530
[4302727.861205] .nr_load_updates : 2183358
[4302727.861205] .nr_uninterruptible : 15
[4302727.861205] .jiffies : 4301585875
[4302727.861205] .next_balance : 4301.585696
[4302727.861205] .curr->pid : 4678
[4302727.861205] .clock : 6918579.002757
[4302727.862216] .cpu_load[0] : 841
[4302727.862216] .cpu_load[1] : 841
[4302727.862216] .cpu_load[2] : 841
[4302727.862216] .cpu_load[3] : 841
[4302727.862216] .cpu_load[4] : 841
[4302727.862216]
[4302727.862216] cfs_rq[0]:
[4302727.867209] .exec_clock : 3970.569663
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 5178969.408050
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : 0.000000
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 513
[4302727.867209] .nr_spread_over : 6
[4302727.867209] .shares : 1024
[4302727.867209]
[4302727.867209] cfs_rq[0]:
[4302727.867209] .exec_clock : 14.588517
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 5178971.405628
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : 0.000000
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 513
[4302727.867209] .nr_spread_over : 0
[4302727.867209] .shares : 1024
[4302727.867209]
[4302727.867209] cfs_rq[0]:
[4302727.867209] .exec_clock : 41.615870
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 5178973.403544
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : 0.000000
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 513
[4302727.867209] .nr_spread_over : 2
[4302727.867209] .shares : 1024
[4302727.867209]
[4302727.867209] cfs_rq[0]:
[4302727.867209] .exec_clock : 0.000000
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 5178975.401320
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : 0.000000
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 513
[4302727.867209] .nr_spread_over : 0
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867209] cfs_rq[0]:
[4302727.867209] .exec_clock : 0.000001
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 5178977.398314
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : 0.000000
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 513
[4302727.867209] .nr_spread_over : 0
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867209] cfs_rq[0]:
[4302727.867209] .exec_clock : 2165242.484786
[4302727.867209] .MIN_vruntime : 10323214.742376
[4302727.867209] .min_vruntime : 5178979.396488
[4302727.867209] .max_vruntime : 10323214.742376
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : 0.000000
[4302727.867209] .nr_running : 2
[4302727.867209] .load : 2048
[4302727.867209] .bkl_count : 513
[4302727.867209] .nr_spread_over : 1789825
[4302727.867209] .shares : 843
[4302727.867209]
[4302727.867209] runnable tasks:
[4302727.867209]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[4302727.867209] ----------------------------------------------------------------------------------------------------------
[4302727.867209] Rqemu-system-x86  4678  10323337.578253    553310   120  10323337.578255   1380505.796830     42439.250368
[4302727.867209]
[4302727.867209] cpu#1, 2659.999 MHz
[4302727.867209] .nr_running : 3
[4302727.867209] .load : 415
[4302727.867209] .nr_switches : 629498
[4302727.867209] .nr_load_updates : 838874
[4302727.867209] .nr_uninterruptible : -6
[4302727.867209] .jiffies : 4301585895
[4302727.867209] .next_balance : 4301.585634
[4302727.867209] .curr->pid : 7799
[4302727.867209] .clock : 6918576.130865
[4302727.867209] .cpu_load[0] : 415
[4302727.867209] .cpu_load[1] : 415
[4302727.867209] .cpu_load[2] : 415
[4302727.867209] .cpu_load[3] : 415
[4302727.867209] .cpu_load[4] : 415
[4302727.867209]
[4302727.867209] cfs_rq[1]:
[4302727.867209] .exec_clock : 74.637431
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 759396.868495
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : -4419588.520858
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 299
[4302727.867209] .nr_spread_over : 1
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867209] cfs_rq[1]:
[4302727.867209] .exec_clock : 22.707771
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 759396.868495
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : -4419590.518446
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 299
[4302727.867209] .nr_spread_over : 1
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867209] cfs_rq[1]:
[4302727.867209] .exec_clock : 0.033026
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 759396.868495
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : -4419590.518446
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 299
[4302727.867209] .nr_spread_over : 0
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867209] cfs_rq[1]:
[4302727.867209] .exec_clock : 0.000000
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 759396.868495
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : -4419590.518446
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 299
[4302727.867209] .nr_spread_over : 0
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867209] cfs_rq[1]:
[4302727.867209] .exec_clock : 0.026450
[4302727.867209] .MIN_vruntime : 0.000001
[4302727.867209] .min_vruntime : 759396.868495
[4302727.867209] .max_vruntime : 0.000001
[4302727.867209] .spread : 0.000000
[4302727.867209] .spread0 : -4419590.518446
[4302727.867209] .nr_running : 0
[4302727.867209] .load : 0
[4302727.867209] .bkl_count : 299
[4302727.867209] .nr_spread_over : 0
[4302727.867209] .shares : 0
[4302727.867209]
[4302727.867210] cfs_rq[1]:
[4302727.867210] .exec_clock : 754981.092689
[4302727.867210] .MIN_vruntime : 1239813.449102
[4302727.867210] .min_vruntime : 759396.868495
[4302727.867210] .max_vruntime : 1239819.334711
[4302727.867210] .spread : 5.885609
[4302727.867210] .spread0 : -4419590.518446
[4302727.867210] .nr_running : 3
[4302727.867210] .load : 3072
[4302727.867210] .bkl_count : 299
[4302727.867210] .nr_spread_over : 53817
[4302727.867210] .shares : 415
[4302727.867210]
[4302727.867210] runnable tasks:
[4302727.867210]             task   PID         tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
[4302727.867210] ----------------------------------------------------------------------------------------------------------
[4302727.900184]  qemu-system-x86  4987   1239813.449102    709828   120   1239813.449102   1410504.949783     22865.206608
[4302727.900184]  qemu-system-x86  5052   1239819.334711    530481   120   1239819.334711   1365146.519564     50937.064744
[4302727.900184] Rtoggle-processo  7799   1239811.208673     47886   120   1239811.208673     57552.631854   1592798.974913
[4302727.900184]
[4302727.900184] cpu#3, 2659.999 MHz
[4302727.900184] .nr_running : 1
[4302727.900184] .load : 285
[4302727.900184] .nr_switches : 611209
[4302727.900184] .nr_load_updates : 843051
[4302727.900184] .nr_uninterruptible : -2
[4302727.900184] .jiffies : 4301585916
[4302727.900184] .next_balance : 4301.586873
[4302727.900184] .curr->pid : 0
[4302727.900184] .clock : 6918576.376068
[4302727.900184] .cpu_load[0] : 0
[4302727.900184] .cpu_load[1] : 0
[4302727.900184] .cpu_load[2] : 0
[4302727.900184] .cpu_load[3] : 181
[4302727.900184] .cpu_load[4] : 1108
[4302727.900184]
[4302727.900184] cfs_rq[3]:
[4302727.900184] .exec_clock : 8.224765
[4302727.900184] BUG: spinlock recursion on CPU#3, swapper/0
[4302727.900184] lock: ffff81000103df00, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 3
[4302727.900184] Pid: 0, comm: swapper Not tainted 2.6.26-rc2 #726
[4302727.900184]
[4302727.900184] Call Trace:
[4302727.900184] [<ffffffff803249de>] spin_bug+0x9e/0xe9
[4302727.900184] [<ffffffff80324af4>] _raw_spin_lock+0x41/0x123
[4302727.900184] [<ffffffff80439638>] _spin_lock_irqsave+0x2f/0x37
[4302727.900184] [<ffffffff8022ef7c>] print_cfs_rq+0xca/0x46a
[4302727.900184] [<ffffffff80231f97>] sched_debug_show+0x7a3/0xb8c
[4302727.900184] [<ffffffff8023238d>] sysrq_sched_debug_show+0xd/0xf
[4302727.900184] [<ffffffff802323ee>] pick_next_task_fair+0x5f/0x86
[4302727.900184] [<ffffffff804373f6>] schedule+0x3fc/0x6ab
[4302727.900184] [<ffffffff8024e03f>] ? ktime_get_ts+0x49/0x4e
[4302727.900184] [<ffffffff80253a28>] ? tick_nohz_stop_idle+0x2d/0x54
[4302727.900184] [<ffffffff8021283f>] ? mwait_idle+0x0/0x59
[4302727.900184] [<ffffffff8020ae37>] cpu_idle+0xc8/0xd7
[4302727.900184] [<ffffffff804332f1>] start_secondary+0x173/0x178
[4302727.900184]




--
error compiling committee.c: too many arguments to function

Re: [BUG] cpu hotplug vs scheduler
2008/5/14 Avi Kivity <avi@qumranet.com>:

> [ ... ]
>
> [4302727.900184] Call Trace:
> [4302727.900184] [<ffffffff803249de>] spin_bug+0x9e/0xe9
> [4302727.900184] [<ffffffff80324af4>] _raw_spin_lock+0x41/0x123
> [4302727.900184] [<ffffffff80439638>] _spin_lock_irqsave+0x2f/0x37
> [4302727.900184] [<ffffffff8022ef7c>] print_cfs_rq+0xca/0x46a
> [4302727.900184] [<ffffffff80231f97>] sched_debug_show+0x7a3/0xb8c
> [4302727.900184] [<ffffffff8023238d>] sysrq_sched_debug_show+0xd/0xf
> [4302727.900184] [<ffffffff802323ee>] pick_next_task_fair+0x5f/0x86


Err... sorry for the broken patch. The patch below on top of the
previous one should address this issue (ugly, but should be ok for
debugging).
'tasklist_lock' shouldn't cause a double lock, I guess.

Sorry for the rather 'blind' attempts. If that doesn't do it, I'll
prepare/test/take a closer look at it later today when I'm at home.

TIA,


--- kernel/sched_debug-prev.c	2008-05-14 14:53:28.000000000 +0200
+++ kernel/sched_debug.c	2008-05-14 14:58:12.000000000 +0200
@@ -125,6 +125,7 @@ void print_cfs_rq(struct seq_file *m, in
 	char path[128] = "";
 	struct cgroup *cgroup = NULL;
 	struct task_group *tg = cfs_rq->tg;
+	int was_locked;
 
 	if (tg)
 		cgroup = tg->css.cgroup;
@@ -138,7 +139,11 @@ void print_cfs_rq(struct seq_file *m, in
 	SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "exec_clock",
 			SPLIT_NS(cfs_rq->exec_clock));
 
-	spin_lock_irqsave(&rq->lock, flags);
+	was_locked = spin_is_locked(&rq->lock);
+
+	if (!was_locked)
+		spin_lock_irqsave(&rq->lock, flags);
+
 	if (cfs_rq->rb_leftmost)
 		MIN_vruntime = (__pick_next_entity(cfs_rq))->vruntime;
 	last = __pick_last_entity(cfs_rq);
@@ -146,7 +151,10 @@ void print_cfs_rq(struct seq_file *m, in
 	max_vruntime = last->vruntime;
 	min_vruntime = rq->cfs.min_vruntime;
 	rq0_min_vruntime = per_cpu(runqueues, 0).cfs.min_vruntime;
-	spin_unlock_irqrestore(&rq->lock, flags);
+
+	if (!was_locked)
+		spin_unlock_irqrestore(&rq->lock, flags);
+
 	SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "MIN_vruntime",
 			SPLIT_NS(MIN_vruntime));
 	SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "min_vruntime",

---


--
Best regards,
Dmitry Adamushko

Re: [BUG] cpu hotplug vs scheduler
Dmitry Adamushko wrote:
> 2008/5/14 Avi Kivity <avi@qumranet.com>:
>
>
>> [ ... ]
>>
>> [4302727.900184] Call Trace:
>> [4302727.900184] [<ffffffff803249de>] spin_bug+0x9e/0xe9
>> [4302727.900184] [<ffffffff80324af4>] _raw_spin_lock+0x41/0x123
>> [4302727.900184] [<ffffffff80439638>] _spin_lock_irqsave+0x2f/0x37
>> [4302727.900184] [<ffffffff8022ef7c>] print_cfs_rq+0xca/0x46a
>> [4302727.900184] [<ffffffff80231f97>] sched_debug_show+0x7a3/0xb8c
>> [4302727.900184] [<ffffffff8023238d>] sysrq_sched_debug_show+0xd/0xf
>> [4302727.900184] [<ffffffff802323ee>] pick_next_task_fair+0x5f/0x86
>>
>
>
> Err... sorry for the broken patch. The patch below on top of the
> previous one should address this issue (ugly, but should be ok for
> debugging).
> 'tasklist_lock' shouldn't cause a double lock, I guess.
>
> Sorry for the rather 'blind' attempts. If that doesn't do it, I'll
> prepare/test/take a closer look at it later today when I'm at home.
>
>

Now it died when allocating the pda:

[4301986.526627] Booting processor 1/6 ip 6000
[4301986.537195] swapper: page allocation failure. order:2, mode:0x20
[4301986.537195] Pid: 0, comm: swapper Not tainted 2.6.26-rc2 #728
[4301986.537195]
[4301986.537195] Call Trace:
[4301986.537195] [<ffffffff802388ae>] ? __printk_ratelimit+0x9/0xb
[4301986.537195] [<ffffffff80272e07>] __alloc_pages_internal+0x387/0x39f
[4301986.537195] [<ffffffff80272e35>] __alloc_pages+0xb/0xd
[4301986.537195] [<ffffffff80272e4c>] __get_free_pages+0x15/0x46
[4301986.537195] [<ffffffff80210cdf>] pda_init+0x87/0xac
[4301986.537195] [<ffffffff8042fed1>] cpu_init+0x5e/0x34a
[4301986.537195] [<ffffffff8026b437>] ? rcu_needs_cpu+0x36/0x47
[4301986.537195] [<ffffffff802311a7>] ? idle_task_exit+0x71/0xad
[4301986.537195] [<ffffffff804331aa>] start_secondary+0xc/0x178
[4301986.537195]
[4301986.537195] Mem-info:
[4301986.537195] DMA per-cpu:
[4301986.537195] CPU 0: hi: 0, btch: 1 usd: 0
[4301986.537195] CPU 2: hi: 0, btch: 1 usd: 0
[4301986.537195] DMA32 per-cpu:
[4301986.537195] CPU 0: hi: 186, btch: 31 usd: 153
[4301986.537195] CPU 2: hi: 186, btch: 31 usd: 135
[4301986.537195] Active:347879 inactive:109278 dirty:107 writeback:0 unstable:0
[4301986.537195] free:5311 slab:10186 mapped:30496 pagetables:1587 bounce:0
[4301986.537195] DMA free:8012kB min:24kB low:28kB high:36kB active:60kB inactive:0kB present:8644kB pages_scanned:0 all_unreclaimable? no
[4301986.537195] lowmem_reserve[]: 0 1999 1999 1999
[4301986.537195] DMA32 free:13232kB min:5704kB low:7128kB high:8556kB active:1391456kB inactive:437112kB present:2047248kB pages_scanned:0 all_unreclaimable? no
[4301986.537195] lowmem_reserve[]: 0 0 0 0
[4301986.537195] DMA: 19*4kB 16*8kB 10*16kB 11*32kB 6*64kB 4*128kB 1*256kB 2*512kB 3*1024kB 1*2048kB 0*4096kB = 8012kB
[4301986.537195] DMA32: 2034*4kB 570*8kB 1*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 13224kB
[4301986.537195] 212750 total pagecache pages
[4301986.537195] Swap cache: add 73, delete 18, find 0/0
[4301986.537195] Free swap = 3210964kB
[4301986.537195] Total swap = 3211256kB

There seem to be some order-2 allocations available, though. And in any
case, cpu hotplug shouldn't crash on such failures, only fail.

--
error compiling committee.c: too many arguments to function

Re: [BUG] cpu hotplug vs scheduler
> Now it died when allocating the pda:
>
> [4301986.526627] Booting processor 1/6 ip 6000
> [4301986.537195] swapper: page allocation failure. order:2, mode:0x20
> [4301986.537195] Pid: 0, comm: swapper Not tainted 2.6.26-rc2 #728
> [4301986.537195]
> [4301986.537195] Call Trace:
> [4301986.537195] [<ffffffff802388ae>] ? __printk_ratelimit+0x9/0xb
> [4301986.537195] [<ffffffff80272e07>] __alloc_pages_internal+0x387/0x39f
> [4301986.537195] [<ffffffff80272e35>] __alloc_pages+0xb/0xd
> [4301986.537195] [<ffffffff80272e4c>] __get_free_pages+0x15/0x46
> [4301986.537195] [<ffffffff80210cdf>] pda_init+0x87/0xac
> [4301986.537195] [<ffffffff8042fed1>] cpu_init+0x5e/0x34a
> [4301986.537195] [<ffffffff8026b437>] ? rcu_needs_cpu+0x36/0x47
> [4301986.537195] [<ffffffff802311a7>] ? idle_task_exit+0x71/0xad
> [4301986.537195] [<ffffffff804331aa>] start_secondary+0xc/0x178
> [4301986.537195]
> [4301986.537195] Mem-info:
> [4301986.537195] DMA per-cpu:
> [4301986.537195] CPU 0: hi: 0, btch: 1 usd: 0
> [4301986.537195] CPU 2: hi: 0, btch: 1 usd: 0
> [4301986.537195] DMA32 per-cpu:
> [4301986.537195] CPU 0: hi: 186, btch: 31 usd: 153
> [4301986.537195] CPU 2: hi: 186, btch: 31 usd: 135
> [4301986.537195] Active:347879 inactive:109278 dirty:107 writeback:0
> unstable:0
> [4301986.537195] free:5311 slab:10186 mapped:30496 pagetables:1587 bounce:0
> [4301986.537195] DMA free:8012kB min:24kB low:28kB high:36kB active:60kB
> inactive:0kB present:8644kB pages_scanned:0 all_unreclaimable? no
> [4301986.537195] lowmem_reserve[]: 0 1999 1999 1999
> [4301986.537195] DMA32 free:13232kB min:5704kB low:7128kB high:8556kB
> active:1391456kB inactive:437112kB present:2047248kB pages_scanned:0
> all_unreclaimable? no
> [4301986.537195] lowmem_reserve[]: 0 0 0 0
> [4301986.537195] DMA: 19*4kB 16*8kB 10*16kB 11*32kB 6*64kB 4*128kB
> 1*256kB 2*512kB 3*1024kB 1*2048kB 0*4096kB = 8012kB
> [4301986.537195] DMA32: 2034*4kB 570*8kB 1*16kB 0*32kB 0*64kB 0*128kB
> 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 13224kB
> [4301986.537195] 212750 total pagecache pages
> [4301986.537195] Swap cache: add 73, delete 18, find 0/0
> [4301986.537195] Free swap = 3210964kB
> [4301986.537195] Total swap = 3211256kB
>
> There seem to be some order-2 allocations available, though. And in any
> case, cpu hotplug shouldn't crash on such failures, only fail.

Hi Avi,

I think you fixed this specific bug. Does cpu hotplug stresstest now work
for you?

Re: [BUG] cpu hotplug vs scheduler
Heiko Carstens wrote:
>> Now it died when allocating the pda:
>>
>> [4301986.526627] Booting processor 1/6 ip 6000
>> [4301986.537195] swapper: page allocation failure. order:2, mode:0x20
>> [4301986.537195] Pid: 0, comm: swapper Not tainted 2.6.26-rc2 #728
>> [4301986.537195]
>> [4301986.537195] Call Trace:
>> [4301986.537195] [<ffffffff802388ae>] ? __printk_ratelimit+0x9/0xb
>> [4301986.537195] [<ffffffff80272e07>] __alloc_pages_internal+0x387/0x39f
>> [4301986.537195] [<ffffffff80272e35>] __alloc_pages+0xb/0xd
>> [4301986.537195] [<ffffffff80272e4c>] __get_free_pages+0x15/0x46
>> [4301986.537195] [<ffffffff80210cdf>] pda_init+0x87/0xac
>> [4301986.537195] [<ffffffff8042fed1>] cpu_init+0x5e/0x34a
>> [4301986.537195] [<ffffffff8026b437>] ? rcu_needs_cpu+0x36/0x47
>> [4301986.537195] [<ffffffff802311a7>] ? idle_task_exit+0x71/0xad
>> [4301986.537195] [<ffffffff804331aa>] start_secondary+0xc/0x178
>> [4301986.537195]
>> [4301986.537195] Mem-info:
>> [4301986.537195] DMA per-cpu:
>> [4301986.537195] CPU 0: hi: 0, btch: 1 usd: 0
>> [4301986.537195] CPU 2: hi: 0, btch: 1 usd: 0
>> [4301986.537195] DMA32 per-cpu:
>> [4301986.537195] CPU 0: hi: 186, btch: 31 usd: 153
>> [4301986.537195] CPU 2: hi: 186, btch: 31 usd: 135
>> [4301986.537195] Active:347879 inactive:109278 dirty:107 writeback:0
>> unstable:0
>> [4301986.537195] free:5311 slab:10186 mapped:30496 pagetables:1587 bounce:0
>> [4301986.537195] DMA free:8012kB min:24kB low:28kB high:36kB active:60kB
>> inactive:0kB present:8644kB pages_scanned:0 all_unreclaimable? no
>> [4301986.537195] lowmem_reserve[]: 0 1999 1999 1999
>> [4301986.537195] DMA32 free:13232kB min:5704kB low:7128kB high:8556kB
>> active:1391456kB inactive:437112kB present:2047248kB pages_scanned:0
>> all_unreclaimable? no
>> [4301986.537195] lowmem_reserve[]: 0 0 0 0
>> [4301986.537195] DMA: 19*4kB 16*8kB 10*16kB 11*32kB 6*64kB 4*128kB
>> 1*256kB 2*512kB 3*1024kB 1*2048kB 0*4096kB = 8012kB
>> [4301986.537195] DMA32: 2034*4kB 570*8kB 1*16kB 0*32kB 0*64kB 0*128kB
>> 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 13224kB
>> [4301986.537195] 212750 total pagecache pages
>> [4301986.537195] Swap cache: add 73, delete 18, find 0/0
>> [4301986.537195] Free swap = 3210964kB
>> [4301986.537195] Total swap = 3211256kB
>>
>> There seem to be some order-2 allocations available, though. And in any
>> case, cpu hotplug shouldn't crash on such failures, only fail.
>>
>
> Hi Avi,
>
> I think you fixed this specific bug. Does cpu hotplug stresstest now work
> for you?
>

Which bug, the order-2 allocs? I certainly didn't fix it.

Last I tried, cpu hotplug still failed, but it was some time ago as
patches flow.

--
error compiling committee.c: too many arguments to function

Re: [BUG] cpu hotplug vs scheduler
On Wed, May 21, 2008 at 03:42:51PM +0300, Avi Kivity wrote:
> Heiko Carstens wrote:
>> I think you fixed this specific bug. Does cpu hotplug stresstest now work
>> for you?
>
> Which bug, the order-2 allocs? I certainly didn't fix it.
>
> Last I tried, cpu hotplug still failed, but it was some time ago as
> patches flow.

Ah no. I thought "x86: fix crash on cpu hotplug on pat-incapable machines"
would have fixed the cpu hotplug bug for you.

I was actually talking of the pick_next_task_fair incarnation of one of
the cpu hotplug bugs.

s390 backtrace on 2.6.25 looks like this.

<1>Unable to handle kernel pointer dereference at virtual kernel address 00000200000db000
<4>Oops: 003b [#1] PREEMPT SMP
<4>Modules linked in: dm_multipath sunrpc qeth_l2 dm_mod cu3088 qeth ccwgroup
<4>CPU: 0 Not tainted 2.6.25-27.x.20080422-s390xdefault #1
<4>Process migration/0 (pid: 5806, task: 00000000aac3a838, ksp: 000000017a2f3dd0)
<4>Krnl PSW : 0400200180000000 0000000000125f48 (pick_next_task_fair+0x34/0xd4)
<4> R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:2 PM:0 EA:3
<4>Krnl GPRS: 0000079e029c6ecb 0000000000000000 000000008548ea60 000000008548d700
<4> 00000000004526f8 0000000000000597 0000000000000000 0000000000000000
<4> 0000000000672a00 0000000000000000 00000000aac3a838 000000017a2f3d10
<4> 00000200000db85a 00000000004672a0 000000017a2f3d50 000000017a2f3d10
<4>Krnl Code: 0000000000125f38: e3e0f0980024 stg %r14,152(%r15)
<4> 0000000000125f3e: d507d000c010 clc 0(8,%r13),16(%r12)
<4> 0000000000125f44: a784004e brc 8,125fe0
<4> >0000000000125f48: d507d000c030 clc 0(8,%r13),48(%r12)
<4> 0000000000125f4e: b904002c lgr %r2,%r12
<4> 0000000000125f52: a7990000 lghi %r9,0
<4> 0000000000125f56: a7840033 brc 8,125fbc
<4> 0000000000125f5a: c0e5ffffee23 brasl %r14,123ba0
<4>Call Trace:
<4>([<00000000aac3a838>] 0xaac3a838)
<4> [<000000000045270c>] schedule+0x178/0x978
<4> [<000000000012fe86>] migration_thread+0x1aa/0x30c
<4> [<0000000000152ad8>] kthread+0x68/0xa0
<4> [<000000000010a0be>] kernel_thread_starter+0x6/0xc
<4> [<000000000010a0b8>] kernel_thread_starter+0x0/0xc
<4>Last Breaking-Event-Address:
<4> [<0000000000125fc6>] pick_next_task_fair+0xb2/0xd4

Re: [BUG] cpu hotplug vs scheduler
Heiko Carstens wrote:
> On Wed, May 21, 2008 at 03:42:51PM +0300, Avi Kivity wrote:
>
>> Heiko Carstens wrote:
>>
>>> I think you fixed this specific bug. Does cpu hotplug stresstest now work
>>> for you?
>>>
>> Which bug, the order-2 allocs? I certainly didn't fix it.
>>
>> Last I tried, cpu hotplug still failed, but it was some time ago as
>> patches flow.
>>
>
> Ah no. I thought "x86: fix crash on cpu hotplug on pat-incapable machines"
> would have fixed the cpu hotplug bug for you.
>
>

No, that only fixes an immediate oops when running in a virtual machine
(which doesn't have pat).

> I was actually talking of the pick_next_task_fair incarnation of one of
> the cpu hotplug bugs.
>
>

That hasn't been addressed, AFAIK.

> s390 backtrace on 2.6.25 looks like this.
>
> <1>Unable to handle kernel pointer dereference at virtual kernel address 00000200000db000
> <4>Oops: 003b [#1] PREEMPT SMP
> <4>Modules linked in: dm_multipath sunrpc qeth_l2 dm_mod cu3088 qeth ccwgroup
> <4>CPU: 0 Not tainted 2.6.25-27.x.20080422-s390xdefault #1
> <4>Process migration/0 (pid: 5806, task: 00000000aac3a838, ksp: 000000017a2f3dd0)
> <4>Krnl PSW : 0400200180000000 0000000000125f48 (pick_next_task_fair+0x34/0xd4)
> <4> R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:2 PM:0 EA:3
> <4>Krnl GPRS: 0000079e029c6ecb 0000000000000000 000000008548ea60 000000008548d700
> <4> 00000000004526f8 0000000000000597 0000000000000000 0000000000000000
> <4> 0000000000672a00 0000000000000000 00000000aac3a838 000000017a2f3d10
> <4> 00000200000db85a 00000000004672a0 000000017a2f3d50 000000017a2f3d10
> <4>Krnl Code: 0000000000125f38: e3e0f0980024 stg %r14,152(%r15)
> <4> 0000000000125f3e: d507d000c010 clc 0(8,%r13),16(%r12)
> <4> 0000000000125f44: a784004e brc 8,125fe0
> <4> >0000000000125f48: d507d000c030 clc 0(8,%r13),48(%r12)
> <4> 0000000000125f4e: b904002c lgr %r2,%r12
> <4> 0000000000125f52: a7990000 lghi %r9,0
> <4> 0000000000125f56: a7840033 brc 8,125fbc
> <4> 0000000000125f5a: c0e5ffffee23 brasl %r14,123ba0
> <4>Call Trace:
> <4>([<00000000aac3a838>] 0xaac3a838)
> <4> [<000000000045270c>] schedule+0x178/0x978
> <4> [<000000000012fe86>] migration_thread+0x1aa/0x30c
> <4> [<0000000000152ad8>] kthread+0x68/0xa0
> <4> [<000000000010a0be>] kernel_thread_starter+0x6/0xc
> <4> [<000000000010a0b8>] kernel_thread_starter+0x0/0xc
> <4>Last Breaking-Event-Address:
> <4> [<0000000000125fc6>] pick_next_task_fair+0xb2/0xd4
>

Seems to be the same bug, yes.

--
error compiling committee.c: too many arguments to function
