Mailing List Archive

[PATCH] Re: Another blktap2-ish shutdown crash
On Wed, 2010-04-28 at 20:37 -0400, Jeremy Fitzhardinge wrote:
> Crash on shutdown with one domain running:

Sending this one separately, since it doesn't seem to belong onto the
blktap2 branch.

This only broke after the forward port in

commit ab77527f9a63a5c657c6d6a50e70a66adceaa3a0
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date: Thu Feb 18 17:22:09 2010 -0800

xen/blktap2: make compile

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

The blk_* queue calls won't implicitly dequeue anymore, where the prior
elv_* calls did. I didn't play much around with it, but I'm pretty sure
the patch below will help.

Cheers,
Daniel

> Sending all processes the TERM signal... [ OK ]
> blktap_ring_vm_close: unmapping ring 0
> blktap_ring_release: freeing device 0
> blktap_device_do_request: device closed: failing secs 5481032 - 5481040
> ------------[ cut here ]------------
> kernel BUG at block/blk-core.c:2111!
> invalid opcode: 0000 [#1] PREEMPT SMP
> last sysfs file: /sys/devices/virtual/sound/timer/uevent
> CPU 0
> Modules linked in: bridge stp llc nf_conntrack_netbios_ns dm_multipath e1000e thermal processor acpi_processor xts gf128mul cryptd aes_x86_64 aes_generic dm_crypt dm_snapshot dm_zero dm_mirror dm_region_hash dm_log raid10 ahci [last unloaded: microcode]
> Pid: 13481, comm: blkback.3.xvda Not tainted 2.6.32.12 #9 X8SIL
> RIP: e030:[<ffffffff811c3095>] [<ffffffff811c3095>] blk_finish_request+0x2e/0x27f
> RSP: e02b:ffff8800b332d7a0 EFLAGS: 00010087
> RAX: 0000000000000001 RBX: ffff8800bd1c25a0 RCX: 0000000000000000
> RDX: ffff88009ce4f000 RSI: 0000000000000000 RDI: ffff8800bd1c25a0
> RBP: ffff8800b332d7f0 R08: 0000000000000000 R09: 0000000000000001
> R10: ffffffff81036d5a R11: ffffffff81036d5a R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000001000
> FS: 00007fc7e7c52700(0000) GS:ffff8800032c1000(0000) knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000003629227b28 CR3: 0000000001001000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process blkback.3.xvda (pid: 13481, threadinfo ffff8800b332c000, task ffff88009ce4f000)
> Stack:
> 0000000000000000 0000000000000000 ffff8800b332d7c0 ffffffff8124871b
> <0> ffff8800b332d7f0 ffff8800bd1c25a0 0000000000000000 0000000000000000
> <0> 0000000000000000 0000000000001000 ffff8800b332d810 ffffffff811c34eb
> Call Trace:
> [<ffffffff8124871b>] ? add_disk_randomness+0x29/0x2b
> [<ffffffff811c34eb>] __blk_end_request+0x27/0x2e
> [<ffffffff811c355e>] __blk_end_request_cur+0x30/0x32
> [<ffffffff812401e3>] blktap_device_do_request+0x279/0x298
> [<ffffffff81066ab5>] ? trace_hardirqs_off+0xd/0xf
> [<ffffffff81489edc>] ? _spin_unlock_irqrestore+0x56/0x74
> [<ffffffff8104d553>] ? del_timer+0x79/0x85
> [<ffffffff811c3e01>] __generic_unplug_device+0x30/0x35
> [<ffffffff811c1892>] elv_insert+0x190/0x199
> [<ffffffff811c192e>] __elv_add_request+0x93/0x9a
> [<ffffffff811c4891>] __make_request+0x361/0x3de
> [<ffffffff811c2cc1>] generic_make_request+0x29d/0x2f9
> [<ffffffff810a12a3>] ? mempool_alloc_slab+0x10/0x12
> [<ffffffff811c2e42>] submit_bio+0x125/0x146
> [<ffffffff8100fe32>] ? check_events+0x12/0x20
> [<ffffffff8123bcda>] dispatch_rw_block_io+0x57c/0x641
> [<ffffffff8100fe1f>] ? xen_restore_fl_direct_end+0x0/0x1
> [<ffffffff8106a196>] ? lock_release+0x15a/0x165
> [<ffffffff8100f539>] ? xen_force_evtchn_callback+0xd/0xf
> [<ffffffff8100fe32>] ? check_events+0x12/0x20
> [<ffffffff8100f539>] ? xen_force_evtchn_callback+0xd/0xf
> [<ffffffff810581af>] ? prepare_to_wait+0x1e/0x69
> [<ffffffff8100f539>] ? xen_force_evtchn_callback+0xd/0xf
> [<ffffffff8100fe32>] ? check_events+0x12/0x20
> [<ffffffff810581af>] ? prepare_to_wait+0x1e/0x69
> [<ffffffff8103d2e6>] ? cpuacct_charge+0x20/0xc6
> [<ffffffff8103d346>] ? cpuacct_charge+0x80/0xc6
> [<ffffffff8105af04>] ? lock_hrtimer_base+0x25/0x4b
> [<ffffffff8100fe32>] ? check_events+0x12/0x20
> [<ffffffff8105af04>] ? lock_hrtimer_base+0x25/0x4b
> [<ffffffff8105af04>] ? lock_hrtimer_base+0x25/0x4b
> [<ffffffff8100fe32>] ? check_events+0x12/0x20
> [<ffffffff8105af04>] ? lock_hrtimer_base+0x25/0x4b
> [<ffffffff8123be03>] ? do_block_io_op+0x64/0x2bc
> [<ffffffff8100f539>] ? xen_force_evtchn_callback+0xd/0xf
> [<ffffffff8100fe32>] ? check_events+0x12/0x20
> [<ffffffff8123be03>] ? do_block_io_op+0x64/0x2bc
> [<ffffffff8123be03>] ? do_block_io_op+0x64/0x2bc
> [<ffffffff8123be03>] ? do_block_io_op+0x64/0x2bc
> [<ffffffff8100f539>] ? xen_force_evtchn_callback+0xd/0xf
> [<ffffffff8100fe32>] ? check_events+0x12/0x20
> [<ffffffff8123be03>] ? do_block_io_op+0x64/0x2bc
> [<ffffffff8123be03>] ? do_block_io_op+0x64/0x2bc
> [<ffffffff8123bfe7>] do_block_io_op+0x248/0x2bc
> [<ffffffff8123c2ae>] blkif_schedule+0x1d2/0x290
> [<ffffffff81057fd3>] ? autoremove_wake_function+0x0/0x34
> [<ffffffff8123c0dc>] ? blkif_schedule+0x0/0x290
> [<ffffffff81057d61>] kthread+0x7a/0x82
> [<ffffffff81014cba>] child_rip+0xa/0x20
> [<ffffffff81014614>] ? restore_args+0x0/0x30
> [<ffffffff81014cb0>] ? child_rip+0x0/0x20
> Code: e5 41 57 41 56 41 89 f6 41 55 41 54 53 48 89 fb 48 83 ec 28 f6 47 49 10 74 0c 48 8b 7f 40 48 89 de e8 df 1a 00 00 48 39 1b 74 04 <0f> 0b eb fe 83 3d 38 7c f5 00 00 74 0b 83 7b 4c 01 75 05 e8 66
> RIP [<ffffffff811c3095>] blk_finish_request+0x2e/0x27f
> RSP <ffff8800b332d7a0>
> ---[ end trace 213b85711bd407a1 ]---
> note: blkback.3.xvda[13481] exited with preempt_count 1
> Sending all processes the KILL signal... [ OK ]
> Saving random seed: [ OK ]
> Syncing hardware clock to system time type=1111 audit(1272501348.517:54918): user pid=21683 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:hwclock_t:s0 msg='changing system time: exe="/sbin/hwclock" hostname=? addr=? terminal=console res=success'
> [ OK ]
> Turning off swap: [ OK ]
> Turning off quotas: [ OK ]
> Unmounting file systems: [ OK ]
> Stopping disk encryption for luks-11eb0145-088a-4c13-acf4-9784e82bb1e3 [ OK ]
> Stopping disk encryption for luks-37e6b844-a112-427c-be60-ae6f4d9c5041 [ OK ]
> mount: /dev/shm not mounted already, or bad option
> Please stand by while rebooting the system...
> type=1128 audit(1272501349.996:54919): user pid=21439 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:initrc_t:s0 msg='init: exe="/sbin/reboot" hostname=? addr=? terminal=console res=success'
> md: stopping all md devices.
> md: md0 switched to read-only mode.
> xenbus_dev_shutdown: backend/console/3/0: Initialising != Connected, skipping
>
>
[PATCH] Re: Another blktap2-ish shutdown crash [ In reply to ]
>>> On 03.06.10 at 03:50, Daniel Stodden <daniel.stodden@citrix.com> wrote:

Why would you want blk_start_request() only after the blk_fs_request()
check, but not after the blk_barrier_rq() one?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: [PATCH] Re: Another blktap2-ish shutdown crash [ In reply to ]
On Mon, 2010-06-07 at 03:29 -0400, Jan Beulich wrote:
> >>> On 03.06.10 at 03:50, Daniel Stodden <daniel.stodden@citrix.com> wrote:
>
> Why would you want blk_start_request() only after the blk_fs_request()
> check, but not after the blk_barrier_rq() one?

Huh? But cases did get the blk_start_request call (?!)

Daniel


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: [PATCH] Re: Another blktap2-ish shutdown crash [ In reply to ]
>>> On 07.06.10 at 13:12, Daniel Stodden <daniel.stodden@citrix.com> wrote:
> On Mon, 2010-06-07 at 03:29 -0400, Jan Beulich wrote:
>> >>> On 03.06.10 at 03:50, Daniel Stodden <daniel.stodden@citrix.com> wrote:
>>
>> Why would you want blk_start_request() only after the blk_fs_request()
>> check, but not after the blk_barrier_rq() one?
>
> Huh? But cases did get the blk_start_request call (?!)

I have to admit that I don't understand your response at all.

Assuming that you think my original question was rubbish, this is
the original (before your patch) code I look at

while ((req = blk_peek_request(rq)) != NULL) {
if (!blk_fs_request(req)) {
blk_end_request(req, -EIO, 0);
continue;
}

if (blk_barrier_rq(req)) {
blk_end_request(req, -EIO, 0);
continue;
}
...
blk_start_request(req);
...

Your patch inserts a call to blk_start_request() into the
first if clause's body, and I was asking why the second
one's wouldn't also need such a call.

Sorry if I'm being dense - I'll appreciate any enlightenment.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: [PATCH] Re: Another blktap2-ish shutdown crash [ In reply to ]
On Mon, 2010-06-07 at 09:43 -0400, Jan Beulich wrote:
> >>> On 07.06.10 at 13:12, Daniel Stodden <daniel.stodden@citrix.com> wrote:
> > On Mon, 2010-06-07 at 03:29 -0400, Jan Beulich wrote:
> >> >>> On 03.06.10 at 03:50, Daniel Stodden <daniel.stodden@citrix.com> wrote:
> >>
> >> Why would you want blk_start_request() only after the blk_fs_request()
> >> check, but not after the blk_barrier_rq() one?
> >
> > Huh? But cases did get the blk_start_request call (?!)
>
> I have to admit that I don't understand your response at all.

".. both ..", sorrry :)

> Assuming that you think my original question was rubbish, this is
> the original (before your patch) code I look at
>
> while ((req = blk_peek_request(rq)) != NULL) {
> if (!blk_fs_request(req)) {
> blk_end_request(req, -EIO, 0);
> continue;
> }
>
> if (blk_barrier_rq(req)) {
> blk_end_request(req, -EIO, 0);
> continue;
> }
> ...
> blk_start_request(req);
> ...
>
> Your patch inserts a call to blk_start_request() into the
> first if clause's body, and I was asking why the second
> one's wouldn't also need such a call.
>
> Sorry if I'm being dense - I'll appreciate any enlightenment.

This is the relevant hunk from

http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=blobdiff;f=drivers/xen/blktap/device.c;h=8650f028485a6ac10bcb89ebfeaa5756ff2204e6;hp=a50b622ef018c0db88f0d491ce389768c570a4bc;hb=b34e5649ea3fc0f70f87e104c9b08a0dc013b6bb;hpb=4b647759e9f4afd46c23e26e54241a9dfa7d8f4b

while ((req = blk_peek_request(rq)) != NULL) {
if (!blk_fs_request(req)) {
+ blk_start_request(req);
__blk_end_request_cur(req, 0);
continue;
}

if (blk_barrier_rq(req)) {
+ blk_start_request(req);
__blk_end_request_cur(req, 0);
continue;
}

Looks correct to me. What are you looking at?

Cheers,
Daniel



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel