Xen and iscsitarget
Hi all,

I've been using Xen for a few months now and I'm quite impressed. Not
only is the virtualization platform stable and well-designed, but the
surrounding administrative tools and processes are usable and complete.
(Although I admit, I was kind of disappointed with the performance until I
realized recently that I hadn't compiled support for my ATA chipset into
the dom0 kernel and all disk I/O was PIO.)

My problem is that I'm unable to get the Cisco linux-iscsi initiator
running under Xen. It seems other people have been able to do this so
maybe I'm doing something wrong.

I'm running the Xen 2.0 release candidate (which, BTW, is otherwise very
nice; I've had no other problems) and have iscsitarget 0.3.4 running
on an external host. The iscsitarget has been working fine with non-Xen
hosts.

Unfortunately, I'm unable to get unprivileged domains to talk to the
iscsitarget. Using Linux 2.4.27-xenU with linux-iscsi 3.6.2 I get the
error "xmit_data failed to send 8240 bytes, rc 48", and using Linux
2.6.9-xenU with linux-iscsi 4.0.1.10 it simply locks up the whole kernel
on session initiation. (Pings still work though.)

In domain 0, I can use linux-iscsi-4.0.1.10 to talk to the target
successfully. Under load, however, I will often get syslog messages
"iscsi-tx: page allocation failure. order:0, mode:0x20" and a big call
trace (see below). These apparently cause no real problems though, and
all my data seems to be intact and the filesystem works fine.
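
The "order:0, mode:0x20" part of that message can be decoded by hand. Here is
a minimal sketch, assuming the flag bit values of a 2.6.9-era
include/linux/gfp.h (they differ in other kernel releases, so check them
against your own source tree). The names GFP_FLAGS and decode_alloc_failure
are made up for this illustration.

```python
import re

# ASSUMPTION: bit values as in a 2.6.9-era include/linux/gfp.h.
# Verify against your own kernel source before relying on them.
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

# \s* after the comma tolerates a syslog line that was wrapped in mail.
FAILURE_RE = re.compile(
    r"page allocation failure\. order:(\d+),\s*mode:(0x[0-9a-fA-F]+)"
)


def decode_alloc_failure(message):
    """Decode the order and mode fields of a page allocation failure line."""
    match = FAILURE_RE.search(message)
    if match is None:
        return None
    order = int(match.group(1))
    mode = int(match.group(2), 16)
    known_mask = sum(GFP_FLAGS)  # bits are distinct, so sum equals OR
    return {
        "pages": 1 << order,  # an order-n request asks for 2**n pages
        "flags": [name for bit, name in sorted(GFP_FLAGS.items()) if mode & bit],
        "can_sleep": bool(mode & 0x10),  # __GFP_WAIT under the assumed table
        "unknown_bits": mode & ~known_mask,
    }
```

Under those assumed values, the message above decodes to a single-page
request with only __GFP_HIGH set and __GFP_WAIT clear, that is, an allocation
that is not allowed to sleep. That would be consistent with the failures
being transient under load, but it is a reading of the flags, not a diagnosis.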

Domain:   Kernel:      Linux-iscsi:  Result:
unpriv    2.4.27-xenU  3.6.2         "xmit_data failed"
unpriv    2.6.9-xenU   4.0.1.10      immediate lock-up
domain 0  2.6.9-xen0   4.0.1.10      works fine, with strange errs

The lock-up with 2.6.9 happens after less than a kilobyte of data has been
transmitted in either direction.

My unprivileged domains otherwise work fine and I've pumped many
gigabytes of data between them and the network.

Since this only breaks from an unprivileged domain, I'm guessing that
there's some incompatibility between how VIFs are implemented and how
linux-iscsi wants to use them.

Can anybody suggest kernel and linux-iscsi versions that seem to work
correctly? -Nathan


Using Linux 2.4.27-xenU with linux-iscsi 3.6.2:

Nov 4 12:39:56 iscsi-test kernel: iSCSI: session c0724000 xmit_data failed to send 8240 bytes, rc 48
Nov 4 12:39:56 iscsi-test kernel: iSCSI: session c0724000 to iqn.1998-07.org.litech.daily-post:storage.test.loop0 dropped
Nov 4 12:39:56 iscsi-test kernel: iSCSI: session c0724000 to iqn.1998-07.org.litech.daily-post:storage.test.loop0 waiting 2 seconds before next login attempt
Nov 4 12:39:58 iscsi-test kernel: iSCSI: bus 0 target 2 trying to establish session c0724000 to portal 0, address 192.168.16.24 port 3260 group 1
Nov 4 12:39:58 iscsi-test kernel: iSCSI: session c0724000 login negotiation failed, can't accept =NotUnderstood in security stage
Nov 4 12:39:58 iscsi-test kernel: iSCSI: session c0724000 may be in use, retrying login to portal 0 at 35927
Nov 4 12:39:58 iscsi-test kernel: iSCSI: session c0724000 to iqn.1998-07.org.litech.daily-post:storage.test.loop0 waiting 1 seconds before next login attempt
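
To see how often this happens over a longer run, the xmit_data failures can be
pulled out of a saved syslog. A rough sketch, assuming the message wording
shown above and the usual "Mon D HH:MM:SS" syslog timestamp; the helper names
unwrap and xmit_failures are invented for this example. It also rejoins
records that a mail client has wrapped onto several lines.

```python
import re

# A new syslog record starts with a timestamp such as "Nov 4 12:39:56 ".
TIMESTAMP_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d+\s+\d\d:\d\d:\d\d\s")

# ASSUMPTION: message wording as it appears in the log excerpt above.
XMIT_RE = re.compile(
    r"iSCSI: session (\w+) xmit_data failed to send (\d+) bytes, rc (-?\d+)"
)


def unwrap(lines):
    """Rejoin syslog records that were wrapped onto several lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if TIMESTAMP_RE.match(line) or not records:
            records.append(line)
        else:
            # Continuation of the previous record.
            records[-1] += " " + line
    return records


def xmit_failures(lines):
    """Return a (session, byte_count, rc) tuple for each xmit_data failure."""
    failures = []
    for record in unwrap(lines):
        match = XMIT_RE.search(record)
        if match:
            failures.append(
                (match.group(1), int(match.group(2)), int(match.group(3)))
            )
    return failures
```

Feeding it the contents of a log file, for example
xmit_failures(open(path).read().splitlines()) with path pointing at your own
syslog, gives one tuple per failure, which makes it easy to check whether the
byte count and return code are always the same.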

Using Linux 2.6.9-xen0 with linux-iscsi 4.0.1.10:

Nov 4 13:15:13 faboo kernel: iscsi-tx: page allocation failure. order:0,
mode:0x20
Nov 4 13:15:13 faboo kernel: [__alloc_pages+447/880]
__alloc_pages+0x1bf/0x370
Nov 4 13:15:13 faboo kernel: [<c013e1df>]
__alloc_pages+0x1bf/0x370
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [__get_free_pages+31/64]
__get_free_pages+0x1f/0x40
Nov 4 13:15:13 faboo kernel: [<c013e3af>] __get_free_pages+0x1f/0x40
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [kmem_getpages+34/224]
kmem_getpages+0x22/0xe0
Nov 4 13:15:13 faboo kernel: [<c0141df2>] kmem_getpages+0x22/0xe0
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [cache_grow+187/400] cache_grow+0xbb/0x190
Nov 4 13:15:13 faboo kernel: [<c0142b6b>] cache_grow+0xbb/0x190
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [cache_alloc_refill+223/560]
cache_alloc_refill+0xdf/0x230
Nov 4 13:15:13 faboo kernel: [<c0142d1f>] cache_alloc_refill+0xdf/0x230
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [__kmalloc+141/160] __kmalloc+0x8d/0xa0
Nov 4 13:15:13 faboo kernel: [<c014330d>] __kmalloc+0x8d/0xa0
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [alloc_skb+71/224] alloc_skb+0x47/0xe0
Nov 4 13:15:13 faboo kernel: [<c02902d7>] alloc_skb+0x47/0xe0
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [tcp_sendmsg+4076/4528]
tcp_sendmsg+0xfec/0x11b0
Nov 4 13:15:13 faboo kernel: [<c02b3ecc>] tcp_sendmsg+0xfec/0x11b0
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [cache_alloc_refill+223/560]
cache_alloc_refill+0xdf/0x230
Nov 4 13:15:13 faboo kernel: [<c0142d1f>] cache_alloc_refill+0xdf/0x230
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [inet_sendmsg+77/96] inet_sendmsg+0x4d/0x60
Nov 4 13:15:13 faboo kernel: [<c02d57dd>] inet_sendmsg+0x4d/0x60
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [sock_sendmsg+229/256]
sock_sendmsg+0xe5/0x100
Nov 4 13:15:13 faboo kernel: [<c028c225>] sock_sendmsg+0xe5/0x100
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [ip_rcv_finish+0/656]
ip_rcv_finish+0x0/0x290
Nov 4 13:15:13 faboo kernel: [<c02aa850>] ip_rcv_finish+0x0/0x290
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [inet_sendmsg+77/96] inet_sendmsg+0x4d/0x60
Nov 4 13:15:13 faboo kernel: [<c02d57dd>] inet_sendmsg+0x4d/0x60
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [sock_sendmsg+229/256]
sock_sendmsg+0xe5/0x100
Nov 4 13:15:13 faboo kernel: [<c028c225>] sock_sendmsg+0xe5/0x100
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [autoremove_wake_function+0/96]
autoremove_wake_function+0x0/0x60
Nov 4 13:15:13 faboo kernel: [<c011cc10>]
autoremove_wake_function+0x0/0x60
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [kernel_sendmsg+70/96]
kernel_sendmsg+0x46/0x60
Nov 4 13:15:13 faboo kernel: [<c028c286>] kernel_sendmsg+0x46/0x60
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [sock_no_sendpage+138/144]
sock_no_sendpage+0x8a/0x90
Nov 4 13:15:13 faboo kernel: [<c028f92a>] sock_no_sendpage+0x8a/0x90
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [tcp_sendpage+81/160]
tcp_sendpage+0x51/0xa0
Nov 4 13:15:13 faboo kernel: [<c02b2e91>] tcp_sendpage+0x51/0xa0
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [pg0+143619158/1002090496]
iscsi_sendpage+0xc6/0x100 [iscsi_sfnet]
Nov 4 13:15:13 faboo kernel: [<c8d49456>] iscsi_sendpage+0xc6/0x100
[iscsi_sfnet]
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [pg0+143637917/1002090496]
iscsi_xmit_data+0x66d/0xbb0 [iscsi_sfnet]
Nov 4 13:15:13 faboo kernel: [<c8d4dd9d>] iscsi_xmit_data+0x66d/0xbb0
[iscsi_sfnet]
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [pg0+143613807/1002090496]
iscsi_xmit_task+0x39f/0x540 [iscsi_sfnet]
Nov 4 13:15:13 faboo kernel: [<c8d47f6f>] iscsi_xmit_task+0x39f/0x540
[iscsi_sfnet]
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [pg0+143639673/1002090496]
iscsi_xmit_r2t_data+0xc9/0x1c0 [iscsi_sfnet]
Nov 4 13:15:13 faboo kernel: [<c8d4e479>] iscsi_xmit_r2t_data+0xc9/0x1c0
[iscsi_sfnet]
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [pg0+143588753/1002090496]
process_tx_requests+0x291/0x340 [iscsi_sfnet]
Nov 4 13:15:13 faboo kernel: [<c8d41d91>]
process_tx_requests+0x291/0x340 [iscsi_sfnet]
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [autoremove_wake_function+0/96]
autoremove_wake_function+0x0/0x60
Nov 4 13:15:13 faboo kernel: [<c011cc10>]
autoremove_wake_function+0x0/0x60
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [autoremove_wake_function+0/96]
autoremove_wake_function+0x0/0x60
Nov 4 13:15:13 faboo kernel: [<c011cc10>]
autoremove_wake_function+0x0/0x60
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [pg0+143589280/1002090496]
iscsi_tx_thread+0x160/0x1e0 [iscsi_sfnet]
Nov 4 13:15:13 faboo kernel: [<c8d41fa0>] iscsi_tx_thread+0x160/0x1e0
[iscsi_sfnet]
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [default_wake_function+0/32]
default_wake_function+0x0/0x20
Nov 4 13:15:13 faboo kernel: [<c011b750>] default_wake_function+0x0/0x20
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [pg0+143588928/1002090496]
iscsi_tx_thread+0x0/0x1e0 [iscsi_sfnet]
Nov 4 13:15:13 faboo kernel: [<c8d41e40>] iscsi_tx_thread+0x0/0x1e0
[iscsi_sfnet]
Nov 4 13:15:13 faboo kernel:
Nov 4 13:15:13 faboo kernel: [kernel_thread_helper+5/16]
kernel_thread_helper+0x5/0x10
Nov 4 13:15:13 faboo kernel: [<c01100d5>] kernel_thread_helper+0x5/0x10
Nov 4 13:15:13 faboo kernel:
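
A trace this long is easier to read once it is reduced to one entry per
function. The sketch below is only an illustration: it reads the raw
"[<address>] function+0xoffset/0xsize [module]" entries and skips the
decimal duplicates such as "[__alloc_pages+447/880]". The names parse_frames
and unique_functions are made up here, and the pattern is an assumption based
on the lines in this post.

```python
import re

# Matches entries such as "[<c8d49456>] iscsi_sendpage+0xc6/0x100 [iscsi_sfnet]".
# \s+ also matches a newline, so entries wrapped by a mail client still parse.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{8,16})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
    r"(?:\s+\[(\w+)\])?"
)


def parse_frames(text):
    """Extract raw stack entries from a kernel call trace."""
    frames = []
    for match in FRAME_RE.finditer(text):
        addr, func, offset, size, module = match.groups()
        frames.append(
            {
                "addr": int(addr, 16),
                "func": func,
                "offset": int(offset, 16),
                "size": int(size, 16),
                "module": module,  # None for functions in the core kernel
            }
        )
    return frames


def unique_functions(frames):
    """Return function names in first-seen order, dropping repeats."""
    seen = set()
    ordered = []
    for frame in frames:
        if frame["func"] not in seen:
            seen.add(frame["func"])
            ordered.append(frame["func"])
    return ordered
```

Keep in mind that on kernels of this era the trace is typically produced by
scanning the stack for anything that looks like a code address, so some
entries (ip_rcv_finish, for instance) may be stale values rather than real
callers.
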
Re: Xen and iscsitarget
Nathan Lutchansky wrote:

>Can anybody suggest kernel and linux-iscsi versions that seem to work
>correctly? -Nathan
>
>
>
I am running Fedora Core 2 as the on-disk OS for Xen0.
The /proc/version string is: Linux version 2.6.8.1-xen0
(root@xenmaster1) (gcc version 3.3.3 20040412 (Red Hat Linux 3.3.3-7))
#1 Fri Oct 15 17:47:44 EDT 2004
The iSCSI initiator version is: iscsid version 4:0.1.10 ( 8-Oct-2004)

I can't claim any of this as fact, but in my experience it helps to:
1) Run iSCSI over gigabit Ethernet only.
2) Ensure flow control is turned on for all the switches and NICs.
3) Use only e1000 or SysKonnect NICs.
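
On the Linux side, the first two points can be checked with ethtool. The
following is a hedged sketch, not a tested tool: it assumes `ethtool -a`
prints "RX:" and "TX:" lines with on/off values and that `ethtool <iface>`
prints a "Speed: 1000Mb/s" style line, which is what common versions do. The
function names and the interface name passed in are placeholders.

```python
import re
import subprocess


def parse_pause(output):
    """Parse `ethtool -a` text into a dict of on/off booleans."""
    result = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in ("autonegotiate", "rx", "tx"):
            result[key] = value.strip() == "on"
    return result


def parse_speed_mbps(output):
    """Return the link speed in Mb/s from `ethtool <iface>` text, or None."""
    match = re.search(r"Speed:\s*(\d+)Mb/s", output)
    return int(match.group(1)) if match else None


def check_interface(name):
    """Run ethtool for one interface and summarise the two settings."""
    pause_text = subprocess.run(
        ["ethtool", "-a", name], capture_output=True, text=True, check=True
    ).stdout
    link_text = subprocess.run(
        ["ethtool", name], capture_output=True, text=True, check=True
    ).stdout
    pause = parse_pause(pause_text)
    speed = parse_speed_mbps(link_text)
    return {
        "gigabit": speed is not None and speed >= 1000,
        "flow_control": pause.get("rx", False) and pause.get("tx", False),
    }
```

Calling check_interface("eth0") on the initiator and target hosts (with your
own interface names) would show whether both report gigabit speed and pause
frames in both directions. Switch ports still need to be checked separately.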

I have had lots of weird problems with via-rhine and 100BT connections.

I am using the IET (iscsitarget) 0.3.2.

I am importing 3 target systems as an array and having the XenUs do the
actual RAIDing of them. So far so good. Knock on wood. Keep my fingers
crossed.





--
Alvin Starr || voice: (416)585-9971
Interlink Connectivity || fax: (416)785-3668
alvin@iplink.net ||




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xen-devel