
A weird bug in Xen networking?
I think I've hit a weird and mostly hidden bug in Xen, but I'm not 100%
sure...

Here's the setup: I have an OpenSuSE 11.2 based Dom0 (Xen 3.4.1). Dom0
also acts as a router / firewall and provides WAN connectivity for the
DomUs by means of IPsec (OpenSwan). I use 'bridged' networking for the
DomUs; there are several NICs, as each DomU belongs to a separate
subnet. Each of Dom0's bridge interfaces has an IP in the respective
subnet, and this IP is used as the gateway for that subnet.

The DomUs also run OpenSuSE 11.2. I use 'cfengine' to centrally manage
most of the configuration and (custom) software distribution.

That's where things go south: when I run cfengine's 'cfagent', it works
up to a point and then just hangs. I can interrupt it with 'CTRL-C' or
wait until it times out (socket timeout). Initially I thought it was a
cfengine problem, but then I noticed that a similar thing happens when I
connect to a DomU over SSH and run 'ls -lR /': it goes through some
directories but eventually stalls (and I have to disconnect the SSH
session to 'get out').

Every time such a 'hang' happens I see OpenSwan / IPsec errors on Dom0:

klips_error:ipsec_xmit_encap_once: tried to skb_put 20, 16
available. This should never happen, please report.

The numbers vary somewhat (sometimes it's 21, 17 instead of 20, 16).

I posted all my 'findings' on the OpenSwan mailing list, thinking it
might be an OpenSwan issue, but one of the developers said it doesn't
look like 'their' issue and that I should talk to the 'Xen guys'. Here
is the relevant part of his reply:

>
> Yeah, this does not seem to be an openswan bug. The code in question is:
> (one instance of it):
>
> /* Set the data pointer */
> skb_reserve(n,skb->data-skb->head+headroom);
> /* Set the tail pointer and length */
> if(skb_tailroom(n) < skb->len) {
>         printk(KERN_WARNING "klips_error:skb_copy_expand: "
>                "tried to skb_put %ld, %d available. "
>                "This should never happen, please report.\n",
>                (unsigned long int)skb->len,
>                skb_tailroom(n));
>         ipsec_kfree_skb(n);
>         return NULL;
> }
>
> I would check with the xen people to see what might be going on.
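
To make the check in the quoted code concrete, here is a minimal
user-space sketch of the same bookkeeping. It is not kernel or OpenSwan
code: the toy_* names are hypothetical, and the allocation sizes and the
20-byte reserve used in the note below are invented purely to reproduce
the numbers from the log. The only facts taken from the log are the
pairs themselves (20 vs. 16 and 21 vs. 17), which are both 4 bytes
short.

```c
/* Hypothetical toy model of a socket buffer: four offsets into one
 * allocation, mirroring the head/data/tail/end pointers of the real
 * struct sk_buff. This is an illustration, not kernel code. */
struct toy_skb {
    long head;  /* start of the allocation (always 0 here) */
    long data;  /* start of the payload */
    long tail;  /* end of the payload */
    long end;   /* end of the allocation */
};

/* Create an empty buffer of `size` bytes: no headroom, no payload. */
static struct toy_skb toy_alloc(long size)
{
    struct toy_skb n = { 0, 0, 0, size };
    return n;
}

/* Like skb_reserve(): shift data and tail forward to create headroom. */
static void toy_reserve(struct toy_skb *n, long len)
{
    n->data += len;
    n->tail += len;
}

/* Like skb_tailroom(): bytes left between tail and end. */
static long toy_tailroom(const struct toy_skb *n)
{
    return n->end - n->tail;
}

/* Reserve `reserve` bytes of headroom in a fresh buffer of `alloc_size`
 * bytes, then report how many bytes short the tailroom is for a payload
 * of `payload_len` bytes. Returns 0 when the payload fits. */
static long toy_shortfall(long alloc_size, long reserve, long payload_len)
{
    struct toy_skb n = toy_alloc(alloc_size);
    long room;

    toy_reserve(&n, reserve);
    room = toy_tailroom(&n);
    return payload_len > room ? payload_len - room : 0;
}
```

With these made-up sizes, toy_shortfall(36, 20, 20) and
toy_shortfall(37, 20, 21) both return 4, matching the "20, 16" and
"21, 17" pairs in the log, while toy_shortfall(40, 20, 20) returns 0.
In other words, the warning fires whenever the reserved headroom plus
the payload length exceeds what was allocated for the copy.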

So here I am, asking the 'Xen guys'.

Does anyone have any idea what might be going on?


Regards, Danilo



_______________________________________________
Xen-community mailing list
Xen-community@lists.xensource.com
http://lists.xensource.com/mailman/listinfo/xen-community