Mailing List Archive

compound skb frag pages appearing in start_xmit
Hi Eric,

Sander has discovered an issue where xen-netback is given a compound
page as one of the skb frag pages to transmit. Currently netback can
only handle PAGE_SIZE'd frags and bugs out.

I suspect this is something to do with 69b08f62e174 "net: use bigger
pages in __netdev_alloc_frag", although perhaps not because it looks
like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
something.

Are all net drivers expected to be able to handle compound pages in the
frags? Obviously it is to their benefit to do so, so it is something
I'll want to look into for netback.

I expect the main factor here is bridging/forwarding, since the
receiving NIC and its driver appear to support compound pages but the
outgoing NIC (netback in this case) does not.

I guess my question is should I be rushing to fix netback ASAP or should
I rather be looking for a bug somewhere which caused a frag of this type
to get as far as netback's start_xmit in the first place?

Or am I just barking up the wrong tree to start with?

Thanks,
Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: compound skb frag pages appearing in start_xmit
On Tue, 2012-10-09 at 14:47 +0100, Ian Campbell wrote:
> Hi Eric,
>

Hi Ian

> Sander has discovered an issue where xen-netback is given a compound
> page as one of the skb frag pages to transmit. Currently netback can
> only handle PAGE_SIZE'd frags and bugs out.
>
> I suspect this is something to do with 69b08f62e174 "net: use bigger
> pages in __netdev_alloc_frag", although perhaps not because it looks
> like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
> call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
> something.


It's not the commit you want ;)

>
> Are all net drivers expected to be able to handle compound pages in the
> frags? Obviously it is to their benefit to do so, so it is something
> I'll want to look into for netback.
>

Not sure why a net driver would care about a compound page at all?

A fragment has a struct page * and a size.

A page can be order-0, order-1, order-2, order-3, ...

> I expect the main factor here is bridging/forwarding, since the
> receiving NIC and its driver appear to support compound pages but the
> outgoing NIC (netback in this case) does not.
>
> I guess my question is should I be rushing to fix netback ASAP or should
> I rather be looking for a bug somewhere which caused a frag of this type
> to get as far as netback's start_xmit in the first place?
>
> Or am I just barking up the wrong tree to start with?



The problem comes because of

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=5640f7685831e088fe6c2e1f863a6805962f8e81

And yes, we must find a way to cope with this problem in your driver,
because you can also benefit from the performance increase once it is
fixed ;)

And yes, I can certainly help, as I am the author of this patch ;)

Thanks



Re: compound skb frag pages appearing in start_xmit
On Tue, 2012-10-09 at 15:54 +0200, Eric Dumazet wrote:
> On Tue, 2012-10-09 at 14:47 +0100, Ian Campbell wrote:
> > Hi Eric,
> >
>
> Hi Ian
>
> > Sander has discovered an issue where xen-netback is given a compound
> > page as one of the skb frag pages to transmit. Currently netback can
> > only handle PAGE_SIZE'd frags and bugs out.
> >
> > I suspect this is something to do with 69b08f62e174 "net: use bigger
> > pages in __netdev_alloc_frag", although perhaps not because it looks
> > like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
> > call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
> > something.
>
>
> Its not the commit you want ;)

Hmm, I take it back. It can also give you the same problem:

We use this allocator for the rx path of drivers:

__netdev_alloc_skb()

So it's now absolutely possible that one skb->head is backed by an
order-3 page.

Is the problem coming from xen_netbk_count_skb_slots() ?

Give me more information if you want me to help.






Re: compound skb frag pages appearing in start_xmit
On Tue, 2012-10-09 at 14:54 +0100, Eric Dumazet wrote:
> On Tue, 2012-10-09 at 14:47 +0100, Ian Campbell wrote:
> > Hi Eric,
> >
>
> Hi Ian
>
> > Sander has discovered an issue where xen-netback is given a compound
> > page as one of the skb frag pages to transmit. Currently netback can
> > only handle PAGE_SIZE'd frags and bugs out.
> >
> > I suspect this is something to do with 69b08f62e174 "net: use bigger
> > pages in __netdev_alloc_frag", although perhaps not because it looks
> > like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
> > call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
> > something.
>
>
> Its not the commit you want ;)
>
> >
> > Are all net drivers expected to be able to handle compound pages in the
> > frags? Obviously it is to their benefit to do so, so it is something
> > I'll want to look into for netback.
> >
>
> Not sure why a net driver would care of COMPOUND page at all ?
>
> a Fragment has a struct page *, and a size.
>
> a page can be order-0, order-1, order-2, order-3, ...

I keep falling into this trap that a struct page * can be order > 0.

The Xen PV interfaces deal in order-0 pages only. Also things which are
contiguous in physical space may not be contiguous in DMA space (which
we call "machine memory" in Xen terminology).

The first is probably a specific quirk of Xen, but I thought there were
other architectures where physical and DMA space were not necessarily
contiguous and which would therefore need special handling (I guess
those platforms all have IOMMUs).

> > I expect the main factor here is bridging/forwarding, since the
> > receiving NIC and its driver appear to support compound pages but the
> > outgoing NIC (netback in this case) does not.
> >
> > I guess my question is should I be rushing to fix netback ASAP or should
> > I rather be looking for a bug somewhere which caused a frag of this type
> > to get as far as netback's start_xmit in the first place?
> >
> > Or am I just barking up the wrong tree to start with?
>
>
>
> The problem comes because of
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=5640f7685831e088fe6c2e1f863a6805962f8e81
>
> And yes, we must find a way to cope with this problem in your driver,
> because you can also benefit from increase of performance once fixed ;)
>
> And yes I can certainly help, as I am the author of this patch ;)

I think I can mostly deal with this in the same way netback deals with
large skb heads, i.e. by busting the multipage page into individual
4096-byte (PAGE_SIZE) chunks.

Do the higher order pages effectively reduce the number of frags which
are in use? E.g. if MAX_SKB_FRAGS is 16, then for order-0 pages you could
have 64K worth of frag data.

If we switch to order-3 pages everywhere then can the skb contain 512K
of data, or does the effective maximum number of frags in an skb reduce
to 2?

If it's the latter then I think fixing netback is simple, if it's the
former then I might need to think a bit harder.

Ian.


Re: compound skb frag pages appearing in start_xmit
On Tue, 2012-10-09 at 15:01 +0100, Eric Dumazet wrote:
> On Tue, 2012-10-09 at 15:54 +0200, Eric Dumazet wrote:
> > On Tue, 2012-10-09 at 14:47 +0100, Ian Campbell wrote:
> > > Hi Eric,
> > >
> >
> > Hi Ian
> >
> > > Sander has discovered an issue where xen-netback is given a compound
> > > page as one of the skb frag pages to transmit. Currently netback can
> > > only handle PAGE_SIZE'd frags and bugs out.
> > >
> > > I suspect this is something to do with 69b08f62e174 "net: use bigger
> > > pages in __netdev_alloc_frag", although perhaps not because it looks
> > > like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
> > > call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
> > > something.
> >
> >
> > Its not the commit you want ;)
>
> Hmm, I take it back. It also can give you the same problem :
>
> We use this allocator for rx path of drivers :
>
> __netdev_alloc_skb()
>
> So its now absolutely possible that one skb->head is backed by a order-3
> page.
>
> Is the problem coming from xen_netbk_count_skb_slots() ?
>
> Give me more information if you want me to help.

The interesting code is in netbk_gop_skb(), specifically the two calls
to netbk_gop_frag_copy.

netbk_gop_frag_copy can only copy order-0 pages to the peer since they
go over a shared ring transport which can only deal in order-0 pages.

For the SKB head there is a loop which handles order>0 heads, I suspect
we just need something similar for the frag case.
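
The kind of loop described above can be modelled in userspace. This is only a minimal sketch, not the netback code: it assumes 4 KiB frames, and `FRAME_SHIFT`, `FRAME_SIZE`, `struct chunk` and `split_frag()` are hypothetical names standing in for `PAGE_SHIFT`, `PAGE_SIZE` and whatever the driver actually does with each piece. It splits a byte range inside a higher-order page into pieces that never cross an order-0 frame boundary.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_SHIFT 12                       /* assumption: 4 KiB order-0 frames */
#define FRAME_SIZE  (1UL << FRAME_SHIFT)

/* One order-0 piece of a larger fragment (hypothetical type). */
struct chunk {
	unsigned long frame;   /* index of the order-0 frame inside the big page */
	unsigned long offset;  /* byte offset inside that frame */
	unsigned long len;     /* bytes taken from that frame */
};

/*
 * Split the byte range [offset, offset + size) of a higher-order page
 * into pieces that stay within a single order-0 frame. At most max
 * pieces are written to out; the return value is the number of pieces
 * the range needs, which may be larger than max.
 */
static size_t split_frag(unsigned long offset, unsigned long size,
			 struct chunk *out, size_t max)
{
	unsigned long frame = offset >> FRAME_SHIFT;
	size_t n = 0;

	assert(out != NULL || max == 0);

	/* Skip whole frames before the data, keep the in-frame offset. */
	offset &= FRAME_SIZE - 1;

	while (size > 0) {
		unsigned long bytes = FRAME_SIZE - offset;

		if (bytes > size)
			bytes = size;

		if (n < max) {
			out[n].frame = frame;
			out[n].offset = offset;
			out[n].len = bytes;
		}
		n++;

		offset += bytes;
		size -= bytes;

		/* Move on to the next frame once this one is used up. */
		if (offset == FRAME_SIZE) {
			frame++;
			offset = 0;
		}
	}
	return n;
}
```

For example, a 4098-byte range starting at offset 4095 comes out as three pieces of 1, 4096 and 1 bytes in frames 0, 1 and 2.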

Although see my question in the other response about the maximum number
of frags we can have when order is > 0 since if using larger pages
causes us to end up with a much larger number of order-0 pages once
we've broken them up then we have a problem and I need to put my
thinking cap on a bit (perhaps substantially) tighter.

Konrad, it looks like netfront has a similar issue in
xennet_make_frags() since it doesn't shatter large order mappings
either.

Ian.


Re: compound skb frag pages appearing in start_xmit
On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:

> Does the higher order pages effectively reduce the number of frags which
> are in use? e.g if MAX_SKB_FRAGS is 16, then for order-0 pages you could
> have 64K worth of frag data.
>
> If we switch to order-3 pages everywhere then can the skb contain 512K
> of data, or does the effective maximum number of frags in an skb reduce
> to 2?

The effective number of frags reduces to 2 or 3.

(We still limit GSO packets to ~65536 bytes)
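
The arithmetic behind "2 or 3" can be sketched as follows. This is an illustration only, assuming 4 KiB frames, a payload of about 64K packed contiguously into pages of a single order, and a hypothetical helper `frags_needed()`; real skbs need not be laid out this neatly.

```c
#include <assert.h>

#define FRAME_SIZE 4096UL   /* assumption: 4 KiB order-0 frames */

/*
 * Number of frags needed to carry payload bytes when every frag may
 * cover up to one page of the given order and the data starts
 * first_off bytes into the first page.
 */
static unsigned long frags_needed(unsigned long payload, unsigned int order,
				  unsigned long first_off)
{
	unsigned long page_bytes = FRAME_SIZE << order;

	assert(first_off < page_bytes);

	/* Round the occupied byte range up to whole pages. */
	return (first_off + payload + page_bytes - 1) / page_bytes;
}
```

With a 65536-byte payload, `frags_needed(65536, 0, 0)` is 16, `frags_needed(65536, 3, 0)` is 2, and a non-zero starting offset such as `frags_needed(65536, 3, 100)` gives 3.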



Re: compound skb frag pages appearing in start_xmit
On Tue, 2012-10-09 at 15:23 +0100, Ian Campbell wrote:
> On Tue, 2012-10-09 at 15:01 +0100, Eric Dumazet wrote:
> > On Tue, 2012-10-09 at 15:54 +0200, Eric Dumazet wrote:
> > > On Tue, 2012-10-09 at 14:47 +0100, Ian Campbell wrote:
> > > > Hi Eric,
> > > >
> > >
> > > Hi Ian
> > >
> > > > Sander has discovered an issue where xen-netback is given a compound
> > > > page as one of the skb frag pages to transmit. Currently netback can
> > > > only handle PAGE_SIZE'd frags and bugs out.
> > > >
> > > > I suspect this is something to do with 69b08f62e174 "net: use bigger
> > > > pages in __netdev_alloc_frag", although perhaps not because it looks
> > > > like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
> > > > call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
> > > > something.
> > >
> > >
> > > Its not the commit you want ;)
> >
> > Hmm, I take it back. It also can give you the same problem :
> >
> > We use this allocator for rx path of drivers :
> >
> > __netdev_alloc_skb()
> >
> > So its now absolutely possible that one skb->head is backed by a order-3
> > page.
> >
> > Is the problem coming from xen_netbk_count_skb_slots() ?
> >
> > Give me more information if you want me to help.
>
> The interesting code is in netbk_gop_skb(), specifically the two calls
> to netbk_gop_frag_copy.
>
> netbk_gop_frag_copy can only copy order-0 pages to the peer since they
> go over a shared ring transport which can only deal in order-0 pages.
>
> For the SKB head there is a loop which handles order>0 heads, I suspect
> we just need something similar for the frag case.
>
> Although see my question in the other response about the maximum number
> of frags we can have when order is > 0 since if using larger pages
> causes us to end up with a much larger number of order-0 pages once
> we've broken them up then we have a problem and I need to put my
> thinking cap on a bit (perhaps substantially) tighter.
>
> Konrad, it looks like netfront has a similar issue in
> xennet_make_frags() since it doesn't shatter large order mappings
> either.

Hmm...

In theory, if a skb has 16+1 frags backed by compound pages, you could
need ~48 order-0 frags.

(A 4098-byte frag could be split as 1 + 4096 + 1 across page
boundaries, i.e. 3 frags.)

In practice, it should be around ~17 order-0 frags as before.
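
The "1-4096-1" case can be checked with a small closed-form count. This is a sketch under the assumption of 4 KiB frames; `frames_spanned()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

#define FRAME_SHIFT 12                       /* assumption: 4 KiB order-0 frames */
#define FRAME_SIZE  (1UL << FRAME_SHIFT)

/*
 * Number of order-0 frames touched by the byte range
 * [offset, offset + size) inside a higher-order page.
 */
static unsigned long frames_spanned(unsigned long offset, unsigned long size)
{
	unsigned long first, last;

	if (size == 0)
		return 0;

	first = offset >> FRAME_SHIFT;              /* frame holding the first byte */
	last = (offset + size - 1) >> FRAME_SHIFT;  /* frame holding the last byte */

	assert(last >= first);
	return last - first + 1;
}
```

A 4098-byte frag starting on the last byte of a frame (`frames_spanned(4095, 4098)`) touches 3 frames, while the same size starting on a frame boundary (`frames_spanned(0, 4098)`) touches only 2.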




Re: compound skb frag pages appearing in start_xmit
On Tue, 2012-10-09 at 15:27 +0100, Eric Dumazet wrote:
> On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:
>
> > Does the higher order pages effectively reduce the number of frags which
> > are in use? e.g if MAX_SKB_FRAGS is 16, then for order-0 pages you could
> > have 64K worth of frag data.
> >
> > If we switch to order-3 pages everywhere then can the skb contain 512K
> > of data, or does the effective maximum number of frags in an skb reduce
> > to 2?
>
> effective number of frags reduce to 2 or 3
>
> (We still limit GSO packets to ~63536 bytes)

Great! Then I think the fix is more/less trivial...

As an aside, when the skb head is < 4096 bytes is that necessarily a
compound page or might it just be a large kmalloc area?

Only really relevant since it impacts the possibility for code sharing
between the head and the frags sending.

Ian



Re: compound skb frag pages appearing in start_xmit
On Tue, 2012-10-09 at 15:40 +0100, Ian Campbell wrote:
> On Tue, 2012-10-09 at 15:27 +0100, Eric Dumazet wrote:
> > On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:
> >
> > > Does the higher order pages effectively reduce the number of frags which
> > > are in use? e.g if MAX_SKB_FRAGS is 16, then for order-0 pages you could
> > > have 64K worth of frag data.
> > >
> > > If we switch to order-3 pages everywhere then can the skb contain 512K
> > > of data, or does the effective maximum number of frags in an skb reduce
> > > to 2?
> >
> > effective number of frags reduce to 2 or 3
> >
> > (We still limit GSO packets to ~63536 bytes)
>
> Great! Then I think the fix is more/less trivial...
>
> As an aside, when the skb head is < 4096 bytes is that necessarily a
> compound page or might it just be a large kmalloc area?
>

skb->head can be either allocated by kmalloc() (standard alloc_skb()) or
a page frag (if allocated in the rx path).

Not sure it's related to headlen/size...




Re: compound skb frag pages appearing in start_xmit
On Tue, 2012-10-09 at 15:33 +0100, Eric Dumazet wrote:
> On Tue, 2012-10-09 at 15:23 +0100, Ian Campbell wrote:
> > On Tue, 2012-10-09 at 15:01 +0100, Eric Dumazet wrote:
> > > On Tue, 2012-10-09 at 15:54 +0200, Eric Dumazet wrote:
> > > > On Tue, 2012-10-09 at 14:47 +0100, Ian Campbell wrote:
> > > > > Hi Eric,
> > > > >
> > > >
> > > > Hi Ian
> > > >
> > > > > Sander has discovered an issue where xen-netback is given a compound
> > > > > page as one of the skb frag pages to transmit. Currently netback can
> > > > > only handle PAGE_SIZE'd frags and bugs out.
> > > > >
> > > > > I suspect this is something to do with 69b08f62e174 "net: use bigger
> > > > > pages in __netdev_alloc_frag", although perhaps not because it looks
> > > > > like only tg3 uses it and Sander has an r8169. Also tg3 seems to only
> > > > > call netdev_alloc_frag for sizes < PAGE_SIZE. I'm probably missing
> > > > > something.
> > > >
> > > >
> > > > Its not the commit you want ;)
> > >
> > > Hmm, I take it back. It also can give you the same problem :
> > >
> > > We use this allocator for rx path of drivers :
> > >
> > > __netdev_alloc_skb()
> > >
> > > So its now absolutely possible that one skb->head is backed by a order-3
> > > page.
> > >
> > > Is the problem coming from xen_netbk_count_skb_slots() ?
> > >
> > > Give me more information if you want me to help.
> >
> > The interesting code is in netbk_gop_skb(), specifically the two calls
> > to netbk_gop_frag_copy.
> >
> > netbk_gop_frag_copy can only copy order-0 pages to the peer since they
> > go over a shared ring transport which can only deal in order-0 pages.
> >
> > For the SKB head there is a loop which handles order>0 heads, I suspect
> > we just need something similar for the frag case.
> >
> > Although see my question in the other response about the maximum number
> > of frags we can have when order is > 0 since if using larger pages
> > causes us to end up with a much larger number of order-0 pages once
> > we've broken them up then we have a problem and I need to put my
> > thinking cap on a bit (perhaps substantially) tighter.
> >
> > Konrad, it looks like netfront has a similar issue in
> > xennet_make_frags() since it doesn't shatter large order mappings
> > either.
>
> Hmm...
>
> In theory, if a skb has 16+1 frags backed by compound pages, you could
> need ~48 order-0 frags.
>
> (4098 bytes could need 1-4096-1 (3 frags))
>
> In practice, it should be around ~17 order-0 frags as before.

Right, thanks. I think I can cope with that without needing to change
the PV protocol in any way.

Ian.


Re: compound skb frag pages appearing in start_xmit
On Tue, 2012-10-09 at 15:40 +0100, Ian Campbell wrote:
> On Tue, 2012-10-09 at 15:27 +0100, Eric Dumazet wrote:
> > On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:
> >
> > > Does the higher order pages effectively reduce the number of frags which
> > > are in use? e.g if MAX_SKB_FRAGS is 16, then for order-0 pages you could
> > > have 64K worth of frag data.
> > >
> > > If we switch to order-3 pages everywhere then can the skb contain 512K
> > > of data, or does the effective maximum number of frags in an skb reduce
> > > to 2?
> >
> > effective number of frags reduce to 2 or 3
> >
> > (We still limit GSO packets to ~63536 bytes)
>
> Great! Then I think the fix is more/less trivial...

The following seems to work for me.

I haven't tackled netfront yet.

8<--------------------------------------------------------------

From 551e42e3dd203f2eb97cb082985013bb33b8f020 Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.campbell@citrix.com>
Date: Tue, 9 Oct 2012 15:51:20 +0100
Subject: [PATCH] xen: netback: handle compound page fragments on transmit.

An SKB paged fragment can consist of a compound page with order > 0.
However the netchannel protocol deals only in PAGE_SIZE frames.

Handle this in netbk_gop_frag_copy and xen_netbk_count_skb_slots by
iterating over the frames which make up the page.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
---
drivers/net/xen-netback/netback.c | 40 ++++++++++++++++++++++++++++++++----
1 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 4ebfcf3..d747e30 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -335,21 +335,35 @@ unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)

 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+		unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
 		unsigned long bytes;
+
+		offset &= ~PAGE_MASK;
+
 		while (size > 0) {
+			BUG_ON(offset >= PAGE_SIZE);
 			BUG_ON(copy_off > MAX_BUFFER_OFFSET);

-			if (start_new_rx_buffer(copy_off, size, 0)) {
+			bytes = PAGE_SIZE - offset;
+
+			if (bytes > size)
+				bytes = size;
+
+			if (start_new_rx_buffer(copy_off, bytes, 0)) {
 				count++;
 				copy_off = 0;
 			}

-			bytes = size;
 			if (copy_off + bytes > MAX_BUFFER_OFFSET)
 				bytes = MAX_BUFFER_OFFSET - copy_off;

 			copy_off += bytes;
+
+			offset += bytes;
 			size -= bytes;
+
+			if (offset == PAGE_SIZE)
+				offset = 0;
 		}
 	}
 	return count;
@@ -403,14 +417,24 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 	unsigned long bytes;

 	/* Data must not cross a page boundary. */
-	BUG_ON(size + offset > PAGE_SIZE);
+	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));

 	meta = npo->meta + npo->meta_prod - 1;

+	/* Skip unused frames from start of page */
+	page += offset >> PAGE_SHIFT;
+	offset &= ~PAGE_MASK;
+
 	while (size > 0) {
+		BUG_ON(offset >= PAGE_SIZE);
 		BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);

-		if (start_new_rx_buffer(npo->copy_off, size, *head)) {
+		bytes = PAGE_SIZE - offset;
+
+		if (bytes > size)
+			bytes = size;
+
+		if (start_new_rx_buffer(npo->copy_off, bytes, *head)) {
 			/*
 			 * Netfront requires there to be some data in the head
 			 * buffer.
@@ -420,7 +444,6 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 			meta = get_next_rx_buffer(vif, npo);
 		}

-		bytes = size;
 		if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
 			bytes = MAX_BUFFER_OFFSET - npo->copy_off;

@@ -453,6 +476,13 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		offset += bytes;
 		size -= bytes;

+		/* Next frame */
+		if (offset == PAGE_SIZE) {
+			BUG_ON(!PageCompound(page));
+			page++;
+			offset = 0;
+		}
+
 		/* Leave a gap for the GSO descriptor. */
 		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
 			vif->rx.req_cons++;
--
1.7.2.5




Re: compound skb frag pages appearing in start_xmit
Wednesday, October 10, 2012, 12:13:04 PM, you wrote:

> On Tue, 2012-10-09 at 15:40 +0100, Ian Campbell wrote:
>> On Tue, 2012-10-09 at 15:27 +0100, Eric Dumazet wrote:
>> > On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:
>> >
>> > > Does the higher order pages effectively reduce the number of frags which
>> > > are in use? e.g if MAX_SKB_FRAGS is 16, then for order-0 pages you could
>> > > have 64K worth of frag data.
>> > >
>> > > If we switch to order-3 pages everywhere then can the skb contain 512K
>> > > of data, or does the effective maximum number of frags in an skb reduce
>> > > to 2?
>> >
>> > effective number of frags reduce to 2 or 3
>> >
>> > (We still limit GSO packets to ~63536 bytes)
>>
>> Great! Then I think the fix is more/less trivial...

> The following seems to work for me.

But it doesn't seem to work for me ... dmesg attached.

I don't know if the "mcelog:4359 map pfn expected mapping type write-back for [mem 0x0009f000-0x000a0fff], got uncached-minus"
is related; it shows up right after the NICs get initialized.

netback still fails with:

[ 191.777994] ------------[ cut here ]------------
[ 191.784245] kernel BUG at drivers/net/xen-netback/netback.c:481!
[ 191.790423] invalid opcode: 0000 [#1] PREEMPT SMP
[ 191.796462] Modules linked in:
[ 191.802315] CPU 1
[ 191.802367] Pid: 1177, comm: netback/1 Tainted: G W 3.6.0pre-rc1-20121010 #1 MSI MS-7640/890FXA-GD70 (MS-7640)
[ 191.814043] RIP: e030:[<ffffffff8146de61>] [<ffffffff8146de61>] netbk_gop_frag_copy+0x3f1/0x400
[ 191.820171] RSP: e02b:ffff880037c6bb98 EFLAGS: 00010246
[ 191.826271] RAX: 0000000000000244 RBX: ffffc90010827f98 RCX: ffff880031ed9880
[ 191.832450] RDX: 00000000000000a8 RSI: ffff880037c6bd24 RDI: ffffea0000b03f80
[ 191.838581] RBP: ffff880037c6bc28 R08: ffff8800319f8100 R09: 0000000000001000
[ 191.844739] R10: 0000000000000000 R11: 0000000000000132 R12: 00000000000000a8
[ 191.850785] R13: ffff880037c6bcd8 R14: 0000000000001000 R15: ffffc9001082cf70
[ 191.856741] FS: 00007f9f3c944700(0000) GS:ffff88003f840000(0000) knlGS:0000000000000000
[ 191.862841] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 191.868901] CR2: 0000000001337ca0 CR3: 0000000032cec000 CR4: 0000000000000660
[ 191.875053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 191.881175] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 191.887247] Process netback/1 (pid: 1177, threadinfo ffff880037c6a000, task ffff880039984140)
[ 191.893325] Stack:
[ 191.899328] ffff880037c6bd24 00000000000000a8 ffff8800319f8100 ffff880031ed9880
[ 191.905534] ffffc90000000000 0000000000001000 0000000000000000 0000000000000000
[ 191.911742] ffff880000000000 ffffffff817459f3 ffffc90010823420 ffffea0000b03f80
[ 191.917898] Call Trace:
[ 191.923939] [<ffffffff817459f3>] ? _raw_spin_unlock_irqrestore+0x53/0xa0
[ 191.930141] [<ffffffff8146e1cb>] xen_netbk_rx_action+0x30b/0x830
[ 191.936543] [<ffffffff810ad22d>] ? trace_hardirqs_on+0xd/0x10
[ 191.942942] [<ffffffff8146f6da>] xen_netbk_kthread+0xba/0xa90
[ 191.949147] [<ffffffff81095b06>] ? try_to_wake_up+0x1b6/0x310
[ 191.955250] [<ffffffff81086b40>] ? wake_up_bit+0x40/0x40
[ 191.961421] [<ffffffff8146f620>] ? xen_netbk_tx_build_gops+0xa70/0xa70
[ 191.967660] [<ffffffff810864d6>] kthread+0xd6/0xe0
[ 191.973834] [<ffffffff81086400>] ? __init_kthread_worker+0x70/0x70
[ 191.979953] [<ffffffff8174677c>] ret_from_fork+0x7c/0x90
[ 191.986107] [<ffffffff81086400>] ? __init_kthread_worker+0x70/0x70
[ 191.992174] Code: b8 b3 00 00 48 8d 8c f1 60 01 00 00 48 3b 14 01 0f 85 72 fc ff ff e9 7a fc ff ff 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe <0f> 0b eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
[ 192.005230] RIP [<ffffffff8146de61>] netbk_gop_frag_copy+0x3f1/0x400
[ 192.011786] RSP <ffff880037c6bb98>
[ 192.018402] ---[ end trace c51ab5e2c2c918fc ]---


--

Sander

> I haven't tackled netfront yet.

> 8<--------------------------------------------------------------

> From 551e42e3dd203f2eb97cb082985013bb33b8f020 Mon Sep 17 00:00:00 2001
> From: Ian Campbell <ian.campbell@citrix.com>
> Date: Tue, 9 Oct 2012 15:51:20 +0100
> Subject: [PATCH] xen: netback: handle compound page fragments on transmit.

> An SKB paged fragment can consist of a compound page with order > 0.
> However the netchannel protocol deals only in PAGE_SIZE frames.

> Handle this in netbk_gop_frag_copy and xen_netbk_count_skb_slots by
> iterating over the frames which make up the page.

> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
> Cc: Sander Eikelenboom <linux@eikelenboom.it>
> ---
> drivers/net/xen-netback/netback.c | 40 ++++++++++++++++++++++++++++++++----
> 1 files changed, 35 insertions(+), 5 deletions(-)

> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 4ebfcf3..d747e30 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -335,21 +335,35 @@ unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
>
> for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
> unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
> + unsigned long offset = skb_shinfo(skb)->frags[i].page_offset;
> unsigned long bytes;
> +
> + offset &= ~PAGE_MASK;
> +
> while (size > 0) {
> + BUG_ON(offset >= PAGE_SIZE);
> BUG_ON(copy_off > MAX_BUFFER_OFFSET);
>
> - if (start_new_rx_buffer(copy_off, size, 0)) {
> + bytes = PAGE_SIZE - offset;
> +
> + if (bytes > size)
> + bytes = size;
> +
> + if (start_new_rx_buffer(copy_off, bytes, 0)) {
> count++;
> copy_off = 0;
> }
>
> - bytes = size;
> if (copy_off + bytes > MAX_BUFFER_OFFSET)
> bytes = MAX_BUFFER_OFFSET - copy_off;
>
> copy_off += bytes;
> +
> + offset += bytes;
> size -= bytes;
> +
> + if (offset == PAGE_SIZE)
> + offset = 0;
> }
> }
> return count;
> @@ -403,14 +417,24 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
> unsigned long bytes;
>
> /* Data must not cross a page boundary. */
> - BUG_ON(size + offset > PAGE_SIZE);
> + BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
>
> meta = npo->meta + npo->meta_prod - 1;
>
> + /* Skip unused frames from start of page */
> + page += offset >> PAGE_SHIFT;
> + offset &= ~PAGE_MASK;
> +
> while (size > 0) {
> + BUG_ON(offset >= PAGE_SIZE);
> BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
>
> - if (start_new_rx_buffer(npo->copy_off, size, *head)) {
> + bytes = PAGE_SIZE - offset;
> +
> + if (bytes > size)
> + bytes = size;
> +
> + if (start_new_rx_buffer(npo->copy_off, bytes, *head)) {
> /*
> * Netfront requires there to be some data in the head
> * buffer.
> @@ -420,7 +444,6 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
> meta = get_next_rx_buffer(vif, npo);
> }
>
> - bytes = size;
> if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
> bytes = MAX_BUFFER_OFFSET - npo->copy_off;
>
> @@ -453,6 +476,13 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
> offset += bytes;
> size -= bytes;
>
> + /* Next frame */
> + if (offset == PAGE_SIZE) {
> + BUG_ON(!PageCompound(page));
> + page++;
> + offset = 0;
> + }
> +
> /* Leave a gap for the GSO descriptor. */
> if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
> vif->rx.req_cons++;


Re: compound skb frag pages appearing in start_xmit
On Wed, 2012-10-10 at 13:24 +0100, Sander Eikelenboom wrote:
> Wednesday, October 10, 2012, 12:13:04 PM, you wrote:
>
> > On Tue, 2012-10-09 at 15:40 +0100, Ian Campbell wrote:
> >> On Tue, 2012-10-09 at 15:27 +0100, Eric Dumazet wrote:
> >> > On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:
> >> >
> >> > > Does the higher order pages effectively reduce the number of frags which
> >> > > are in use? e.g if MAX_SKB_FRAGS is 16, then for order-0 pages you could
> >> > > have 64K worth of frag data.
> >> > >
> >> > > If we switch to order-3 pages everywhere then can the skb contain 512K
> >> > > of data, or does the effective maximum number of frags in an skb reduce
> >> > > to 2?
> >> >
> >> > effective number of frags reduce to 2 or 3
> >> >
> >> > (We still limit GSO packets to ~63536 bytes)
> >>
> >> Great! Then I think the fix is more/less trivial...
>
> > The following seems to work for me.
>
> But it doesn't seem to work for me ... dmesg attached.

> [ 191.777994] ------------[ cut here ]------------
> [ 191.784245] kernel BUG at drivers/net/xen-netback/netback.c:481!

Looks like that BUG_ON is a little aggressive. It'll trigger if the data
happens to end on a frame boundary. Hopefully this will fix it for you:

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index d747e30..f2d6b78 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -477,7 +477,7 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		size -= bytes;

 		/* Next frame */
-		if (offset == PAGE_SIZE) {
+		if (offset == PAGE_SIZE && size) {
 			BUG_ON(!PageCompound(page));
 			page++;
 			offset = 0;
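
The effect of the extra `&& size` test can be seen in a simplified userspace model of the loop. This is a sketch only: it assumes 4 KiB frames, leaves out the rx buffer handling, and `frame_steps()` and its `guard` flag are hypothetical. It counts how often the loop steps to a following frame, which is the point where the patch checks `PageCompound(page)`.

```c
#include <assert.h>

#define FRAME_SIZE 4096UL   /* assumption: 4 KiB order-0 frames */

/*
 * Walk size bytes starting at offset within a frame and count how many
 * times the loop advances to the next frame. With guard == 0 the loop
 * advances whenever a frame is used up (the first version of the
 * patch); with guard != 0 it advances only if data remains.
 */
static unsigned long frame_steps(unsigned long offset, unsigned long size,
				 int guard)
{
	unsigned long steps = 0;

	assert(offset < FRAME_SIZE);

	while (size > 0) {
		unsigned long bytes = FRAME_SIZE - offset;

		if (bytes > size)
			bytes = size;

		offset += bytes;
		size -= bytes;

		if (offset == FRAME_SIZE && (!guard || size)) {
			steps++;        /* the driver would do page++ here */
			offset = 0;
		}
	}
	return steps;
}
```

For data that exactly fills one order-0 frame, `frame_steps(0, 4096, 0)` is 1, so the unguarded loop would reach the compound-page check on a page that is not compound, while `frame_steps(0, 4096, 1)` is 0. Data that really spans two frames still advances once: `frame_steps(0, 8192, 1)` is 1.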



Re: compound skb frag pages appearing in start_xmit
On Wed, 2012-10-10 at 11:13 +0100, Ian Campbell wrote:
> I haven't tackled netfront yet.

I seem to be totally unable to reproduce the equivalent issue on the
netfront xmit side, even though it seems like the loop in
xennet_make_frags ought to be obviously susceptible to it.

Konrad, Sander, are either of you able to repro, e.g. with:

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index b06ef81..8a3f770 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -462,6 +462,8 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
 		ref = gnttab_claim_grant_reference(&np->gref_tx_head);
 		BUG_ON((signed short)ref < 0);

+		BUG_ON(PageCompound(skb_frag_page(frag)));
+
 		mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
 		gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
 						mfn, GNTMAP_readonly);

My repro for netback was just to netcat a wodge of data from dom0 to domU,
but going the other way doesn't seem to trigger it.


Re: compound skb frag pages appearing in start_xmit
Wednesday, October 10, 2012, 2:29:09 PM, you wrote:

> On Wed, 2012-10-10 at 13:24 +0100, Sander Eikelenboom wrote:
>> Wednesday, October 10, 2012, 12:13:04 PM, you wrote:
>>
>> > On Tue, 2012-10-09 at 15:40 +0100, Ian Campbell wrote:
>> >> On Tue, 2012-10-09 at 15:27 +0100, Eric Dumazet wrote:
>> >> > On Tue, 2012-10-09 at 15:17 +0100, Ian Campbell wrote:
>> >> >
>> >> > > Does the higher order pages effectively reduce the number of frags which
>> >> > > are in use? e.g if MAX_SKB_FRAGS is 16, then for order-0 pages you could
>> >> > > have 64K worth of frag data.
>> >> > >
>> >> > > If we switch to order-3 pages everywhere then can the skb contain 512K
>> >> > > of data, or does the effective maximum number of frags in an skb reduce
>> >> > > to 2?
>> >> >
>> >> > effective number of frags reduce to 2 or 3
>> >> >
>> >> > (We still limit GSO packets to ~63536 bytes)
>> >>
>> >> Great! Then I think the fix is more/less trivial...
>>
>> > The following seems to work for me.
>>
>> But it doesn't seem to work for me ... dmesg attached.

>> [ 191.777994] ------------[ cut here ]------------
>> [ 191.784245] kernel BUG at drivers/net/xen-netback/netback.c:481!

> Looks like that BUG_ON is a little aggressive. It'll trigger if the data
> happens to end on a frame boundary. Hopefully this will fix it for you:

Yes, it does!
Thanks. I will recompile and test the netfront case as well.

--
Sander

> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index d747e30..f2d6b78 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -477,7 +477,7 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
> size -= bytes;
>
> /* Next frame */
> - if (offset == PAGE_SIZE) {
> + if (offset == PAGE_SIZE && size) {
> BUG_ON(!PageCompound(page));
> page++;
> offset = 0;





Re: compound skb frag pages appearing in start_xmit
Wednesday, October 10, 2012, 3:09:58 PM, you wrote:

> On Wed, 2012-10-10 at 11:13 +0100, Ian Campbell wrote:
>> I haven't tackled netfront yet.

> I seem to be totally unable to reproduce the equivalent issue on the
> netfront xmit side, even though it seems like the loop in
> xennet_make_frags ought to be obviously susceptible to it.

> Konrad, Sander, are either of you able to repro, e.g. with:


Hmm, I don't see any traces, only strange behaviour:

- I can connect to guests by ssh, but it's sluggish and sometimes stops working
- The guests seem to keep trying to connect to netback:

[ 658.276719] xen_bridge: port 2(vif40.0) entered forwarding state
[ 658.282258] xen_bridge: port 2(vif40.0) entered forwarding state
[ 663.945964] xen_bridge: port 7(vif39.0) entered forwarding state
[ 669.674277] xen_bridge: port 2(vif40.0) entered disabled state
[ 669.680290] device vif40.0 left promiscuous mode
[ 669.685464] xen_bridge: port 2(vif40.0) entered disabled state
[ 672.857222] device vif41.0 entered promiscuous mode
[ 673.166254] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
[ 673.176368] xen_bridge: port 2(vif41.0) entered forwarding state
[ 673.182042] xen_bridge: port 2(vif41.0) entered forwarding state
[ 674.439725] xen_bridge: port 7(vif39.0) entered disabled state
[ 674.445708] device vif39.0 left promiscuous mode
[ 674.450955] xen_bridge: port 7(vif39.0) entered disabled state
[ 677.726040] device vif42.0 entered promiscuous mode
[ 678.053381] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
[ 678.062804] xen_bridge: port 7(vif42.0) entered forwarding state
[ 678.068433] xen_bridge: port 7(vif42.0) entered forwarding state
[ 688.224736] xen_bridge: port 2(vif41.0) entered forwarding state
[ 693.080557] xen_bridge: port 7(vif42.0) entered forwarding state
[ 700.786276] xen_bridge: port 7(vif42.0) entered disabled state
[ 700.792484] device vif42.0 left promiscuous mode
[ 700.802409] xen_bridge: port 7(vif42.0) entered disabled state
[ 704.133606] device vif43.0 entered promiscuous mode
[ 704.460160] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
[ 704.469800] xen_bridge: port 7(vif43.0) entered forwarding state
[ 704.475303] xen_bridge: port 7(vif43.0) entered forwarding state
[ 719.493788] xen_bridge: port 7(vif43.0) entered forwarding state
[ 726.302456] xen_bridge: port 7(vif43.0) entered disabled state
[ 726.308898] device vif43.0 left promiscuous mode
[ 726.314029] xen_bridge: port 7(vif43.0) entered disabled state

All the guests are already up, but this keeps on going and going and going ....



> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index b06ef81..8a3f770 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -462,6 +462,8 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
> ref = gnttab_claim_grant_reference(&np->gref_tx_head);
> BUG_ON((signed short)ref < 0);
>
> + BUG_ON(PageCompound(skb_frag_page(frag)));
> +
> mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
> gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
> mfn, GNTMAP_readonly);

> My repro for netback was just to netcat a wodge of data from dom0->domU
> but going the other way doesn't seem to trigger.




Re: compound skb frag pages appearing in start_xmit
On Wed, 2012-10-10 at 15:49 +0100, Sander Eikelenboom wrote:
> Wednesday, October 10, 2012, 3:09:58 PM, you wrote:
>
> > On Wed, 2012-10-10 at 11:13 +0100, Ian Campbell wrote:
> >> I haven't tackled netfront yet.
>
> > I seem to be totally unable to reproduce the equivalent issue on the
> > netfront xmit side, even though it seems like the loop in
> > xennet_make_frags ought to be obviously susceptible to it.
>
> > Konrad, Sander, are either of you able to repro, e.g. with:
>
>
> Hmrrrmm i don't see any traces, only strange behaviour ..
>
> - i can connect to guests by ssh, but it's sluggish, and sometimes stops working

I saw something like this (ssh sluggish) even with dom0 itself. I'm
trying to see if I can characterise it enough to reliably bisect it.

I already switched out xen-unstable for 4.2-testing but that didn't make
any difference.

> - The guest seem to keep trying to connect to netback:
>
> [ 658.276719] xen_bridge: port 2(vif40.0) entered forwarding state
> [ 658.282258] xen_bridge: port 2(vif40.0) entered forwarding state
> [ 663.945964] xen_bridge: port 7(vif39.0) entered forwarding state
> [ 669.674277] xen_bridge: port 2(vif40.0) entered disabled state
> [ 669.680290] device vif40.0 left promiscuous mode
> [ 669.685464] xen_bridge: port 2(vif40.0) entered disabled state
> [ 672.857222] device vif41.0 entered promiscuous mode
> [ 673.166254] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
> [ 673.176368] xen_bridge: port 2(vif41.0) entered forwarding state
> [ 673.182042] xen_bridge: port 2(vif41.0) entered forwarding state
> [ 674.439725] xen_bridge: port 7(vif39.0) entered disabled state
> [ 674.445708] device vif39.0 left promiscuous mode
> [ 674.450955] xen_bridge: port 7(vif39.0) entered disabled state
> [ 677.726040] device vif42.0 entered promiscuous mode
> [ 678.053381] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
> [ 678.062804] xen_bridge: port 7(vif42.0) entered forwarding state
> [ 678.068433] xen_bridge: port 7(vif42.0) entered forwarding state
> [ 688.224736] xen_bridge: port 2(vif41.0) entered forwarding state
> [ 693.080557] xen_bridge: port 7(vif42.0) entered forwarding state
> [ 700.786276] xen_bridge: port 7(vif42.0) entered disabled state
> [ 700.792484] device vif42.0 left promiscuous mode
> [ 700.802409] xen_bridge: port 7(vif42.0) entered disabled state
> [ 704.133606] device vif43.0 entered promiscuous mode
> [ 704.460160] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
> [ 704.469800] xen_bridge: port 7(vif43.0) entered forwarding state
> [ 704.475303] xen_bridge: port 7(vif43.0) entered forwarding state
> [ 719.493788] xen_bridge: port 7(vif43.0) entered forwarding state
> [ 726.302456] xen_bridge: port 7(vif43.0) entered disabled state
> [ 726.308898] device vif43.0 left promiscuous mode
> [ 726.314029] xen_bridge: port 7(vif43.0) entered disabled state
>
> All the guests are already up, but this keeps on going and going and going ....

The domain number seems to be climbing, are you sure something isn't
(crashing and) restarting?

> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index b06ef81..8a3f770 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -462,6 +462,8 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
> > ref = gnttab_claim_grant_reference(&np->gref_tx_head);
> > BUG_ON((signed short)ref < 0);
> >
> > + BUG_ON(PageCompound(skb_frag_page(frag)));
> > +
> > mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
> > gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
> > mfn, GNTMAP_readonly);
>
> > My repro for netback was just to netcat a wodge of data from dom0->domU
> > but going the other way doesn't seem to trigger.
>
>
>



Re: compound skb frag pages appearing in start_xmit
Thursday, October 11, 2012, 10:02:26 AM, you wrote:

> On Wed, 2012-10-10 at 15:49 +0100, Sander Eikelenboom wrote:
>> Wednesday, October 10, 2012, 3:09:58 PM, you wrote:
>>
>> > On Wed, 2012-10-10 at 11:13 +0100, Ian Campbell wrote:
>> >> I haven't tackled netfront yet.
>>
>> > I seem to be totally unable to reproduce the equivalent issue on the
>> > netfront xmit side, even though it seems like the loop in
>> > xennet_make_frags ought to be obviously susceptible to it.
>>
>> > Konrad, Sander, are either of you able to repro, e.g. with:
>>
>>
>> Hmrrrmm i don't see any traces, only strange behaviour ..
>>
>> - i can connect to guests by ssh, but it's sluggish, and sometimes stops working

> I saw something like this (ssh sluggish) even with dom0 itself. I'm
> trying to see if I can characterise it enough to reliably bisect it.

> I already switched out xen-unstable for 4.2-testing but that didn't make
> any difference.



>> - The guest seem to keep trying to connect to netback:
>>
>> [ 658.276719] xen_bridge: port 2(vif40.0) entered forwarding state
>> [ 658.282258] xen_bridge: port 2(vif40.0) entered forwarding state
>> [ 663.945964] xen_bridge: port 7(vif39.0) entered forwarding state
>> [ 669.674277] xen_bridge: port 2(vif40.0) entered disabled state
>> [ 669.680290] device vif40.0 left promiscuous mode
>> [ 669.685464] xen_bridge: port 2(vif40.0) entered disabled state
>> [ 672.857222] device vif41.0 entered promiscuous mode
>> [ 673.166254] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
>> [ 673.176368] xen_bridge: port 2(vif41.0) entered forwarding state
>> [ 673.182042] xen_bridge: port 2(vif41.0) entered forwarding state
>> [ 674.439725] xen_bridge: port 7(vif39.0) entered disabled state
>> [ 674.445708] device vif39.0 left promiscuous mode
>> [ 674.450955] xen_bridge: port 7(vif39.0) entered disabled state
>> [ 677.726040] device vif42.0 entered promiscuous mode
>> [ 678.053381] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
>> [ 678.062804] xen_bridge: port 7(vif42.0) entered forwarding state
>> [ 678.068433] xen_bridge: port 7(vif42.0) entered forwarding state
>> [ 688.224736] xen_bridge: port 2(vif41.0) entered forwarding state
>> [ 693.080557] xen_bridge: port 7(vif42.0) entered forwarding state
>> [ 700.786276] xen_bridge: port 7(vif42.0) entered disabled state
>> [ 700.792484] device vif42.0 left promiscuous mode
>> [ 700.802409] xen_bridge: port 7(vif42.0) entered disabled state
>> [ 704.133606] device vif43.0 entered promiscuous mode
>> [ 704.460160] xen-blkback:ring-ref 8, event-channel 9, protocol 1 (x86_64-abi)
>> [ 704.469800] xen_bridge: port 7(vif43.0) entered forwarding state
>> [ 704.475303] xen_bridge: port 7(vif43.0) entered forwarding state
>> [ 719.493788] xen_bridge: port 7(vif43.0) entered forwarding state
>> [ 726.302456] xen_bridge: port 7(vif43.0) entered disabled state
>> [ 726.308898] device vif43.0 left promiscuous mode
>> [ 726.314029] xen_bridge: port 7(vif43.0) entered disabled state
>>
>> All the guests are already up, but this keeps on going and going and going ....

> The domain number seems to be climbing, are you sure something isn't
> (crashing and) restarting?

Probably due to the BUG_ON from the patch below; I changed it into a WARN_ON.
I do seem to hit it, but only in one of the guests at the moment, and it triggers quite irregularly.

[ 34.298549] ------------[ cut here ]------------
[ 34.298567] WARNING: at drivers/net/xen-netfront.c:465 xennet_start_xmit+0x7fe/0x860()
[ 34.298574] Modules linked in:
[ 34.298597] Pid: 1580, comm: sshd Not tainted 3.6.0pre-rc1-20121011 #1
[ 34.298603] Call Trace:
[ 34.298611] [<ffffffff810664ea>] warn_slowpath_common+0x7a/0xb0
[ 34.298617] [<ffffffff81066535>] warn_slowpath_null+0x15/0x20
[ 34.298623] [<ffffffff8146d89e>] xennet_start_xmit+0x7fe/0x860
[ 34.298631] [<ffffffff8161f349>] dev_hard_start_xmit+0x209/0x460
[ 34.298637] [<ffffffff8163b036>] sch_direct_xmit+0xf6/0x290
[ 34.298643] [<ffffffff8161f746>] dev_queue_xmit+0x1a6/0x5a0
[ 34.298649] [<ffffffff8161f5a0>] ? dev_hard_start_xmit+0x460/0x460
[ 34.298656] [<ffffffff810aa8e5>] ? trace_softirqs_off+0x85/0x1b0
[ 34.298663] [<ffffffff816b9536>] ip_finish_output+0x226/0x530
[ 34.298668] [<ffffffff816b93dd>] ? ip_finish_output+0xcd/0x530
[ 34.298674] [<ffffffff816b9899>] ip_output+0x59/0xe0
[ 34.298680] [<ffffffff816b83b8>] ip_local_out+0x28/0x90
[ 34.298687] [<ffffffff816b896f>] ip_queue_xmit+0x17f/0x4a0
[ 34.298692] [<ffffffff816b87f0>] ? ip_send_unicast_reply+0x340/0x340
[ 34.298699] [<ffffffff810a0ba7>] ? getnstimeofday+0x47/0xe0
[ 34.298705] [<ffffffff8160f4c9>] ? __skb_clone+0x29/0x120
[ 34.298711] [<ffffffff816cea20>] tcp_transmit_skb+0x400/0x8d0
[ 34.298717] [<ffffffff816d19fa>] tcp_write_xmit+0x21a/0xa50
[ 34.298723] [<ffffffff816d225b>] tcp_push_one+0x2b/0x40
[ 34.298728] [<ffffffff816c2dec>] tcp_sendmsg+0x8dc/0xe20
[ 34.298735] [<ffffffff816e8f19>] inet_sendmsg+0xa9/0x100
[ 34.298740] [<ffffffff816e8e70>] ? inet_autobind+0x70/0x70
[ 34.298746] [<ffffffff810b0f88>] ? lock_acquire+0xd8/0x100
[ 34.298753] [<ffffffff8160630d>] sock_aio_write+0x12d/0x140
[ 34.298762] [<ffffffff811435b2>] do_sync_write+0xa2/0xe0
[ 34.298768] [<ffffffff810ad22d>] ? trace_hardirqs_on+0xd/0x10
[ 34.298774] [<ffffffff811441d4>] vfs_write+0x174/0x190
[ 34.298779] [<ffffffff811442fa>] sys_write+0x5a/0xa0
[ 34.298786] [<ffffffff812b33de>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 34.298792] [<ffffffff817491cc>] cstar_dispatch+0x7/0x26
[ 34.298797] ---[ end trace 2e28eec93b7a8b74 ]---


Complete dmesg from guest attached.



>> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> > index b06ef81..8a3f770 100644
>> > --- a/drivers/net/xen-netfront.c
>> > +++ b/drivers/net/xen-netfront.c
>> > @@ -462,6 +462,8 @@ static void xennet_make_frags(struct sk_buff *skb, struct net_device *dev,
>> > ref = gnttab_claim_grant_reference(&np->gref_tx_head);
>> > BUG_ON((signed short)ref < 0);
>> >
>> > + BUG_ON(PageCompound(skb_frag_page(frag)));
>> > +
>> > mfn = pfn_to_mfn(page_to_pfn(skb_frag_page(frag)));
>> > gnttab_grant_foreign_access_ref(ref, np->xbdev->otherend_id,
>> > mfn, GNTMAP_readonly);
>>
>> > My repro for netback was just to netcat a wodge of data from dom0->domU
>> > but going the other way doesn't seem to trigger.
>>
>>
>>
Re: compound skb frag pages appearing in start_xmit
On Thu, 2012-10-11 at 12:00 +0200, Sander Eikelenboom wrote:

> Probably due to the BUG_ON from the patch below, i changed it into a WARN_ON.
> And i seem to hit it, but only in one of the guests at the moment and it triggers quite irregularly.

xennet_make_frags() is able to split the skb->head into multiple page-size
chunks.

It should do the same for fragments.
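
As a rough illustration of that idea, the sketch below splits one fragment, given as an offset and size within a possibly compound page, into chunks that never cross a page boundary. It is plain user-space C and not the eventual netfront fix: struct chunk, split_frag and SKETCH_PAGE_SIZE are hypothetical names, 4 KiB pages are assumed, and in a real driver each chunk would presumably need its own tx request and grant reference.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* One page-bounded piece of a fragment (hypothetical type). */
struct chunk {
	size_t page_index;	/* which constituent page of the compound page */
	size_t offset;		/* offset within that page */
	size_t len;		/* bytes in this chunk, at most one page */
};

/*
 * Split a fragment into page-size chunks.
 * Returns the number of chunks written to out (at most max_chunks).
 */
static size_t split_frag(size_t frag_offset, size_t frag_size,
			 struct chunk *out, size_t max_chunks)
{
	size_t n = 0;
	size_t page_index = frag_offset / SKETCH_PAGE_SIZE;
	size_t offset = frag_offset % SKETCH_PAGE_SIZE;

	assert(out != NULL);

	while (frag_size > 0 && n < max_chunks) {
		/* Never let a chunk run past the end of its page. */
		size_t len = SKETCH_PAGE_SIZE - offset;

		if (len > frag_size)
			len = frag_size;

		out[n].page_index = page_index;
		out[n].offset = offset;
		out[n].len = len;
		n++;

		frag_size -= len;
		page_index++;	/* later chunks start at the next page */
		offset = 0;
	}
	return n;
}
```

For example, a 6000-byte fragment starting at offset 3000 comes out as three chunks: 1096 bytes at the end of page 0, all 4096 bytes of page 1, and 808 bytes at the start of page 2.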



Re: compound skb frag pages appearing in start_xmit
On Thu, 2012-10-11 at 11:05 +0100, Eric Dumazet wrote:
> On Thu, 2012-10-11 at 12:00 +0200, Sander Eikelenboom wrote:
>
> > Probably due to the BUG_ON from the patch below, i changed it into a WARN_ON.
> > And i seem to hit it, but only in one of the guests at the moment and it triggers quite irregularly.
>
> xennet_make_frags() is able to split the skb->head in multiple page-size
> chunks.
>
> It should do the same for fragments

Right, I just want to be able to reproduce the issue so I can know I've
fixed it properly ;-)

Ian.




Re: compound skb frag pages appearing in start_xmit
Thursday, October 11, 2012, 12:14:54 PM, you wrote:

> On Thu, 2012-10-11 at 11:05 +0100, Eric Dumazet wrote:
>> On Thu, 2012-10-11 at 12:00 +0200, Sander Eikelenboom wrote:
>>
>> > Probably due to the BUG_ON from the patch below, i changed it into a WARN_ON.
>> > And i seem to hit it, but only in one of the guests at the moment and it triggers quite irregularly.
>>
>> xennet_make_frags() is able to split the skb->head in multiple page-size
>> chunks.
>>
>> It should do the same for fragments

> Right, I just want to be reproduce the issue so I can know I've fixed it
> properly ;-)

Trying to scp/sftp files from a guest seems to trigger it for me ..

> Ian.






Re: compound skb frag pages appearing in start_xmit
On 2012-10-11 18:14, Ian Campbell wrote:
> On Thu, 2012-10-11 at 11:05 +0100, Eric Dumazet wrote:
>> On Thu, 2012-10-11 at 12:00 +0200, Sander Eikelenboom wrote:
>>
>>> Probably due to the BUG_ON from the patch below, i changed it into a WARN_ON.
>>> And i seem to hit it, but only in one of the guests at the moment and it triggers quite irregularly.
>> xennet_make_frags() is able to split the skb->head in multiple page-size
>> chunks.
>>
>> It should do the same for fragments
> Right, I just want to be reproduce the issue so I can know I've fixed it
> properly ;-)
Hi Ian,

I can reproduce this BUG_ON when running a netperf/netserver test between
two domUs running on the same dom0. The domUs and dom0 all use v3.7-rc1.

When I tried to rebase my persistent grant netfront/netback patch onto the
latest kernel, the netperf/netserver test never succeeded. Some testing
showed that v3.6-rc7 works fine, but v3.7-rc1, v3.7-rc2 and v3.7-rc4 do
not pass the netperf/netserver test. So for now I keep my persistent grant
patch based on v3.4-rc3 only.

Konrad suspected commit 6a8ed462f16b8455eec5ae00eb6014159a6721f0 in
v3.7-rc1 and suggested that I test your debug patch in netfront. The
BUG_ON triggers soon after the netperf/netserver test case starts.

Thanks
Annie
>
> Ian.
>
>
>
>

Re: compound skb frag pages appearing in start_xmit
Thursday, November 15, 2012, 3:31:42 AM, you wrote:

> On 2012-10-11 18:14, Ian Campbell wrote:
>> On Thu, 2012-10-11 at 11:05 +0100, Eric Dumazet wrote:
>>> On Thu, 2012-10-11 at 12:00 +0200, Sander Eikelenboom wrote:
>>>
>>>> Probably due to the BUG_ON from the patch below, i changed it into a WARN_ON.
>>>> And i seem to hit it, but only in one of the guests at the moment and it triggers quite irregularly.
>>> xennet_make_frags() is able to split the skb->head in multiple page-size
>>> chunks.
>>>
>>> It should do the same for fragments
>> Right, I just want to be reproduce the issue so I can know I've fixed it
>> properly ;-)
> Hi Ian,

> I can reproduce this BUG_ON when running netperf/netserver test between
> two domus running on the same dom0. The domu and dom0 all use v3.7-rc1.

> When I tried to rebase my persistent grant netfront/netback patch on
> latest kernel, netperf/netserver test never succeeded. I did some test
> to find out that v3.6-rc7 works fine, but v3.7-rc1, v3.7-rc2 and
> v3.7-rc4 does not succeed in netperf/netserver test. So I keep my
> persistent grant patch only based on v3.4-rc3 now.

> Konrad thought about commit 6a8ed462f16b8455eec5ae00eb6014159a6721f0 in
> v3.7-rc1, and suggested me to test your debug patch in netfront. This
> BUG_ON happens soon after running the netperf/netserver test case.

> Thanks
> Annie

Is there any progress on this bug? rc6 is out the door, so the release of 3.7-final seems imminent, and this bug completely cripples networking with guests.

--
Sander

>>
>> Ian.
>>
>>
>>
>>



Re: compound skb frag pages appearing in start_xmit
On 19.11.2012 16:43, Sander Eikelenboom wrote:
>
> Thursday, November 15, 2012, 3:31:42 AM, you wrote:
>
>> On 2012-10-11 18:14, Ian Campbell wrote:
>>> On Thu, 2012-10-11 at 11:05 +0100, Eric Dumazet wrote:
>>>> On Thu, 2012-10-11 at 12:00 +0200, Sander Eikelenboom wrote:
>>>>
>>>>> Probably due to the BUG_ON from the patch below, i changed it into a WARN_ON.
>>>>> And i seem to hit it, but only in one of the guests at the moment and it triggers quite irregularly.
>>>> xennet_make_frags() is able to split the skb->head in multiple page-size
>>>> chunks.
>>>>
>>>> It should do the same for fragments
>>> Right, I just want to be reproduce the issue so I can know I've fixed it
>>> properly ;-)
>> Hi Ian,
>
>> I can reproduce this BUG_ON when running netperf/netserver test between
>> two domus running on the same dom0. The domu and dom0 all use v3.7-rc1.
>
>> When I tried to rebase my persistent grant netfront/netback patch on
>> latest kernel, netperf/netserver test never succeeded. I did some test
>> to find out that v3.6-rc7 works fine, but v3.7-rc1, v3.7-rc2 and
>> v3.7-rc4 does not succeed in netperf/netserver test. So I keep my
>> persistent grant patch only based on v3.4-rc3 now.
>
>> Konrad thought about commit 6a8ed462f16b8455eec5ae00eb6014159a6721f0 in
>> v3.7-rc1, and suggested me to test your debug patch in netfront. This
>> BUG_ON happens soon after running the netperf/netserver test case.
>
>> Thanks
>> Annie
>
> Is there any progression with this bug (rc6 is out the door, so the release of 3.7-final seems to be eminent and this bug completely cripples any networking with guests) ?
>

+1 on that. Yesterday I was testing with a PVM domU with one VCPU running 3.7-rc5 on
Xen 4.2 (the problem has also been reported from EC2 running Xen 3.4.3). I can actually
trigger it just by ssh'ing into the domU (from another machine) and then running
"find /". Output starts to stutter and then stops completely. When this happens,
a new connection can still be made, and as long as only shorter output is
generated the ssh connection is OK. From a dump I took, it looks like user space
is waiting in some select call (without a WARN_ON I probably won't see the tx path).

-Stefan
Re: compound skb frag pages appearing in start_xmit
On Tue, 2012-11-20 at 08:30 +0000, Stefan Bader wrote:
> >> When I tried to rebase my persistent grant netfront/netback patch on
> >> latest kernel, netperf/netserver test never succeeded. I did some test
> >> to find out that v3.6-rc7 works fine, but v3.7-rc1, v3.7-rc2 and
> >> v3.7-rc4 does not succeed in netperf/netserver test. So I keep my
> >> persistent grant patch only based on v3.4-rc3 now.
> >
> >> Konrad thought about commit 6a8ed462f16b8455eec5ae00eb6014159a6721f0 in
> >> v3.7-rc1, and suggested me to test your debug patch in netfront. This
> >> BUG_ON happens soon after running the netperf/netserver test case.
> >
> >> Thanks
> >> Annie
> >
> > Is there any progression with this bug (rc6 is out the door, so the
> release of 3.7-final seems to be eminent and this bug completely
> cripples any networking with guests) ?
> >
>
> +1 on that. I was testing yesterday with a PVM domU running 3.7-rc5 on Xen 4.2
> (but also reported from EC2 running Xen 3.4.3) c with one VCPU. I actually can
> trigger it by just ssh'ing into the domU (from another machine) and then run
> "find /". Output starts to stutter and then stops completely. When this happens
> a new connection still can be made and as long as only shorter output is
> generated the ssh connection is ok. From a dump taken it looks like user-space
> is waiting in some select call (without any warnon I rather won't see the tx path).

Annie, are you still looking into this or shall I?

Ian.


