Mailing List Archive

RE: RE: Rather slow time of Pin in Windows with GPL PVdriver
>
> You have to be careful here. Xen will only ever deliver the evtchn
> interrupt to VCPU0. I can't immediately see anything preventing an
> HVM domain trying to bind an evtchn to another VCPU, but you can see
> from the code in hvm_assert_evtchn_irq() that the guest will only be
> kicked for events bound to VCPU0 (is_hvm_pv_evtchn_vcpu() will only
> be true for Linux PVonHVM domains). Thus if you bind your DPC to a
> CPU other than zero and don't set it to HighImportance then it will
> not be immediately scheduled, since the default DPC importance is
> MediumImportance.
>

Are you sure? That's not what I remember seeing. You always have to
query shared_info_area->vcpu_info[0], not
shared_info_area->vcpu_info[vcpu], but the actual VCPU the interrupt is
scheduled onto can be any of them.

James

RE: RE: Rather slow time of Pin in Windows with GPL PVdriver [ In reply to ]
Yeah, you're right. We have a patch in XenServer to just use the lowest numbered vCPU but in unstable it still pointlessly round-robins. Thus, if you bind DPCs and don't raise their importance you will end up with them not being immediately scheduled quite a lot of the time.

Paul

> -----Original Message-----
> From: James Harper [mailto:james.harper@bendigoit.com.au]
> Sent: 10 March 2011 09:30
> To: Paul Durrant; MaoXiaoyun
> Cc: xen devel
> Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with
> GPL PVdriver
>
> >
> > You have to be careful here. Xen will only ever deliver the evtchn
> > interrupt to VCPU0. I can't immediately see anything preventing an
> > HVM domain trying to bind an evtchn to another VCPU, but you can
> > see from the code in hvm_assert_evtchn_irq() that the guest will
> > only be kicked for events bound to VCPU0 (is_hvm_pv_evtchn_vcpu()
> > will only be true for Linux PVonHVM domains). Thus if you bind your
> > DPC to a CPU other than zero and don't set it to HighImportance
> > then it will not be immediately scheduled, since the default DPC
> > importance is MediumImportance.
> >
>
> Are you sure? That's not what I remember seeing. You always have to
> query shared_info_area->vcpu_info[0], not
> shared_info_area->vcpu_info[vcpu], but the actual VCPU the interrupt
> is scheduled onto can be any of them.
>
> James

RE: RE: Rather slow time of Pin in Windows with GPL PVdriver [ In reply to ]
>
> Yeah, you're right. We have a patch in XenServer to just use the
> lowest numbered vCPU but in unstable it still pointlessly
> round-robins. Thus, if you bind DPCs and don't raise their importance
> you will end up with them not being immediately scheduled quite a lot
> of the time.
>

You say "pointlessly round robins"... why is the behaviour considered
pointless? (assuming you don't use bound DPCs)

I'm looking at my networking code, and if I could schedule DPCs on
processors on a round-robin basis (e.g. because the IRQs are submitted
on a round-robin basis), one CPU could grab the rx ring lock, pull the
data off the ring into local buffers, release the lock, then process
the local buffers (build packets, submit to NDIS, etc.). While the
first CPU is processing packets, another CPU can then start servicing
the ring too.
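
A rough sketch of that split (the xi->rx_lock, PullResponseFromRing and
IndicatePacketToNdis names are stand-ins, not the real driver's):

/* Sketch only (assumes <ntddk.h>); runs in the rx DPC at
 * DISPATCH_LEVEL, hence the AtDpcLevel spinlock variants.  The ring is
 * drained into a local list under the lock, and the expensive work
 * (building packets, indicating them to NDIS) happens after the lock
 * is dropped, so another CPU can start servicing the ring meanwhile. */
struct xennet_info;                                        /* opaque here */
PLIST_ENTRY PullResponseFromRing(struct xennet_info *xi);  /* hypothetical */
VOID IndicatePacketToNdis(struct xennet_info *xi, PLIST_ENTRY e); /* hypothetical */

static VOID RxDpcBody(struct xennet_info *xi, PKSPIN_LOCK rx_lock)
{
    LIST_ENTRY local;
    PLIST_ENTRY entry;

    InitializeListHead(&local);

    KeAcquireSpinLockAtDpcLevel(rx_lock);
    while ((entry = PullResponseFromRing(xi)) != NULL)
        InsertTailList(&local, entry);
    KeReleaseSpinLockFromDpcLevel(rx_lock);

    /* lock no longer held: build and indicate packets */
    while (!IsListEmpty(&local)) {
        entry = RemoveHeadList(&local);
        IndicatePacketToNdis(xi, entry);
    }
}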

If Xen is changed to always send the IRQ to CPU zero then I'd have to
start round-robining DPCs myself if I wanted to do it that way...

Currently I'm suffering a bit from the small ring sizes not being able
to hold enough buffers to keep packets flowing quickly in all
situations.

James

RE: RE: Rather slow time of Pin in Windows with GPL PVdriver [ In reply to ]
It's kind of pointless because you're always having to go to vCPU0's shared info for the event info, so you're just going to keep ping-ponging it between caches all the time. The same holds true of data you access in your DPC if it's constantly moving around. Better IMO to keep locality by default and distribute DPCs accessing distinct data explicitly.
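
A minimal sketch of that explicit distribution, binding each queue's
DPC to a fixed CPU at init time (the rx_queue structure and num_cpus
parameter are illustrative, not from any particular driver):

/* Sketch only (assumes <ntddk.h> and that each queue's DPC has already
 * been set up with KeInitializeDpc).  Pinning a DPC to one CPU keeps
 * the data that DPC touches warm in a single cache instead of bouncing
 * around with a round-robin interrupt. */
struct rx_queue {
    KDPC dpc;
    /* ... per-queue data ... */
};

static VOID BindQueueDpcs(struct rx_queue *queues, int num_queues,
                          CCHAR num_cpus)
{
    int i;

    for (i = 0; i < num_queues; i++)
        KeSetTargetProcessorDpc(&queues[i].dpc, (CCHAR)(i % num_cpus));
}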

Paul

> -----Original Message-----
> From: James Harper [mailto:james.harper@bendigoit.com.au]
> Sent: 10 March 2011 10:41
> To: Paul Durrant; MaoXiaoyun
> Cc: xen devel
> Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with
> GPL PVdriver
>
> >
> > Yeah, you're right. We have a patch in XenServer to just use the
> > lowest numbered vCPU but in unstable it still pointlessly
> > round-robins. Thus, if you bind DPCs and don't raise their
> > importance you will end up with them not being immediately
> > scheduled quite a lot of the time.
> >
>
> You say "pointlessly round-robins"... why is the behaviour
> considered pointless? (assuming you don't use bound DPCs)
>
> I'm looking at my networking code, and if I could schedule DPCs on
> processors on a round-robin basis (e.g. because the IRQs are
> submitted on a round-robin basis), one CPU could grab the rx ring
> lock, pull the data off the ring into local buffers, release the
> lock, then process the local buffers (build packets, submit to
> NDIS, etc.). While the first CPU is processing packets, another CPU
> can then start servicing the ring too.
>
> If Xen is changed to always send the IRQ to CPU zero then I'd have
> to start round-robining DPCs myself if I wanted to do it that
> way...
>
> Currently I'm suffering a bit from the small ring sizes not being
> able to hold enough buffers to keep packets flowing quickly in all
> situations.
>
> James

Re: RE: Rather slow time of Pin in Windows with GPL PVdriver [ In reply to ]
On Thu, Mar 10, 2011 at 11:05:56AM +0000, Paul Durrant wrote:
> It's kind of pointless because you're always having to go to vCPU0's shared info for the event info, so you're just going to keep ping-ponging it between caches all the time. The same holds true of data you access in your DPC if it's constantly moving around. Better IMO to keep locality by default and distribute DPCs accessing distinct data explicitly.
>

Should this patch be upstreamed then?

-- Pasi

> Paul
>
> > -----Original Message-----
> > From: James Harper [mailto:james.harper@bendigoit.com.au]
> > Sent: 10 March 2011 10:41
> > To: Paul Durrant; MaoXiaoyun
> > Cc: xen devel
> > Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with
> > GPL PVdriver
> >
> > >
> > > Yeah, you're right. We have a patch in XenServer to just use the
> > > lowest numbered vCPU but in unstable it still pointlessly
> > > round-robins. Thus, if you bind DPCs and don't raise their
> > > importance you will end up with them not being immediately
> > > scheduled quite a lot of the time.
> > >
> >
> > You say "pointlessly round-robins"... why is the behaviour
> > considered pointless? (assuming you don't use bound DPCs)
> >
> > I'm looking at my networking code, and if I could schedule DPCs on
> > processors on a round-robin basis (e.g. because the IRQs are
> > submitted on a round-robin basis), one CPU could grab the rx ring
> > lock, pull the data off the ring into local buffers, release the
> > lock, then process the local buffers (build packets, submit to
> > NDIS, etc.). While the first CPU is processing packets, another
> > CPU can then start servicing the ring too.
> >
> > If Xen is changed to always send the IRQ to CPU zero then I'd have
> > to start round-robining DPCs myself if I wanted to do it that
> > way...
> >
> > Currently I'm suffering a bit from the small ring sizes not being
> > able to hold enough buffers to keep packets flowing quickly in all
> > situations.
> >
> > James

RE: RE: Rather slow time of Pin in Windows with GPL PVdriver [ In reply to ]
Hi Paul:

Sorry, I don't fully follow your point.
One quick question: when you mention "pointless round robin", which
piece of code are you referring to?

thanks.

> From: Paul.Durrant@citrix.com
> To: james.harper@bendigoit.com.au; tinnycloud@hotmail.com
> CC: xen-devel@lists.xensource.com
> Date: Thu, 10 Mar 2011 11:05:56 +0000
> Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with GPL PVdriver
>
> It's kind of pointless because you're always having to go to vCPU0's shared info for the event info, so you're just going to keep ping-ponging it between caches all the time. The same holds true of data you access in your DPC if it's constantly moving around. Better IMO to keep locality by default and distribute DPCs accessing distinct data explicitly.
>
> Paul
>
> > -----Original Message-----
> > From: James Harper [mailto:james.harper@bendigoit.com.au]
> > Sent: 10 March 2011 10:41
> > To: Paul Durrant; MaoXiaoyun
> > Cc: xen devel
> > Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with
> > GPL PVdriver
> >
> > >
> > > Yeah, you're right. We have a patch in XenServer to just use the
> > > lowest numbered vCPU but in unstable it still pointlessly
> > > round-robins. Thus, if you bind DPCs and don't raise their
> > > importance you will end up with them not being immediately
> > > scheduled quite a lot of the time.
> > >
> >
> > You say "pointlessly round-robins"... why is the behaviour
> > considered pointless? (assuming you don't use bound DPCs)
> >
> > I'm looking at my networking code, and if I could schedule DPCs on
> > processors on a round-robin basis (e.g. because the IRQs are
> > submitted on a round-robin basis), one CPU could grab the rx ring
> > lock, pull the data off the ring into local buffers, release the
> > lock, then process the local buffers (build packets, submit to
> > NDIS, etc.). While the first CPU is processing packets, another
> > CPU can then start servicing the ring too.
> >
> > If Xen is changed to always send the IRQ to CPU zero then I'd have
> > to start round-robining DPCs myself if I wanted to do it that
> > way...
> >
> > Currently I'm suffering a bit from the small ring sizes not being
> > able to hold enough buffers to keep packets flowing quickly in all
> > situations.
> >
> > James
RE: RE: Rather slow time of Pin in Windows with GPL PVdriver [ In reply to ]
I did post a patch ages ago. It was deemed a bit too hacky. I think it would probably be better to re-examine the way Windows PV drivers are handling interrupts. It would be much nicer if we could properly bind event channels across all our vCPUs; we may be able to leverage what Stefano did for Linux PV-on-HVM.

Paul

> -----Original Message-----
> From: Pasi Kärkkäinen [mailto:pasik@iki.fi]
> Sent: 10 March 2011 18:23
> To: Paul Durrant
> Cc: James Harper; MaoXiaoyun; xen devel
> Subject: Re: [Xen-devel] RE: Rather slow time of Pin in Windows with
> GPL PVdriver
>
> On Thu, Mar 10, 2011 at 11:05:56AM +0000, Paul Durrant wrote:
> > It's kind of pointless because you're always having to go to
> > vCPU0's shared info for the event info, so you're just going to
> > keep ping-ponging it between caches all the time. The same holds
> > true of data you access in your DPC if it's constantly moving
> > around. Better IMO to keep locality by default and distribute DPCs
> > accessing distinct data explicitly.
> >
>
> Should this patch be upstreamed then?
>
> -- Pasi
>
> > Paul
> >
> > > -----Original Message-----
> > > From: James Harper [mailto:james.harper@bendigoit.com.au]
> > > Sent: 10 March 2011 10:41
> > > To: Paul Durrant; MaoXiaoyun
> > > Cc: xen devel
> > > Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows
> > > with GPL PVdriver
> > >
> > > >
> > > > Yeah, you're right. We have a patch in XenServer to just use
> > > > the lowest numbered vCPU but in unstable it still pointlessly
> > > > round-robins. Thus, if you bind DPCs and don't raise their
> > > > importance you will end up with them not being immediately
> > > > scheduled quite a lot of the time.
> > > >
> > >
> > > You say "pointlessly round-robins"... why is the behaviour
> > > considered pointless? (assuming you don't use bound DPCs)
> > >
> > > I'm looking at my networking code, and if I could schedule DPCs
> > > on processors on a round-robin basis (e.g. because the IRQs are
> > > submitted on a round-robin basis), one CPU could grab the rx ring
> > > lock, pull the data off the ring into local buffers, release the
> > > lock, then process the local buffers (build packets, submit to
> > > NDIS, etc.). While the first CPU is processing packets, another
> > > CPU can then start servicing the ring too.
> > >
> > > If Xen is changed to always send the IRQ to CPU zero then I'd
> > > have to start round-robining DPCs myself if I wanted to do it
> > > that way...
> > >
> > > Currently I'm suffering a bit from the small ring sizes not being
> > > able to hold enough buffers to keep packets flowing quickly in
> > > all situations.
> > >
> > > James

RE: RE: Rather slow time of Pin in Windows with GPL PVdriver [ In reply to ]
I've just pushed a bit of a rewrite of the rx path in gplpv. It's not
particularly well tested yet, but I can't get it to crash. It should
scale much better with SMP too. I'm using more lock-free data
structures, so the locks are held for much less time.
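
As one hypothetical example of such a lock-free structure, a free
rx-buffer pool can sit on an interlocked SLIST so buffers are recycled
from any CPU without taking the ring lock (the RX_BUFFER layout here is
illustrative only, not the actual gplpv code):

/* Sketch only (assumes <ntddk.h>).  SLIST_ENTRY must be the first
 * field and, on x64, the structure must be 16-byte aligned. */
typedef struct DECLSPEC_ALIGN(16) _RX_BUFFER {
    SLIST_ENTRY entry;
    UCHAR data[PAGE_SIZE];
} RX_BUFFER;

static SLIST_HEADER rx_buffer_pool;

VOID RxBufferPoolInit(VOID)
{
    InitializeSListHead(&rx_buffer_pool);
}

VOID RxBufferFree(RX_BUFFER *buf)
{
    /* lock-free push: safe from any CPU at IRQL <= DISPATCH_LEVEL */
    InterlockedPushEntrySList(&rx_buffer_pool, &buf->entry);
}

RX_BUFFER *RxBufferAlloc(VOID)
{
    PSLIST_ENTRY e = InterlockedPopEntrySList(&rx_buffer_pool);
    return e ? CONTAINING_RECORD(e, RX_BUFFER, entry) : NULL;
}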

James

> -----Original Message-----
> From: MaoXiaoyun [mailto:tinnycloud@hotmail.com]
> Sent: Friday, 11 March 2011 16:10
> To: paul.durrant@citrix.com; James Harper
> Cc: xen devel
> Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with
> GPL PVdriver
>
> Hi Paul:
>
> Sorry, I don't fully follow your point.
> One quick question: when you mention "pointless round robin", which
> piece of code are you referring to?
>
> thanks.
>
> > From: Paul.Durrant@citrix.com
> > To: james.harper@bendigoit.com.au; tinnycloud@hotmail.com
> > CC: xen-devel@lists.xensource.com
> > Date: Thu, 10 Mar 2011 11:05:56 +0000
> > Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows
> > with GPL PVdriver
> >
> > It's kind of pointless because you're always having to go to
> > vCPU0's shared info for the event info, so you're just going to
> > keep ping-ponging it between caches all the time. The same holds
> > true of data you access in your DPC if it's constantly moving
> > around. Better IMO to keep locality by default and distribute DPCs
> > accessing distinct data explicitly.
> >
> > Paul
> >
> > > -----Original Message-----
> > > From: James Harper [mailto:james.harper@bendigoit.com.au]
> > > Sent: 10 March 2011 10:41
> > > To: Paul Durrant; MaoXiaoyun
> > > Cc: xen devel
> > > Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows
> > > with GPL PVdriver
> > >
> > > >
> > > > Yeah, you're right. We have a patch in XenServer to just use
> > > > the lowest numbered vCPU but in unstable it still pointlessly
> > > > round-robins. Thus, if you bind DPCs and don't raise their
> > > > importance you will end up with them not being immediately
> > > > scheduled quite a lot of the time.
> > > >
> > >
> > > You say "pointlessly round-robins"... why is the behaviour
> > > considered pointless? (assuming you don't use bound DPCs)
> > >
> > > I'm looking at my networking code, and if I could schedule DPCs
> > > on processors on a round-robin basis (e.g. because the IRQs are
> > > submitted on a round-robin basis), one CPU could grab the rx ring
> > > lock, pull the data off the ring into local buffers, release the
> > > lock, then process the local buffers (build packets, submit to
> > > NDIS, etc.). While the first CPU is processing packets, another
> > > CPU can then start servicing the ring too.
> > >
> > > If Xen is changed to always send the IRQ to CPU zero then I'd
> > > have to start round-robining DPCs myself if I wanted to do it
> > > that way...
> > >
> > > Currently I'm suffering a bit from the small ring sizes not being
> > > able to hold enough buffers to keep packets flowing quickly in
> > > all situations.
> > >
> > > James


RE: RE: Rather slow time of Pin in Windows with GPL PVdriver [ In reply to ]
>
> I've just pushed a bit of a rewrite of the rx path in gplpv. It's not
> particularly well tested yet, but I can't get it to crash. It should
> scale much better with SMP too. I'm using more lock-free data
> structures, so the locks are held for much less time.
>

Unfortunately performance still isn't good. What I've found is that
NDIS really does want you to only process packets on one CPU at a time
(e.g. CPU0), otherwise they are indicated to NDIS out of order, causing
serious performance problems (according to the docs).

In addition to KeSetTargetProcessorDpc(&xi->rx_dpc, 0), we also need to
do KeSetImportanceDpc(&xi->rx_dpc, HighImportance) - as Paul stated -
which makes sure the DPC runs immediately even if it is triggered from
another CPU (I assume this has IPI overhead though). I think I could
detect >1 CPU and schedule the rx and tx DPCs onto different CPUs from
each other, but always the same CPU each time.
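
At DPC-initialisation time that combination looks roughly like the
sketch below (the xi->rx_dpc and XennetRxDpcRoutine names follow the
naming used in this thread and are illustrative):

/* Sketch only (assumes <ntddk.h>). */
KDEFERRED_ROUTINE XennetRxDpcRoutine;      /* the driver's rx DPC routine */

static VOID XennetSetupRxDpc(struct xennet_info *xi)
{
    KeInitializeDpc(&xi->rx_dpc, XennetRxDpcRoutine, xi);
    /* always run rx processing on CPU 0 so packets are indicated to
     * NDIS in order */
    KeSetTargetProcessorDpc(&xi->rx_dpc, 0);
    /* HighImportance makes a targeted DPC run immediately even when it
     * is queued from another CPU (at the cost of an IPI), instead of
     * waiting for the target CPU's next DPC drain */
    KeSetImportanceDpc(&xi->rx_dpc, HighImportance);
}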

Windows does support RSS, which ensures per-connection in-order
processing of packets. From reading the "Receive-Side Scaling
Enhancements in Windows Server 2008" document, it appears that we would
need to hash various fields in the packet header and compute a CPU
number for that connection, then schedule the DPC onto that CPU. It
shouldn't be that hard, except that xennet.sys is an NDIS 5.1 driver,
not an NDIS 6.0 driver, and in order to support NDIS 6.0 I would need
to maintain two trees, which I'm reluctant to do without a very good
reason. Other docs state that RSS is supported for Windows 2003 SP2 but
I can't find any specifics - I've asked the question on the ntdev list.
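
The per-connection CPU selection would look something like the sketch
below; real RSS uses the Toeplitz hash with a host-supplied key and an
indirection table, so the simple XOR hash here is only a stand-in to
show the shape (num_cpus and the per-CPU DPC array are hypothetical):

/* Sketch only: hash the connection 4-tuple and map it to a CPU so that
 * all packets for one connection are processed, in order, on the same
 * CPU. */
static CCHAR PickRxCpu(ULONG src_ip, ULONG dst_ip,
                       USHORT src_port, USHORT dst_port, CCHAR num_cpus)
{
    ULONG h = src_ip ^ dst_ip ^ ((ULONG)src_port << 16) ^ dst_port;

    h ^= h >> 16;
    h ^= h >> 8;
    return (CCHAR)(h % (ULONG)num_cpus);
}

/* ... then per received packet, something like:
 *   cpu = PickRxCpu(ip->src, ip->dst, tcp->src_port, tcp->dst_port, n);
 *   KeInsertQueueDpc(&xi->rx_dpc[cpu], NULL, NULL);
 * where each xi->rx_dpc[cpu] was bound to its CPU at init time with
 * KeSetTargetProcessorDpc(). */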

James

RE: RE: Rather slow time of Pin in Windows with GPL PVdriver [ In reply to ]
Do you mean that if we discard KeSetTargetProcessorDpc(&xi->rx_dpc, 0),
the interrupts will be processed across different VCPUs, but that will
cause a serious performance issue?
Where could I find the related docs?

So actually we need to do KeSetImportanceDpc(&xi->rx_dpc,
HighImportance) to solve the ping problem. Though performance may not
be the best, it should not decrease, right?

many thanks.

> Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with GPL PVdriver
> Date: Mon, 14 Mar 2011 11:45:46 +1100
> From: james.harper@bendigoit.com.au
> To: tinnycloud@hotmail.com; paul.durrant@citrix.com
> CC: xen-devel@lists.xensource.com
>
> >
> > I've just pushed a bit of a rewrite of the rx path in gplpv. It's
> > not particularly well tested yet, but I can't get it to crash. It
> > should scale much better with SMP too. I'm using more lock-free
> > data structures, so the locks are held for much less time.
> >
>
> Unfortunately performance still isn't good. What I've found is that
> NDIS really does want you to only process packets on one CPU at a
> time (e.g. CPU0), otherwise they are indicated to NDIS out of order,
> causing serious performance problems (according to the docs).
>
> In addition to KeSetTargetProcessorDpc(&xi->rx_dpc, 0), we also need
> to do KeSetImportanceDpc(&xi->rx_dpc, HighImportance) - as Paul
> stated - which makes sure the DPC runs immediately even if it is
> triggered from another CPU (I assume this has IPI overhead though).
> I think I could detect >1 CPU and schedule the rx and tx DPCs onto
> different CPUs from each other, but always the same CPU each time.
>
> Windows does support RSS, which ensures per-connection in-order
> processing of packets. From reading the "Receive-Side Scaling
> Enhancements in Windows Server 2008" document, it appears that we
> would need to hash various fields in the packet header and compute a
> CPU number for that connection, then schedule the DPC onto that CPU.
> It shouldn't be that hard, except that xennet.sys is an NDIS 5.1
> driver, not an NDIS 6.0 driver, and in order to support NDIS 6.0 I
> would need to maintain two trees, which I'm reluctant to do without
> a very good reason. Other docs state that RSS is supported for
> Windows 2003 SP2 but I can't find any specifics - I've asked the
> question on the ntdev list.
>
> James
RE: RE: Rather slow time of Ping in Windows with GPL PVdriver [ In reply to ]
>
> Do you mean that if we discard KeSetTargetProcessorDpc(&xi->rx_dpc,
> 0), the interrupts will be processed across different VCPUs, but that
> will cause a serious performance issue?
> Where could I find the related docs?
>
> So actually we need to do KeSetImportanceDpc(&xi->rx_dpc,
> HighImportance) to solve the ping problem. Though performance may not
> be the best, it should not decrease, right?
>

In my testing, without the KeSetTargetProcessorDpc, iperf would give
inconsistent results, which I assume is because packets were being
delivered to NDIS out of order.

KeSetImportanceDpc(HighImportance) should resolve the 15ms response
time you were seeing, as the DPC will be immediately scheduled on the
other processor, rather than scheduled some time later.

James

RE: RE: Rather slow time of Ping in Windows with GPL PVdriver [ In reply to ]
Thanks James.

I will do some iperf tests as well.

One more question:
Is "Xen will only ever deliver the evtchn interrupt to VCPU0", as
mentioned by Paul, right?
If so, how do I explain the log I printed before?
It looks like all VCPUs have got the packets.

===============Result================================
1) KeSetTargetProcessorDpc(&xi->rx_dpc, 0) is commented out.
XnetNet pcpu = 1
XnetNet pcpu = 3
XnetNet pcpu = 2
XnetNet pcpu = 3
XnetNet pcpu = 7
XnetNet pcpu = 0
XnetNet pcpu = 5
XnetNet pcpu = 3
XnetNet pcpu = 0
XnetNet pcpu = 3
XnetNet pcpu = 7
XnetNet pcpu = 4
XnetNet pcpu = 5
XnetNet pcpu = 2
XnetNet pcpu = 4
XnetNet pcpu = 5
XnetNet pcpu = 6
XnetNet pcpu = 0
XnetNet pcpu = 6

> Subject: RE: [Xen-devel] RE: Rather slow time of Ping in Windows with GPL PVdriver
> Date: Mon, 14 Mar 2011 14:10:46 +1100
> From: james.harper@bendigoit.com.au
> To: tinnycloud@hotmail.com; paul.durrant@citrix.com
> CC: xen-devel@lists.xensource.com
>
> >
> > Do you mean that if we discard KeSetTargetProcessorDpc(&xi->rx_dpc,
> > 0), the interrupts will be processed across different VCPUs, but
> > that will cause a serious performance issue?
> > Where could I find the related docs?
> >
> > So actually we need to do KeSetImportanceDpc(&xi->rx_dpc,
> > HighImportance) to solve the ping problem. Though performance may
> > not be the best, it should not decrease, right?
> >
>
> In my testing, without the KeSetTargetProcessorDpc, iperf would give
> inconsistent results, which I assume is because packets were being
> delivered to NDIS out of order.
>
> KeSetImportanceDpc(HighImportance) should resolve the 15ms response
> time you were seeing, as the DPC will be immediately scheduled on the
> other processor, rather than scheduled some time later.
>
> James
RE: RE: Rather slow time of Ping in Windows with GPL PVdriver [ In reply to ]
> Thanks James.
>
> I will do some iperf tests as well.
>
> One more question:
> Is "Xen will only ever deliver the evtchn interrupt to VCPU0", as
> mentioned by Paul, right?

I think he later mentioned that the feature he referred to wasn't in
the version we were using, just the Citrix version.

James

RE: RE: Rather slow time of Pin in Windows with GPL PVdriver [ In reply to ]
NDIS 5.x on Vista+ has some serious issues: see http://www.osronline.com/showThread.cfm?link=124242

This probably doesn't explain an immediate performance issue though. RSS is supported on Windows 2k3 SP2 IIRC but you need to bind as NDIS 5.2. I don't think it's present in the 6.x -> 5.x wrapper in Vista+ though. You'd need to use NDIS 6.1+.

Paul

> -----Original Message-----
> From: James Harper [mailto:james.harper@bendigoit.com.au]
> Sent: 14 March 2011 00:46
> To: MaoXiaoyun; Paul Durrant
> Cc: xen devel
> Subject: RE: [Xen-devel] RE: Rather slow time of Pin in Windows with
> GPL PVdriver
>
> >
> > I've just pushed a bit of a rewrite of the rx path in gplpv. It's
> > not particularly well tested yet, but I can't get it to crash. It
> > should scale much better with SMP too. I'm using more lock-free
> > data structures, so the locks are held for much less time.
> >
>
> Unfortunately performance still isn't good. What I've found is that
> NDIS really does want you to only process packets on one CPU at a
> time (e.g. CPU0), otherwise they are indicated to NDIS out of order,
> causing serious performance problems (according to the docs).
>
> In addition to KeSetTargetProcessorDpc(&xi->rx_dpc, 0), we also need
> to do KeSetImportanceDpc(&xi->rx_dpc, HighImportance) - as Paul
> stated - which makes sure the DPC runs immediately even if it is
> triggered from another CPU (I assume this has IPI overhead though).
> I think I could detect >1 CPU and schedule the rx and tx DPCs onto
> different CPUs from each other, but always the same CPU each time.
>
> Windows does support RSS, which ensures per-connection in-order
> processing of packets. From reading the "Receive-Side Scaling
> Enhancements in Windows Server 2008" document, it appears that we
> would need to hash various fields in the packet header and compute a
> CPU number for that connection, then schedule the DPC onto that CPU.
> It shouldn't be that hard, except that xennet.sys is an NDIS 5.1
> driver, not an NDIS 6.0 driver, and in order to support NDIS 6.0 I
> would need to maintain two trees, which I'm reluctant to do without
> a very good reason. Other docs state that RSS is supported for
> Windows 2003 SP2 but I can't find any specifics - I've asked the
> question on the ntdev list.
>
> James

RE: RE: Rather slow time of Ping in Windows with GPL PVdriver [ In reply to ]
No, as James said, the interrupt targeting patch is only in Citrix XenServer.

Paul

From: MaoXiaoyun [mailto:tinnycloud@hotmail.com]
Sent: 14 March 2011 03:49
To: james.harper@bendigoit.com.au; Paul Durrant
Cc: xen devel
Subject: RE: [Xen-devel] RE: Rather slow time of Ping in Windows with GPL PVdriver

Thanks James.

I will do some iperf tests as well.

One more question:
Is "Xen will only ever deliver the evtchn interrupt to VCPU0", as mentioned by Paul, right?
If so, how do I explain the log I printed before?
It looks like all VCPUs have got the packets.

===============Result================================
1) KeSetTargetProcessorDpc(&xi->rx_dpc, 0) is commented out.
XnetNet pcpu = 1
XnetNet pcpu = 3
XnetNet pcpu = 2
XnetNet pcpu = 3
XnetNet pcpu = 7
XnetNet pcpu = 0
XnetNet pcpu = 5
XnetNet pcpu = 3
XnetNet pcpu = 0
XnetNet pcpu = 3
XnetNet pcpu = 7
XnetNet pcpu = 4
XnetNet pcpu = 5
XnetNet pcpu = 2
XnetNet pcpu = 4
XnetNet pcpu = 5
XnetNet pcpu = 6
XnetNet pcpu = 0
XnetNet pcpu = 6



> Subject: RE: [Xen-devel] RE: Rather slow time of Ping in Windows with GPL PVdriver
> Date: Mon, 14 Mar 2011 14:10:46 +1100
> From: james.harper@bendigoit.com.au
> To: tinnycloud@hotmail.com; paul.durrant@citrix.com
> CC: xen-devel@lists.xensource.com
>
> >
> > Do you mean that if we discard KeSetTargetProcessorDpc(&xi->rx_dpc,
> > 0), the interrupts will be processed across different VCPUs, but
> > that will cause a serious performance issue?
> > Where could I find the related docs?
> >
> > So actually we need to do KeSetImportanceDpc(&xi->rx_dpc,
> > HighImportance) to solve the ping problem. Though performance may
> > not be the best, it should not decrease, right?
> >
>
> In my testing, without the KeSetTargetProcessorDpc, iperf would give
> inconsistent results, which I assume is because packets were being
> delivered to NDIS out of order.
>
> KeSetImportanceDpc(HighImportance) should resolve the 15ms response
> time you were seeing, as the DPC will be immediately scheduled on the
> other processor, rather than scheduled some time later.
>
> James
RE: RE: Rather slow time of Pin in Windows with GPL PVdriver [ In reply to ]
>
> NDIS 5.x on Vista+ has some serious issues: see
> http://www.osronline.com/showThread.cfm?link=124242
>

I was excited then, as someone has reported a problem with GPLPV where
the rx path appears to hang (due to running out of resources) after
some days, but unfortunately it's with 2003 not 2008, and it looks like
it's packets that it's running out of, not buffers. D'oh.

> This probably doesn't explain an immediate performance issue though.
> RSS is supported on Windows 2k3 SP2 IIRC but you need to bind as NDIS
> 5.2. I don't think it's present in the 6.x -> 5.x wrapper in Vista+
> though. You'd need to use NDIS 6.1+.
>

That kind of removes the attraction a bit. It sounds like 5.2 is a bit
of an orphan as it isn't mentioned anywhere but the SNP KB pages (in
overview form), and most of the links from there have suffered some
major bitrot and either redirect to NDIS6.2 or to 'page not found'.

James
