Mailing List Archive

Direct I/O to domU seeing a 30% performance hit
I was asked to test direct I/O to a PV domU. Since I had a system with
two NICs, I gave one to a domU and one to dom0. (Each is running the
same kernel: xen 3.0.3 x86_64.)

I'm running netperf from an outside system to the domU and dom0 and I am
seeing 30% less throughput for the domU vs dom0.

Is this to be expected? If so, why? If not, does anyone have a guess as
to what I might be doing wrong or what the issue might be?

Thanks,

John Byrne

RE: Direct I/O to domU seeing a 30% performance hit [ In reply to ]
> I was asked to test direct I/O to a PV domU. Since I had a system with
> two NICs, I gave one to a domU and one to dom0. (Each is running the
> same kernel: xen 3.0.3 x86_64.)
>
> I'm running netperf from an outside system to the domU and dom0 and I
> am seeing 30% less throughput for the domU vs dom0.

That doesn't make sense -- they should be identical. Is there other
stuff going on in the system that could be taking CPU time away from the
domU?

Ian


> Is this to be expected? If so, why? If not, does anyone have a guess
> as to what I might be doing wrong or what the issue might be?
>
> Thanks,
>
> John Byrne
>

Re: Direct I/O to domU seeing a 30% performance hit [ In reply to ]
There have been a couple of network receive throughput performance
regressions to domUs over time that were subsequently fixed. I think
one may have crept into 3.0.3.

Are you seeing any dropped packets on the vif associated
with your domU in your dom0? If so, propagating changeset
11861 from unstable may help:

changeset: 11861:637eace6d5c6
user: kfraser@localhost.localdomain
date: Mon Oct 23 11:20:37 2006 +0100
summary: [NET] back: Fix packet queuing so that packets are drained if the


In the past, we also had receive throughput issues to
domUs that were due to socket buffer size logic but those
were fixed a while ago.

Can you send netstat -i output from dom0?
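
For example, something along these lines should show any drops (the vif
name below is a guess -- it is normally vif<domid>.0, so substitute the
one belonging to your domU):

    # per-interface packet and drop counters in dom0
    netstat -i

    # error/drop counters for the domU's backend vif specifically
    ifconfig vif1.0 | grep -E 'RX|TX'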

Emmanuel.


On Mon, Nov 06, 2006 at 09:55:17PM -0800, John Byrne wrote:
>
> I was asked to test direct I/O to a PV domU. Since I had a system with
> two NICs, I gave one to a domU and one to dom0. (Each is running the same
> kernel: xen 3.0.3 x86_64.)
>
> I'm running netperf from an outside system to the domU and dom0 and I am
> seeing 30% less throughput for the domU vs dom0.
>
> Is this to be expected? If so, why? If not, does anyone have a guess as
> to what I might be doing wrong or what the issue might be?
>
> Thanks,
>
> John Byrne

RE: Direct I/O to domU seeing a 30% performance hit [ In reply to ]
> There have been a couple of network receive throughput
> performance regressions to domUs over time that were
> subsequently fixed. I think one may have crept in to 3.0.3.

The report was (I believe) with a NIC directly assigned to the domU, so
not using netfront/back at all.

John: please can you give more details on your config.

Ian

> Are you seeing any dropped packets on the vif associated with
> your domU in your dom0? If so, propagating changeset
> 11861 from unstable may help:
>
> changeset: 11861:637eace6d5c6
> user: kfraser@localhost.localdomain
> date: Mon Oct 23 11:20:37 2006 +0100
> summary: [NET] back: Fix packet queuing so that packets
> are drained if the
>
>
> In the past, we also had receive throughput issues to domUs
> that were due to socket buffer size logic but those were
> fixed a while ago.
>
> Can you send netstat -i output from dom0?
>
> Emmanuel.
>
>
> On Mon, Nov 06, 2006 at 09:55:17PM -0800, John Byrne wrote:
> >
> > I was asked to test direct I/O to a PV domU. Since I had a system
> > with two NICs, I gave one to a domU and one to dom0. (Each is
> > running the same kernel: xen 3.0.3 x86_64.)
> >
> > I'm running netperf from an outside system to the domU and dom0 and
> > I am seeing 30% less throughput for the domU vs dom0.
> >
> > Is this to be expected? If so, why? If not, does anyone have a guess
> > as to what I might be doing wrong or what the issue might be?
> >
> > Thanks,
> >
> > John Byrne
>

Re: Direct I/O to domU seeing a 30% performance hit [ In reply to ]
Ian,

(Sorry. I sent things from the wrong e-mail address, so you've probably
been getting bounces. This one should work.)

My config is attached.

Both dom0 and the domU are SLES 10, so I don't know why the "idle"
performance of the two should be different. The obvious asymmetry is the
disk. Since the disk isn't direct, any disk I/O by the domU would
certainly impact dom0, but I don't think there should be much, if any. I
did run a dom0 test with the domU started but idle, and there was no
real change to dom0's numbers.

What's the best way to gather information about what is going on with
the domains without perturbing them? (Or, at least, perturbing everyone
equally.)

As to the test, I am running netperf 2.4.1 on an outside machine to the
dom0 and the domU. (So the doms are running the netserver portion.) I
was originally running it in the doms to the outside machine, but when
the bad numbers showed up I moved it to the outside machine because I
wondered if the bad numbers were due to something happening to the
system time in domU. The numbers in the "outside" test to domU look worse.
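
(For reference, a run of this sort looks roughly like the following; the
address is a placeholder, and these are not necessarily the exact
command lines used:)

    # in dom0 / the domU: start the netperf server side
    netserver

    # on the outside machine: 60-second TCP stream test against each target
    netperf -H <dom0-or-domU-address> -l 60 -t TCP_STREAM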

Thanks,

John Byrne


Ian Pratt wrote:
>
>> There have been a couple of network receive throughput
>> performance regressions to domUs over time that were
>> subsequently fixed. I think one may have crept in to 3.0.3.
>
> The report was (I believe) with a NIC directly assigned to the domU, so
> not using netfront/back at all.
>
> John: please can you give more details on your config.
>
> Ian
>
>> Are you seeing any dropped packets on the vif associated with
>> your domU in your dom0? If so, propagating changeset
>> 11861 from unstable may help:
>>
>> changeset: 11861:637eace6d5c6
>> user: kfraser@localhost.localdomain
>> date: Mon Oct 23 11:20:37 2006 +0100
>> summary: [NET] back: Fix packet queuing so that packets
>> are drained if the
>>
>>
>> In the past, we also had receive throughput issues to domUs
>> that were due to socket buffer size logic but those were
>> fixed a while ago.
>>
>> Can you send netstat -i output from dom0?
>>
>> Emmanuel.
>>
>>
>> On Mon, Nov 06, 2006 at 09:55:17PM -0800, John Byrne wrote:
>>> I was asked to test direct I/O to a PV domU. Since I had a system
>>> with two NICs, I gave one to a domU and one to dom0. (Each is
>>> running the same kernel: xen 3.0.3 x86_64.)
>>>
>>> I'm running netperf from an outside system to the domU and dom0 and
>>> I am seeing 30% less throughput for the domU vs dom0.
>>>
>>> Is this to be expected? If so, why? If not, does anyone have a guess
>>> as to what I might be doing wrong or what the issue might be?
>>>
>>> Thanks,
>>>
>>> John Byrne
RE: Direct I/O to domU seeing a 30% performance hit [ In reply to ]
> Both dom0 and the domU are SLES 10, so I don't know why the "idle"
> performance of the two should be different. The obvious asymmetry is
> the disk. Since the disk isn't direct, any disk I/O by the domU would
> certainly impact dom0, but I don't think there should be much, if any.
> I did run a dom0 test with the domU started but idle, and there was no
> real change to dom0's numbers.
>
> What's the best way to gather information about what is going on with
> the domains without perturbing them? (Or, at least, perturbing everyone
> equally.)
>
> As to the test, I am running netperf 2.4.1 on an outside machine to the
> dom0 and the domU. (So the doms are running the netserver portion.) I
> was originally running it in the doms to the outside machine, but when
> the bad numbers showed up I moved it to the outside machine because I
> wondered if the bad numbers were due to something happening to the
> system time in domU. The numbers in the "outside" test to domU look
> worse.


It might be worth checking that there's no interrupt sharing happening.
While running the test against the domU, see how much CPU dom0 burns in
the same period using 'xm vcpu-list'.

To keep things simple, have dom0 and domU as uniprocessor guests.
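
Concretely, something like this should show both (the "eth" pattern is
just a guess at your interface names):

    # in each guest: check whether the NIC's IRQ line is shared with
    # another device
    grep -i eth /proc/interrupts

    # in dom0, before and after a run: cumulative CPU time per VCPU
    xm vcpu-list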

Ian


> Ian Pratt wrote:
> >
> >> There have been a couple of network receive throughput
> >> performance regressions to domUs over time that were
> >> subsequently fixed. I think one may have crept in to 3.0.3.
> >
> > The report was (I believe) with a NIC directly assigned to the domU,
> > so not using netfront/back at all.
> >
> > John: please can you give more details on your config.
> >
> > Ian
> >
> >> Are you seeing any dropped packets on the vif associated with
> >> your domU in your dom0? If so, propagating changeset
> >> 11861 from unstable may help:
> >>
> >> changeset: 11861:637eace6d5c6
> >> user: kfraser@localhost.localdomain
> >> date: Mon Oct 23 11:20:37 2006 +0100
> >> summary: [NET] back: Fix packet queuing so that packets
> >> are drained if the
> >>
> >>
> >> In the past, we also had receive throughput issues to domUs
> >> that were due to socket buffer size logic but those were
> >> fixed a while ago.
> >>
> >> Can you send netstat -i output from dom0?
> >>
> >> Emmanuel.
> >>
> >>
> >> On Mon, Nov 06, 2006 at 09:55:17PM -0800, John Byrne wrote:
> >>> I was asked to test direct I/O to a PV domU. Since I had a system
> >>> with two NICs, I gave one to a domU and one to dom0. (Each is
> >>> running the same kernel: xen 3.0.3 x86_64.)
> >>>
> >>> I'm running netperf from an outside system to the domU and dom0
> >>> and I am seeing 30% less throughput for the domU vs dom0.
> >>>
> >>> Is this to be expected? If so, why? If not, does anyone have a
> >>> guess as to what I might be doing wrong or what the issue might be?
> >>>
> >>> Thanks,
> >>>
> >>> John Byrne


Re: Direct I/O to domU seeing a 30% performance hit [ In reply to ]
Ian,

I had screwed up: there was a process running in dom0 chewing up CPU
that I thought I had taken care of.

After fixing that, most of the numbers for dom0, domU, and the base SLES
kernel are within a couple of tenths of a percent of each other. However,
there are some fairly large differences in some of the runs where the
socket buffers are small.

Recv     Send     Send
Socket   Socket   Message   Elapsed
Size     Size     Size      Time      Throughput
bytes    bytes    bytes     secs.     10^6bits/sec

262142   262142    4096     60.00     941.03   base
262142   262142    4096     60.00     939.95   dom0
262142   262142    4096     60.00     937.22   domU

 16384    16384   32768     60.00     379.68   base
 16384    16384   32768     60.00     350.15   dom0
 16384    16384   32768     60.00     367.89   domU

In the latter case, the divergence from the base performance is much
larger. I assume that when the socket buffers are small, the extra
interrupt overhead shows up more because more interrupts are required.
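
For the record, rows like those above come from netperf invocations
roughly of this form (the target address is a placeholder, and the
socket sizes netperf reports are what the kernel actually granted, which
can differ from what was requested):

    # large socket buffers, 4 KB sends
    netperf -H <target> -l 60 -t TCP_STREAM -- -s 262144 -S 262144 -m 4096

    # small socket buffers, 32 KB sends
    netperf -H <target> -l 60 -t TCP_STREAM -- -s 16384 -S 16384 -m 32768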

Overall, though, the numbers are now acceptable. Thanks for your help.
It allowed me to spot my goof. (Sorry about wasting your time, though.)

One last question: is there an easy way to break out the amount of CPU
time spent in the hypervisor?

Thanks,

John Byrne


Ian Pratt wrote:
>> Both dom0 and the domU are SLES 10, so I don't know why the "idle"
>> performance of the two should be different. The obvious asymmetry is
>> the disk. Since the disk isn't direct, any disk I/O by the domU would
>> certainly impact dom0, but I don't think there should be much, if any.
>> I did run a dom0 test with the domU started but idle, and there was
>> no real change to dom0's numbers.
>>
>> What's the best way to gather information about what is going on with
>> the domains without perturbing them? (Or, at least, perturbing
>> everyone equally.)
>>
>> As to the test, I am running netperf 2.4.1 on an outside machine to
>> the dom0 and the domU. (So the doms are running the netserver
>> portion.) I was originally running it in the doms to the outside
>> machine, but when the bad numbers showed up I moved it to the outside
>> machine because I wondered if the bad numbers were due to something
>> happening to the system time in domU. The numbers in the "outside"
>> test to domU look worse.
>
>
> It might be worth checking that there's no interrupt sharing happening.
> While running the test against the domU, see how much CPU dom0 burns in
> the same period using 'xm vcpu-list'.
>
> To keep things simple, have dom0 and domU as uniprocessor guests.
>
> Ian
>
>
>> Ian Pratt wrote:
>>>> There have been a couple of network receive throughput
>>>> performance regressions to domUs over time that were
>>>> subsequently fixed. I think one may have crept in to 3.0.3.
>>> The report was (I believe) with a NIC directly assigned to the domU,
>>> so not using netfront/back at all.
>>>
>>> John: please can you give more details on your config.
>>>
>>> Ian
>>>
>>>> Are you seeing any dropped packets on the vif associated with
>>>> your domU in your dom0? If so, propagating changeset
>>>> 11861 from unstable may help:
>>>>
>>>> changeset: 11861:637eace6d5c6
>>>> user: kfraser@localhost.localdomain
>>>> date: Mon Oct 23 11:20:37 2006 +0100
>>>> summary: [NET] back: Fix packet queuing so that packets
>>>> are drained if the
>>>>
>>>>
>>>> In the past, we also had receive throughput issues to domUs
>>>> that were due to socket buffer size logic but those were
>>>> fixed a while ago.
>>>>
>>>> Can you send netstat -i output from dom0?
>>>>
>>>> Emmanuel.
>>>>
>>>>
>>>> On Mon, Nov 06, 2006 at 09:55:17PM -0800, John Byrne wrote:
>>>>> I was asked to test direct I/O to a PV domU. Since I had a system
>>>>> with two NICs, I gave one to a domU and one to dom0. (Each is
>>>>> running the same kernel: xen 3.0.3 x86_64.)
>>>>>
>>>>> I'm running netperf from an outside system to the domU and dom0
>>>>> and I am seeing 30% less throughput for the domU vs dom0.
>>>>>
>>>>> Is this to be expected? If so, why? If not, does anyone have a
>>>>> guess as to what I might be doing wrong or what the issue might be?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> John Byrne


RE: Direct I/O to domU seeing a 30% performance hit [ In reply to ]
> One last question: is there an easy way to break out the
> amount of CPU time spent in the hypervisor?

It may be possible to configure the CPU perf counters to record the
amount of time you spend in ring0. Otherwise, use xen-oprofile for an
estimate.
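
A xen-oprofile session is roughly the following (the --xen and
--active-domains flags come from the xenoprof patches and the symbol
paths are examples, so adjust both for your installation):

    opcontrol --reset
    opcontrol --start --active-domains=0 \
              --xen=/boot/xen-syms-3.0.3 \
              --vmlinux=/boot/vmlinux-syms-<version>
    # ... run the netperf test ...
    opcontrol --stop
    opreport -l    # per-image, per-symbol sample counts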

Ian

> Thanks,
>
> John Byrne
>
>
> Ian Pratt wrote:
> >> Both dom0 and the domU are SLES 10, so I don't know why the "idle"
> >> performance of the two should be different. The obvious asymmetry
> >> is the disk. Since the disk isn't direct, any disk I/O by the domU
> >> would certainly impact dom0, but I don't think there should be
> >> much, if any. I did run a dom0 test with the domU started but idle,
> >> and there was no real change to dom0's numbers.
> >>
> >> What's the best way to gather information about what is going on
> >> with the domains without perturbing them? (Or, at least, perturbing
> >> everyone equally.)
> >>
> >> As to the test, I am running netperf 2.4.1 on an outside machine to
> >> the dom0 and the domU. (So the doms are running the netserver
> >> portion.) I was originally running it in the doms to the outside
> >> machine, but when the bad numbers showed up I moved it to the
> >> outside machine because I wondered if the bad numbers were due to
> >> something happening to the system time in domU. The numbers in the
> >> "outside" test to domU look worse.
> >
> >
> > It might be worth checking that there's no interrupt sharing
> > happening. While running the test against the domU, see how much CPU
> > dom0 burns in the same period using 'xm vcpu-list'.
> >
> > To keep things simple, have dom0 and domU as uniprocessor guests.
> >
> > Ian
> >
> >
> >> Ian Pratt wrote:
> >>>> There have been a couple of network receive throughput
> >>>> performance regressions to domUs over time that were subsequently
> >>>> fixed. I think one may have crept in to 3.0.3.
> >>> The report was (I believe) with a NIC directly assigned to the
> >>> domU, so not using netfront/back at all.
> >>>
> >>> John: please can you give more details on your config.
> >>>
> >>> Ian
> >>>
> >>>> Are you seeing any dropped packets on the vif associated with
> >>>> your domU in your dom0? If so, propagating changeset
> >>>> 11861 from unstable may help:
> >>>>
> >>>> changeset: 11861:637eace6d5c6
> >>>> user: kfraser@localhost.localdomain
> >>>> date: Mon Oct 23 11:20:37 2006 +0100
> >>>> summary: [NET] back: Fix packet queuing so that packets
> >>>> are drained if the
> >>>>
> >>>>
> >>>> In the past, we also had receive throughput issues to domUs that
> >>>> were due to socket buffer size logic but those were fixed a while
> >>>> ago.
> >>>>
> >>>> Can you send netstat -i output from dom0?
> >>>>
> >>>> Emmanuel.
> >>>>
> >>>>
> >>>> On Mon, Nov 06, 2006 at 09:55:17PM -0800, John Byrne wrote:
> >>>>> I was asked to test direct I/O to a PV domU. Since I had a
> >>>>> system with two NICs, I gave one to a domU and one to dom0.
> >>>>> (Each is running the same kernel: xen 3.0.3 x86_64.)
> >>>>>
> >>>>> I'm running netperf from an outside system to the domU and dom0
> >>>>> and I am seeing 30% less throughput for the domU vs dom0.
> >>>>>
> >>>>> Is this to be expected? If so, why? If not, does anyone have a
> >>>>> guess as to what I might be doing wrong or what the issue might
> >>>>> be?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> John Byrne

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel