Mailing List Archive

On setting clear criteria for declaring a feature acceptable (was "vmx: VT-d posted-interrupt core logic handling")
[Changing the subject and CC'ing more people]

On 09/03/16 13:39, Jan Beulich wrote:
>>>> On 08.03.16 at 19:38, <george.dunlap@citrix.com> wrote:
>> If I may go "meta" for a moment here -- this is exactly what I'm talking
>> about with "Something bad may happen" being difficult to work with.
>> Rather than you spelling out exactly the situation which you think may
>> happen, (which I could then either accept or refute on its merits) *I*
>> am now spending a lot of time and effort trying to imagine what
>> situations you may be talking about and then refuting them myself.
[snip]
>
>> If you have concerns, you need to make those concerns concrete, or at
>> least set clear criteria for how someone could go about addressing your
>> concerns. And yes, it is *your* job, as the person doing the objecting
>> (and even moreso as the x86 maintainer), to make your concerns explicit
>> and/or set those criteria, and not Feng's job (or even my job) to try to
>> guess what it is might make you happy.
>
> I'm sorry, George, but no, I don't think this is how things should
> work. If for a new feature to be enabled by default it is unclear
> whether that puts the system at risk, it's the party suggesting the
> default enabling to prove there's no such risk.

And it's up to the maintainers to clearly define what kind of "proof"
would be sufficient. I have no objection to saying that Feng (or someone
else who cares about the feature) must do some work to demonstrate that
the feature is in fact safe before it's enabled by default; that's
perfectly reasonable. I have already suggested something that would shed
light on the issue and potentially satisfy me. But it's not at all
reasonable to give them the impossible task of trying to guess what will
satisfy you.

I don't know why this is controversial -- this seems obvious to me.
What do other committers / maintainers think?

> We just can't allow
> code in that sets us up for future security issues. If anything
> that's what we should have learned from the various disasters in
> the past (XSAVE enabling having been the first and foremost,
> which by now I count 4 related XSAs for).

I'm not familiar with the XSAVE feature or its related XSAs, but do you
think simply saying "I'm not sure; prove to me that it's safe" would
have actually helped matters? Would it have prompted the authors of
that code to actually do some sort of testing / analysis that would have
turned up the subsequent security issues?

It seems to me that saying "I'm not sure, prove it to me", without
further guidance about *how* to prove it, would have ended in one of you
two giving up: either Intel not doing any more work on the feature, or
you eventually giving up and letting it go in anyway, with the same
security bugs it had before.

Again, without knowing much about the XSAVE feature or the XSAs, a
couple of responses which might have led to better outcomes:

* "I'd like to see an analysis of the XSAVE code -- what are all the
possible ways in can be loaded and stored? How can we be sure that
nothing is leaked? See
marc.info/?i=<1371746007-19073-1-git-send-email-george.dunlap@eu.citrix.com>
for an example of the kind of analysis I'm talking about."

* "I'd like to see a framework that tests a lot of the corner cases to
make sure nothing leaks"

Those are both a fair amount of work, but they're also fairly concrete,
and actually move people towards a helpful conclusion.

-George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: On setting clear criteria for declaring a feature acceptable [ In reply to ]
>>> On 09.03.16 at 17:23, <george.dunlap@citrix.com> wrote:
> [Changing the subject and CC'ing more people]
>
> On 09/03/16 13:39, Jan Beulich wrote:
>>>>> On 08.03.16 at 19:38, <george.dunlap@citrix.com> wrote:
>>> If I may go "meta" for a moment here -- this is exactly what I'm talking
>>> about with "Something bad may happen" being difficult to work with.
>>> Rather than you spelling out exactly the situation which you think may
>>> happen, (which I could then either accept or refute on its merits) *I*
>>> am now spending a lot of time and effort trying to imagine what
>>> situations you may be talking about and then refuting them myself.
> [snip]
>>
>>> If you have concerns, you need to make those concerns concrete, or at
>>> least set clear criteria for how someone could go about addressing your
>>> concerns. And yes, it is *your* job, as the person doing the objecting
>>> (and even moreso as the x86 maintainer), to make your concerns explicit
>>> and/or set those criteria, and not Feng's job (or even my job) to try to
>>> guess what it is might make you happy.
>>
>> I'm sorry, George, but no, I don't think this is how things should
>> work. If for a new feature to be enabled by default it is unclear
>> whether that puts the system at risk, it's the party suggesting the
>> default enabling to prove there's no such risk.
>
> And it's up to the maintainers to clearly define what kind of "proof"
> would be sufficient. I have no objection to saying that Feng (or someone
> else who cares about the feature) must do some work to demonstrate that
> the feature is in fact safe before it's enabled by default; that's
> perfectly reasonable. I have already suggested something that would shed
> light on the issue and potentially satisfy me. But it's not at all
> reasonable to give them the impossible task of trying to guess what will
> satisfy you.
>
> I don't know why this is controversial -- this seems obvious to me.

And it is not controversial - as said on the original thread, I
was of the opinion that I had clearly explained which specific
case I want to see taken care of (or to be more precise,
avoided). With that ...

> What do other committers / maintainers think?
>
>> We just can't allow
>> code in that sets us up for future security issues. If anything
>> that's what we should have learned from the various disasters in
>> the past (XSAVE enabling having been the first and foremost,
>> which by now I count 4 related XSAs for).
>
> I'm not familiar with the XSAVE feature or its related XSAs, but do you
> think simply saying "I'm not sure; prove to me that it's safe" would

.. this is just an unfair simplification of what I've raised as concerns
so far. That said, ...

> have actually helped matters? Would it have prompted the authors of
> that code to actually do some sort of testing / analysis that would have
> turned up the subsequent security issues?
>
> It seems to me that saying "I'm not sure, prove it to me", without
> further guidance about *how* to prove it, would have ended in one of you
> two giving up: either Intel not doing any more work on the feature, or
> you eventually giving up and letting it go in anyway, with the same
> security bugs it had before.

... I agree with you on these points. Just that I don't feel guilty
having acted in this way. Yes, I wasn't anywhere close to precise
in how the questionable state could be reached, but I was (and
continue to be) of the opinion that this doesn't matter, so long as
there are no provisions anywhere in the system that preclude
such a state from being reached (and of that lack of provisions I
am reasonably certain).

And then there's a non-technical aspect to this whole situation:
The set of people introducing features and the set of people
fixing bugs doesn't have a very large overlap. If I could be
certain that it will be those who introduce PI (or any other
feature requiring large or intrusive changes) who are also
going to look into problems with it later on (and in a timely
manner), I might take a more relaxed position. But since history
teaches me that this is not likely going to be the case, my
(submitter dependent) hesitance to accept (even if only
theoretically) risky new features is also some sort of self
defense - not just at the open source project level, but also
from a distro pov.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: On setting clear criteria for declaring a feature acceptable (was "vmx: VT-d posted-interrupt core logic handling") [ In reply to ]
On 09/03/16 16:23, George Dunlap wrote:
>
> I don't know why this is controversial -- this seems obvious to me.
> What do other committers / maintainers think?

I started on a reply to this but then I went back and read the original
thread...

+ /*
+ * XXX: The length of the list depends on how many vCPU is current
+ * blocked on this specific pCPU. This may hurt the interrupt
+ * latency if the list grows to too many entries.
+ */

Even the original author knows that there's a problem here, so in this
case George, I think you are unfairly criticizing Jan.

David

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: On setting clear criteria for declaring a feature acceptable (was "vmx: VT-d posted-interrupt core logic handling") [ In reply to ]
> -----Original Message-----
> From: David Vrabel [mailto:david.vrabel@citrix.com]
> Sent: Thursday, March 10, 2016 2:02 AM
> To: George Dunlap <george.dunlap@citrix.com>; Jan Beulich
> <JBeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>; Dario Faggioli
> <dario.faggioli@citrix.com>; George Dunlap <George.Dunlap@eu.citrix.com>;
> Wu, Feng <feng.wu@intel.com>; Tian, Kevin <kevin.tian@intel.com>; xen-
> devel@lists.xen.org; Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>; Ian
> Jackson <Ian.Jackson@eu.citrix.com>; Lars Kurth <lars.kurth@citrix.com>
> Subject: Re: On setting clear criteria for declaring a feature acceptable (was
> "vmx: VT-d posted-interrupt core logic handling")
>
> On 09/03/16 16:23, George Dunlap wrote:
> >
> > I don't know why this is controversial -- this seems obvious to me.
> > What do other committers / maintainers think?
>
> I started on a reply to this but then I went back and read the original
> thread...
>
> + /*
> + * XXX: The length of the list depends on how many vCPU is current
> + * blocked on this specific pCPU. This may hurt the interrupt
> + * latency if the list grows to too many entries.
> + */
>
> Even the original author knows that there's a problem here, so in this
> case George, I think you are unfairly criticizing Jan.

This is the potential issue Jan pointed out, and adding the comments is
according Jan's comments then. But as George pointed out, it is not
very clear to how to reproduce this scenario in real world and what
is the criteria of "the list is too long", so here we are discussing whether
it is reasonable to make this feature default off just because of this
theoretically existing issue, and hence the " criteria for declaring a
feature acceptable ".

Thanks,
Feng

>
> David

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: On setting clear criteria for declaring a feature acceptable (was "vmx: VT-d posted-interrupt core logic handling") [ In reply to ]
> From: George Dunlap [mailto:george.dunlap@citrix.com]
> Sent: Thursday, March 10, 2016 12:23 AM
>
> [Changing the subject and CC'ing more people]
>
> On 09/03/16 13:39, Jan Beulich wrote:
> >>>> On 08.03.16 at 19:38, <george.dunlap@citrix.com> wrote:
> >> If I may go "meta" for a moment here -- this is exactly what I'm talking
> >> about with "Something bad may happen" being difficult to work with.
> >> Rather than you spelling out exactly the situation which you think may
> >> happen, (which I could then either accept or refute on its merits) *I*
> >> am now spending a lot of time and effort trying to imagine what
> >> situations you may be talking about and then refuting them myself.
> [snip]
> >
> >> If you have concerns, you need to make those concerns concrete, or at
> >> least set clear criteria for how someone could go about addressing your
> >> concerns. And yes, it is *your* job, as the person doing the objecting
> >> (and even moreso as the x86 maintainer), to make your concerns explicit
> >> and/or set those criteria, and not Feng's job (or even my job) to try to
> >> guess what it is might make you happy.
> >
> > I'm sorry, George, but no, I don't think this is how things should
> > work. If for a new feature to be enabled by default it is unclear
> > whether that puts the system at risk, it's the party suggesting the
> > default enabling to prove there's no such risk.
>
> And it's up to the maintainers to clearly define what kind of "proof"
> would be sufficient. I have no objection to saying that Feng (or someone
> else who cares about the feature) must do some work to demonstrate that
> the feature is in fact safe before it's enabled by default; that's
> perfectly reasonable. I have already suggested something that would shed
> light on the issue and potentially satisfy me. But it's not at all
> reasonable to give them the impossible task of trying to guess what will
> satisfy you.
>
> I don't know why this is controversial -- this seems obvious to me.
> What do other committers / maintainers think?
>

My 2 cents here.

It's always good to have a clear definition to which extend a performance
issue would become a security risk. I saw 200us/500us used as example
in this thread, however no one can give an accrual criteria. In that case,
how do we call it a problem even when Feng collected some data? Based
on mindset from all maintainers?

I think a good way of looking at this is based on which capability is impacted.
In this specific case the directly impacted metric is the interrupt delivery
latency. However today Xen is not RT-capable. Xen doesn't commit to
deliver a worst-case 10us interrupt latency. The whole interrupt delivery path
(from Xen into Guest) has not been optimized yet, then there could be other
reasons impacting latency too beside the concern on this specific list walk.
There is no baseline worst-case data w/o PI. There is no final goal to hit.
There is no test case to measure.

Then why blocking this feature due to this unmeasurable concern and why
not enabling it and then improving it later when it becomes a measurable
concern when Xen will commit a clear interrupt latency goal will be committed
by Xen (at that time people working on that effort will have to identify all kinds
of problems impacting interrupt latency and then can optimize together)?
People should understand possibly bad interrupt latency in extreme cases
like discussed in this thread (w/ or w/o PI), since Xen doesn't commit anything
here.

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: On setting clear criteria for declaring a feature acceptable (was "vmx: VT-d posted-interrupt core logic handling") [ In reply to ]
On Wed, Mar 9, 2016 at 6:02 PM, David Vrabel <david.vrabel@citrix.com> wrote:
> On 09/03/16 16:23, George Dunlap wrote:
>>
>> I don't know why this is controversial -- this seems obvious to me.
>> What do other committers / maintainers think?
>
> I started on a reply to this but then I went back and read the original
> thread...
>
> + /*
> + * XXX: The length of the list depends on how many vCPU is current
> + * blocked on this specific pCPU. This may hurt the interrupt
> + * latency if the list grows to too many entries.
> + */
>
> Even the original author knows that there's a problem here, so in this
> case George, I think you are unfairly criticizing Jan.

Yes, as Feng points out, that comment was put there because Jan made
it a prerequisite for acceptance, not because Feng had concrete reason
to believe there was a potential problem. I don't have any objection
to that in general, *as long as* it's accompanied by actionable
suggestions for evaluating whether it's true and/or fixing it.

-George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel