[goals][upgrade-checkers] Week R-26 Update
The big update this week is that version 0.1.0 of oslo.upgradecheck was
released. The documentation, along with usage examples, can be found here
[1]. A big thanks to Ben Nemec for getting that done since a few
projects were waiting for it.
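
For anyone who hasn't looked at the library yet, wiring it up is roughly
the following. This is only a condensed sketch of the pattern from the
docs in [1]; the project name and the placeholder check are made up, so
refer to [1] for the authoritative example.

    import sys

    from oslo_config import cfg
    from oslo_upgradecheck import upgradecheck


    class Checks(upgradecheck.UpgradeCommands):

        def _check_placeholder(self):
            # A real check inspects config/DB state and returns SUCCESS,
            # WARNING or FAILURE along with details for the operator.
            return upgradecheck.Result(upgradecheck.Code.SUCCESS)

        # (display name, check method) pairs run by
        # "$SERVICE-status upgrade check".
        _upgrade_checks = (
            ('Placeholder check', _check_placeholder),
        )


    def main():
        return upgradecheck.main(
            cfg.CONF, project='myproject', upgrade_command=Checks())


    if __name__ == '__main__':
        sys.exit(main())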

In other updates, some changes were proposed in other projects [2].

And finally, Lance Bragstad and I had a discussion this week [3] about
the validity of upgrade checks looking for deleted configuration
options. The main scenario I'm thinking about here is fast forward
upgrade (FFU), where someone is going from Mitaka to Pike. Let's say a
config option was deprecated
in Newton and then removed in Ocata. As the operator is rolling through
from Mitaka to Pike, they might have missed the deprecation signal in
Newton and removal in Ocata. Does that mean we should have upgrade
checks that look at the configuration for deleted options, or options
where the deprecated alias is removed? My thought is that if things will
not work once they get to the target release and restart the service
code, which would definitely impact the upgrade, then checking for those
scenarios is probably OK. If, on the other hand, the removed options
were just tied to functionality that was removed and are otherwise not
causing any harm, then I don't think we need a check for that. It was
noted that oslo.config has a new validation tool [4] so that would take
care of some of this same work if run during upgrades. So I think
whether or not an upgrade check should be looking for config option
removal ultimately depends on the severity of what happens if the manual
intervention to handle that removed option is not performed. That's
pretty broad, but these upgrade checks aren't really set in stone as to
what is applied to them. I'd like to get input from others on this,
especially from operators, on whether they would find these types of
checks useful.
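
To make that concrete, the kind of check I have in mind would look
something like the sketch below. The option names, the config file path
and the choice of a warning over a failure are placeholders for the sake
of discussion, not something any project has implemented.

    import configparser

    from oslo_upgradecheck import upgradecheck

    # Illustrative (group, option) pairs removed since the oldest release
    # an FFU might start from; a real check would maintain this per release.
    REMOVED_OPTS = [
        ('DEFAULT', 'verbose'),
        ('api', 'some_removed_option'),
    ]


    class Checks(upgradecheck.UpgradeCommands):

        def _check_removed_options(self):
            cp = configparser.ConfigParser()
            cp.read('/etc/myservice/myservice.conf')  # assumed config path
            still_set = ['[%s] %s' % (g, o) for g, o in REMOVED_OPTS
                         if g in cp and o in cp[g]]
            if still_set:
                return upgradecheck.Result(
                    upgradecheck.Code.WARNING,
                    'These options were removed and are now ignored: %s'
                    % ', '.join(still_set))
            return upgradecheck.Result(upgradecheck.Code.SUCCESS)

        _upgrade_checks = (
            ('Removed configuration options', _check_removed_options),
        )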

[1] https://docs.openstack.org/oslo.upgradecheck/latest/
[2] https://storyboard.openstack.org/#!/story/2003657
[3]
http://eavesdrop.openstack.org/irclogs/%23openstack-dev/%23openstack-dev.2018-10-10.log.html#t2018-10-10T15:17:17
[4]
http://lists.openstack.org/pipermail/openstack-dev/2018-October/135688.html

--

Thanks,

Matt

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: [openstack-dev] [goals][upgrade-checkers] Week R-26 Update
On Fri, 2018-10-12 at 17:05 -0500, Matt Riedemann wrote:

Hey,

Nice topic, thanks Matt!

TL;DR: I would rather fail explicitly on all removals and warn on all
deprecations. My concern is that, by being more surgical, we'd have to
decide what's "not causing any harm" (and I think deployers/users are
best placed to determine what's not causing them any harm).
It's also probably more work to classify based on "severity".
The quick win here (for upgrade checks) is not about being smart, but
about being an exhaustive, standardized-across-projects, and _always
used_ source of truth for upgrades, complemented by release notes.

Long answer:

At some point in the past, I was working full time on upgrades using
OpenStack-Ansible.

Our process was the following:
1) Read all the projects' release notes to find upgrade documentation.
2) With said release notes, adapt our deploy tools to handle the
upgrade, and/or write extra documentation and release notes for our
own deployers.
3) Try the upgrade manually, fail because some release note was missing
x or y. Find the root cause and retry from step 2 until success.

Here is where I see upgrade checkers improving things:
1) No need for deployment projects to parse all release notes for
configuration changes, as the upgrade check tooling would directly
output the things that need to change for scenario x or y included in
the deployment project. No need to iterate either.

2) Test real deployer use cases. The deployers using openstack-ansible
have ultimate flexibility without changing our code, which means they
may exercise different code paths than our gating does. Including these
checks in all upgrades, always requiring them to pass, and making them
explicit about the changes is tremendously helpful for deployers:
- If config deprecations are handled as warnings as part of the same
process, we will output said warnings to generate a list of action
items for the deployers. We would use only one tool as the source of
truth for the action items (and still continue the upgrade);
- If config removals are handled as errors, the upgrade will fail,
which is IMO normal, as the deployer would not have respected their
action items.

In OSA, we could probably implement a deployer override (a variable)
that would allow deployers to explicitly bypass an upgrade failure: "I
know what I am doing!". It would be useful for doing multiple serial
upgrades.

In that case, deployers could share their "recipes" with each other for
handling upgrade failure bypasses in certain multi-upgrade (jump)
scenarios. After a while, we could think of feeding those back into the
upgrade checkers.

3) I like the approach of having oslo-config-validator. However, I must
admit it's not part of our process to always validate a config file
before trying to start a service in OSA. I am not sure where other
deployment projects are in terms of that usage. I am not familiar with
the upgrade checker code, but I would love to see it re-use
oslo-config-validator, as it would be the single source of truth for
upgrades before the upgrade happens (vs. having to do multiple steps).
If I am completely out of my league here, tell me.
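
Purely as a thought experiment, that re-use could take the shape of an
upgrade check that shells out to the validator when it is installed.
Everything in the sketch below is an assumption: the paths, the
namespace name and the exact validator arguments would need to be
checked against the real tool (see [4] in Matt's mail).

    import shutil
    import subprocess

    from oslo_upgradecheck import upgradecheck


    class Checks(upgradecheck.UpgradeCommands):

        def _check_config_with_validator(self):
            # Hypothetical integration: run oslo-config-validator against
            # the service's config file and surface whatever it reports.
            if shutil.which('oslo-config-validator') is None:
                return upgradecheck.Result(
                    upgradecheck.Code.WARNING,
                    'oslo-config-validator is not installed; the config '
                    'file was not validated.')
            proc = subprocess.run(
                ['oslo-config-validator',
                 # Assumed arguments: the opt namespace(s) of the service
                 # and the config file to validate.
                 '--namespace', 'myservice.conf',
                 '--input-file', '/etc/myservice/myservice.conf'],
                stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                universal_newlines=True)
            if proc.returncode != 0:
                return upgradecheck.Result(
                    upgradecheck.Code.FAILURE, proc.stdout)
            return upgradecheck.Result(upgradecheck.Code.SUCCESS)

        _upgrade_checks = (
            ('Config validation (oslo-config-validator)',
             _check_config_with_validator),
        )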

Just my 2 cents.
Jean-Philippe Evrard (evrardjp)



Re: [openstack-dev] [goals][upgrade-checkers] Week R-26 Update
On 10/15/18 3:27 AM, Jean-Philippe Evrard wrote:
> - If config deprecations are handled as warnings as part of the same
> process, we will output said warnings to generate a list of action
> items for the deployers. We would use only one tool as the source of
> truth for the action items (and still continue the upgrade);
> - If config removals are handled as errors, the upgrade will fail,
> which is IMO normal, as the deployer would not have respected their
> action items.

Note that deprecated config opts should already be generating warnings
in the logs. It is also possible now to use fatal-deprecations with
config opts:
https://github.com/openstack/oslo.config/commit/5f8b0e0185dafeb68cf04590948b9c9f7d727051

I'm not sure that's exactly what you're talking about, but those might
be useful to get us at least part of the way there.
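
For illustration only, this is the kind of option definition that
already produces those deprecation warnings when the old name is still
set in a config file; the option, group and alias names here are made
up:

    from oslo_config import cfg

    # Hypothetical option that kept a deprecated alias: loading a config
    # file that still sets the old [DEFAULT] auth_mode location logs a
    # deprecation warning pointing at the new [api] auth_strategy name.
    OPTS = [
        cfg.StrOpt('auth_strategy',
                   default='keystone',
                   deprecated_name='auth_mode',
                   deprecated_group='DEFAULT',
                   help='Illustrative option with a deprecated alias.'),
    ]

    CONF = cfg.CONF
    CONF.register_opts(OPTS, group='api')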

> 3) I like the approach of having oslo-config-validator. However, I must
> admit it's not part of our process to always validate a config file
> before trying to start a service in OSA. I am not sure where other
> deployment projects are in terms of that usage. I am not familiar with
> the upgrade checker code, but I would love to see it re-use
> oslo-config-validator, as it would be the single source of truth for
> upgrades before the upgrade happens (vs. having to do multiple steps).
> If I am completely out of my league here, tell me.

This is a bit tricky as the validator requires information that is not
necessarily available in a production environment. Specifically, it
either needs the oslo-config-generator configuration file that lists all
of the namespaces a project uses, or it needs a generated
machine-readable sample config that contains all of the opt data. The
latter is not generally available today, and I'm not sure whether the
former is either. A quick pip install of an OpenStack service suggests
that it is not.

Ideally, the machine-readable sample config would be available from
packages anyway as it has other uses too, but it's a pretty big ask to
get all of the packagers shipping that this cycle. I'm not sure how it
would work with pip installs either, although it seems like we should be
able to figure out something there.

Anyway, not saying we shouldn't do it, but I want to make it clear that
this isn't as simple as just adding one more check to the upgrade
checkers. There are some other dependencies to doing this in a
non-service-specific way.

Re: [goals][upgrade-checkers] Week R-26 Update
---- On Sat, 13 Oct 2018 07:05:53 +0900 Matt Riedemann <mriedemos@gmail.com> wrote ----

Another point is about policy changes and how we should accommodate those in upgrade checks.

Policy changes fall into the categories below:
1. Policy rule name has been changed.
Upgrade Impact: If that policy rule is overridden in policy.json then, yes, we need to report this in the upgrade-check CLI. If it is not overridden, which means operators depend on the policy defaults in code, then it would not impact their upgrade.
2. Policy rule (deprecated) has been removed.
Upgrade Impact: YES, as it can impact their API access after the upgrade. This needs to be covered in upgrade checks.
3. Default value (including scope) of a policy rule has been changed.
Upgrade Impact: YES, this can change the access level of their API after the upgrade. This needs to be covered in upgrade checks.
4. New policy rule introduced.
Upgrade Impact: YES, for the same reason.

I think policy changes can be added to the upgrade checkers by checking all of the above categories, since everything above can impact the upgrade?

For example, the cinder policy change [1]:

"Add granularity to the volume_extension:volume_type_encryption policy with the addition of distinct actions for create, get, update, and delete:

volume_extension:volume_type_encryption:create
volume_extension:volume_type_encryption:get
volume_extension:volume_type_encryption:update
volume_extension:volume_type_encryption:delete
To address backwards compatibility, the new rules added to the volume_type.py policy file, default to the existing rule, volume_extension:volume_type_encryption, if it is set to a non-default value. "

[1] https://docs.openstack.org/releasenotes/cinder/unreleased.html#upgrade-notes
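
To make categories 1 and 2 concrete, such a check could look roughly
like the sketch below. The rule name, the file path and the choice to
warn rather than fail are illustrative assumptions, not how cinder (or
any project) actually implements this.

    import json

    from oslo_upgradecheck import upgradecheck

    # Illustrative mapping of renamed/removed policy rule names to guidance
    # for the operator; a real check would maintain this per release.
    RENAMED_OR_REMOVED_RULES = {
        'volume_extension:volume_type_encryption':
            'split into distinct :create/:get/:update/:delete rules',
    }


    class PolicyChecks(upgradecheck.UpgradeCommands):

        def _check_policy_overrides(self):
            try:
                # Assumed override file path and JSON format.
                with open('/etc/cinder/policy.json') as f:
                    overrides = json.load(f)
            except IOError:
                # No override file: defaults in code are used, so a rename
                # or removal has no upgrade impact (category 1).
                return upgradecheck.Result(upgradecheck.Code.SUCCESS)
            stale = sorted(r for r in RENAMED_OR_REMOVED_RULES
                           if r in overrides)
            if stale:
                details = '; '.join(
                    '%s (%s)' % (r, RENAMED_OR_REMOVED_RULES[r])
                    for r in stale)
                return upgradecheck.Result(
                    upgradecheck.Code.WARNING,
                    'Overridden policy rules were renamed or removed: %s'
                    % details)
            return upgradecheck.Result(upgradecheck.Code.SUCCESS)

        _upgrade_checks = (
            ('Stale policy overrides', _check_policy_overrides),
        )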

-gmann



