Mailing List Archive

nova_api resource_providers table issues on ocata
Hi everybody,
when, on my ocata installation based on centos7, I update (only update, not
changing the openstack version) some kvm compute nodes, I discovered that
the uuid values in the resource_providers nova_api db table are different
from the uuid values in the compute_nodes nova db table.
This causes several errors in the nova-compute service, because it is not
able to receive instances anymore.
Aligning the uuid values with the ones from compute_nodes solves this
problem.
Could anyone tell me if it is a bug?
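For reference, this is roughly how the mismatch shows up (a sketch only,
assuming the nova and nova_api schemas live on the same MariaDB server):

select cn.hypervisor_hostname,
       cn.uuid as compute_node_uuid,
       rp.uuid as resource_provider_uuid
from nova.compute_nodes cn
join nova_api.resource_providers rp on rp.name = cn.hypervisor_hostname
where cn.deleted = 0 and cn.uuid != rp.uuid;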

Regards
Ignazio
Re: nova_api resource_providers table issues on ocata [ In reply to ]
On Tue, Oct 16, 2018 at 3:28 PM Ignazio Cassano <ignaziocassano@gmail.com>
wrote:

> Hi everybody,
> when, on my ocata installation based on centos7, I update (only update, not
> changing the openstack version) some kvm compute nodes, I discovered that
> the uuid values in the resource_providers nova_api db table are different
> from the uuid values in the compute_nodes nova db table.
> This causes several errors in the nova-compute service, because it is not
> able to receive instances anymore.
> Aligning the uuid values with the ones from compute_nodes solves this
> problem.
> Could anyone tell me if it is a bug?
>
>
What do you mean by "updating some compute nodes"? In Nova, we consider
compute nodes unique by the tuple (host, hypervisor_hostname), where
host is the nova-compute service name for this compute host, and
hypervisor_hostname is, in the case of libvirt, the 'hostname' reported by
the libvirt API [1].

If somehow one of the two values changes, then the Nova Resource Tracker
will consider the new record as a separate compute node, thereby creating a
new compute_nodes table record, and then a new UUID.
Could you please check your compute_nodes table and see whether some
entries were recently created?
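For example, something like this should show any recently created (or
soft-deleted) records (just a sketch against the standard Ocata
compute_nodes columns):

select id, host, hypervisor_hostname, uuid, created_at, deleted
from compute_nodes
order by created_at desc;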

-Sylvain

[1]
https://libvirt.org/docs/libvirt-appdev-guide-python/en-US/html/libvirt_application_development_guide_using_python-Connections-Host_Info.html

> Regards
> Ignazio
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
Re: nova_api resource_providers table issues on ocata [ In reply to ]
Hi Sylvain,
I mean launching "yum update" on compute nodes.
Now I am going to describe what happened.
We had an environment made up of 3 kvm nodes.
We added two new compute nodes.
Since the addition was made 3 or 4 months after the initial openstack
installation, the 2 new compute nodes were installed with the most recent
ocata packages.
So we launched a yum update on the 3 old compute nodes as well.
After the above operations, the resource_providers table contained the
wrong uuid for the 3 old nodes and they stopped working.
After updating the resource_providers uuid values with the ones from the
compute_nodes table, the 3 old nodes went back to working fine.
Regards
Ignazio

Il giorno mar 16 ott 2018 alle ore 16:11 Sylvain Bauza <sbauza@redhat.com>
ha scritto:

>
>
> On Tue, Oct 16, 2018 at 3:28 PM Ignazio Cassano <ignaziocassano@gmail.com>
> wrote:
>
>> Hi everybody,
>> when, on my ocata installation based on centos7, I update (only update, not
>> changing the openstack version) some kvm compute nodes, I discovered that
>> the uuid values in the resource_providers nova_api db table are different
>> from the uuid values in the compute_nodes nova db table.
>> This causes several errors in the nova-compute service, because it is not
>> able to receive instances anymore.
>> Aligning the uuid values with the ones from compute_nodes solves this
>> problem.
>> Could anyone tell me if it is a bug?
>>
>>
> What do you mean by "updating some compute nodes" ? In Nova, we consider
> uniqueness of compute nodes by a tuple (host, hypervisor_hostname) where
> host is your nova-compute service name for this compute host, and
> hypervisor_hostname is in the case of libvirt the 'hostname' reported by
> the libvirt API [1]
>
> If somehow one of the two values change, then the Nova Resource Tracker
> will consider this new record as a separate compute node, hereby creating a
> new compute_nodes table record, and then a new UUID.
> Could you please check your compute_nodes table and see whether some
> entries were recently created ?
>
> -Sylvain
>
> [1]
> https://libvirt.org/docs/libvirt-appdev-guide-python/en-US/html/libvirt_application_development_guide_using_python-Connections-Host_Info.html
>
>> Regards
>> Ignazio
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>
Re: nova_api resource_providers table issues on ocata [ In reply to ]
Is it possible that the hostnames of the nodes changed when you updated
them? e.g. maybe they were using fully-qualified names before and
changed to short-form, or vice versa ?

~iain


On 10/16/2018 07:22 AM, Ignazio Cassano wrote:
> Hi Sylvain,
> I mean launching "yum update" on compute nodes.
> Now I am going to describe what happened.
> We had an environment made up of 3 kvm nodes.
> We added two new compute nodes.
> Since the addition has been made after 3 or 4 months after the first
> openstack installation, the 2 new compute nodes are updated to most
> recent ocata packages.
> So we launched a yum update also on the 3 old compute nodes.
> After the above operations, the resource_providers table contained the
> wrong uuid for the 3 old nodes and they stopped working.
> Updating resource_providers uuid getting them from compute_nodes table,
> the old 3 nodes return to work fine.
> Regards
> Ignazio
>
> Il giorno mar 16 ott 2018 alle ore 16:11 Sylvain Bauza
> <sbauza@redhat.com <mailto:sbauza@redhat.com>> ha scritto:
>
>
>
> On Tue, Oct 16, 2018 at 3:28 PM Ignazio Cassano
> <ignaziocassano@gmail.com <mailto:ignaziocassano@gmail.com>> wrote:
>
> Hi everybody,
> when, on my ocata installation based on centos7, I update (only update,
> not changing the openstack version) some kvm compute nodes, I discovered
> that the uuid values in the resource_providers nova_api db table are
> different from the uuid values in the compute_nodes nova db table.
> This causes several errors in the nova-compute service, because it is
> not able to receive instances anymore.
> Aligning the uuid values with the ones from compute_nodes solves this
> problem.
> Could anyone tell me if it is a bug?
>
>
> What do you mean by "updating some compute nodes" ? In Nova, we
> consider uniqueness of compute nodes by a tuple (host,
> hypervisor_hostname) where host is your nova-compute service name
> for this compute host, and hypervisor_hostname is in the case of
> libvirt the 'hostname' reported by the libvirt API [1]
>
> If somehow one of the two values change, then the Nova Resource
> Tracker will consider this new record as a separate compute node,
> hereby creating a new compute_nodes table record, and then a new UUID.
> Could you please check your compute_nodes table and see whether some
> entries were recently created ?
>
> -Sylvain
>
> [1]
> https://libvirt.org/docs/libvirt-appdev-guide-python/en-US/html/libvirt_application_development_guide_using_python-Connections-Host_Info.html
>
> Regards
> Ignazio
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> <mailto:OpenStack-operators@lists.openstack.org>
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: nova_api resource_providers table issues on ocata [ In reply to ]
Hello Iain, that is not possible.
I checked the hostnames several times.
No changes.
I tried the same procedure on 3 different ocata installations, because we
have 3 distinct openstack environments. Same results.
Regards
Ignazio

Il Mar 16 Ott 2018 17:20 iain MacDonnell <iain.macdonnell@oracle.com> ha
scritto:

>
> Is it possible that the hostnames of the nodes changed when you updated
> them? e.g. maybe they were using fully-qualified names before and
> changed to short-form, or vice versa ?
>
> ~iain
>
>
> On 10/16/2018 07:22 AM, Ignazio Cassano wrote:
> > Hi Sylvain,
> > I mean launching "yum update" on compute nodes.
> > Now I am going to describe what happened.
> > We had an environment made up of 3 kvm nodes.
> > We added two new compute nodes.
> > Since the addition has been made after 3 or 4 months after the first
> > openstack installation, the 2 new compute nodes are updated to most
> > recent ocata packages.
> > So we launched a yum update also on the 3 old compute nodes.
> > After the above operations, the resource_providers table contained the
> > wrong uuid for the 3 old nodes and they stopped working.
> > Updating resource_providers uuid getting them from compute_nodes table,
> > the old 3 nodes return to work fine.
> > Regards
> > Ignazio
> >
> > Il giorno mar 16 ott 2018 alle ore 16:11 Sylvain Bauza
> > <sbauza@redhat.com <mailto:sbauza@redhat.com>> ha scritto:
> >
> >
> >
> > On Tue, Oct 16, 2018 at 3:28 PM Ignazio Cassano
> > <ignaziocassano@gmail.com <mailto:ignaziocassano@gmail.com>> wrote:
> >
> > Hi everybody,
> > when, on my ocata installation based on centos7, I update (only update,
> > not changing the openstack version) some kvm compute nodes, I discovered
> > that the uuid values in the resource_providers nova_api db table are
> > different from the uuid values in the compute_nodes nova db table.
> > This causes several errors in the nova-compute service, because it is
> > not able to receive instances anymore.
> > Aligning the uuid values with the ones from compute_nodes solves this
> > problem.
> > Could anyone tell me if it is a bug?
> >
> >
> > What do you mean by "updating some compute nodes" ? In Nova, we
> > consider uniqueness of compute nodes by a tuple (host,
> > hypervisor_hostname) where host is your nova-compute service name
> > for this compute host, and hypervisor_hostname is in the case of
> > libvirt the 'hostname' reported by the libvirt API [1]
> >
> > If somehow one of the two values change, then the Nova Resource
> > Tracker will consider this new record as a separate compute node,
> > hereby creating a new compute_nodes table record, and then a new
> UUID.
> > Could you please check your compute_nodes table and see whether some
> > entries were recently created ?
> >
> > -Sylvain
> >
> > [1]
> >
> https://libvirt.org/docs/libvirt-appdev-guide-python/en-US/html/libvirt_application_development_guide_using_python-Connections-Host_Info.html
> >
> > Regards
> > Ignazio
> > _______________________________________________
> > OpenStack-operators mailing list
> > OpenStack-operators@lists.openstack.org
> > <mailto:OpenStack-operators@lists.openstack.org>
> >
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> >
> >
> >
> > _______________________________________________
> > OpenStack-operators mailing list
> > OpenStack-operators@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> >
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
Re: nova_api resource_providers table issues on ocata [ In reply to ]
On 10/16/2018 10:11 AM, Sylvain Bauza wrote:
> On Tue, Oct 16, 2018 at 3:28 PM Ignazio Cassano
> <ignaziocassano@gmail.com <mailto:ignaziocassano@gmail.com>> wrote:
>
> Hi everybody,
> when, on my ocata installation based on centos7, I update (only update,
> not changing the openstack version) some kvm compute nodes, I discovered
> that the uuid values in the resource_providers nova_api db table are
> different from the uuid values in the compute_nodes nova db table.
> This causes several errors in the nova-compute service, because it is
> not able to receive instances anymore.
> Aligning the uuid values with the ones from compute_nodes solves this
> problem.
> Could anyone tell me if it is a bug?
>
>
> What do you mean by "updating some compute nodes" ? In Nova, we consider
> uniqueness of compute nodes by a tuple (host, hypervisor_hostname) where
> host is your nova-compute service name for this compute host, and
> hypervisor_hostname is in the case of libvirt the 'hostname' reported by
> the libvirt API [1]
>
> If somehow one of the two values change, then the Nova Resource Tracker
> will consider this new record as a separate compute node, hereby
> creating a new compute_nodes table record, and then a new UUID.
> Could you please check your compute_nodes table and see whether some
> entries were recently created ?

The compute_nodes table has no unique constraint on the
hypervisor_hostname field unfortunately, even though it should. It's not
like you can have two compute nodes with the same hostname. But, alas,
this is one of those vestigial tails in nova due to poor initial table
design and coupling between the concept of a nova-compute service worker
and the hypervisor resource node itself.

Ignazio, I was tempted to say you may have run into this:

https://bugs.launchpad.net/nova/+bug/1714248

But then I see you're not using Ironic... I'm not entirely sure how you
ended up with duplicate hypervisor_hostname records for the same compute
node, but some of those duplicate records must have had the deleted
field set to a non-zero value, given the constraint we currently have on
(host, hypervisor_hostname, deleted).
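A quick way to check for such duplicates, active or soft-deleted, would be
something along these lines (a sketch against the standard compute_nodes
schema):

select host, hypervisor_hostname,
       count(*) as records,
       sum(deleted != 0) as soft_deleted
from compute_nodes
group by host, hypervisor_hostname
having count(*) > 1;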

This means that your deployment script or some external scripts must
have been deleting compute node records somehow, though I'm not entirely
sure how...

Best,
-jay



_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: nova_api resource_providers table issues on ocata [ In reply to ]
Hello Jay, when I add a new compute node I run nova-manage cell_v2
discover_hosts.
Is it possible that this command updates the old host uuid in the
resource_providers table?
Regards
Ignazio

Il Mer 17 Ott 2018 00:56 Jay Pipes <jaypipes@gmail.com> ha scritto:

> On 10/16/2018 10:11 AM, Sylvain Bauza wrote:
> > On Tue, Oct 16, 2018 at 3:28 PM Ignazio Cassano
> > <ignaziocassano@gmail.com <mailto:ignaziocassano@gmail.com>> wrote:
> >
> > Hi everybody,
> > when, on my ocata installation based on centos7, I update (only update,
> > not changing the openstack version) some kvm compute nodes, I discovered
> > that the uuid values in the resource_providers nova_api db table are
> > different from the uuid values in the compute_nodes nova db table.
> > This causes several errors in the nova-compute service, because it is
> > not able to receive instances anymore.
> > Aligning the uuid values with the ones from compute_nodes solves this
> > problem.
> > Could anyone tell me if it is a bug?
> >
> >
> > What do you mean by "updating some compute nodes" ? In Nova, we consider
> > uniqueness of compute nodes by a tuple (host, hypervisor_hostname) where
> > host is your nova-compute service name for this compute host, and
> > hypervisor_hostname is in the case of libvirt the 'hostname' reported by
> > the libvirt API [1]
> >
> > If somehow one of the two values change, then the Nova Resource Tracker
> > will consider this new record as a separate compute node, hereby
> > creating a new compute_nodes table record, and then a new UUID.
> > Could you please check your compute_nodes table and see whether some
> > entries were recently created ?
>
> The compute_nodes table has no unique constraint on the
> hypervisor_hostname field unfortunately, even though it should. It's not
> like you can have two compute nodes with the same hostname. But, alas,
> this is one of those vestigial tails in nova due to poor initial table
> design and coupling between the concept of a nova-compute service worker
> and the hypervisor resource node itself.
>
> Ignazio, I was tempted to say you may have run into this:
>
> https://bugs.launchpad.net/nova/+bug/1714248
>
> But then I see you're not using Ironic... I'm not entirely sure how you
> ended up with duplicate hypervisor_hostname records for the same compute
> node, but some of those duplicate records must have had the deleted
> field set to a non-zero value, given the constraint we currently have on
> (host, hypervisor_hostname, deleted).
>
> This means that your deployment script or some external scripts must
> have been deleting compute node records somehow, though I'm not entirely
> sure how...
>
> Best,
> -jay
>
>
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
Re: nova_api resource_providers table issues on ocata [ In reply to ]
On Wed, Oct 17, 2018 at 12:56 AM Jay Pipes <jaypipes@gmail.com> wrote:

> On 10/16/2018 10:11 AM, Sylvain Bauza wrote:
> > On Tue, Oct 16, 2018 at 3:28 PM Ignazio Cassano
> > <ignaziocassano@gmail.com <mailto:ignaziocassano@gmail.com>> wrote:
> >
> > Hi everybody,
> > when, on my ocata installation based on centos7, I update (only update,
> > not changing the openstack version) some kvm compute nodes, I discovered
> > that the uuid values in the resource_providers nova_api db table are
> > different from the uuid values in the compute_nodes nova db table.
> > This causes several errors in the nova-compute service, because it is
> > not able to receive instances anymore.
> > Aligning the uuid values with the ones from compute_nodes solves this
> > problem.
> > Could anyone tell me if it is a bug?
> >
> >
> > What do you mean by "updating some compute nodes" ? In Nova, we consider
> > uniqueness of compute nodes by a tuple (host, hypervisor_hostname) where
> > host is your nova-compute service name for this compute host, and
> > hypervisor_hostname is in the case of libvirt the 'hostname' reported by
> > the libvirt API [1]
> >
> > If somehow one of the two values change, then the Nova Resource Tracker
> > will consider this new record as a separate compute node, hereby
> > creating a new compute_nodes table record, and then a new UUID.
> > Could you please check your compute_nodes table and see whether some
> > entries were recently created ?
>
> The compute_nodes table has no unique constraint on the
> hypervisor_hostname field unfortunately, even though it should. It's not
> like you can have two compute nodes with the same hostname. But, alas,
> this is one of those vestigial tails in nova due to poor initial table
> design and coupling between the concept of a nova-compute service worker
> and the hypervisor resource node itself.
>
>
Sorry if I was unclear, but I meant we have a unique key (UK) for (host,
hypervisor_hostname, deleted) (I didn't explain about deleted, but meh).
https://github.com/openstack/nova/blob/01c33c5/nova/db/sqlalchemy/models.py#L116-L118

But yeah, we don't have any UK for just (hypervisor_hostname, deleted),
sure.
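If you want to double-check what constraint is actually present on a given
deployment, inspecting the table definition works (a sketch; the exact key
name may differ by release):

show create table compute_nodes\G

and look for a UNIQUE KEY covering (host, hypervisor_hostname, deleted);
there is none covering hypervisor_hostname alone.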

> Ignazio, I was tempted to say you may have run into this:
>
> https://bugs.launchpad.net/nova/+bug/1714248
>
> But then I see you're not using Ironic... I'm not entirely sure how you
> ended up with duplicate hypervisor_hostname records for the same compute
> node, but some of those duplicate records must have had the deleted
> field set to a non-zero value, given the constraint we currently have on
> (host, hypervisor_hostname, deleted).
>
> This means that your deployment script or some external scripts must
> have been deleting compute node records somehow, though I'm not entirely
> sure how...
>
>
Yeah, that's why I asked for the compute_nodes records. Ignazio, could you
please verify this?
Do you have multiple records for the same (host, hypervisor_hostname) tuple?

select * from compute_nodes where host='XXX' and hypervisor_hostname='YYY';


-Sylvain

> Best,
> -jay
>
>
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
Re: nova_api resource_providers table issues on ocata [ In reply to ]
Hello Sylvain, here is the output of some selects:
MariaDB [nova]> select host,hypervisor_hostname from compute_nodes;
+--------------+---------------------+
| host | hypervisor_hostname |
+--------------+---------------------+
| podto1-kvm01 | podto1-kvm01 |
| podto1-kvm02 | podto1-kvm02 |
| podto1-kvm03 | podto1-kvm03 |
| podto1-kvm04 | podto1-kvm04 |
| podto1-kvm05 | podto1-kvm05 |
+--------------+---------------------+

MariaDB [nova]> select host from compute_nodes where host='podto1-kvm01'
and hypervisor_hostname='podto1-kvm01';
+--------------+
| host |
+--------------+
| podto1-kvm01 |
+--------------+



Il giorno mer 17 ott 2018 alle ore 16:02 Sylvain Bauza <sbauza@redhat.com>
ha scritto:

>
>
> On Wed, Oct 17, 2018 at 12:56 AM Jay Pipes <jaypipes@gmail.com> wrote:
>
>> On 10/16/2018 10:11 AM, Sylvain Bauza wrote:
>> > On Tue, Oct 16, 2018 at 3:28 PM Ignazio Cassano
>> > <ignaziocassano@gmail.com <mailto:ignaziocassano@gmail.com>> wrote:
>> >
>> > Hi everybody,
>> > when, on my ocata installation based on centos7, I update (only update,
>> > not changing the openstack version) some kvm compute nodes, I discovered
>> > that the uuid values in the resource_providers nova_api db table are
>> > different from the uuid values in the compute_nodes nova db table.
>> > This causes several errors in the nova-compute service, because it is
>> > not able to receive instances anymore.
>> > Aligning the uuid values with the ones from compute_nodes solves this
>> > problem.
>> > Could anyone tell me if it is a bug?
>> >
>> >
>> > What do you mean by "updating some compute nodes" ? In Nova, we
>> consider
>> > uniqueness of compute nodes by a tuple (host, hypervisor_hostname)
>> where
>> > host is your nova-compute service name for this compute host, and
>> > hypervisor_hostname is in the case of libvirt the 'hostname' reported
>> by
>> > the libvirt API [1]
>> >
>> > If somehow one of the two values change, then the Nova Resource Tracker
>> > will consider this new record as a separate compute node, hereby
>> > creating a new compute_nodes table record, and then a new UUID.
>> > Could you please check your compute_nodes table and see whether some
>> > entries were recently created ?
>>
>> The compute_nodes table has no unique constraint on the
>> hypervisor_hostname field unfortunately, even though it should. It's not
>> like you can have two compute nodes with the same hostname. But, alas,
>> this is one of those vestigial tails in nova due to poor initial table
>> design and coupling between the concept of a nova-compute service worker
>> and the hypervisor resource node itself.
>>
>>
> Sorry if I was unclear, but I meant we have a UK for (host,
> hypervisor_hostname, deleted) (I didn't explain about deleted, but meh).
>
> https://github.com/openstack/nova/blob/01c33c5/nova/db/sqlalchemy/models.py#L116-L118
>
> But yeah, we don't have any UK for just (hypervisor_hostname, deleted),
> sure.
>
>> Ignazio, I was tempted to say you may have run into this:
>>
>> https://bugs.launchpad.net/nova/+bug/1714248
>>
>> But then I see you're not using Ironic... I'm not entirely sure how you
>> ended up with duplicate hypervisor_hostname records for the same compute
>> node, but some of those duplicate records must have had the deleted
>> field set to a non-zero value, given the constraint we currently have on
>> (host, hypervisor_hostname, deleted).
>>
>> This means that your deployment script or some external scripts must
>> have been deleting compute node records somehow, though I'm not entirely
>> sure how...
>>
>>
> Yeah that's why I asked for the compute_nodes records. Ignazio, could you
> please verify this ?
> Do you have multiple records for the same (host, hypervisor_hostname)
> tuple ?
>
> select * from compute_nodes where host='XXX' and hypervisor_hostname='YYY';
>
>
> -Sylvain
>
>> Best,
>> -jay
>>
>>
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
Re: nova_api resource_providers table issues on ocata [ In reply to ]
On 10/17/2018 01:41 AM, Ignazio Cassano wrote:
> Hello Jay, when I add a new compute node I run nova-manage cell_v2
> discover_hosts.
> Is it possible that this command updates the old host uuid in the
> resource_providers table?

No, not unless you already had a nova-compute installed on a host with
the exact same hostname... which, from looking at the output of your
SELECT from compute_nodes table, doesn't seem to be the case.

In short, I think both Sylvain and I are stumped as to how your
placement resource_providers table ended up with these phantom records :(

-jay

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: nova_api resource_providers table issues on ocata [ In reply to ]
Hello, I am sure we are not using nova-compute services with duplicate names.
As I said previously, we tried on 3 different openstack installations and
we faced the same issue.
Procedure used
We have an openstack with 3 compute nodes : podto1-kvm01, podto1-kvm02,
podto1-kvm03
1) install a new compute node (podto1-kvm04)
2) On controller we discovered the new compute node: su -s /bin/sh -c
"nova-manage cell_v2 discover_hosts --verbose" nova
3) Evacuate podto1-kvm01
4) yum update on podto1-kvm01 and reboot it
5) Evacuate podto1-kvm02
6) yum update on podto1-kvm02 and reboot it
7) Evacuate podto1-kvm03
8) yum update podto1-kvm03 and reboot it

Regards


Il giorno mer 17 ott 2018 alle ore 16:19 Jay Pipes <jaypipes@gmail.com> ha
scritto:

> On 10/17/2018 01:41 AM, Ignazio Cassano wrote:
> > Hello Jay, when I add a new compute node I run nova-manage cell_v2
> > discover_hosts.
> > Is it possible that this command updates the old host uuid in the
> > resource_providers table?
>
> No, not unless you already had a nova-compute installed on a host with
> the exact same hostname... which, from looking at the output of your
> SELECT from compute_nodes table, doesn't seem to be the case.
>
> In short, I think both Sylvain and I are stumped as to how your
> placement resource_providers table ended up with these phantom records :(
>
> -jay
>
Re: nova_api resource_providers table issues on ocata [ In reply to ]
On 10/17/2018 9:13 AM, Ignazio Cassano wrote:
> Hello Sylvain, here the output of some selects:
> MariaDB [nova]> select host,hypervisor_hostname from compute_nodes;
> +--------------+---------------------+
> | host         | hypervisor_hostname |
> +--------------+---------------------+
> | podto1-kvm01 | podto1-kvm01        |
> | podto1-kvm02 | podto1-kvm02        |
> | podto1-kvm03 | podto1-kvm03        |
> | podto1-kvm04 | podto1-kvm04        |
> | podto1-kvm05 | podto1-kvm05        |
> +--------------+---------------------+
>
> MariaDB [nova]> select host from compute_nodes where host='podto1-kvm01'
> and hypervisor_hostname='podto1-kvm01';
> +--------------+
> | host         |
> +--------------+
> | podto1-kvm01 |
> +--------------+

Does your upgrade tooling run a db archive/purge at all? It's possible
that the actual services table record was deleted via the os-services
REST API for some reason, which would delete the compute_nodes table
record, and then a restart of the nova-compute process would recreate
the services and compute_nodes table records, but with a new compute
node uuid and thus a new resource provider.

Maybe query your shadow_services and shadow_compute_nodes tables for
"podto1-kvm01" and see if a record existed at one point, was deleted and
then archived to the shadow tables.
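For example (a sketch, using podto1-kvm01 from the output above):

select host, hypervisor_hostname, uuid, deleted_at
from shadow_compute_nodes
where hypervisor_hostname = 'podto1-kvm01';

select id, host, topic, deleted_at
from shadow_services
where host = 'podto1-kvm01';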

--

Thanks,

Matt

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
Re: nova_api resource_providers table issues on ocata [ In reply to ]
Hello, here are the selects you suggested:

MariaDB [nova]> select * from shadow_services;
Empty set (0,00 sec)

MariaDB [nova]> select * from shadow_compute_nodes;
Empty set (0,00 sec)

As far as the upgrade tooling is concerned, we only run yum update on the
old compute nodes so that they have the same packages installed as the new
compute nodes.
Procedure used
We have an openstack with 3 compute nodes : podto1-kvm01, podto1-kvm02,
podto1-kvm03
1) install a new compute node (podto1-kvm04)
2) On controller we discovered the new compute node: su -s /bin/sh -c
"nova-manage cell_v2 discover_hosts --verbose" nova
3) Evacuate podto1-kvm01
4) yum update on podto1-kvm01 and reboot it
5) Evacuate podto1-kvm02
6) yum update on podto1-kvm02 and reboot it
7) Evacuate podto1-kvm03
8) yum update podto1-kvm03 and reboot it



Il giorno mer 17 ott 2018 alle ore 16:37 Matt Riedemann <mriedemos@gmail.com>
ha scritto:

> On 10/17/2018 9:13 AM, Ignazio Cassano wrote:
> > Hello Sylvain, here the output of some selects:
> > MariaDB [nova]> select host,hypervisor_hostname from compute_nodes;
> > +--------------+---------------------+
> > | host | hypervisor_hostname |
> > +--------------+---------------------+
> > | podto1-kvm01 | podto1-kvm01 |
> > | podto1-kvm02 | podto1-kvm02 |
> > | podto1-kvm03 | podto1-kvm03 |
> > | podto1-kvm04 | podto1-kvm04 |
> > | podto1-kvm05 | podto1-kvm05 |
> > +--------------+---------------------+
> >
> > MariaDB [nova]> select host from compute_nodes where host='podto1-kvm01'
> > and hypervisor_hostname='podto1-kvm01';
> > +--------------+
> > | host |
> > +--------------+
> > | podto1-kvm01 |
> > +--------------+
>
> Does your upgrade tooling run a db archive/purge at all? It's possible
> that the actual services table record was deleted via the os-services
> REST API for some reason, which would delete the compute_nodes table
> record, and then a restart of the nova-compute process would recreate
> the services and compute_nodes table records, but with a new compute
> node uuid and thus a new resource provider.
>
> Maybe query your shadow_services and shadow_compute_nodes tables for
> "podto1-kvm01" and see if a record existed at one point, was deleted and
> then archived to the shadow tables.
>
> --
>
> Thanks,
>
> Matt
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
Re: nova_api resource_providers table issues on ocata [ In reply to ]
On Wed, Oct 17, 2018 at 4:46 PM Ignazio Cassano <ignaziocassano@gmail.com>
wrote:

> Hello, here are the selects you suggested:
>
> MariaDB [nova]> select * from shadow_services;
> Empty set (0,00 sec)
>
> MariaDB [nova]> select * from shadow_compute_nodes;
> Empty set (0,00 sec)
>
> As far as the upgrade tooling is concerned, we are using only yum update
> on old compute nodes to have same packages installed on the new
> compute-nodes
>


Well, to be honest, I was looking at some other bug for OSP
https://bugzilla.redhat.com/show_bug.cgi?id=1636463 which is pretty
identical so you're not alone :-)
For some reason, yum update modifies something in the DB that I don't know
yet. Which exact packages are you using ? RDO ones ?

I marked the downstream bug as NOTABUG since I wasn't able to reproduce it
and given I also provided a SQL query for fixing it, but maybe we should
try to see which specific package has a problem...
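The query from that downstream bug isn't copied here, but the fix boils
down to something like this (a sketch only; it assumes the nova and
nova_api schemas are on the same MariaDB server, and you should back up
and verify the UUID pairs by hand first):

update nova_api.resource_providers rp
join nova.compute_nodes cn
  on cn.hypervisor_hostname = rp.name and cn.deleted = 0
set rp.uuid = cn.uuid
where rp.uuid != cn.uuid;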

-Sylvain


> Procedure used
> We have an openstack with 3 compute nodes : podto1-kvm01, podto1-kvm02,
> podto1-kvm03
> 1) install a new compute node (podto1-kvm04)
> 2) On controller we discovered the new compute node: su -s /bin/sh -c
> "nova-manage cell_v2 discover_hosts --verbose" nova
> 3) Evacuate podto1-kvm01
> 4) yum update on podto1-kvm01 and reboot it
> 5) Evacuate podto1-kvm02
> 6) yum update on podto1-kvm02 and reboot it
> 7) Evacuate podto1-kvm03
> 8) yum update podto1-kvm03 and reboot it
>
>
>
> Il giorno mer 17 ott 2018 alle ore 16:37 Matt Riedemann <
> mriedemos@gmail.com> ha scritto:
>
>> On 10/17/2018 9:13 AM, Ignazio Cassano wrote:
>> > Hello Sylvain, here the output of some selects:
>> > MariaDB [nova]> select host,hypervisor_hostname from compute_nodes;
>> > +--------------+---------------------+
>> > | host | hypervisor_hostname |
>> > +--------------+---------------------+
>> > | podto1-kvm01 | podto1-kvm01 |
>> > | podto1-kvm02 | podto1-kvm02 |
>> > | podto1-kvm03 | podto1-kvm03 |
>> > | podto1-kvm04 | podto1-kvm04 |
>> > | podto1-kvm05 | podto1-kvm05 |
>> > +--------------+---------------------+
>> >
>> > MariaDB [nova]> select host from compute_nodes where
>> host='podto1-kvm01'
>> > and hypervisor_hostname='podto1-kvm01';
>> > +--------------+
>> > | host |
>> > +--------------+
>> > | podto1-kvm01 |
>> > +--------------+
>>
>> Does your upgrade tooling run a db archive/purge at all? It's possible
>> that the actual services table record was deleted via the os-services
>> REST API for some reason, which would delete the compute_nodes table
>> record, and then a restart of the nova-compute process would recreate
>> the services and compute_nodes table records, but with a new compute
>> node uuid and thus a new resource provider.
>>
>> Maybe query your shadow_services and shadow_compute_nodes tables for
>> "podto1-kvm01" and see if a record existed at one point, was deleted and
>> then archived to the shadow tables.
>>
>> --
>>
>> Thanks,
>>
>> Matt
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
Re: nova_api resource_providers table issues on ocata [ In reply to ]
Hello, sorry for the late answer.
The following is the content of my ocata repo file:

[centos-openstack-ocata]
name=CentOS-7 - OpenStack ocata
baseurl=http://mirror.centos.org/centos/7/cloud/$basearch/openstack-ocata/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Cloud
exclude=sip,PyQt4


EPEL is not enabled, as suggested in the documentation.
Regards
Ignazio

Il giorno gio 18 ott 2018 alle ore 10:24 Sylvain Bauza <sbauza@redhat.com>
ha scritto:

>
>
> On Wed, Oct 17, 2018 at 4:46 PM Ignazio Cassano <ignaziocassano@gmail.com>
> wrote:
>
>> Hello, here are the selects you suggested:
>>
>> MariaDB [nova]> select * from shadow_services;
>> Empty set (0,00 sec)
>>
>> MariaDB [nova]> select * from shadow_compute_nodes;
>> Empty set (0,00 sec)
>>
>> As far as the upgrade tooling is concerned, we are using only yum update
>> on old compute nodes to have same packages installed on the new
>> compute-nodes
>>
>
>
> Well, to be honest, I was looking at some other bug for OSP
> https://bugzilla.redhat.com/show_bug.cgi?id=1636463 which is pretty
> identical so you're not alone :-)
> For some reason, yum update modifies something in the DB that I don't know
> yet. Which exact packages are you using ? RDO ones ?
>
> I marked the downstream bug as NOTABUG since I wasn't able to reproduce it
> and given I also provided a SQL query for fixing it, but maybe we should
> try to see which specific package has a problem...
>
> -Sylvain
>
>
>> Procedure used
>> We have an openstack with 3 compute nodes : podto1-kvm01, podto1-kvm02,
>> podto1-kvm03
>> 1) install a new compute node (podto1-kvm04)
>> 2) On controller we discovered the new compute node: su -s /bin/sh -c
>> "nova-manage cell_v2 discover_hosts --verbose" nova
>> 3) Evacuate podto1-kvm01
>> 4) yum update on podto1-kvm01 and reboot it
>> 5) Evacuate podto1-kvm02
>> 6) yum update on podto1-kvm02 and reboot it
>> 7) Evacuate podto1-kvm03
>> 8) yum update podto1-kvm03 and reboot it
>>
>>
>>
>> Il giorno mer 17 ott 2018 alle ore 16:37 Matt Riedemann <
>> mriedemos@gmail.com> ha scritto:
>>
>>> On 10/17/2018 9:13 AM, Ignazio Cassano wrote:
>>> > Hello Sylvain, here the output of some selects:
>>> > MariaDB [nova]> select host,hypervisor_hostname from compute_nodes;
>>> > +--------------+---------------------+
>>> > | host | hypervisor_hostname |
>>> > +--------------+---------------------+
>>> > | podto1-kvm01 | podto1-kvm01 |
>>> > | podto1-kvm02 | podto1-kvm02 |
>>> > | podto1-kvm03 | podto1-kvm03 |
>>> > | podto1-kvm04 | podto1-kvm04 |
>>> > | podto1-kvm05 | podto1-kvm05 |
>>> > +--------------+---------------------+
>>> >
>>> > MariaDB [nova]> select host from compute_nodes where
>>> host='podto1-kvm01'
>>> > and hypervisor_hostname='podto1-kvm01';
>>> > +--------------+
>>> > | host |
>>> > +--------------+
>>> > | podto1-kvm01 |
>>> > +--------------+
>>>
>>> Does your upgrade tooling run a db archive/purge at all? It's possible
>>> that the actual services table record was deleted via the os-services
>>> REST API for some reason, which would delete the compute_nodes table
>>> record, and then a restart of the nova-compute process would recreate
>>> the services and compute_nodes table records, but with a new compute
>>> node uuid and thus a new resource provider.
>>>
>>> Maybe query your shadow_services and shadow_compute_nodes tables for
>>> "podto1-kvm01" and see if a record existed at one point, was deleted and
>>> then archived to the shadow tables.
>>>
>>> --
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>> _______________________________________________
>>> OpenStack-operators mailing list
>>> OpenStack-operators@lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>