Mailing List Archive

missing ovs flows and extra interfaces in pike
Hi,

[ I have no idea how much of the following information is necessary ]

We're running OpenStack Pike, deployed with OpenStack-Ansible 16.0.5.
The system runs on a bunch of compute nodes and three combined
network/management nodes. We're using OVS, DVR and VXLAN for networking.

The DVRs are set up with SNAT disabled; that's handled by different
systems.

We recently noticed that we don't have north-south connectivity in
a couple of qdhcp-netns. After a week's worth of debugging, it boils
down to missing OVS flows on br-tun that should be directing the
northbound traffic to the node with the live snat-netns.

We also noticed that, while every node has the ports for the
qdhcp-netns that belong on that node, we also have a couple of taps and
flows for ports that are on other nodes.

To make that a bit clearer:
If you have network A with DHCP services F, G, H, we found that the ip
netns containing the dnsmasq for F, G, H are on nodes 1, 2, 3
respectively, but node 1 would also have the tap interface and flows for
G on br-int, dangling freely without any netns.
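
The situation described above can be expressed as a small consistency
check. The following is only a sketch under stated assumptions: the
function names and the input dictionaries are hypothetical, the data
would have to be collected separately (from neutron and from each
node), and the tap naming rule ("tap" plus the first 11 characters of
the port UUID) is assumed here rather than confirmed by this thread.

```python
def tap_name(port_id):
    """Derive the tap device name for a port.

    Assumption: the device is named "tap" + the first 11 characters
    of the port UUID.
    """
    return "tap" + port_id[:11]


def stray_taps(dhcp_port_hosts, taps_on_node):
    """Return taps that exist on a node other than the one owning the port.

    dhcp_port_hosts: {port_id: host} for the DHCP ports of a network
                     (hypothetical input, e.g. gathered from neutron).
    taps_on_node:    {host: set of tap names seen on br-int on that host}
                     (hypothetical input, gathered per node).
    """
    owner = {tap_name(pid): host for pid, host in dhcp_port_hosts.items()}
    stray = {}
    for node, taps in taps_on_node.items():
        extra = sorted(t for t in taps if t in owner and owner[t] != node)
        if extra:
            stray[node] = extra
    return stray


if __name__ == "__main__":
    # Made-up example mirroring the F, G, H description above.
    ports = {
        "ffffffff-0000-0000-0000-000000000001": "node1",  # F
        "99999999-0000-0000-0000-000000000002": "node2",  # G
        "88888888-0000-0000-0000-000000000003": "node3",  # H
    }
    seen = {
        "node1": {tap_name("ffffffff-0000-0000-0000-000000000001"),
                  tap_name("99999999-0000-0000-0000-000000000002")},
        "node2": {tap_name("99999999-0000-0000-0000-000000000002")},
        "node3": {tap_name("88888888-0000-0000-0000-000000000003")},
    }
    print(stray_taps(ports, seen))
```

In this made-up example only node 1 is reported, carrying the tap that
belongs to G.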

Is there a simple explanation for this and maybe even a fix?

What we found so far seems to suggest we should either restart the
management nodes or the neutron-agent containers, or at least stop, clean
and start OVS and neutron-openvswitch-agent inside the containers.

Is it possible to somehow redeploy or validate the flows from neutron to
make sure that everything is consistent, without restarting anything?


--

Cheers,
Hartwig Hauschild

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Re: missing ovs flows and extra interfaces in pike
Hi,

We see something similar. We are running OVS and VXLAN, but with legacy
routers directly on the HVs (hypervisors), without any network nodes.

So far we haven't found a way to fix or reproduce the issue. We just
wrote a small script which calculates which flows should be on which HV
and sends an alert if something is missing.

The operator then restarts the affected ovs-agent and everything
works again.

We also found out that the problem is gone as soon as we disable l2pop,
but this is not possible if we (as you already did) switch to DVR.

So at the moment we plan to disable l2pop and move our routers back to
some network nodes.

I would be glad if someone were able to reproduce or, even better, fix
the issue.

Fabian

On 19.10.18 at 15:32, Hartwig Hauschild wrote:
Re: missing ovs flows and extra interfaces in pike
Hi,

How are you calculating which flows you're missing?

We thought we were missing the flow for "if you're looking for this MAC,
go this way", but it turned out that what's actually missing is a bunch of
interfaces on the multicast flow for the VLAN that we're investigating.

Is that what you're seeing as well?
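
To illustrate what "interfaces missing on the multicast flow" means in
practice, here is a rough sketch that extracts the output ports per
local VLAN from the text produced by `ovs-ofctl dump-flows br-tun`.
Assumptions: the flood flows live in table 22 and match on `dl_vlan`,
the sample lines are invented for illustration (not taken from a real
node), and the function name is hypothetical.

```python
import re

FLOOD_TABLE = 22  # assumption: table holding the flood (multicast) flows


def parse_flood_ports(dump, table=FLOOD_TABLE):
    """Map each local VLAN to the set of ports its flood flow outputs to."""
    ports_by_vlan = {}
    for line in dump.splitlines():
        # Only look at flows in the flood table.
        if not re.search(rf"\btable={table}\b", line):
            continue
        vlan = re.search(r"\bdl_vlan=(\d+)", line)
        if vlan is None:
            continue  # e.g. the default drop flow has no VLAN match
        _, _, actions = line.partition("actions=")
        # Ports may be printed as numbers or as quoted names.
        ports = set(re.findall(r'output:"?([^",\s]+)"?', actions))
        ports_by_vlan.setdefault(int(vlan.group(1)), set()).update(ports)
    return ports_by_vlan


# Invented sample output, for illustration only.
SAMPLE = """\
 cookie=0x1, duration=10.0s, table=22, n_packets=5, n_bytes=300, priority=1,dl_vlan=1 actions=strip_vlan,load:0x3f->NXM_NX_TUN_ID[],output:2,output:3
 cookie=0x1, duration=10.0s, table=22, n_packets=0, n_bytes=0, priority=0 actions=drop
 cookie=0x1, duration=10.0s, table=20, n_packets=0, n_bytes=0, priority=2,dl_vlan=1,dl_dst=fa:16:3e:00:00:01 actions=strip_vlan,load:0x3f->NXM_NX_TUN_ID[],output:2
"""

if __name__ == "__main__":
    actual = parse_flood_ports(SAMPLE)
    expected = {1: {"2", "3", "4"}}  # hypothetical expectation
    for vlan, want in expected.items():
        print(vlan, sorted(want - actual.get(vlan, set())))
```

Comparing the parsed set against the ports that should be there (however
that expectation is derived) gives the missing interfaces per VLAN.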

Cheers,

Hardy

On 19.10.2018, Fabian Zimmermann wrote:
Re: missing ovs flows and extra interfaces in pike
Hi,

You may contact me directly; this would speed up my response time ;)

On 26.10.18 at 16:15, Hartwig Hauschild wrote:
> We thought we were missing the flow for "if you're looking for this MAC,
> go this way", but it turned out that what's actually missing is a bunch of
> interfaces on the multicast flow for the VLAN that we're investigating.
>
> Is that what you're seeing as well?

Exactly.

I wrote a small script which uses the database to calculate which
network is located on which HV and checks whether the corresponding
VXLAN tunnels have been built.

It's a quick and dirty hack, but it works for us (so far).

I personally don't like publishing code of this quality, but I don't think
I will improve it in the near future, so here it is. Maybe it helps you a bit.

https://github.com/noris-network/check_vxlan_mesh
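
The idea described above (not the linked script itself) can be sketched
roughly as follows. This is an illustration under assumptions: the
function names are hypothetical, the bindings are given as plain
(network, host) pairs rather than read from the neutron database, and
the set of tunnels actually present on each host is assumed to have
been collected separately.

```python
from collections import defaultdict


def expected_peers(bindings):
    """For each host, compute the hosts it should have a tunnel to.

    bindings: iterable of (network_id, host) pairs, one per bound port.
    Two hosts need a tunnel when they share at least one network.
    """
    hosts_by_net = defaultdict(set)
    for network_id, host in bindings:
        hosts_by_net[network_id].add(host)

    peers = defaultdict(set)
    for hosts in hosts_by_net.values():
        for host in hosts:
            peers[host] |= hosts - {host}
    return dict(peers)


def missing_tunnels(expected, actual):
    """Return {host: sorted list of peers} for tunnels that are absent."""
    result = {}
    for host, want in expected.items():
        gap = want - actual.get(host, set())
        if gap:
            result[host] = sorted(gap)
    return result


if __name__ == "__main__":
    # Made-up data: net-a spans three nodes, net-b lives on node1 only.
    bindings = [
        ("net-a", "node1"), ("net-a", "node2"), ("net-a", "node3"),
        ("net-b", "node1"),
    ]
    actual = {
        "node1": {"node2"},            # tunnel to node3 is missing
        "node2": {"node1", "node3"},
        "node3": {"node1", "node2"},
    }
    print(missing_tunnels(expected_peers(bindings), actual))
```

A non-empty result is what would trigger the alert in such a setup.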

If you have any questions, don't hesitate to ask.

Fabian
