Mailing List Archive

Scalable Agent Communication
Hi,
First and foremost sorry for being a bit unclear last night. I am not my
best at 1:55am. Would it be possible to move the meeting a few hours
forwards or backwards?

Update:
I have a POC running where, instead of the agents polling the plugin
database, a request is sent from the agent to the plugin to retrieve all
of the relevant network details (this is done only when a tap device has
been added or deleted; the same applies to the gateway device). I am
still addressing the agent network configuration (I was distracted by a
bug and issues with devstack along the way). The current implementation
has the agent polling the host's network devices to detect changes. I
started with this following Sumit's comments about decoupling the agent
and the VIF driver.
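
To make this concrete, here is a stripped-down sketch of the detection
loop (this is not the actual POC code; the plugin-side calls are
invented names, for illustration only):

    # Sketch only: polls the host's network devices and, on add/delete,
    # asks the plugin for details. Helper names are hypothetical.
    import os
    import time

    def list_devices():
        # Tap and gateway devices appear as ordinary network interfaces.
        return set(d for d in os.listdir('/sys/class/net')
                   if d.startswith('tap') or d.startswith('gw-'))

    def poll_for_changes(plugin, interval=2):
        known = list_devices()
        while True:
            current = list_devices()
            for dev in current - known:
                plugin.get_device_details(dev)    # assumed RPC wrapper
            for dev in known - current:
                plugin.device_removed(dev)        # assumed RPC wrapper
            known = current
            time.sleep(interval)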

I hope to have a detailed design document ready in a few days - the
review will give a better idea of the design, and the document will also
help with documentation in the future.

Problem that I tried to mention last night:
The VIF driver creates the tap device for the attachment. This is done
by taking the first 11 characters of the attachment UUID and using them
(with a 'tap' prefix) as the name of the tap device. The agent detects
the new tap device and notifies the plugin, which in turn sends the
network details back to the agent.
The problem that I have with this (and it is a bug in the existing
Quantum code) is that if there is more than one attachment with the same
prefix then the networking will be incorrect (the same applies to
network IDs).
For example:
Network A - 30e46c6c-fd53-4c32-86bb-628423c3083f
Attachment X on A - 04ea2bb8-d2fb-4517-97fc-046fe8eb04c5
On the host there will be - gw-30e46c6c-fd and tap04ea2bb8-d2 created.

Network B - *30e46c6c-fd*53-0000-2222-628423c3083f => problems with the
gateway
Attachment Y on B - *04ea2bb8-d2*fb-0000-1111-046fe8eb04c5
*The host will also have to create gw-30e46c6c-fd and tap04ea2bb8-d2 - a
collision.*
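
In code, the truncation problem is easy to see (illustration only, not
Quantum code):

    # Illustration: truncating UUIDs to 11 characters collapses B onto A.
    net_a = "30e46c6c-fd53-4c32-86bb-628423c3083f"
    net_b = "30e46c6c-fd53-0000-2222-628423c3083f"
    print('gw-' + net_a[:11])   # gw-30e46c6c-fd
    print('gw-' + net_b[:11])   # gw-30e46c6c-fd -> same name, collision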

First and foremost, the VIF driver will fail to create the second
device. On the agent side, which requests the information, the wrong
network or attachment details may be returned. Should the plugin ensure
that the UUID prefixes are unique, at least until Linux allows device
names long enough to hold the full UUID?

This is why I think that it is important that the VIF driver notifies
the agent with the actual IDs.

In addition to this I think that there are a number of additional issues
that we need to address:
1. Inclusion of openstack common - on IRC last night it was mentioned
that there should be a blueprint for the config (I feel this only
addresses a small part of the problem). I think that we should do this
for the whole openstack common project. This will be healthier in the
short and long run.
2. Python 2.4. I have yet to understand how to identify which modules
are from later versions. If this is a MUST for the agents then we can
leave the agents as they are and introduce new agents that support RPC.
Is this a viable solution?
3. I am in favour of the drivers notifying the agents. Yes, this has a
bit of coupling and syncing but it is a healthier solution.

Thanks
Gary
Re: Scalable Agent Communication [ In reply to ]
Hi Gary,

Comments inline, sorry for the slow reply, I'm swamped lately :)

On Wed, May 16, 2012 at 12:37 AM, Gary Kotton <gkotton@redhat.com> wrote:

> Hi,
> First and foremost sorry for being a bit unclear last night. I am not my
> best at 1:55am. Would it be possible to move the meeting a few hours
> forwards or backwards?
>
> Update:
> I have a POC running where, instead of the agents polling the plugin
> database, a request is sent from the agent to the plugin to retrieve all of
> the relevant network details (this is done only when a tap device has been
> added or deleted; the same applies to the gateway device). I am still
> addressing the agent network configuration (I was distracted by a bug and
> issues with devstack along the way). The current implementation has the
> agent polling the host's network devices to detect changes. I started with
> this following Sumit's comments about decoupling the agent and the VIF
> driver.
>
> I hope to have a detailed design document ready in a few days - the review
> will give a better idea of the design, and the document will also help with
> documentation in the future.
>
> Problem that I tried to mention last night:
> The VIF driver creates the tap device for the attachment. This is done by
> taking the first 11 characters of the attachment UUID and using them (with
> a 'tap' prefix) as the name of the tap device.
>

Btw, this actually isn't the case for the OVS plugin. The OVS vif driver
in Nova passes the entire attachment UUID to OVS by setting an attribute on
the local OVSDB entry for that port (note: the ovsdb is a simple embedded
database that runs as part of OVS on the hypervisor... it is completely
distinct from the primary database used by the OVS plugin).
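
For example, an agent (or a curious operator) can read that attribute
back from the local OVSDB with something like the following sketch (the
helper is mine, untested; the device name is just an example):

    # Sketch: recover the full attachment UUID stored on an OVS port.
    import subprocess

    def get_iface_id(dev):
        # Reads the 'iface-id' value the Nova vif driver set on the port.
        out = subprocess.check_output(
            ['ovs-vsctl', 'get', 'Interface', dev, 'external_ids:iface-id'],
            universal_newlines=True)
        return out.strip().strip('"')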



> The agent detects the new tap device and notifies the plugin which in turn
> sends the network details back to the agent.
> The problem that I have with this (and it is a bug in the existing Quantum
> code) is that if there is more than one attachment with the same prefix
> then the networking will be incorrect (the same applies to network IDs).
> For example:
> Network A - 30e46c6c-fd53-4c32-86bb-628423c3083f
> Attachment X on A - 04ea2bb8-d2fb-4517-97fc-046fe8eb04c5
> On the host there will be - gw-30e46c6c-fd and tap04ea2bb8-d2 created.
>
> Network B - *30e46c6c-fd*53-0000-2222-628423c3083f => problems with the
> gateway
> Attachment Y on B - *04ea2bb8-d2*fb-0000-1111-046fe8eb04c5
> *The host will also have to create gw-30e46c6c-fd and tap04ea2bb8-d2 - a
> collision.*
>

By my reckoning, each device has 10 random hexadecimal digits, of which
there are 16^10 > 1 trillion possible values.

Based on your description of how the linux bridge plugin works, those
values must be unique across all VIFs and networks in a deployment. Even
with tens of thousands of VIFs and networks, we're still looking at
something on the order of a 1 in 10,000 chance of a collision - it's a
birthday problem (assuming I can do math on a monday morning :P).
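
Working that arithmetic through explicitly (birthday approximation; my
numbers, pick your own n):

    # Probability of at least one collision among n random 10-hex-digit
    # names drawn from d = 16**10 possibilities: roughly n*(n-1)/(2*d).
    n = 20000              # "tens of thousands" of VIFs and networks
    d = 16 ** 10           # ~1.1 trillion possible values
    p = n * (n - 1) / (2.0 * d)
    print(p)               # ~1.8e-04, i.e. about 1 in 5,500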

With the OVS plugin, things are a bit better, as VIFs are already
identified by full UUIDs (the device name includes part of the UUID only
for convenience), meaning that you really have only roughly a (# of
networks / 1 trillion) plus (# of VIFs per hypervisor / 1 trillion)
chance of a collision.

That said, at least with OVS, it would be reasonable to generate random
device names, checking for a collision and generating a new name if needed.
This is because with OVS the full UUID can be stored as metadata on the
port and the use of part of the UUID in the device name was more for ease
of debugging than out of necessity.
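
Roughly like this (a sketch only; device_exists() is an assumed helper):

    # Sketch: random device names with an explicit collision check.
    import os
    import random

    def device_exists(name):
        return os.path.exists('/sys/class/net/%s' % name)

    def make_tap_name():
        while True:
            suffix = ''.join(random.choice('0123456789abcdef')
                             for _ in range(11))
            name = 'tap' + suffix
            if not device_exists(name):
                return name    # the full UUID lives in OVSDB metadata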



> First and foremost, the VIF driver will fail to create the second device.
> On the agent side, which requests the information, the wrong network or
> attachment details may be returned. Should the plugin ensure that the UUID
> prefixes are unique, at least until Linux allows device names long enough
> to hold the full UUID?
>

It could do this for networks, though currently the attachment IDs used
for VIFs are Nova interface IDs, not generated by Quantum.


>
> This is why I think that it is important that the VIF driver notifies the
> agent with the actual IDs.
>

Yes, this is actually what the OVS plugin already does (you'll see that it
sets the iface-id attribute to the Nova VIF UUID). With the bridge, we'll
require a bit of extra machinery, as you suggest.



>
> In addition to this I think that there are a number of additional issues
> that we need to address:
> 1. Inclusion of openstack common - on IRC last night it was mentioned
> that there should be a blueprint for the config (I feel this only
> addresses a small part of the problem). I think that we should do this
> for the whole openstack common project. This will be healthier in the
> short and long run.
>

I think the proposal was to use the existing config library that is already
a part of openstack common. Is that what you are suggesting, or something
else?


> 2. Python 2.4. I have yet to understand how to identify which modules are
> from later versions. If this is a MUST for the agents then we can leave the
> agents as they are and introduce new agents that support RPC. Is this a
> viable solution?
>

I'd REALLY like to avoid having the core team work on two separate
versions of the agents for 2.4 vs. > 2.4. I think it would slow us down.
For 2.4 things that are purely syntactic (e.g., not using "as" for
exceptions), I think it's fine for us to enforce this as part of our
code review process. If there are libraries important to new
capabilities where the clearly superior choice is not an option for 2.4,
I think we need to raise this as a community discussion point. Is there
a particular module you have in mind?
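
To illustrate the kind of purely syntactic constraint I mean (a toy
example):

    # The "as" form is 2.6+ only; Python 2.4 requires the comma form.
    try:
        int('not a number')
    except ValueError, exc:            # 2.4-compatible spelling
        print('caught: %s' % exc)

    # A 2.4 interpreter rejects the newer spelling at parse time:
    # except ValueError as exc:
    #     print('caught: %s' % exc)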


> 3. I am in favour of the drivers notifying the agents. Yes, this has a bit
> of coupling and syncing but it is a healthier solution.
>

No general disagreement here. As I said, any plugin using OVS already does
this. Building a mechanism to do this for the linux bridge plugin would
certainly be reasonable.

Dan


>
> Thanks
> Gary
>


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dan Wendlandt
Nicira, Inc: www.nicira.com
twitter: danwendlandt
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Re: Scalable Agent Communication [ In reply to ]
Hi,
Thanks for the comments. Please see my replies inline. I hope that these
will not take up too much CPU on your side.
Thanks
Gary

On 05/21/2012 07:57 PM, Dan Wendlandt wrote:
> Btw, this actually isn't the case for the OVS plugin. The OVS vif
> driver in Nova passes the entire attachment UUID to OVS by setting an
> attribute on the local OVSDB entry for that port (note: the ovsdb is a
> simple embedded database that runs as part of OVS on the hypervisor...
> it is completely distinct from the primary database used by the OVS
> plugin).
Can you please point me to the code in Nova? I want to make sure that
I have my bases covered.
>
>
> In addition to this I think that there are a number of additional
> issues that we need to address:
> 1. Inclusion of openstack common - on IRC last night it was
> mentioned that there should be a blueprint for the config (I feel
> this only addresses a small part of the problem). I think that we
> should do this for the whole openstack common project. This will be
> healthier in the short and long run.
>
>
> I think the proposal was to use the existing config library that is
> already a part of openstack common. Is that what you are suggesting,
> or something else?
Yes, this is correct. As far as I understand, the OpenStack common
library may not support 2.4. It may have to be updated.
>
> 2. Python 2.4. I have yet to understand how to identify which
> modules are from later versions. If this is a MUST for the agents
> then we can leave the agents as they are and introduce new agents
> that support RPC. Is this a viable solution?
>
>
> I'd REALLY like to avoid having the core team work on two separate
> versions of the agents for 2.4 vs. > 2.4. I think it would slow us
> down. For 2.4 things that are purely syntactic (e.g., not using "as"
> for exceptions), I think it's fine for us to enforce this as part of
> our code review process. If there are libraries important to new
> capabilities where the clearly superior choice is not an option for
> 2.4, I think we need to raise this as a community discussion point.
> Is there a particular module you have in mind?
I am not familiar with Xen. I am trying to understand why the agents
have to run in dom0. From my understanding the VIF driver does not run
in dom0. Could you explain why the agent has to run in dom0?

I am not sure if you have read
https://docs.google.com/document/d/1MbcBA2Os4b98ybdgAw2qe_68R1NG6KMh8zdZKgOlpvg/edit.
I have the linuxbridge agent up and running. It makes use of a hacked
version of the RPC library - hopefully in the near future we will be
able to import the common library. Once the linux bridge agent is up and
running I'll proceed to make the changes to the OVS agent.

Thanks
Gary
Re: Scalable Agent Communication [ In reply to ]
On May 21, 2012, at 12:20 PM, Gary Kotton wrote:
>
> Hi,
> Thanks for the comments. Please see my replies inline. I hope that these will not take up too much CPU on your side.
> Thanks
> Gary
>
> On 05/21/2012 07:57 PM, Dan Wendlandt wrote:
>> Btw, this actually isn't the case for the OVS plugin. The OVS vif driver in Nova passes the entire attachment UUID to OVS by setting an attribute on the local OVSDB entry for that port (note: the ovsdb is a simple embedded database that runs as part of OVS on the hypervisor... it is completely distinct from the primary database used by the OVS plugin).
> Can you please point me to the code in Nova? I want to make sure that I have my bases covered.

Helping Dan out here:

It's in nova/virt/libvirt/vif.py, under the class LibvirtOpenVswitchVirtualPortDriver, in the plug function.

>>
>>
>> In addition to this I think that there are a number of additional issues that we need to address:
>> 1. Inclusion of openstack common - on IRC last night it was mentioned that there should be a blueprint for the config (I feel this only addresses a small part of the problem). I think that we should do this for the whole openstack common project. This will be healthier in the short and long run.
>>
>> I think the proposal was to use the existing config library that is already a part of openstack common. Is that what you are suggesting, or something else?
> Yes, this is correct. As far as I understand, the OpenStack common library may not support 2.4. It may have to be updated.
>>
>> 2. Python 2.4. I have yet to understand how to identify which modules are from later versions. If this is a MUST for the agents then we can leave the agents as they are and introduce new agents that support RPC. Is this a viable solution?
>>
>> I'd REALLY like to avoid having the core team work on two separate versions of the agents for 2.4 vs. > 2.4. I think it would slow us down. For 2.4 things that are purely syntactic (e.g., not using "as" for exceptions), I think it's fine for us to enforce this as part of our code review process. If there are libraries important to new capabilities where the clearly superior choice is not an option for 2.4, I think we need to raise this as a community discussion point. Is there a particular module you have in mind?
> I am not familiar with Xen. I am trying to understand why the agents have to run in dom0. From my understanding the VIF driver does not run in dom0. Could you explain why the agent has to run in dom0?
>
> I am not sure if you have read https://docs.google.com/document/d/1MbcBA2Os4b98ybdgAw2qe_68R1NG6KMh8zdZKgOlpvg/edit. I have the linuxbridge agent up and running. It makes use of a hacked version of the RPC library - hopefully in the near future we will be able to import the common library. Once the linux bridge agent is up and running I'll proceed to make the changes to the OVS agent.
>
> Thanks
> Gary




Re: Scalable Agent Communication [ In reply to ]
On Mon, May 21, 2012 at 10:20 AM, Gary Kotton <gkotton@redhat.com> wrote:

> Hi,
> Thanks for the comments. Please see my replies inline. I hope that these
> will not take up too much CPU on your side.
> Thanks
> Gary
>
>
> On 05/21/2012 07:57 PM, Dan Wendlandt wrote:
>
> Btw, this actually isn't the case for the OVS plugin. The OVS vif
> driver in Nova passes the entire attachment UUID to OVS by setting an
> attribute on the local OVSDB entry for that port (note: the ovsdb is a
> simple embedded database that runs as part of OVS on the hypervisor... it
> is completely distinct from the primary database used by the OVS plugin).
>
> Can you please point me to the code in Nova? I want to make sure that I
> have my bases covered.
>

There are two versions of the OVS + libvirt vif-driver... one for
libvirt 0.9.10 and earlier (no built-in support for OVS) and, below it
in the same file, one for libvirt 0.9.11+.

For the older version of OVS + libvirt you can see here that we grab the
vif UUID, and then specify it as the 'iface-id' attribute in the OVS
external-ids for the port:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/vif.py#L124

For the newer version, we pass the vif-uuid in directly as a parameter to
libvirt, which in turn communicates it to OVS:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/vif.py#L187
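
If you don't want to chase the links, the older driver's plug() boils
down to roughly the following (a paraphrase, not the verbatim Nova
source; see the links above for the real code):

    # Rough paraphrase of the pre-0.9.11 OVS vif driver's plug() logic.
    import subprocess

    def plug_sketch(vif_uuid, mac_address, bridge='br-int'):
        dev = 'tap' + vif_uuid[:11]    # truncated name, debugging aid only
        subprocess.check_call(
            ['ovs-vsctl', '--', '--may-exist', 'add-port', bridge, dev,
             '--', 'set', 'Interface', dev,
             'external-ids:iface-id=%s' % vif_uuid,        # the full UUID
             '--', 'set', 'Interface', dev,
             'external-ids:attached-mac=%s' % mac_address])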


>
>
>
> In addition to this I think that there are a number of additional issues
>> that we need to address:
>> 1. Inclusion of openstack common - on IRC last night it was mentioned
>> that there should be a blueprint for the config (I feel this only
>> addresses a small part of the problem). I think that we should do this
>> for the whole openstack common project. This will be healthier in the
>> short and long run.
>>
>
> I think the proposal was to use the existing config library that is
> already a part of openstack common. Is that what you are suggesting, or
> something else?
>
> Yes, this is correct. As far as I understand, the OpenStack common
> library may not support 2.4. It may have to be updated.
>

>
>> 2. Python 2.4. I have yet to understand how to identify which modules
>> are from later versions. If this is a MUST for the agents then we can leave
>> the agents as they are and introduce new agents that support RPC. Is this a
>> viable solution?
>>
>
> I'd REALLY like to avoid having the core team work on two separate
> versions of the agents for 2.4 vs. > 2.4. I think it would slow us down.
> For 2.4 things that are purely syntactic (e.g., not using "as" for
> exceptions), I think it's fine for us to enforce this as part of our code
> review process. If there are libraries important to new capabilities where
> the clearly superior choice is not an option for 2.4, I think we need to
> raise this as a community discussion point. Is there a particular module
> you have in mind?
>
> I am not familiar with Xen. I am trying to understand why the agents have
> to run in dom0. From my understanding the VIF driver does not run in dom0.
> Could you explain why the agent has to run in dom0?
>

Right now, agents run commands to manipulate bridges locally. It could be
rewritten so that the agent runs in a service VM along with nova-compute
and then communicates OVS changes to dom0 via another channel. We just
have to weigh the complexity of implementing, maintaining and documenting
such a separate channel against the cost of keeping agents 2.4 compatible.



>
> I am not sure if you have read
> https://docs.google.com/document/d/1MbcBA2Os4b98ybdgAw2qe_68R1NG6KMh8zdZKgOlpvg/edit.
> I have the linuxbridge agent up and running. It makes use of a hacked
> version of the RPC library - hopefully in the near future we will be able
> to import the common library. Once the linux bridge agent is up and
> running I'll proceed to make the changes to the OVS agent.
>

Just looking at it now. I'd really caution against having generic calls
like "device_added", since the set of things that may need to be fetched
when a device appears will likely increase significantly in the future,
even in Folsom-2 (things like security groups, QoS settings, etc.). I'd
rather have specific calls that have a well-defined schema that we think
will be reasonably stable over time. In Nova we've had very bad
experiences with RPCs that just pass large dictionaries of data around,
where the "schema" of that dictionary grows organically over time and is
not really documented anywhere.
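
To make the contrast concrete, a hypothetical sketch (all method and
field names here are invented for illustration):

    # Discouraged: one generic call whose dict payload grows organically.
    def device_added(context, device):
        # 'device' is an open-ended dict; its de facto schema lives only
        # in whichever callers happen to populate it.
        pass

    # Preferred: narrow calls with explicit, documented parameters.
    def get_vif_network_details(context, attachment_id):
        """Return the vlan id and network id for one attachment."""
        pass

    def get_vif_security_groups(context, attachment_id):
        """Return the security group ids bound to one attachment."""
        pass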

Also, at least for the OVS plugin, I'd like to avoid using the device name
as the key that is sent back. The OVS agent already knows the
attachment-id, so there's no need to pollute the code with the device name.


Thanks,

Dan


>
> Thanks
> Gary
>



--
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dan Wendlandt
Nicira, Inc: www.nicira.com
twitter: danwendlandt
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Re: Scalable Agent Communication [ In reply to ]
Thanks

On 05/21/2012 11:30 PM, Dan Wendlandt wrote:
> Just looking at it now. I'd really caution against having generic
> calls like "device_added", since the set of things that may need to be
> fetched when a device appears will likely increase significantly in
> the future, even in Folsom-2 (things like security groups, QoS
> settings, etc.). I'd rather have specific calls that have a
> well-defined schema that we think will be reasonably stable over time.
> In Nova we've had very bad experiences with RPCs that just pass large
> dictionaries of data around, where the "schema" of that dictionary
> grows organically over time and is not really documented anywhere.
Good point. I'll address this.
>
> Also, at least for the OVS plugin, I'd like to avoid using the device
> name as the key that is sent back. The OVS agent already knows the
> attachment-id, so there's no need to pollute the code with the device
> name.
Great - if the information is there then it will be used :)
>
> Thanks,
>
> Dan
>
>
> Thanks
> Gary
>
>
>
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Dan Wendlandt
> Nicira, Inc: www.nicira.com
> twitter: danwendlandt
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
Re: Scalable Agent Communication [ In reply to ]
Hi,
I have taken a closer look, and the fact that the attachment ID exists
on the agent is great. The gateway ID, however, is not there. I spent
some time investigating this, and if the following changes were made to
the Nova code then the gateway ID would also exist (and it will
certainly be a lot healthier for future developments).

1. /opt/stack/nova/nova/network/quantum/manager.py

    def enable_dhcp(self, context, quantum_net_id, network_ref, vif_rec,
                    project_id):
        ...
        self.q_conn.create_and_attach_port(q_tenant_id, quantum_net_id,
                                           network_ref['uuid'])

The problem here is that this does not work for the linux bridge plugin
(not sure about UCS and RYU). Is there any way of identifying which
Quantum plugin is running at runtime on Nova? (One hypothetical angle is
sketched after item 2 below.) If so then I can add the fix and it will
work for OVS. Please advise.

2. /opt/stack/nova/nova/network/linux_net.py

    class LinuxOVSInterfaceDriver(LinuxNetInterfaceDriver):

        def plug(self, network, mac_address, gateway=True):
            ...
            _execute('ovs-vsctl',
                     '--', '--may-exist', 'add-port', bridge, dev,
                     '--', 'set', 'Interface', dev, 'type=internal',
                     '--', 'set', 'Interface', dev,
                     'external-ids:iface-id=%s' % network['uuid'],
                     '--', 'set', 'Interface', dev,
                     'external-ids:iface-status=active',
                     '--', 'set', 'Interface', dev,
                     'external-ids:attached-mac=%s' % mac_address,
                     run_as_root=True)
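
Regarding the runtime-identification question in (1): one hypothetical
angle (the flag name comes from Nova's libvirt configuration; I have not
verified this end to end, so treat it purely as a sketch):

    # Sketch: infer the deployment flavour from the configured vif
    # driver class, since Nova has no direct view of the Quantum plugin.
    from nova import flags

    FLAGS = flags.FLAGS

    def is_ovs_deployment():
        return 'OpenVswitch' in FLAGS.libvirt_vif_driver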

Thanks
Gary

On 05/22/2012 09:23 AM, Gary Kotton wrote:
> Thanks
>
> On 05/21/2012 11:30 PM, Dan Wendlandt wrote:
>> Just looking at it now. I'd really caution against having generic
>> calls like "device_added", since the set of things that may need to
>> be fetched when a device appears will likely increase significantly
>> in the future, even in Folsom-2 (things like security groups, QoS
>> settings, etc.). I'd rather have specific calls that have a
>> well-defined schema that we think will be reasonably stable over
>> time. In Nova we've had very bad experiences with RPCs that just
>> pass large dictionaries of data around, where the "schema" of that
>> dictionary grows organically over time and is not really documented
>> anywhere.
> Good point. I'll address this.
>>
>> Also, at least for the OVS plugin, I'd like to avoid using the device
>> name as the key that is sent back. The OVS agent already knows the
>> attachment-id, so there's no need to pollute the code with the device
>> name.
> Great - if the information is there then it will be used :)
>>
>> Thanks,
>>
>> Dan
>>
>>
>> Thanks
>> Gary
>>
>>
>>
>>
>> --
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Dan Wendlandt
>> Nicira, Inc: www.nicira.com
>> twitter: danwendlandt
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>