Mailing List Archive

Quantum agents
Hi,
Sorry for not being able to attend the IRC meeting - it was in the
middle of the night :)

Whilst I was working on the integration of Quantum into oVirt I
encountered a number of issues and challenges regarding the agents. Some
of the issues were discussed in yesterday's meeting, namely:
1. High availability
2. Scalability

*High Availability*
I discovered that when the agent was unable to access the database it
would terminate on an exception. This has been addressed partially by
https://review.openstack.org/#/c/6744/ (thanks to Maru Newby for the
comments - updated, tested manually for linuxbridge and ovs). I saw that
Maru opened a bug regarding agent unit tests (kudos). I have tested the
ovs agent and the linux bridge agent manually.
I have yet to update the RYU agent (Isaku Yamahata suggested that we
speak about this at the meeting). I think that we need to address this
across the board and not only in the agents, but also in the plugins.
The database access should be done via a common class that takes the
connectivity into account. I do not feel that this is part of the bug
fix above; it is more of a system-wide fix.
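
To illustrate the "common class that takes the connectivity into account"
idea, here is a minimal sketch. It is not code from the Quantum tree or from
the review above; the names ResilientDB and DatabaseUnavailable are
hypothetical, and it assumes the underlying query function raises a
recognizable error when the database is unreachable.

```python
import time


class DatabaseUnavailable(Exception):
    """Hypothetical error raised when the database cannot be reached."""


class ResilientDB:
    """Hypothetical wrapper: retry a query instead of letting the agent exit."""

    def __init__(self, query_fn, retry_interval=2.0, max_retries=None,
                 sleep=time.sleep):
        self._query_fn = query_fn            # callable that talks to the database
        self._retry_interval = retry_interval
        self._max_retries = max_retries      # None means retry forever
        self._sleep = sleep                  # injectable so tests need not wait

    def call(self, *args, **kwargs):
        attempts = 0
        while True:
            try:
                return self._query_fn(*args, **kwargs)
            except DatabaseUnavailable:
                attempts += 1
                if self._max_retries is not None and attempts > self._max_retries:
                    raise                    # give up and let the caller decide
                self._sleep(self._retry_interval)
```

Agents and plugins could both route their queries through one such object,
so the reconnect policy would live in a single place.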

*Scalability*
This is a recurring topic. I understand that from the summit the idea of
using AMQP came up. This still requires a "PUSH" from the plugin to the
specific agent. After dealing with the agents above I wonder if we
actually need the agents? Let me try and elaborate: when a VM is
deployed the VIF plugin (I think that that is the terminology) creates
the tap device, sets it to up and in the case of OVS notifies the
integration bridge of the tap device. In the background the agent is
running. When the agent discovers that the tap device exists and it
matches an attachment from the database it "plugs" the device in and
updates the database with the port status.
Why not have the VIF plugin also do the interface plugging? This may
solve a large number of the scalability issues mentioned. This would
move part of the logic from the agent to the VIF plugin.
It would be interesting to know the rationale for the current implementation.
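
The agent behaviour described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not the actual ovs or
linuxbridge agent code: poll_once and its arguments are hypothetical, the
database is modelled as a plain dict, and "plugging" is reduced to recording
the device in a dict.

```python
def poll_once(local_taps, db_attachments, plugged):
    """One pass of a hypothetical polling agent.

    local_taps:     set of tap device names present on this host
    db_attachments: dict mapping tap device name -> network id (from the DB)
    plugged:        dict of devices already plugged; updated in place
    Returns the list of (device, network_id) pairs plugged in this pass.
    """
    newly_plugged = []
    for device in sorted(local_taps):
        network_id = db_attachments.get(device)
        if network_id is None or plugged.get(device) == network_id:
            continue  # no attachment for this device, or already plugged
        plugged[device] = network_id  # stands in for real bridge configuration
        newly_plugged.append((device, network_id))
    # Forget devices that have disappeared from the host.
    for device in list(plugged):
        if device not in local_taps:
            del plugged[device]
    return newly_plugged
```

Note that every pass reads the attachments whether or not anything changed,
which is where the scalability concern comes from when many agents poll.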

Thanks
Gary
Re: Quantum agents
Hi Gary,

Thanks for taking up the work on improving the HA aspects of some of the
agents. (I will respond to your review request once I get a chance to
test the changes, but the diff looks good.)

I agree with your assessment that the factoring of common agent code is
probably a larger activity, and probably can be targeted as a separate
patch.

On your question regarding the need for an agent and if it can be done
in the VIF driver - the VIF driver is not actually an independent thread
of control; it gets executed as a part of the VM creation process. If a
VM's VIF always had to be plugged into the corresponding port only when
a VM was created, then I guess it would be fine to do the plugging from
within the VIF driver. However, we also want to be able to support the
use case where you can bring up the VM and then plug it into a port at a
later time, or unplug it and plug it into a different network.

In general, there is also a thought that the VIF driver should be really
thin, and to the extent possible Quantum plugin-specific details should
be pulled out of it.

Thanks,

~Sumit.



From: netstack-bounces+snaiksat=cisco.com@lists.launchpad.net
[mailto:netstack-bounces+snaiksat=cisco.com@lists.launchpad.net] On
Behalf Of Gary Kotton
Sent: Wednesday, April 25, 2012 12:55 AM
To: netstack@lists.launchpad.net
Subject: [Netstack] Quantum agents



Re: Quantum agents
On 04/26/2012 09:24 AM, Sumit Naiksatam (snaiksat) wrote:
>
> Hi Gary,
>
> Thanks for taking up the work on improving the HA aspects of some of
> the agents. (I will respond to your review request once I get a chance
> to test the changes, but the diff looks good.)
>
Thanks
>
> I agree with your assessment that the factoring of common agent code
> is probably a larger activity, and probably can be targeted as a
> separate patch.
>
> On your question regarding the need for an agent and if it can be done
> in the VIF driver -- the VIF driver is not actually an independent
> thread of control, it gets executed as a part of the VM creation
> process. If a VM's VIF had to be always plugged into the corresponding
> port only when a VM was created, then I guess it would be fine to do
> the plugging from within the VIF driver. However, we also want to be
> able to support the use case that you can bring up the VM and then
> plug it into a port at a later time, or unplug and plug into a
> different network.
>
Thanks for the clarification. This makes sense :)
>
> In general, there is also a thought that the VIF driver should be
> really thin, and to the extent possible Quantum plugin-specific
> details should be pulled out of it.
>
> Thanks,
>
> ~Sumit.
Re: Quantum agents
On 04/26/2012 07:09 PM, Gary Kotton wrote:
>>
>> On your question regarding the need for an agent and if it can be
>> done in the VIF driver -- the VIF driver is not actually an
>> independent thread of control, it gets executed as a part of the VM
>> creation process. If a VM's VIF had to be always plugged into the
>> corresponding port only when a VM was created, then I guess it would
>> be fine to do the plugging from within the VIF driver. However, we also
>> want to be able to support the use case that you can bring up the VM
>> and then plug it into a port at a later time, or unplug and plug into
>> a different network.
>>
> Thanks for the clarification. This makes sense :)
>>
>> In general, there is also a thought that the VIF driver should be
>> really thin, and to the extent possible Quantum plugin-specific
>> details should be pulled out of it.
>>
I understand what you have explained above. After giving it additional
thought I am still not 100% convinced that the attachment needs to be
done in a separate process, that is, the quantum agent. I think that
this is still at the VM management level. At the moment I am only
familiar with the openvswitch and linuxbridge implementations - I need
to understand the other agents. I think that there are a number of use
cases here:
1. Adding or removing a vNic from the VM
2. Moving a vNic from one network to another
3. Updating the vNic status (down or disabled)
I think that the above are still at the VM management side, and in most
cases may have to be dealt with by the hypervisor to update the VM
attributes (all except #2). I think that if there was a well-defined API
for the above operations then each plugin could provide a class to
implement this on the VIF driver side.
The downside of the above is that the VM management will need to know
details about the network - for example the network tag - this
information could be passed as metadata for the vNic. In the current
implementation this is accessed by the quantum agent when retrieving the
details from the database.
The above may save a lot of cycles on the compute node, reduce
traffic on the network and provide a solution for the scalability
problems discussed.
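
A rough sketch of what such a "well-defined API" might look like, covering
the three use cases listed above. Everything here is hypothetical: the class
and method names are made up for illustration, the network tag is passed in
as metadata as suggested, and the toy implementation records state in a dict
rather than configuring a real bridge.

```python
from abc import ABC, abstractmethod


class VifOperations(ABC):
    """Hypothetical interface each plugin would implement on the VIF driver side."""

    @abstractmethod
    def add_vnic(self, vnic_id, network_meta):
        """Use case 1: attach a new vNic using the supplied network metadata."""

    @abstractmethod
    def remove_vnic(self, vnic_id):
        """Use case 1: detach a vNic."""

    @abstractmethod
    def move_vnic(self, vnic_id, network_meta):
        """Use case 2: move a vNic to another network."""

    @abstractmethod
    def set_vnic_status(self, vnic_id, up):
        """Use case 3: mark a vNic up or down."""


class InMemoryVifOperations(VifOperations):
    """Toy implementation that only records state, for illustration."""

    def __init__(self):
        self.vnics = {}

    def add_vnic(self, vnic_id, network_meta):
        self.vnics[vnic_id] = {"tag": network_meta["tag"], "up": True}

    def remove_vnic(self, vnic_id):
        self.vnics.pop(vnic_id, None)

    def move_vnic(self, vnic_id, network_meta):
        self.vnics[vnic_id]["tag"] = network_meta["tag"]

    def set_vnic_status(self, vnic_id, up):
        self.vnics[vnic_id]["up"] = up
```
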
Thanks
Gary

Re: Quantum agents
Hi Gary,

On Sat, Apr 28, 2012 at 10:39 PM, Gary Kotton <gkotton@redhat.com> wrote:
>
> In general, there is also a thought that the VIF driver should be really
> thin, and to the extent possible Quantum plugin-specific details should be
> pulled out of it.
>
>
I definitely agree with Sumit's comment.


>
> I understand what you have explained above. After giving it additional
> thought I am still not 100% convinced that the attachment needs to be done
> in a separate process, that is, the quantum agent. I think that this is
> still at the VM management level. At the moment I am only familiar with the
> openvswitch and linuxbridge implementations - I need to understand the
> other agents. I think that there are a number of use cases here:
> 1. Adding or removing a vNic from the VM
> 2. Moving a vNic from one network to another
> 3. Updating the vNic status (down or disabled)
> I think that the above are still at the VM management side, and in most
> cases may have to be dealt with by the hypervisor to update the VM
> attributes (all except #2). I think that if there was a well-defined API
> for the above operations then each plugin could provide a class to
> implement this on the VIF driver side.
>

While the basic operations might be achieved by Nova as part of
vif-plugging, I think the likely list of things that plugin agents will do
in the future is much longer than this. It would include implementing
packet filtering, QoS, potentially creating tunnels to other servers,
setting monitoring policies, collecting packet statistics, etc., all of
which could need to be updated on demand based on events other than a VM
spinning up or adding a NIC. One of the main goals of Quantum was to
remove this networking complexity from the Nova code base. I think you'd
quickly converge on having all of the functionality that we have in a
quantum agent in the nova-compute agent process, with the same overhead in
terms of data sharing, etc.


> The downside of the above is that the VM management will need to know
> details about the network - for example the network tag - this information
> could be passed as metadata for the vNic. In the current implementation
> this is accessed by the quantum agent when retrieving the details from the
> database.
> The above may save a lot of cycles on the compute node, reduce traffic
> on the network and provide a solution for the scalability problems
> discussed.
>

In my view, doing network function X within the nova-compute "agent"
instead of in a separate quantum "agent" doesn't really change anything in
terms of scale or efficiency. nova-compute grabs all of its data from a
centralized database, just the same way a quantum plugin agent does.
The thing I've been pushing for in terms of agent performance is to have
agents get notifications via a queue service, so they can avoid DB
polling. That would mean that ovs agents act just like the nova-compute
agent, which also gets notifications via a queue service, then looks up
data in the DB.
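
As a rough illustration of the notification-driven approach described above
(a sketch under assumptions, not the real agent code): Python's standard
queue.Queue stands in for the queue service, and run_agent, lookup_port and
apply_port are hypothetical names. The agent blocks until it is told a port
changed, and only then looks the port up.

```python
import queue


def run_agent(notifications, lookup_port, apply_port):
    """Drain a notification queue, looking up each port only when told to.

    notifications: queue.Queue of port ids; None is a stop sentinel
    lookup_port:   callable(port_id) -> port details (stands in for a DB read)
    apply_port:    callable(port_id, details) that configures the port locally
    Returns the number of notifications handled.
    """
    handled = 0
    while True:
        port_id = notifications.get()  # blocks until a message arrives
        if port_id is None:
            break
        apply_port(port_id, lookup_port(port_id))
        handled += 1
    return handled
```

With this shape the number of DB reads tracks the number of changes rather
than the polling interval multiplied by the number of agents.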

To me, having a separate nova-compute "agent" and quantum "agent" provides
a clean separation of concerns, as nova doesn't need to be changed whenever
someone wants to introduce new networking functionality. This is a clear
win for both the nova + quantum teams, so I think there would have to be a
really strong technical cost to having separate agents to justify merging.
Happy to hear arguments on that front though :)

Thanks,

Dan



> --
> Mailing list: https://launchpad.net/~netstack
> Post to : netstack@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~netstack
> More help : https://help.launchpad.net/ListHelp
>
>


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dan Wendlandt
Nicira, Inc: www.nicira.com
twitter: danwendlandt
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Re: Quantum agents
Thanks.

On 05/01/2012 03:52 AM, Dan Wendlandt wrote: