Mailing List Archive

Cluster Communications
Dear colleagues,

my previous posting obviously was not clear enough :-)

What would you think of evaluating ACE and TAO for the *Non-Heartbeat*
cluster communication layer?

See http://www.cs.wustl.edu/~schmidt/ACE-overview.html
and http://www.cs.wustl.edu/~schmidt/TAO-overview.html
and http://www.theaceorb.com/

Any comments are welcome ...
Volker

--
Volker Wiegand Phone: +49 (0) 6196 / 50951-24
SuSE Rhein/Main AG Fax: +49 (0) 6196 / 40 96 07
Mergenthalerallee 45-47 Mobile: +49 (0) 179 / 292 66 76
D-65760 Eschborn E-Mail: Volker.Wiegand@suse.de
++ Only users lose drugs. Or was it the other way round? ++
Cluster Communications [ In reply to ]
On Wed, 3 Nov 1999, Volker Wiegand wrote:

> my previous posting obviously was not clear enough :-)
>
> What would you think of evaluating ACE and TAO for the *Non-Heartbeat*
> cluster communication layer?

Can you expand a little more on this communication layer? :-)

--
Tiago Pascoal (l41484@alfa.ist.utl.pt) FAX : +351-1-7273394
Politically incorrect, and a (not very) prominent member of the slacker generation.
Newly sworn-in (enlisted) citizen of the banana republic.

Be consistent.
--Larry Wall in the perl man page
Cluster Communications [ In reply to ]
Volker Wiegand wrote:
>
> Dear colleagues,
>
> my previous posting obviously was not clear enough :-)
>
> What would you think of evaluating ACE and TAO for the *Non-Heartbeat*
> cluster communication layer?
>
> See http://www.cs.wustl.edu/~schmidt/ACE-overview.html
> and http://www.cs.wustl.edu/~schmidt/TAO-overview.html
> and http://www.theaceorb.com/

I'm not a CORBA expert by any stretch of the imagination, but I'm going to step
in and comment anyway...

I would suggest that the needs we have vis a vis data communications for the
higher functions (cluster management, etc.) have to include these properties as
very high priorities:

Design for robustness during communications and node failures
(no hangs, complete control over timeouts, etc)

Compatible data formats on the wire from one version
to the next, including dealing with missing info
from old versions so that rolling upgrades can take
place

Strong authentication (i.e., security) on every access

I would claim that these properties need to be first and foremost in our minds.
The probability of a generic object access paradigm never having unknown and
harmful side-effects in these areas seems small.
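
To illustrate the first property above (no hangs, complete control over
timeouts), here is a minimal sketch in Python. It is not heartbeat code and not
an ACE or TAO API; the names recv_with_deadline and peer_is_alive and the
4096-byte buffer size are assumptions made up for this example.

```python
import socket


def recv_with_deadline(sock, max_wait_s, bufsize=4096):
    """Return one message, or None if nothing arrives within max_wait_s.

    The caller, not the library, decides how long a read may block.
    """
    sock.settimeout(max_wait_s)      # bound every blocking read
    try:
        return sock.recv(bufsize)
    except socket.timeout:           # deadline passed: report it, never hang
        return None


def peer_is_alive(sock, max_wait_s):
    """Hypothetical liveness check: any message within the deadline counts."""
    return recv_with_deadline(sock, max_wait_s) is not None
```

The point of the sketch is only that the deadline is an explicit parameter at
every call site, so a dead peer costs at most max_wait_s seconds.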

What CORBA does is create a standard API to a set of objects. It is my
understanding that it does not specifically address any of these issues. It is
my understanding that "standard" CORBA security is weak (you have to be
authenticated to get an object handle, but you can "guess" them because they are
densely populated(?)).

In this case I would claim that we need to design from the wire up, and then if
for some reason we wanted to put a CORBA API on top of it, then we can do so.
However, this will add complexity (generally decreasing reliability) without
adding function.

Although I love the idea of CORBA for many kinds of applications (like how GNOME
applications are using it), I think some of our most important needs are rather
specialized when viewed through the lens of general-purpose computing.

I would also think that if you DID use a CORBA ORB, you might want to consider
using ORBit (which is already installed on many Linux machines). I haven't
looked closely enough at the QOS features of ACE and TAO to make a judgement on
them, but I did notice that the number one near-term emphasis on TAO was to
"continuing to improve its quality".

Also, it appears that ACE requires C++. In some private email I sent Volker, I
said something which might be construed as saying that I was considering C++ for
future work. I haven't found a reason for that at this time.

One of the criticisms that I've heard about heartbeat is that it is "too
complex". In my estimation, it is probably 2 orders of magnitude less complex
than a CORBA implementation. It seems like a lot to bite off to use as
infrastructure.

The biggest problems I've run into in the implementation have ALL come from
other infrastructure that I rely on (pppd, ifconfig, etc). Significant
dependencies on things you have no control of is a real pain.

-- Alan Robertson
alanr@bell-labs.com
Cluster Communications [ In reply to ]
On Wed, 3 Nov 1999, Alan Robertson wrote:

> I'm not a CORBA expert by any stretch of the imagination, but I'm going to step
> in and comment anyway...
>
I'm no expert either, but it might be worthwhile to become one. I'll play
the ACE+TAO advocate here, as I'm myself only evaluating it. Thanks Alan!

Just to make sure my intentions are clear: I am just looking into some
design issues for Linux-HA that I would like to see discussed. I have not
made up my mind how to continue myself, and what I want to propose. What
I do know, however, is that we need to set up a structure that will allow
us to add complexity and increase speed. ACE+TAO might be one way to save
time and effort for standard software problems that we would not have to
program ourselves. I do not say, "let's use it", but I want to find out
if we would benefit from using it and what impact it would have.

> I would suggest that the needs we have vis a vis data communications for the
> higher functions (cluster management, etc.) have to include these properties as
> very high priorities:
>
> Design for robustness during communications and node failures
> (no hangs, complete control over timeouts, etc)
>
Correct. This seems to be addressed in the CORBA FT draft, see
http://www.cs.wustl.edu/~schmidt/CORBA-docs/ft-draft.pdf.gz
but it might be reasonable to check if we want to do it ourselves.

> Compatible data formats on the wire from one version
> to the next, including dealing with missing info
> from old versions so that rolling upgrades can take
> place
>
This is one of the main reasons I am looking into a standard like CORBA
instead of defining our own standards.

> Strong authentication (i.e., security) on every access
>
SSL support is currently being added to TAO. Again, I vote against
defining, implementing and having to verify our own standards.

> I would claim that these properties need to be first and foremost in our minds.
> The probability of a generic object access paradigm never having unknown and
> harmful side-effects in these areas seems small.
>
Well, ACE is complex, but not generic. It's well structured and proven.

> What CORBA does is create a standard API to a set of objects. It is my
> understanding that it does not specifically address any of these issues. It is
> my understanding that "standard" CORBA security is weak (you have to be
> authenticated to get an object handle, but you can "guess" them because they are
> densely populated(?)).
>
Standard CORBA defines a Security Service. As long as you don't activate
this service, security is indeed low.

Using CORBA, we could concentrate on the problems instead of the tools we
are building and using. Assuming we continue our current approach of
reinforcing the current heartbeat protocol with encryption, how are we
going to verify and certify its security? And what does it mean in terms
of latency?

> In this case I would claim that we need to design from the wire up, and then if
> for some reason we wanted to put a CORBA API on top of it, then we can do so.
> However, this will add complexity (generally decreasing reliability) without
> adding function.
>
Oops, I'm thinking just the other way round. And I'm considering ACE+TAO
to *reduce* the complexity of our work, because we don't have to reinvent
the wheel for every single bit of software we are working on. What is the
sense in programming a point-to-point communication device, if there is
one handy which is already scientifically proven? And freely available.

> Although I love the idea of CORBA for many kinds of applications (like how GNOME
> applications are using it), I think some of our most important needs are rather
> specialized when viewed through the lens of general-purpose computing.
>
My intention in using CORBA is not to generalize, but to use proven design
patterns (one of the central ACE concepts) for the specialized tasks we
have -- in the sense of a framework or tool chest we can pick from.

> I would also think that if you DID use a CORBA ORB, you might want to consider
> using ORBit (which is already installed on many Linux machines). I haven't
> looked closely enough at the QOS features of ACE and TAO to make a judgement on
> them, but I did notice that the number one near-term emphasis on TAO was to
> "continuing to improve its quality".
>
Oops, ORBit (or MICO, for that matter) provides only a tiny fraction of the
services ACE gives us. Please remember, I do not consider ACE+TAO because
I want to provide CORBA on top of our own work, but the exact and ultimate
opposite is my intention: using ACE+TAO *because* of the existing features
they give us. I would *not* support CORBA just for the fun of it ...

I presume that "continuing to improve its quality" is something you will
find in many Open Source project goals.

> Also, it appears that ACE requires C++. In some private email I sent Volker, I
> said something which might be construed as saying that I was considering C++ for
> future work. I haven't found a reason for that at this time.
>
Yes, I know, and my mail was of course just a play on words :-) But
indeed ACE is based upon C++ and makes good use of templates and other
concepts many of us would consider ... advanced.

Serious question: would switching to C++ prevent anyone in Linux-HA from
participating actively in the future of the project? Hearing the answer to
this question from others, not only from Alan, would be extremely important
to me.

> One of the criticisms that I've heard about heartbeat is that it is "too
> complex". In my estimation, it is probably 2 orders of magnitude less complex
> than a CORBA implementation. It seems like a lot to bite off to use as
> infrastructure.
>
Hmmm, I'm not going to reprogram ACE, just use it. My main point is that
I think it's easier to worry about the "what" instead of "what AND how".
And I anticipate that the complexity of the long-term general-purpose HA
solution that I, at least, have in mind (if we want to compete with the
DHBrown listed vendors) will be more than we have now.

Is "too complex" really aimed at the programming, or could it be in
terms of installation, setup, and documentation? If so, it would IMHO be
unrelated to the question I would like to discuss here.

> The biggest problems I've run into in the implementation have ALL come from
> other infrastructure that I rely on (pppd, ifconfig, etc). Significant
> dependencies on things you have no control of is a real pain.
>
That depends on what you mean by "no control". ACE+TAO are proactively
supported and well proven in the field. I admit that it involves a lot
more learning on our side.

> -- Alan Robertson
> alanr@bell-labs.com
>
Volker

--
Volker Wiegand Phone: +49 (0) 6196 / 50951-24
SuSE Rhein/Main AG Fax: +49 (0) 6196 / 40 96 07
Mergenthalerallee 45-47 Mobile: +49 (0) 179 / 292 66 76
D-65760 Eschborn E-Mail: Volker.Wiegand@suse.de
++ Only users lose drugs. Or was it the other way round? ++
Cluster Communications [ In reply to ]
Volker Wiegand wrote:
>
> On Wed, 3 Nov 1999, Alan Robertson wrote:
>
> > I'm not a CORBA expert by any stretch of the imagination, but I'm going to step
> > in and comment anyway...
> >
> I'm no expert either, but it might be worthwhile to become one. I'll play
> the ACE+TAO advocate here, as I'm myself only evaluating it. Thanks Alan!
>
> Just to make sure my intentions are clear: I am just looking into some
> design issues for Linux-HA that I would like to see discussed. I have not
> made up my mind how to continue myself, and what I want to propose. What
> I do know, however, is that we need to setup a structure that will allow
> us to add complexity and increase speed. ACE+TAO might be one way to save
> time and effort for standard software problems that we would not have to
> program ourselves. I do not say, "let's use it", but I want to find out
> if we would benefit from using it and what impact it would have.
>
> > I would suggest that the needs we have vis a vis data communications for the
> > higher functions (cluster management, etc.) have to include these properties as
> > very high priorities:
> >
> > Design for robustness during communications and node failures
> > (no hangs, complete control over timeouts, etc)
> >
> Correct. This seems to be addressed in the CORBA FT draft, see
> http://www.cs.wustl.edu/~schmidt/CORBA-docs/ft-draft.pdf.gz
> but it might be reasonable to check if we want to do it ourselves.

If this is a draft, it's reasonably likely to change before it becomes
a part of the standard. This will likely impact our design, and have us
spending time changing for the sake of the standard instead of working on our
main goal.

> > Compatible data formats on the wire from one version
> > to the next, including dealing with missing info
> > from old versions so that rolling upgrades can take
> > place
> >
> This is one of the main reasons I am looking into a standard like CORBA.
> And not define our own standards.

Perhaps CORBA does something that few other systems do, or perhaps I didn't make
myself clear. As systems evolve, so do the interfaces. Can an old version of
the client access a new version of the object, and vice versa?
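
As a concrete picture of what that question is asking, here is a minimal sketch
in Python of a message decoder that tolerates both older and newer senders. The
name=value line format, the field names (node, seq, ttl, status, load) and the
defaults are all hypothetical; this is neither CORBA's wire format nor a
description of any existing protocol.

```python
# Hypothetical defaults used when an older sender omits a field.
DEFAULTS = {"ttl": "3", "status": "unknown"}

# Fields this (hypothetical) version of the receiver understands.
KNOWN_FIELDS = ("node", "seq", "ttl", "status")


def encode(fields):
    """Serialize a dict as one name=value pair per line."""
    return "\n".join(f"{key}={value}" for key, value in fields.items())


def decode(text, known=KNOWN_FIELDS):
    """Parse name=value lines, filling gaps and skipping unknown fields."""
    msg = dict(DEFAULTS)                 # start from defaults (old senders)
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in known:         # ignore fields from newer senders
            msg[key] = value
    return msg
```

With this shape, an old sender that omits status still produces a usable
message, and a new sender that adds load does not break an old receiver, which
is the behaviour a rolling upgrade needs.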

> > Strong authentication (i.e., security) on every access
> >
> SSL support is currently being added to TAO. Again, I vote against
> defining, implementing and having to verify our own standards.

SSL cannot be used in the US in free software. There are patent licenses
to be dealt with. It's overkill when what you need is authentication. Not even
in the US is authentication restricted :-)

> > I would claim that these properties need to be first and foremost in our minds.
> > The probability of a generic object access paradigm never having unknown and
> > harmful side-effects in these areas seems small.
> >
> Well, ACE is complex, but not generic. It's well structured and proven.

CORBA is not specific to our task (creating a cluster manager in a very reliable
way). Therefore, as far as we are concerned, it may very well be well-structured
and well-proven, but it is still generic.

> > What CORBA does is create a standard API to a set of objects. It is my
> > understanding that it does not specifically address any of these issues. It is
> > my understanding that "standard" CORBA security is weak (you have to be
> > authenticated to get an object handle, but you can "guess" them because they are
> > densely populated(?)).
> >
> Standard CORBA defines a Security Service. As long as you don't activate
> this service, security is indeed low.

It is my (low-quality) understanding that even WITH it activated, that security
won't stand up to the Internet. There's a large project that's used CORBA here
at Lucent. They are giving a talk tomorrow morning. Guess I'd better go by and
give a listen...

> Using CORBA, we could concentrate on the problems instead of the tools we
> are building and using. Assuming we continue our current approach of
> reinforcing the current heartbeat protocol with encryption, how are we
> going to verify and certify its security? And what does it mean in terms
> of latency?

There is only one thing I think we need *encryption* for, and that's
distributing keys. Otherwise we only need authentication. I know I live in the
US, but I can't help that :-)
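
The distinction can be sketched in Python as follows: the payload travels in
the clear, and a keyed digest appended to it lets the receiver reject anything
not produced by a holder of the shared key. This is only an illustration; the
function names, the "|" separator and the choice of HMAC-SHA256 are assumptions
for the example, not what heartbeat does.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> bytes:
    """Append a keyed digest; the payload itself stays readable."""
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    return payload + b"|" + tag


def verify(key: bytes, packet: bytes):
    """Return the payload if the digest matches, otherwise None."""
    payload, sep, tag = packet.rpartition(b"|")
    if not sep:
        return None                      # no digest present at all
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    # compare_digest avoids leaking match position through timing
    return payload if hmac.compare_digest(tag, expected) else None
```

Nothing here hides the message contents; it only authenticates them, which is
why key distribution is the one place where encryption is still needed.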

> > In this case I would claim that we need to design from the wire up, and then if
> > for some reason we wanted to put a CORBA API on top of it, then we can do so.
> > However, this will add complexity (generally decreasing reliability) without
> > adding function.
> >
> Oops, I'm thinking just the other way round. And I'm considering ACE+TAO
> to *reduce* the complexity of our work, because we don't have to reinvent
> the wheel for every single bit of software we are working on. What is the
> sense in programming a point-to-point communication device, if there is
> one handy which is already scientifically proven? And freely available.

The same argument was made to me for putting PPP into heartbeat. It sounded
like a REALLY good idea, it came recommended to me by smart people, so I put the
code in. It was dead wrong. Using PPP significantly increased the complexity,
and decreased the reliability (if you choose to use it). You've seen the code.
And PPP has been around a LONG time, and is very well proven.

> > Although I love the idea of CORBA for many kinds of applications (like how GNOME
> > applications are using it), I think some of our most important needs are rather
> > specialized when viewed through the lens of general-purpose computing.
> >
> My intention of using CORBA is not to generalize, but to use proven design
> patterns (one of the central ACE concepts) for the specialized tasks we
> have -- in the sense of a framework or tool chest we can pick from.
>
> > I would also think that if you DID use a CORBA ORB, you might want to consider
> > using ORBit (which is already installed on many Linux machines). I haven't
> > looked closely enough at the QOS features of ACE and TAO to make a judgement on
> > them, but I did notice that the number one near-term emphasis on TAO was to
> > "continuing to improve its quality".
> >
> Oops, ORBit (or MICO, for that purpose) provide but a tiny fraction of the
> services ACE gives us. Please remember, I do not consider ACE+TAO because
> I want to provide CORBA on top of our own work, but the exact and ultimate
> opposite is my intention: using ACE+TAO *because* of the existing features
> they give us. I would *not* support CORBA just for the fun of it ...
>
> I presume that "continuing to improve its quality" is something you will
> find in many Open Source project goals.

But *not* as its first goal. For example, I wouldn't switch to a version of
libc which had that as its first goal. I don't want to spend my time debugging
it. The larger it is, and the more services it provides, the more likely you'll
wind up doing that.

> > Also, it appears that ACE requires C++. In some private email I sent Volker, I
> > said something which might be construed as saying that I was considering C++ for
> > future work. I haven't found a reason for that at this time.
> >
> Yes, I know and my mail was of course just a playing with words :-) But
> indeed ACE is based upon C++ and makes good use of Templates and other
> concepts many of us would consider ... advanced.

C++ is OK. Templates are a good concept, added to the language badly. There
were so *many* better ideas for doing the same kind of thing before C++ was
designed. Why its designer chose such an inferior design is beyond me. This is
why templates are so seldom used. They often bloat code size a great deal. Most
projects that I know of that have used them wind up with huge code size. It's
inherent in the C++ template design. I don't know of any OOP language that has
copied this feature rather than devising its own. Lots of other things were
borrowed, but not templates.

> Serious question: would switching to C++ prevent anyone in Linux-HA to
> participate actively in the future of the project? Knowing the answer to
> this question not only from Alan would be extremely important for me.

I've programmed many tens of thousands of lines of C++. It's OK. It is one of
the easiest languages in the world to use badly. (Of course, *I* would never do
such a thing :-)) Nevertheless, the average C++ program is probably less
reliable than the average C program. Continuously running, high-availability
systems cannot afford any memory leaks, or any use-after-free problems. These
problems are legion in C++ programs. I have made very careful and selective use
of dynamic memory in heartbeat. A C++ program which makes careful and selective
use of dynamic memory isn't really a C++ program ;-)

Thinking some more... Maybe Java would be a good language to program the cluster
manager in. Garbage collection is so much better than new/delete. You also
rarely run the code, so you can force garbage collection after each cluster
transition. This might be a good idea if you want an OO paradigm... I assume
this could be easily accomplished if we make the cluster manager a CORBA
object. Of course, this could be considered "pretty advanced" ;-)
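
The "collect after each transition" idea can be sketched as below. Python's gc
module stands in for the Java collector here purely for illustration; the Node
class and the handle_transition function are hypothetical and do not correspond
to any real cluster manager.

```python
import gc


class Node:
    """Hypothetical cluster member that points at its neighbour."""

    def __init__(self, name):
        self.name = name
        self.peer = None


def handle_transition(names):
    """Hypothetical transition handler; builds a ring of Nodes (a cycle)."""
    nodes = [Node(name) for name in names]
    for current, following in zip(nodes, nodes[1:] + nodes[:1]):
        current.peer = following
    return [node.name for node in nodes]


def transition_then_collect(names):
    """Run the rare transition work, then force a full collection."""
    result = handle_transition(names)
    freed = gc.collect()     # number of unreachable objects found
    return result, freed
```

Because transitions are rare, the cost of a forced collection is paid at a
known, quiet moment instead of at an arbitrary point during normal operation.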

> > One of the criticisms that I've heard about heartbeat is that it is "too
> > complex". In my estimation, it is probably 2 orders of magnitude less complex
> > than a CORBA implementation. It seems like a lot to bite off to use as
> > infrastructure.
> >
> Hmmm, I'm not going to reprogram ACE, just use it.

Famous last words... This seems unlikely in practice.

> My main point is that
> I think it's easier to worry about the "what" instead of "what AND how".
> And I anticipate that the complexity of the long term general purpose HA
> solution at least I have in mind (if we want to compete with the DHBrown
> listed vendors) will be more than we have now.

And it will have to be. But it's probably only ~2K lines more code to get to
the point where we have the infrastructure we need to put together the framework
into which the cluster manager can be inserted. At that point, you're
programming at the same level.

Having the cluster manager / infrastructure be a CORBA object would probably
be a great idea. Having CORBA interfaces for applications to hook into would
probably also be a great idea.

> Is "too complex" really aimed towards the programming, or could it be in
> terms of installation, setup, and documentation. Then it would IMHO be
> unrelated to the question I would like to discuss here.

If your code is an order of magnitude simpler, then by all means use a proven
infrastructure that's two orders of magnitude more complex. If your code is
only somewhat simpler and the infrastructure is working to improve quality, then
you should avoid it like the plague (given our goals). My guess is it's much
closer to the latter than the former.

Installation complexity *is* complexity. Our code should be very simple to
install and very simple to use. If no one can install it, then no one can use
it. A great project that is hindered in this way isn't really a great project.

I want to be competitive at the high end for clustering. This is an interesting
and inspiring goal that the press loves to talk about. But at least as much, I
want to extend the low end to double the number of HA installations in the
world. That won't happen unless it is a snap to install and configure (at
least for the simple cases).

> > The biggest problems I've run into in the implementation have ALL come from
> > other infrastructure that I rely on (pppd, ifconfig, etc). Significant
> > dependencies on things you have no control of is a real pain.
> >
> That depends on what you mean with "no control". ACE+TAO are proactively
> supported and well proven in the field. I admit that it involves a lot
> more learning on our side.

What I mean by no control is that unless we package their source with ours, it
can be difficult to ensure that the patches we need have been properly applied
in the field. Also, there is version compatibility testing, etc. These are not
design issues directly, but things like version compatibility often become
design issues over time. In either case, they are often as difficult and
painful to deal with as any design issue. Pain is pain, and difficulty is
difficulty. Users' successes are our successes. Their failures are our
failures. Since commercial profit isn't our main goal, this is a great goal
instead.

In summary, I think there are lots of risks here, and some potential for good
things. Since I haven't studied these packages in detail, I have little hard
data on what the probability of good/bad things is, nor on how great the result
might be. But, as I've no doubt made clear, I'm skeptical.

On the other hand, I have a really clear idea about how to proceed without them,
and a reasonably clear idea of what size development/debugging task we have
ahead of us.

I'd probably lean more towards a CORBA interface for the cluster manager with
a C-based underlying infrastructure designed specifically for the task.

-- Alan Robertson
alanr@bell-labs.com