Mailing List Archive: Future of /proc/ha

Future of /proc/ha

wiegand at suse

Oct 28, 1999, 1:49 AM

Post #1 of 8

Dear colleagues,

As I continued my work on /proc/ha over the last few days, I ran across
a couple of problems and I would like to get some input from others before
I decide how to proceed.

(1) The /proc interface is a moving target. In order to support kernels
down to 2.0.1 (which seems reasonable to me), I am already running
three versions. And supporting other open operating systems is completely
out of reach.

(2) Especially when it comes to handling resources, even on an average
sized cluster the amount of data in one "file" can exceed 4 KB, which is
the allocation size in /proc. This means I cannot read atomically any more.

(3) The work /proc/ha is doing is to provide information for user land
programs which comes from ... user land programs. Debugging can only be
accomplished by printk() outputs. And a post-mortem after a node crashed
is a bit ... difficult to get :-)

(4) I have no chance to make the information persistent even if I wanted
to do so eventually.
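
Point (2) can be illustrated with a small simulation. This is only a
sketch under stated assumptions: StatusFile, read_all and generations_seen
are hypothetical names, the 4096-byte limit is taken from the 4 KB figure
above, and the real /proc read path is not modelled. The stand-in "file"
regenerates its content on every read call, so a reader that needs several
calls can see records from different versions of the state.

```python
import re

PAGE = 4096  # assumed per-read limit, from the 4 KB figure above


class StatusFile:
    """Hypothetical stand-in for a file whose content is regenerated
    on every read and returned at most PAGE bytes at a time."""

    def __init__(self):
        self.generation = 0

    def render(self):
        # Every record carries the generation of the state it came from.
        line = "resource gen=%04d\n" % self.generation
        return (line * 600).encode()  # 10800 bytes, more than PAGE

    def read(self, offset):
        return self.render()[offset:offset + PAGE]


def read_all(f, update_between_reads):
    """Read the whole file in PAGE-sized pieces."""
    chunks, offset = [], 0
    while True:
        chunk = f.read(offset)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)
        offset += len(chunk)
        if update_between_reads:
            f.generation += 1  # the state changes while we are reading


def generations_seen(data):
    """Return the set of generations found in one complete read."""
    return {int(g) for g in re.findall(rb"gen=(\d+)", data)}


print(generations_seen(read_all(StatusFile(), False)))  # {0}
print(generations_seen(read_all(StatusFile(), True)))   # {0, 1, 2}
```

With no update in between, the reader sees one consistent generation. With
updates, a single pass over the "file" mixes three of them.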

All this leads me to the view that we should abandon /proc/ha. Alas,
what is then the way to go? I have recently been toying around with the
Berkeley DB code (db-2.7.7.tar.gz) from Sleepycat and it looks very, very
promising. They also have a license which would allow us to incorporate
the code in our work free of charge.

Does anyone want to convince me that I should continue /proc/ha? And does
anyone want to convince me that we should not further investigate B-DB as
the underlying storage module for cluster/node/resource state including
transactional support?

Volker

--
Volker Wiegand               Phone:  +49 (0) 6196 / 50951-24
SuSE Rhein/Main AG           Fax:    +49 (0) 6196 / 40 96 07
Mergenthalerallee 45-47      Mobile: +49 (0) 179 / 292 66 76
D-65760 Eschborn             E-Mail: Volker.Wiegand@suse.de
++ Only users lose drugs. Or was it the other way round? ++

Future of /proc/ha

Oct 28, 1999, 5:05 AM

Post #2 of 8

Hi,

On Thu, 28 Oct 1999 10:49:05 +0200 (MEST), Volker Wiegand
<wiegand@suse.de> said:

> All this leads me to the view that we should abandon /proc/ha. Alas,
> what is then the way to go? I have recently been toying around with the
> Berkeley DB code (db-2.7.7.tar.gz) from Sleepycat and it looks very, very
> promising. They also have a license which would allow us to incorporate
> the code in our work free of charge.

The Berkeley code is also in glibc, as libdb2. The biggest obstacle I
can see is that the advertising clause

* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* This product includes software developed by the University of
* California, Berkeley and its contributors.

is still in the copyright, and that doesn't mix with GPL. (There is a
special exemption on libdb2 for use in glibc:

As a special exception, when Berkeley DB is distributed along with
the GNU C Library, in any program which uses the GNU C Library in
accord with that library's distribution terms, it is also
permitted for Berkeley DB to be loaded dynamically by the GNU C
Library to implement standard ISO/IEC 9945 and Unix interface
functionality.

Don't ask me what this implies for the GPL!)

--Stephen

Future of /proc/ha

wiegand at suse

Oct 28, 1999, 6:17 AM

Post #3 of 8

On Thu, 28 Oct 1999, Stephen C. Tweedie wrote:

> The Berkeley code is also in glibc, as libdb2. The biggest obstacle I
> can see is that the advertising clause
>
> * 3. All advertising materials mentioning features or use of this software
> * must display the following acknowledgement:
> * This product includes software developed by the University of
> * California, Berkeley and its contributors.
>
> is still in the copyright, and that doesn't mix with GPL. (There is a
> special exemption on libdb2 for use in glibc:
>
> As a special exception, when Berkeley DB is distributed along with
> the GNU C Library, in any program which uses the GNU C Library in
> accord with that library's distribution terms, it is also
> permitted for Berkeley DB to be loaded dynamically by the GNU C
> Library to implement standard ISO/IEC 9945 and Unix interface
> functionality.
>
> Don't ask me what this implies for the GPL!)
>
It is my understanding that we would not be encumbered. The db2 work is
not derived from the work we are doing, and anything _we_ do is actually
GPLed.

Before I made the above suggestion, I contacted Sleepycat to learn about
their attitude. The following is an excerpt from their answer:

> Given the infectious nature of the GPL, I would be happy to put the
> whole work under an artistic or even XFree86 style license (well, my
> immediate superior at SuSE is Dirk Hohndel). What would you suggest?

We really don't feel strongly about what licenses are used.
Generally, we are happy to be distributed under whatever license you
think best for your project, as long as the underlying open source
software provisions are met.

Regards,
Amy Adams

So, from their point of view it should be okay. I know how restrictive the
GPL can be, but does that really mean we are not allowed to use other
_free_ software in a project?

Having read the GPL in detail with a lawyer's eye, I still cannot see
that it would cover the underlying software. I think the justification
for using db2 can be derived from the following part of the GPL:

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

So I believe we are clean.

> --Stephen
>
Volker

Future of /proc/ha

alanr at bell-labs

Oct 28, 1999, 8:16 PM

Post #4 of 8

Volker Wiegand wrote:
>
> Dear colleagues,
>
> As I continued my work on /proc/ha over the last few days, I ran across
> a couple of problems and I would like to get some input from others before
> I decide how to proceed.
>
> (1) The /proc interface is a moving target. In order to support kernels
> down to 2.0.1 (which seems reasonable to me), I am already running
> three versions. And supporting other open operating systems is completely
> out of reach.
>
> (2) Especially when it comes to handling resources, even on an average
> sized cluster the amount of data in one "file" can exceed 4 KB, which is
> the allocation size in /proc. This means I cannot read atomically any more.
>
> (3) The work /proc/ha is doing is to provide information for user land
> programs which comes from ... user land programs. Debugging can only be
> accomplished by printk() outputs. And a post-mortem after a node crashed
> is a bit ... difficult to get :-)
>
> (4) I have no chance to make the information persistent even if I wanted
> to do so eventually.
>
> All this leads me to the view that we should abandon /proc/ha. Alas,
> what is then the way to go? I have recently been toying around with the
> Berkeley DB code (db-2.7.7.tar.gz) from Sleepycat and it looks very, very
> promising. They also have a license which would allow us to incorporate
> the code in our work free of charge.
>
> Does anyone want to convince me that I should continue /proc/ha? And does
> anyone want to convince me that we should not further investigate B-DB as
> the underlying storage module for cluster/node/resource state including
> transactional support?

Volker,

My only concern about the Berkeley DB code is that most databases are *highly*
prone to corruption during crashes. I would recommend that no permanent state
be kept in a database. When a node comes up after a crash, its old idea of the
cluster topology is now invalid anyway, so I don't envision this being a
problem.

A good API for accessing the data is essential.

Go for it.

-- Alan Robertson
alanr@bell-labs.com

Future of /proc/ha

wiegand at suse

Oct 28, 1999, 9:20 PM

Post #5 of 8

On Thu, 28 Oct 1999, Alan Robertson wrote:

> My only concern about the Berkeley DB code is that most databases are *highly*
> prone to corruption during crashes. I would recommend that no permanent state
> be kept in a database. When a node comes up after a crash, its old idea of the
> cluster topology is now invalid anyway, so I don't envision this being a
> problem.
>
Oh yes, I agree. Using a file-based DBMS was _not at all_ for persistence
reasons, and of course a _clean_ cluster startup would initialize a fresh
empty set of tables. With Berkeley DB I'm after two benefits:

(1) You get free local support for transactions. I want this because my
model is a layered one. The lowest layer is a reliable message exchange
within the cluster. All other services build on top of this layer.

(2) You can at least try to get post-mortem analysis on crashed nodes. It
can also be used as an audit trail if you keep the last two or three logs.
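
The two benefits can be sketched in a few lines. This is a toy under
stated assumptions, not the Berkeley DB API: StateTable and its methods
are hypothetical names, and the state lives only in memory. It shows what
"transaction" means for a higher layer: either every change in a group is
applied, or none is, and each committed group leaves a log entry that can
serve as an audit trail.

```python
class StateTable:
    """Hypothetical in-memory state table with all-or-nothing updates."""

    def __init__(self):
        self.data = {}
        self.log = []  # append-only record of committed change groups

    def transaction(self):
        return _Transaction(self)


class _Transaction:
    def __init__(self, table):
        self.table = table
        self.pending = {}

    def put(self, key, value):
        # Changes are only staged here, never applied directly.
        self.pending[key] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Commit: write the log entry, then apply all changes together.
            self.table.log.append(dict(self.pending))
            self.table.data.update(self.pending)
        # On an exception nothing was applied, so there is nothing to undo.
        return False


table = StateTable()

with table.transaction() as txn:
    txn.put("node1", "up")
    txn.put("resource/ip0", "node1")

try:
    with table.transaction() as txn:
        txn.put("resource/ip0", "node2")
        raise RuntimeError("message layer reported a failure")
except RuntimeError:
    pass

print(table.data)      # {'node1': 'up', 'resource/ip0': 'node1'}
print(len(table.log))  # 1
```

The failed second group leaves no trace in the table or in the log, which
is the property a layered design would rely on.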

> A good API for accessing the data is essential.
>
Great words from a great soul :-)

> Go for it.
>
> -- Alan Robertson
> alanr@bell-labs.com
>
Volker

Future of /proc/ha

Oct 29, 1999, 8:48 AM

Post #6 of 8

Hi,

On Thu, 28 Oct 1999 21:16:00 -0600, Alan Robertson <alanr@bell-labs.com>
said:

> Volker,

> My only concern about the Berkeley DB code is that most databases are
> *highly* prone to corruption during crashes. I would recommend that
> no permanent state be kept in a database.

libdb2 supports journaled updates to the database.

--Stephen

Future of /proc/ha

alanr at bell-labs

Oct 29, 1999, 9:47 AM

Post #7 of 8

"Stephen C. Tweedie" wrote:
>
> Hi,
>
> On Thu, 28 Oct 1999 21:16:00 -0600, Alan Robertson <alanr@bell-labs.com>
> said:
>
> > Volker,
>
> > My only concern about the Berkeley DB code is that most databases are
> > *highly* prone to corruption during crashes. I would recommend that
> > no permanent state be kept in a database.
>
> libdb2 supports journaled updates to the database.

But, if your filesystem has corrupted your base database file (as sometimes
happens), then you have made a nightmare out of startup. Folks like Oracle
spend way more effort on getting this right than I ever intend to.

This can be due to database bugs, application bugs, filesystem bugs, fsck
bugs, or some combination of these.

This is not to say that you *can't* make this work but that you are
well-advised to avoid it if you can.

All other things being equal, a design which doesn't depend on this is
superior to one that does.

And, as Volker has made clear, that wasn't his intention anyway.

-- Alan Robertson
alanr@bell-labs.com

Future of /proc/ha

Nov 1, 1999, 5:11 AM

Post #8 of 8

Hi,

On Fri, 29 Oct 1999 10:47:38 -0600, Alan Robertson <alanr@bell-labs.com>
said:

>> libdb2 supports journaled updates to the database.

> But, if your filesystem has corrupted your base database file (as
> sometimes happens), then you have made a nightmare out of startup.
> Folks like Oracle spend way more effort on getting this right than I
> ever intend to.

If that can happen then you can lose your text-mode config files too.
Sorry, that argument doesn't convince me. :)

> This can be due to database bugs, application bugs, filesystem bugs,
> fsck bugs, or some combination of these.

Sure. You assume that the database software is reliable. That's why
you use the standard software components rather than rolling your own.
If you have filesystem, fsck or application bugs, then whether your data
is stored in a database or as plain text has absolutely no impact on
your ability to recover --- it's not a valid argument against using a
journaled database. (If you don't trust libdb2's recovery, then that
_would_ be a valid argument.)
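
What a journal buys at recovery time can be sketched as follows. This is
an illustration under stated assumptions, not libdb2's log format or
recovery code: encode and recover are hypothetical names, and the record
layout (a CRC32 checksum followed by a JSON payload) is invented for the
example. Recovery replays the journal, stops at the first damaged record,
and applies only the transactions that have a commit marker.

```python
import json
import zlib


def encode(record):
    """Serialize one journal record as '<crc32 hex> <json payload>'."""
    payload = json.dumps(record, sort_keys=True)
    return "%08x %s\n" % (zlib.crc32(payload.encode()), payload)


def recover(journal_text):
    """Rebuild state from a journal, ignoring torn or uncommitted records."""
    pending, state = {}, {}
    for line in journal_text.splitlines():
        checksum, _, payload = line.partition(" ")
        if not payload or "%08x" % zlib.crc32(payload.encode()) != checksum:
            break  # torn write: stop at the first damaged record
        try:
            record = json.loads(payload)
        except ValueError:
            break  # unreadable payload is treated like a torn write
        txn = record["txn"]
        if record.get("commit"):
            # Only a commit marker makes the staged changes visible.
            state.update(pending.pop(txn, {}))
        else:
            pending.setdefault(txn, {}).update(record["set"])
    return state


journal = (
    encode({"txn": 1, "set": {"node1": "up"}})
    + encode({"txn": 1, "commit": True})
    + encode({"txn": 2, "set": {"node2": "up"}})       # never committed
    + encode({"txn": 3, "set": {"node3": "up"}})[:20]  # cut off by the crash
)

print(recover(journal))  # {'node1': 'up'}
```

Only the committed transaction survives. The uncommitted one and the
record cut short by the crash are both discarded.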

--Stephen