Mailing List Archive

Heartbeat dynamic module support
Hi,

I modularized communication and authentication parts of heartbeat.

All communication "drivers" and authentication code have been removed from
the heartbeat binary and are organized as follows:

/etc/ha.d/modules/comm: ping.so ppp-udp.so serial.so udp.so

/etc/ha.d/modules/auth: crc.so md5.so sha1.so

_All_ modules in those directories are loaded on startup, and after
parsing the configuration files the unused modules are unloaded.

The only missing part of this scheme is to modularize STONITH "drivers",
but I hope to do that soon.

The patch against the current CVS version can be found at
http://bazar.conectiva.com.br/~marcelo/ha/patches/hb-module.patch

Comments are welcome.
Heartbeat dynamic module support [ In reply to ]
On Mon, 21 Aug 2000 wiegand@suse.de wrote:

> Marcelo,
>
> that is an excellent idea, and I encourage you to have it included into
> the CVS as "mainstream".

It will be included.

I just sent the patch to the list first, to receive comments and
suggestions about it.

> But, PLEASE, change the location to something
> suitable for binaries. I'm no expert for FHS, but maybe /sbin/ha.d/...
> would be a better place. In any case, /etc is no good path for executable
> files IMHO.

Right.

I wonder why I chose "/etc/ha.d/modules". :)
Heartbeat dynamic module support [ In reply to ]
wiegand@suse.de wrote:
>
> Marcelo,
>
> that is an excellent idea, and I encourage you to have it included into
> the CVS as "mainstream". But, PLEASE, change the location to something
> suitable for binaries. I'm no expert for FHS, but maybe /sbin/ha.d/...
> would be a better place. In any case, /etc is no good path for executable
> files IMHO.
>
> Kind regards
> Volker

Having subdirectories under /sbin is nonstandard (even though SuSE does it), and
these files aren't complete executables, unlike everything in /sbin. The right
location is something more like /usr/lib/heartbeat.

Marcelo submitted the patches to me. He did the development at my request, and
we have discussed it quite a bit. It'll go into CVS as soon as I see it work
;-)

He has CVS write permissions, but submitted it through me as a courtesy, and a
way of making sure it was reviewed.

-- Alan Robertson
alanr@suse.com
Heartbeat dynamic module support [ In reply to ]
Marcelo Tosatti wrote:
>
> Hi,
>
> I modularized communication and authentication parts of heartbeat.
>
> All communication "drivers" and authentication code have been removed from
> the heartbeat binary and are organized as follows:
>
> /etc/ha.d/modules/comm: ping.so ppp-udp.so serial.so udp.so
>
> /etc/ha.d/modules/auth: crc.so md5.so sha1.so
>
> _All_ modules in those directories are loaded on startup, and after
> parsing the configuration files the unused modules are unloaded.

Have they already been locked into memory at that point? Does the process
actually shrink after they are unloaded?

Thanks for your great work!

-- Alan Robertson
alanr@suse.com
Heartbeat dynamic module support [ In reply to ]
Marcelo,

that is an excellent idea, and I encourage you to have it included into
the CVS as "mainstream". But, PLEASE, change the location to something
suitable for binaries. I'm no expert for FHS, but maybe /sbin/ha.d/...
would be a better place. In any case, /etc is no good path for executable
files IMHO.

Kind regards
Volker



On Sun, 20 Aug 2000, Marcelo Tosatti wrote:

> Hi,
>
> I modularized communication and authentication parts of heartbeat.
>
> All communication "drivers" and authentication code have been removed from
> the heartbeat binary and are organized as follows:
>
> /etc/ha.d/modules/comm: ping.so ppp-udp.so serial.so udp.so
>
> /etc/ha.d/modules/auth: crc.so md5.so sha1.so
>
> _All_ modules in those directories are loaded on startup, and after
> parsing the configuration files the unused modules are unloaded.
>
> The only missing part of this scheme is to modularize STONITH "drivers",
> but I hope to do that soon.
>
> The patch against the current CVS version can be found at
> http://bazar.conectiva.com.br/~marcelo/ha/patches/hb-module.patch
>
> Comments are welcome.

--
Freundschaftlich / With kind regards
Volker

--
Volker Wiegand Voice: +1-510-628-3380 ext 5029
SuSE Inc. Fax: +1-510-628-3381
580 Second Street, Suite 210 Mobile: +1-510-376-0302
Oakland, CA 94607 USA E-Mail: wiegand@suse.com
Heartbeat dynamic module support [ In reply to ]
On Sun, 20 Aug 2000, Alan Robertson wrote:

> Marcelo Tosatti wrote:
> >
> > Hi,
> >
> > I modularized communication and authentication parts of heartbeat.
> >
> > All communication "drivers" and authentication code have been removed from
> > the heartbeat binary and are organized as follows:
> >
> > /etc/ha.d/modules/comm: ping.so ppp-udp.so serial.so udp.so
> >
> > /etc/ha.d/modules/auth: crc.so md5.so sha1.so
> >
> > _All_ modules in those directories are loaded on startup, and after
> > parsing the configuration files the unused modules are unloaded.
>
> Have they already been locked into memory at that point?

No. mlockall() is called after unloading the unused modules.

> Does the process actually shrink after they are unloaded?

Yes.
Heartbeat dynamic module support [ In reply to ]
Marcelo Tosatti wrote:
>
> Hi,
>
> I modularized communication and authentication parts of heartbeat.
>
> All communication "drivers" and authentication code have been removed from
> the heartbeat binary and are organized as follows:
>
> /etc/ha.d/modules/comm: ping.so ppp-udp.so serial.so udp.so
>
> /etc/ha.d/modules/auth: crc.so md5.so sha1.so
>
> _All_ modules in those directories are loaded on startup, and after
> parsing the configuration files the unused modules are unloaded.
>
> The only missing part of this scheme is to modularize STONITH "drivers",
> but I hope to do that soon.

Hi Marcelo,

In addition to the reply that I sent you earlier off the list, I'd like to also
add a note about something important which I strongly suspect doesn't work.

If I load my machine with the crc authentication module, and then discover later
that I want to use the md5 authentication, in order to get security, heartbeat
will currently do that when it is sent a signal *without restarting any
processes*. Given the nature of the authentication, it is very nice to not have
to restart the processes because otherwise you miss heartbeats and miss
messages.

It would be very nice to do that with the dynamic loading code running also...

Of course, it's more complicated now ;-)

-- Alan Robertson
alanr@suse.com
Heartbeat dynamic module support [ In reply to ]
Marcelo Tosatti wrote:
>
> On Sun, 20 Aug 2000, Alan Robertson wrote:
>
> > Marcelo Tosatti wrote:
> > >
> > > Hi,
> > >
> > > I modularized communication and authentication parts of heartbeat.
> > >
> > > All communication "drivers" and authentication code have been removed from
> > > the heartbeat binary and are organized as follows:
> > >
> > > /etc/ha.d/modules/comm: ping.so ppp-udp.so serial.so udp.so
> > >
> > > /etc/ha.d/modules/auth: crc.so md5.so sha1.so
> > >
> > > _All_ modules in those directories are loaded on startup, and after
> > > parsing the configuration files the unused modules are unloaded.
> >
> > Have they already been locked into memory at that point?
>
> No. mlockall() is called after unloading the unused modules.
>
> > Does the process actually shrink after they are unloaded?

That's what I like to hear ;-)

-- Alan Robertson
alanr@suse.com
Heartbeat dynamic module support [ In reply to ]
--661009-1882596787-966888889=:9156
Content-Type: TEXT/PLAIN; charset=US-ASCII



On Mon, 21 Aug 2000, Alan Robertson wrote:

> Marcelo Tosatti wrote:
> >
> > Hi,
> >
> > I modularized communication and authentication parts of heartbeat.
> >
> > All communication "drivers" and authentication code have been removed from
> > the heartbeat binary and are organized as follows:
> >
> > /etc/ha.d/modules/comm: ping.so ppp-udp.so serial.so udp.so
> >
> > /etc/ha.d/modules/auth: crc.so md5.so sha1.so
> >
> > _All_ modules in those directories are loaded on startup, and after
> > parsing the configuration files the unused modules are unloaded.
> >
> > The only missing part of this scheme is to modularize STONITH "drivers",
> > but I hope to do that soon.
>
> Hi Marcelo,
>
> In addition to the reply that I sent you earlier off the list, I'd like to also
> add a note about something important which I strongly suspect doesn't work.
>
> If I load my machine with the crc authentication module, and then discover later
> that I want to use the md5 authentication, in order to get security, heartbeat
> will currently do that when it is sent a signal *without restarting any
> processes*. Given the nature of the authentication, it is very nice to not have
> to restart the processes because otherwise you miss heartbeats and miss
> messages.
>
> It would be very nice to do that with the dynamic loading code running also...
>
> Of course, it's more complicated now ;-)

It's hard to get this working completely right.

The problem is that when we reread the auth file, the auth method which
was being used is removed, and there may still be messages in the
status FIFO which have been encoded with it.

The right solution to this problem is to only close the previous auth
method module once we are sure all hosts have sent a valid message encoded
with the new auth method, though I think it's not worth doing this.

The attached patch will, in case a SIGHUP is sent to the master status
process, load the new authentication module(s). The problem I described
above may happen, but the few messages which may be lost will be
retransmitted.
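
For readers who cannot decode the attachment, here is a rough model of the
reload sequence described above: load every available auth module again,
reparse the auth file, then unload the modules nothing refers to. It is a
sketch in Python rather than the C of the patch. The function name
reread_auth is made up for illustration, dictionary entries stand in for
dlopen()/dlclose() handles, and the auth file layout (an "auth N" line
followed by "N method [key]" lines) is an assumption.

```python
def reread_auth(available, authfile_text):
    """Return {method: reference count} for modules that stay loaded.

    Hypothetical model only: a dict entry stands in for a dlopen() handle.
    """
    # Start from scratch: "load" every available module with zero references.
    refs = {name: 0 for name in available}

    # Count references from entries of the assumed form "N method [key]".
    # The "auth N" selector line and comments do not start with a digit.
    for line in authfile_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit():
            method = fields[1]
            if method not in refs:
                raise ValueError(f"no auth module for method {method!r}")
            refs[method] += 1

    # "Unload" whatever nothing refers to (dlclose() in the real code).
    return {name: count for name, count in refs.items() if count > 0}
```

With crc, md5 and sha1 available and an auth file that lists only md5, only
the md5 entry would remain loaded.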


--661009-1882596787-966888889=:9156
Content-Type: TEXT/PLAIN; charset=US-ASCII; name="hb-sig.patch"
Content-Transfer-Encoding: BASE64
Content-ID: <Pine.LNX.4.21.0008211714490.9156@freak.distro.conectiva>
Content-Description:
Content-Disposition: attachment; filename="hb-sig.patch"

ZGlmZiAtTnVyIGxpbnV4LWhhLm9yaWcvaGVhcnRiZWF0L2hlYXJ0YmVhdC5j
IGxpbnV4LWhhL2hlYXJ0YmVhdC9oZWFydGJlYXQuYw0KLS0tIGxpbnV4LWhh
Lm9yaWcvaGVhcnRiZWF0L2hlYXJ0YmVhdC5jCU1vbiBBdWcgMjEgMDg6NTI6
MDggMjAwMA0KKysrIGxpbnV4LWhhL2hlYXJ0YmVhdC9oZWFydGJlYXQuYwlN
b24gQXVnIDIxIDEzOjQ1OjM2IDIwMDANCkBAIC0xNzMyLDYgKzE3MzIsMjYg
QEANCiBjaGVja19hdXRoX2NoYW5nZShzdHJ1Y3Qgc3lzX2NvbmZpZyAqY29u
ZikNCiB7DQogCWlmIChjb25mLT5yZXJlYWRhdXRoKSB7DQorCQlpbnQgaiwg
cmVtID0gMDsNCisNCisJCWZvciAoaj0wOyBqIDwgbnVtX2F1dGhfdHlwZXM7
ICsraikgew0KKwkJCWlmKFZhbGlkQXV0aHNbal0pIHsNCisJCQkJZGxjbG9z
ZShWYWxpZEF1dGhzW2pdLT5kbGhhbmRsZXIpOw0KKwkJCQloYV9mcmVlKFZh
bGlkQXV0aHNbal0tPmF1dGhuYW1lKTsNCisJCQkJaGFfZnJlZShWYWxpZEF1
dGhzW2pdKTsNCisJCQkJVmFsaWRBdXRoc1tqXSA9IE5VTEw7DQorCQkJfQ0K
KwkJfQ0KKw0KKwkJbnVtX2F1dGhfdHlwZXMgPSAwOw0KKw0KKwkJaWYoYXV0
aF9tb2R1bGVfaW5pdCgpID09IEhBX0ZBSUwpIHsgDQorCQkJaGFfbG9nKExP
R19FUlINCisJCQksCSJBdXRoZW50aWNhdGlvbiBtb2R1bGVzIGxvYWRpbmcg
ZXJyb3IsIGV4aXRpbmcuIik7DQorCQkJc2lnbmFsX2FsbChTSUdURVJNKTsN
CisJCQljbGVhbmV4aXQoMSk7DQorCQl9DQorDQogCQlpZiAocGFyc2VfYXV0
aGZpbGUoKSAhPSBIQV9PSykgew0KIAkJCS8qIE9PUFMuICBTYXlvbmFyYS4g
Ki8NCiAJCQloYV9sb2coTE9HX0VSUg0KQEAgLTE3MzksNyArMTc1OSwyMiBA
QA0KIAkJCXNpZ25hbF9hbGwoU0lHVEVSTSk7DQogCQkJY2xlYW5leGl0KDEp
Ow0KIAkJfQ0KKw0KIAkJY29uZi0+cmVyZWFkYXV0aCA9IDA7DQorDQorCQlm
b3IgKGo9MDsgaiA8IG51bV9hdXRoX3R5cGVzOyArK2opIHsNCisJCQlpZihW
YWxpZEF1dGhzW2pdKSB7DQorCQkJCWlmIChWYWxpZEF1dGhzW2pdLT5yZWYg
PT0gMCkgIHsNCisJCQkJCWRsY2xvc2UoVmFsaWRBdXRoc1tqXS0+ZGxoYW5k
bGVyKTsgDQorCQkJCQloYV9mcmVlKFZhbGlkQXV0aHNbal0tPmF1dGhuYW1l
KTsNCisJCQkJCWhhX2ZyZWUoVmFsaWRBdXRoc1tqXSk7DQorCQkJCQlWYWxp
ZEF1dGhzW2pdID0gTlVMTDsNCisJCQkJCXJlbSsrOw0KKwkJCQl9DQorCQkJ
fQ0KKwkJfQ0KKw0KKwkJbnVtX2F1dGhfdHlwZXMgLT0gcmVtOw0KIAl9DQog
fQ0KIA0KQEAgLTE5OTgsNyArMjAzMyw3IEBADQogdm9pZA0KIHJlcmVhZF9j
b25maWdfc2lnKGludCBzaWcpDQogew0KLQlpbnQJajsNCisJaW50CWosIHJl
bSA9IDA7DQogDQogCXNpZ25hbChzaWcsIHJlcmVhZF9jb25maWdfc2lnKTsN
CiANCkBAIC0yMDI2LDcgKzIwNjEsNTcgQEANCiAJCX1lbHNlew0KIAkJCWhh
X2xvZyhMT0dfSU5GTywgIkNvbmZpZ3VyYXRpb24gdW5jaGFuZ2VkLiIpOw0K
IAkJfQ0KKwl9IGVsc2UgeyANCisNCisJCS8qIFdlIGFyZSBub3QgdGhlIGNv
bnRyb2wgcHJvY2VzcywgYW5kIHdlIHJlY2VpdmVkIGEgU0lHSFVQIHNpZ25h
bC4NCisJCSAqIFRoaXMgbWVhbnMgY29uZmlndXJhdGlvbiBmaWxlIGhhcyBj
aGFuZ2VkLg0KKwkJICovDQorDQorCQlmb3IgKGo9MDsgaiA8IG51bV9hdXRo
X3R5cGVzOyArK2opIHsNCisJCQlpZihWYWxpZEF1dGhzW2pdKSB7DQorCQkJ
CWRsY2xvc2UoVmFsaWRBdXRoc1tqXS0+ZGxoYW5kbGVyKTsNCisJCQkJaGFf
ZnJlZShWYWxpZEF1dGhzW2pdLT5hdXRobmFtZSk7DQorCQkJCWhhX2ZyZWUo
VmFsaWRBdXRoc1tqXSk7DQorCQkJCVZhbGlkQXV0aHNbal0gPSBOVUxMOw0K
KwkJCX0NCisJCX0NCisNCisJCW51bV9hdXRoX3R5cGVzID0gMDsNCisNCisJ
CWlmKGF1dGhfbW9kdWxlX2luaXQoKSA9PSBIQV9GQUlMKSB7IA0KKwkJCWhh
X2xvZyhMT0dfRVJSDQorCQkJLAkiQXV0aGVudGljYXRpb24gbW9kdWxlcyBs
b2FkaW5nIGVycm9yLCBleGl0aW5nLiIpOw0KKwkJCXNpZ25hbF9hbGwoU0lH
VEVSTSk7DQorCQkJY2xlYW5leGl0KDEpOw0KKwkJfQ0KKwkNCisJCWlmIChw
YXJzZV9hdXRoZmlsZSgpICE9IEhBX09LKSB7DQorCQkJLyogT09QUy4gIFNh
eW9uYXJhLiAqLw0KKwkJCWhhX2xvZyhMT0dfRVJSDQorCQkJLAkiQXV0aGVu
dGljYXRpb24gcmVwYXJzaW5nIGVycm9yLCBleGl0aW5nLiIpOw0KKwkJCXNp
Z25hbF9hbGwoU0lHVEVSTSk7DQorCQkJY2xlYW5leGl0KDEpOw0KKwkJfQ0K
KwkNCisJCWNvbmZpZy0+cmVyZWFkYXV0aCA9IDA7DQorCQ0KKwkJLyogVW5s
b2FkIHVucmVmZXJlbmNlZCBtb2R1bGVzICovDQorCQ0KKwkJZm9yIChqPTA7
IGogPCBudW1fYXV0aF90eXBlczsgKytqKSB7DQorCQkJaWYoVmFsaWRBdXRo
c1tqXSkgeyANCisJCQkJaWYgKFZhbGlkQXV0aHNbal0tPnJlZiA9PSAwKSAg
ew0KKwkJCQkJZGxjbG9zZShWYWxpZEF1dGhzW2pdLT5kbGhhbmRsZXIpOyAN
CisJCQkJCWhhX2ZyZWUoVmFsaWRBdXRoc1tqXS0+YXV0aG5hbWUpOw0KKwkJ
CQkJaGFfZnJlZShWYWxpZEF1dGhzW2pdKTsNCisJCQkJCVZhbGlkQXV0aHNb
al0gPSBOVUxMOw0KKwkJCQkJcmVtKys7DQorCQkJCX0NCisJCQl9DQorCQl9
DQorCQ0KKwkJbnVtX2F1dGhfdHlwZXMgLT0gcmVtOw0KIAl9DQorDQogCVBh
cnNlVGVzdE9wdHMoKTsNCiB9DQogDQo=
--661009-1882596787-966888889=:9156--
Heartbeat dynamic module support [ In reply to ]
Marcelo Tosatti wrote:
>
> On Mon, 21 Aug 2000, Alan Robertson wrote:
>
> > Marcelo Tosatti wrote:
> > >
> > > Hi,
> > >
> > > I modularized communication and authentication parts of heartbeat.

<snip>

> > Hi Marcelo,
> >
> > In addition to the reply that I sent you earlier off the list, I'd like to also
> > add a note about something important which I strongly suspect doesn't work.
> >
> > If I load my machine with the crc authentication module, and then discover later
> > that I want to use the md5 authentication, in order to get security, heartbeat
> > will currently do that when it is sent a signal *without restarting any
> > processes*. Given the nature of the authentication, it is very nice to not have
> > to restart the processes because otherwise you miss heartbeats and miss
> > messages.
> >
> > It would be very nice to do that with the dynamic loading code running also...
> >
> > Of course, it's more complicated now ;-)
>
> It's hard to get this working completely right.
>
> The problem is that when we reread the auth file, the auth method which
> was being used is removed, and there may still be messages in the
> status FIFO which have been encoded with it.

Worse yet, you could have requested retransmission of some, so they aren't even
in the FIFO. This behavior is what is intended, not a bug (IMHO).

> The right solution to this problem is to only close the previous auth
> method module once we are sure all hosts have sent a valid message encoded
> with the new auth method, though I think it's not worth doing this.
>
> The attached patch will, in case a SIGHUP is sent to the master status
> process, load the new authentication module(s). The problem I described
> above may happen, but the few messages which may be lost will be
> retransmitted.

But with the same incorrect authentication ;-) The packet will not be re-signed
when it is retransmitted.

Note that if it isn't referenced in the auth file, it doesn't matter if the
module isn't loaded - it wouldn't be called anyway. So, your code shouldn't
change the way it works. This is a known behavior - and you have to solve it
with the correct processes. Read the heartbeat doc - it talks about the
sequence necessary to handle this correctly ;-) It's all perfectly easy to do,
but you have to follow the procedure - correctly.

It goes like this:

(a) Change the auth file to accept both the new and old keys on all machines,
    but still sign with the old key (version 2).

    Send SIGHUP to everyone in the cluster.

(b) Once everyone has reread the auth file, rewrite it to send the new key,
    but accept both the old and new keys (version 3).

    Send the signal to all processes in the cluster.

(c) Once everyone has reread the auth file, wait long enough for everyone
    to have flushed out any rexmits (100 seconds worst case).

(d) Rewrite the auth file to send the new key, and accept only the new key
    (version 4).

    Send the signal to all processes in the cluster...
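
To make the three file versions concrete, here is a small sketch that
generates them. It assumes the auth file layout is an "auth N" line naming
the key used for signing, followed by one "N method [key]" line per accepted
key; the helper names (render_authfile, staged_authfiles) and the example
keys are invented for illustration.

```python
def render_authfile(sign_with, keys):
    """Render one auth file.

    sign_with: index of the key used for signing outgoing packets.
    keys: {index: (method, secret)} for every key that is accepted.
    """
    lines = [f"auth {sign_with}"]
    for index in sorted(keys):
        method, secret = keys[index]
        # Methods without a secret (such as crc) get no trailing field.
        lines.append(f"{index} {method} {secret}".rstrip())
    return "\n".join(lines) + "\n"


def staged_authfiles(old, new):
    """Build versions 2, 3 and 4 from (index, method, secret) tuples."""
    old_index, new_index = old[0], new[0]
    if old_index == new_index:
        raise ValueError("old and new keys need different indexes")
    both = {old_index: old[1:], new_index: new[1:]}
    return {
        2: render_authfile(old_index, both),                      # step (a)
        3: render_authfile(new_index, both),                      # step (b)
        4: render_authfile(new_index, {new_index: new[1:]}),      # step (d)
    }
```

For instance, staged_authfiles((1, "crc", ""), (2, "md5", "s3cret")) gives a
version 2 that signs with key 1 and accepts both, a version 3 that signs
with key 2 and accepts both, and a version 4 that lists key 2 only.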


That's all folks...

Obviously, this is a great candidate for being automated ;-) I had hesitated
because of how restrictive the rules on exporting encryption (necessary for key
distribution) were here in the US. This is better now, though. It would
actually be pretty easy if you prestaged all three of the new files on the machines
before you started.

An automated procedure might look like this:


Ask for the new auth information.

Use this to create the 3 new files as per the description

Copy (scp) the 3 new files to each machine

Start a key update procedure from one of the hosts, sending out checksums
on the three prestaged files, abort if any machine doesn't see
the right checksum on one of the files...

This initiating host then leads all the hosts through the sequenced steps
a-d above, with acks at the end of each stage, and a long pause
at step (c).
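
A minimal sketch of that coordination logic, under stated assumptions: the
checksum is SHA-256 (any checksum would do), each host reports the checksums
it sees for the prestaged files, and apply_stage is a hypothetical callback
that installs the given file version on a host, signals it, and returns True
once the host acknowledges. None of these names exist in heartbeat.

```python
import hashlib


def checksums(files):
    """Map each prestaged file name to the SHA-256 of its contents."""
    return {name: hashlib.sha256(text.encode()).hexdigest()
            for name, text in files.items()}


def mismatched_hosts(expected, reports):
    """Hosts whose reported checksums differ from the initiator's."""
    return sorted(host for host, seen in reports.items() if seen != expected)


def run_key_update(expected, reports, apply_stage, flush_wait=lambda: None):
    """Lead every host through versions 2, 3 and 4, or abort early."""
    bad = mismatched_hosts(expected, reports)
    if bad:
        raise RuntimeError(f"checksum mismatch on: {', '.join(bad)}")
    hosts = sorted(reports)
    for version in (2, 3, 4):
        if version == 4:
            # Step (c): let retransmissions signed with the old key drain.
            flush_wait()
        for host in hosts:
            # Every host must acknowledge before the next stage starts.
            if not apply_stage(host, version):
                raise RuntimeError(f"{host} did not ack version {version}")
    return hosts
```

In real use flush_wait would sleep for the worst-case retransmission window
mentioned in step (c); it is a parameter here so the sequencing can be
checked without waiting.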


-- Alan Robertson
alanr@suse.com
Heartbeat dynamic module support [ In reply to ]
On Mon, 21 Aug 2000, Alan Robertson wrote:

<snip>

> But with the same incorrect authentication ;-) The packet will not be re-signed
> when it is retransmitted.

Oh, this is worse than what I thought.

> Note that if it isn't referenced in the auth file, it doesn't matter if the
> module isn't loaded - it wouldn't be called anyway. So, your code shouldn't
> change the way it works.
> This is a known behavior - and you have to solve it
> with the correct processes. Read the heartbeat doc - it talks about the
> sequence necessary to handle this correctly ;-) It's all perfectly easy to do,
> but you have to follow the procedure - correctly.
>
> It goes like this:
>
> (a) Change the auth file to accept both the new and old keys on all machines
> but still sign with the old key (version 2)
>
> Send SIGHUP to everyone in the cluster
>
> Once everyone has reread the auth file, then:
>
> (b) Rewrite the auth file to send the new key, but accept both the old and
> new keys (version 3)
>
> Send the signal to all processes in the cluster
>
> (c) Once everyone has reread the auth file, then:
> Wait long enough for everyone to have flushed out any rexmits
> (100 seconds worst case)
>
> (d) Rewrite the auth file to send the new key, and accept only the new key
> (version 4)
>
> Send the signal to all processes in the cluster...
>
>
> That's all folks...

At least this works now with dynamic modules.

>
> Obviously, this is a great candidate for being automated ;-)

Changing the authentication method is very rare.

IMHO we have more important stuff to do right now.

> I had hesitated
> because of how restrictive the rules on exporting encryption (necessary for key
> distribution) were here in the US. This is better now, though. It would
> actually be pretty easy if you prestaged all three of the new files on the machines
> before you started.
>
> An automated procedure might look like this:
>
>
> Ask for the new auth information.
>
> Use this to create the 3 new files as per the description
>
> Copy (scp) the 3 new files to each machine
>
> Start a key update procedure from one of the hosts, sending out checksums
> on the three prestaged files, abort if any machine doesn't see
> the right checksum on one of the files...
>
> This initiating host then leads all the hosts through the sequenced steps
> a-d above, with acks at the end of each stage, and a long pause
> at step (c).
Heartbeat dynamic module support [ In reply to ]
Marcelo Tosatti wrote:
>
> On Mon, 21 Aug 2000, Alan Robertson wrote:
>
> <snip>
>
> > But with the same incorrect authentication ;-) The packet will not be re-signed
> > when it is retransmitted.
>
> Oh, this is worse than what I thought.

But, it works perfectly fine. It is *necessary* for it to work that way, or
something close to that way. You have to shut off recognition of the old auth
method.

> > That's all folks...
>
> At least this works now with dynamic modules.

Yes. I don't think this change broke anything. It works the way it did
before, as it should. It's just that changing keys is complicated.

> >
> > Obviously, this is a great candidate for being automated ;-)
>
> Changing the authentication method is very rare.
>
> IMHO we have more important stuff to do right now.

No argument. I'm not so motivated. That's why it's not done ;-)

Patches are being accepted ;-)


-- Alan Robertson
alanr@suse.com