Mailing List Archive: Configuration issue with DRBD + datadisk

Configuration issue with DRBD + datadisk

hostmaster at example

Sep 25, 2000, 11:54 PM

Post #1 of 43 (6676 views)

Hi all !

I have installed drbd-0.5.7 and heartbeat on a RH 6.2 system with the
2.2.16-3 kernel. I have added the datadisk script to the heartbeat control
file (haresources), and it works fine when the primary server (pollux) fails:
the secondary server, castor, gets configured as primary and /dev/nb0 is
mounted correctly.

When the first machine (pollux, the original primary server) is started
again, heartbeat runs on both servers and the original secondary node
(configured as primary when the original primary server failed) is
configured as secondary again, but the original primary machine also
remains secondary.

Here is the log file for the primary server:

Sep 26 08:45:27 pollux datadisk: succeeded
Sep 26 08:45:27 pollux datadisk: succeeded
Sep 26 08:45:27 pollux kernel: attempt to access beyond end of device
Sep 26 08:45:27 pollux kernel: 2b:00: rw=0, want=2, limit=0
Sep 26 08:45:27 pollux kernel: dev 2b:00 blksize=1024 blocknr=1 sector=2 size=1024 count=1
Sep 26 08:45:27 pollux kernel: EXT2-fs: unable to read superblock
Sep 26 08:45:27 pollux datadisk: failed
Sep 26 08:45:27 pollux datadisk: succeeded
Sep 26 08:45:27 pollux heartbeat: info: Running /etc/rc.d/init.d/httpd start
Sep 26 08:45:28 pollux httpd: httpd startup succeeded
Sep 26 08:45:44 pollux kernel: drbd : vmallocing 131281 B for bitmap. @c886401c
Sep 26 08:45:44 pollux kernel: drbd0: Connection established.
Sep 26 08:45:44 pollux kernel: drbd0: size=4200997 KB / blksize=4096 B

Here is the output of /proc/drbd on pollux:

version : 57

0: cs:Connected st:Secondary/Secondary ns:0 nr:0 dw:0 dr:0 of:0

If I manually execute /etc/rc.d/init.d/datadisk start, the primary
node gets configured OK.

Sep 26 08:47:20 pollux datadisk: succeeded
Sep 26 08:47:20 pollux datadisk: succeeded
Sep 26 08:47:20 pollux kernel: drbd0: blksize=1024 B
Sep 26 08:47:20 pollux kernel: drbd0: blksize=4096 B
Sep 26 08:47:21 pollux datadisk: succeeded

But after this, the contents of /dev/nb0 are those that existed on the
primary machine before the shutdown. All the modifications made on the
secondary machine while it was promoted to primary are missing :-/

How can I avoid the first error? How can I synchronize the contents
of both servers again?

Best regards,

Antonio Navarro Navarro
BemarNet Management
http://www.bemarnet.es
hostmaster@example.com
Tlf. +34-96-1656644
Fax. +34-96-1656514

Re: Configuration issue with DRBD + datadisk [ In reply to ]

philipp at example

Sep 26, 2000, 1:52 PM

Post #2 of 43 (6635 views)

Hi Antonio,

this looks a lot like someone was trying to
mount the /dev/nbX device before it was connected...
(and the size was not specified by "-d XXXX")

-Philipp

--
Want to try something new? Are you a Linux hacker?
Volunteer in testing mergemem!
(Get it from http://das.ist.org/mergemem)
-----
Philipp Reisner PGP: http://der.ist.org/~kde/pgp.asc

Re: Configuration issue with DRBD + datadisk [ In reply to ]

hostmaster at example

Sep 27, 2000, 12:51 AM

Post #3 of 43 (6634 views)

At 22.52 26/9/00 +0200, Philipp Reisner wrote:

>this looks a lot like someone was trying to
>mount the /dev/nbX device before it was connected...
>(and the size was not specified by "-d XXXX")

Yes, you are right, it was a size problem :-)

>This problem is solved by running the /etc/rc.d/init.d/drbd
>script before running /etc/rc.d/init.d/heartbeat at system startup,
>since the drbd script will use drbdsetup /dev/nbX WAIT to wait
>until resynchronisation is finished.
>
>PLEASE NOTE:
> This does not work flawlessly in the current release!
> Insert a "sleep 3" before the drbdsetup xxx WAIT command,
> or use more recent source from the CVS.
> ( I am recommending the first of the two solutions :)

Hmm... I have tried this but the system still fails. The sequence is this:

System 1 : Pollux (primary) /dev/nb0 (/home)
System 2 : Castor (secondary) /dev/nb0 (not mounted)

I create a file named 'test' in /home (pollux) containing one line and,
after a few seconds, I halt the primary machine.

System 1 : Pollux halted
System 2 : Castor (secondary) /dev/nb0 (not mounted)

After a few seconds heartbeat starts castor as primary.

System 1 : Pollux halted
System 2 : Castor (primary) /dev/nb0 (/home)

I add a new line to /home/test (castor) and after a few seconds power
up pollux.

System 1 : Pollux powering up
System 2 : Castor (primary) /dev/nb0 (/home)

After a few seconds heartbeat starts pollux as primary.

System 1 : Pollux (primary) /dev/nb0 (/home)
System 2 : Castor (secondary) /dev/nb0 (not mounted)

But the contents of the /home/test file don't reflect the changes made on
castor. I have added '/bin/sleep 5' to the drbd script, but when the
machine is restarted the script fails to run the resynchronization (a red
[FAILED] message appears during the boot process).

Any ideas?

Antonio Navarro Navarro
BemarNet Management
http://www.bemarnet.es
hostmaster@example.com
Tlf. +34-96-1656644
Fax. +34-96-1656514

Re: Configuration issue with DRBD + datadisk [ In reply to ]

thomasm at example

Sep 27, 2000, 7:10 AM

Post #4 of 43 (6629 views)

> So, If I'm right your proposal is to configure both nodes as slave ? I
> think that a good solution could be the following :

> 1.- If the node starts up and can't find any other node, then is configured
> as master.

> 2.- If the node starts up and can find another node configured as master,
> then is configured as slave.

> 3.- If the node starts up and can find another node configured as slave,
> then is configured as master.

I have a set of personal (firm) scripts which do that. But as I have no way
to probe drbd on a remote server, I must run a user-level server which tries
to read /proc/drbd.

Philipp, can you see a simple way to remotely query a host to find out
whether it is running drbd, "a la ping", or do we have to rely on a userland
server like I am doing?

One idea would be to assign another port to drbd, used to remotely query the
/proc entry. I can even do that. It may be useful for HA, which may want to
check remotely.

Thomas

Re: Configuration issue with DRBD + datadisk [ In reply to ]

hostmaster at example

Sep 27, 2000, 7:35 AM

Post #5 of 43 (6632 views)

At 11.00 27/9/00 +0100, "Thomas Mangin" wrote:

>Can the problem be that the failed master reconnect as master directly ?
>As it is not slave, the change done on Castor are not propagated to Pollux,
>So you recover your system as you left it before the power off .

Hmm... maybe. I have been testing the system and now it doesn't work
with the same settings... I'm confused.

>That is why I want to bring the reconnecting node always as slave ..
>(And as I am paranoiac I force a full sync !)

So, if I'm right, your proposal is to configure both nodes as slave? I
think that a good solution could be the following:

1.- If the node starts up and can't find any other node, then it is
configured as master.
2.- If the node starts up and can find another node configured as master,
then it is configured as slave.
3.- If the node starts up and can find another node configured as slave,
then it is configured as master.
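
The three rules above can be sketched as a small decision function. This is only an illustration of the proposal, not code from drbd or heartbeat; the name `choose_role` and the way the peer's role is passed in are assumptions made for the example.

```python
from typing import Optional


def choose_role(peer_role: Optional[str]) -> str:
    """Pick this node's role at startup from what it can see of the peer.

    peer_role is None when no other node can be found, otherwise the
    peer's current role ("master" or "slave"). Hypothetical helper.
    """
    if peer_role is None:
        return "master"  # rule 1: no other node found
    if peer_role == "master":
        return "slave"   # rule 2: a master already exists
    if peer_role == "slave":
        return "master"  # rule 3: the peer is slave, so take over
    raise ValueError(f"unknown peer role: {peer_role!r}")
```

The sketch says nothing about which node holds the newer data, which is the concern behind always bringing a reconnecting node up as slave.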

Regards,

Antonio Navarro Navarro
BemarNet Management
http://www.bemarnet.es
hostmaster@example.com
Tlf. +34-96-1656644
Fax. +34-96-1656514

Re: Configuration issue with DRBD + datadisk [ In reply to ]

yocum at example

Sep 27, 2000, 12:41 PM

Post #6 of 43 (6631 views)

Did you mean to take the runForAll routine out of the new datadisk?

Dan

--
Dan Yocum, Sr. Linux Consultant
Linuxcare, Inc.
630.697.8066 tel
yocum@example.com, http://www.linuxcare.com

Linuxcare. Support for the revolution.

Re: Configuration issue with DRBD + datadisk [ In reply to ]

yocum at example

Sep 27, 2000, 12:48 PM

Post #7 of 43 (6633 views)

OK guys, heartbeat (actually ResourceManager in heartbeat) *needs* to
be able to issue these three commands to drbdc:

start
stop
status

And when it does a 'status' it *needs* to see either 'running' or
'Running' in the output if the service is available (i.e., if the volume
is mounted and Primary) not the contents of /proc/drbd.
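
A minimal sketch of the check described above, assuming a plain substring test. How ResourceManager actually matches the output is not shown in this thread, and `resource_is_running` is a hypothetical name.

```python
def resource_is_running(status_output: str) -> bool:
    """Return True if the status output mentions 'running' or 'Running'."""
    return "running" in status_output or "Running" in status_output
```

Under this assumption an output such as "not running" would also count as running, so a status script would need to use a different word for every other state.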

Don't kill the messenger. ;)

Cheers,
Dan

--
Dan Yocum, Sr. Linux Consultant
Linuxcare, Inc.
630.697.8066 tel
yocum@example.com, http://www.linuxcare.com

Linuxcare. Support for the revolution.

Re: Configuration issue with DRBD + datadisk [ In reply to ]

thomasm at example

Sep 28, 2000, 1:17 AM

Post #8 of 43 (6631 views)

Dan Yocum wrote:

> Did you mean to take the runForAll routine out of the new datadisk?
>
> Dan
>

BIG BUG !!

>
> --
> Dan Yocum, Sr. Linux Consultant
> Linuxcare, Inc.
> 630.697.8066 tel
> yocum@example.com, http://www.linuxcare.com
>
> Linuxcare. Support for the revolution.

Re: Configuration issue with DRBD + datadisk [ In reply to ]

thomasm at example

Sep 28, 2000, 1:21 AM

Post #9 of 43 (6641 views)

Dan Yocum wrote:

> OK guys, heartbeat (actually ResourceManager in heartbeat) *needs* to
> be able to issue these three commands to drbdc:
>
> start
> stop
> status
>
> And when it does a 'status' it *needs* to see either 'running' or
> 'Running' in the output if the service is available (i.e., if the volume
> is mounted and Primary) not the contents of /proc/drbd.
>
> Don't kill the messenger. ;)
>
> Cheers,
> Dan

Start -> master
Stop -> slave
Status -> Running if Master only

Re: Configuration issue with DRBD + datadisk [ In reply to ]

thomasm at example

Sep 28, 2000, 3:22 AM

Post #10 of 43 (6634 views)

Philipp Reisner wrote:

> I think this should go into the datadisk script. Currently we have three
> scripts.
>
> /etc/init.d/drbd {start|stop}
> used by init at system startup/shutdown.
> load module and tells the drbd devices about the IP-addresses,
> ports and block devices.
>

If the node was slave before the last reboot, it will attempt a drbdsetup
WAIT too, i.e. if the last drbd-related command run on the server was
- drbdc slave
- datadisk stop (which calls drbdc slave)

> /etc/ha.d/resource.d/datadisk [device] {start|stop|status}
> used by heartbeat to switch a/all drbd's devices roles (PRI/SEC).
> I think this one should also handle the status command.
> Probably it should return "running" if the device is in primary
> mode and "ready" if the device is in secondary mode.

OK, that is now what is done.

> If this device is secondary and still syncing it should return
> "not ready".

OK, will do.

> It's a bit difficult to define datadisk's behaviour if it's
> called without device and with the "status" command.
> (Calling without device usually affects all devices, but
> what's the status of all devices ? )

The script will return a list containing the status of each device.
I will add a "sort" call to get a logical order.

> /.../drbdc [device] {command}
> This command is to be used by the system administrator or by
> custom made cluster-management-software. It has every command
> that Thomas can think of :)

This script and datadisk now both use functions stored in drbd_commun,
so bug fixes only have to be made once ;*)

I still have to add the "restore as it was before" feature, useful when not
using heartbeat.
It will be done soon.

I was asked for a command that returns the percentage completed of a full
resync. I will try to implement that in the near future, if I can find some
time.

Sorry if the scripts are now totally buggy... (I found at least 5 bugs today.)

Thomas

Re: Configuration issue with DRBD + datadisk [ In reply to ]

e9525415 at example

Sep 28, 2000, 3:29 AM

Post #11 of 43 (6633 views)

On Wed, 27 Sep 2000, Dan Yocum wrote:

> OK guys, heartbeat (actually ResourceManager in heartbeat) *needs* to
> be able to issue these three commands to drbdc:
>
> start
> stop
> status
>
> And when it does a 'status' it *needs* to see either 'running' or
> 'Running' in the output if the service is available (i.e., if the volume
> is mounted and Primary) not the contents of /proc/drbd.
>
> Don't kill the messenger. ;)

I think this should go into the datadisk script. Currently we have three
scripts.

/etc/init.d/drbd {start|stop}
    used by init at system startup/shutdown.
    Loads the module and tells the drbd devices about the IP addresses,
    ports and block devices.

/etc/ha.d/resource.d/datadisk [device] {start|stop|status}
    used by heartbeat to switch the roles (PRI/SEC) of one or all drbd
    devices.
    I think this one should also handle the status command.
    Probably it should return "running" if the device is in primary
    mode and "ready" if the device is in secondary mode.
    If the device is secondary and still syncing it should return
    "not ready".
    It's a bit difficult to define datadisk's behaviour if it's
    called without a device and with the "status" command.
    (Calling without a device usually affects all devices, but
    what's the status of all devices? )

/.../drbdc [device] {command}
    This command is to be used by the system administrator or by
    custom-made cluster management software. It has every command
    that Thomas can think of :)

-Philipp

Re: Configuration issue with DRBD + datadisk [ In reply to ]

thomasm at example

Sep 28, 2000, 3:31 AM

Post #12 of 43 (6632 views)

Philipp Reisner wrote:

>
> /etc/ha.d/resource.d/datadisk [device] {start|stop|status}
> used by heartbeat to switch a/all drbd's devices roles (PRI/SEC).
> I think this one should also handle the status command.
> Probably it should return "running" if the device is in primary
> mode and "ready" if the device is in secondary mode.
> If this device is secondary and still syncing it should return
> "not ready".

Can you give me more information?

Do you want "not ready" if either the master or the slave is syncing,
or only if the slave is syncing?

Thomas

Re: Configuration issue with DRBD + datadisk [ In reply to ]

yocum at example

Sep 28, 2000, 7:00 AM

Post #13 of 43 (6627 views)

Philipp Reisner wrote:

> I think this should go into the datadisk script. Currently we have three
> scripts.

Yes. I spoke too soon on putting it in drbdc - I saw that Thomas was
splitting datadisk into >1 scripts and thought drbdc was the replacement
for the main script.

> /etc/ha.d/resource.d/datadisk [device] {start|stop|status}
> used by heartbeat to switch a/all drbd's devices roles (PRI/SEC).
> I think this one should also handle the status command.
> Probably it should return "running" if the device is in primary
> mode and "ready" if the device is in secondary mode.
> If this device is secondary and still syncing it should return
> "not ready".
> It's a bit difficult to define datadisk's behaviour if it's
> called without device and with the "status" command.
> (Calling without device usually affects all devices, but
> what's the status of all devices ? )

Good point. Let me look at how Alan handles services that have more
than one daemon (*if* he does) to get an idea (i.e., nfs has rpc.mountd,
nfsd, rpc.quotad). My first impression would be to say that 'status'
should report if one or more devices are *not* primary and mounted and
'running' if all is well.

Cheers,
Dan

--
Dan Yocum, Sr. Linux Consultant
Linuxcare, Inc.
630.697.8066 tel
yocum@example.com, http://www.linuxcare.com

Linuxcare. Support for the revolution.

Re: Configuration issue with DRBD + datadisk [ In reply to ]

thomasm at example

Sep 28, 2000, 8:21 AM

Post #14 of 43 (6635 views)

Dan Yocum wrote:

> > /etc/ha.d/resource.d/datadisk [device] {start|stop|status}
> > used by heartbeat to switch a/all drbd's devices roles (PRI/SEC).
> > I think this one should also handle the status command.
> > Probably it should return "running" if the device is in primary
> > mode and "ready" if the device is in secondary mode.
> > If this device is secondary and still syncing it should return
> > "not ready".
> > It's a bit difficult to define datadisk's behaviour if it's
> > called without device and with the "status" command.
> > (Calling without device usually affects all devices, but
> > what's the status of all devices ? )

Here are the new return cases of status:
Running : Master and mounted
Ready : Slave
Failed : DRBD not loaded, or Master and not mounted
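
The return cases listed above could be written as the following sketch. It reads the "+" in the "Failed" case as "or", which is an assumption, and `datadisk_status` with its three boolean inputs is a hypothetical interface, not the actual datadisk script.

```python
def datadisk_status(drbd_loaded: bool, is_master: bool, mounted: bool) -> str:
    """Map the device state to the word printed by 'datadisk status'."""
    if not drbd_loaded:
        return "Failed"  # DRBD module not loaded
    if is_master:
        # A master only counts as running when its filesystem is mounted.
        return "Running" if mounted else "Failed"
    return "Ready"       # slave
```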

I hope it is now fine for you ..

Thomas

Re: Configuration issue with DRBD + datadisk [ In reply to ]

thomasm at example

Sep 29, 2000, 3:53 AM

Post #15 of 43 (6634 views)

Dan Yocum wrote:

> Did you mean to take the runForAll routine out of the new datadisk?

The runForAll was only here as I badly cut - paste - delete ...
It was removed for datadisk as its place is in drbdc.

Thomas

Re: Configuration issue with DRBD + datadisk [ In reply to ]

yocum at example

Sep 29, 2000, 8:28 AM

Post #16 of 43 (6637 views)

Thomas Mangin wrote:
>
> Dan Yocum wrote:
>
> > > /etc/ha.d/resource.d/datadisk [device] {start|stop|status}
> > > used by heartbeat to switch a/all drbd's devices roles (PRI/SEC).
> > > I think this one should also handle the status command.
> > > Probably it should return "running" if the device is in primary
> > > mode and "ready" if the device is in secondary mode.
> > > If this device is secondary and still syncing it should return
> > > "not ready".
> > > It's a bit difficult to define datadisk's behaviour if it's
> > > called without device and with the "status" command.
> > > (Calling without device usually affects all devices, but
> > > what's the status of all devices ? )
>
> Here are the new return case of status :
> Running : Master and Mounted
> Ready : Slave
> Failed : Not DRBD loaded + Master and not mounted
>
> I hope it is now fine for you ..

Mostly. There are instances when the device is unconfigured and
will still come back as a "Secondary" device and then report "Ready",
which it clearly is not. E.g.:

# cat /proc/drbd
version : 58

0: cs:Unconfigured st:Secondary/Unknown ns:0 nr:0 dw:0 dr:0 of:0
1: cs:Connected st:Secondary/Secondary ns:0 nr:0 dw:0 dr:0 of:0

Put another check in there for "Unconfigured" and "Fail" on that as
well.
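
A sketch of the extra check, working on one device line of /proc/drbd in the format shown above. The function name, the regular expression and the `mounted` parameter are assumptions for illustration; the real script may inspect the state differently.

```python
import re

# Matches lines such as "0: cs:Connected st:Secondary/Secondary ns:0 ..."
_DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:([^/\s]+)/(\S+)")


def device_status(proc_line: str, mounted: bool = False) -> str:
    """Return "Running", "Ready" or "Failed" for one /proc/drbd device line."""
    match = _DEVICE_LINE.match(proc_line)
    if match is None:
        return "Failed"  # not a device line at all
    _minor, conn_state, local_role, _remote_role = match.groups()
    if conn_state == "Unconfigured":
        return "Failed"  # the additional check requested above
    if local_role == "Primary":
        return "Running" if mounted else "Failed"
    return "Ready"
```

With the two example lines above, device 0 is reported as "Failed" and device 1 as "Ready".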

Cheers,
Dan

--
Dan Yocum, Sr. Linux Consultant
Linuxcare, Inc.
630.697.8066 tel
yocum@example.com, http://www.linuxcare.com

Linuxcare. Support for the revolution.

Re: Configuration issue with DRBD + datadisk [ In reply to ]

thomasm at example

Sep 29, 2000, 9:18 AM

Post #17 of 43 (6636 views)

> Mostly. There are instances when the device is unconfigured and
> will still come back as a "Secondary" device and then report "Ready",
> which it clearly is not. E.g.:
>
> # cat /proc/drbd
> version : 58
>
> 0: cs:Unconfigured st:Secondary/Unknown ns:0 nr:0 dw:0 dr:0 of:0
> 1: cs:Connected st:Secondary/Secondary ns:0 nr:0 dw:0 dr:0 of:0
>
> Put another check in there for "Unconfigured" and "Fail" on that as
> well.
>
> Cheers,
> Dan

Do you mean that if you run a global datadisk status when more than one drbd
device is set up, you can get a valid report when it is not valid?
Or do you mean that Secondary/Secondary is invalid and must not be reported
as Ready?
Or both?

I am a little bit lost. I will try to have a look at the heartbeat code
later on.
But if you can explain to me what is wrong, it would help me.

Thank you

Thomas

Re: Configuration issue with DRBD + datadisk [ In reply to ]

Sep 29, 2000, 11:26 AM

Post #18 of 43 (6639 views)

On Thu, Sep 28, 2000 at 09:21:44AM +0100, Thomas Mangin wrote:
> Dan Yocum wrote:
>
> > OK guys, heartbeat (actually ResourceManager in heartbeat) *needs* to
> > be able to issue these three commands to drbdc:
> >
> > start
> > stop
> > status
> >
> > And when it does a 'status' it *needs* to see either 'running' or
> > 'Running' in the output if the service is available (i.e., if the volume
> > is mounted and Primary) not the contents of /proc/drbd.
> >
> > Don't kill the messenger. ;)
> >
> > Cheers,
> > Dan
>
> Start -> master
> Stop -> slave
> Status -> Running if Master only

This is very similar to what FailSafe needs. Right now we have Failsafe
resource scripts that call datadisk. They assume that the module is loaded
and devices configured at boot time and that "start" makes us the master,
and "stop" makes us the slave.

I guess I should point out that heartbeat is not the only clustermanager
out there, so when we change the interface to drbd we risk breaking
something. Perhaps keeping the old version available would help.

-dg

--
David Gould dg@example.com
SuSE, Inc., 580 2cd St. #210, Oakland, CA 94607 510.628.3380
"So many ways to skin a cat, and still everyone uses a great big knife."

Re: Configuration issue with DRBD + datadisk [ In reply to ]

philipp at example

Sep 30, 2000, 12:04 AM

Post #19 of 43 (6640 views)

On Wed, 27 Sep 2000, Thomas Mangin wrote:
>> So, If I'm right your proposal is to configure both nodes as slave ? I
>> think that a good solution could be the following :
>
>> 1.- If the node starts up and can't find any other node, then is configured
>> as master.
>
>> 2.- If the node starts up and can find another node configured as master,
>> then is configured as slave.
>
>> 3.- If the node starts up and can find another node configured as slave,
>> then is configured as master.
>
>I have a set of personal (firm) script which does that. But as I have no way
>to probe drbd on a server I must run a user level server which will try to get
>
>/proc/drbd ..
>
>Philipp can you see a simple way to query remotely a host if it is running drbd.

Thomas,
put the local device into Secondary state, connect and look at the output of
/proc/drbd.

version : 57

0: cs:Connected st:Primary/Secondary ns:741442 nr:0 dw:20657 dr:22095 of:0
1: cs:Connected st:Secondary/Primary ns:262 nr:18955 dw:19222 dr:17348 of:0
2: cs:Connected st:Primary/Secondary ns:2942 nr:0 dw:3063 dr:33674 of:0
                   ^^^     ^^^
              local state / remote state
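
As an illustration of reading the local and remote state from this output, here is a small parser sketch. It assumes the line format shown above; `DeviceState` and `parse_proc_drbd` are hypothetical names, not part of drbd.

```python
import re
from typing import Dict, NamedTuple


class DeviceState(NamedTuple):
    connection: str  # the "cs:" field
    local: str       # state before the slash in "st:"
    remote: str      # state after the slash in "st:"


_DEVICE_LINE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:([^/\s]+)/(\S+)")


def parse_proc_drbd(text: str) -> Dict[int, DeviceState]:
    """Return {device number: DeviceState} for every device line in text."""
    devices: Dict[int, DeviceState] = {}
    for line in text.splitlines():
        match = _DEVICE_LINE.match(line)
        if match:  # skips the version line and blank lines
            minor, conn, local, remote = match.groups()
            devices[int(minor)] = DeviceState(conn, local, remote)
    return devices
```

On a node running drbd, the input would be the contents of /proc/drbd, and the `remote` field answers the question of what state the peer is in once the devices are connected.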

>"a la ping" or have we to rely on a userland server like I am doing ?
>
>It can be an idea to assign another port to drbd use to remote query the /proc
>entry. I can even do that. it may be useful for HA which may want to remote
>check
>
>Thomas

Re: Configuration issue with DRBD + datadisk [ In reply to ]

philipp at example

Sep 30, 2000, 12:12 AM

Post #20 of 43 (6640 views)

On Wed, 27 Sep 2000, Dan Yocum wrote:
>OK guys, heartbeat (actually ResourceManager in heartbeat) *needs* to
>be able to issue these three commands to drbdc:
>
>start
>stop
>status
>
>And when it does a 'status' it *needs* to see either 'running' or
>'Running' in the output if the service is available (i.e., if the volume
>is mounted and Primary) not the contents of /proc/drbd.
>
>Don't kill the messenger. ;)
>
>Cheers,
>Dan

Hi,

It's great that heartbeat is getting a ResourceManager.
The worst problem with combining DRBD and Heartbeat is that
DRBD might be in the process of resynchronising the disks
after a crash while Heartbeat is trying to migrate a service.

My suggestion:
ResourceManager should call "datadisk /dev/nbX status" before
trying to migrate a service. If the output contains something
like "not ready", it will not migrate, and will retry in, let's
say, 5 minutes.
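
The retry idea could look roughly like the sketch below. It is not ResourceManager code: `wait_until_ready`, the `get_status` callable (which would run "datadisk /dev/nbX status" and return its output) and the attempt limit are all assumptions for illustration.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    retry_seconds: float = 300.0,  # "let's say 5 minutes"
    max_attempts: int = 12,        # hypothetical upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the status until it no longer contains "not ready".

    Returns True as soon as migration may proceed, or False if the
    device is still not ready after max_attempts checks.
    """
    for attempt in range(max_attempts):
        if "not ready" not in get_status().lower():
            return True
        if attempt + 1 < max_attempts:
            sleep(retry_seconds)
    return False
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.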

Another solution would be to call "datadisk /dev/nbX waitTillReady"
before migrating a service. Every other service (httpd, ftpd) would
implement this by returning immediately, but DRBD could wait until the
resynchronisation is finished before returning.
(An httpd could use this command to wait until it has no open connections,
so that no client would notice the migration of the service.)

Now we are calling /etc/init.d/drbd in the boot process before calling
/etc/init.d/heartbeat. If a resynchronisation is necessary, we
block the boot process until it's finished. This is not an optimal
solution.

-Philipp

Re: Configuration issue with DRBD + datadisk [ In reply to ]

philipp at example

Sep 30, 2000, 12:50 AM

Post #21 of 43 (6634 views)

On Thu, 28 Sep 2000, Thomas Mangin wrote:
>Philipp Reisner wrote:
>
>>
>> /etc/ha.d/resource.d/datadisk [device] {start|stop|status}
>> used by heartbeat to switch a/all drbd's devices roles (PRI/SEC).
>> I think this one should also handle the status command.
>> Probably it should return "running" if the device is in primary
>> mode and "ready" if the device is in secondary mode.
>> If this device is secondary and still syncing it should return
>> "not ready".
>
>Can you give me more infos.
>
>You want not ready if the master or the slave are syncing
>or only if the slave is syncing ??
>

I am thinking:

Primary, syncing     ==> Running     (since the FS is mounted on top of it)
Secondary, connected ==> Stopped     (since it is possible to make this primary)
Secondary, syncing   ==> "not ready" (since it is not possible to make this
                                      primary and to mount the FS on it, because
                                      during the resync the FS on the device
                                      is not in a consistent state)
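
The mapping above, written as a sketch. The case of a Primary that is connected and not syncing is not listed; treating it as "Running" is an assumption, as are the function name and its parameters.

```python
def status_for(local_role: str, syncing: bool) -> str:
    """Return the status word for a device given its role and sync state."""
    if local_role == "Primary":
        # The filesystem is mounted on top of it, even during a resync.
        return "Running"
    if syncing:
        # A syncing Secondary is inconsistent and must not be promoted.
        return "not ready"
    return "Stopped"  # connected Secondary, could be made Primary
```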

-Philipp

Re: Configuration issue with DRBD + datadisk [ In reply to ]

philipp at example

Sep 30, 2000, 1:12 AM

Post #22 of 43 (6634 views)

On Fri, 29 Sep 2000, David Gould wrote:
>On Thu, Sep 28, 2000 at 09:21:44AM +0100, Thomas Mangin wrote:
>> Dan Yocum wrote:
>>
>> > OK guys, heartbeat (actually ResourceManager in heartbeat) *needs* to
>> > be able to issue these three commands to drbdc:
>> >
>> > start
>> > stop
>> > status
>> >
>> > And when it does a 'status' it *needs* to see either 'running' or
>> > 'Running' in the output if the service is available (i.e., if the volume
>> > is mounted and Primary) not the contents of /proc/drbd.
>> >
>> > Don't kill the messenger. ;)
>> >
>> > Cheers,
>> > Dan
>>
>> Start -> master
>> Stop -> slave
>> Status -> Running if Master only
>
>This is very similar to what FailSafe needs. Right now we have Failsafe
>resource scripts that call datadisk. They assume that the module is loaded
>and devices configured at boot time and that "start" makes us the master,
>and "stop" makes us the slave.
>
>I guess I should point out that heartbeat is not the only clustermanager
>out there, so when we change the interface to drbd we risk breaking
>something. Perhaps keeping the old version available would help.
>
>-dg

What about contributing the drbd-glue-to-FailSafe scripts to drbd?
As I understand it, FailSafe is a rather stable product (or at least its
interfaces are, I think), while DRBD is currently a moving target.

-Philipp

Re: Configuration issue with DRBD + datadisk [ In reply to ]

thomas.mangin at example

Sep 30, 2000, 4:17 AM

Post #23 of 43 (6633 views)

> > Start -> master
> > Stop -> slave
> > Status -> Running if Master only
>
> This is very similar to what FailSafe needs. Right now we have Failsafe
> resource scripts that call datadisk. They assume that the module is loaded
> and devices configured at boot time and that "start" makes us the master,
> and "stop" makes us the slave.
>
> I guess I should point out that heartbeat is not the only clustermanager
> out there, so when we change the interface to drbd we risk breaking
> something. Perhaps keeping the old version available would help.

I am planning to add a configuration value to change the behaviour of the
script so that it can fit several needs.
Heartbeat and standalone use are now my two short-term objectives, but I am
open to any other suggestions.

Thomas

Re: Configuration issue with DRBD + datadisk [ In reply to ]

thomas.mangin at example

Sep 30, 2000, 4:22 AM

Post #24 of 43 (6621 views)

Philipp Reisner wrote:

> Thomas,
> put the local device into Secondary state, connect and look at the output of
> /proc/drbd.
>
> version : 57
>
> 0: cs:Connected st:Primary/Secondary ns:741442 nr:0 dw:20657 dr:22095 of:0
> 1: cs:Connected st:Secondary/Primary ns:262 nr:18955 dw:19222 dr:17348 of:0
> 2: cs:Connected st:Primary/Secondary ns:2942 nr:0 dw:3063 dr:33674 of:0
>                    ^^^     ^^^
>               local state / remote state

Fine with me, will do.

Re: Configuration issue with DRBD + datadisk [ In reply to ]

thomas.mangin at example

Sep 30, 2000, 4:44 AM

Post #25 of 43 (6631 views)

Philipp Reisner wrote:

> Thomas,
> put the local device into Secondary state, connect and look at the output of
> /proc/drbd.
>
> version : 57
>
> 0: cs:Connected st:Primary/Secondary ns:741442 nr:0 dw:20657 dr:22095 of:0
> 1: cs:Connected st:Secondary/Primary ns:262 nr:18955 dw:19222 dr:17348 of:0
> 2: cs:Connected st:Primary/Secondary ns:2942 nr:0 dw:3063 dr:33674 of:0
>                    ^^^     ^^^
>               local state / remote state
>

Got it. I will add the needed function to finally parse this "file" properly.

Thomas