Mailing List Archive

Meta-data and Heartbeat Interface
I've started playing around with the latest DRBD. The meta-data stuff
is great; it cleanly addresses my major concern with the older DRBD.
Well done!

I do have some questions about how this interfaces with heartbeat.
When two nodes are coming up, DRBD negotiates its states to determine
which one has the latest data. That node is made primary, and it
initiates a quick-sync.

When heartbeat comes up on the node where DRBD is already primary, it
complains:

2001/10/11_11:11:12 ERROR: Resource datadisk::drbd0 is active, and
should not be.
2001/10/11_11:11:12 WARNING: Non-idle resources will affect resource
takeback

Heartbeat doesn't run the "datadisk start" script because datadisk
reported that it was already "running". So even though DRBD is
primary, the file system is never mounted above it.

Things get even worse if heartbeat decides to make the node where DRBD
is secondary the primary node in the cluster:

kernel: drbd0: incompatible states

Now one of the nodes is "WFConnection st:Primary/Unknown" and the
other is "StandAlone st:Primary/Unknown".

It seems to me that heartbeat should be the sole entity to make a DRBD
node primary. I wonder if it would be better to have DRBD do its
meta-data negotiation, decide who should be primary, do the quick-sync
and then *go back to secondary* before allowing heartbeat to start.
Once the disks are quick-synced, then heartbeat can make whichever node
it wants primary.
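
To make the proposed ordering concrete, here is a rough sketch of it as a
toy model. Nothing here is DRBD or heartbeat code: the Node class, the
"generation" counter standing in for the meta-data, and both function names
are made up for illustration.

```python
from enum import Enum


class Role(Enum):
    SECONDARY = "Secondary"
    PRIMARY = "Primary"


class Node:
    """Hypothetical stand-in for one DRBD node; not a real DRBD API."""

    def __init__(self, name, generation):
        self.name = name
        # Assumed meta-data counter: a higher value means newer data.
        self.generation = generation
        self.role = Role.SECONDARY


def negotiate_and_sync(a, b):
    """Pick the node with the newer data, sync from it, then demote it."""
    source, target = (a, b) if a.generation >= b.generation else (b, a)
    source.role = Role.PRIMARY              # temporary, only for the quick-sync
    target.generation = source.generation   # quick-sync brings the peer up to date
    source.role = Role.SECONDARY            # the proposed extra step
    return source


def heartbeat_takeover(node, peer):
    """In this model, heartbeat is the only caller that promotes a node."""
    if peer.role is Role.PRIMARY:
        raise RuntimeError("peer is already primary")
    node.role = Role.PRIMARY


cuda1, cuda2 = Node("cuda1", 7), Node("cuda2", 5)
sync_source = negotiate_and_sync(cuda1, cuda2)
heartbeat_takeover(cuda2, cuda1)  # heartbeat is free to pick either node
print(sync_source.name, cuda1.role.value, cuda2.role.value)
```

Because both nodes are secondary and in sync once negotiate_and_sync()
returns, heartbeat never finds a resource that is already active, and it
cannot end up with two primaries.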

Is this reasonable? Am I missing something?

Thanks (my configuration is below).

Tony.


Two RH 6.1 systems.
DRBD: version: 0.6.1-pre3 (api:58/proto:58).
reiserfs over DRBD.
Heartbeat: version 0.4.9.0d.

drbd.conf:

resource drbd0 {

  protocol=B
  fsckcmd=fsck -p -y

  disk {
    do-panic
    disk-size=2048728
  }

  net {
    sync-rate=5000
    skip-sync
    tl-size=256
    timeout=60
    connect-int=10
    ping-int=10
  }

  on cuda1 {
    device=/dev/nb0
    disk=/dev/hda5
    address=192.0.2.1
    port=7788
  }

  on cuda2 {
    device=/dev/nb0
    disk=/dev/hda5
    address=192.0.2.2
    port=7788
  }
}

--
Tony Willoughby tonyw@example.com

"I don't want knowledge, I want certainty"
-David Bowie
Re: Meta-data and Heartbeat Interface
* Tony Willoughby <tonyw@example.com> [011011 17:37]:
>
> I've started playing around with the latest DRBD. The meta-data stuff
> is great, it cleanly addresses my major concern with the older DRBD.
> Well done!
>
> I do have some questions about how this interfaces with heartbeat.
> When two nodes are coming up, DRBD negotiates its states to determine
> which one has the latest data. That node is made primary, and it
> initiates a quick-sync.
>
> When heartbeat comes up on the node where DRBD is already primary, it
> complains:
>
> 2001/10/11_11:11:12 ERROR: Resource datadisk::drbd0 is active, and
> should not be.
> 2001/10/11_11:11:12 WARNING: Non-idle resources will affect resource
> takeback
>
> Heartbeat doesn't run the "datadisk start" script because datadisk
> reported that it was already "running". So even though DRBD is
> primary, the file system is never mounted above it.
>
> Things get even worse if heartbeat decides to make the node where DRBD
> is secondary the primary node in the cluster:
>
> kernel: drbd0: incompatible states
>
> Now one of the nodes is "WFConnection st:Primary/Unknown" and the
> other is "StandAlone st:Primary/Unknown".
>
> It seems to me that heartbeat should be the sole entity to make a DRBD
> node primary. I wonder if it would be better to have DRBD do its
> meta-data negotiation, decide who should be primary, do the quick-sync
> and then *go back to secondary* before allowing heartbeat to start.
> Once the disks are quick-synced, then heartbeat can make whichever node
> it wants primary.

I think it should allow heartbeat to start as soon as possible on one
node (since we are doing all this to get high availability of the service).

I do not know the heartbeat release you mention (0.4.9.0d). The last heartbeat
I used (I think it was 0.4.9) only checked the status of the first
resource of a resource group. (I always put the service IP address as the first
resource in the haresources file.)
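
A small model of why the ordering matters, assuming heartbeat really does
look only at the first resource of a group (that is from memory, not
checked against the source). The function name, the status table and the
service address 192.0.2.10 are invented for this sketch.

```python
def group_is_active(resources, status_of, first_only=True):
    """Return True if the resource group counts as already active.

    first_only=True models the recalled behaviour: only the first
    resource of the group is asked for its status. This is an
    assumption about heartbeat, not code taken from it.
    """
    checked = resources[:1] if first_only else resources
    return any(status_of(r) == "running" for r in checked)


# Hypothetical status results at boot: DRBD is already primary, so
# datadisk reports "running", while the service IP is not up yet.
boot_status = {"192.0.2.10": "stopped", "datadisk::drbd0": "running"}

ip_first = ["192.0.2.10", "datadisk::drbd0"]
disk_only = ["datadisk::drbd0"]

print(group_is_active(ip_first, boot_status.get))   # False
print(group_is_active(disk_only, boot_status.get))  # True
```

With the IP address listed first, the group looks idle and the start
scripts are run; with datadisk as the only resource, the group looks
active and heartbeat complains.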

-----

What we could do on drbd's side:

Right now, "datadisk drbdX status" reports "running" as soon as the device is
in primary state. We could change that so that it only reports "running" when
the filesystem is mounted. Now that I think about it, that seems very
reasonable.
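
Roughly, the changed status check could look like the sketch below. It is
not the actual datadisk script: the function names are hypothetical, and it
assumes the mount table is available as text in /proc/mounts format. The
mount point /data is made up as well.

```python
def is_mounted(device, mounts_text):
    """True if `device` is the source of an entry in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False


def datadisk_status(device, mounts_text):
    """Proposed rule: report "running" only when the filesystem is mounted."""
    return "running" if is_mounted(device, mounts_text) else "stopped"


# Sample mount tables (invented for the example).
mounted = "/dev/hda1 / ext2 rw 0 0\n/dev/nb0 /data reiserfs rw 0 0\n"
not_mounted = "/dev/hda1 / ext2 rw 0 0\n"

print(datadisk_status("/dev/nb0", mounted))      # running
print(datadisk_status("/dev/nb0", not_mounted))  # stopped
```

A device that is primary but not mounted would then report "stopped", so
heartbeat would still run "datadisk start" and mount the filesystem.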

-Philipp

> _______________________________________________
> DRBD-devel mailing list
> DRBD-devel@example.com
> https://lists.sourceforge.net/lists/listinfo/drbd-devel
Re: Meta-data and Heartbeat Interface
On Friday 12 October 2001 02:55 am, Philipp Reisner wrote:
> * Tony Willoughby <tonyw@example.com> [011011 17:37]:
> > I've started playing around with the latest DRBD. The meta-data
> > stuff is great, it cleanly addresses my major concern with the
> > older DRBD. Well done!
> >
> > I do have some questions about how this interfaces with heartbeat.
> > When two nodes are coming up, DRBD negotiates its states to
> > determine which one has the latest data. That node is made
> > primary, and it initiates a quick-sync.
> >
> > When heartbeat comes up on the node where DRBD is already primary,
> > it complains:
> >
> > 2001/10/11_11:11:12 ERROR: Resource datadisk::drbd0 is active, and
> > should not be.
> > 2001/10/11_11:11:12 WARNING: Non-idle resources will affect
> > resource takeback
> >
> > Heartbeat doesn't run the "datadisk start" script because datadisk
> > reported that it was already "running". So even though DRBD is
> > primary, the file system is never mounted above it.
> >
> > Things get even worse if heartbeat decides to make the node where
> > DRBD is secondary the primary node in the cluster:
> >
> > kernel: drbd0: incompatible states
> >
> > Now one of the nodes is "WFConnection st:Primary/Unknown" and the
> > other is "StandAlone st:Primary/Unknown".
> >
> > It seems to me that heartbeat should be the sole entity to make a
> > DRBD node primary. I wonder if it would be better to have DRBD do
> > its meta-data negotiation, decide who should be primary, do the
> > quick-sync and then *go back to secondary* before allowing
> > heartbeat to start. Once the disks are quick-synced, then heartbeat
> > can make whichever node it wants primary.
>
> I think, it should allow heartbeat to start as soon as possible on
> one node. (Since we are doing all this to get high availability of
> the service...)
>
> I do not know the mentioned heartbeat release (0.4.9.0d). The last
> heartbeat I used (I think it was 0.4.9) only checked the status of
> the first resource of a resource group. (I always put the service IP
> address as first resource in the haresources file)

For my testing, I had only one resource: DRBD. So this is certainly
a workaround for me.


>
> -----
>
> What we could do on drbd's side:
>
> Now datadisk drbdX status reports running as soon as the device is in
> primary state, we could change that, so that it only reports running
> when the filesystem is mounted. -- While I think about it, it seems
> to be very reasonable.

I guess I don't understand what's wrong with making the node
"secondary" after the sync. It can't take that long, can it?

Thanks!

>
> -Philipp

--
Tony Willoughby tonyw@example.com
...I'd be a Libertarian, if they weren't all a bunch of
tax-dodging professional whiners.
-Berkeley Breathed