Mailing List Archive

Another Heartbeat Interface issue
As far as I understand the code, the Primary DRBD node will complete
a Disk Write even if "drbd_send_data()" fails to forward the request
to the Secondary DRBD node. Suppose this happened not because the
Secondary node died, but because it was just temporarily overloaded
with traffic. If shortly after that the Primary node dies and
Heartbeat initiates a failover, the Disk Write operation will be lost
while the client that originated the Write believes it completed
successfully.

I think when DRBD is used with protocol C and Heartbeat, it should
not complete a Disk Write until either both nodes have completed the
Write or Heartbeat has notified DRBD that the Secondary node went
down. That would require reestablishing TCP connections and
retransmitting requests in case of send failures.
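
To make the rule concrete, here is a rough sketch of the completion
check I have in mind (the names and structure are made up for
illustration; this is not actual DRBD code):

    /* A write may be completed toward the client only when it is
     * safe with respect to failover. */
    struct pending_write {
            int local_done;  /* our own disk write finished          */
            int peer_acked;  /* the Secondary acknowledged the write */
    };

    static int peer_confirmed_down; /* set only when Heartbeat
                                       notifies us the peer is gone */

    static int write_may_complete(const struct pending_write *w)
    {
            if (!w->local_done)
                    return 0;
            /* either both disks hold the data ... */
            if (w->peer_acked)
                    return 1;
            /* ... or the cluster manager confirmed the peer is down */
            return peer_confirmed_down;
    }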

Thanks,

-Ed
Re: Another Heartbeat Interface issue [ In reply to ]
I think the solution should be to fix DRBD to not drop the
connection when something is temporarily overloaded.

Actually, I dropped all the postpone-packet related code and
replaced it with a new design last week.

Now there is a second TCP connection, called msock (m stands for
meta). It is used for exchanging ping packets (and in the future ack
packets will also go over the msock, as soon as we have GFS support).

The asender-thread is responsible for sending/receiving packets via
the msock. -- The asender never sleeps without receiving from the
socket, in contrast to the drbdd-thread.

All the places that used to cause a timeout and drop the connection
now simply request the exchange of a ping packet. If the other side
fails to answer the ping, the connection is dropped.

First tests show that it works!! If your secondary's IO subsystem is
under pressure and the drbdd-thread sleeps a lot in getblk/
wait_on_buffer/ll_rw_block, DRBD does not drop the connection due to
a timeout.
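
Roughly, the new logic at those places looks like this (a simplified
sketch; request_ping(), wait_for_ping_ack() and drop_connection()
just stand in for the real asender/msock machinery):

    /* stand-ins for the real asender/msock machinery */
    extern void request_ping(void);
    extern int  wait_for_ping_ack(int timeout);
    extern void drop_connection(void);

    static int handle_send_timeout(int ping_timeout)
    {
            /* ask the asender to exchange a ping over the msock
             * instead of giving up right away */
            request_ping();

            if (wait_for_ping_ack(ping_timeout))
                    return 0;      /* peer alive, just slow */

            drop_connection();     /* ping unanswered: peer is gone */
            return -1;
    }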

I will release -pre4 early next week....

-Philipp


* Guzovsky, Eduard <EGuzovsky@example.com> [011013 00:12]:
> As far as I understand the code, the Primary DRBD node will complete
> a Disk Write even if "drbd_send_data()" fails to forward the request
> to the Secondary DRBD node. Suppose this happened not because the
> Secondary node died, but because it was just temporarily overloaded
> with traffic. If shortly after that the Primary node dies and
> Heartbeat initiates a failover, the Disk Write operation will be
> lost while the client that originated the Write believes it
> completed successfully.
>
> I think when DRBD is used with protocol C and Heartbeat, it should
> not complete a Disk Write until either both nodes have completed
> the Write or Heartbeat has notified DRBD that the Secondary node
> went down. That would require reestablishing TCP connections and
> retransmitting requests in case of send failures.
>
> Thanks,
>
> -Ed
>
>
Re: Another Heartbeat Interface issue [ In reply to ]
On 2001-10-13T22:04:49,
Philipp Reisner <philipp.reisner@example.com> said:

> Now there is a second TCP connection, called msock (m stands for meta),
> it is used for echanging ping packes (and in the future also ack-packets
> will go over the msock (as soon as we have GFS support) ).

Hi Philipp, without having looked at the code ;), is using TCP/IP for
"heartbeats" a good choice, given all the potential stalls etc. that
TCP/IP flow control might incur?

Maybe a better choice would be to modularize this; what you are
essentially building is a "cluster membership layer", and heartbeat,
FailSafe as well as GFS already have one. If DRBD works on top of one
of these solutions, it makes much sense to use the same cluster
membership.

(Of course, I am also hoping that this might make it easier one day to
replicate between more than 2 nodes ;)

Sincerely,
Lars Marowsky-Brée <lmb@example.com>

--
Perfection is our goal, excellence will be tolerated. -- J. Yahl
RE: Another Heartbeat Interface issue [ In reply to ]
> -----Original Message-----
> From: Philipp Reisner [mailto:philipp.reisner@example.com]
> Sent: Saturday, October 13, 2001 4:05 PM
> To: Guzovsky, Eduard
> Cc: drbd-devel@example.com
> Subject: Re: [DRBD-dev] Another Heartbeat Interface issue
>
>
> I think the solution should be to fix DRBD to not drop the
> connection when something is temporarily overloaded.
>

Theoretically there is still the case where the TCP connection
between the two nodes stalls while heartbeat does not see anything
wrong with the node. DRBD should keep trying to reconnect and
retransmit failed requests. A Write request should not be completed
until it is written to both disks or heartbeat has notified DRBD
that the secondary node is down. I am not very familiar with the
heartbeat implementation, but it would be nice if DRBD could give
heartbeat a hint that something is wrong, so that heartbeat can
bring the secondary node down and then notify DRBD.
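
Something along these lines, where all the helper names are
hypothetical and just illustrate the idea:

    /* stand-ins for the real machinery; names are hypothetical */
    extern int  heartbeat_says_peer_down(void);
    extern int  reconnect_to_peer(void);
    extern void resend_unacked_writes(void);
    extern void complete_pending_writes(void);
    extern void retry_wait(void);

    static void recover_replication(void)
    {
            for (;;) {
                    if (heartbeat_says_peer_down()) {
                            /* safe: peer is confirmed gone */
                            complete_pending_writes();
                            return;
                    }
                    if (reconnect_to_peer() == 0) {
                            /* replay writes the peer never acked */
                            resend_unacked_writes();
                            return;
                    }
                    retry_wait(); /* back off, then try again */
            }
    }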

> Actually, I dropped all the postpone-packet related code and
> replaced it with a new design last week.
>
> Now there is a second TCP connection, called msock (m stands for
> meta). It is used for exchanging ping packets (and in the future
> ack packets will also go over the msock, as soon as we have GFS
> support).
>
> The asender-thread is responsible for sending/receiving packets
> via the msock. -- The asender never sleeps without receiving from
> the socket, in contrast to the drbdd-thread.
>
> All the places that used to cause a timeout and drop the connection
> now simply request the exchange of a ping packet. If the other side
> fails to answer the ping, the connection is dropped.
>

Instead of opening another TCP connection for pings, you might consider
turning on KEEPALIVES on the first TCP connection.
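
In userspace that would look roughly like this (TCP_KEEPIDLE/
TCP_KEEPINTVL/TCP_KEEPCNT are Linux-specific; in-kernel DRBD would
have to set the equivalent options on its struct socket):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* enable keepalive probing on an established TCP socket */
    static int enable_keepalive(int fd)
    {
            int on = 1, idle = 10, intvl = 5, cnt = 3;

            if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE,
                           &on, sizeof(on)) < 0)
                    return -1;
            /* probe after 10s idle, every 5s, fail after 3 misses */
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                       &idle, sizeof(idle));
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                       &intvl, sizeof(intvl));
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                       &cnt, sizeof(cnt));
            return 0;
    }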

Thanks,

-Ed

> First tests show that it works!! If your secondary's IO subsystem is
> under pressure and the drbdd-thread sleeps a lot in getblk/
> wait_on_buffer/ll_rw_block, DRBD does not drop the connection due to
> a timeout.
>
> I will release -pre4 early next week....
>
> -Philipp
>
>
> * Guzovsky, Eduard <EGuzovsky@example.com> [011013 00:12]:
> > As far as I understand the code, the Primary DRBD node will
> > complete a Disk Write even if "drbd_send_data()" fails to forward
> > the request to the Secondary DRBD node. Suppose this happened not
> > because the Secondary node died, but because it was just
> > temporarily overloaded with traffic. If shortly after that the
> > Primary node dies and Heartbeat initiates a failover, the Disk
> > Write operation will be lost while the client that originated the
> > Write believes it completed successfully.
> >
> > I think when DRBD is used with protocol C and Heartbeat, it should
> > not complete a Disk Write until either both nodes have completed
> > the Write or Heartbeat has notified DRBD that the Secondary node
> > went down. That would require reestablishing TCP connections and
> > retransmitting requests in case of send failures.
> >
> > Thanks,
> >
> > -Ed
> >
> >
>
>
> _______________________________________________
> DRBD-devel mailing list
> DRBD-devel@example.com
> https://lists.sourceforge.net/lists/listinfo/drbd-devel
>
Re: Another Heartbeat Interface issue [ In reply to ]
* Lars Marowsky-Bree <lmb@example.com> [011014 11:13]:
> On 2001-10-13T22:04:49,
> Philipp Reisner <philipp.reisner@example.com> said:
>
> > Now there is a second TCP connection, called msock (m stands for meta),
> > it is used for echanging ping packes (and in the future also ack-packets
> > will go over the msock (as soon as we have GFS support) ).
>
> Hi Philipp, without having looked at the code ;), is using TCP/IP for
> "heartbeats" a good choice, given all the potential stalls etc. that
> TCP/IP flow control might incur?
>
> Maybe a better choice would be to modularize this; what you are
> essentially building is a "cluster membership layer", and heartbeat,
> FailSafe as well as GFS already have one. If DRBD works on top of
> one of these solutions, it makes much sense to use the same cluster
> membership.

Well, if there is such a "cluster membership layer" already, I am
looking forward to using it -- if it fits DRBD's needs. DRBD needs to
have its own view of the cluster to get things like the meta-data
management right...

> (Of course, I am also hoping that this might make it easier one day to
> replicate between more than 2 nodes ;)

Concerning the long-term goals, GFS has higher priority than support
for more than 2 nodes.

-Philipp
Re: Another Heartbeat Interface issue [ In reply to ]
* Guzovsky, Eduard <EGuzovsky@example.com> [011015 19:45]:
>
>
> > -----Original Message-----
> > From: Philipp Reisner [mailto:philipp.reisner@example.com]
> > Sent: Saturday, October 13, 2001 4:05 PM
> > To: Guzovsky, Eduard
> > Cc: drbd-devel@example.com
> > Subject: Re: [DRBD-dev] Another Heartbeat Interface issue
> >
> >
> > I think the solution should be to fix DRBD to not drop the
> > connection when something is temporarily overloaded.
> >
>
> Theoretically there is still the case where the TCP connection
> between the two nodes stalls while heartbeat does not see anything
> wrong with the node. DRBD should keep trying to reconnect and
> retransmit failed requests. A Write request should not be completed
> until it is written to both disks or heartbeat has notified DRBD
> that the secondary node is down. I am not very familiar with the
> heartbeat implementation, but it would be nice if DRBD could give
> heartbeat a hint that something is wrong, so that heartbeat can
> bring the secondary node down and then notify DRBD.

This suggests something like Lars' "common cluster membership layer",
which I will use if possible. -- But I do not feel like stepping
forward to build it myself; DRBD is enough to keep me busy.

> > Actually, I dropped all the postpone-packet related code and
> > replaced it with a new design last week.
> >
> > Now there is a second TCP connection, called msock (m stands for
> > meta). It is used for exchanging ping packets (and in the future
> > ack packets will also go over the msock, as soon as we have GFS
> > support).
> >
> > The asender-thread is responsible for sending/receiving packets
> > via the msock. -- The asender never sleeps without receiving from
> > the socket, in contrast to the drbdd-thread.
> >
> > All the places that used to cause a timeout and drop the
> > connection now simply request the exchange of a ping packet. If
> > the other side fails to answer the ping, the connection is
> > dropped.
> >
>
> Instead of opening another TCP connection for pings, you might consider
> turning on KEEPALIVES on the first TCP connection.
>
> Thanks,
>
> -Ed
>
> > First tests show that it works!! If your secondary's IO subsystem
> > is under pressure and the drbdd-thread sleeps a lot in getblk/
> > wait_on_buffer/ll_rw_block, DRBD does not drop the connection due
> > to a timeout.
> >
> > I will release -pre4 early next week....
> >
> > -Philipp
> >
> >
> > * Guzovsky, Eduard <EGuzovsky@example.com> [011013 00:12]:
> > > As far as I understand the code, the Primary DRBD node will
> > > complete a Disk Write even if "drbd_send_data()" fails to forward
> > > the request to the Secondary DRBD node. Suppose this happened not
> > > because the Secondary node died, but because it was just
> > > temporarily overloaded with traffic. If shortly after that the
> > > Primary node dies and Heartbeat initiates a failover, the Disk
> > > Write operation will be lost while the client that originated the
> > > Write believes it completed successfully.
> > >
> > > I think when DRBD is used with protocol C and Heartbeat, it
> > > should not complete a Disk Write until either both nodes have
> > > completed the Write or Heartbeat has notified DRBD that the
> > > Secondary node went down. That would require reestablishing TCP
> > > connections and retransmitting requests in case of send failures.
> > >
> > > Thanks,
> > >
> > > -Ed
> > >
> > >
> >
> >
> > _______________________________________________
> > DRBD-devel mailing list
> > DRBD-devel@example.com
> > https://lists.sourceforge.net/lists/listinfo/drbd-devel
> >
Re: Another Heartbeat Interface issue [ In reply to ]
On 2001-10-15T20:25:07,
Philipp Reisner <philipp.reisner@example.com> said:

> Well, if there is such a "cluster membership layer" already, I am
> looking forward to using it -- if it fits DRBD's needs. DRBD needs
> to have its own view of the cluster to get things like the meta-data
> management right...

Yes, exactly. And I bet it saves work in the long run to use the available
options instead of reinventing the wheel.

> > (Of course, I am also hoping that this might make it easier one day to
> > replicate between more than 2 nodes ;)
> GFS has higher priority than more than 2 nodes... concerning the long
> time goals.

Well, GFS also has such a membership layer; using GFS and implementing a
_different_ view of the cluster in parallel to GFS is surely asking for
complex and obscure races to happen; there is a reason why FailSafe and GFS as
part of the same cluster probably need some integration.

Sincerely,
Lars Marowsky-Brée <lmb@example.com>

--
Perfection is our goal, excellence will be tolerated. -- J. Yahl
Re: Another Heartbeat Interface issue [ In reply to ]
On Thu, Oct 18, 2001 at 11:05:56AM +0200, Lars Marowsky-Bree wrote:
> Well, GFS also has such a membership layer; using GFS and implementing a
> _different_ view of the cluster in parallel to GFS is surely asking for
> complex and obscure races to happen; there is a reason why FailSafe and GFS as
> part of the same cluster probably need some integration.

<OpenGFS committer hat on>

For OpenGFS we plan to get rid of all non-core functionality in the
long term. If we can use a different membership layer (e.g. FailSafe)
we will probably do that. I haven't looked into FailSafe yet, as I
have more than enough bugs to fix for the 0.1 release, but my
suggestion for drbd is to go with FailSafe, not (Open-)GFS services.

Christoph

--
Of course it doesn't work. We've performed a software upgrade.