> -----Original Message-----
> From: Philipp Reisner [mailto:philipp.reisner@example.com]
> Sent: Saturday, October 13, 2001 4:05 PM
> To: Guzovsky, Eduard
> Cc: drbd-devel@example.com
> Subject: Re: [DRBD-dev] Another Heartbeat Interface issue
>
>
> I think the solution should be to fix DRBD to not drop the
> connection when something is temporarily overloaded.
>
Theoretically there is still the case where the TCP connection
between the two nodes stalls while heartbeat does not see anything
wrong with the node. DRBD should keep trying to reconnect
and retransmit failed requests. A write request should not
be completed until it has been written to both disks or
heartbeat has notified DRBD that the secondary node is down. I am
not very familiar with the heartbeat implementation. It would
be nice if DRBD could give heartbeat a hint that something
is wrong, so that heartbeat can bring the secondary node
down and then notify DRBD.
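
To make the completion rule concrete, here is a minimal user-space
sketch of the decision, not DRBD code. The class and field names
(PendingWrite, peer_declared_dead and so on) are hypothetical and
only model the rule described above.

```python
from dataclasses import dataclass


@dataclass
class PendingWrite:
    """One write request tracked on the primary (hypothetical model)."""
    local_done: bool = False          # data is on the primary's disk
    peer_acked: bool = False          # secondary confirmed its disk write
    peer_declared_dead: bool = False  # heartbeat reported the secondary down

    def may_complete(self) -> bool:
        # Never report success before the local disk has the data.
        if not self.local_done:
            return False
        # Complete once the peer has it too, or once heartbeat has
        # taken the peer out of the cluster.
        return self.peer_acked or self.peer_declared_dead

    def needs_retransmit(self, connection_up: bool) -> bool:
        # After a reconnect, resend whatever a live peer has not acked.
        return (connection_up
                and not self.peer_acked
                and not self.peer_declared_dead)
```

With this rule a stalled TCP connection only delays the write. It is
never reported as complete while the secondary might still take over
without the data.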
> Actually, last week I dropped all of the postpone-packet-related
> code and replaced it with a new design.
>
> Now there is a second TCP connection, called msock (m stands
> for meta). It is used for exchanging ping packets (and in the
> future, as soon as we have GFS support, ack packets will also
> go over the msock).
>
> The asender-thread is responsible for sending/receiving packets
> via the msock. The asender never sleeps without receiving from
> the socket, in contrast to the drbdd-thread.
>
> All the places that caused a timeout and dropped the connection
> simply request the exchange of a ping packet now. If the other
> side fails to answer the ping, then the connection is dropped.
>
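As I read it, the new timeout handling amounts to something like the
following user-space sketch. The function name and the send_ping /
pong_received callbacks are hypothetical stand-ins for whatever the
asender does on the msock, and the timeout values are made up.

```python
import time
from typing import Callable


def connection_alive(send_ping: Callable[[], None],
                     pong_received: Callable[[], bool],
                     ping_timeout: float = 1.0,
                     poll_interval: float = 0.01) -> bool:
    """Ask the peer for a ping reply instead of dropping the link at once.

    Returns True if the peer answered within ping_timeout seconds and
    False if the caller should now drop the connection.
    """
    send_ping()
    deadline = time.monotonic() + ping_timeout
    while time.monotonic() < deadline:
        if pong_received():
            return True
        time.sleep(poll_interval)
    # One last look in case the reply arrived during the final sleep.
    return pong_received()
```

A slow data path then no longer counts as a dead peer. Only an
unanswered ping does.
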
Instead of opening another TCP connection for pings, you might consider
turning on KEEPALIVES on the first TCP connection.
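
For illustration, this is roughly what it looks like from user space
(a sketch only; the kernel code would set the equivalent options on
its own socket). The function name and the default numbers are my
assumptions. Note that the stock Linux keepalive idle time is two
hours, so the per-socket tuning options matter here.

```python
import socket


def enable_keepalive(sock: socket.socket,
                     idle: int = 10,
                     interval: int = 5,
                     probes: int = 3) -> None:
    """Turn on TCP keepalive probes for an existing socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options below are platform-specific, so each one
    # is only set if this platform exposes it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of silence before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between unanswered probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is reported dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```
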
Thanks,
-Ed
> First tests show that it works!! If your secondary's IO subsystem is
> under pressure, and the drbdd-thread sleeps a lot in getblk/
> wait_on_buffer/ll_rw_block, DRBD does not drop the connection due to
> a timeout.
>
> I will release -pre4 early next week....
>
> -Philipp
>
>
> * Guzovsky, Eduard <EGuzovsky@example.com> [011013 00:12]:
> > As far as I understand the code, the Primary DRBD node will complete
> > a Disk Write even if "drbd_send_data()" fails to forward the request
> > to the Secondary DRBD node. Let's say this situation happened
> > not because the Secondary node died, but because it was just
> > temporarily overloaded with traffic. If shortly after that the
> > Primary node dies and Heartbeat initiates a failover, the Disk Write
> > operation will be lost while the client originating this Write
> > would think that it has successfully completed.
> >
> > I think when DRBD is used with protocol C and Heartbeat,
> > it should not complete a Disk Write until either both nodes
> > have completed this Write or Heartbeat has notified DRBD that the
> > Secondary node went down. That would require reestablishing
> > TCP connections and retransmitting requests in case of
> > send failures.
> >
> > Thanks,
> >
> > -Ed
> >
> >