Mailing List Archive

drbdsetup wait_sync
Hello,
I think something does not work as described in the drbd docs.
Look:

1#node 1 : cat /proc/drbd
version: 0.6.1-pre4 (api:58/proto:59)
0: cs:SyncingAll st:Secondary/Primary ns:0 nr:8204 dw:8204 dr:0 pe:0 ua:2
1: cs:SyncingAll st:Secondary/Primary ns:0 nr:7580 dw:7580 dr:0 pe:0 ua:4

2#node 1 : drbdsetup /dev/nb1 wait_sync -t 10 (timeout: 10 s)

3#node 2 : reset

I waited 10 s, 20 s, ... but the wait_sync command (command 2) never returns,
although the docs say it should!
Bye..

--
Jean-Yves BOUET
EADS Defence and Security Networks
jean-yves.bouet@example.com
01 34 60 86 36
Re: drbdsetup wait_sync [ In reply to ]
--snip--
wait_sync -t connect_timeout

Returns as soon as the device leaves any synchronisation state and returns
into connected state.

This command will fail if the device stays for connect_timeout seconds in
unconnected state. The default value is 8 seconds. If it is set to 0, the
command will forever wait for a connection.
--snap--

Since your devices are already connected, it will return as soon as
synchronisation is done.

-Philipp

Re: drbdsetup wait_sync [ In reply to ]
Philipp Reisner wrote:

> --snip--
> wait_sync -t connect_timeout
>
> Returns as soon as the device leaves any synchronisation state and returns
> into connected state.
>
> This command will fail if the device stays for connect_timeout seconds in
> unconnected state. The default value is 8 seconds. If it is set to 0, the
> command will forever wait for a connection.
> --snap--
>
> Since your devices are already connected, it will return as soon as
> synchronisation is done.
>

Hello Philipp,
What if the primary fails while the secondary is syncing (in the drbd script)? Does the
secondary remain blocked forever, or is it possible to add a timeout?
Bye.

--
Jean-Yves BOUET
EADS Defence and Security Networks
jean-yves.bouet@example.com
01 34 60 86 36
Re: drbdsetup wait_sync [ In reply to ]
* Jean-Yves Bouet - 78636 <jean-yves.bouet@example.com> [011023 09:44]:
> Philipp Reisner wrote:
>
> > --snip--
> > wait_sync -t connect_timeout
> >
> > Returns as soon as the device leaves any synchronisation state and returns
> > into connected state.
> >
> > This command will fail if the device stays for connect_timeout seconds in
> > unconnected state. The default value is 8 seconds. If it is set to 0, the
> > command will forever wait for a connection.
> > --snap--
> >
> > Since your devices are already connected, it will return as soon as
> > synchronisation is done.
> >
>
> Hello Philipp,
> What if the primary fails while the secondary is syncing (in the drbd script)? Does the
> secondary remain blocked forever, or is it possible to add a timeout?
> Bye.

OK, bug in the documentation. It should read: "Returns as soon as the device
leaves any synchronisation state".

Thus, if the device goes into WFConnection state, all wait_sync commands
sleeping on that device will return immediately.

-Philipp
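
If wait_sync returns as soon as the device leaves a synchronisation state, the return alone does not say whether the sync finished or the peer went away. One way to tell the two apart is to look at the cs: field in /proc/drbd afterwards. The following is only a minimal sketch in C; the function names drbd_cstate and drbd_is_syncing are made up for this illustration and are not part of DRBD, and the line format is assumed to match the /proc/drbd output quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of the "cs:" field of one /proc/drbd device line into
 * out (at most outlen-1 characters, always NUL-terminated).
 * Returns 0 on success, -1 if the line has no "cs:" field.
 * (Hypothetical helper, not a DRBD function.)
 */
int drbd_cstate(const char *line, char *out, size_t outlen)
{
    const char *p;
    size_t n;

    assert(line != NULL && out != NULL && outlen > 0);

    p = strstr(line, "cs:");
    if (p == NULL)
        return -1;
    p += 3;                      /* skip the "cs:" prefix */
    n = strcspn(p, " \t\n");     /* the field ends at the next whitespace */
    if (n >= outlen)
        n = outlen - 1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}

/* Return 1 if the state name is one of the two sync states named in the thread. */
int drbd_is_syncing(const char *cstate)
{
    return strcmp(cstate, "SyncingAll") == 0 ||
           strcmp(cstate, "SyncingQuick") == 0;
}
```

A caller would read /proc/drbd line by line, pass each device line to drbd_cstate, and treat WFConnection after a former sync state as "peer lost" rather than "sync done".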
Re: drbdsetup wait_sync [ In reply to ]
> OK, bug in the documentation. It should read: "Returns as soon as the device
> leaves any synchronisation state".
>
> Thus, if the device goes into WFConnection state, all wait_sync commands
> sleeping on that device will return immediately.
>
> -Philipp

Hi,
This is not what I actually observe:
#node1: cat /proc/drbd
version: 0.6.1-pre5 (api:58/proto:60)
0: cs:SyncingAll st:Secondary/Primary ns:0 nr:46104 dw:46104 dr:0 pe:0 ua:0
1: cs:SyncingAll st:Secondary/Primary ns:0 nr:45124 dw:45124 dr:0 pe:0 ua:0

#node2: reset

#node1: cat /proc/drbd
version: 0.6.1-pre5 (api:58/proto:60)

0: cs:WFConnection st:Secondary/Unknown ns:0 nr:46104 dw:46104 dr:0 pe:0 ua:0
-->State WFConnection
1: cs:WFConnection st:Secondary/Unknown ns:0 nr:45124 dw:45124 dr:0 pe:0 ua:0
-->State WFConnection

#node1: ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 1 14:11 ? 00:00:06 init
root 2 1 0 14:11 ? 00:00:00 [keventd]
root 3 1 0 14:11 ? 00:00:00 [kswapd]
root 4 1 0 14:11 ? 00:00:00 [kreclaimd]
root 5 1 0 14:11 ? 00:00:00 [bdflush]
root 6 1 0 14:11 ? 00:00:00 [kupdate]
root 8 1 0 14:12 ? 00:00:00 [kreiserfsd]
bin 50 1 0 14:12 ? 00:00:00 /sbin/portmap
root 81 1 0 14:12 ? 00:00:00 /bin/sh /etc/init.d/rc 3
root 85 1 0 14:12 ? 00:00:00 /sbin/syslogd
root 87 1 0 14:12 ? 00:00:00 /sbin/klogd
root 92 1 0 14:12 ? 00:00:00 /usr/sbin/inetd
root 95 81 0 14:12 ? 00:00:00 /usr/bin/perl -w /etc/rc.d/rc3.d/S30drbd start
root 100 1 0 14:12 ? 00:00:02 [drbdd_0]
root 103 1 0 14:12 ? 00:00:02 [drbdd_1]
root 105 100 0 14:12 ? 00:00:01 [drbd_asender_0 <defunct>]
root 106 103 0 14:12 ? 00:00:00 [drbd_asender_1 <defunct>]
root 107 95 0 14:12 ? 00:00:00 /usr/bin/perl -w /etc/rc.d/rc3.d/S30drbd start
root 108 95 0 14:12 ? 00:00:00 /usr/bin/perl -w /etc/rc.d/rc3.d/S30drbd start
root 110 95 0 14:12 ? 00:00:00 /usr/bin/perl -w /etc/rc.d/rc3.d/S30drbd start
root 112 110 0 14:12 ? 00:00:00 /usr/sbin/drbdsetup /dev/nb1 wait_sync -->Never returns..
root 113 108 0 14:12 ? 00:00:00 /usr/sbin/drbdsetup /dev/nb0 wait_sync -->Never returns..

5 minutes later:
#node1: ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 1 14:11 ? 00:00:06 init
root 2 1 0 14:11 ? 00:00:00 [keventd]
root 3 1 0 14:11 ? 00:00:00 [kswapd]
root 4 1 0 14:11 ? 00:00:00 [kreclaimd]
root 5 1 0 14:11 ? 00:00:00 [bdflush]
root 6 1 0 14:11 ? 00:00:00 [kupdate]
root 8 1 0 14:12 ? 00:00:00 [kreiserfsd]
bin 50 1 0 14:12 ? 00:00:00 /sbin/portmap
root 81 1 0 14:12 ? 00:00:00 /bin/sh /etc/init.d/rc 3
root 85 1 0 14:12 ? 00:00:00 /sbin/syslogd
root 87 1 0 14:12 ? 00:00:00 /sbin/klogd
root 92 1 0 14:12 ? 00:00:00 /usr/sbin/inetd
root 95 81 0 14:12 ? 00:00:00 /usr/bin/perl -w /etc/rc.d/rc3.d/S30drbd start
root 100 1 0 14:12 ? 00:00:02 [drbdd_0]
root 103 1 0 14:12 ? 00:00:02 [drbdd_1]
root 105 100 0 14:12 ? 00:00:01 [drbd_asender_0 <defunct>]
root 106 103 0 14:12 ? 00:00:00 [drbd_asender_1 <defunct>]
root 107 95 0 14:12 ? 00:00:00 /usr/bin/perl -w /etc/rc.d/rc3.d/S30drbd start
root 108 95 0 14:12 ? 00:00:00 /usr/bin/perl -w /etc/rc.d/rc3.d/S30drbd start
root 110 95 0 14:12 ? 00:00:00 /usr/bin/perl -w /etc/rc.d/rc3.d/S30drbd start
root 112 110 0 14:12 ? 00:00:00 /usr/sbin/drbdsetup /dev/nb1 wait_sync -->Always alive..
root 113 108 0 14:12 ? 00:00:00 /usr/sbin/drbdsetup /dev/nb0 wait_sync -->Always alive


Bye...

--
Jean-Yves BOUET
EADS Defence and Security Networks
jean-yves.bouet@example.com
01 34 60 86 36
Re: drbdsetup wait_sync [ In reply to ]
OK, a quick glance at the source unveils the mystery: drbd_fs.c:421

The timeout only limits the time before an initial connection. As soon as
there has been a connection, the timeout is MAX_SCHEDULE_TIMEOUT :)

-Philipp

Re: drbdsetup wait_sync [ In reply to ]
* Jean-Yves Bouet - 78636 <jean-yves.bouet@example.com> [011109 16:43]:
> Hello,
> "drbdsetup wait_sync" on a secondary that is being synced stays blocked when
> the primary goes down, even with the timeout parameter.
> I propose changing the code of drbd_ioctl (drbd_fs.c) so that this
> works.
> Instead of:
> ---------------------------------------------------
> while (drbd_conf[minor].cstate >= Unconnected &&
>        drbd_conf[minor].cstate != Connected &&
>        time > 0 ) {
>
>         time = interruptible_sleep_on_timeout(&drbd_conf[minor].cstate_wait,
>                                               time);
>
>         if (drbd_conf[minor].cstate == SyncingQuick ||
>             drbd_conf[minor].cstate == SyncingAll )
>                 time=MAX_SCHEDULE_TIMEOUT;
>
>         if(signal_pending(current)) return -EINTR;
> }
> ---------------------------------------------------
> I propose:
> ---------------------------------------------------
> if( drbd_conf[minor].cstate >= SyncingAll )
> {
>         while(time>0 && drbd_conf[minor].cstate!=Connected)
>         {
>                 //wait for a state change:
>                 interruptible_sleep_on(&drbd_conf[minor].cstate_wait);
>                 if(signal_pending(current)) return -EINTR;
>                 if(drbd_conf[minor].cstate == WFConnection)
>                 {
>                         time=interruptible_sleep_on_timeout(&drbd_conf[minor].cstate_wait,
>                                                             time);
>                         //time=0 if primary isn't back
>                         //time>0 if primary is back, and cstate=WFReportParams
>                         if(signal_pending(current)) return -EINTR;
>                 }
>         }
> }
> ---------------------------------------------------
>
> Give me your opinion!

1) The problem with your proposed code is that it will not wait at all
if synchronisation is not already running when the ioctl() is called.

2) The current code should do what you described. If the connection is
lost, cstate becomes Timeout/BrokenPipe/Unconnected (for a very short time)
and then WFConnection.

Hmm, yes, now I understand. We need to move the
if(...) time=MAX_SCHEDULE_TIMEOUT; up, before the sleep().

Because if the other server goes away before the timeout has expired
(and sync was already running before the ioctl() was called),
then time is not set to MAX_SCHEDULE_TIMEOUT.

It should (and will) look like this:

while (drbd_conf[minor].cstate >= Unconnected &&
       drbd_conf[minor].cstate != Connected &&
       time > 0 ) {

        if (drbd_conf[minor].cstate == SyncingQuick ||
            drbd_conf[minor].cstate == SyncingAll )
                time=MAX_SCHEDULE_TIMEOUT;

        time = interruptible_sleep_on_timeout(&drbd_conf[minor].cstate_wait, time);

        if(signal_pending(current)) return -EINTR;
}

-philipp
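
The effect of moving the check can be reproduced outside the kernel with a small simulation. The sketch below is not DRBD code: the enum ordering is an assumption reconstructed from the comparisons quoted in this thread, sim_sleep is a made-up stand-in for interruptible_sleep_on_timeout() that works in whole seconds instead of jiffies, signals are ignored, and the short-lived intermediate states are skipped. The scenario is the one described above: sync is already running when the ioctl() is called, and the primary disappears at a given second.

```c
#include <assert.h>
#include <limits.h>

/* Connection states. The ordering is an assumption, not taken from the headers. */
enum cstate { Unconfigured, StandAlone, Unconnected, Timeout, BrokenPipe,
              WFConnection, WFReportParams, SyncingAll, SyncingQuick, Connected };

#define MAX_SCHEDULE_TIMEOUT LONG_MAX
#define BLOCKED (-1L)            /* simulation result: the loop would sleep forever */

/* One scripted state change: at simulated second `at`, cstate becomes `to`. */
struct event { long at; enum cstate to; };

struct sim {
    enum cstate cstate;
    long now;                    /* simulated clock in seconds */
    const struct event *ev;      /* pending events, sorted by time */
    int nev;
};

/* Sleep until the next state change or until the timeout expires.
 * Returns the remaining time, 0 on expiry, BLOCKED if nothing wakes us. */
static long sim_sleep(struct sim *s, long timeout)
{
    if (s->nev > 0 && (timeout == MAX_SCHEDULE_TIMEOUT ||
                       s->ev->at - s->now < timeout)) {
        long elapsed = s->ev->at - s->now;
        s->now = s->ev->at;
        s->cstate = s->ev->to;
        s->ev++;
        s->nev--;
        return timeout == MAX_SCHEDULE_TIMEOUT ? timeout : timeout - elapsed;
    }
    if (timeout == MAX_SCHEDULE_TIMEOUT)
        return BLOCKED;
    s->now += timeout;
    return 0;
}

static int is_syncing(enum cstate c)
{
    return c == SyncingQuick || c == SyncingAll;
}

/* The wait loop; check_first selects the variant with the check before the
 * sleep. Returns the second at which the loop ends, or BLOCKED. */
long wait_sync_sim(struct sim s, long time, int check_first)
{
    assert(time >= 0);
    while (s.cstate >= Unconnected && s.cstate != Connected && time > 0) {
        if (check_first && is_syncing(s.cstate))
            time = MAX_SCHEDULE_TIMEOUT;
        time = sim_sleep(&s, time);
        if (time == BLOCKED)
            return BLOCKED;
        if (!check_first && is_syncing(s.cstate))
            time = MAX_SCHEDULE_TIMEOUT;
    }
    return s.now;
}

/* Sync already running at second 0; the primary disappears at `dies_at`. */
long scenario(long dies_at, long timeout, int check_first)
{
    struct event ev[] = { { dies_at, WFConnection } };
    struct sim s = { SyncingAll, 0, ev, 1 };
    return wait_sync_sim(s, timeout, check_first);
}
```

Under these assumptions, with the check after the sleep the outcome depends on timing: scenario(3, 10, 0) ends at second 10, while scenario(30, 10, 0) blocks. With the check before the sleep, both scenario(3, 10, 1) and scenario(30, 10, 1) block, so the loop behaves the same regardless of when the primary goes away.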
Re: drbdsetup wait_sync [ In reply to ]
Philipp Reisner wrote:

> 1) The problem with your proposed code is that it will not wait at all
> if synchronisation is not already running when the ioctl() is called.
>

OK, we can add a timeout before the if:

time1=time;
while( drbd_conf[minor].cstate < SyncingAll && time1 > 0)
{
        time1 = interruptible_sleep_on_timeout(&drbd_conf[minor].cstate_wait,
                                               time1);
        if(signal_pending(current)) return -EINTR;
}

if( drbd_conf[minor].cstate >= SyncingAll )
{
        while(time1>0 && drbd_conf[minor].cstate!=Connected)
        {
                //wait for a state change:
                interruptible_sleep_on(&drbd_conf[minor].cstate_wait);
                if(signal_pending(current)) return -EINTR;
                if(drbd_conf[minor].cstate == WFConnection)
                {
                        time1=interruptible_sleep_on_timeout(&drbd_conf[minor].cstate_wait,time);
                        //time1=0 if primary isn't back
                        //time1>0 if primary is back, and cstate=WFReportParams
                        if(signal_pending(current)) return -EINTR;
                }
        }
}

>
> 2) The current code should do what you described. If the connection is
> lost, cstate becomes Timeout/BrokenPipe/Unconnected (for a very short time)
> and then WFConnection.
>
> Hmm, yes, now I understand. We need to move the
> if(...) time=MAX_SCHEDULE_TIMEOUT; up, before the sleep().
>
> Because if the other server goes away before the timeout has expired
> (and sync was already running before the ioctl() was called),
> then time is not set to MAX_SCHEDULE_TIMEOUT.
>
> It should (and will) look like this:
>
> while (drbd_conf[minor].cstate >= Unconnected &&
>        drbd_conf[minor].cstate != Connected &&
>        time > 0 ) {
>
>         if (drbd_conf[minor].cstate == SyncingQuick ||
>             drbd_conf[minor].cstate == SyncingAll )
>                 time=MAX_SCHEDULE_TIMEOUT;
>
>         time = interruptible_sleep_on_timeout(&drbd_conf[minor].cstate_wait, time);
>
>         if(signal_pending(current)) return -EINTR;
> }
>
> -philipp

I don't like this solution because it does not take into account the case of
a primary shutdown. (In this case I think we should
start a timeout. If the timeout expires, the ioctl should stop waiting.)

Bye,


--
Jean-Yves BOUET
EADS Defence and Security Networks
jean-yves.bouet@example.com
01 34 60 86 36
Re: drbdsetup wait_sync [ In reply to ]
* Jean-Yves Bouet - 78636 <jean-yves.bouet@example.com> [011112 09:40]:
> I don't like this solution because it does not take into account the case of
> a primary shutdown. (In this case I think we should
> start a timeout. If the timeout expires, the ioctl should stop waiting.)

If the primary goes down during sync, the data on your secondary
is not consistent. The best thing the secondary can do in this situation
is to do no harm, to do nothing, just sit there and wait.

-Philipp
Re: drbdsetup wait_sync [ In reply to ]
> If the primary goes down during sync, the data on your secondary
> is not consistent. The best thing the secondary can do in this situation
> is to do no harm, to do nothing, just sit there and wait.
>
> -Philipp

I agree with you, but so that our cluster does not block (our system must be
as automatic as possible), we plan to reformat the inconsistent partition when
synchronisation has not finished and a long timeout (many minutes) has
expired. For that, I think it would be a good idea to test the return value of
"drbdsetup wait_sync timeout" in the drbd script.
Bye,


--
Jean-Yves BOUET
EADS Defence and Security Networks
jean-yves.bouet@example.com
01 34 60 86 36
Re: drbdsetup wait_sync [ In reply to ]
* Jean-Yves Bouet - 78636 <jean-yves.bouet@example.com> [011113 08:56]:
> > If the primary goes down during sync, the data on your secondary
> > is not consistent. The best thing the secondary can do in this situation
> > is to do no harm, to do nothing, just sit there and wait.
> >
> > -Philipp
>
> I agree with you, but so that our cluster does not block (our system must be
> as automatic as possible), we plan to reformat the inconsistent partition when
> synchronisation has not finished and a long timeout (many minutes) has
> expired. For that, I think it would be a good idea to test the return value of
> "drbdsetup wait_sync timeout" in the drbd script.
> Bye,

Well, but this is specific to your clusters and not a generic
DRBD issue. Patch your drbd, or work around it by sending a signal
to the sleeping drbdsetup command.

-Philipp
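
The signal workaround could be sketched as a small wrapper that starts the command, waits a bounded time, and then sends SIGTERM. This is only an illustration under stated assumptions: run_with_timeout and its return codes are invented for this example, it polls once per second for simplicity, and it relies on the sleeping ioctl() being interruptible, as the signal_pending() check in the loop quoted earlier suggests.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map a wait status to our return convention: the exit code if the child
 * exited normally, -2 if it was ended by a signal. */
static int decode_status(int status)
{
    return WIFEXITED(status) ? WEXITSTATUS(status) : -2;
}

/*
 * Run the command argv[0] with arguments argv (NULL-terminated) and send
 * it SIGTERM if it is still running after timeout_sec seconds.
 * Returns the child's exit code, -1 on fork/wait failure, or -2 if the
 * child was ended by a signal (including our SIGTERM).
 * (Hypothetical helper, not part of DRBD.)
 */
int run_with_timeout(char *const argv[], unsigned timeout_sec)
{
    int status;
    unsigned waited;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);              /* exec failed */
    }

    /* Poll once per second until the child exits or the timeout is reached. */
    for (waited = 0; ; waited++) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return decode_status(status);
        if (r < 0)
            return -1;
        if (waited >= timeout_sec)
            break;
        sleep(1);
    }

    /* Timeout reached: interrupt the sleeping command and reap it. */
    kill(pid, SIGTERM);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return decode_status(status);
}
```

A startup script replacement could call it with the command line seen in the ps output above, for example an argv of { "/usr/sbin/drbdsetup", "/dev/nb0", "wait_sync", NULL } and a timeout of several minutes, and treat a return value of -2 as "sync did not finish in time".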