Mailing List Archive

why SyncingAll?
Philipp,

Why do you insist on SyncingAll if the following situation happens:

m1 is primary and has /dev/nb0 volume mounted
m2 is secondary, connected and replicating
m1 goes down and secondary takes over and becomes primary
m2 comes back online and SyncingAll starts

Huh?

The data is consistent on m2... only a sync of the data that was missed by
m1 should be necessary. What's going on here?

Dan


--
Dan Yocum
Sloan Digital Sky Survey, Fermilab 630.840.6509
yocum@example.com, http://www.sdss.org
SDSS. Mapping the Universe.
Re: why SyncingAll?
"Bene, Martin" wrote:
>
> Hi Dan,
>
> > Why do you insist on SyncingAll if the following situation happens:
> >
> > m1 is primary and has /dev/nb0 volume mounted
> > m2 is secondary, connected and replicating
>
> I assume the "is replicating" just means m2 is in a normal, connected state,
> right? (not "sync quick" or "sync all"):


Yep. Connected normally, not syncing or anything fancy.

>
> > m1 goes down and secondary takes over and becomes primary
> > m2 comes back online and SyncingAll starts
>
> typo here? shouldn't that be "m1 comes back online"? I'll assume yes.

Doh! Early on a Monday AM - yes, that should say m1.

>
> > The data is consistent on m2... only a sync of the data that
> > was missed by m1 should be necessary. What's going on here?
>
> The question is HOW m1 went down. If it was a clean shutdown (i.e. nb0 got
> switched to secondary before losing the connection), a quick sync is
> sufficient.
>
> If m1 went offline straight from primary status, a full sync is necessary:
> it's unknown whether all the data m1 wrote to its local disk also got sent
> out on the network, so while m2's data is consistent, it might not be
> completely up to date. To get a consistent state on m1, a full sync
> (SyncingAll) is required.


Even if it wasn't a clean shutdown, only a quick sync should be necessary.
If a file was in the process of being written to disk when m1 went down,
the file is incomplete on m2, too. Only the changes made on m2 need to be
synced to m1. Of course, if m1 went down because the disk died and was
replaced, a full sync would be necessary... hmmmmm. But in that case there
would have to be operator intervention; no way around it. I'm just talking
about the scenario where m1 got rebooted 'cause the power was interrupted or
the cleaning lady pulled the plug or I rebooted it.
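
To make the two positions concrete, here is a toy model of the crash
scenario. It is only a sketch, not DRBD code: the block lists, the
changed_on_m2 set and every function name are made up for illustration. It
assumes, as described in the quoted reply, that m1 can write a block to its
local disk and go down before that block reaches m2.

```python
def quick_sync(source, target, changed_on_source):
    """Copy only the blocks the source changed while the peer was away."""
    for i in changed_on_source:
        target[i] = source[i]


def full_sync(source, target):
    """Copy every block from source to target."""
    target[:] = source


def simulate_crash(resync):
    """Replay the scenario and return (m1, m2) after m1 has rejoined.

    resync is "quick" or "full" (hypothetical labels for this sketch).
    """
    # Both nodes start connected and identical.
    m1 = ["a", "b", "c", "d"]
    m2 = list(m1)

    # m1 (primary) writes block 2 locally, then loses power before the
    # write reaches m2. Assumption: such a window exists.
    m1[2] = "X"

    # m2 takes over as primary, changes block 0 and remembers that it did.
    changed_on_m2 = set()
    m2[0] = "Y"
    changed_on_m2.add(0)

    # m1 comes back as secondary and is resynced from m2.
    if resync == "quick":
        quick_sync(m2, m1, changed_on_m2)
    else:
        full_sync(m2, m1)
    return m1, m2


quick_m1, quick_m2 = simulate_crash("quick")
full_m1, full_m2 = simulate_crash("full")
print(quick_m1 == quick_m2)  # block 2 on m1 was never overwritten
print(full_m1 == full_m2)
```

In this model m2 has no record of block 2, so a quick sync driven only by
m2's changes leaves m1 with a block that m2 never saw. A full sync removes
the difference at the cost of copying everything.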

Cheers,
Dan


--
Dan Yocum
Sloan Digital Sky Survey, Fermilab 630.840.6509
yocum@example.com, http://www.sdss.org
SDSS. Mapping the Universe.
Re: why SyncingAll?
* Dan Yocum <yocum@example.com> [011112 16:49]:
> Philipp,
>
> Why do you insist on SyncingAll if the following situation happens:
>
> m1 is primary and has /dev/nb0 volume mounted
> m2 is secondary, connected and replicating
> m1 goes down and secondary takes over and becomes primary
> m2 comes back online and SyncingAll starts
>
> Huh?
>
> The data is consistent on m2... only a sync of the data that was missed by
> m1 should be necessary. What's going on here?
>
> Dan

Sorry about repeating myself.

Read last paragraph page 8 and first example on page 9 of
http://www.complang.tuwien.ac.at/reisner/drbd/publications/drbd_paper_for_NLUUG_2001.pdf

-Philipp
RE: why SyncingAll?
> Read last paragraph page 8 and first example on page 9 of
> http://www.complang.tuwien.ac.at/reisner/drbd/publications/drbd_paper_for_NLUUG_2001.pdf

In the example one block is inconsistent causing a full sync. Why do we
syncAll instead of hashing and comparing a bunch of blocks on the nodes in
order to speed things up by avoiding unnecessary write operations?
Re: why SyncingAll?
* Patrik Jarnefelt <patrik.jarnefelt@example.com> [011114 09:45]:
> > Read last paragraph page 8 and first example on page 9 of
> > http://www.complang.tuwien.ac.at/reisner/drbd/publications/drbd_paper_for_NLUUG_2001.pdf
>
> In the example one block is inconsistent causing a full sync. Why do we
> syncAll instead of hashing and comparing a bunch of blocks on the nodes in
> order to speed things up by avoiding unnecessary write operations?

Because nobody has written this yet. Everybody is welcome to contribute to
DRBD; it is open source (and GPLed).

-Philipp
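
The block-comparison idea raised in this thread could look roughly like the
sketch below. It is not DRBD code and does not describe how DRBD behaves.
The list-of-bytes "devices", the function names and the choice of SHA-256
are all assumptions made for illustration. On real hardware each node would
hash its own blocks and only the digests would cross the network; here both
devices live in one process to keep the example short.

```python
import hashlib


def block_digest(block: bytes) -> bytes:
    """Digest of one block (SHA-256 chosen arbitrarily for this sketch)."""
    return hashlib.sha256(block).digest()


def differing_blocks(source, target):
    """Return the indices of blocks whose digests differ."""
    if len(source) != len(target):
        raise ValueError("devices must have the same number of blocks")
    return [
        i
        for i, (s, t) in enumerate(zip(source, target))
        if block_digest(s) != block_digest(t)
    ]


def checksum_sync(source, target):
    """Overwrite only the differing blocks; return how many were written."""
    to_copy = differing_blocks(source, target)
    for i in to_copy:
        target[i] = source[i]
    return len(to_copy)


# m2 is the current primary; m1 rejoins with one stale and one stray block.
m2 = [b"Y", b"b", b"c", b"d"]
m1 = [b"a", b"b", b"X", b"d"]
written = checksum_sync(m2, m1)
print(written, m1 == m2)
```

Every block is still read and hashed on both sides, so this saves writes and
network traffic rather than disk reads.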