Mailing List Archive

Down sync
Hi,

My configuration is this:

A) Node drbd01: primary all

B) Node drbd02: primary all

C) Node drbd03: secondary all, diskless, intended as a quorum witness.

Initially everything runs correctly, but after several hours the sync between the
DRBD nodes is lost, even though the network connections (ping) are fine.

Sometimes the witness (node drbd03) appears as "Connecting" to drbd01, other
times it is node drbd02, and so on. My OS is RHEL 7; firewalld is stopped and
disabled, and SELinux is disabled as well...

What could be happening?


[root@drbd01 drbd.d]# uname -a
> Linux drbd01 3.10.0-1127.el7.x86_64 #1 SMP Tue Mar 31 23:36:51 UTC 2020
> x86_64 x86_64 x86_64 GNU/Linux
> [root@drbd01 drbd.d]#
> [root@drbd01 drbd.d]# cat global_common.conf
> global {
> usage-count no;
> udev-always-use-vnr;
> }
> common {
> handlers {
> }
> startup {
> }
> options {
> quorum majority;
> # on-no-quorum io-error;
> # quorum-minimum-redundancy 1;
> }
> disk {
> }
> net {
> verify-alg crc32c;
> }
> }
> [root@drbd01 drbd.d]# cat *.res |more
> resource DATA01 {
> volume 1 {
> disk /dev/sdf;
> device /dev/drbd4;
> meta-disk internal;
> }
> on drbd01 {
> address 10.10.10.1:7791;
> node-id 0;
> }
> on drbd02 {
> address 10.10.10.2:7791;
> node-id 1;
> }
> on drbd03 {
> address 10.10.10.3:7791;
> node-id 2;
> volume 1 {
> disk none;
> }
>
>
> }
> connection-mesh {
> hosts drbd01 drbd02 drbd03;
> net {
> protocol C;
> allow-two-primaries yes;
> }
> }
>
> }
>
> resource DATA02 {
> volume 1 {
> disk /dev/sdg;
> device /dev/drbd5;
> meta-disk internal;
> }
> on drbd01 {
> address 10.10.10.1:7792;
> node-id 0;
> }
> on drbd02 {
> address 10.10.10.2:7792;
> node-id 1;
> }
> on drbd03 {
> address 10.10.10.3:7792;
> node-id 2;
> volume 1 {
> disk none;
> }
>
>
> }
> connection-mesh {
> hosts drbd01 drbd02 drbd03;
> net {
> protocol C;
> allow-two-primaries yes;
> }
> }
>
> }
>
> resource DATA03 {
> volume 1 {
> disk /dev/sdh;
> device /dev/drbd6;
> meta-disk internal;
> }
> on drbd01 {
> address 10.10.10.1:7793;
> node-id 0;
> }
> on drbd02 {
> address 10.10.10.2:7793;
> node-id 1;
> }
> on drbd03 {
> address 10.10.10.3:7793;
> node-id 2;
> volume 1 {
> disk none;
> }
>
>
> }
> connection-mesh {
> hosts drbd01 drbd02 drbd03;
> net {
> protocol C;
> allow-two-primaries yes;
> }
> }
>
> }
>
> resource GIMR01 {
> volume 1 {
> disk /dev/sde;
> device /dev/drbd3;
> meta-disk internal;
> }
> on drbd01 {
> address 10.10.10.1:7790;
> node-id 0;
> }
> on drbd02 {
> address 10.10.10.2:7790;
> node-id 1;
> }
> on drbd03 {
> address 10.10.10.3:7790;
> node-id 2;
> volume 1 {
> disk none;
> }
>
>
> }
> connection-mesh {
> hosts drbd01 drbd02 drbd03;
> net {
> protocol C;
> allow-two-primaries yes;
> }
> }
>
> }
> resource MIGRA01 {
> volume 1 {
> disk /dev/sdi;
> device /dev/drbd7;
> meta-disk internal;
> }
> on drbd01 {
> address 10.10.10.1:7794;
> node-id 0;
> }
> on drbd02 {
> address 10.10.10.2:7794;
> node-id 1;
> }
> on drbd03 {
> address 10.10.10.3:7794;
> node-id 2;
> volume 1 {
> disk none;
> }
>
>
> }
> connection-mesh {
> hosts drbd01 drbd02 drbd03;
> net {
> protocol C;
> allow-two-primaries yes;
> }
> }
>
> }
>
> resource MIGRA02 {
> volume 1 {
> disk /dev/sdj;
> device /dev/drbd8;
> meta-disk internal;
> }
> on drbd01 {
> address 10.10.10.1:7795;
> node-id 0;
> }
> on drbd02 {
> address 10.10.10.2:7795;
> node-id 1;
> }
> on drbd03 {
> address 10.10.10.3:7795;
> node-id 2;
> volume 1 {
> disk none;
> }
>
>
> }
> connection-mesh {
> hosts drbd01 drbd02 drbd03;
> net {
> protocol C;
> allow-two-primaries yes;
> }
> }
>
> }
>
> resource MIGRA03 {
> volume 1 {
> disk /dev/sdk;
> device /dev/drbd9;
> meta-disk internal;
> }
> on drbd01 {
> address 10.10.10.1:7796;
> node-id 0;
> }
> on drbd02 {
> address 10.10.10.2:7796;
> node-id 1;
> }
> on drbd03 {
> address 10.10.10.3:7796;
> node-id 2;
> volume 1 {
> disk none;
> }
>
>
> }
> connection-mesh {
> hosts drbd01 drbd02 drbd03;
> net {
> protocol C;
> allow-two-primaries yes;
> }
> }
>
> }
>
> resource MIGRA04 {
> volume 1 {
> disk /dev/sdl;
> device /dev/drbd10;
> meta-disk internal;
> }
> on drbd01 {
> address 10.10.10.1:7797;
> node-id 0;
> }
> on drbd02 {
> address 10.10.10.2:7797;
> node-id 1;
> }
> on drbd03 {
> address 10.10.10.3:7797;
> node-id 2;
> volume 1 {
> disk none;
> }
>
>
> }
> connection-mesh {
> hosts drbd01 drbd02 drbd03;
> net {
> protocol C;
> allow-two-primaries yes;
> }
> }
>
> }
>
> resource OCR01 {
> volume 1 {
> disk /dev/sdb;
> device /dev/drbd0;
> meta-disk internal;
> }
> on drbd01 {
> address 10.10.10.1:7787;
> node-id 0;
> }
> on drbd02 {
> address 10.10.10.2:7787;
> node-id 1;
> }
> on drbd03 {
> address 10.10.10.3:7787;
> node-id 2;
> volume 1 {
> disk none;
> }
>
>
> }
> connection-mesh {
> hosts drbd01 drbd02 drbd03;
> net {
> protocol C;
> allow-two-primaries yes;
> }
> }
>
> }
> resource OCR02 {
> volume 1 {
> disk /dev/sdc;
> device /dev/drbd1;
> meta-disk internal;
> }
> on drbd01 {
> address 10.10.10.1:7788;
> node-id 0;
> }
> on drbd02 {
> address 10.10.10.2:7788;
> node-id 1;
> }
> on drbd03 {
> address 10.10.10.3:7788;
> node-id 2;
> volume 1 {
> disk none;
> }
>
>
> }
> connection-mesh {
> hosts drbd01 drbd02 drbd03;
> net {
> protocol C;
> allow-two-primaries yes;
> }
> }
>
> }
>
> resource OCR03 {
> volume 1 {
> disk /dev/sdd;
> device /dev/drbd2;
> meta-disk internal;
> }
> on drbd01 {
> address 10.10.10.1:7789;
> node-id 0;
> }
> on drbd02 {
> address 10.10.10.2:7789;
> node-id 1;
> }
> on drbd03 {
> address 10.10.10.3:7789;
> node-id 2;
> volume 1 {
> disk none;
> }
>
>
> }
> connection-mesh {
> hosts drbd01 drbd02 drbd03;
> net {
> protocol C;
> allow-two-primaries yes;
> }
> }
>
> }
>


Best regards.
Juan.
Re: Down sync [ In reply to ]
The MIGRA02 resource is down right now between drbd02 and drbd03. Where can I
review more detailed logs? Thanks in advance.

Jul 23 10:05:57 drbd02 kernel: drbd MIGRA02 drbd03: meta connection shut
> down by peer.
> Jul 23 10:05:57 drbd02 kernel: drbd MIGRA02 drbd03: conn( Connected ->
> NetworkFailure ) peer( Secondary -> Unknown )
> Jul 23 10:05:57 drbd02 kernel: drbd MIGRA02/1 drbd8 drbd03: pdsk( Diskless
> -> DUnknown ) repl( Established -> Off )
> Jul 23 10:05:57 drbd02 kernel: drbd MIGRA02 drbd03: ack_receiver terminated
> Jul 23 10:05:57 drbd02 kernel: drbd MIGRA02 drbd03: Terminating ack_recv
> thread
> Jul 23 10:05:57 drbd02 kernel: drbd MIGRA02 drbd03: sock was shut down by
> peer
> Jul 23 10:05:57 drbd02 kernel: drbd MIGRA02 drbd03: Restarting sender
> thread
> Jul 23 10:05:57 drbd02 kernel: drbd MIGRA02 drbd03: Connection closed
> Jul 23 10:05:57 drbd02 kernel: drbd MIGRA02 drbd03: conn( NetworkFailure
> -> Unconnected )
> Jul 23 10:05:57 drbd02 kernel: drbd MIGRA02 drbd03: Restarting receiver
> thread
> Jul 23 10:05:57 drbd02 kernel: drbd MIGRA02 drbd03: conn( Unconnected ->
> Connecting )
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: Handshake to peer 2
> successful: Agreed network protocol version 117
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: Feature flags enabled
> on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: Starting ack_recv
> thread (from drbd_r_MIGRA02 [2695])
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02: Preparing cluster-wide state
> change 1863242544 (1->2 499/145)
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02: Declined by peer drbd01 (id:
> 0), see the kernel log there
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02: Aborting cluster-wide state
> change 1863242544 (19ms) rv = -10
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: Failure to connect;
> retrying
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: conn( Connecting ->
> NetworkFailure )
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: ack_receiver terminated
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: Terminating ack_recv
> thread
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: Restarting sender
> thread
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: Connection closed
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: conn( NetworkFailure
> -> Unconnected )
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: Restarting receiver
> thread
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: conn( Unconnected ->
> Connecting )
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: Handshake to peer 2
> successful: Agreed network protocol version 117
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: Feature flags enabled
> on protocol level: 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
> Jul 23 10:05:58 drbd02 kernel: drbd MIGRA02 drbd03: Starting ack_recv
> thread (from drbd_r_MIGRA02 [2695])
> Jul 23 10:05:59 drbd02 kernel: drbd MIGRA02: Preparing cluster-wide state
> change 1892110034 (1->2 499/145)
> Jul 23 10:05:59 drbd02 kernel: drbd MIGRA02: Declined by peer drbd01 (id:
> 0), see the kernel log there
> Jul 23 10:05:59 drbd02 kernel: drbd MIGRA02: Aborting cluster-wide state
> change 1892110034 (0ms) rv = -10
> Jul 23 10:05:59 drbd02 kernel: drbd MIGRA02 drbd03: Failure to connect;
> retrying
> Jul 23 10:05:59 drbd02 kernel: drbd MIGRA02 drbd03: conn( Connecting ->
> NetworkFailure )

..........


Re: Down sync [ In reply to ]
On Thu, Jul 23, 2020 at 09:19:14AM +0200, Juan Sevilla wrote:
> Hi,
>
> My configuration is this:
>
> A) Node drbd01: primary all
>
> B) Node drbd02: primary all
>
> C) Node drbd03: secondary all, diskless, intended as a quorum witness.

I don't know your write pattern, but this sounds like a bad idea.
Multiple primaries are allowed during live migration where you can
guarantee exactly one writer. Otherwise: don't do it.
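
In config terms, Roland's advice amounts to dropping `allow-two-primaries` from the resources shown above. A sketch based on the poster's DATA01 resource (the `on` sections are unchanged and abbreviated here):

```
resource DATA01 {
    volume 1 {
        disk /dev/sdf;
        device /dev/drbd4;
        meta-disk internal;
    }
    # ... "on drbd01/drbd02/drbd03" sections as in the original post ...
    connection-mesh {
        hosts drbd01 drbd02 drbd03;
        net {
            protocol C;
            # allow-two-primaries removed; it defaults to no.
            # Promote exactly one node at a time:
            #   drbdadm primary DATA01
        }
    }
}
```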

Regards, rck
_______________________________________________
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user
Re: Down sync [ In reply to ]
On 7/23/20 4:06 PM, Roland Kammerer wrote:

> I don't know your write pattern, but this sounds like a bad idea.
> Multiple primaries are allowed during live migration where you can
> guarantee exactly one writer. Otherwise: don't do it.

I'll add some details:

Dual Primary is supported for two kinds of situations:

1. Temporary dual-primary for live-migration of VMs using a multi-node
replicated DRBD resource
2. Permanent dual-primary with specialized readers/writers (cluster file
systems, cluster-synchronized applications, etc.) on a two-node
replicated DRBD resource

You have a three-node replicated DRBD resource. No one knows whether that
will replicate and resynchronize correctly, even while it is connected.
In general, dual primary configurations are a lot less robust than
normal multi-resource active/active configurations and should be
avoided. In most cases, there are other configurations that do not
require dual primary mode for the same use-case, so in most cases, a
dual primary configuration is a misconfiguration in the first place.

Tell us more about what your actual use-case and applications are, and
we may be able to help you with the setup.

br,
Robert

Re: Down sync [ In reply to ]
Hi,

Thanks for your response. I need to use primary/primary so that the storage
blocks can be used by a clustered filesystem.

In general this configuration runs fine, even when I make intensive use of
the replicated local disks. My problem is that, occasionally, a disconnect
appears between nodes.

I don't know what you mean when you refer to multi-resource active/active.
Is it an alternative to dual primary/primary?

Thanks in advance,
Juan.
Re: Down sync [ In reply to ]
On 7/24/20 12:22 PM, Juan Sevilla wrote:
> Hi,
>
> Thanks for your response. I need to use primary/primary so that the
> storage blocks can be used by a clustered filesystem.

The question is whether you actually need a clustered filesystem in the
first place. That is why I asked about the use-case and applications
running on those systems.

> In general this configuration runs fine, even when I make intensive
> use of the replicated local disks.

With a resource that is replicated between more than two nodes while two
nodes are in the Primary role, data could be corrupted, maybe not during
replication, but possibly if a node has an outage, or if data is
resynced later. This is not a supported configuration.

> My problem is that, occasionally, a disconnect appears between nodes.

That should normally lead to an immediate power-off of the node that was
lost. If it doesn't, then it's misconfigured, and the result is at least
a split-brain situation, apart from the potential data corruption due to
what I wrote above.

E.g., let's assume node A is Primary, node B is Primary, node C is
Secondary. Now A disconnects from B, but both are still connected to C, and
applications still read and write data on A and B.

A is missing the data that's being written on B, and read requests on A
read old data after an update of that same data on node B. The same is true
the other way around. So the result is that you have two different,
unrelated data sets on A and B, and the state of any applications that rely
on the cluster filesystem may be corrupted.

But then, node C is even more interesting, because that one gets updates
from both node A and node B, which have diverged. So the data on node C
could be a completely corrupted mix of unrelated updates from A and B,
which may even corrupt the filesystem's data structures, thereby making
the filesystem unreadable.

Upon reconnect, what is supposed to happen? Node A and node B are
split-brained and cannot sync. Even if you sync those two, node C's data
cannot be recovered, so you would have to full-sync it for the data on it
to make any sense again.

And that's just the tip of the iceberg with regards to the background
story on why Dual Primary multi-node-clusters are opening Pandora's box
in many interesting ways...

>
> I don't know what you mean when you refer to multi-resource
> active/active. Is it an alternative to dual primary/primary?

Multiple resources/volumes. Some Primary on node A, others Primary on
node B. Normally grouped with applications that are independent.
E.g. two different database instances that can run on different nodes.
Instead of keeping those on the same filesystem, each DB instance gets a
separate mountpoint for its data, and each mountpoint is backed by a
separate DRBD resource.
Resource db1 is Primary with database instance 1 running on node A,
Secondary on nodes B and C.
Resource db2 is Primary with database instance 2 running on node B,
Secondary on nodes A and C.
That's a multi-resource active/active setup.
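
A minimal sketch of what that could look like, assuming hypothetical node names, ports, and backing devices (db2 would be defined the same way on another port and promoted on node B instead):

```
resource db1 {
    volume 0 {
        disk /dev/vg0/db1;      # hypothetical backing LV
        device /dev/drbd20;
        meta-disk internal;
    }
    on nodeA { address 10.0.0.1:7800; node-id 0; }
    on nodeB { address 10.0.0.2:7800; node-id 1; }
    on nodeC { address 10.0.0.3:7800; node-id 2; }
    connection-mesh { hosts nodeA nodeB nodeC; }
    # no allow-two-primaries: exactly one Primary per resource
}
# On nodeA only:
#   drbdadm up db1 && drbdadm primary db1
#   mount /dev/drbd20 /srv/db1
```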

br,
Robert

Re: Down sync [ In reply to ]
Hi Robert,


> Permanent dual-primary with specialized readers/writers (cluster file
> systems, cluster-synchronized applications, etc.) on a two-node
> replicated DRBD resource


This is exactly my case. I need to build a system with Oracle ASM on top. ASM
is a clustered filesystem with its own heartbeat, etc., like OCFS2.

I need dual-primary in order to build a virtual SAN. This system runs
correctly under high I/O load: I've restored and recovered a 0.7 TB database
with dual-primary active. I can't use the multi-resource active/active model
because it is a single database being accessed by two nodes simultaneously.

I think it's possible to eliminate the diskless witness (the third node) and
replace it with a fencing handler.

I appreciate your comments,

Best regards,
Juan.