Mailing List Archive

Implementation Question
I haven't managed to find an answer to this question yet, so I'll ask it
here:

Has anyone come up with a way to use DRBD on more than 2 nodes,
mirroring the same partition(s)?
If not, is there another product which is meant to cater more toward the
3+ node cluster configuration?

TIA

Zackary D. Deems
Unix Systems Administrator
Virginia Dept. of Education
Re: Implementation Question
On 2000-11-21T15:45:42,
"Zackary D. Deems" <zdeems@example.com> said:

> Has anyone come up with a way to use DRBD on more than 2 nodes,
> mirroring the same partition(s)?

No.

We are accepting patches though ;-)

You _can_ use stacked drbd's, but this may prove to be not what you want. I
also don't want to think about the complexity this introduces...

> If not, is there another product which is meant to cater more toward the
> 3+ node cluster configuration?

No.

I am not even sure whether EMC^2 RDF does cater to more than two nodes.

You may wish to investigate a shared storage solution - using shared SCSI or
Fibre Channel - and using GFS on top of it, a special filesystem to allow
multiple nodes to access this concurrently. (http://www.globalfilesystem.org/)

What are you trying to achieve? Maybe the linux-ha@example.com list is a good place
to discuss the general design of your project.

Sincerely,
Lars Marowsky-Brée <lmb@example.com>
Development HA

--
Perfection is our goal, excellence will be tolerated. -- J. Yahl
Re: Implementation Question
Lars Marowsky-Bree wrote:
>
> On 2000-11-21T15:45:42,
> "Zackary D. Deems" <zdeems@example.com> said:
>
> > Has anyone come up with a way to use DRBD on more than 2 nodes,
> > mirroring the same partition(s)?
>
> No.
>
> We are accepting patches though ;-)
>
> You _can_ use stacked drbd's, but this may prove to be not what you want. I
> also don't want to think about the complexity this introduces...
>
> > If not, is there another product which is meant to cater more toward the
> > 3+ node cluster configuration?
>
> No.

The only other Linux product I'm aware of in this space is the TwinCom
http://www.twincom.com/ Network Disc Mirror. I think it's only 2 nodes
also...

DoubleTake is only for NT and Windows, so I'll ignore it.


-- Alan Robertson
alanr@example.com
Re: Implementation Question
Actually, there is a storage system that supports > 2 nodes. The
distributed block device (DBD) module in the CEnsemble toolkit
supports just about any number of nodes. The DBD is like DRBD except
that it allows any number of clients to access any number of logical
disks that are replicated (with 2-way redundancy) on any number of
storage servers. It includes a distributed lock manager, and we are
actively working on integrating the GFS+DBD+DLM to produce a full
clustered file system.

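To make that layout concrete, here is a minimal sketch of the idea in
Python (an illustration only, not code from CEnsemble; every name in it is
invented): each logical block lives on two of the N storage servers, writes
go to both copies, and reads can be served by either one.

class StorageServer:
    """Trivial in-memory stand-in for a DBD storage server."""
    def __init__(self):
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data

    def read(self, block_no):
        return self.blocks[block_no]

def place_replicas(block_no, num_servers):
    """Pick the two servers that hold copies of this block (2-way redundancy)."""
    primary = block_no % num_servers
    return primary, (primary + 1) % num_servers

def write_block(servers, block_no, data):
    """A write is sent to both replicas of the block."""
    for idx in place_replicas(block_no, len(servers)):
        servers[idx].write(block_no, data)

def read_block(servers, block_no):
    """A read can be served by either replica."""
    primary, secondary = place_replicas(block_no, len(servers))
    try:
        return servers[primary].read(block_no)
    except KeyError:
        return servers[secondary].read(block_no)
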
You can get CEnsemble from www.censemble.com. I'm preparing a
release of DBD that includes the (previously missing) kernel driver
and is pretty stable. It will be available in the next couple of
weeks. Please contact me if you would like to try a pre-release of
the software.

--Mark


Lars Marowsky-Bree wrote:
>
> On 2000-11-21T15:45:42,
> "Zackary D. Deems" <zdeems@example.com> said:
>
> > Has anyone come up with a way to use DRBD on more than 2 nodes,
> > mirroring the same partition(s)?
>
> No.
>
> We are accepting patches though ;-)
>
> You _can_ use stacked drbd's, but this may prove to be not what you want. I
> also don't want to think about the complexity this introduces...
>
> > If not, is there another product which is meant to cater more toward the
> > 3+ node cluster configuration?
>
> No.
>
> I am not even sure whether EMC^2 RDF does cater to more than two nodes.
>
> You may wish to investigate a shared storage solution - using shared SCSI or
> Fibre Channel - and using GFS on top of it, a special filesystem to allow
> multiple nodes to access this concurrently. (http://www.globalfilesystem.org/)
>
> What are you trying to achieve? Maybe the linux-ha@example.com list is a good place
> to discuss the general design of your project.
>
> Sincerely,
> Lars Marowsky-Brée <lmb@example.com>
> Development HA
Re: Implementation Question
> > Has anyone come up with a way to use DRBD on more than 2 nodes,
> > mirroring the same partition(s)?

You may want to look at InterMezzo: http://www.inter-mezzo.org/
They are in the process of packaging the first production release.
It will probably do what you want it to do.

I am not sure whether it supports three or more nodes, but both nodes can
be mounted RW.

Thomas
--
Thomas Mangin (mailto:thomas.mangin@example.com)
System Administrator (mailto:systems@example.com)
Legend Internet Ltd. (http://www.legend.co.uk:/)
--
The urgent is done, the impossible is on the way, for miracles expect a
small delay
Re: Implementation Question
On 2000-11-21T14:14:00,
Mark Hayden <mh37@example.com> said:

> Actually, there is a storage system that supports > 2 nodes. The
> distributed block device (DBD) module in the CEnsemble toolkit
> supports just about any number of nodes. The DBD is like DRBD except
> that it allows any number of clients to access any number of logical
> disks that are replicated (with 2-way redundancy) on any number of
> storage servers. It includes a distributed lock manager, and we are
> actively working on integrating the GFS+DBD+DLM to produce a full
> clustered file system.

Cool!

Do you have any performance estimates for how it will compare to drbd?

And while we are at it, how does DBD cope with the "good data/bad data"
problem we are currently discussing? Does it get worse with more than 2 nodes
or easier?

Sincerely,
Lars Marowsky-Brée <lmb@example.com>
Development HA

--
Perfection is our goal, excellence will be tolerated. -- J. Yahl
Re: Implementation Question
Lars Marowsky-Bree wrote:
>
> On 2000-11-21T14:14:00,
> Mark Hayden <mh37@example.com> said:
>
> > Actually, there is a storage system that supports > 2 nodes. The
> > distributed block device (DBD) module in the CEnsemble toolkit
> > supports just about any number of nodes. The DBD is like DRBD except
> > that it allows any number of clients to access any number of logical
> > disks that are replicated (with 2-way redundancy) on any number of
> > storage servers. It includes a distributed lock manager, and we are
> > actively working on integrating the GFS+DBD+DLM to produce a full
> > clustered file system.
>
> Cool!
>
> Do you have any performance estimates for how it will compare to drbd?

My focus to this point has been on correctness of the protocols and
data integrity. However, I typically get 5MB/sec over fast ethernet
with one machine running all the storage servers. The performance
goal (which I expect to achieve) is for the software to expose the
performance of the underlying hardware: the system will serve up data
as quickly as the network and disks allow. I am
assembling a cluster of servers to do the performance testing needed
to achieve this goal.


> And while we are at it, how does DBD cope with the "good data/bad data"
> problem we are currently discussing? Does it get worse with more than 2 nodes
> or easier?

I haven't been following this discussion too closely, but I can
describe what DBD does. DBD is designed to always maintain data
integrity in the presence of arbitrary network behavior (aside from
undetected packet corruption). We use a quorum-based protocol on
"manager" servers to determine which "storage" servers are up and
which are down. Note this means you really need at least 3 machines
with managers on them to have fault-tolerant quorums (you can run
both manager and storage servers on the same machine).

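A minimal sketch of the quorum rule itself, assuming a plain strict-majority
vote over the manager servers (an illustration, not the actual CEnsemble
protocol):

def has_quorum(reachable_managers, total_managers):
    """True only if a strict majority of the managers is reachable."""
    return reachable_managers > total_managers // 2

# With 3 managers, any 2 that can still talk form a quorum and decide
# which storage servers are up; with only 2 managers a single failure
# already destroys the quorum, which is why at least 3 are needed.
assert has_quorum(2, 3)
assert not has_quorum(1, 2)
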
Consider using DBD with a DRBD-like configuration with two storage
servers (call them A and B) and 2-way redundancy. In this case,
writes to all blocks go to both A and B and reads typically go to one
or the other.

If A fails, then B continues serving client requests but marks all
writes as being dirty. When A starts up again, it goes through a
"synchronization" phase in which B's dirty blocks are copied to A
before A is fully up.

If B fails without A completing synchronization, then A will remain
in the synchronization phase (not accepting requests) until B comes
back up and helps A complete. I view this as effectively a
double-failure, which 2-way redundancy does not protect against.

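Sketched roughly in Python (illustration only; the real DBD is a kernel
driver plus a set of servers, and these names are invented), the failure
handling amounts to:

def write_while_peer_down(surviving, dirty, block_no, data):
    """B keeps serving writes while A is down, remembering what A missed."""
    surviving[block_no] = data
    dirty.add(block_no)

def synchronize(returning, surviving, dirty):
    """When A comes back, copy B's dirty blocks to A before A is fully up.

    If B fails before this loop finishes, A has to stay in the
    synchronization phase until B returns: the double-failure case that
    2-way redundancy does not cover.
    """
    for block_no in sorted(dirty):
        returning[block_no] = surviving[block_no]
    dirty.clear()
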
This approach allows DBD to provide a property called "one copy
serializability," which means that a single client accessing an
unshared DBD logical disk cannot tell the difference between the DBD
and a regular disk drive. If multiple clients share a logical
disk (for GFS for instance), they need to synchronize their requests
using the DLM to get one copy serializability.

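For the shared-disk case, the clients' use of the DLM can be pictured like
this (a sketch assuming a simple exclusive lock per block; this is not the
CEnsemble DLM interface):

import threading

# Stand-in for a distributed lock manager: one exclusive lock per block.
# In a real cluster the lock state lives on the lock manager servers,
# not in a local dictionary.
_locks = {}

def locked_write(disk, block_no, data):
    """Serialize conflicting accesses so shared clients keep one-copy behaviour."""
    lock = _locks.setdefault(block_no, threading.Lock())
    with lock:
        disk[block_no] = data
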
If you don't find this description to be clear, I'd be happy to
answer any questions.

regards, Mark
Re: Implementation Question
On Wed, 22 Nov 2000, Mark Hayden wrote:

<snip>

> If B fails without A completing synchronization, then A will remain
> in the synchronization phase (not accepting requests) until B comes
> back up and helps A complete. I view this as effectively a
> double-failure, which 2-way redundancy does not protect against.
>
> This approach allows DBD to provide a property called "one copy
> serializability," which means that a single client accessing an
> unshared DBD logical disk cannot tell the difference between the DBD
> and a regular disk drive. If multiple clients share a logical
> disk (for GFS for instance), they need to synchronize their requests
> using the DLM to get one copy serializability.
>
> If you don't find this description to be clear, I'd be happy to
> answer any questions.

When does DBD acknowledge to the kernel upper layers (marking the block
clean and calling the IO completion operation)? Only after writing a
dirty block to the remote nodes?
Re: Implementation Question
On Thu, 23 Nov 2000, Mark Hayden wrote:

> Marcelo Tosatti wrote:
> >
> > On Wed, 22 Nov 2000, Mark Hayden wrote:
> >
> > <snip>
> >
> > > If B fails without A completing synchronization, then A will remain
> > > in the synchronization phase (not accepting requests) until B comes
> > > back up and helps A complete. I view this as effectively a
> > > double-failure, which 2-way redundancy does not protect against.
> > >
> > > This approach allows DBD to provide a property called "one copy
> > > serializability," which means that a single client accessing an
> > > unshared DBD logical disk cannot tell the difference between the DBD
> > > and a regular disk drive. If multiple clients share a logical
> > > disk (for GFS for instance), they need to synchronize their requests
> > > using the DLM to get one copy serializability.
> >
> > When does DBD acknowledge to the kernel upper layers (marking the block
> > clean and calling the IO completion operation)? Only after writing a
> > dirty block to the remote nodes?
>
> When the DBD kernel driver gets a request to write a block to a
> logical disk, it determines which storage servers to send the block
> to and sends write requests to them. They write the block to stable
> storage and respond to the request after the IO operation has
> completed. It is only after receiving successful responses from all
> the servers it sent it to that the client driver calls the IO
> completion operation ("end_request()").

Ok. :)

I was going to ask you other questions, but I think it's best to read the
code myself. :)

Do you have a discussion list for DBD development?
Re: Implementation Question
Marcelo Tosatti wrote:
>
> On Wed, 22 Nov 2000, Mark Hayden wrote:
>
> <snip>
>
> > If B fails without A completing synchronization, then A will remain
> > in the synchronization phase (not accepting requests) until B comes
> > back up and helps A complete. I view this as effectively a
> > double-failure, which 2-way redundancy does not protect against.
> >
> > This approach allows DBD to provide a property called "one copy
> > serializability," which means that a single client accessing an
> > unshared DBD logical disk cannot tell the difference between the DBD
> > and a regular disk drive. If multiple clients share a logical
> > disk (for GFS for instance), they need to synchronize their requests
> > using the DLM to get one copy serializability.
>
> When does DBD acknowledge to the kernel upper layers (marking the block
> clean and calling the IO completion operation)? Only after writing a
> dirty block to the remote nodes?

When the DBD kernel driver gets a request to write a block to a
logical disk, it determines which storage servers to send the block
to and sends write requests to them. They write the block to stable
storage and respond to the request after the IO operation has
completed. It is only after receiving successful responses from all
the servers it sent it to that the client driver calls the IO
completion operation ("end_request()").

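Condensed into a few lines of Python (illustration only; end_request here is
just a callback handed in by the caller, not the kernel symbol, and
FakeServer is an invented stand-in for a storage server):

class FakeServer:
    """In-memory stand-in for a storage server that acks after its own I/O."""
    def __init__(self):
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data
        return True        # "written to stable storage"

def submit_write(replica_servers, block_no, data, end_request):
    """Report completion only once every replica has acknowledged the write."""
    acks = [server.write(block_no, data) for server in replica_servers]
    # Only after all replicas answer successfully does the client driver
    # signal IO completion -- the end_request() moment described above.
    end_request(all(acks))

if __name__ == "__main__":
    servers = [FakeServer(), FakeServer()]
    submit_write(servers, 7, b"payload",
                 lambda ok: print("io complete:", ok))
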
--Mark
Re: Implementation Question
I take it this implementation of DBD is different from the one included in
the 2.2.x kernels? That one didn't do its own mirroring, and would block on
IO when the secondary node died. It didn't help that it required MD-RAID to
do the mirroring, which of course is NOT network aware.

I'm new to the linux-ha realm, so forgive me if I ask questions which have
been asked time and again, but is there a particular reason that almost
every 'network block device' type scenario is built for a 2-node cluster?
Is there some reason why people would NOT want to have more than two nodes
in a cluster? I'm just wondering whether I'm attempting to cause myself more
problems than I really need to.

Thanks!

Zackary D. Deems
Unix Systems Administrator
Virginia Dept. of Education
---------------------------------------
Actually, there is a storage system that supports > 2 nodes. The
distributed block device (DBD) module in the CEnsemble toolkit
supports just about any number of nodes. The DBD is like DRBD except
that it allows any number of clients to access any number of logical
disks that are replicated (with 2-way redundancy) on any number of
storage servers. It includes a distributed lock manager, and we are
actively working on integrating the GFS+DBD+DLM to produce a full
clustered file system.

You can get CEnsemble from www.censemble.com. I'm preparing a
release of DBD that includes the (previously missing) kernel driver
and is pretty stable. It will be available in the next couple of
weeks. Please contact me if you would like to try a pre-release of
the software.

--Mark


Lars Marowsky-Bree wrote:
>
> On 2000-11-21T15:45:42,
> "Zackary D. Deems" <zdeems@example.com> said:
>
> > Has anyone come up with a way to use DRBD on more than 2 nodes,
> > mirroring the same partition(s)?
>
> No.
>
> We are accepting patches though ;-)
>
> You _can_ use stacked drbd's, but this may prove to be not what you want. I
> also don't want to think about the complexity this introduces...
>
> > If not, is there another product which is meant to cater more toward the
> > 3+ node cluster configuration?
>
> No.
>
> I am not even sure whether EMC^2 RDF does cater to more than two nodes.
>
> You may wish to investigate a shared storage solution - using shared SCSI or
> Fibre Channel - and using GFS on top of it, a special filesystem to allow
> multiple nodes to access this concurrently. (http://www.globalfilesystem.org/)
>
> What are you trying to achieve? Maybe the linux-ha@example.com list is a good place
> to discuss the general design of your project.
>
> Sincerely,
> Lars Marowsky-Brée <lmb@example.com>
> Development HA
