Mailing List Archive

Re: Presentation
Hi David,

Thanks for your interest in the Linux-HA project!

Since you wrote this email to me personally, but mentioned "and company", I am
copying my response to the Linux-HA development team for their consideration as
well.

CEINTEC i+d wrote:
>
> Hi Alan & company:
>
> My name is David Martinez. Last summer, together with Jesus Martinez,
> I coordinated the work group at CEINTEC i+d made up of Iniaki Fernandez,
> Josu Abajo and Javier Ruiz.
> They did a good job investigating linux-ha,
> testing and trying out software solutions such as
> Coda, heartbeat ... for our purposes, but they have left
> this project because they have other important commitments. :-(
>
> I am writing to say that I have picked our project back up,
> and I'll continue testing heartbeat.

That's good news. I really appreciated the report they wrote on heartbeat
towards the end. I'm confident you'll also do a good job.

> In the near future I would like to study the Heartbeat code,
> but for now I'm only testing it. I'm using RedHat because
> it's what I know.

Great! It's pretty simple code in most respects. I look forward to your
comments.

> At the moment I'm studying solutions for a real-time
> file mirroring system between nodes. (We have discarded Coda
> because it's very complex and unstable.)
>
> I have been testing the network block device "nbd" with
> RAID 1/5 to do this, but I'm afraid to say that this is
> a very dangerous solution: if a connection fails, it
> produces file system corruption. I think this method is
> as dangerous as sharing a SCSI disk array between
> two nodes, because a problem in the master node may
> cause serious data corruption. What do you think?

I believe the reality is much better than this.
I would recommend RAID 1 and a dedicated ethernet connection. Then if you lose
the connection, it's like losing a disk, and RAID 1 can handle this. When the
connection comes back, you have to resync the partition, just as if you'd
replaced a disk. You need not lose data in either case. If this is
going to happen often, though, you'll pay the price in disk resynchronization
time. Still, at 5-10 MB/sec, you can resync pretty good-sized partitions
pretty fast.
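
As a rough check on the resync cost, here is a minimal sketch in Python. It
assumes a full resync copies the whole partition at a sustained rate; the
partition sizes (1, 4 and 9 GB) and the function name resync_seconds are
illustrative assumptions, not figures or code from this thread. Only the
5-10 MB/sec range comes from the estimate above.

```python
MB = 1024 * 1024
GB = 1024 * MB


def resync_seconds(partition_bytes: int, throughput_bytes_per_sec: float) -> float:
    """Seconds needed to copy a whole partition at a sustained rate.

    Assumes a full resync (every block is copied) and ignores protocol
    overhead and competing disk or network traffic.
    """
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return partition_bytes / throughput_bytes_per_sec


if __name__ == "__main__":
    # Hypothetical partition sizes, chosen only for illustration.
    for size_gb in (1, 4, 9):
        fast = resync_seconds(size_gb * GB, 10 * MB)  # optimistic rate
        slow = resync_seconds(size_gb * GB, 5 * MB)   # pessimistic rate
        print(f"{size_gb} GB: {fast / 60:.1f} to {slow / 60:.1f} minutes")
```

Under these assumptions the time grows linearly with partition size, so
frequent link failures on large partitions add up quickly.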

I still believe that this is a worthwhile approach.

> Now I'm thinking about and studying the possibility of writing
> a kernel patch to implement a mirror system at the file level.
> If anybody has any ideas about this, please tell me.

There is a project which is well underway in this endeavor; it's called
intermezzo. You can find it here: www.inter-mezzo.org

However, I think the partition mirroring is also a great idea. There is a new
NBD driver (gnbd?) under development, and they're interested in using it with
Linux-HA.

> Excuse me if there is any mistake in this letter because
> my English is poor.

But your English is much better than my Spanish :-) I speak only a little
Spanish.

I'll copy you a few emails from others about the gnbd project, and related
things in a separate mail.

I'd recommend that you join the Linux-HA development list. You can join it
here:
http://lists.tummy.com/mailman/listinfo/linux-ha-dev

Thanks again!


-- Alan Robertson
alanr@bell-labs.com
Re: Presentation
On Sun, 17 Oct 1999, Alan Robertson wrote:
> CEINTEC i+d wrote:
> > I have been testing the network block device "nbd" with
> > RAID 1/5 to do this, but I'm afraid to say that this is
> > a very dangerous solution: if a connection fails, it
> > produces file system corruption. I think this method is
> > as dangerous as sharing a SCSI disk array between
> > two nodes, because a problem in the master node may
> > cause serious data corruption. What do you think?
>
> I believe the reality is much better than this.
> I would recommend RAID 1 and a dedicated ethernet connection. Then if you lose
> the connection, it's like losing a disk, and RAID 1 can handle this.

Also (and this is more of a linux-raid issue) I have seen discussion
recently on the linux-raid list indicating that, while software RAID 1/0
works, software RAID 1/5 may not currently work. At least someone was
complaining of a similar problem with RAID 1/5, and they were using all
local disks.

So their problem may have had nothing to do with using NBD...

-Andy