Mailing List Archive

major corruption with ext3 and drbd
I don't know if it was me or what, but the primary node went down today, the
other took over, and a good portion of the files were corrupt when it
remounted.

I'm not happy.

:-(

--
Dan Yocum
Sloan Digital Sky Survey, Fermilab 630.840.6509
yocum@example.com, http://www.sdss.org
SDSS. Mapping the Universe.
Re: major corruption with ext3 and drbd
And it got worse - after the second node remounted about 90% of the files
were OK, until the primary node came back again and a full sync started in
the reverse direction - after about 8 hours of full syncing there were 3
directories of files left, but a 'df' on the system showed that the space
was still being used (~19GB). I don't know what could cause the massive
inode corruption, but the only thing accessing the volumes was drbd.
Umounting and attempting to fsck didn't work and then I couldn't remount at
all 'cause I couldn't find a good superblock.
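
For anyone hunting for a usable superblock in a similar situation, here is a minimal sketch of where ext2/ext3 backup superblocks would normally sit. It assumes the filesystem was created with the sparse_super feature and the default blocks-per-group (8 * block size); neither is confirmed anywhere in this thread. The function name is made up for illustration, and the device size is the 153430592 KB figure reported later in the thread.

```python
def backup_superblock_locations(total_blocks, block_size=4096):
    """List candidate backup superblock block numbers for ext2/ext3.

    Assumptions (not confirmed by the thread): sparse_super is enabled
    and blocks-per-group is the default of 8 * block_size.
    """
    blocks_per_group = 8 * block_size
    # With 1 KB blocks the first data block is 1, otherwise 0.
    first_data_block = 1 if block_size == 1024 else 0
    # Ceiling division to get the number of block groups.
    group_count = -(-(total_blocks - first_data_block) // blocks_per_group)

    # sparse_super keeps backups in group 1 and in powers of 3, 5 and 7.
    groups = {1}
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base

    valid = sorted(g for g in groups if g < group_count)
    return [first_data_block + g * blocks_per_group for g in valid]


if __name__ == "__main__":
    total_blocks = 153430592 // 4  # device size in KB / 4 KB per block
    for block in backup_superblock_locations(total_blocks)[:5]:
        print(block)
```

Each printed number is a candidate to pass to `e2fsck -b`, ideally against a copy of the device rather than the only remaining data.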

Nope, not pleased at all.



> _______________________________________________
> DRBD-devel mailing list
> DRBD-devel@example.com
> https://lists.sourceforge.net/lists/listinfo/drbd-devel



--
Dan Yocum
Sloan Digital Sky Survey, Fermilab 630.840.6509
yocum@example.com, http://www.sdss.org
SDSS. Mapping the Universe.
Re: major corruption with ext3 and drbd
On Fri, 14 Dec 2001, Dan Yocum wrote:

> And it got worse - after the second node remounted about 90% of the files
> were OK, until the primary node came back again and a full sync started in
> the reverse direction - after about 8 hours of full syncing there were 3
> directories of files left, but a 'df' on the system showed that the space
> was still being used (~19GB). I don't know what could cause the massive
> inode corruption, but the only thing accessing the volumes was drbd.
> Umounting and attempting to fsck didn't work and then I couldn't remount at
> all 'cause I couldn't find a good superblock.
>
Which kernel version are you running?
2.4.15/14, was it? That one had a file corruption bug.


--
espen







Re: major corruption with ext3 and drbd
Espen Myrland wrote:

> which kernel version are you running?
> 2.4.15/14 was it? Had a filecorruption bug.


Yeah, I know 'bout that one. I'm using 2.4.9 from Red Hat.



--
Dan Yocum
Sloan Digital Sky Survey, Fermilab 630.840.6509
yocum@example.com, http://www.sdss.org
SDSS. Mapping the Universe.
Re: major corruption with ext3 and drbd
Sulamita Garcia wrote:
>
> Hi
>
> I have had corruption with drbd too, but no one has told me anything.
> Maybe we can talk about it. What is your Linux? I use Slackware. My boxes
> are Netfinity, with RAID-0 and SCSI disks. Maybe one of these things has
> problems with drbd. On other, normal PCs I didn't have any errors.


Hmmmm.... interesting. I'm doing linear RAID - maybe this is part of our
problem.

Here are the systems: AMD 1.2GHz, 128MB DDR RAM, 2 Maxtor 80GB drives. One has
the system areas and a large portion dedicated to data; the second is
completely dedicated to data, and those 2 data volumes are appended together
as a linear array. Distro is Red Hat 7.2, kernel 2.4.9, drbd v0.6.1pre6.



I see lots of these errors thrown by the kernel:

Dec 12 17:37:53 sdssprd2 kernel: drbd0: Connection established.
Dec 12 17:37:53 sdssprd2 kernel: drbd0: size=153430592 KB / blksize=4096 B
Dec 12 17:37:53 sdssprd2 kernel: drbd0: Synchronisation started blks=15 int=1
Dec 12 17:37:59 sdssprd2 kernel: attempt to access beyond end of device
Dec 12 17:37:59 sdssprd2 kernel: 2b:00: rw=0, want=542230180, limit=153430592
Dec 12 17:37:59 sdssprd2 kernel: EXT3-fs error (device drbd(43,0)): ext3_get_inode_loc: unable to read inode block - inode=196609, block=135557544
Dec 12 17:37:59 sdssprd2 kernel: EXT3-fs error (device drbd(43,0)) in ext3_reserve_inode_write: IO failure
etc....

Thoughts?
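
The numbers in those log lines can be cross-checked with a little arithmetic. The sketch below assumes that `want` and `limit` are in KB, like the `size=` field, and that `block` is counted in `blksize` units; the helper names are made up for illustration.

```python
import re

# Values copied from the kernel log above.
LOG = """\
drbd0: size=153430592 KB / blksize=4096 B
2b:00: rw=0, want=542230180, limit=153430592
ext3_get_inode_loc: unable to read inode block - inode=196609, block=135557544
"""


def field(name, text):
    """Return the integer following 'name=' in the log text."""
    match = re.search(rf"\b{name}=(\d+)", text)
    return int(match.group(1))


def analyze(log):
    blksize_kb = field("blksize", log) // 1024
    limit_kb = field("limit", log)
    want_kb = field("want", log)
    block = field("block", log)
    device_blocks = limit_kb // blksize_kb
    return {
        "device_blocks": device_blocks,             # blocks that fit on the device
        "block_end_kb": (block + 1) * blksize_kb,   # KB offset just past the requested block
        "want_kb": want_kb,
        "beyond_end": block >= device_blocks,
    }


if __name__ == "__main__":
    for key, value in analyze(LOG).items():
        print(key, value)
```

Under those assumptions the end of block 135557544 lands exactly on want=542230180, while the device only holds 38357648 blocks of 4096 B. So the block number ext3 asked for lies well past the end of the device, not just slightly over it.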


Also, Philipp, is this a bad sign:

Nov 12 10:52:03 sdssprd2 kernel: drbd0: blksize=1024 B
Nov 12 10:52:03 sdssprd2 kernel: drbd0: blksize=4096 B

I know I've asked that question before, I just can't remember the answer.


Dan


--
Dan Yocum
Sloan Digital Sky Survey, Fermilab 630.840.6509
yocum@example.com, http://www.sdss.org
SDSS. Mapping the Universe.
Re: major corruption with ext3 and drbd
Dan Yocum wrote:
>
> Hmmmm.... interesting. I'm doing linear RAID - maybe this is part of our
> problem.

hmmm... it's a possibility...

> I see lots of these errors thrown by the kernel:
>
> Dec 12 17:37:53 sdssprd2 kernel: drbd0: Connection established.
> Dec 12 17:37:53 sdssprd2 kernel: drbd0: size=153430592 KB / blksize=4096 B
> Dec 12 17:37:53 sdssprd2 kernel: drbd0: Synchronisation started blks=15 int=1
> Dec 12 17:37:59 sdssprd2 kernel: attempt to access beyond end of device
> Dec 12 17:37:59 sdssprd2 kernel: 2b:00: rw=0, want=542230180, limit=153430592
> Dec 12 17:37:59 sdssprd2 kernel: EXT3-fs error (device drbd(43,0)): ext3_get_inode_loc: unable to read inode block - inode=196609, block=135557544
> Thoughts?

I had this error when I used one partition in secondary node that was
smallest that the partition in primary node. I wanted to preserve the data
from the primary node, but drbd mirrors the device block by block, and some
data can sit in the last blocks, so trying to access those files returns
this error. I'm not sure, but when I resized the partition on the secondary
node, the error was fixed.
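
The size mismatch described above can be checked before a sync starts. This is only a rough sketch: the function names and device paths are hypothetical, each size has to be read on its own node, and nothing here comes from drbd itself.

```python
import os


def device_size_bytes(path):
    """Return the size of a block device (or regular file) in bytes."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end reports the total size.
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)


def secondary_can_hold(primary_bytes, secondary_bytes):
    """True if the secondary is at least as large as the primary."""
    return secondary_bytes >= primary_bytes


if __name__ == "__main__":
    # Hypothetical numbers: the primary size from the log in this thread,
    # and a secondary that is one 4096 B block short.
    primary = 153430592 * 1024
    secondary = primary - 4096
    if not secondary_can_hold(primary, secondary):
        print("secondary is smaller by", primary - secondary, "bytes")
```

On a real node, `device_size_bytes("/dev/...")` would be run against the lower-level partition on each side, and the two results compared by hand or over ssh.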

>
> Also, Philipp, is this a bad sign:
>
> Nov 12 10:52:03 sdssprd2 kernel: drbd0: blksize=1024 B
> Nov 12 10:52:03 sdssprd2 kernel: drbd0: blksize=4096 B

I get these messages too, but I read that drbd works with any block size,
so I haven't worried about them... should I?


--
Sulamita Garcia Alta Disponibilidade em Linux
Analista de Suporte http://ha.underlinux.com.br
sulamita.garcia@example.com
ICQ 16465301 Grupo de Usuárias de Linux
http://linuxchix-br.nl.linux.org
Re: major corruption with ext3 and drbd
Sulamita Garcia wrote:
>

> I had this error when I used one partition in secondary node that was
> smallest that the partition in primary node.

Smaller... sorry, sorry...
--
Sulamita Garcia Alta Disponibilidade em Linux
Analista de Suporte http://ha.underlinux.com.br
sulamita.garcia@example.com
ICQ 16465301 Grupo de Usuárias de Linux
http://linuxchix-br.nl.linux.org