Mailing List Archive

embedded ext2 and fsck
G'day,

My embedded environment is evolving. The Disk-On-Module currently has
the following partitions:
/dev/hda2 - / - root (ext2)
/dev/hda1 - /boot - syslinux boot partition (FAT16)
/dev/hda3 - /var - ext2, rw

The system has a 486 and is running kernel 2.6.29.6.

Over the past month I've encountered numerous "Stale NFS file handle"
errors. The device isn't networked and there's no apparent reason for
them (as best I can tell).

How important is running fsck in an embedded ext2 environment?

I'm considering
1) "fsck -C -T -a" on every boot
2) letting fsck run according to the tune2fs count
3) using "tune2fs -c 0 -i 0" to disable checking totally
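
A note on option 1: a boot script that runs fsck has to act on its exit status, which fsck(8) documents as a bitmask rather than a simple pass/fail. The sketch below only decodes that bitmask. The function names and the "may continue" policy are made up for illustration, and Python is used purely as notation (a shell script testing $? would do the same job).

```python
# Exit-status bits as documented in the fsck(8) man page.
FSCK_STATUS_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}

def decode_fsck_status(status):
    """Return the list of conditions encoded in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in sorted(FSCK_STATUS_BITS.items())
            if status & bit]

def boot_may_continue(status):
    """Hypothetical policy: carry on only if nothing worse than 'corrected'."""
    return status in (0, 1)
```

A status with bit 2 set would normally lead to a reboot, and bit 4 to some kind of rescue or fallback mode; what that looks like is up to the device.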

What do y'all do and recommend?

Thanks.

David
Re: embedded ext2 and fsck
On 06/04/2010 15:16, Relson, David wrote:
> Over the past month I've encountered numerous "Stale NFS file handle"
> errors. The device isn't networked and there's no apparent reason for
> them (as best I can tell).
>

I believe (please someone shoot me down) that these types of errors are
indicative of some on-disk corruption - not sure why it refers to NFS
though.
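
On why the message mentions NFS: the text is simply the C library's description of the errno value ESTALE, which was named with NFS in mind but can be returned by local filesystems too. As far as I know, ext2 returns it when a lookup leads to an inode that looks deleted or invalid, which would fit the on-disk corruption theory. The helper name below is made up for illustration, and the exact wording printed depends on the C library version.

```python
import errno
import os

def describe_errno(code):
    """Return (symbolic name, C library message) for an errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)

name, message = describe_errno(errno.ESTALE)
print(name, "->", message)
```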

So I think you have the big problem here that the fsck adds a good chunk
to the boot time, but disabling it leads to silent corruption and
potentially big problems down the line... One compromise would perhaps
be to split the device to read-only and read-write (which I think you
may have done?) and this then perhaps allows you to split the disk into
high risk and safe data... Now you can play roulette with less important
stuff and use ext3, etc with the more important stuff?

Just an idea

Good luck

Ed W
Re: embedded ext2 and fsck
Hi David,

On Tuesday, 06.04.2010, 10:16 -0400, Relson, David wrote:
[...]
> Over the past month I've encountered numerous "Stale NFS file handle"
> errors. The device isn't networked and there's no apparent reason for
> them (as best I can tell).

i had the same problems on a (i think it was ext2) root-filesystem which
was never written to, but was also not mounted ro.

i thought that i would not need the fsck on boot up, because there are
no write accesses to the device ...

however after some months of operation the device failed to boot with
exactly the same error message ...

the reason i suspect was that due to power failures the ext2 got
inconsistent somehow ... which resulted in "stale NFS file handle
messages" ... not very intuitive ;)

i put in a check & repair of the filesystem (with -y) on every boot and
now those errors are gone ...

i think the problem occurs when a not cleanly shut down ext2 fs gets
mounted over and over again ... and ... maybe something got written to
it even if i would not know ... it was a mini-itx running gentoo
with /var (mostly) on another (rw) partition - but not very "embedded"
in terms of stripped down ... ;)

hope this helps,
marcus.
Re: embedded ext2 and fsck
On 06/04/2010 22:20, Marcus Priesch wrote:
> however after some months of operation the device failed to boot with
> exactly the same error message ...
>
> the reason i suspect was that due to power failures the ext2 got
> inconsistent somehow ... which resulted in "stale NFS file handle
> messages" ... not very intuitive ;)
>


It would be interesting to hear if these errors "go away" by switching
to EXT3?

There seem to be several things happening here:

1) The CF card is quietly shuffling data around, so in theory it might
move a good sector onto a patch of flash which is worn out, causing it
to be corrupted on next read. Similarly when you "write" the card does
quite a lot of work in the background and theoretically if power was
lost during the shuffling around of sectors this could also cause data loss?

2) Sudden shutdowns causing the ext2 to be marked dirty and causing
subsequent problems (ie not fully read-only mounted)

To be honest, I don't know a lot about how ext2 is mounted read-only,
but option 2) above seems unlikely...?
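
On point 2: ext2 keeps a state field in its superblock. To my understanding the kernel clears the "valid" flag while the filesystem is mounted read-write and sets it again on a clean unmount, so a sudden power loss leaves the filesystem marked as not cleanly unmounted; a read-only mount should not touch it. The sketch below reads that field plus the mount counters that the tune2fs count is compared against. The offsets follow the ext2 on-disk superblock layout; the function name is only an example.

```python
import struct

SUPERBLOCK_OFFSET = 1024      # the superblock starts 1024 bytes into the partition
EXT2_MAGIC = 0xEF53
EXT2_VALID_FS = 0x0001        # set when the filesystem was cleanly unmounted
EXT2_ERROR_FS = 0x0002        # set when the kernel detected errors

def read_ext2_state(image):
    """Decode mount count, max mount count and state from raw partition bytes."""
    # s_mnt_count, s_max_mnt_count, s_magic and s_state are 16-bit
    # little-endian fields at byte offsets 52, 54, 56 and 58 of the superblock.
    mnt_count, max_mnt_count, magic, state = struct.unpack_from(
        "<HhHH", image, SUPERBLOCK_OFFSET + 52)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/ext3 superblock")
    return {
        "mount_count": mnt_count,
        "max_mount_count": max_mnt_count,
        "clean": bool(state & EXT2_VALID_FS),
        "errors_detected": bool(state & EXT2_ERROR_FS),
    }
```

On a real device this would be fed the first 2048 bytes of the partition (for example /dev/hda3, read as root). "dumpe2fs -h" reports the same fields without any custom code.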

This suggests that there are real problems with CF cards getting old and
the wear levelling causing data to be shuffled onto worn out sectors.
And/Or it may prove that the wear leveling causes corruption if power is
removed during a write and sectors are only partly shuffled (which kind
of makes sense). Both ideas don't seem to be well talked about and
there is huge disagreement about the probable lifetimes of various flash
devices? Certainly I haven't ever had a bad device so I have never
really seen how they fail? However, I have experienced weird
corruptions (on windows!) with certain devices if I unplug them suddenly
(ie they lose power suddenly) while they are writing - this could
indicate that certain devices have poor implementations of wear levelling?

Interesting stuff... However, if switching to ext3 fixes things then
this sounds like an OS issue and not a CF card issue?

Cheers

Ed W
Re: embedded ext2 and fsck
> Hi David,
>
> On Tuesday, 06.04.2010, 10:16 -0400, Relson, David wrote:
> [...]
>
> > Over the past month I've encountered numerous "Stale NFS file handle"
> > errors. The device isn't networked and there's no apparent reason for
> > them (as best I can tell).
>
> i had the same problems on a (i think it was ext2) root-filesystem which
> was never written to, but was also not mounted ro.
>
> i thought that i would not need the fsck on boot up, because there are
> no write accesses to the device ...
>
> however after some months of operation the device failed to boot with
> exactly the same error message ...
>
> the reason i suspect was that due to power failures the ext2 got
> inconsistent somehow ... which resulted in "stale NFS file handle
> messages" ... not very intuitive ;)
>
> i put in a check & repair of the filesystem (with -y) on every boot and
> now those errors are gone ...
>
> i think the problem occurs when a not cleanly shut down ext2 fs gets
> mounted over and over again ... and ... maybe something got written to
> it even if i would not know ... it was a mini-itx running gentoo
> with /var (mostly) on another (rw) partition - but not very "embedded"
> in terms of stripped down ... ;)
>
> hope this helps,
> marcus.

Stale NFS socket you say... those meant my custom compilation passed away, as
the filesystem became unusable. I've chopped the flash and bought a new one.

Until now, I've managed to kill several flashes. Apparently they're not as
wearproof as vendors say. This could be coincidence, but all of them were
Kingstons with lifetime warranty.

Can I include a screenshot here? I manage my Personal Collection of Failures (TM)
and I've got a screenshot of this failure too :-))
Re: embedded ext2 and fsck
Relson, David wrote:
> My embedded environment is evolving. The Disk-On-Module currently
> has the following partitions:
> /dev/hda2 - / - root (ext2)
> /dev/hda1 - /boot - syslinux boot partition (FAT16)
> /dev/hda3 - /var - ext2, rw
..
> How important is running fsck in an embedded ext2 environment?

For read-only partitions on perfect media it is never needed.


> What do y'all do and recommend?

Since you are having problems related to writes, I would recommend
splitting things up so that you have one physical media which is
exclusively read-only, and another physical media which is
read-write. This is what I use for my customers.


Ed W wrote:
> 1) The CF card is quietly shuffling data around, so in theory it
> might move a good sector onto a patch of flash which is worn out,
> causing it to be corrupted on next read.

This will of course destroy a previously healthy ext2 fs.


> 2) Sudden shutdowns causing the ext2 to be marked dirty and causing
> subsequent problems (ie not fully read-only mounted)
>
> To be honest, I don't know a lot about how ext2 is mounted
> read-only, but option 2) above seems unlikely...?

If ext2 is mounted ro then it will never be written to by the kernel
and thus never corrupted by power failure.

Of course, if the media itself gets corrupted for whatever reason,
you lose anyway. Hence; use separate media.
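
Whether a partition really is mounted read-only can be checked from the options column of /proc/mounts. Note that an ext2 filesystem mounted rw gets its superblock updated at mount time even if no file is ever written, which may explain the "never written to, but also not mounted ro" experience reported earlier in the thread. A sketch; the function names are made up, and the sample text merely mirrors the partition layout from this thread.

```python
def mount_options(mounts_text):
    """Map each mount point to its set of options, given /proc/mounts content."""
    table = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            # fields: device, mount point, fs type, comma-separated options, ...
            table[fields[1]] = set(fields[3].split(","))
    return table

def is_read_only(mounts_text, mount_point):
    """True if mount_point is listed with the 'ro' option."""
    return "ro" in mount_options(mounts_text).get(mount_point, set())

SAMPLE = """\
/dev/hda2 / ext2 ro 0 0
/dev/hda3 /var ext2 rw,sync 0 0
"""
print(is_read_only(SAMPLE, "/"), is_read_only(SAMPLE, "/var"))
```

On the device itself, the content of /proc/mounts would be passed in place of SAMPLE.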


//Peter
RE: embedded ext2 and fsck
<history>For those who didn't read previous related threads, the
underlying problem encountered is "Stale NFS file handle" errors
appearing for no obvious reason. As best I recall, they have occurred
even when care has been taken to properly use halt, sync, shutdown, etc.
</history>

We're presently running with 3 partitions:

/dev/hda1 - /boot  FAT16,ro     - syslinux boot partition
/dev/hda2 - /      EXT2,ro      - linux system and application program
/dev/hda3 - /var   EXT2,rw,sync - data partition

The program is calling sync() after every call to close(). This is
slow, but the number of open,write,close,sync cycles is 4 per minute, so
the slowness is livable. Probably this redundant "belt and suspenders"
approach can be optimized to rw,async and sync(). An alternate idea is
to use FAT16 for the data partition (which would work fine because the
program has been ported from DOS and uses 8.3 filenames).
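
As a sketch of the "rw,async plus explicit flush" variant: instead of calling sync(), which flushes every dirty buffer in the system, the program can flush only the file it has just written with fsync(). This is shown in Python as notation; the original program's language isn't stated, and the function name is invented. The same calls (open, write, fsync, close) exist in C.

```python
import os

def append_record(path, data):
    """Append bytes to path and push them to the storage device before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # flush this one file, not the whole system
    finally:
        os.close(fd)
```

fsync() only guarantees that the kernel has handed the data to the device. Whatever the flash controller does internally afterwards is outside the program's control.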

Regards,

David
Re: embedded ext2 and fsck
> Ed W wrote:
>
>> 1) The CF card is quietly shuffling data around, so in theory it
>> might move a good sector onto a patch of flash which is worn out,
>> causing it to be corrupted on next read.
>>
> This will of course destroy a previously healthy ext2 fs.
>
>
>
>> 2) Sudden shutdowns causing the ext2 to be marked dirty and causing
>> subsequent problems (ie not fully read-only mounted)
>>
>> To be honest, I don't know a lot about how ext2 is mounted
>> read-only, but option 2) above seems unlikely...?
>>
> If ext2 is mounted ro then it will never be written to by the kernel
> and thus never corrupted by power failure.
>

Sure - but my theory was that badly implemented wear levelling + power
failure during writes could perhaps cause data to be lost on a read-only
partition when writing to another partition on the same media?

I have no basis for this claim, just pondering how wear levelling is
actually implemented in a random off the shelf device...?

I agree that separate media is an excellent idea, but it's not always
easy to achieve using off the shelf boards?

Anyway, curious to hear of anyone losing data on a read-only partition
in a manner like the above. It's perhaps only theoretical curiosity,
but...

Cheers

Ed W
Re: embedded ext2 and fsck
On 04/09/2010 02:24 PM, Relson, David wrote:
> We're presently running with 3 partitions:
>
> /dev/hda1 - /boot FAT16,ro - syslinux boot partition
> /dev/hda2 - / EXT2,ro - linux system and application program
> /dev/hda3 - /var EXT2,rw,sync - data partition
>
> The program is calling sync() after every call to close(). This is
> slow, but the number of open,write,close,sync cycles is 4 per minute, so
> the slowness is livable. Probably this redundant "belt and suspenders"
> approach can be optimized to rw,async and sync(). An alternate idea is
> to use FAT16 for the data partition (which would work fine because the
> program has been ported from DOS and uses 8.3 filenames).
>
> Regards,
>

I'm not sure what kind of lifetime you expect from your device, but if
you want to maximize it, you should have the RW partition on a separate
physical media. Partitions don't really mean anything to the hardware
wear leveling.
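
A back-of-envelope estimate shows why the write load matters. The figure of 4 write cycles per minute is from David's message; everything else (erase block count, erase endurance, write amplification) is a placeholder to be replaced with numbers from the device's datasheet, and the model assumes ideal wear levelling across the whole device, which real controllers may not achieve.

```python
MINUTES_PER_YEAR = 60 * 24 * 365

def writes_per_year(cycles_per_minute):
    """Number of write cycles issued by the application in one year."""
    return cycles_per_minute * MINUTES_PER_YEAR

def estimated_lifetime_years(cycles_per_minute, erase_blocks,
                             endurance_cycles, write_amplification):
    """Years until every erase block reaches its rated endurance.

    Assumes perfect wear levelling and that each application write costs
    `write_amplification` block erases; both are simplifying assumptions.
    """
    total_erases_available = erase_blocks * endurance_cycles
    erases_per_year = writes_per_year(cycles_per_minute) * write_amplification
    return total_erases_available / erases_per_year

print(writes_per_year(4))
```

At 4 cycles per minute the application issues 2,102,400 write cycles per year, so the result depends heavily on how many erases each of those writes really costs inside the device.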

If your data media breaks or wears out you just swap it for a new one.

--
Karl