Mailing List Archive

Re: Strange disk corruption with Linux >= 2.6.13
Hi Rogerio,

On Mon, Oct 03, 2005 at 01:17:19AM -0300, Rogério Brito wrote:
(...)
> The thing is that any stick alone doesn't seem to generate a problem.
> Only when they are used simultaneously.
>
> I will test it more to see what may be wrong with my setup. :-( I still
> have not isolated and understood the problem completely. :-(

This is a common problem caused by flaky motherboards and/or poor
power supplies. You should first take a look at your motherboard's
manual to see if it *really* supports your configuration. Often,
they won't support several dual-side sticks simply because there
are too many chips connected to each signal pin. For instance, my
mobo (A7M266-D) has a lot of trouble if I use more than 2 sticks,
and it is documented that I need registered RAM to do this.

Also, sometimes your mobo will not have been carefully tested by
the maker with every combination of memory sticks. It might be
your case. Sometimes it helps to increase the RAM voltage (you
might have a jumper for this on the mobo or may be able to do
this in the BIOS). In my case, it helped to set the RAM to 2.7V,
but that was not enough to get a stable setup.

A last possible source of trouble is the power supply. If it's
not strong enough to maintain a stable output voltage during
slightly higher current peaks, it can cause what you observe.

Hoping this helps,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Strange disk corruption with Linux >= 2.6.13
On Oct 02 2005, Bodo Eggert wrote:
> Rogério Brito <rbrito@ime.usp.br> wrote:
> > I removed what was extracted right away and tried again to extract
> > the tree (at this point, suspecting even that something in software
> > had problems). The problem with bzip2 occurred again. Then, I
> > rebooted the system and the problem magically went away.
>
> I have a similar problem:

I am still investigating the problem. I am not planning on resting right
now. I really want to understand what's going on with this system.

Too bad that I am quite naïve and don't understand much about hardware
in general. :-(

> This happens mostly if there are concurrent DMA transfers like playing
> sound or watching TV on bttv cards. I'm affected by the latter cause;
> setting no_overlay reduced it.

Humm, I think that I may have seen something like this in the past: I
have two CD readers here (both with DMA turned on) and I was once
extracting audio to be converted to MP3 and I noticed one strange
corruption that I have not been able to reproduce again:

Bits of what was extracted from one file appeared in the other disc and
the result was something like a mix of static and alternation between
the two music sources. Weird, huh?


Thanks for the concern, Rogério Brito.

P.S.: I will reboot my system and force an fsck as soon as I can, just
in case.
--
Rogério Brito : rbrito@ime.usp.br : http://www.ime.usp.br/~rbrito
Homepage of the algorithms package : http://algorithms.berlios.de
Homepage on freshmeat: http://freshmeat.net/projects/algorithms/
Re: Strange disk corruption with Linux >= 2.6.13
Hi.

On Sun, 2005-10-02 at 07:36, Rogério Brito wrote:
> On Sep 28 2005, Nigel Cunningham wrote:
> > Hi Rogerio.
>
> Hi, Nigel.
>
> > On Tue, 2005-09-27 at 21:10, Rogério Brito wrote:
> > > Hi there. I'm seeing a really strange problem on my system lately and I
> > > am not really sure that it has anything to do with the kernels.
> >
> > I've seen the thread mostly following the hardware line. I'd like to
> > enquire down the kernel path because I've seen occasional, impossible
> > to reproduce problems too.
>
> Nice. I also don't want to rule out anything before I really understand
> what's going on.
>
> > Can I ask first a few questions:
>
> Of course.
>
> > 1) Are you using vanilla kernels, or do you have other patches applied?
>
> Yes, all the kernels that I use are just plain vanilla kernels taken
> straight from kernel.org. No other patches applied.

Ok. That's helpful.

> > 2) Are you using ext3 only?
>
> Yes, I am.
>
> > 3) Is the corruption only ever in memory, or seen on disk too?
>
> I have noticed the problem mostly on disk. One strange situation was
> when I was untarring a kernel tree (compressed with bzip2) and in the
> middle of the extraction, bzip2 complained that the thing was
> corrupted.
>
> I removed what was extracted right away and tried again to extract the
> tree (at this point, suspecting even that something in software had
> problems). The problem with bzip2 occurred again. Then, I rebooted the
> system and the problem magically went away.

If you see it in a form where you can see the amount of corruption, can
you see if it is just four bytes?

I'm asking because I have recently started seeing
impossible-to-reliably-reproduce corruption here, which seems to be only
four bytes at a time, in memory originally but possibly also appearing
on disk (probably because of syncing). I originally wondered if it might
be Suspend2 related (in the first instance, assume I messed up :)), but
I haven't been sure. The corruption I'm seeing only affects the root
filesystem. None of this makes much sense if I assume it's a Suspend2
bug. I could have a bad pointer access somewhere, but the rest is just
confusing.
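
One way to answer the four-byte question, assuming a known-good copy of
the affected file is available (for example a freshly downloaded
tarball), is to compare the two copies byte by byte and list each run of
consecutive differing bytes. The following is only a minimal sketch; the
function names are made up for illustration and are not part of any
existing tool.

```python
def corrupted_runs(good: bytes, bad: bytes):
    """Return (offset, length) for each run of consecutive differing bytes.

    Only the common prefix of the two buffers is compared.
    """
    runs = []
    start = None
    limit = min(len(good), len(bad))
    for i in range(limit):
        if good[i] != bad[i]:
            if start is None:
                start = i          # a new run of differing bytes begins
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the data
        runs.append((start, limit - start))
    return runs


def compare_files(good_path, bad_path):
    """Read both files fully and report the differing runs."""
    with open(good_path, "rb") as g, open(bad_path, "rb") as b:
        return corrupted_runs(g.read(), b.read())
```

Calling compare_files("good.tar", "suspect.tar") (hypothetical file
names) returns a list such as [(offset, 4)] if exactly one four-byte
region differs. A corrupted region can show up as a shorter run when
some of the overwritten bytes happen to match the original values.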

Regards,

Nigel

> > 4) Is the corruption only in one filesystem or spread across several
> > (if applicable)? (ie in / but not /home or others?)
>
> I only have one filesystem right now, but given the difficulties that
> I'm seeing, I do plan to go back to a multiple filesystem setup (which I
> always used but thought that was overkill---nothing like time to teach
> us what is safest).
>
> If you want to know anything else, don't hesitate to ask.
>
>
> Regards,
--


Re: Strange disk corruption with Linux >= 2.6.13
On Sep 29 2005, Alan Cox wrote:
> Some fixes went in early 2.4 and they got refined later on (see the
> function quirk_vialatency). There is a brief summary at the first URL
> listed still. Essentially the chip has a flaw where it can lose a
> transfer.
>
> If people see this behaviour on a KT133 can you please check the quirk
> is being run and displaying
>
> printk(KERN_INFO "Applying VIA southbridge workaround.\n");

Just for information, I get the following messages on my system:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
rbrito@dumont:~$ dmesg | grep -i via
Disabling VIA memory write queue (PCI ID 0305, rev 02): [55] 89 & 1f -> 09
PCI: Disabling Via external APIC routing
agpgart: Detected VIA Twister-K/KT133x/KM133 chipset
parport_pc: VIA 686A/8231 detected
parport_pc: VIA parallel port: io=0x378, irq=7
VP_IDE: VIA vt82c686a (rev 22) IDE UDMA66 controller on pci0000:00:04.1
Netfilter messages via NETLINK v0.30.
rbrito@dumont:~$ dmesg | grep -i memor
Memory: 775776k/786352k available (1847k kernel code, 10076k reserved, 733k data, 148k init, 0k highmem)
Disabling VIA memory write queue (PCI ID 0305, rev 02): [55] 89 & 1f -> 09
Non-volatile memory driver v1.2
Freeing unused kernel memory: 148k freed
rbrito@dumont:~$
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Is this what is supposed to appear when one is using a 2.6.1x kernel?
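
The check itself can be automated. The sketch below searches saved dmesg
output for the two VIA-related messages mentioned in this thread: the
printk text quoted by Alan Cox and the "memory write queue" line from the
output above. Whether a given kernel prints one or the other is not
something this sketch assumes; it only reports which lines are present.
The function name is made up for illustration.

```python
# Message fragments taken from this thread; no other wording is assumed.
KNOWN_MESSAGES = (
    "Applying VIA southbridge workaround.",
    "Disabling VIA memory write queue",
)


def find_via_messages(dmesg_text: str):
    """Map each known message fragment to the dmesg lines containing it."""
    lines = dmesg_text.splitlines()
    return {
        msg: [line for line in lines if msg in line]
        for msg in KNOWN_MESSAGES
    }
```

The text can come from a file written with "dmesg > dmesg.txt". An empty
list for a fragment means that message was not found in the saved output.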


Thanks for any hints, Rogério Brito.

--
Rogério Brito : rbrito@ime.usp.br : http://www.ime.usp.br/~rbrito
Homepage of the algorithms package : http://algorithms.berlios.de
Homepage on freshmeat: http://freshmeat.net/projects/algorithms/
Re: Strange disk corruption with Linux >= 2.6.13
Hi, Ville.

On Sep 28 2005, Ville Herva wrote:
> You may be running into this problem:
>
> http://www.uwsg.iu.edu/hypermail/linux/kernel/0207.2/0574.html
> http://www.cs.helsinki.fi/linux/linux-kernel/2002-02/1727.html
> http://www.cs.helsinki.fi/linux/linux-kernel/2002-01/1048.html
> http://marc.theaimsgroup.com/?l=linux-kernel&m=99889965423508&w=2
>
> (A google search will turn up more.)

Thank you very much for these links. It seems that I may not be alone
here, unfortunately. :-(

> Placing network card to a different PCI slot helped somewhat as did
> upgrading the bios.

I have not played with the network cards, but I have already upgraded
the BIOS firmware to the latest version that I could find (in the hope
that I could get the Duron 1.3GHz being actually identified as such,
instead of operating at 1.1GHz).

> It seemed to be a KT133 Northbridge DMA issue. My impression is that
> KT133 is utter crap period.

Well, is this a problem particular to KT133 or is it a generic thing
with VIA chipsets?

I'm interested because I don't know the other chipset options that are
Open Source friendly---it seems that Nvidia-based ones have to have
reverse-engineered drivers (e.g., forcedeth), which is quite bad, IMO.

I'm intending to get another system as soon as the dust settles and
x86_64 and SATA drives become mainstream enough to be readily available
here in Brazil for reasonable prices.

But, then, I'd be concerned about getting a chipset from a company that
plays nice with Linux (and the *BSDs too, for that matter). Opinions are
more than welcome.


Thanks, Rogério Brito.

--
Rogério Brito : rbrito@ime.usp.br : http://www.ime.usp.br/~rbrito
Homepage of the algorithms package : http://algorithms.berlios.de
Homepage on freshmeat: http://freshmeat.net/projects/algorithms/
Re: Strange disk corruption with Linux >= 2.6.13
On Sat, Oct 01, 2005 at 06:28:06PM -0300, Rogério Brito wrote:
> Right now, I'm using just a single 512MB module, but it is single-sided
> (I guess that by double-sided you guys mean that it has chips on both
> sides of the module, right?). The only double-sided module that I have
> here is the 256MB module.
>
> OTOH, with just one 512MB everything *seems* to be working fine, but,
> honestly, I'm not sure.

Well maybe a single sided 512M can still have the same interface as a
double sided. Depends how it is wired I suppose.

> Hummm, nice to see that you have also experienced this. With 256 + 128,
> I had to use PC100 to have it work stably.
>
> I'd obviously prefer to have everything working at PC133 speed, but
> wouldn't mind running at PC100 speed if I could use everything, since I
> sometimes need to use some large programs (for some dynamic programming
> problems).

Actually you probably DON'T want the RAM to run at PC133, since at PC133
the latency is a bit higher (in clock counts) than at PC100, so overall
the latency stays about the same. On the other hand, running the RAM
asynchronously from the front side bus of the CPU makes getting memory
accesses aligned more complicated and inserts different delays. So most
likely the system really runs fastest when the RAM matches the CPU bus
speed, which on an A7V is 100MHz (it never actually supported any
133FSB CPUs; you needed the fixed KT133A chipset for that, which the
A7V-E had on it). I also only run a 700MHz CPU, so heat isn't a problem.
I know the 1GHz CPU made a lot of heat and really needed good cooling. I
don't remember what CPU speed you have.
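
The latency point can be checked with simple arithmetic: a latency of N
clock cycles lasts N divided by the clock frequency. The sketch below
uses CAS latency 2 at 100MHz and CAS latency 3 at 133MHz purely as
assumed example timings; the actual timings of the modules discussed
here are not stated in this message.

```python
def cycles_to_ns(cycles: int, clock_mhz: float) -> float:
    """Convert a latency given in clock cycles to nanoseconds."""
    return cycles * 1000.0 / clock_mhz


# Assumed timings, for illustration only.
pc100_ns = cycles_to_ns(2, 100.0)   # CL2 at 100MHz
pc133_ns = cycles_to_ns(3, 133.0)   # CL3 at 133MHz

print(f"PC100 CL2: {pc100_ns:.1f} ns")
print(f"PC133 CL3: {pc133_ns:.1f} ns")
```

With these assumed numbers the two come out at roughly 20 ns and 22.6 ns,
so the faster clock does not give a lower absolute latency.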

Len Sorensen
Re: Strange disk corruption with Linux >= 2.6.13
Rogério Brito wrote (ao):
> On Sep 28 2005, Nigel Cunningham wrote:
> > 3) Is the corruption only ever in memory, or seen on disk too?
>
> I have noticed the problem mostly on disk. One strange situation was
> when I was untarring a kernel tree (compressed with bzip2) and in the
> middle of the extraction, bzip2 complained that the thing was
> corrupted.
>
> I removed what was extracted right away and tried again to extract the
> tree (at this point, suspecting even that something in software had
> problems). The problem with bzip2 occurred again. Then, I rebooted the
> system and the problem magically went away.

That would mean the corruption existed in memory only. The kernel
tarball got sucked into memory and got corrupted. On reboot, the tarball
gets read in again, and this time no corruption. The on-disk tarball was
OK, it seems.

If you run memtest86+ (latest version) for at least 24 hours it _should_
find something.
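
A quick user-space check along the same lines, as a rough complement to
memtest86+ and not a replacement for it, is to hash the tarball several
times and compare the digests. This is only a sketch with made-up
function names. Note the assumption behind it: repeated reads are
normally served from the page cache, so matching digests do not prove
the on-disk copy is good, but digests that differ between reads of an
unchanged file point at flaky memory or transfers.

```python
import hashlib


def read_digests(path, rounds=3, chunk_size=1 << 20):
    """Hash the file `rounds` times and return the hex digests."""
    digests = []
    for _ in range(rounds):
        h = hashlib.sha1()
        with open(path, "rb") as f:
            # Read in chunks so large tarballs need not fit in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        digests.append(h.hexdigest())
    return digests


def reads_agree(digests):
    """True if every read of the file produced the same digest."""
    return len(set(digests)) <= 1
```

For example, reads_agree(read_digests("linux.tar.bz2", rounds=5)), with a
hypothetical file name, should be True on a healthy machine.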

Kind regards, Sander

--
Humilis IT Services and Solutions
http://www.humilis.net
Re: Strange disk corruption with Linux >= 2.6.13
Hi

On Tue, 2005-10-04 at 20:28, Sander wrote:
> Rogério Brito wrote (ao):
> > On Sep 28 2005, Nigel Cunningham wrote:
> > > 3) Is the corruption only ever in memory, or seen on disk too?
> >
> > I have noticed the problem mostly on disk. One strange situation was
> > when I was untarring a kernel tree (compressed with bzip2) and in the
> > middle of the extraction, bzip2 complained that the thing was
> > corrupted.
> >
> > I removed what was extracted right away and tried again to extract the
> > tree (at this point, suspecting even that something in software had
> > problems). The problem with bzip2 occurred again. Then, I rebooted the
> > system and the problem magically went away.
>
> That would mean the corruption existed in memory only. The kernel
> tarball got sucked into memory and got corrupted. On reboot, the tarball
> gets read in again, and this time no corruption. The on-disk tarball was
> OK, it seems.
>
> If you run memtest86+ (latest version) for at least 24 hours it _should_
> find something.

Assuming that it really is a memory issue. Don't discount the
possibility of a kernel bug too quickly, especially when it apparently
worked fine in the past.

Just my 2c, feel free to discount anyway :)

Regards,

Nigel

Re: Strange disk corruption with Linux >= 2.6.13
Hi Rogério,

Sorry, was away for a week.

On Sat, 1 Oct 2005, Rogério Brito wrote:

> > Try removing the 256MB module?...
>
> Right now, I'm only using one 512MB module, but I have already
> paid for the second one, and it wasn't cheap. :-(

Wasn't it 512 + 512 + 256 MB modules that you had? I just suggested
removing only one 256MB module and testing with 2 x 512MB. That, on the
one hand, wouldn't be as bad as only having 512MB, and on the other hand
it's just for a test...

Good luck
Guennadi
---
Guennadi Liakhovetski
Re: Strange disk corruption with Linux >= 2.6.13
Hi, Guennadi.

On Oct 09 2005, Guennadi Liakhovetski wrote:
> Sorry, was away for a week.

No problems. I've been quite busy also.

> On Sat, 1 Oct 2005, Rogério Brito wrote:
>
> > > Try removing the 256MB module?...
> >
> > Right now, I'm only using one 512MB module, but I have already
> > paid for the second one, and it wasn't cheap. :-(
>
> Wasn't it 512 + 512 + 256 MB modules that you had?

Exactly, but I didn't manage to get the 2x512MB modules useable in my
machine. In fact, sometimes the machine wouldn't even POST with the two
modules, but as soon as I removed any one of them, the machine was back
to normal.

> I just suggested removing only one 256MB module and testing with 2 x
> 512MB. That, on the one hand, wouldn't be as bad as only having
> 512MB, and on the other hand it's just for a test...

Right now, I am using 512 + 256 running at PC100 speeds, with latencies
all set to 3-3-3. Now, it seems to run stably, but is slower than what I
would like it to run, of course.

I will still keep trying some combinations, but some of them seem
definitely ruled out (like having both 512 MB modules at the same time).


Thank you very much for your comments, Rogério Brito.

--
Rogério Brito : rbrito@ime.usp.br : http://www.ime.usp.br/~rbrito
Homepage of the algorithms package : http://algorithms.berlios.de
Homepage on freshmeat: http://freshmeat.net/projects/algorithms/
