U60-SMP + irq patch + kernel-2.6.11-r7 'crashme' crash
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

To follow up on my IRC note: Pass 2 of your crashme script killed
the U60, configuration logs from which I just sent you. The
script was copying from /dev/sdb4 --> /dev/sda4.

Nothing running but (1) console, (2) a remote ssh from which I was
running the script, (3) possibly an attempt at a 'scp'.

Nothing whatsoever using -X- was active.

Regards,
Ferris
- --
Ferris McCormick (P44646, MI) <fmccor@gentoo.org>
Developer, Gentoo Linux (Sparc)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFCe2EHQa6M3+I///cRAmMYAKDd+PVT6gNWgcyLLrpS0mddG8YUDwCg5GQK
QLtKFcF5QkeLgLsO4s/XAXY=
=/juE
-----END PGP SIGNATURE-----
--
gentoo-sparc@gentoo.org mailing list
Re: U60-SMP + irq patch + kernel-2.6.11-r7 'crashme' crash
> To follow up on my IRC note: Pass 2 of your crashme script killed
> the U60, configuration logs from which I just sent you. The
> script was copying from /dev/sdb4 --> /dev/sda4.

What scsi host are those drives on? Can you try doing this without smp
enabled? Weeve said his UP blade was having issues under high scsi
activity, so it may be related to that rather than SMP.

--Jeremy
Re: U60-SMP + irq patch + kernel-2.6.11-r7 'crashme' crash
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Jeremy,

In both cases I've documented so far (my Netra 1405 and Ferris' Ultra 60),
the SCSI controller is a Symbios 53c875 rev 14. I have tried a UP kernel, and
that made no difference. I have also verified that the problem occurs when the
script runs on a single partition rather than across two controllers or two
drives.

It would be nice to try this on something that uses a different SCSI driver,
although I can say that my E450, which uses a 53c875 rev 03, doesn't crash.

Josh

Jeremy Huddleston wrote:
>>To follow up on my IRC note: Pass 2 of your crashme script killed
>>the U60, configuration logs from which I just sent you. The
>>script was copying from /dev/sdb4 --> /dev/sda4.
>
>
> What scsi host are those drives on? Can you try doing this without smp
> enabled? Weeve said his UP blade was having issues under high scsi
> activity, so it may be related to that rather than SMP.
>
> --Jeremy
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCf00PFAhB33r2ACYRAsFbAJkBLpaZgJznyhbXy9sy+kayiNCURACeLgFM
oPbRwVyK5tKRrOYlM9CJ+To=
=TiGN
-----END PGP SIGNATURE-----
Re: U60-SMP + irq patch + kernel-2.6.11-r7 'crashme' crash
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, 9 May 2005, Jeremy Huddleston wrote:

> What scsi host are those drives on? Can you try doing this without smp
> enabled? Weeve said his UP blade was having issues under high scsi
> activity, so it may be related to that rather than SMP.

Yes, I put a Sun-compatible Symbios Logic 53c875 (rev 26) card in my Blade
100 system, along with an 18GB Hitachi DK32DJ-18MW drive. As it is one of
my primary build systems, and Portage's temp and package directories are
on the drive, it typically sees at least a couple of hours of compilation
and heavy disk I/O a day.

Previously, a vanilla 2.6.6 kernel was the only 2.6 kernel I could use for
any length of time without lockups. Right now it's running
gentoo-sources-2.6.11-r4 and has been up for a little over 10 days. I'll
keep folks posted if this kernel appears to tank.

Cheers,
- --
Jason Wever
Gentoo/Sparc Co-Team Lead
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD4DBQFCf6l+dKvgdVioq28RApJcAJjkeSbaheY4un5FZSOl91CiqzZBAKCj5fK6
iY7lApd/tjXmKgx/fwrnBQ==
=w0o8
-----END PGP SIGNATURE-----
Re: U60-SMP + irq patch + kernel-2.6.11-r7 'crashme' crash
On 5/9/05 11:18 AM, Jason Wever wrote:

>> What scsi host are those drives on? Can you try doing this without smp
>> enabled? Weeve said his UP blade was having issues under high scsi
>> activity, so it may be related to that rather than SMP.

I've had trouble on a UP Blade 100 with any of the 2.6.x kernels I've
built. All have been vanilla from kernel.org, and all have choked (hard)
within a few days of boot. My box is stock from Sun -- no SCSI, no
add-ons, no nothin'.

The 2.4.x kernels will run for months, but I've sworn off 2.6 for now.

--
Paul Heinlein <> heinlein@madboa.com <> www.madboa.com
Re: U60-SMP + irq patch + kernel-2.6.11-r7 'crashme' crash
My two strongest theories are that either the PCI streaming
buffer sequence is hanging, or there is some bum pointer
reference that hangs the box (the kernel TLB miss handler
is optimized to the point where certain bad kernel pointer
accesses wedge the chip instead of invoking the fault handler
to print a message, sorry).

Anyways, the patch below implements a timeout for the streaming
buffer flushes. People seeing the sym2 system hangs should
give this a try.
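
The bounded wait that replaces the old unbounded spin can be illustrated in user space. This is only a sketch under stated assumptions: fake_strbuf, fake_hw_tick() and wait_for_flushflag() are hypothetical names, and the "hardware" is a counter that sets the flag after a chosen number of polls (or never, to model the hardware-error case). The loop shape follows sbus_strbuf_flush() in the patch: decrement the budget, give up when it reaches zero, otherwise delay and re-check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the streaming buffer: the flush flag is
 * set after 'ready_after' polls, or never if ready_after < 0 (the
 * "hardware error" case the commit message describes). */
struct fake_strbuf {
	volatile unsigned long flushflag;
	int polls;
	int ready_after;
};

/* Hypothetical: advance the simulated hardware by one poll interval.
 * In the patched kernel code this slot is udelay(10) plus a membar. */
static void fake_hw_tick(struct fake_strbuf *sb)
{
	sb->polls++;
	if (sb->ready_after >= 0 && sb->polls >= sb->ready_after)
		sb->flushflag = 1UL;
}

/* Same loop shape as the patched sbus_strbuf_flush(): decrement the
 * budget, break when it reaches zero, otherwise wait and re-check.
 * Returns true if the flag was observed, false on timeout. */
static bool wait_for_flushflag(struct fake_strbuf *sb, int limit)
{
	assert(limit > 0); /* limit <= 0 would never reach the !limit exit */

	while (sb->flushflag == 0UL) {
		limit--;
		if (!limit)
			break;
		fake_hw_tick(sb);
	}
	if (!limit) {
		fprintf(stderr, "flushflag timeout after %d polls\n", sb->polls);
		return false;
	}
	return true;
}
```

With ready_after set to 5 the call returns true after five polls; with ready_after set to -1 it returns false after 9,999 polls. In the patch, a limit of 10000 with udelay(10) per iteration means a wedged flush gives up after roughly 100 ms and prints a KERN_WARNING instead of hanging the box.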

[SPARC64]: Add timeout for streaming buffer flushes.

If there is some hardware error, we will spin forever waiting
for the streaming buffer flush flag to get updated. This is
asking for trouble, and will result in difficult to diagnose
wedged systems if the condition should ever occur.

Signed-off-by: David S. Miller <davem@davemloft.net>

--- 1/arch/sparc64/kernel/sbus.c.~1~ 2005-05-05 17:15:57.000000000 -0700
+++ 2/arch/sparc64/kernel/sbus.c 2005-05-09 15:11:02.000000000 -0700
@@ -117,19 +117,34 @@ static void iommu_flush(struct sbus_iomm

#define STRBUF_TAG_VALID 0x02UL

-static void strbuf_flush(struct sbus_iommu *iommu, u32 base, unsigned long npages)
+static void sbus_strbuf_flush(struct sbus_iommu *iommu, u32 base, unsigned long npages)
{
+ unsigned long n;
+ int limit;
+
iommu->strbuf_flushflag = 0UL;
- while (npages--)
- upa_writeq(base + (npages << IO_PAGE_SHIFT),
+ n = npages;
+ while (n--)
+ upa_writeq(base + (n << IO_PAGE_SHIFT),
iommu->strbuf_regs + STRBUF_PFLUSH);

/* Whoopee cushion! */
upa_writeq(__pa(&iommu->strbuf_flushflag),
iommu->strbuf_regs + STRBUF_FSYNC);
upa_readq(iommu->sbus_control_reg);
- while (iommu->strbuf_flushflag == 0UL)
+
+ limit = 10000;
+ while (iommu->strbuf_flushflag == 0UL) {
+ limit--;
+ if (!limit)
+ break;
+ udelay(10);
membar("#LoadLoad");
+ }
+ if (!limit)
+ printk(KERN_WARNING "sbus_strbuf_flush: flushflag timeout "
+ "vaddr[%08x] npages[%ld]\n",
+ base, npages);
}

static iopte_t *alloc_streaming_cluster(struct sbus_iommu *iommu, unsigned long npages)
@@ -406,7 +421,7 @@ void sbus_unmap_single(struct sbus_dev *

spin_lock_irqsave(&iommu->lock, flags);
free_streaming_cluster(iommu, dma_base, size >> IO_PAGE_SHIFT);
- strbuf_flush(iommu, dma_base, size >> IO_PAGE_SHIFT);
+ sbus_strbuf_flush(iommu, dma_base, size >> IO_PAGE_SHIFT);
spin_unlock_irqrestore(&iommu->lock, flags);
}

@@ -569,7 +584,7 @@ void sbus_unmap_sg(struct sbus_dev *sdev
iommu = sdev->bus->iommu;
spin_lock_irqsave(&iommu->lock, flags);
free_streaming_cluster(iommu, dvma_base, size >> IO_PAGE_SHIFT);
- strbuf_flush(iommu, dvma_base, size >> IO_PAGE_SHIFT);
+ sbus_strbuf_flush(iommu, dvma_base, size >> IO_PAGE_SHIFT);
spin_unlock_irqrestore(&iommu->lock, flags);
}

@@ -581,7 +596,7 @@ void sbus_dma_sync_single_for_cpu(struct
size = (IO_PAGE_ALIGN(base + size) - (base & IO_PAGE_MASK));

spin_lock_irqsave(&iommu->lock, flags);
- strbuf_flush(iommu, base & IO_PAGE_MASK, size >> IO_PAGE_SHIFT);
+ sbus_strbuf_flush(iommu, base & IO_PAGE_MASK, size >> IO_PAGE_SHIFT);
spin_unlock_irqrestore(&iommu->lock, flags);
}

@@ -605,7 +620,7 @@ void sbus_dma_sync_sg_for_cpu(struct sbu
size = IO_PAGE_ALIGN(sg[i].dma_address + sg[i].dma_length) - base;

spin_lock_irqsave(&iommu->lock, flags);
- strbuf_flush(iommu, base, size >> IO_PAGE_SHIFT);
+ sbus_strbuf_flush(iommu, base, size >> IO_PAGE_SHIFT);
spin_unlock_irqrestore(&iommu->lock, flags);
}

--- 1/arch/sparc64/kernel/pci_iommu.c.~1~ 2005-05-05 17:15:57.000000000 -0700
+++ 2/arch/sparc64/kernel/pci_iommu.c 2005-05-09 15:09:12.000000000 -0700
@@ -8,6 +8,7 @@
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/mm.h>
+#include <linux/delay.h>

#include <asm/pbm.h>

@@ -379,6 +380,54 @@ bad:
return PCI_DMA_ERROR_CODE;
}

+static void pci_strbuf_flush(struct pci_strbuf *strbuf, struct pci_iommu *iommu, u32 vaddr, unsigned long ctx, unsigned long npages)
+{
+ int limit;
+
+ PCI_STC_FLUSHFLAG_INIT(strbuf);
+ if (strbuf->strbuf_ctxflush &&
+ iommu->iommu_ctxflush) {
+ unsigned long matchreg, flushreg;
+
+ flushreg = strbuf->strbuf_ctxflush;
+ matchreg = PCI_STC_CTXMATCH_ADDR(strbuf, ctx);
+
+ limit = 10000;
+ do {
+ pci_iommu_write(flushreg, ctx);
+ udelay(10);
+ limit--;
+ if (!limit)
+ break;
+ } while(((long)pci_iommu_read(matchreg)) < 0L);
+ if (!limit)
+ printk(KERN_WARNING "pci_strbuf_flush: ctx flush "
+ "timeout vaddr[%08x] ctx[%lx]\n",
+ vaddr, ctx);
+ } else {
+ unsigned long i;
+
+ for (i = 0; i < npages; i++, vaddr += IO_PAGE_SIZE)
+ pci_iommu_write(strbuf->strbuf_pflush, vaddr);
+ }
+
+ pci_iommu_write(strbuf->strbuf_fsync, strbuf->strbuf_flushflag_pa);
+ (void) pci_iommu_read(iommu->write_complete_reg);
+
+ limit = 10000;
+ while (!PCI_STC_FLUSHFLAG_SET(strbuf)) {
+ limit--;
+ if (!limit)
+ break;
+ udelay(10);
+ membar("#LoadLoad");
+ }
+ if (!limit)
+ printk(KERN_WARNING "pci_strbuf_flush: flushflag timeout "
+ "vaddr[%08x] ctx[%lx] npages[%ld]\n",
+ vaddr, ctx, npages);
+}
+
/* Unmap a single streaming mode DMA translation. */
void pci_unmap_single(struct pci_dev *pdev, dma_addr_t bus_addr, size_t sz, int direction)
{
@@ -386,7 +435,7 @@ void pci_unmap_single(struct pci_dev *pd
struct pci_iommu *iommu;
struct pci_strbuf *strbuf;
iopte_t *base;
- unsigned long flags, npages, i, ctx;
+ unsigned long flags, npages, ctx;

if (direction == PCI_DMA_NONE)
BUG();
@@ -414,29 +463,8 @@ void pci_unmap_single(struct pci_dev *pd
ctx = (iopte_val(*base) & IOPTE_CONTEXT) >> 47UL;

/* Step 1: Kick data out of streaming buffers if necessary. */
- if (strbuf->strbuf_enabled) {
- u32 vaddr = bus_addr;
-
- PCI_STC_FLUSHFLAG_INIT(strbuf);
- if (strbuf->strbuf_ctxflush &&
- iommu->iommu_ctxflush) {
- unsigned long matchreg, flushreg;
-
- flushreg = strbuf->strbuf_ctxflush;
- matchreg = PCI_STC_CTXMATCH_ADDR(strbuf, ctx);
- do {
- pci_iommu_write(flushreg, ctx);
- } while(((long)pci_iommu_read(matchreg)) < 0L);
- } else {
- for (i = 0; i < npages; i++, vaddr += IO_PAGE_SIZE)
- pci_iommu_write(strbuf->strbuf_pflush, vaddr);
- }
-
- pci_iommu_write(strbuf->strbuf_fsync, strbuf->strbuf_flushflag_pa);
- (void) pci_iommu_read(iommu->write_complete_reg);
- while (!PCI_STC_FLUSHFLAG_SET(strbuf))
- membar("#LoadLoad");
- }
+ if (strbuf->strbuf_enabled)
+ pci_strbuf_flush(strbuf, iommu, bus_addr, ctx, npages);

/* Step 2: Clear out first TSB entry. */
iopte_make_dummy(iommu, base);
@@ -647,29 +675,8 @@ void pci_unmap_sg(struct pci_dev *pdev,
ctx = (iopte_val(*base) & IOPTE_CONTEXT) >> 47UL;

/* Step 1: Kick data out of streaming buffers if necessary. */
- if (strbuf->strbuf_enabled) {
- u32 vaddr = (u32) bus_addr;
-
- PCI_STC_FLUSHFLAG_INIT(strbuf);
- if (strbuf->strbuf_ctxflush &&
- iommu->iommu_ctxflush) {
- unsigned long matchreg, flushreg;
-
- flushreg = strbuf->strbuf_ctxflush;
- matchreg = PCI_STC_CTXMATCH_ADDR(strbuf, ctx);
- do {
- pci_iommu_write(flushreg, ctx);
- } while(((long)pci_iommu_read(matchreg)) < 0L);
- } else {
- for (i = 0; i < npages; i++, vaddr += IO_PAGE_SIZE)
- pci_iommu_write(strbuf->strbuf_pflush, vaddr);
- }
-
- pci_iommu_write(strbuf->strbuf_fsync, strbuf->strbuf_flushflag_pa);
- (void) pci_iommu_read(iommu->write_complete_reg);
- while (!PCI_STC_FLUSHFLAG_SET(strbuf))
- membar("#LoadLoad");
- }
+ if (strbuf->strbuf_enabled)
+ pci_strbuf_flush(strbuf, iommu, bus_addr, ctx, npages);

/* Step 2: Clear out first TSB entry. */
iopte_make_dummy(iommu, base);
@@ -715,28 +722,7 @@ void pci_dma_sync_single_for_cpu(struct
}

/* Step 2: Kick data out of streaming buffers. */
- PCI_STC_FLUSHFLAG_INIT(strbuf);
- if (iommu->iommu_ctxflush &&
- strbuf->strbuf_ctxflush) {
- unsigned long matchreg, flushreg;
-
- flushreg = strbuf->strbuf_ctxflush;
- matchreg = PCI_STC_CTXMATCH_ADDR(strbuf, ctx);
- do {
- pci_iommu_write(flushreg, ctx);
- } while(((long)pci_iommu_read(matchreg)) < 0L);
- } else {
- unsigned long i;
-
- for (i = 0; i < npages; i++, bus_addr += IO_PAGE_SIZE)
- pci_iommu_write(strbuf->strbuf_pflush, bus_addr);
- }
-
- /* Step 3: Perform flush synchronization sequence. */
- pci_iommu_write(strbuf->strbuf_fsync, strbuf->strbuf_flushflag_pa);
- (void) pci_iommu_read(iommu->write_complete_reg);
- while (!PCI_STC_FLUSHFLAG_SET(strbuf))
- membar("#LoadLoad");
+ pci_strbuf_flush(strbuf, iommu, bus_addr, ctx, npages);

spin_unlock_irqrestore(&iommu->lock, flags);
}
@@ -749,7 +735,8 @@ void pci_dma_sync_sg_for_cpu(struct pci_
struct pcidev_cookie *pcp;
struct pci_iommu *iommu;
struct pci_strbuf *strbuf;
- unsigned long flags, ctx;
+ unsigned long flags, ctx, npages, i;
+ u32 bus_addr;

pcp = pdev->sysdata;
iommu = pcp->pbm->iommu;
@@ -772,36 +759,14 @@ void pci_dma_sync_sg_for_cpu(struct pci_
}

/* Step 2: Kick data out of streaming buffers. */
- PCI_STC_FLUSHFLAG_INIT(strbuf);
- if (iommu->iommu_ctxflush &&
- strbuf->strbuf_ctxflush) {
- unsigned long matchreg, flushreg;
-
- flushreg = strbuf->strbuf_ctxflush;
- matchreg = PCI_STC_CTXMATCH_ADDR(strbuf, ctx);
- do {
- pci_iommu_write(flushreg, ctx);
- } while (((long)pci_iommu_read(matchreg)) < 0L);
- } else {
- unsigned long i, npages;
- u32 bus_addr;
-
- bus_addr = sglist[0].dma_address & IO_PAGE_MASK;
-
- for(i = 1; i < nelems; i++)
- if (!sglist[i].dma_length)
- break;
- i--;
- npages = (IO_PAGE_ALIGN(sglist[i].dma_address + sglist[i].dma_length) - bus_addr) >> IO_PAGE_SHIFT;
- for (i = 0; i < npages; i++, bus_addr += IO_PAGE_SIZE)
- pci_iommu_write(strbuf->strbuf_pflush, bus_addr);
- }
-
- /* Step 3: Perform flush synchronization sequence. */
- pci_iommu_write(strbuf->strbuf_fsync, strbuf->strbuf_flushflag_pa);
- (void) pci_iommu_read(iommu->write_complete_reg);
- while (!PCI_STC_FLUSHFLAG_SET(strbuf))
- membar("#LoadLoad");
+ bus_addr = sglist[0].dma_address & IO_PAGE_MASK;
+ for(i = 1; i < nelems; i++)
+ if (!sglist[i].dma_length)
+ break;
+ i--;
+ npages = (IO_PAGE_ALIGN(sglist[i].dma_address + sglist[i].dma_length)
+ - bus_addr) >> IO_PAGE_SHIFT;
+ pci_strbuf_flush(strbuf, iommu, bus_addr, ctx, npages);

spin_unlock_irqrestore(&iommu->lock, flags);
}
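
In the pci_dma_sync_sg_for_cpu() hunk, npages is now computed unconditionally, because pci_strbuf_flush() takes it as a parameter and reports it in its flushflag timeout message even on the context-flush path. The arithmetic can be checked in isolation with a sketch like the one below. It assumes 8 KB IO pages (IO_PAGE_SHIFT of 13, as on sparc64) and uses a hypothetical fake_sg struct and sg_flush_npages() helper in place of the kernel's struct scatterlist and inline code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match sparc64's IO page definitions (8 KB IO pages). */
#define IO_PAGE_SHIFT		13
#define IO_PAGE_SIZE		(1UL << IO_PAGE_SHIFT)
#define IO_PAGE_MASK		(~(IO_PAGE_SIZE - 1))
#define IO_PAGE_ALIGN(addr)	(((addr) + IO_PAGE_SIZE - 1) & IO_PAGE_MASK)

/* Hypothetical stand-in for the two scatterlist fields the code uses. */
struct fake_sg {
	uint32_t dma_address;
	uint32_t dma_length;
};

/* Mirrors the npages computation in the patched pci_dma_sync_sg_for_cpu():
 * start at the IO page containing the first entry, stop at the last entry
 * before the first zero-length one, and count the IO pages spanned. */
static unsigned long sg_flush_npages(const struct fake_sg *sglist, int nelems,
				     uint32_t *bus_addr_out)
{
	uint32_t bus_addr;
	unsigned long end;
	int i;

	assert(nelems > 0);
	bus_addr = sglist[0].dma_address & IO_PAGE_MASK;

	for (i = 1; i < nelems; i++)
		if (!sglist[i].dma_length)
			break;
	i--;

	end = (unsigned long)sglist[i].dma_address + sglist[i].dma_length;
	*bus_addr_out = bus_addr;
	return (IO_PAGE_ALIGN(end) - bus_addr) >> IO_PAGE_SHIFT;
}
```

For example, entries at 0x10000100 (length 0x1000) and 0x10002000 (length 0x3000) followed by a zero-length entry give a base of 0x10000000 and an end of 0x10005000, which rounds up to 0x10006000, i.e. 3 IO pages to flush.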