Mailing List Archive

PME_Turn_Off in Linux
Hello,
We've been seeing some nasty data corruption issues on some platforms.
We've been capturing PCI-E traces looking for something nasty but we
haven't found anything yet. One of the hardware guys is asking if there
is a call in Linux to issue a PME_Turn_Off broadcast message.

PME_Turn_Off Broadcast Message
Before main component power and reference clocks are turned off, the
Root Complex or Switch Downstream Port must issue a broadcast Message
that instructs all agents downstream of that point within the hierarchy
to cease initiation of any subsequent PM_PME Messages, effective
immediately upon receipt of the PME_Turn_Off Message.

This must be initiated from the root complex. Is there such a call in
Linux?

Thanks,
mikem

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: PME_Turn_Off in Linux
On Wed, Jan 17, 2007 at 10:43:14AM -0600, Miller, Mike (OS Dev) wrote:
> Hello,
> We've been seeing some nasty data corruption issues on some platforms.
> We've been capturing PCI-E traces looking for something nasty but we
> haven't found anything yet. One of the hardware guys is asking if there
> is a call in Linux to issue a PME_Turn_Off broadcast message.
>
> PME_Turn_Off Broadcast Message
> Before main component power and reference clocks are turned off, the
> Root Complex or Switch Downstream Port must issue a broadcast Message
> that instructs all agents downstream of that point within the hierarchy
> to cease initiation of any subsequent PM_PME Messages, effective
> immediately upon receipt of the PME_Turn_Off Message.
>
> This must be initiated from the root complex. Is there such a call in
> Linux?

The firmware that implements the PCI-E connection should do this. I
don't think there is anything that the operating system can do to
control this, as PCI-E should be transparent to the OS.

Unless this is on a PCI-E Hotplug system? What is the sequence of
events that cause the data corruption?

thanks,

greg k-h
RE: PME_Turn_Off in Linux
greg k-h wrote:

> On Wed, Jan 17, 2007 at 10:43:14AM -0600, Miller, Mike (OS Dev) wrote:
> > Hello,
> > We've been seeing some nasty data corruption issues on some platforms.
> > We've been capturing PCI-E traces looking for something nasty but we
> > haven't found anything yet. One of the hardware guys is asking if
> > there is a call in Linux to issue a PME_Turn_Off broadcast message.
> >
> > PME_Turn_Off Broadcast Message
> > Before main component power and reference clocks are turned off, the
> > Root Complex or Switch Downstream Port must issue a broadcast Message
> > that instructs all agents downstream of that point within the
> > hierarchy to cease initiation of any subsequent PM_PME Messages,
> > effective immediately upon receipt of the PME_Turn_Off Message.
> >
> > This must be initiated from the root complex. Is there such a call in
> > Linux?
>
> The firmware that implements the PCI-E connection should do this. I
> don't think there is anything that the operating system can do to
> control this, as PCI-E should be transparent to the OS.

Hmmm, the hw folks tell me that "other" OSes implement that. But I
would tend to agree that system firmware should probably be doing this.

>
> Unless this is on a PCI-E Hotplug system? What is the

No hotplug.

> sequence of events that cause the data corruption?

Install RHEL4 U4 on ia64. At the reboot prompt, let the system sit idle
for several hours or overnight. After rebooting, the filesystems are
totally trashed. I usually get a message that the kernel is not a valid
compressed file format. If I try to rescue the system, I cannot mount any
filesystems. I don't have the message handy, but it complains about an
invalid Verneed record, whatever that is.

I've also tried the same procedure using a dumb SAS HBA. It complained
that it couldn't read the initrd image. On a second attempt it acted
like it read the initrd, but the system went out in the weeds while
booting. Not the same symptoms, but I suspect there's some relationship.

I have not tried any other distros yet.

Thanks,
mikem
Re: PME_Turn_Off in Linux
On Wed, Jan 17, 2007 at 04:35:02PM -0600, Miller, Mike (OS Dev) wrote:
> > On Wed, Jan 17, 2007 at 10:43:14AM -0600, Miller, Mike (OS Dev) wrote:
> > > Hello,
> > > We've been seeing some nasty data corruption issues on some platforms.
> > > We've been capturing PCI-E traces looking for something nasty but we
> > > haven't found anything yet. One of the hardware guys is asking if
> > > there is a call in Linux to issue a PME_Turn_Off broadcast message.
> > >
> > > PME_Turn_Off Broadcast Message
> > > Before main component power and reference clocks are turned off, the
> > > Root Complex or Switch Downstream Port must issue a broadcast Message
> > > that instructs all agents downstream of that point within the
> > > hierarchy to cease initiation of any subsequent PM_PME Messages,
> > > effective immediately upon receipt of the PME_Turn_Off Message.
> > >
> > > This must be initiated from the root complex. Is there such a call in
> > > Linux?
> >
> > The firmware that implements the PCI-E connection should do this. I
> > don't think there is anything that the operating system can do to
> > control this, as PCI-E should be transparent to the OS.
>
> Hmmm, the hw folks tell me that "other" OSes implement that. But I
> would tend to agree that system firmware should probably be doing this.

Where would the "other" OSes implement this, as they don't even know the
PCI device is a PCI-E port? How can the OS send a PCI-E message without
talking directly to the chipset-specific controller chip?

> >
> > Unless this is on a PCI-E Hotplug system? What is the
>
> No hotplug.

That's good :)

> > sequence of events that cause the data corruption?
>
> Install RHEL4 U4 on ia64. At the reboot prompt, let the system sit idle
> for several hours or overnight. After rebooting, the filesystems are
> totally trashed. I usually get a message that the kernel is not a valid
> compressed file format. If I try to rescue the system, I cannot mount any
> filesystems. I don't have the message handy, but it complains about an
> invalid Verneed record, whatever that is.

The RHEL4 kernel is pretty old as far as PCI-E goes. Can you try this
on a kernel.org release? 2.6.19.2 would be great at the least. If not,
you're going to have to get your support from Red Hat on this issue :(

Any kernel log messages while the machine is idle before rebooting?

What tasks are running overnight that would cause writes to the disk?

thanks,

greg k-h