Mailing List Archive

intermittent pxe failure
Hi,

I have an intermittent pxelinux boot problem. It happens rarely, for example
it happened one day and then did not happen again until 6 days later. However
when it does happen it is rather serious as it affects all clients on the
network. Here is some basic info:

- IBM NetVista PCs with built-in PXE; the banner says it is "PXE 2.x".
- happens rarely, but when it does it affects all clients. May last for an
hour or so when it happens.
- DHCP: Microsoft
- TFTP #1: tftp-hpa, started with -s /tftpboot -B 1468 -r blksize -v -v -v -v
- TFTP #2: WinAgents TFTP server (tsize on, blksize negotiation off)
- network: WAN, but fiber-based and as fast as a LAN
- retrieving vmlinuz *always* works at all times (0 failures)
- retrieving initrd *almost always* works...except when this problem comes
up, and then retrieving initrd *always* fails until some time passes and the
problem goes away.

The error message is:

Loading vmlinuz.................
Could not find ramdisk image: initrd
boot:

We tried switching between TFTP #1 and TFTP #2, but no help there. In the
tftp log, what we see is vmlinuz being transferred completely and normally,
and then no further requests coming in.
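
One way to take the PXE client out of the picture is to request initrd
directly from each TFTP server while the problem is occurring. What follows
is only a minimal sketch of such a probe, assuming plain RFC 1350 transfers
(512-byte blocks, no blksize/tsize negotiation, no retransmission). The
function names build_rrq and tftp_fetch_size are hypothetical, not part of
any existing tool.

```python
import socket
import struct

OP_RRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 3, 4, 5
BLOCK_SIZE = 512  # RFC 1350 default; this sketch negotiates no options


def build_rrq(filename, mode="octet"):
    """Encode a TFTP read request: opcode, filename NUL, mode NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def tftp_fetch_size(host, filename, port=69, timeout=5.0):
    """Download `filename` and return its size in bytes.

    Raises OSError on a TFTP error packet or on a timeout
    (socket.timeout is a subclass of OSError).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_rrq(filename), (host, port))
        server = None   # the server answers from a fresh port (its TID)
        expected = 1    # next DATA block number we are waiting for
        total = 0
        while True:
            packet, peer = sock.recvfrom(4 + BLOCK_SIZE)
            if server is None:
                server = peer
            if peer != server or len(packet) < 4:
                continue  # stray or malformed datagram
            opcode, number = struct.unpack("!HH", packet[:4])
            if opcode == OP_ERROR:
                text = packet[4:].rstrip(b"\0").decode("ascii", "replace")
                raise OSError("TFTP error %d: %s" % (number, text))
            if opcode != OP_DATA:
                continue
            # Acknowledge every DATA block, including duplicates.
            sock.sendto(struct.pack("!HH", OP_ACK, number), server)
            if number != expected:
                continue  # duplicate of a block already counted
            total += len(packet) - 4
            if len(packet) - 4 < BLOCK_SIZE:
                return total  # a short block ends the transfer
            expected = (expected + 1) & 0xFFFF
    finally:
        sock.close()
```

Calling tftp_fetch_size("tftp1.example.com", "initrd") (the host name is a
placeholder) either returns the number of bytes received or raises, which
helps separate a server that refuses the file from a client that never asks
for it.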

I am using Thinstation 2.2. I have also asked on that list but have not found
a solution. That thread can be found here:
http://www.nabble.com/intermittent-pxe-failure-t3655353.html

Question: what code is responsible for downloading vmlinuz, and which code is
responsible for downloading initrd? Is it the pxe firmware, or pxelinux
itself? What happens between the vmlinuz download and the initrd download?
Any network activity that could potentially lead to a failure? Is this
activity logged anywhere?

Any suggestions welcome!

Larry

_______________________________________________
SYSLINUX mailing list
Submissions to SYSLINUX@zytor.com
Unsubscribe or set options at:
http://www.zytor.com/mailman/listinfo/syslinux
Please do not send private replies to mailing list traffic.
Re: intermittent pxe failure
On 05-05-2007 at 00:49, Larry Howe wrote:
<snip/>
> - happens rarely, but when it does it affects all clients. May last for an
> hour or so when it happens.

Gut feeling:
a TFTP daemon that is started by inetd and stays alive for that hour
(why the TFTPd handles vmlinuz but not the initrd is indeed strange)

> Question: what code is responsible for downloading vmlinuz, and which code is
> responsible for downloading initrd? Is it the pxe firmware, or pxelinux
> itself?

Briefly:
the boot ROM downloads and starts 'pxelinux.0'
pxelinux.0 downloads and parses "pxelinux.cfg/default"
pxelinux.0 downloads _both_ vmlinuz and initrd.
all three downloads use the same 'get_a_network_packet' software routine in the PXE ROM
pxelinux.0 starts vmlinuz
vmlinuz finds the initrd (already in memory from the download) and reads from it.

> What happens between the vmlinuz download and the initrd download?

Sorry, I don't know (for sure)


> Any network activity that could potentially lead to a failure?

Only malicious network activity (which is poorly documented ;-)


> Is this activity logged anywhere?

IIRC you get the TFTP requests in the syslog; you might need the -v -v -v
parameters.
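
With request logging enabled, the syslog can be scanned for clients that
fetched vmlinuz but never came back for initrd. A minimal sketch follows. It
assumes the request lines contain text of the form
"RRQ from <client> filename <name>", which is an assumption about the
tftp-hpa log format; adjust the pattern to whatever your syslog actually
shows. The function name clients_missing_initrd is hypothetical.

```python
import re

# Assumed shape of a logged read request; not verified against any
# particular tftp-hpa version.
RRQ_LINE = re.compile(r"RRQ from (?P<client>\S+) filename (?P<name>\S+)")


def clients_missing_initrd(lines, kernel="vmlinuz", initrd="initrd"):
    """Return {client: log line} for clients whose most recent kernel
    request was not followed by an initrd request."""
    pending = {}
    for line in lines:
        match = RRQ_LINE.search(line)
        if match is None:
            continue
        client = match["client"]
        # Compare on the base name so a leading directory does not matter.
        name = match["name"].rsplit("/", 1)[-1]
        if name == kernel:
            pending[client] = line.rstrip("\n")
        elif name == initrd:
            pending.pop(client, None)
    return pending
```

Feeding it the lines of the syslog file (for example via open(path) on
whichever file your syslog daemon writes to) gives one entry per client that
stopped after the kernel download.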


> Any suggestions welcome!


Run tcpdump on the TFTP server, capturing on the TFTP port.
Watching only port 69 will get you only the TFTP requests.
That has two advantages:
* low disk usage, which makes monitoring for weeks possible
* you should see whether the client really requests the initrd.
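
Port 69 only carries the initial requests because, per RFC 1350, the server
answers each request from a newly chosen port and the rest of the transfer
stays on that port. Below is a minimal sketch of decoding the UDP payload of
one captured request, including any options appended per RFC 2347 (such as
blksize or tsize). It assumes the payload bytes have already been extracted
from the capture; the function name parse_tftp_request is hypothetical.

```python
import struct

OPCODES = {1: "RRQ", 2: "WRQ"}


def parse_tftp_request(payload):
    """Decode a TFTP read/write request from raw UDP payload bytes."""
    if len(payload) < 4:
        raise ValueError("payload too short for a TFTP request")
    (opcode,) = struct.unpack("!H", payload[:2])
    if opcode not in OPCODES:
        raise ValueError("not an RRQ/WRQ packet (opcode %d)" % opcode)
    fields = payload[2:].split(b"\0")
    # Every field is NUL-terminated, so the last split element is empty.
    if fields[-1] != b"":
        raise ValueError("request is not NUL-terminated")
    fields = [f.decode("ascii", "replace") for f in fields[:-1]]
    if len(fields) < 2:
        raise ValueError("missing filename or mode")
    filename, mode, *rest = fields
    # Options follow as name/value pairs; names are case-insensitive.
    options = {k.lower(): v for k, v in zip(rest[0::2], rest[1::2])}
    return {
        "op": OPCODES[opcode],
        "filename": filename,
        "mode": mode.lower(),
        "options": options,
    }
```

Running this over every packet seen on port 69 gives a per-client list of
requested file names, which is enough to tell whether initrd was ever asked
for.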



> Larry

Cheers
Geert Stappers


P.S.
From http://www.nabble.com/intermittent-pxe-failure-t3655353.html
| Am I right in assuming that PXE loads vmlinuz, but then vmlinuz loads initrd?

No. pxelinux loads both.

Re: intermittent pxe failure
Geert Stappers wrote:
> On 05-05-2007 at 00:49, Larry Howe wrote:
> <snip/>
>> - happens rarely, but when it does it affects all clients. May last for an
>> hour or so when it happens.
>
> Gut feeling:
> a TFTP daemon that is started by inetd and stays alive for that hour
> (why the TFTPd handles vmlinuz but not the initrd is indeed strange)

tftp-hpa sticks around for 15 minutes after last use, by default.

-hpa

Re: intermittent pxe failure
> tcpdump the TFTP server on the TFTP port.
> Watching only on port 69 will get you only the TFTP Requests.
> That has two advantages:
> * low disk usage, which makes monitoring for weeks possible
> * you should see if the client really requests the initrd.
>
> Cheers
> Geert Stappers

Thanks Geert and Peter for the detailed answers. At least I know where to
start looking. I will post back if I find anything. For now, we are booting
with CD (ISOLINUX) which will work fine until we get this worked out.

Larry

Re: intermittent pxe failure
On 07-05-2007 at 23:58, Larry Howe wrote:
> > tcpdump the TFTP server on the TFTP port.
> > Watching only on port 69 will get you only the TFTP Requests.
> > That has two advantages:
> > * low disk usage, which makes monitoring for weeks possible
> > * you should see if the client really requests the initrd.
> >
> > Cheers
> > Geert Stappers
>
> Thanks Geert and Peter for the detailed answers. At least I know where to
> start looking. I will post back if I find anything. For now, we are booting
> with CD (ISOLINUX) which will work fine until we get this worked out.

What do you want to get worked out?

Re: intermittent pxe failure
On Tuesday 08 May 2007 16:15, Geert Stappers wrote:
> On 07-05-2007 at 23:58, Larry Howe wrote:
> > > tcpdump the TFTP server on the TFTP port.
> > > Watching only on port 69 will get you only the TFTP Requests.
> > > That has two advantages:
> > > * low disk usage, which makes monitoring for weeks possible
> > > * you should see if the client really requests the initrd.
> > >
> > > Cheers
> > > Geert Stappers
> >
> > Thanks Geert and Peter for the detailed answers. At least I know where to
> > start looking. I will post back if I find anything. For now, we are
> > booting with CD (ISOLINUX) which will work fine until we get this worked
> > out.
>
> What do you want to get worked out?

For now, we will just boot from CD. That will give us time to look more
closely at the PXE / TFTP problem.

Larry

Re: intermittent pxe failure
On 08-05-2007 at 22:46, Larry Howe wrote:
> On Tuesday 08 May 2007 16:15, Geert Stappers wrote:
> >
> > What do you want to get worked out?
>
> For now, we will just boot from CD. That will give us time to look more
> closely at the PXE / TFTP problem.


Each CD boot is a missed chance to reproduce the _intermittent_ PXE failure.

My advice:
Activate various loggers/monitors/watchers/data capturers and keep PXE booting.



Geert Stappers
mostly in an attempt to prevent a self reply for Larry Howe
--
There is nothing wrong with self replies that have added value

Re: intermittent pxe failure
Larry Howe wrote:
>
> For now, we will just boot from CD. That will give us time to look more
> closely at the PXE / TFTP problem.
>

Hi Larry,

Did you ever get a chance to look at that? I'm planning to get 3.50 out
Really Soon Now.

-hpa

P.S. Thanks for the donation. I didn't see the paypal message until now :)

Re: intermittent pxe failure
On Tuesday 05 June 2007 22:42, H. Peter Anvin wrote:
> Larry Howe wrote:
> > For now, we will just boot from CD. That will give us time to look more
> > closely at the PXE / TFTP problem.
>
> Hi Larry,
>
> Did you ever get a chance to look at that? I'm planning to get 3.50 out
> Really Soon Now.
>
> -hpa

Peter,

We're pretty sure it was the TFTP server. Our data center preferred to host
the TFTP server on Windows, so we had something called WinAgents TFTP as the
primary server. We also had a Linux tftp-hpa as a secondary. It appeared to
me (and everyone else) that the problem was happening on both the primary and
secondary. However, as time went by it only ever happened again on the
primary. I speculate that something was cached somewhere and made us believe
it was happening on both, when really it was just the primary. Either that,
or we just goofed up.

Now both primary and secondary are tftp-hpa. Over this past weekend I set all
the clients to automatically reboot themselves over and over. Got over 15,000
reboots without missing one. So, I'm saying we have a working system now.

Thanks for your Great Work on PXE and on tftp.

Larry
