PXE diskless boot fails

Hi,

I am trying to set up a server for diskless boot of a computer class (with Ubuntu).
I followed the guidelines in DisklessUbuntuHowto - Community Help Wiki.

I have two computers: one is the server I installed and the other is my MacBook, on which I run a virtual machine (using VMware Fusion) that does the net boot. I have an additional VM on it with Ubuntu 10.04 installed for other tests. The server runs Ubuntu 10.04 Server, and the client image for network boot is Ubuntu 10.04 Desktop, which I previously prepared on a VM on my laptop as explained in the link above.

The problem is that after loading the kernel and initrd (and outputting many things that I can't read because it happens too fast) I get the following error:

IP-Config: eth0 hardware address 00:0c:29:79:86:c1 mtu 1500 DHCP
[    2.775718] eth0: link up
IP-Config: no response after 60 secs - giving up
/init: .: line 3: can't open /tmp/net-eth0.conf
[   69.073555] Kernel panic - not syncing: Attempted to kill init!
[   69.073859] Pid: 1, comm: init Not tainted 2.6.32-22-generic #35-Ubuntu
< call trace - if it can help I can grab a screenshot >

At first I thought it meant that the NFS share was not accessible, but I was able to mount it and access its files from the other VM, so that is not the problem.

I have no idea what this error means in practice. Some Google hits say it means the network driver is missing. However, I created the initrd file on the same VM that now tries to boot, as explained in the tutorial. I tried both ways (mkinitramfs and update-initramfs) with no success.

Can anyone help?

Thanks!
Yotam

Your diskless machine is not getting an IP address:

IP-Config: no response after 60 secs - giving up

So yes, network problems for one reason or another.

The call trace is probably irrelevant. What happens is that 'init' is unable to grab an IP address, gives up and quits, which causes a kernel panic since init should never ever quit.

But it should already have an IP address - it gets it during boot from DHCP....
(I see that happening at the beginning of boot - it gets the IP I assigned to its MAC address)

But the kernel doesn't know that. Check to see if it's set to autoconfig again or use passed parameters.

For this reason I did what is written in the manual - here is the client's /etc/network/interfaces:

auto lo
iface lo inet loopback

iface eth0 inet manual

The configuration in the nfsroot only matters once you already have a network connection. If it never connects to NFS, the failure is happening before that point. I wonder if the kernel is doing kernel-level autoconfiguration, meaning it will attempt DHCP by itself. Or, if IP-Config is a proprietary Ubuntu program, you will need to ask them.

Thanks, I posted a specific question about that on their forum.
However, I posted the same main question there hours ago and nobody has answered... I'll have to wait patiently...

---------- Post updated 05-06-10 at 12:48 AM ---------- Previous update was 04-06-10 at 10:55 PM ----------

Maybe it is related to the NFS?
Before the error message I see info on that but I don't really know how to read it:

Loading, please wait...
Begin: Loading essential drivers... ...
Done.
[    2.415110] udev: starting version 151
Begin: Running /scripts/init-premount ...
Done.
Begin: Mounting root file system... ...
Begin: Running /scripts/nfs-top ...
Done.
[    2.659652] RPC: Registered udp transport module.
[    2.659716] RPC: Registered tcp transport module.
[    2.659781] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    2.721320] pcnet32.c:v1.35 21.Apr.2008 tsbogend@alpha.franken.de
[    2.721454] pcnet32 0000:01:01.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[    2.721730] pcnet32: PCnet/PCI II 79C970A at 0x2000, 00:0c:29:79:86:c1 assigned IRQ 19.
[    2.722151] eth0: registered as PCnet/PCI II 79C970A
[    2.722284] pcnet32: 1 cards_found.
IP-Config: eth0 hardware address 00:0c:29:79:86:c1 mtu 1500 DHCP
[    2.775718] eth0: link up

Then after a minute:

IP-Config: no response after 60 secs - giving up
/init: .: line 3: can't open /tmp/net-eth0.conf
[   69.073555] Kernel panic - not syncing: Attempted to kill init!
[   69.073859] Pid: 1, comm: init Not tainted 2.6.32-22-generic #35-Ubuntu

Hope it helps...
Thanks!

Why would failing to receive a DHCP reply have anything to do with NFS? It shouldn't be trying to receive DHCP at all. That it fails suggests a general network problem: I can think of no good reason it shouldn't be able to receive DHCP again after booting, even if that's pointless after booting off DHCP.

I agree with you but have no clue. The DHCP server definitely works, as the client receives its configuration at first (and other computers that do not boot from the network also receive IP addresses).
I tried to change the APPEND line in /tftpboot/pxelinux.cfg/default from:

APPEND root=/dev/nfs initrd=initrd.img-2.6.32-22-generic nfsroot=192.168.1.115:/nfsroot ip=dhcp rw

to:

APPEND root=/dev/nfs initrd=initrd.img-2.6.32-22-generic  nfsroot=192.168.1.115:/nfsroot ip=none rw

and then it just did not show the IP-Config lines at all and immediately showed these lines:

/init: .: line 3: can't open /tmp/net-eth0.conf
Kernel panic - not syncing: Attempted to kill init!

I also tried to remove the IP address and colon in this APPEND line (saw an old bug on that) but it did not help either.

Thanks again
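
For reference, the APPEND lines above are just space-separated kernel parameters, and nfsroot takes the form [server-ip:]root-dir. Below is a minimal Python sketch that splits such a line so the two variants (ip=dhcp and ip=none) can be compared field by field. The function names parse_append and split_nfsroot are made up for this illustration, and NFS options after a comma in nfsroot are not handled.

```python
def parse_append(line):
    """Split a PXELINUX APPEND line into a dict of kernel parameters."""
    tokens = line.split()
    if tokens and tokens[0].upper() == "APPEND":
        tokens = tokens[1:]
    params = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        # Bare flags such as "rw" carry no "=", so record them as True.
        params[key] = value if sep else True
    return params


def split_nfsroot(value):
    """Split an nfsroot value of the form [server-ip:]root-dir."""
    server, sep, path = value.partition(":")
    if not sep:
        # No colon: only the directory was given.
        return "", server
    return server, path


line = ("APPEND root=/dev/nfs initrd=initrd.img-2.6.32-22-generic "
        "nfsroot=192.168.1.115:/nfsroot ip=dhcp rw")
params = parse_append(line)
print(params["ip"])                      # autoconfiguration mode requested
print(split_nfsroot(params["nfsroot"]))  # (server, exported directory)
```

With ip=dhcp the initramfs is asked to run DHCP itself, which is the step that times out in the log above.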

---------- Post updated at 02:38 AM ---------- Previous update was at 02:02 AM ----------

I think that the problem is that the DHCP server is not answering the later DHCP requests - I ran tcpdump for broadcasts and can clearly see that for the initial request that is sent during boot there is a reply and the later sequence of requests remain unanswered.

This is my /etc/dhcp3/dhcpd.conf file:

allow booting;
allow bootp;

subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.2 192.168.1.99;
  option broadcast-address 192.168.1.255;
  option routers 192.168.1.1;
  option domain-name-servers 192.168.1.1;

  filename "/pxelinux.0";
  next-server 192.168.1.115;
}

host pxe_client {
  hardware ethernet 00:0C:29:79:86:C1;
  fixed-address 192.168.2.3;
}

I also tried removing the last part (the host block with the fixed address), but it does not change anything.
Maybe it's something with the "allow" directives? (I took this file from the manual)
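
One thing that can be checked mechanically in the dhcpd.conf above is whether each configured address falls inside the declared subnet. The following is a small sketch using Python's standard ipaddress module; the addresses are copied from the file as posted, and the helper name in_subnet is hypothetical. Whether a mismatch here has anything to do with the unanswered requests is not established in this thread.

```python
import ipaddress

# Subnet declaration copied from the dhcpd.conf as posted.
SUBNET = ipaddress.ip_network("192.168.1.0/255.255.255.0")


def in_subnet(addr, subnet=SUBNET):
    """Return True if addr lies inside the declared subnet."""
    return ipaddress.ip_address(addr) in subnet


checks = {
    "next-server": "192.168.1.115",
    "routers": "192.168.1.1",
    "fixed-address (host pxe_client)": "192.168.2.3",
}
for name, addr in checks.items():
    status = "inside" if in_subnet(addr) else "OUTSIDE"
    print(f"{name}: {addr} is {status} {SUBNET}")
```

As posted, the fixed-address of the host block is reported as outside 192.168.1.0/24, so it is worth double-checking whether that is a typo in the post or in the real file.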

---------- Post updated at 02:47 AM ---------- Previous update was at 02:38 AM ----------

Addition:
The difference between the DHCP requests is their length: while the initial ones (that are answered) are of length 548, the later ones are only 271. All other details (mainly the MAC) are the same.

---------- Post updated at 03:39 AM ---------- Previous update was at 02:47 AM ----------

Now I found how to get some more info:

First (answered) request and its response:

03:32:00.255217 IP (tos 0x0, ttl 20, id 1, offset 0, flags [none], proto UDP (17), length 576)
    0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:0c:29:79:86:c1 (oui Unknown), length 548, xid 0x2a7986c1, secs 4, Flags [Broadcast] (0x8000)
      Client-Ethernet-Address 00:0c:29:79:86:c1 (oui Unknown) [|bootp]
03:32:00.317616 IP (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    yotam-server.local.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 300, xid 0x2a7986c1, secs 4, Flags [Broadcast] (0x8000)
      Your-IP 192.168.1.3
      Server-IP yotam-server.local
      Client-Ethernet-Address 00:0c:29:79:86:c1 (oui Unknown) [|bootp]

Later (unanswered) request:

03:32:19.957269 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 299)
    0.0.0.0.bootpc > 255.255.255.255.bootps: [no cksum] BOOTP/DHCP, Request from 00:0c:29:79:86:c1 (oui Unknown), length 271, xid 0xb4b70b15, secs 1, Flags [none] (0x0000)
      Client-Ethernet-Address 00:0c:29:79:86:c1 (oui Unknown) [|bootp]
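
To compare the two requests without reading the dumps by eye, here is a rough Python sketch that pulls the BOOTP fields out of the tcpdump summary lines above and lists the ones that differ. The regular expression is written only against the output format shown in this thread and is an assumption, not a general tcpdump parser; the names parse_request and differing_fields are made up for this example.

```python
import re

PATTERN = re.compile(
    r"Request from (?P<mac>[0-9a-f:]{17}) .*?length (?P<length>\d+), "
    r"xid (?P<xid>0x[0-9a-f]+), secs (?P<secs>\d+), "
    r"Flags \[(?P<flag_name>\w+)\] \((?P<flags>0x[0-9a-f]+)\)"
)


def parse_request(line):
    """Extract the BOOTP fields from one tcpdump summary line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("line does not look like a BOOTP request summary")
    flags = int(m.group("flags"), 16)
    return {
        "mac": m.group("mac"),
        "length": int(m.group("length")),
        "xid": m.group("xid"),
        "secs": int(m.group("secs")),
        # The top bit of the BOOTP flags field is the broadcast bit.
        "broadcast": bool(flags & 0x8000),
    }


def differing_fields(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}


answered = parse_request(
    "0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from "
    "00:0c:29:79:86:c1 (oui Unknown), length 548, xid 0x2a7986c1, secs 4, "
    "Flags [Broadcast] (0x8000)"
)
unanswered = parse_request(
    "0.0.0.0.bootpc > 255.255.255.255.bootps: [no cksum] BOOTP/DHCP, Request "
    "from 00:0c:29:79:86:c1 (oui Unknown), length 271, xid 0xb4b70b15, secs 1, "
    "Flags [none] (0x0000)"
)
for field, (first, later) in differing_fields(answered, unanswered).items():
    print(f"{field}: answered={first} unanswered={later}")
```

Besides the length, the broadcast flag also differs between the two requests: it is set in the answered one and clear in the unanswered one. The sketch only surfaces the differences; it does not say which of them, if any, makes the server ignore the later requests.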

---------- Post updated at 01:39 PM ---------- Previous update was at 03:39 AM ----------

I managed to partially overcome the problem: I made a dedicated PXE configuration file for the machine (named after its MAC address) and in it I changed the APPEND line to:

APPEND root=/dev/nfs initrd=initrd.img-2.6.32-22-generic nfsroot=192.168.1.115:/nfsroot ip=192.168.1.3:192.168.1.115:192.168.1.1:255.255.255.0:::none rw

Now I am able to boot the machine and load X. The problem is that it is annoying to do that and should work without this workaround...
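
For anyone repeating the workaround for a whole class of machines, here is a minimal Python sketch of the two pieces involved. It assumes the kernel's documented colon-separated ip= layout (client IP, server IP, gateway, netmask, hostname, device, autoconf) and PXELINUX's per-machine config naming (ARP type 01 followed by the MAC in lowercase hex with dashes). The labels in IP_FIELDS and the function names are my own, chosen for illustration.

```python
IP_FIELDS = ("client_ip", "server_ip", "gateway", "netmask",
             "hostname", "device", "autoconf")


def parse_ip_param(value):
    """Split a kernel ip= value into named fields."""
    if ":" not in value:
        # Short form such as ip=dhcp or ip=none: only the autoconf method.
        return {name: "" for name in IP_FIELDS[:-1]} | {"autoconf": value}
    parts = value.split(":")
    # Pad so that omitted trailing fields come back as empty strings.
    parts += [""] * (len(IP_FIELDS) - len(parts))
    return dict(zip(IP_FIELDS, parts))


def pxelinux_config_name(mac):
    """Return the per-machine file name PXELINUX looks for in pxelinux.cfg/."""
    return "01-" + mac.lower().replace(":", "-")


fields = parse_ip_param(
    "192.168.1.3:192.168.1.115:192.168.1.1:255.255.255.0:::none")
for name, value in fields.items():
    print(f"{name:10} = {value or '(empty)'}")

print("/tftpboot/pxelinux.cfg/" + pxelinux_config_name("00:0C:29:79:86:C1"))
```

The trailing none in the ip= value turns autoconfiguration off, so the client never sends the later DHCP requests that go unanswered; that is why the workaround boots, at the cost of one config file per MAC address.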