Anybody have any experience with network booting Linux boxes?
In the cluster I'm designing, network booting appears to have a number of advantages:
-Cost: Hard Drive would cost about $75 to $100 per motherboard or about 7% to 10%
-Maintenance: Upgrade it on the server and you're done. Plus, hard drives are probably the least reliable part of a system. The fewer the better!
-Other: Would make it easier to use the system for both classified & unclassified work. Just have different servers, instead of replacing all the node drives.
Of course there are some down sides...
-Initial setup complexity and narrower (more expensive?) hardware selection. Anybody know if the 760MP can boot off of those integrated NICs? That would be sweet!
-Need big, fast drive system on the server. But you'd probably have something pretty good on there anyway.
-Need more RAM on the diskless nodes to avoid swapping. But given the prices right now, we'll probably go with 256MB/CPU anyway. Should be enough? Would 256MB per <i>motherboard</i> be enough? Probably on single-CPU boards with the current application, but maybe not much room to grow, particularly on SMP boards.
-Take a network performance hit. Will this be a factor after boot-up? Planning on switched fast ethernet with 2 NICs/MB.
-Cluster scalability. Right now I'm hoping for 24 to 32 CPUs (12 to 16 motherboards if we go with duals). Is that reasonable? What about 64?
I have a little bit of experience with such things. I've never actually set up such a thing myself, but I'm familiar with how it's done.
One of the theoretical disadvantages is that you have a single point of failure. "The fewer the better" may not apply so well when the hard drives (in complete systems) are responsible for picking up for each other's failure...
Hard disk bandwidth may be a bit of a problem; you're going to have several systems requesting the server's hard disk data. You might end up needing a very large RAID array, or a solid-state disk, as well as one or more Gigabit adapters, to meet the bandwidth requirements.
Disk I/O could suck up each node's network bandwidth in heavy disk usage--by simple math, a 10/100 Ethernet port gets a little over 12MB/sec, not counting the overhead of your network protocol. If your application goes light on disk I/O, though, this shouldn't be a problem.
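To put rough numbers on the "simple math" above, here is a minimal sketch. It only converts raw link rates and splits the server's link evenly among nodes; it ignores protocol overhead and assumes every node hits the server at the same moment, which is a worst case. The function names are made up for illustration.

```python
def link_mb_per_s(link_mbit: float) -> float:
    """Convert a raw link rate in Mbit/s to MB/s (8 bits per byte)."""
    return link_mbit / 8.0


def per_node_share_mb_per_s(server_link_mbit: float, nodes: int) -> float:
    """Even split of the server's link among nodes all reading at once.

    Assumption: no protocol overhead and perfectly fair sharing.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    return link_mb_per_s(server_link_mbit) / nodes


if __name__ == "__main__":
    # One Fast Ethernet port on a node.
    print(f"100 Mbit/s port: {link_mb_per_s(100):.1f} MB/s")
    # Server link shared by 12 or 16 motherboards, Fast Ethernet vs. Gigabit.
    for server_mbit in (100, 1000):
        for nodes in (12, 16):
            share = per_node_share_mb_per_s(server_mbit, nodes)
            print(f"server {server_mbit} Mbit/s, {nodes} nodes: "
                  f"{share:.2f} MB/s each")
```

If the server sits on a single Fast Ethernet port, the per-node share under simultaneous load is far below what each node's own port could carry, which is why a Gigabit adapter on the server side comes up.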
Centralized maintenance is pretty sweet, though--and you don't have to go completely diskless to get that.
As for the 760MP's onboard NICs, I would suspect they are not bootable. Bootable NICs have become rather unpopular these days. 3com makes them (3C905x-TXMBA), but they end up being rather more expensive than the non-EPROM enabled models. Most (all?) Intel PRO/100 NICs come with a bootable EPROM and cost about the same as a baseline 3C905x-TX card with no EPROM. Intel NICs also happen to perform just as well as 3com's, and a couple of the PRO/100 series (the PRO/100+ Management Adapter and the PRO/100 w/ hardware 3DES) are extremely low-profile PCI cards (about 1.5" tall, not including the metal slot panel).
Another option (one which isn't quite so unclean as it sounds) is to boot your nodes off a floppy disk. Not only does this not require an EPROM on your NIC, but it's much easier to set up than actually having a netboot server. Just create a bootdisk that boots, loads network drivers, configures eth0 via DHCP, mounts the remote root, then pivot_root's to the new root directory. At that point, you don't need a TFTP server either. Trouble is, you still need a floppy drive in each node...
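The boot sequence described above could be sketched roughly like this. The Python below just generates the text of a startup script for the boot floppy so the steps can be seen in order; it is not a tested boot image. Everything specific is an assumption for illustration: the `eepro100` driver module, `dhcpcd` as the DHCP client, the server address, the export path, the `/new_root` mount point, and an existing `initrd` directory inside the exported root for `pivot_root` to park the old root in.

```python
def build_boot_script(nic_module: str, nfs_server: str, nfs_export: str,
                      new_root: str = "/new_root") -> str:
    """Return shell script text that performs the floppy-boot steps in order.

    All arguments are hypothetical examples; substitute your own hardware
    driver, server, and paths.
    """
    steps = [
        "#!/bin/sh",
        # 1. Load the network driver for the node's NIC.
        f"modprobe {nic_module}",
        # 2. Configure eth0 via DHCP (assumes the dhcpcd client is on the disk).
        "dhcpcd eth0",
        # 3. Mount the remote root over NFS.
        f"mount -t nfs -o ro,nolock {nfs_server}:{nfs_export} {new_root}",
        # 4. Switch to the new root; the old root ends up under ./initrd,
        #    which must already exist in the exported tree.
        f"cd {new_root}",
        "pivot_root . initrd",
        # 5. Hand control to the real init on the new root.
        "exec chroot . /sbin/init",
    ]
    return "\n".join(steps) + "\n"


if __name__ == "__main__":
    # Example values only; none of these come from a real setup.
    print(build_boot_script("eepro100", "192.168.1.1", "/export/noderoot"))
```

Since the node image lives on the server, changing the exported root does not require touching the floppies; only a driver or kernel change on the boot disk itself would.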
Yet another option--if the onboard 3Com NICs have EPROM sockets (the add-on boards usually do), then maybe you can find someone with an EPROM burner and flash your boot image onto some EPROM chips.
"/join #hackerz. See the Web. DoS interesting people."
June 8, 2001 2:56:23 PM
>One of the theoretical disadvantages is that you have a
>single point of failure. "The fewer the better" may not
>apply so well when the hard drives (in complete systems)
>are responsible for picking up for each other's failure...
Yea, there certainly is a flipside to my argument. If a HD dies in a node, only that node goes down. If the server drive goes down, everything is down. It will definitely be running RAID (and not just RAID 0), which alleviates the problem somewhat.
>Hard disk bandwidth may be a bit of a problem; you're going
>to have several systems requesting the server's hard disk
>data. You might end up needing a very large RAID array, or
>a solid-state disk, as well as one or more Gigabit
>adapters, to meet the bandwidth requirements.
How big of a problem would you expect this to be after boot-up? I'm guessing that booting the cluster could be extraordinarily slow, but if you have enough RAM it shouldn't be much of an issue beyond that point. And I don't expect to reboot it often :smile:
>If your application goes light on disk I/O, though, this
>shouldn't be a problem.
Yea, the application's slave processes do some very light logging over NFS, and the message passing traffic is relatively light also. Starting the slave processes taxes the net a bit more because they have to read a MB or so of input files over NFS. It's not too bad now because there are only 16 slaves at most, and the master process is busy anyway.
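As a back-of-the-envelope check on the startup load, here is a small sketch. It assumes roughly 1 MB of input per slave (the "MB or so" above), that all slaves read at once, and that the NFS server sits on a single 100 Mbit/s link with no protocol overhead; the link speed and the function name are assumptions, not measurements.

```python
def startup_read_seconds(slaves: int, mb_per_slave: float = 1.0,
                         server_link_mbit: float = 100.0) -> float:
    """Estimate time to push every slave's input files through the server link.

    Assumption: raw link rate only, no NFS/TCP overhead, no caching effects.
    """
    link_mb_per_s = server_link_mbit / 8.0  # 8 bits per byte
    return slaves * mb_per_slave / link_mb_per_s


if __name__ == "__main__":
    for slaves in (16, 32, 64):
        secs = startup_read_seconds(slaves)
        print(f"{slaves} slaves: about {secs:.2f} s of raw transfer")
```

Under these assumptions the startup reads cost a second or so for 16 slaves and grow linearly with the slave count, so they look like a minor cost next to booting the nodes themselves.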
>Another option (one which isn't quite so unclean as it
>sounds) is to boot your nodes off a floppy disk.
This sounds like a great idea! Probably much less hassle than getting bootable NICs, and getting all that stuff configured reliably.
And at about $12 each in quantity, the nodes were going to have a floppy drive in each anyway. Just makes maintenance/debugging that much easier.
>maybe you can find someone with an EPROM burner and flash
>your boot image onto some EPROM chips.
I'm not too excited by that one. Pulling, flashing & reinstalling chips on up to 32 boxes just to upgrade a kernel doesn't sound like much fun! Or maybe I misunderstand.