A lot of it comes down to the software you're running. If the software is happy to run on multiple machines and use networking to share data or control between machines, then you can build up a cluster using separate machines. If your software relies on everything being on one physical machine, then it would be a problem. This, I think, is the first thing you need to look into. I play with virtual machines, and networking across several servers, both virtual and real, worked fine for what I wanted.
As for servers - I bought a cheap DL585 and upgraded the processors, but you will have to look at what you need out of them. Do you want resilient servers or just sheer performance? Some servers have dual NICs, PSUs etc., and RAID can also be a choice you have to make.
If networking speed is important then use either 10Gb Ethernet or Infiniband. I've a cheap Infiniband connection between my desktop and my server - it cost less than £50 to connect them and you can get 10Gb speeds. The only thing is that either of these solutions becomes way more expensive when you need to add more machines, as you might need a switch - which ain't cheap! High speed networking could also mean you can centralise your storage and get a decent high throughput RAID setup (or PCI-e SSD) which is shared across all servers.
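To put rough numbers on the networking choice: link speeds are quoted in gigabits per second, so divide by 8 to get gigabytes per second. Here is a quick back-of-envelope sketch in Python. The function name, the 100 GB data size and the 0.9 efficiency factor are all assumptions for illustration, not measurements from my setup.

```python
def transfer_seconds(size_gigabytes, link_gigabits_per_second, efficiency=0.9):
    """Estimate how long it takes to move data over a network link.

    efficiency is an assumed fudge factor for protocol overhead,
    not a measured value.
    """
    usable_gigabytes_per_second = link_gigabits_per_second / 8 * efficiency
    return size_gigabytes / usable_gigabytes_per_second


if __name__ == "__main__":
    # Compare a standard gigabit link with a 10Gb link for a 100 GB data set.
    for name, speed in [("1Gb Ethernet", 1), ("10Gb Ethernet / Infiniband", 10)]:
        seconds = transfer_seconds(100, speed)
        print(f"{name}: about {seconds:.0f} s to move 100 GB")
```

If your job only passes small messages between machines, the faster link probably won't matter much. If it shuffles big data sets or relies on shared storage, it will.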
It's not 100% necessary for my software to run on one machine, although it would be much easier if it did.
I've been looking at Beowulf clusters today, and I think this is basically what I'm after.
We've already threaded the application, so if I could connect a handful of desktops, I assume the application would work across all the available cores.
The idea of a cluster is that you can have several machines running on a network. Threads in a single process can't span machines, so the work has to be split into separate processes, one or more per machine. As long as those parts of your application can communicate using something like sockets then it should be OK. Or if there is no dependence on other threads then they can be run on different machines anyway.
The main thing is that they will still be distinct machines with their own CPUs, RAM and drives - but they can easily communicate with each other via networking - this can be achieved without any clustering software!
I play with clustering of things like MySQL and web stacks using multiple virtual machines. They are usually all based on the same machine, but sometimes spread over 2 machines. The main thing is that each virtual machine has a discrete role, but using the networking configuration, they all communicate with each other to form the cluster. The hardest part is sometimes just ensuring the network routing works.
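To give an idea of what communicating over sockets without any clustering software can look like, here is a minimal sketch in Python using only the standard library. It is not your application: the names (`WorkerHandler`, `start_worker`, `ask_worker`, `run_job`), the JSON-per-line message format and the sum-of-squares task are all made up for illustration. The workers here run as threads on localhost as stand-ins for separate machines; on a real cluster each worker would be its own process on its own box and you would pass in the real host addresses.

```python
import json
import socket
import socketserver
import threading


class WorkerHandler(socketserver.StreamRequestHandler):
    """Handle one request: read a JSON list of numbers, reply with the sum of squares."""

    def handle(self):
        line = self.rfile.readline()  # one JSON document per line
        numbers = json.loads(line)
        result = sum(n * n for n in numbers)
        self.wfile.write((json.dumps(result) + "\n").encode())


def start_worker(host="127.0.0.1", port=0):
    """Start a worker in a background thread (port 0 lets the OS pick a free port)."""
    server = socketserver.ThreadingTCPServer((host, port), WorkerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask_worker(address, chunk):
    """Send one chunk of work to a worker and wait for its partial result."""
    with socket.create_connection(address) as conn:
        conn.sendall((json.dumps(chunk) + "\n").encode())
        with conn.makefile("r") as reply:
            return json.loads(reply.readline())


def run_job(addresses, data):
    """Split data round-robin across the workers and add up the partial results."""
    chunks = [data[i::len(addresses)] for i in range(len(addresses))]
    results = [None] * len(addresses)

    def task(i):
        results[i] = ask_worker(addresses[i], chunks[i])

    threads = [threading.Thread(target=task, args=(i,)) for i in range(len(addresses))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


if __name__ == "__main__":
    workers = [start_worker() for _ in range(3)]  # stand-ins for three machines
    addresses = [w.server_address for w in workers]
    print(run_job(addresses, list(range(1000))))
    for w in workers:
        w.shutdown()
        w.server_close()
```

The point is that each chunk is independent, so the only communication is sending the work out and collecting the answers. If your threads need to share state constantly, the network round trips will hurt a lot more than they do here.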
The other issue is what you plan to do with it. If it's for gaming, you're better off with a single processor/multicore system, since most games can't utilize more than four threads/cores anyway. If it's for research purposes, you might want to look at CUDA cards, or other graphics-based solutions like the Tesla platform for mathematical computations.