Traciatim :
It really depends on the data.
You can buy an E5-2687W for about 2 grand just for the CPU, and then you would need a server-based motherboard and memory . . . or you could buy four i7-4770Ks, each with a cheap-ish overclocking motherboard and memory, for the same cost.
Does your data set work well with hyperthreading? If not, maybe the i5-4670K (or even an AMD FX-8350) is a better choice, and you could get one additional machine.
Could your data be ported to OpenCL or CUDA and processed on a video card? Maybe you'd be better served with an i3 (or Pentium) and a 7970 or 780 in each machine . . . or just one machine with 4 of them.
Your answer depends on so many factors there is no good response.
I've had a look into GPU processing, but I can't get around the issue of warp divergence due to the nature of the analysis. The only reason I can see for going the server route (I'm possibly missing others) is that servers support ECC memory.
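To put a rough number on the warp divergence problem, here is a minimal sketch in Python (not GPU code) of a simplified cost model: a warp executes every distinct branch taken by any of its threads, one after another, so lanes sit idle while the other branches run. The function name `warp_efficiency`, the branch labels, and the per-branch costs are all made up for illustration; the warp size of 32 matches NVIDIA hardware and would need changing for other GPUs.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs; an assumption for other hardware


def warp_efficiency(branch_ids, branch_cost, warp_size=WARP_SIZE):
    """Estimate the fraction of issued lane-cycles that do useful work.

    branch_ids:  branch label taken by each thread, in thread order
    branch_cost: dict mapping branch label -> cost in cycles (hypothetical values)
    """
    useful = 0
    issued = 0
    for start in range(0, len(branch_ids), warp_size):
        warp = branch_ids[start:start + warp_size]
        # Useful work: each thread only needs the result of its own branch.
        useful += sum(branch_cost[b] for b in warp)
        # Simplified SIMT model: the warp runs every distinct branch in turn,
        # and all lanes are occupied for the duration of each one.
        issued += warp_size * sum(branch_cost[b] for b in set(warp))
    return useful / issued


if __name__ == "__main__":
    costs = {"a": 10, "b": 10}        # hypothetical per-branch costs
    interleaved = ["a", "b"] * 32     # neighbouring threads disagree
    print(warp_efficiency(interleaved, costs))          # 0.5
    print(warp_efficiency(sorted(interleaved), costs))  # 1.0
```

If the work items can be regrouped so that threads in the same warp take the same branch (the sorted case above), the penalty disappears in this model. Whether that is possible depends entirely on the analysis.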
I haven't tried hyperthreading, but I'm assuming it wouldn't help, since the analysis appears to be CPU-bound and the context switching would probably reduce performance.
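That assumption is cheap to test before buying anything. A rough sketch, assuming the analysis can be split into independent tasks: time the same batch of work with one worker per physical core and then with one worker per logical CPU. `cpu_bound_task` is only a stand-in for the real analysis, and the "2 hardware threads per core" guess holds only on a machine with hyperthreading enabled.

```python
import os
import time
from multiprocessing import Pool


def cpu_bound_task(n):
    """Stand-in for one unit of the real analysis (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def candidate_worker_counts(logical_cpus):
    """Worker counts to compare: one per physical core vs one per logical CPU.

    Assumes 2 hardware threads per core, which is only true when
    hyperthreading is enabled; adjust for the machine being tested.
    """
    physical_guess = max(1, logical_cpus // 2)
    return sorted({physical_guess, logical_cpus})


def time_pool(workers, n_tasks=32, task_size=1_000_000):
    """Wall-clock seconds to finish n_tasks units of work with `workers` processes."""
    start = time.perf_counter()
    with Pool(processes=workers) as pool:
        pool.map(cpu_bound_task, [task_size] * n_tasks)
    return time.perf_counter() - start


if __name__ == "__main__":
    logical = os.cpu_count() or 1
    for workers in candidate_worker_counts(logical):
        print(f"{workers} workers: {time_pool(workers):.2f} s")
```

If the run with one worker per logical CPU is no faster than the run with one worker per physical core, hyperthreading is not buying anything for this workload. Swapping the real analysis in for `cpu_bound_task` would give a more trustworthy answer than this synthetic loop.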
Yeah, I appreciate it's a very open-ended question, but I just wanted to get some general pointers and make sure I hadn't missed anything.