A CPU benchmark measures how well a CPU performs when running that benchmark. That's it, really.
And some benchmarks are "shenanigans": they measure an aspect of the CPU where one kind of CPU performs exceptionally well and others are mediocre at best, even though on other benchmarks those mediocre CPUs might excel.
If you wonder whether a CPU benchmark correlates at all with how your own applications perform, you need to benchmark your applications. Really. There is no substitute.
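Benchmarking your own application doesn't need to be elaborate. Here is a minimal sketch, assuming your workload can be launched as a command line: it runs the command several times on the machine under test and reports the median wall-clock time, which is less sensitive to one-off hiccups than a single run. The function name `time_command` and the example workload are hypothetical placeholders; swap in the command that runs a typical job of yours.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the workload fails, so a crash never counts as a fast run
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Hypothetical workload: replace with the command that launches your own application's job.
    workload = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    print(f"median of 5 runs: {time_command(workload):.3f} s")
```

Run the same script, with the same workload, on each machine you are comparing. The ratio of the medians tells you more about your use case than any published score.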
Often, a benchmark depends on many factors. Take the standard Quake framerate benchmark: besides the CPU, it is affected by the optimizing compiler used (and which CPU tuning was enabled), the graphics card and its drivers, memory, bus speed, motherboard, chipset, overclocking, yada yada.
So if you are using a CPU benchmark to guide your next CPU purchase, take it with a grain of salt. Also weigh the (theoretical?) performance of the CPUs you are considering against their prices. That is: look at the value.
How do you gauge value when you can't really tell if two CPUs perform comparably? Yes, that's the rub.
That's why we love Tom's Hardware, which runs a variety of CPUs through a battery of performance tests while trying to keep other factors constant where possible.
Unless you actually benchmark YOUR application on the hardware configurations you are considering, other benchmarks may have only marginal bearing.
Actually, there's more to this than meets the eye.
Testing CPUs against each other while keeping everything else constant sounds logical, but that's not the whole picture. I imagine many people will use a CPU benchmark to help answer the question "which CPU should I choose for my next purchase?"
Equally important is the question: if I buy a current-model laptop with a mobile Core i7, how will that system compare to my dual Pentium 4 Compaq workstation, my iMac Core Duo, my MacBook Pro Core 2 Duo, ...?
To answer that, I want to account for the fact that the Core 2 Duo runs on a different chipset, with different memory bandwidth, than the dual Pentium 4, because this stuff makes a difference. For this, the PassMark methodology is useful.
Now, I'm not just making a rhetorical point. In my experience, system performance correlates well with these scores, and also with broad-brush benchmarks like SPECjvm, which, like Tom's Hardware, runs a set of real-world applications.