Well, quad-pumping the signal does make it effectively 533MHz, i.e. it will transfer as much data per second (not accounting for latency or access commands) as a single-pumped 533MHz FSB. The difference is that you don't actually have to clock it that high, which is very hard to do with a 64-bit trace path.
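To put rough numbers on that, here's a quick back-of-the-envelope sketch. The 133MHz base clock is my assumption (533 divided by 4, rounded down), and the function name is just made up for illustration; the 64-bit width is the one mentioned above. It only computes peak transfer rate, so latency and command overhead are ignored.

```python
def peak_bandwidth_bytes_per_s(base_clock_hz, transfers_per_clock, bus_width_bits):
    """Peak bytes per second: clock * transfers per clock * bytes per transfer."""
    return base_clock_hz * transfers_per_clock * bus_width_bits // 8


# Assumed figures: 133 MHz base clock, 4 transfers per clock, 64-bit data path.
quad_pumped = peak_bandwidth_bytes_per_s(133_000_000, 4, 64)

# A hypothetical single-pumped bus would need 4x the clock to match it.
single_pumped = peak_bandwidth_bytes_per_s(532_000_000, 1, 64)

print(quad_pumped, single_pumped)
```

Both come out to the same peak figure; the quad-pumped bus just gets there without having to run the clock itself at four times the speed.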
Also, as I've mentioned in other posts, the P4 does not necessarily need more bandwidth than the Athlon on a per-clock basis. Rather, your average P4 is usually clocked much higher than your average Athlon, and the clock disparity between the processor and the memory bus makes it necessary for the prefetch logic to be very aggressive when fetching cachelines, which takes up memory bandwidth. The P4 also has a bigger cacheline than the Athlon, so every memory access transfers more data from memory even if only 1 instruction out of that entire chunk is actually used (a rare case, but it does happen in code that doesn't have very good spatial locality).
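Here's a toy model of the cacheline point, just to show the effect. The 64-byte and 128-byte line sizes are figures I'm assuming for the smaller and bigger cacheline respectively, `bus_traffic` is a made-up helper, and the model ignores prefetching and caching effects entirely. It simply counts how many distinct lines a strided access pattern touches and how many bytes that drags across the bus.

```python
def bus_traffic(n_accesses, stride_bytes, line_bytes):
    """Bytes fetched from memory when every touched cacheline is loaded once.

    Assumes each access starts at i * stride_bytes and fits inside one line.
    """
    lines_touched = {(i * stride_bytes) // line_bytes for i in range(n_accesses)}
    return len(lines_touched) * line_bytes


# Poor spatial locality: 1000 accesses spaced 256 bytes apart.
poor_small_line = bus_traffic(1000, 256, 64)   # assumed 64-byte line
poor_big_line = bus_traffic(1000, 256, 128)    # assumed 128-byte line

# Good spatial locality: 1000 accesses spaced 4 bytes apart.
good_small_line = bus_traffic(1000, 4, 64)
good_big_line = bus_traffic(1000, 4, 128)

print(poor_small_line, poor_big_line, good_small_line, good_big_line)
```

With the widely spaced accesses, the bigger line moves twice as many bytes for the same useful work. With the tightly packed accesses, the two line sizes end up moving nearly the same amount, which is why this only hurts in code with poor locality.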