As has been said, the BD arch is crippled when it comes to FP performance, since each module's two integer cores share a single FPU. BD was designed for servers, first and foremost, where integer math dominates. AMD was hoping OpenCL/OpenMP would take off and all FP work would be offloaded to the GPU. That has not happened, hence the poor FP performance.
It's generally accepted that, at a given clock speed, a PII X4 would be about 10% faster than BD.