How to measure: take a piece of code, count the machine-level instructions it executes, then use high-resolution timers or the hardware cycle counters to measure how many clock cycles it takes to complete on the actual hardware. Divide the number of instructions by the number of clock cycles, and you get the IPC for that specific workload on your specific CPU/system.
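As a rough illustration of that division, here is a minimal Python sketch. The function name and the two counts are hypothetical, made up for the example rather than measured; on a real system the counts would typically come from hardware performance counters (for instance, Linux `perf stat` reports both instructions and cycles).

```python
def instructions_per_cycle(instructions: int, cycles: int) -> float:
    """Return IPC = executed instructions / elapsed clock cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


# Hypothetical counts for one run of a workload (illustrative, not measured).
example_instructions = 4_000_000
example_cycles = 1_600_000

example_ipc = instructions_per_cycle(example_instructions, example_cycles)
print(f"IPC = {example_ipc:.2f}")  # prints "IPC = 2.50"
```

The result only describes the run that produced the two counts; a different workload, or the same workload on a different CPU, would need its own pair of counts.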
Because how well the scheduler can fill the execution ports depends heavily on instruction mix and data dependencies, actual IPC can vary drastically between workloads.