As others have said, the fewer devices on the bus the better for overclocking and reliability in general.
For performance, having twice as many chips means twice as much potential for open rows (strictly, it is the extra ranks that add banks, and each bank can hold one open row), so there *may* be a slight advantage for 4x2GB over 2x4GB, assuming the exact same clocks and timings are used in both configurations. By slight, I mean less than 1%, so going with 2x4GB and leaving two slots open for a future upgrade, as suggested above, is a much better option.
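
To put rough numbers on the open-row point, here is a minimal sketch. It assumes 8 banks per rank (typical for DDR3) and one rank per DIMM unless stated otherwise; the function name and defaults are made up for illustration, so check the actual rank count of your modules.

```python
def max_open_rows(dimms, ranks_per_dimm=1, banks_per_rank=8):
    """Upper bound on simultaneously open rows: one per bank in each rank.

    ranks_per_dimm and banks_per_rank are assumed values, not read from hardware.
    """
    return dimms * ranks_per_dimm * banks_per_rank


# Single-rank modules in both configurations (assumption)
print("2x4GB:", max_open_rows(dimms=2))  # 16
print("4x2GB:", max_open_rows(dimms=4))  # 32

# If the 4GB modules happen to be dual-rank, the difference disappears
print("2x4GB dual-rank:", max_open_rows(dimms=2, ranks_per_dimm=2))  # 32
```

This is only an upper bound on open rows, not a performance figure. How often an access actually hits an already-open row depends on the workload, which is why the real-world gain stays so small.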