It works that way because current multi-GPU rendering methods generally use Alternate Frame Rendering (AFR) or some variant of it. The GPUs do not work in tandem to render the same frame; they work independently to render alternate frames. Because they work independently, each GPU needs all the data necessary to render the frame it is working on without relying on the other GPU.
If you want a solid 60 FPS with a single GPU, that GPU has to render each frame within 16.67 milliseconds. With two GPUs, the frames alternate between them, so each GPU has 33.33 milliseconds to do its work. However, each frame is not significantly different from the previous one. The environment doesn't change, the textures don't change, the shaders don't change, the lighting doesn't change, and so on. None of the stuff required to render the scene changes. What does change is the time at which the frame is rendered, and as a result the motion and position of geometric objects will be slightly different.
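To make the timing arithmetic concrete, here is a minimal sketch in Python. It assumes an idealized AFR setup where frames are handed out round-robin and there is no synchronization overhead; the function names `frame_budget_ms` and `gpu_for_frame` are made up for illustration and do not come from any real driver or API.

```python
def frame_budget_ms(target_fps: float, gpu_count: int = 1) -> float:
    """Time each GPU has to finish one frame under idealized AFR.

    A new frame must be presented every 1000 / target_fps milliseconds.
    With gpu_count GPUs taking turns, each GPU only has to deliver every
    gpu_count-th frame, so its budget is gpu_count times longer.
    """
    frame_interval_ms = 1000.0 / target_fps
    return frame_interval_ms * gpu_count


def gpu_for_frame(frame_index: int, gpu_count: int) -> int:
    """Round-robin assignment (an assumption): frame N goes to GPU N mod gpu_count."""
    return frame_index % gpu_count


if __name__ == "__main__":
    print(f"1 GPU:  {frame_budget_ms(60, 1):.2f} ms per frame")
    print(f"2 GPUs: {frame_budget_ms(60, 2):.2f} ms per frame")
    # Which GPU renders each of the first six frames with two GPUs.
    print([gpu_for_frame(i, 2) for i in range(6)])
```

Note that the longer budget only helps throughput. Each individual frame still takes the full time on its own GPU, which is why every GPU needs its own complete copy of the scene data.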
All the static rendering data (textures, models, shaders, lighting, untransformed geometry, etc.) has to be identical on both GPUs. There's simply not enough bandwidth between the GPUs for them to pool their memory.
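As a rough illustration of what that means for memory, here is a toy model in Python. It assumes every GPU must hold a full copy of the scene data, as described above. The function names and the card sizes are hypothetical, and the model ignores real-world details such as driver overhead and per-frame buffers.

```python
from typing import Sequence


def fits_under_afr(scene_gb: float, per_gpu_memory_gb: Sequence[float]) -> bool:
    """Under AFR each GPU keeps its own full copy of the scene data,
    so the scene has to fit into every card individually."""
    return all(scene_gb <= memory for memory in per_gpu_memory_gb)


def fits_if_pooled(scene_gb: float, per_gpu_memory_gb: Sequence[float]) -> bool:
    """Hypothetical case where the cards could pool their memory.
    Shown only for contrast; this is not how AFR works."""
    return scene_gb <= sum(per_gpu_memory_gb)


if __name__ == "__main__":
    cards = [2.0, 2.0]  # two hypothetical 2 GB cards
    scene = 3.0         # hypothetical 3 GB of textures, models, shaders, etc.
    print("Fits if memory were pooled:", fits_if_pooled(scene, cards))
    print("Fits under AFR:", fits_under_afr(scene, cards))
```

In this model, two 2 GB cards behave like a single 2 GB card as far as scene data is concerned, not like a 4 GB card.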