This post finally gave me the answer I needed (I thank everyone who ever goes back to post fixes they discover - they are truly kings among men) so lemme toss in my two bits.
The two-connection fix is, based on what GPU-Z is telling me, the result of CUDA being disabled if the NVIDIA card is not the primary renderer. My setup has a PowerColor PCS+ Radeon 5770 alongside an MSI Cyclone OC GTS 450, and GPU-Z's little checkboxes at the bottom show PhysX enabled for both cards regardless of which one is the renderer (thanks to the Hybrid PhysX hack), but the CUDA checkbox is only enabled when I am running my display from the 450.
Another niggle is that the 450's adaptive power mode only works if the card is the primary renderer as well, but that might be a result of the drivers (who releases beta drivers as the only drivers available for a new card at launch? I ask you...).
Anyways, based on what I'm seeing so far, while the Mercury Playback Engine benefits significantly from CUDA's GPU acceleration, it's not taxing the GPU AT ALL. My system, for reference:
Foxconn A79A-S 790FX mobo
4GB OCZ Platinum 1066 DDR2
AMD X3 440 with the 4th core unlocked
The two video cards mentioned above
Windows 7 x64
My test transcode was a 1080p 59.94fps Lagarith-encoded file, and I was transcoding it to a Blu-ray compliant 720p 59.94fps H.264 file, with maximum render depth and maximum render quality both enabled.
In Software mode, the encoder took about 35 minutes per minute of footage.
With GPU acceleration enabled, the transcode time for the exact same file was reduced to 18 minutes per minute of footage. (!)
The funny thing was, the GPU load (according to GPU-Z) peaked at only 10%, which means either a) GPU-Z's load figure reflects the overall GPU load, and the load on the CUDA (shader) cores is only a fraction of that, b) the Mercury engine has a lot of headroom left to leverage, or c) something is hooped in my config. For what it's worth, my Fluidmark PhysX scores are off-the-charts brutal (but that might be just an effect of the Hybrid PhysX hack), so c) might very well be correct.
In contrast, using the AVIVO encoder on the same file resulted in a transcode of 7 minutes per minute of footage to the same format, though at only 29.97 fps, and since you can't control the settings for the transcode past a Quality slider, this solution is pretty much unusable. It may, though, back up the stats companies like CyberLink trot out where they show similar improvements to transcode times regardless of whether CUDA or ATI Stream is used.
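To put the three timings side by side, here is a quick back-of-the-envelope sketch in Python. The minutes and frame rates are the ones quoted above; the per-output-frame normalization is my own assumption for making the 29.97 fps AVIVO run comparable to the 59.94 fps runs, and it ignores that all three still have to decode the same 59.94 fps source. The names `runs`, `speedup`, and `seconds_per_output_frame` are just illustrative.

```python
# Timings from the post: (minutes of encoding per minute of footage,
# output frame rate in frames per second).
runs = {
    "software": (35.0, 59.94),
    "mercury_cuda": (18.0, 59.94),
    "avivo": (7.0, 29.97),
}


def speedup(baseline_minutes, minutes):
    """How many times faster a run is than the baseline run."""
    return baseline_minutes / minutes


def seconds_per_output_frame(minutes, fps):
    """Encoding seconds spent per frame written to the output file.

    Assumption: one minute of footage yields fps * 60 output frames.
    """
    encode_seconds = minutes * 60.0
    output_frames = fps * 60.0
    return encode_seconds / output_frames


if __name__ == "__main__":
    baseline_minutes = runs["software"][0]
    for name, (minutes, fps) in runs.items():
        print(
            f"{name}: {speedup(baseline_minutes, minutes):.2f}x vs software, "
            f"{seconds_per_output_frame(minutes, fps):.3f} s per output frame"
        )
```

By wall-clock time, CUDA comes out a bit under 2x faster than Software mode and AVIVO comes out at 5x. Counted per output frame, though, AVIVO's lead over CUDA shrinks to roughly 0.23 s versus 0.30 s per frame, since it only writes half as many frames.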
Anyways, the last note I wanted to add was this: for me, the way I work the cabling to enable CUDA is to have the DVI cable connected to the 450 at boot and any time I'm working in Premiere, then if I'm gaming I disconnect the cable and connect it to the 5770 (my monitor only has one digital input). The system rejigs to use the Radeon card automatically and all is right with the world. Hope this helps somebody else like the OP helped me.