GPGPU computing is primarily for highly parallel applications. Unlike a CPU, where you may only be able to run a few threads at a time, CUDA/OpenCL (I'll use these as the example since you have a 295) are set up to run thousands of threads at once. Every thread launched on the GPU runs the exact same kernel, just on a different piece of the data. Once all those threads are done executing, you can launch a new kernel with its own set of threads.
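To make that concrete, here's a minimal sketch of the "same kernel, many threads" model in CUDA C: a SAXPY kernel where each thread handles exactly one array element (the kernel name and launch parameters are just illustrative, not from any particular codebase).

```cuda
#include <cuda_runtime.h>

// Every thread executes this same kernel; the thread index
// selects which single element of the data it works on.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                    // guard: the grid may be larger than n
        y[i] = a * x[i] + y[i];
}

// Launch one thread per element, 256 threads per block:
// saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```

The launch configuration (grid and block sizes) is what scales this to thousands of threads; the kernel body itself never changes between threads.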
Data transfers to and from the card are typically the main bottleneck, so ideally you want all the needed data (both input and output) to stay on the card as long as possible. I could be wrong, but I believe you can switch kernels without touching the memory, so you can start a new kernel that works on the previous step's output without another transfer, as long as you don't need different data.
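That pattern looks roughly like this on the host side (a sketch only: `step1`, `step2`, and the buffer names are hypothetical stand-ins for two stages of a pipeline):

```cuda
// Copy the input to the card once, chain two kernels on the same
// device buffer, then copy the final result back once.
cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

step1<<<blocks, threads>>>(d_data, n);   // first pass
step2<<<blocks, threads>>>(d_data, n);   // consumes step1's output
                                         // in place, no transfer between
cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
```

The key point is that `d_data` lives in the card's memory across both launches, so the expensive PCIe transfers happen only at the start and the end.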
For highly parallel applications, the theoretical increase in compute power over a CPU is at least ten-fold (likely a lot more on the beast you have). Real-world gains are usually significantly less, depending on how well you can adapt your algorithm to the GPU model. nVidia's sample programs will give you an idea of the real-world speedups: they range anywhere from 2x to 1500x faster than the originals.
In summary, unless your C++ applications are highly parallelizable, it's unlikely you'll see much of a boost switching to CUDA/OpenCL. As WR2 pointed out, CUDAZone is a great resource for learning the general idea and program structure. It takes some getting used to, but it's a nice step up from old-school GPGPU computing.