MPEG-1 algorithms force each frame to be the same size by dynamically varying the amount of compression applied to each frame, but MPEG-1 also performs additional compression based on the differences between frames. This next step up in video algorithms is called interframe compression. The basic idea is that you can eliminate redundant information that is common across multiple video frames: rather than compressing each complete frame individually, you only need to keep track of the differences between frames.
You start with a frame and compress it using JPEG. This will be our reference image, called an I frame (Intracoded frame). We then take the next frame, compress it too, and compare the two. We throw away all the redundant information and keep only the differences; what we're left with is called a P frame (Predictive coded frame). Repeat for all following frames. This introduces some errors over time, and since these errors are cumulative you have to reset everything every so often by inserting another I frame. You can also improve the algorithm by inserting special frames that track differences both backward and forward in time; these are called B frames (Bidirectionally interpolated frames). In terms of size, I frames are the largest, P frames are next, and B frames are the smallest. A collection of I, P, and B frames is called a GOP (Group of Pictures).
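The difference-coding idea can be sketched in a few lines. This is a deliberately simplified illustration, not the real MPEG-1 bitstream (which works on motion-compensated macroblocks, not individual pixels): an I frame is stored whole, and a P frame stores only the pixels that differ from its reference.

```python
# Hypothetical sketch of difference coding between frames. Frames are
# flat lists of grayscale pixel values for simplicity.

def p_frame(reference, current):
    """Store only the pixels that differ from the reference (I) frame."""
    return {i: cur for i, (ref, cur) in enumerate(zip(reference, current))
            if cur != ref}

def reconstruct(reference, diffs):
    """Rebuild the full frame from the reference plus stored differences."""
    frame = list(reference)
    for i, value in diffs.items():
        frame[i] = value
    return frame

i_frame = [10, 10, 10, 200, 200, 200]   # reference frame
frame_2 = [10, 10, 10, 201, 200, 200]   # only one pixel changed
diffs = p_frame(i_frame, frame_2)
print(diffs)                             # {3: 201} — one entry, not six pixels
assert reconstruct(i_frame, diffs) == frame_2
```

The payoff is visible even in this toy version: a frame that barely changes collapses to a handful of stored differences, which is exactly why the periodic I-frame reset matters once those differences start accumulating errors.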
MPEG-1 was originally designed as an asymmetric compression algorithm. In other words, the amount of processing power required to encode an MPEG stream is much higher than the amount of processing power required to decode that same stream. Good for playback, bad for capture and compression.
MPEG-1 for Video CDs (sometimes called the White Book standard) specified that an MPEG stream had to run at a constant 1.15 Mbps no matter what (the data rate of a 1X CD-ROM). To maintain this constant bit rate during playback, the amount of compression applied to each frame is constantly changing (not to mention that compressing a 124 Mbps video stream down to MPEG-1 rates is a ratio of over 100:1!).
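The arithmetic behind that ratio is straightforward; taking the 124 Mbps uncompressed figure from the text against the fixed White Book rate:

```python
# Rough arithmetic behind the "over 100:1" figure. The 124 Mbps
# uncompressed rate is the one quoted above; 1.15 Mbps is the fixed
# White Book Video CD stream rate.
uncompressed_mbps = 124.0
videocd_mbps = 1.15

ratio = uncompressed_mbps / videocd_mbps
print(f"compression ratio ~ {ratio:.0f}:1")   # ~ 108:1
```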
And there are other problems to deal with. As I said earlier, video is very noisy. Even if you point a camera at a blank wall and shoot a handful of frames, you will find that virtually every pixel in each image changes slightly from frame to frame. A pixel that shifts from an RGB value of 10,10,10 to 10,10,9 and back again from one frame to the next may not be noticeable to the naked eye, but to a computer those pixel values are completely different and count as changes to keep track of.
Now you can pre-process the images or make the algorithms more sophisticated, but the tradeoffs involve processing time, image quality, compression ratio, or additional hardware, and even then the results can vary quite a bit depending on the source video you're trying to compress. You might be able to filter out those noise artifacts in a static scene, but what happens when the video really is changing significantly from frame to frame? When the camera zooms or pans or moves at all, virtually every single pixel really is changing. Shots of a wheat field blowing in the wind, flowing water, or a crackling fire are all difficult to compress. Scene cuts, fades, and dissolves also change every single pixel, so any interframe compression gains are lost in these situations.