It depends on whether it's an inclusive or exclusive cache hierarchy.
- AMD usually go exclusive, Intel go inclusive.
Exclusive means that data is not duplicated between cache levels. This is good in theory but bad in practice for most software (including games).
Cache is not memory in the x86 architecture: it cannot be addressed (via any normal means). It is, however, RAM (SRAM, ZRAM, etc.) in the electronic sense.
The reason cache exists is economics. Otherwise systems would need 8GB of SRAM as main memory to have similar performance (and they would cost substantially more).
Each level of cache may have a hit rate of 64% to 80% or more. Only the accesses that miss the first level reach the second, so two levels of cache are usually sufficient to keep the overall miss rate under 13% (0.36 x 0.36), and often much lower (below 4% miss rate, 0.20 x 0.20).
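To make that arithmetic concrete, here is a rough sketch in Python. It assumes each level's hit rate is a local rate (it applies only to accesses that reach that level) and that the levels behave independently, which is a simplification. The function name `overall_miss_rate` is just made up for the example.

```python
def overall_miss_rate(hit_rates):
    """Fraction of accesses that miss every cache level.

    Assumes each hit rate is local to its level and independent
    of the others (a simplification for illustration).
    """
    miss = 1.0
    for hit in hit_rates:
        miss *= (1.0 - hit)  # only misses are passed on to the next level
    return miss

# Two levels at the low end (64% each) and the high end (80% each)
low = overall_miss_rate([0.64, 0.64])
high = overall_miss_rate([0.80, 0.80])

print(f"{low:.1%}")   # 13.0%
print(f"{high:.1%}")  # 4.0%
```

So the "under 13%" and "below 4%" figures are simply the per-level miss rates multiplied together.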
The size of the cache doesn't affect its hit/miss rate as much as its design and the 'algorithms' implemented in the surrounding electronics. It's just marketing saying a 256KB L2 + 3MB L3 cache is better than a 1MB L2 cache processor. (In reality the 1MB L2 cache processor may be vastly better designed and perform better.)
Having a 3rd level of cache is usually only desirable for database servers and machines doing a lot of virtual machine workloads.
Large, slow caches have downsides, and a balance needs to be found.
For example, why have 96MB of L3/L4 cache when, with the same transistor count (or less), you could incorporate a quad-channel DDR4 integrated memory controller and additional system buses?
Basically, unless you're really processing ~153GB/sec per CPU socket, Dual-Channel DDR3-1333 will suffice with almost any decent cache set-up, even if it's only a large L1 cache with a good hit ratio... depending upon the software.
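For a sense of scale, the theoretical peak bandwidth of Dual-Channel DDR3-1333 can be worked out as below. This is only a sketch: it assumes the standard 64-bit (8-byte) channel width and the nominal 1333 MT/s transfer rate, and sustained real-world bandwidth will be lower. The function name `peak_bandwidth_gb_s` is made up for the example.

```python
def peak_bandwidth_gb_s(transfers_mt_s, channels, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_mt_s: transfer rate in mega-transfers per second
    channels:       number of memory channels
    bus_bytes:      bytes per transfer per channel (assumed 64-bit bus)
    """
    return transfers_mt_s * 1e6 * bus_bytes * channels / 1e9

ddr3_dual = peak_bandwidth_gb_s(1333, channels=2)
print(f"{ddr3_dual:.1f} GB/s")  # 21.3 GB/s
```

That is the raw memory bandwidth the caches are working in front of. The better the hit ratio, the less of it the software actually needs.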