Not... exactly. Management of the L1 cache (and of the L2 cache since the Pentium Pro) is usually done by the CPU itself. For that reason you can't directly "access data" inside a cache, and "switching" from one cache level to another is often useless, when it's even possible or relevant (the Pentium 4's L2 cache duplicated L1 data). Some very modern OSes offer some data management on caches (I think Vista provides tools for this, and Linux too); I don't know how public, stable and/or efficient these APIs are, but they are certainly not something that user-space programs should access.
Distributing data across caches requires assembly programming or OS support, and is very architecture-dependent (a Core 2 won't work the same way a P4 did, and an Athlon 64 or a Phenom will certainly not work like, say, a K7 Athlon).
That's what I could gather, if somebody more experienced knows better...
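Since all of this is architecture-dependent, one thing a user-space program can reasonably do is ask for the cache sizes instead of hard-coding them. Here is a minimal sketch, assuming a Linux/glibc system: the `_SC_LEVEL*_CACHE_SIZE` names are a glibc extension, not standard POSIX, and the helper names are made up for illustration. On systems that lack them the helpers just return -1, and even glibc may report 0 when it can't tell.

```c
#include <unistd.h>

/* Hypothetical helpers: cache size in bytes, or -1 if it cannot be queried.
   The _SC_LEVEL*_CACHE_SIZE constants are a glibc extension (assumption:
   Linux/glibc); sysconf() may also return 0 if the size is unknown. */
long l1d_cache_bytes(void)
{
#ifdef _SC_LEVEL1_DCACHE_SIZE
    return sysconf(_SC_LEVEL1_DCACHE_SIZE);
#else
    return -1;
#endif
}

long l2_cache_bytes(void)
{
#ifdef _SC_LEVEL2_CACHE_SIZE
    return sysconf(_SC_LEVEL2_CACHE_SIZE);
#else
    return -1;
#endif
}

long l3_cache_bytes(void)
{
#ifdef _SC_LEVEL3_CACHE_SIZE
    return sysconf(_SC_LEVEL3_CACHE_SIZE);
#else
    return -1;
#endif
}
```

Treat a result of 0 or -1 as "unknown" and fall back to a guess or to the CPU's datasheet.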
1) FPU and integer operations on Lx (L1, L2, L3) and main memory data, for ensuring the performance of the CPU.
2) If you have any idea about data patterns for accessing/switching between L1, L2, L3 and main memory, then please let me know.
We are trying to forcibly put data into the cache (L1, L2, L3) and then perform different arithmetic operations on the data present in the cache.
During this we want to measure the performance of the CPU.
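A common way to approximate this without controlling the cache directly is to choose a working set small enough to fit in the level of interest, touch it once so the CPU pulls it in, and then time repeated passes over it. The sketch below assumes that approach. `seconds_per_pass` is a hypothetical helper name, and the use of `clock()` (CPU time, fairly coarse) is an arbitrary choice for portability.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: average CPU seconds for one summing pass over a
   buffer of `bytes` bytes. Returns -1.0 on bad arguments or allocation
   failure. The sum is written to *checksum_out (if non-NULL) so the
   compiler cannot discard the loop. */
double seconds_per_pass(size_t bytes, int passes, double *checksum_out)
{
    size_t n = bytes / sizeof(double);
    if (n == 0 || passes <= 0)
        return -1.0;

    double *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1.0;

    /* Initialising the buffer also serves as the warm-up pass: every
       cache line is touched once before timing starts. */
    for (size_t i = 0; i < n; ++i)
        buf[i] = 1.0;

    double sum = 0.0;
    clock_t start = clock();
    for (int p = 0; p < passes; ++p)
        for (size_t i = 0; i < n; ++i)
            sum += buf[i];          /* floating-point add on cached data */
    clock_t end = clock();

    free(buf);
    if (checksum_out != NULL)
        *checksum_out = sum;
    return (double)(end - start) / CLOCKS_PER_SEC / passes;
}
```

Calling it with, say, 16 KiB, 256 KiB, 4 MiB and 256 MiB is meant to land in L1, L2, L3 and main memory respectively, but those sizes are only assumptions and depend on the actual CPU. Note that a sequential walk like this is helped by the hardware prefetcher, so it shows throughput more than latency; a random or pointer-chasing access pattern would be needed to expose latency.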
I have identified the _mm_prefetch CPU cache intrinsic, which fetches the line of data from memory that contains the byte specified by the source operand to a location in the cache hierarchy.
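For reference, a minimal sketch of how the intrinsic is typically used, assuming an x86 compiler that ships `<xmmintrin.h>`. The function name and `PREFETCH_DISTANCE` are made up for illustration, and the distance of 64 elements is an untuned guess. Keep in mind that a prefetch is only a hint: the CPU may ignore it, and it does not pin the line in the cache.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>   /* _mm_prefetch, _MM_HINT_T0 */
#define PREFETCH_T0(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH_T0(p) ((void)(p))   /* non-x86 build: no-op fallback */
#endif

/* Assumed value: how many elements ahead of the current one to prefetch. */
enum { PREFETCH_DISTANCE = 64 };

/* Hypothetical example: sum an array while hinting the CPU to load the
   data that will be needed a few iterations from now. */
double sum_with_prefetch(const double *data, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_T0(&data[i + PREFETCH_DISTANCE]);
        sum += data[i];
    }
    return sum;
}
```

`_MM_HINT_T0` asks for the line to be brought into all cache levels; `_MM_HINT_T1`, `_MM_HINT_T2` and `_MM_HINT_NTA` are the other hints, and which levels they actually target depends on the processor. For a plain sequential loop like this one the hardware prefetcher usually does the job already, so any gain is more likely on irregular access patterns.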