It should help your scenario. Please read the following, taken from a description of Mantle.
Solving DirectX's Small Batch Problem
One of the issues plaguing DirectX development for years has been that the API itself imposes a great deal of CPU overhead in certain scenarios. The problem is exacerbated when the developer submits a great many small batches of triangles for rendering. Every batch of draw calls costs additional CPU time, so the goal is to group draw calls as efficiently as possible.
The diagram accompanying this description shows a series of application threads (left side) being queued for execution across the CPU and GPU. Workloads are shifted to specific targets depending on where they would execute best. Mantle is designed to explicitly allow asynchronous compute scheduling, so that the GPU can run graphics and non-graphics workloads simultaneously, or share data across the CPU-GPU link thanks to HSA (Heterogeneous System Architecture).
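The idea of routing work to separate graphics and compute queues that drain independently can be sketched as a toy model. This is only an illustration using Python threads to stand in for two hardware queues; the functions `route`, `drain` and `run_async` and the `"graphics"`/`"compute"` labels are hypothetical and do not reflect Mantle's actual API.

```python
import queue
import threading


def route(workloads):
    """Split (kind, name) workloads into a graphics queue and a compute queue."""
    graphics, compute = queue.Queue(), queue.Queue()
    for kind, name in workloads:
        (graphics if kind == "graphics" else compute).put(name)
    return graphics, compute


def drain(q, label, log, lock):
    """Process every item in one queue in FIFO order, recording where it ran."""
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            return
        with lock:
            log.append((label, item))


def run_async(workloads):
    """Run both queues concurrently, mimicking async compute alongside graphics."""
    graphics, compute = route(workloads)
    log, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=drain, args=(graphics, "gfx", log, lock)),
        threading.Thread(target=drain, args=(compute, "compute", log, lock)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    work = [("graphics", "shadow_pass"), ("compute", "particles"),
            ("graphics", "main_pass"), ("compute", "lighting_cull")]
    print(run_async(work))
```

The interleaving between the two queues in the log can vary from run to run, which is the point: neither queue waits for the other. Within each queue, order is preserved because a single consumer drains it.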
Another idea behind Mantle is expanding parallelism. Under DirectX and OpenGL, CPU0 might handle game compute, CPU1 might set up rendering, and CPU2 might handle driver setup and data passing. With Mantle, CPU0 handles the CPU-centric computation, while CPU1 through CPUx (maximum multi-threading) are all dedicated to the render path, with no need to tie up cores with driver interfaces.
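The "many cores feeding the render path" model usually means each worker thread records its own command list for a slice of the scene, and one submission step stitches the lists together in order. The sketch below shows only that structure; `record_command_list`, `record_frame` and the `("bind", ...)`/`("draw", ...)` tuples are hypothetical stand-ins, not Mantle calls. Note that CPython's GIL prevents CPU-bound threads from truly running in parallel, so this demonstrates the organization of the work rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def record_command_list(objects):
    """Hypothetical: build the commands needed to render one slice of the scene."""
    cmds = []
    for obj in objects:
        cmds.append(("bind", obj))  # set up state/resources for this object
        cmds.append(("draw", obj))  # issue the draw
    return cmds


def record_frame(scene, workers=4):
    """Record command lists on several threads, then submit them in scene order."""
    if not scene:
        return []
    size = -(-len(scene) // workers)  # ceiling division: objects per slice
    slices = [scene[i:i + size] for i in range(0, len(scene), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, regardless of finish order.
        lists = list(pool.map(record_command_list, slices))
    frame = []
    for cmds in lists:  # single submission point
        frame.extend(cmds)
    return frame


if __name__ == "__main__":
    print(record_frame([f"obj{i}" for i in range(6)], workers=3))
```

The design choice worth noting is that recording is parallel but submission is ordered: because each thread owns its own list, no locking is needed while recording, and the final frame matches what a single thread would have produced.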
According to Marc, CPUs are powerful enough to occasionally serve as offload engines for GPU rendering, on both consoles and PCs.