"8 BITS PER CORE": threads don't work that way.
Two threads running on one core go more like this:
Thread 1 runs for, say, 50 cycles and then has to wait for data. The core switches to thread 2 and runs it. While thread 2 is running, the data for thread 1 arrives. Thread 2 keeps going until it has to wait for data too, then the core switches back to thread 1 and runs that... and so on and so on.
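To make the switching concrete, here is a rough toy simulation of that back-and-forth. It is only a sketch: the function name `run_core`, the (compute, wait) burst format, and every cycle count other than the first 50 are made up for illustration, and real hardware is far more complicated than this.

```python
from collections import deque


def run_core(bursts):
    """Toy model: one core that switches threads whenever the running one stalls.

    bursts maps a thread name to a list of (compute_cycles, wait_cycles) pairs:
    the thread computes for compute_cycles, then waits wait_cycles for data.
    Returns a timeline of (start_cycle, end_cycle, who) entries.
    """
    pending = {name: deque(b) for name, b in bursts.items()}
    ready_at = {name: 0 for name in bursts}  # cycle at which each thread's data is there
    clock = 0
    timeline = []
    while any(pending.values()):
        # Threads that still have work and are not waiting for data.
        runnable = [n for n in pending if pending[n] and ready_at[n] <= clock]
        if not runnable:
            # Everybody is waiting: the core idles until the first data arrives.
            wake = min(ready_at[n] for n in pending if pending[n])
            timeline.append((clock, wake, "idle"))
            clock = wake
            continue
        # Run the thread whose data has been ready the longest.
        name = min(runnable, key=lambda n: ready_at[n])
        compute, wait = pending[name].popleft()
        timeline.append((clock, clock + compute, name))
        clock += compute
        ready_at[name] = clock + wait
    return timeline


# Hypothetical workload: each thread computes, waits 40 cycles for data, computes again.
example = {
    "thread 1": [(50, 40), (50, 0)],
    "thread 2": [(60, 40), (40, 0)],
}
for start, end, who in run_core(example):
    print(f"cycles {start:3d}-{end:3d}: {who}")
```

With these invented numbers the core stays busy for all 200 cycles, because each thread's wait is hidden behind the other thread's work. Run back to back, each thread would need 140 cycles including its wait, so 280 in total.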
Sometimes it goes differently. Say thread 1 is doing integer calculations and thread 2 needs to do floating point calculations on a vector: the core can run both at the same time, with thread 1 getting the integer pipeline and thread 2 going to the floating point unit or the SIMD pipeline (sometimes those two are the same unit).
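A similarly rough sketch of the "both at the same time" case. The assumptions here are mine: the core has exactly one integer unit and one combined FP/SIMD unit, each unit takes one operation per cycle, and each thread issues its operations strictly in order. The name `smt_cycles` and the `"int"`/`"fp"` labels are invented for the example.

```python
def smt_cycles(thread_ops, units=("int", "fp")):
    """Toy model: count cycles until all threads finish on one shared core.

    thread_ops is a list of per-thread op lists, e.g. ["int", "int", "fp"].
    Each unit accepts at most one op per cycle, and each thread issues in order.
    """
    for ops in thread_ops:
        for op in ops:
            if op not in units:
                raise ValueError(f"no execution unit for op {op!r}")
    n = len(thread_ops)
    next_op = [0] * n  # index of the next op each thread wants to issue
    cycles = 0
    while any(next_op[i] < len(thread_ops[i]) for i in range(n)):
        free = set(units)  # every unit is available at the start of a cycle
        for k in range(n):
            i = (cycles + k) % n  # rotate which thread gets first pick
            if next_op[i] < len(thread_ops[i]) and thread_ops[i][next_op[i]] in free:
                free.remove(thread_ops[i][next_op[i]])
                next_op[i] += 1
        cycles += 1
    return cycles


# Integer thread next to a floating point thread: they never fight over a unit.
print(smt_cycles([["int"] * 4, ["fp"] * 4]))
# Two integer threads: they take turns on the single integer pipeline.
print(smt_cycles([["int"] * 4, ["int"] * 4]))
```

In this model the mixed pair finishes in 4 cycles while the two integer threads take 8, which is the whole point: running two threads at once only pays off when they want different parts of the core.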
P.S. I can't edit my previous post, so insert this in the right spot.