AMD needs a technology that does the reverse of Hyper-threading

tatsu99

Commendable
Dec 7, 2016
64
0
1,660
If AMD really wants to take market share away from Intel, it needs to work on its CPUs' IPC. This has been said in many articles, but AMD seems to prefer piling on cores and threads instead. Same story with Ryzen: 8 cores/16 threads. We haven't seen any benchmarks yet, so we don't really know how the new CPUs will perform. But if they still lack single-threaded performance, which matters most in games (games really rely on fast single-threaded performance and mostly use only 2-3 cores), AMD is in for a bad future.

So AMD should come up with something that does the opposite of Hyper-threading. Hyper-threading lets each core of the CPU run two threads and can be useful in CPU-demanding apps. AMD needs something like "thread combination": first combine the two threads of a core so there is only one thread per core (that one thread can be a super thread), giving 8 cores/8 threads. Then perform another combination, merging two cores so that each combined core has the performance of two separate cores, ending up with 4 cores/4 threads. I would call this technology "Super-threading". With this kind of emulation, gamers could use Super-threading to get better single-core performance in their games, whereas server users could just ignore the tech altogether since they have different needs.

They could avoid this hassle entirely if AMD has really improved single-core performance enough to at least match Haswell CPUs (more like Haswell-E CPUs).
 

dangus

Admirable
Oct 8, 2015
1,715
0
6,160
First off, it doesn't work like that.

Second, it's not just games that rely on higher IPC. Every piece of software ever written benefits from better IPC, even if it's multi-threaded.

Third, Haswell and Haswell-E share the same architecture, so they have the same single-core performance.

And fourth... weird rant.
 

UnspokenWhale

Reputable
Aug 18, 2014
96
1
4,660
If AMD could solve this problem they wouldn't be having performance problems in the first place.

Anyways, I don't think this would work the way you think it would. From Wikipedia:

For each processor core that is physically present, the operating system addresses two virtual (logical) cores and shares the workload between them when possible. The main function of hyper-threading is to increase the number of independent instructions in the pipeline; it takes advantage of superscalar architecture, in which multiple instructions operate on separate data in parallel. With HTT, one physical core appears as two processors to the operating system, allowing concurrent scheduling of two processes per core. In addition, two or more processes can use the same resources: if resources for one process are not available, then another process can continue if its resources are available.

What you're saying is that AMD should try to use two cores to make a single thread faster, but this wouldn't work very well. Sure, there might be some gains from having double of everything, so the processor has more room to execute multiple instructions at once. But Hyper-threading gains performance precisely because there aren't enough independent instructions to parallelize (and therefore keep the whole core busy) in the first place, so it throws two threads at the core. To gain anything from what you're suggesting, a single thread would need more independent instructions than one core can already exploit, which is the opposite situation. Basically, any application that benefits from Hyper-threading would only lose performance on such a processor.
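
Here's a rough C sketch of why (everything below is made up, purely to illustrate the idea):

#include <stdio.h>

/* Every iteration needs the previous x before it can start, so a single
   thread can't spread this across extra execution units, and "combining"
   two cores into one wouldn't speed it up either. */
static double recurrence(const double *a, int n) {
    double x = 1.0;
    for (int i = 0; i < n; i++)
        x = x * a[i] + 1.0;
    return x;
}

/* These two partial sums don't depend on each other, so they could already
   run on two separate threads/cores today -- no new hardware trick needed,
   just multi-threaded software. */
static double split_sum(const double *a, int n) {
    double s0 = 0.0, s1 = 0.0;
    for (int i = 0; i < n / 2; i++) s0 += a[i];
    for (int i = n / 2; i < n; i++) s1 += a[i];
    return s0 + s1;
}

int main(void) {
    double a[4] = { 0.5, 1.5, 2.0, 0.25 };
    printf("%f %f\n", recurrence(a, 4), split_sum(a, 4));
    return 0;
}

The first loop is stuck producing one result per step no matter how wide you make the core; the second is the kind of work that already scales across the cores and threads we have.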
 

tatsu99

Commendable
Dec 7, 2016
64
0
1,660


This is some heavy MIT-level stuff... how will apps that don't use Hyper-threading perform?
 

TJ Hooker

Titan
Ambassador

Basically this ^

Edit: To expand on this a bit

A hyper-threaded core has the same execution resources as a non-HT core. HT allows the core to work on two threads simultaneously; it basically improves the scheduling and resource allocation of the core by executing operations from either thread whenever it can, in order to keep all the execution resources busy as much as possible and increase overall throughput. If you were to "combine two threads of a core so that there is only one thread per core", you'd basically just be removing HT and going back to a regular core.
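
For what it's worth, HT is invisible to software except that the OS simply sees extra logical CPUs. A quick sanity check, sketched for Linux/glibc (nothing AMD- or Intel-specific about it):

#include <stdio.h>
#include <unistd.h>

/* On an 8-core part with SMT/HT enabled this typically reports 16 logical
   CPUs; turning HT off (or "combining" its threads, as proposed) just drops
   the count back to 8. The physical execution resources per core don't
   change either way. */
int main(void) {
    long logical = sysconf(_SC_NPROCESSORS_ONLN);
    printf("Logical CPUs visible to the OS: %ld\n", logical);
    return 0;
}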
 

DSzymborski

Curmudgeon Pursuivant
Moderator


It's actually quite a basic concept. My two friends and I may be able to wash a car faster by working together, but we can't drive the car to the beach any faster with more of us, because that task can't be split into parallel tasks. We won't get to the beach any faster by taking three separate cars, nor will we if they sit in my car "helping" me drive one car.
 

UnspokenWhale

Reputable
Aug 18, 2014
96
1
4,660
It's not that complicated. Each core has a set of basic actions it can perform, and a program is a series of these instructions stored as data. The processor fetches an instruction (reads it from memory), decodes it (figures out which action it represents), and then executes it (performs the action).

Superscalar processors simply spread independent instructions across separate execution units to keep the whole core busy. If I give a core three instructions A, B, and C, and B does not rely on the result of A, and C does not rely on the result of A or B, then it doesn't matter whether they execute out of order or simultaneously.

However, if I give a core a stream of instructions that all depend on one another, I've ruined its ability to use the whole core, because it has to finish each instruction before it can start the next. Hyper-threading, though, gives the core two instruction streams from two different threads. That lets it fill execution units that would otherwise sit idle, because it has another stream to draw instructions from.

What you're suggesting basically amounts to giving each thread twice as many execution units. That doesn't solve the problem Hyper-threading is meant to solve. You'd only see gains when a core could extract so much parallelism from one instruction stream that it ran out of execution units, which is rare. And if you feed it a stream that stalls most of the core until it completes, you end up under-utilizing one core and stalling the other while gaining nothing, because whichever core is doing the work can't finish it any faster than it would on its own.
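
To put the A/B/C example into concrete (completely made-up) C:

#include <stdio.h>

static int demo(int x, int y, int z) {
    /* Independent "A, B, C" work: a superscalar core can run these in
       parallel or out of order, since none of them needs another's result. */
    int a = x + 1;   /* A */
    int b = y * 3;   /* B */
    int c = z - 7;   /* C */

    /* A fully dependent chain: each step needs the previous result, so most
       execution units sit idle while the core crawls through it one step at
       a time. HT fills those idle units with instructions from a SECOND
       thread; giving this ONE thread twice the execution units (the
       "super-threading" idea) wouldn't help, because there's nothing
       independent left to run on them. */
    int d = x + 1;
    d = d * d;
    d = d - y;
    d = d / 5;

    return a + b + c + d;
}

int main(void) {
    printf("%d\n", demo(2, 3, 4));
    return 0;
}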

An application that could "perfectly" utilize the feature would probably not perform as well as one that just used all the threads on the processor to begin with. And even that can't beat Intel's offerings. AMD needs a new architecture to compete.
 

bit_user

Polypheme
Ambassador
The original premise isn't completely off the mark. It's reminiscent of Soft Machines' VISC technology, which uses sophisticated runtime analysis to find coarse-grain parallelism that can be exploited by harnessing multiple cores.

Now, if the app in question were written to perfectly utilize all available cores & hyperthreads, then VISC couldn't add anything. But the reality is that much software leaves some performance on the table, not least because they don't want to optimize it too narrowly for a particular CPU or SoC.

I think VISC & similar technologies definitely have potential & a place in the future computing landscape. I can't say whether anything of the sort will catch on for desktop applications, however.
 
There are a lot of idle CPU cycles while the CPU is waiting for data. That's how single-core processors can multitask. Faster and larger caches took care of most of those idle cycles, and HT cuts what's left roughly in half. How it's used depends on the OS and the programs/games.
 

bit_user

Polypheme
Ambassador
If you're talking about waiting for data from disk or network, then yes. As for RAM, no: the OS won't context-switch a thread simply because it got a cache miss. Some GPUs might do something equivalent, however.

This is actually one of the things that Hyperthreading is so good at. While one thread is blocked on a memory transaction (this can run into the thousands of cycles), the other has full use of the ALU. As CPUs become increasingly data-starved, the number of hardware threads per core is typically increased. Knights Landing/Xeon Phi has 4 threads per core, while I think recent generations of Intel's iGPUs have 7.
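
To make that concrete, here's a tiny C sketch (the data structure and values are made up): pointer-chasing code like this is the classic case where one thread leaves the ALUs idle while it waits on loads, and an SMT sibling can soak up that slack.

#include <stdio.h>
#include <stddef.h>

struct node { struct node *next; long value; };

/* Walking a linked list is latency-bound: each step waits on a load that may
   miss cache, and during those stalls the core's ALUs have little to do.
   With SMT, the sibling hardware thread keeps issuing its own instructions
   in that dead time -- no OS context switch involved. */
static long chase(const struct node *p) {
    long sum = 0;
    while (p != NULL) {
        sum += p->value;
        p = p->next;   /* this dependent load is what dominates the loop */
    }
    return sum;
}

int main(void) {
    struct node c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };
    printf("%ld\n", chase(&a));   /* prints 6 */
    return 0;
}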

You can think of SMT as a sort of light-weight context switching, but it's not the same as having the OS switch between threads or processes.