Atom: In-Order and HyperThreading
The Atom uses a new architecture, but one built on older techniques. It's Intel's first in-order x86 processor since the Pentium of 1993. All of Intel's other processors since the P6 have used an out-of-order architecture.
In-Order: Say what?
To simplify, think of the processor as receiving instructions one by one and placing them in its pipeline before executing them. An in-order architecture executes instructions in the order in which they arrive, whereas an out-of-order (OoO) architecture can reorder them in the pipeline. The advantage is that less time is lost waiting on slow operations. Say you have a simple calculation, then a memory access, then another simple calculation that does not depend on the memory access. An in-order architecture executes the three operations one after the other, whereas an OoO processor can carry out the second calculation while the memory access is still pending, an obvious time saving. Surprisingly, while in-order architectures generally use a short pipeline, the Atom has a 16-stage pipeline, which can be a disadvantage in some cases: a mispredicted branch, for example, forces the processor to discard more stages of work and costs more cycles.
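The difference can be illustrated with a toy Python model. It is a sketch under simplifying assumptions, not a description of real hardware: the in-order core is modeled as fully blocking (each instruction waits for the previous one to finish, as in the simplified description above), the out-of-order core has unlimited execution units, and the latencies (1 cycle for a calculation, 4 for the memory access) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    latency: int                 # cycles until the result is ready (hypothetical values)
    deps: tuple[str, ...] = ()   # names of earlier instructions whose results are needed


def in_order_cycles(program: list[Instr]) -> int:
    """Blocking in-order model: each instruction starts only after the previous one finishes."""
    return sum(ins.latency for ins in program)


def out_of_order_cycles(program: list[Instr]) -> int:
    """Idealized OoO model: an instruction starts as soon as its inputs are ready
    (unlimited execution units, no issue-width or window limits)."""
    done: dict[str, int] = {}
    for ins in program:
        start = max((done[d] for d in ins.deps), default=0)
        done[ins.name] = start + ins.latency
    return max(done.values(), default=0)


# Calculation, memory access, independent calculation.
program = [Instr("calc1", 1), Instr("load", 4), Instr("calc2", 1)]
print(in_order_cycles(program), out_of_order_cycles(program))  # 6 4

# Same sequence, but the second calculation needs the loaded value.
dependent = [Instr("calc1", 1), Instr("load", 4), Instr("calc2", 1, deps=("load",))]
print(out_of_order_cycles(dependent))  # 5
```

With these numbers the in-order model needs 6 cycles and the out-of-order model 4. When the second calculation depends on the memory access, the OoO model needs 5 cycles, because reordering cannot hide a true dependency.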
HyperThreading is a technology that first appeared with the Pentium 4. It lets a single core process two threads simultaneously by using the parts of the pipeline that one thread leaves idle. Although it is not as efficient as two true cores, the operating system sees two logical processors, and the computer's overall performance can increase. On the Atom, with its long pipeline coupled to an in-order architecture, HyperThreading is very effective: when one thread stalls, for example while waiting for memory, the core can issue instructions from the other thread instead of sitting idle. The technology can therefore increase performance significantly with little impact on the TDP; Intel claims it raises power consumption by only 10%.
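A second toy model shows why HyperThreading pays off on an in-order core. All of its parameters are hypothetical: a single-issue core; each thread is a list of per-instruction stall counts, where a stall means the thread cannot issue its next instruction for that many extra cycles (for example while waiting on memory); and when both threads are ready, the core alternates between them.

```python
def run_single(thread: list[int]) -> int:
    """Cycles for one thread alone on a single-issue, in-order core.
    Each entry is the stall (in cycles) the thread suffers after issuing that instruction."""
    return sum(1 + stall for stall in thread)


def run_smt(a: list[int], b: list[int]) -> int:
    """Two threads sharing one single-issue in-order core (toy HyperThreading model).
    Each cycle the core issues one instruction from a thread that is not stalled,
    trying the thread that did not issue last time first (round robin)."""
    threads = [a, b]
    pc = [0, 0]        # next instruction index per thread
    ready_at = [0, 0]  # first cycle at which each thread may issue again
    finish = [0, 0]    # cycle at which each thread's last instruction completes
    last = 1           # thread that issued most recently (so thread 0 goes first)
    cycle = 0
    while pc[0] < len(a) or pc[1] < len(b):
        for t in (1 - last, last):
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                stall = threads[t][pc[t]]
                ready_at[t] = cycle + 1 + stall
                finish[t] = ready_at[t]
                pc[t] += 1
                last = t
                break  # single-issue: at most one instruction per cycle
        cycle += 1
    return max(finish)


# Hypothetical thread: simple op, memory access (5-cycle stall), simple op, memory access.
thread = [0, 5, 0, 5]
print(2 * run_single(thread), run_smt(thread, thread))  # 28 17
```

Running two copies of this thread back to back takes 28 cycles in the model, while the interleaved version finishes in 17, because each thread's memory stalls are partly filled with the other thread's instructions.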
The Processing Core
Beyond that, the Atom has two ALUs (units that perform integer calculations) and two FPUs (units dedicated to floating-point calculation, which matters a great deal for gaming, for example). The first ALU also handles shift operations, and the second handles jumps. All multiplication and division operations, even on integers, are automatically sent to the FPUs. The first FPU is simple and limited to addition, while the second handles SIMD and multiply/divide operations. Note that each FPU is 64 bits wide, so the first is used in conjunction with the second for 128-bit calculations.
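The routing described above can be summarized as a lookup table. This is a hypothetical Python sketch that paraphrases the text, not Intel's actual scheduling logic: the labels ALU0/ALU1/FPU0/FPU1 are invented names, letting simple integer operations such as add run on either ALU is an assumption, and two operations are simply treated as conflicting when every way of issuing them needs the same unit.

```python
# Hypothetical unit labels; the mapping paraphrases the description of the Atom's core.
ALU0, ALU1, FPU0, FPU1 = "ALU0", "ALU1", "FPU0", "FPU1"

# Each entry lists the alternative ways an operation can be issued;
# each alternative is the set of units it would occupy.
ROUTES: dict[str, list[frozenset[str]]] = {
    "shift":      [frozenset({ALU0})],
    "jump":       [frozenset({ALU1})],
    "int_simple": [frozenset({ALU0}), frozenset({ALU1})],  # assumption: either ALU
    "int_mul":    [frozenset({FPU1})],  # integer multiply goes to the FP side
    "int_div":    [frozenset({FPU1})],
    "fp_add":     [frozenset({FPU0})],  # the simple FPU only adds
    "fp_mul":     [frozenset({FPU1})],
    "fp_div":     [frozenset({FPU1})],
    "simd":       [frozenset({FPU1})],
}


def issue_options(op: str, width_bits: int = 64) -> list[frozenset[str]]:
    """Return the alternative unit sets an operation may occupy (toy model)."""
    if op not in ROUTES:
        raise ValueError(f"unknown operation class: {op}")
    options = ROUTES[op]
    if width_bits == 128:
        # Per the text, the FPUs are 64 bits wide and are paired for
        # 128-bit work, so such an operation ties up both of them.
        if not all(o <= {FPU0, FPU1} for o in options):
            raise ValueError("128-bit width only applies to FP/SIMD operations")
        return [frozenset({FPU0, FPU1})]
    return options


def can_pair(op_a: str, op_b: str) -> bool:
    """True if some choice of units lets the two operations avoid sharing a unit."""
    return any(a.isdisjoint(b) for a in issue_options(op_a) for b in issue_options(op_b))


print(can_pair("shift", "jump"))      # True
print(can_pair("int_mul", "fp_mul"))  # False
```

In this table a shift and a jump land on different ALUs, while an integer multiply and a floating-point multiply both compete for the second FPU.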
Intel Has Optimized the Basic Instructions
If you look at the number of cycles needed to execute instructions, one thing stands out: some instructions are fast and others are (very) slow. A mov or an add, for example, executes in one cycle, as on a Core 2 Duo, whereas an integer multiplication (imul) takes five cycles, compared to only three on the Core architecture. Worse, a 32-bit floating-point division takes 31 cycles, compared to only 17 (roughly half as many) on a Core 2 Duo. In practice, and Intel readily admits this, the Atom is optimized to execute basic instructions quickly, at the expense of performance on complex instructions. You can check this easily with a tool such as Everest, which includes a utility for measuring instruction latencies.
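To see what these figures mean for a whole sequence, here is a rough per-clock estimate in Python using only the latencies quoted above. It assumes each instruction depends on the previous one, so latencies simply add up; it ignores throughput, pipelining and the two chips' different clock speeds, and the two instruction mixes are made up for illustration.

```python
# Latencies in cycles, as quoted in the text (Atom vs. Core 2 Duo).
LATENCY = {
    "mov":    {"atom": 1,  "core2": 1},
    "add":    {"atom": 1,  "core2": 1},
    "imul":   {"atom": 5,  "core2": 3},
    "fdiv32": {"atom": 31, "core2": 17},  # 32-bit floating-point division
}


def chain_latency(ops: list[str], cpu: str) -> int:
    """Cycles for a chain where each instruction needs the previous result,
    so latencies simply add up (no overlap, throughput ignored)."""
    return sum(LATENCY[op][cpu] for op in ops)


# Hypothetical instruction mixes.
simple = ["mov", "add", "add", "mov"]
complex_ = ["add", "imul", "imul", "fdiv32"]

for name, ops in (("simple", simple), ("complex", complex_)):
    atom, core2 = chain_latency(ops, "atom"), chain_latency(ops, "core2")
    print(f"{name}: Atom {atom} cycles, Core 2 {core2} cycles, ratio {atom / core2:.2f}")
```

With these inputs the simple chain takes 4 cycles on both processors, while the complex chain takes 42 cycles on the Atom versus 24 on the Core 2 Duo, a ratio of 1.75 per clock.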