[pclab] AMD Steamroller - successor to Piledriver (Vishera and Trinity). What can we expect from the...

skitz9417

Distinguished
And here's the source: http://pclab.pl/art52841.html

A few days ago AMD presented its near-term plans to investors, informing them, among other things, that the Steamroller architecture will first be used in 2013. This is nothing new, but various websites have started racing to copy half-year-old information and serve it up as news. We have therefore prepared a short but comprehensive summary of what is known about AMD's upcoming processors. Note: try not to get lost among the bits, bytes, megahertz and gigatransfers!
Steamroller microarchitecture
Components of a modern processor and... a new socket?
Kaveri with a new memory controller: GDDR5M and more
At last year's AFDS 2012 conference, Rory Read, the new (at the time) head of AMD, said that from then on AMD products would be released as regularly as clockwork. We are not in the habit of believing the "green" camp's assurances unquestioningly, but the new AMD under Rory Read does seem to work differently: the roadmaps no longer change every month. Steamroller has long been announced for the second half of 2013, but some sites apparently cannot refrain from repeating it endlessly.
We have already written about Steamroller (the name means a road roller), and no new information has been revealed since, but let us briefly recall what is known. It is the next step in the development of the modular x86 architecture with clustered multithreading. First used in Bulldozer, it has since been improved and is now available in stores, as Piledriver modules, in Vishera (FX-x3xx) and Trinity processors, and soon also in Richland (A10-series APUs and lower). Steamroller is a step toward higher single-thread performance and better energy efficiency.
An additional instruction decoder, improved branch prediction and a larger instruction cache help to "feed" the cores faster. The internal load/store buffers for the data cache have been enlarged. Together with unspecified improvements to the instruction scheduler, this allows a Steamroller core to execute up to 30% more micro-operations at a time than a Piledriver core (this is the maximum gain and refers to micro-operations, not to be confused with performance in specific programs!). Those interested in the details are welcome to read our earlier article: "Hot News from Hot Chips 2012".
The first processors with Steamroller will be the APUs code-named Kaveri, designed for laptops and mid-range desktops (the segment served today by Trinity and Richland). Later, Steamroller modules will be used to build desktop and server processors; it is not known, however, how long we will have to wait for them.
Kaveri APUs are to include two or three Steamroller modules, that is, four or six cores. A Steamroller module is significantly smaller than a Piledriver module thanks to the automated design of some circuits. Combined with the move to a new process node (28 nm) at GlobalFoundries' fabs, this should allow a three-module Kaveri to be built with a die area smaller than that of a two-module Trinity.
Steamroller modules are just one of many parts of the APU. The second most important component will be a graphics chip based on the GCN architecture, perhaps with some improvements. It is not known how many execution units it will have or how fast the GPU will be clocked. From comparisons of the VLIW4 and GCN architectures, we know that the latter is much more efficient with the same number of execution units. The move from the 32 nm process (Trinity/Richland) to 28 nm (Kaveri) should also let engineers fit more shaders into a similar area. The GPU performance gain over Trinity and Richland should be large, but we cannot guess anything more until we know the details of the production chips.
A new socket?
A few months ago, the first signs of Kaveri APU support appeared in the source code of coreboot (an open-source replacement for BIOS and UEFI firmware). A file with a list of constants defines two named KV_SOCKET_FM3 and KV_SOCKET_FS2. The abbreviation KV probably comes from the code name Kaveri, and FM3 and FS2 are probably the names of new sockets for desktop and laptop APUs. The Richland APUs that will go on sale soon use the FS1 socket (laptops), and their desktop counterparts are compatible with the FM2 socket. Not everyone will like the break in compatibility with existing sockets, but remember that changes to the overall organization of a processor often require a new socket. As we mentioned when describing Haswell processors, these modifications often cannot be avoided: integrated power delivery and external links are the areas where the most work is needed when adapting to new types of devices. We would rather get a more modern APU, even at the cost of upgrade options, especially if abandoning FM2 enables significant improvements in energy efficiency and support for new memory types... but more on that in a moment.
AMD has long repeated that an APU is more than the sum of a CPU and a GPU. Indeed, integrating the two is a key piece of the puzzle, and perhaps the hardest to implement. Kaveri is to be the first chip with a unified CPU and GPU address space (at least in the world of personal computers: the PlayStation 4 is due in stores around the same time).
Trinity and Richland allow the CPU and GPU to use each other's memory pools without copying data, but handing data over to the other unit requires address translation. In Kaveri, the GPU and CPU will use the same addresses and will be able to operate on the same memory region without any "transfer of control" or copying.
Such operation requires a fast link between the GPU and the memory controller. The Trinity GPU has two memory buses: the first connects to the queue of requests for access to the CPU's address space and enables those "transfers of control" over memory without copying. The second is used to communicate directly with the controller and a separate memory pool reserved for the GPU. In Kaveri, the first bus is to be twice as wide, at 32 bytes; even if it is clocked at a similar rate as in Trinity, it will still offer about one-third more throughput than the theoretical bandwidth of a dual-channel DDR3-1866 memory system, and it is likely to run faster. The second bus will probably lose its raison d'être, since the GPU will have easy access to the processor's address space.
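To put a rough number on the "one-third more than dual-channel DDR3-1866" comparison, here is a back-of-the-envelope sketch in Python. It assumes two 64-bit DDR3 channels and one 32-byte transfer per bus clock. The implied bus clock is back-calculated from those assumptions and is not a figure from the article; the function name is made up for illustration.

```python
def dual_channel_ddr3_gbs(mega_transfers_per_s: float) -> float:
    """Peak bandwidth of two 64-bit DDR3 channels in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = 2 * 64 / 8  # 16 bytes across both channels
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

ddr3_1866 = dual_channel_ddr3_gbs(1866)   # theoretical dual-channel DDR3-1866
bus_target = ddr3_1866 * 4 / 3            # "about one-third more"

# Assumption: the 32-byte bus moves one full-width transfer per clock.
implied_clock_mhz = bus_target * 1e9 / 32 / 1e6

print(f"Dual-channel DDR3-1866: {ddr3_1866:.1f} GB/s")
print(f"One-third more:         {bus_target:.1f} GB/s")
print(f"Implied 32-byte bus clock: about {implied_clock_mhz:.0f} MHz")
```

Under these assumptions the bus would need to run somewhere around 1.2 GHz, but treat that only as an order-of-magnitude check.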
But DDR3-1866 is dated: good enough for Trinity, but not for a next-generation APU :) Unofficial but very reliable information says that the memory controller in Kaveri will be able to work not only with DDR3 but also with GDDR5 chips.
Such a controller would be complex, but AMD's engineers have plenty of experience building universal memory controllers: the Cape Verde Radeons (HD 7750), for example, use one that supports both DDR3 and GDDR5. Similarly, the controller in the Phenom II supports both DDR2 and DDR3. But what does the use of GDDR5 change in practice?
In short, with the same bus width and the same number of traces on the board, GDDR5 offers higher bandwidth than DDR3, but at the cost of higher latency. With dual-channel DDR3, Kaveri will be able to work with memory clocked up to DDR3-2500 (1250 MHz, PC3-20000), which gives a maximum theoretical bandwidth of 40 GB/s. The same controller will alternatively be able to drive four 32-bit channels of GDDR5 chips clocked at up to 850 MHz (3400 MT/s, compared with 6000 MT/s on the fastest desktop graphics cards), for a theoretical bandwidth of 54.4 GB/s. The higher memory bandwidth will let integrated graphics performance keep scaling. For the x86 cores, the longer RAM access latency can be "masked" with more advanced prefetching, but it is not yet clear how Kaveri will handle this.
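Both bandwidth figures are simply the transfer rate multiplied by the bus width. A minimal Python sketch of that arithmetic, assuming a 128-bit total bus in both modes (two 64-bit DDR3 channels or four 32-bit GDDR5 channels) and four transfers per clock for GDDR5; the function name is hypothetical:

```python
def peak_bandwidth_gbs(transfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_s * (bus_width_bits / 8) / 1e9

# Dual-channel DDR3-2500: 2 x 64-bit channels at 2500 MT/s.
ddr3 = peak_bandwidth_gbs(2500e6, 2 * 64)

# GDDR5 at 850 MHz: 4 transfers per clock = 3400 MT/s, 4 x 32-bit channels.
gddr5 = peak_bandwidth_gbs(850e6 * 4, 4 * 32)

print(f"DDR3-2500, dual channel: {ddr3:.1f} GB/s")
print(f"GDDR5 at 850 MHz:        {gddr5:.1f} GB/s")
print(f"GDDR5 advantage:         {100 * (gddr5 / ddr3 - 1):.0f}%")
```

These are theoretical peaks only; they say nothing about latency or about what the controller will sustain in practice.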
At first glance, the most serious obstacle to using GDDR5 as system memory is its small capacity. GDDR5 chips (if you don't know what we mean, see the glossary) are available in capacities of 1 Gbit or 2 Gbit (256 MB). Although 4 Gbit (512 MB) chips are in production, even the most expensive graphics cards use the cheaper and readily available 2 Gbit parts. Populating Kaveri's 128-bit controller with the maximum number of 2 Gbit chips would yield 2 GB of RAM. That is a rather laughable capacity at a time when 8 GB or 16 GB DDR3 kits are the most cost-effective purchase. Using 4 Gbit chips would allow 4 GB of system memory, which already meets the needs of lightweight laptops but is still not enough for larger laptops and desktops.
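The capacity totals can be reproduced with a short Python sketch. The article does not say how many chips sit on each 32-bit channel; the sketch assumes two (each using 16 of the 32 data lines), because that is the value that reproduces the 2 GB and 4 GB figures. With one chip per channel the totals would be half as large. The function name and the `chips_per_channel` parameter are illustrative only.

```python
def gddr5_capacity_gb(bus_width_bits: int, chip_density_gbit: int,
                      chips_per_channel: int = 2) -> float:
    """Total capacity in GB for a bus built from 32-bit GDDR5 channels.

    chips_per_channel=2 is an assumption, not a figure from the article:
    it is the value that matches the article's 2 GB and 4 GB totals.
    """
    channels = bus_width_bits // 32
    total_gbit = channels * chips_per_channel * chip_density_gbit
    return total_gbit / 8  # 8 Gbit per GB

for density in (2, 4):
    capacity = gddr5_capacity_gb(128, density)
    print(f"{density} Gbit chips on a 128-bit bus: {capacity} GB")
```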
But all indications are that in GDDR5 mode the memory will not have to be soldered to the board, as it is on graphics cards. SK Hynix has proposed to the JEDEC standards body the use of GDDR5M memory (the M suffix denotes a different package and additional power-saving features) in the form of SODIMM modules. GDDR5M SODIMMs are to have the same form factor as DDR4 DIMM modules and be mechanically compatible with them.
DDR4 DIMM modules
SK Hynix is to start selling GDDR5M chips (not modules!) with a capacity of 4 Gbit and clock speeds of up to 1000 MHz (4000 MT/s) in the third quarter of this year, so there will be time for the new standard to spread. The DDR4 interface allows more than one module per channel; perhaps the same method could be used to install more GDDR5M in a system with a quad-channel 32-bit controller. Unfortunately, the logical and electrical documentation for GDDR5M is not yet available.
For APUs, GDDR5M is to be a middle ground between the aging DDR3 standard and memory integrated into the processor package. In future processor generations, DDR4 will probably serve as relatively fast, high-capacity storage, while additional memory in the processor package (on a separate silicon die) will act as an ultra-fast, small-capacity buffer.
Since we have ventured this far into the realm of speculation: DDR4 performs four transfers per clock cycle, just like GDDR5. A controller supporting both standards, with DDR3 on top, would not be a big surprise. We hope AMD satisfies our curiosity quickly...
Note: The site is in Polish, so use Google Translate! But I have translated it for you guys and girls.
 
