Q&A with AMD VP Randy Allen: How Torrenza will open up the CPU

Sunnyvale (CA) - Yesterday, during its Analyst Day briefing, AMD introduced an innovation to its AMD64 architecture that could change the way we think about system architecture: By means of a platform currently called Torrenza, the company will effectively open up its existing HyperTransport link to the CPU, and license the means for other companies to produce co-processors for AMD64-based Opteron servers. These co-processors will be called accelerators, borrowing a term from the early days of Intel's 387 math co-processor. An accelerator can provide hardware processing functionality right next door to the CPU - literally within a spare socket that would otherwise hold a supplemental CPU - bypassing the system bus and relocating hardware functionality from the expansion bus to the core level.

Randy Allen, AMD Corporate Vice President for Server Products.

It may be a little premature to say this changes everything, but the potential is certainly there. With HyperTransport expansion slots already a part of AMD's second phase of Torrenza implementation - the first phase of which will launch in the latter part of this year - we could see physics processors, storage network connections, and conceivably even graphics processors moving off of PCI Express and onto HyperTransport (HTX), in a much smaller form factor with considerably less power consumption.

TG Daily spoke late yesterday afternoon to AMD's corporate vice president for its Server Products division, Randy Allen, who was one of the presenters at the event. One of our concerns was how far this Torrenza platform will extend. For instance, could it make use of Trinity - AMD's forthcoming virtualization platform, also announced yesterday - to enable an underlying control program (what engineers call a "hypervisor") to bypass device drivers and, in turn, the operating system? Here's what Allen had to say:

TG Daily: It appears to me that Torrenza would open up an entirely new class of original equipment platform, that could conceivably communicate with the CPU directly, bypassing the system bus.

Randy Allen: That is correct. More precisely, we really don't have a system bus with Opteron because of our DirectConnect architecture; rather, HyperTransport connects to the southbridge, and all the I/O devices are hanging off of that. You are absolutely correct, the key value proposition with this is: [Today] all the I/O has to go through the southbridge or a tunnel chip, say, out to PCI Express, [and] all of that requires some additional latency. What we're doing is enabling those applications and workloads that are very latency-sensitive to directly connect to the processor via HyperTransport, and that will allow higher levels of performance than could have been achieved earlier.

TG Daily: When you talk about "latency-sensitive applications," give me a few examples.

RA: Floating-point and media processing are a couple that jump out right away. We talked about XML processing. The idea is, instead of having to go all the way through an external device out through an industry standard bus like PCI Express, [the CPU] could have a very low latency connection out to a very special-purpose processor.

TG Daily: In the discussion that Marty Seyer presented, with regard to the second phase of Torrenza, which could make possible an expansion slot on top of HTX, there could be any number of these co-processor applications that could, I suppose, "piggy-back" on top of the CPU slot, correct?

RA: That is correct. The big news, as you have deduced, is that we are basically tapping into the entire capability of the HyperTransport ecosystem, if you will. This provides a very low-investment path for them to take advantage of our existing infrastructure, our HyperTransport socket, in order to deliver value to the OEMs and, ultimately, to the end users.

TG Daily: Applications that are going to be using this channel are going to be communicating directly with the CPU; they're not going to need any kind of device driver architecture to do this, will they?

"Some of these accelerators are really good at doing certain things, on the order of 10x to 20x faster, and performing these tasks in a much more energy-efficient manner." - Randy Allen, Corporate Vice President, AMD

RA: That's where some of the complexity does come in, in terms of how you schedule the execution of these tasks. I think there's a lot of work left to do, in terms of making sure that there's an efficient way of offloading the appropriate workloads such that you get the full benefit of those capabilities, of those accelerators, at the same time, without unneeded complexity.

TG Daily: [With regard to] whatever control program is actually in use inside future AMD64 architectures, for HyperTransport applications to communicate with the CPU, this would be an opportunity for a certain class of virtualization technology to enter the picture... I'm thinking about a type of platform that would enable a virtual system to ride on top of a physical layer system underneath. Wouldn't Trinity provide some of that?

RA: Trinity could be used in conjunction with Torrenza in that manner, that is correct.

TG Daily: In so doing, I would think this would position AMD64 as a technology with capabilities over and above what Intel is doing with VT [Vanderpool].

RA: If you go back to the traditional virtualized environment that we expect to have, we tend to think of it in terms of having the underlying hardware with the guest OS and the hypervisors. In some sense, the HT links to those other processors, we would think, would be external to that. I don't think it's as clear-cut as you're suggesting, in terms of how the virtualization connects with Torrenza.

TG Daily: So we're not talking about a hypervisor-type situation?

RA: No, we're not.

TG Daily: Instead of that, if it's not a hypervisor and it's not a BIOS, what provides the control here?

RA: The control will have to be addressed by the software, but the idea is that operations will be performed by the accelerator, with instructions issued across HyperTransport and results delivered back over HyperTransport. But some of these accelerators are really good at doing certain things, on the order of 10x to 20x faster, and also performing these tasks in a much more energy-efficient manner. Both performance and performance-per-watt benefits for running certain workloads on very specialized hardware - that's really the play.

Rather than trying to run every workload on a general-purpose x86 processor, recognize that there's going to be a certain set of workloads that, at least for a period of time, are not so ubiquitous that we would put the hardware in our general-purpose x86 processor. But there can be special-purpose hardware that delivers a significant performance benefit, and the best way to do that is to just hang that processing engine off of HyperTransport, thereby providing a very low latency, high bandwidth connection between the general-purpose processor and the special-purpose processor.

TG Daily: Some of the early partners listed in the presentation include companies working in scalable processing technologies - some already producing external processors for Opteron systems, others producing storage area networking capabilities.

RA: Cray is an outstanding example of that: in the slide they presented today, they talked about how they're actually utilizing this capability for their supercomputer.