
What makes HyperTransport so much better than the FSB?

November 26, 2006 8:34:12 PM

What makes HyperTransport so much better than the FSB?

It seems like, nowadays, the NetBurst FSB has scaled up to provide bandwidth similar to HyperTransport's. So, what advantages does HT hold over the FSB...
a. for 1P
b. for 2P
c. for 4P+
November 26, 2006 8:47:40 PM

Quote:
What makes HyperTransport so much better than the FSB?

It seems like, nowadays, the NetBurst FSB has scaled up to provide bandwidth similar to HyperTransport's. So, what advantages does HT hold over the FSB...
a. for 1P
b. for 2P
c. for 4P+

Up to now it hasn't scored any spectacular points, but that seems to be the way to go for the future. And don't forget, only HTT made it possible for AMD to put together the 4x4, because it allows two CPUs to communicate; flexibility is its main advantage.
November 26, 2006 9:24:01 PM

I like your picture :D 

sorry for the offtopic
November 26, 2006 9:27:01 PM

Quote:
. So, what advantages does HT hold over the FSB...
a. for 1P
b. for 2P
c. for 4P+

a. No advantages
b. Some advantages
c. More advantages
November 26, 2006 9:45:11 PM

They are completely different approaches: HT is a point-to-point link, while the FSB is a shared bus.
HT has lower latency and scales better: if the manufacturer needs more system bandwidth or wants to add more companion chips, he simply adds another HT channel (2- and 8-way Opterons have multiple HT channels for interlinking).
The FSB is a bus shared between all devices, so it has higher latencies (the CPU can't access a device while another is transferring data) and it's limited by the bus clock frequency.

The Core 2 architecture temporarily solved those issues by raising clock speeds and doubling the CPU caches, but it will soon hit another limit.

Anyway, no solution is absolutely better: HT scales better, but the FSB is much simpler to implement, and the die space saved can be used for more cache or more ALUs.
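The contention point above is easy to see with a toy model. This is purely my own illustration with made-up time units, not real FSB or HT timings: four devices each want to move the same amount of data, either over one shared bus or over dedicated per-device links.

```python
# Toy model of bus contention (illustrative numbers, not real hardware timings).

def shared_bus_finish_times(transfer_secs):
    """On a shared bus, transfers serialize: each waits for the ones before it."""
    finish, t = [], 0.0
    for d in transfer_secs:
        t += d            # the bus is busy; this transfer queues behind the rest
        finish.append(t)
    return finish

def point_to_point_finish_times(transfer_secs):
    """With a dedicated link per device, transfers proceed concurrently."""
    return list(transfer_secs)

transfers = [1.0, 1.0, 1.0, 1.0]   # four devices, 1 time unit of data each
print(shared_bus_finish_times(transfers))      # [1.0, 2.0, 3.0, 4.0]
print(point_to_point_finish_times(transfers))  # [1.0, 1.0, 1.0, 1.0]
```

The last device on the shared bus waits four times as long as it would on its own link, which is the "higher latency under load" effect described above.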
November 26, 2006 9:50:25 PM

Quote:
What makes HyperTransport so much better than the FSB?

It seems like, nowadays, the NetBurst FSB has scaled up to provide bandwidth similar to HyperTransport's. So, what advantages does HT hold over the FSB...
a. for 1P
b. for 2P
c. for 4P+



The benefits can be seen at all levels. The bandwidth shown by X2 is because of HT.

The fact that 4P Opteron is the Transaction King is partly because of HT.
1. It's bidirectional
2. It connects to any other bus (Infiniband, PCIe) with no latency penalties.
3. It provides a direct connection between CPUs/coProcs.
4. It scales very well with additional processors (doubling bandwidth)
5. It allows ganging (HT3 at least), which is like nVidia's network "teaming"


HT is a great interconnect, and it's funny that Intel owns the patent from Alpha but hasn't used it. Maybe they are stuck in the "in-house" mode of design, where they only buy technology to keep others from it or to make money off licensing.
November 26, 2006 9:54:29 PM

HyperTransport really isn't a replacement for the FSB. What AMD did to create the K8 was place the northbridge on the die with the CPU. This lets them run the FSB at the same speed as the processor, since it only has to cross a few millimetres of silicon, removing it as a bottleneck. Assuming the K8's FSB uses the same EV6 bus as the K7, the lowly X2 3800+ has 64GB/s of bandwidth (32GB/s per core), while the FX-62 has 89.6GB/s (44.8GB/s per core).

HyperTransport is just the system-side link from the integrated northbridge. Its biggest advantages are low latency, low pin count (72 per link, AFAIK), long trace length, high bandwidth (41.6GB/s per link for the 3.0 spec), and scalability. Like gOJDO said, for the average user HyperTransport does almost nothing. But for massively parallel systems like something Cray would build, it's the secret sauce that makes it work.
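For anyone who wants to check the numbers in that post, here's the quick arithmetic. The 128-bit-per-clock width of the on-die EV6-style bus is inferred from the figures quoted, not taken from AMD documentation, and the HT 3.0 numbers assume a 2.6 GHz, 32-bit, double-data-rate link counted in both directions.

```python
# Back-of-envelope bandwidth arithmetic for the figures quoted above.
# Assumptions: on-die bus is 128 bits (16 bytes) per clock at core speed;
# HT 3.0 link is 2.6 GHz, DDR, 32 bits (4 bytes), aggregated over both directions.

def bus_bandwidth_gbs(clock_hz, bytes_per_transfer, transfers_per_clock=1):
    return clock_hz * bytes_per_transfer * transfers_per_clock / 1e9

# On-die "FSB" at core clock, 16 bytes wide:
x2_3800 = bus_bandwidth_gbs(2.0e9, 16)   # X2 3800+ at 2.0 GHz, per core
fx_62   = bus_bandwidth_gbs(2.8e9, 16)   # FX-62 at 2.8 GHz, per core
print(x2_3800, fx_62)                    # 32.0 44.8

# HT 3.0 link, both directions combined:
ht3 = bus_bandwidth_gbs(2.6e9, 4, transfers_per_clock=2) * 2
print(ht3)                               # 41.6
```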
November 27, 2006 12:46:11 AM

"The benefits can be seen at all levels. The bandwidth shown by X2 is because of HT.

The fact that 4P Opteron is the Transaction King is partly because of HT.
1. It's bidirectional
2. It connects to any other bus (Infiniband, PCIe) with no latency penalties.
3. It provides a direct connection between CPUs/coProcs.
4. It scales very well with additional processors (doubling bandwidth)
5. It allows ganging (HT3 at least), which is like nVidia's network "teaming" "



I know all about gangs and gangstaness if you need help with that.
November 27, 2006 1:25:40 AM

Quote:
"The benefits can be seen at all levels. The bandwidth shown by X2 is because of HT.

The fact that 4P Opteron is the Transaction King is partly because of HT.
1. It's bidirectional
2. It connects to any other bus (Infiniband, PCIe) with no latency penalties.
3. It provides a direct connection between CPUs/coProcs.
4. It scales very well with additional processors (doubling bandwidth)
5. It allows ganging (HT3 at least), which is like nVidia's network "teaming" "



I know all about gangs and gangstaness if you need help with that.


I don't blame you, I blame Jack.
:twisted:
November 27, 2006 1:31:52 AM

Quote:
"The benefits can be seen at all levels. The bandwidth shown by X2 is because of HT.

The fact that 4P Opteron is the Transaction King is partly because of HT.
1. It's bidirectional
2. It connects to any other bus (Infiniband, PCIe) with no latency penalties.
3. It provides a direct connection between CPUs/coProcs.
4. It scales very well with additional processors (doubling bandwidth)
5. It allows ganging (HT3 at least), which is like nVidia's network "teaming" "



I know all about gangs and gangstaness if you need help with that.


I don't blame you, I blame Jack.
:twisted:
I blame MTV
November 27, 2006 2:15:56 AM

Quote:
"The benefits can be seen at all levels. The bandwidth shown by X2 is because of HT.

The fact that 4P Opteron is the Transaction King is partly because of HT.
1. It's bidirectional
2. It connects to any other bus (Infiniband, PCIe) with no latency penalties.
3. It provides a direct connection between CPUs/coProcs.
4. It scales very well with additional processors (doubling bandwidth)
5. It allows ganging (HT3 at least), which is like nVidia's network "teaming" "



I know all about gangs and gangstaness if you need help with that.


I don't blame you, I blame Jack.
:twisted:
I blame MTV

Jack tries to sound like this holier-than-thou all-knowing VunderKind, which in some ways parallels the programming of the........


Wait a minute this is a CPU forum.

Bad Master Ninja. Bad Master Ninja.

As some of us tend to say, even Jack admits that the FSB arch is D-E-A-D dead.

I guess AMD figured that if they do have a breakthrough, they will need a point-to-point protocol that allows not only high bandwidth and low latency but also a single connection for all levels of processing power.

As someone also "fondly" recounted, Intel owns the patent that spawned HTX. I guess they haven't figured it out yet. :lol: 
November 27, 2006 2:20:29 AM

To put it mildly.... meh. FSB will be changed. Soon enough. So this is all quite moot.
November 27, 2006 2:24:42 AM

Quote:
Alas, the uber idiot speakth --- while what you list is interesting, and part of the HT Consortium marketing bullets, it does not explain why HT is better than FSB.... you will need to do better than that, though I seriously doubt you can...


Maybe you should see my earlier post. I am not trying to get a forum "PhD." Simple question. Simple answer. I only read C# books. That's what pays the bills.

Oh maybe I should have put in a www.hypertransport.org link along with....


whatever. I'll be adjourning court now jester.
November 27, 2006 3:38:32 AM

Latency. HTT latency is 33-40% of the FSB's. That is why the new DARPA benchmarks focus on it. Red Storm, using 130nm Clawhammer 146s, bested Blue Gene/L in two benchmarks because of latency.
"The two first-place benchmarks measure the efficiency of keeping track of data (called random access memory), and of communicating data between processors. This is the equivalent of how well a good basketball team works its offense, rapidly passing the ball to score against an opponent."

"More technically, Red Storm posted 1.8 TB/sec (1.8 trillion bytes per second) on one HPCC test: an interconnect bandwidth challenge called PTRANS, for parallel matrix transpose. This test, requiring repeated "reads," "stores," and communications among processors, is a measure of the total communication capacity of the internal interconnect. Sandia's achievement in this category represents 40 times more communications power per teraflop (trillion floating point operations per second) than the PTRANS result posted by IBM's Blue Gene system that has more than 10 times as many processors.

"Red Storm is the first computer to surpass the 1 terabyte-per-second (1 TB/sec) performance mark measuring communications among processors -- a measure that indicates the capacity of the network to communicate when dealing with the most complex situations.

"The "random access" benchmark checks performance in moving individual data rather than large arrays of data. Moving individual data quickly and well means that the computer can handle chaotic situations efficiently.

"Red Storm also did very well in categories it did not win, finishing second in the world behind Blue Gene in fft ("Fast Fourier Transform," a method of transforming data into frequencies or logarithmic forms easier to work with); and third behind Purple and Blue Gene in the "streams" category (total memory bandwidth measurement). Higher memory bandwidth helps prevent processors from being starved for data." http://www.physorg.com/news62939660.html

When are supercomputers really super?
Interconnect speed is key to high-performance computing
http://www.gcn.com/print/25_5/40021-1.html A somewhat long article, but it lays most of the issues out in understandable terms.
November 27, 2006 4:38:26 AM

Heheh, easy answer: HyperTransport is a really cool technology because it allows you to chain Northbridges together.

Oh sure, it's got lots of other advantages. You can even add other devices to the chain, but get this:

Say you wanted a chipset to NATIVELY support PCI-Express and AGP. Well, that would require two different Northbridges. That's what happened with the ULi M1695/M1567 chipset combo: the Southbridge is actually a second Northbridge!

Or say you wanted 40 PCI-Express lanes, but your chipset only supported 20. Well, you could design a new chipset. Or you could do what nVidia has done, and chain two together. OH! You thought those new Southbridges were actually new?

Business chipsets, wow, nVidia has different chipsets that are actually the same thing with different parts disabled. They chain three together to support a bunch of cards and processors.

So, you know those 8-pin Lego building blocks? That's your HT x8 link. Add a second Lego next to it for x16. Stack them any way you like, build a house of computers!
November 27, 2006 5:25:07 AM

Quote:
Alas, the uber idiot speakth --- while what you list is interesting, and part of the HT Consortium marketing bullets, it does not explain why HT is better than FSB.... you will need to do better than that, though I seriously doubt you can...


Maybe you should see my earlier post. I am not trying to get a forum "PhD." Simple question. Simple answer. I only read C# books. That's what pays the bills.

Oh maybe I should have put in a www.hypertransport.org link along with....


whatever. I'll be adjourning court now jester.

Fail.
November 27, 2006 7:13:44 AM

The benefits can be seen at all levels. The bandwidth shown by X2 is because of HT.

The bandwidth of the X2 is because of the on-die memory controller and its direct connection to RAM. This memory bus is proprietary and not necessarily based on HT. It could very well be parallel, just like the FSB, as modern DRAM sockets are still based on parallel signalling. Rambus memory uses a serial interface, but their designs have not been popular for various reasons.

The fact that 4P Opteron is the Transaction King is partly because of HT.
1. It's bidirectional


The serial design of HT means that it is more efficient to have dedicated up- and downlinks running at half aggregate bandwidth, while the parallel design of the FSB makes it better off with the entire channel dedicated to one direction at a time. Neither method is necessarily better as there is a tradeoff between bandwidth flexibility and latency.
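That tradeoff is easy to put in numbers. This is a toy illustration with assumed figures (8 GB/s of raw capacity, split either as one half-duplex channel or as two dedicated 4 GB/s unidirectional links), not real FSB or HT specs:

```python
# Toy model of half-duplex (FSB-style) vs full-duplex (HT-style) links.
# Assumed capacities for illustration only.

def half_duplex_time(read_gb, write_gb, total_gbs=8.0):
    # One shared channel: traffic in both directions serializes.
    return (read_gb + write_gb) / total_gbs

def full_duplex_time(read_gb, write_gb, per_dir_gbs=4.0):
    # Dedicated up/down links: directions overlap, each at half capacity.
    return max(read_gb / per_dir_gbs, write_gb / per_dir_gbs)

# Balanced traffic: the two layouts tie.
print(half_duplex_time(8, 8), full_duplex_time(8, 8))    # 2.0 2.0
# One-sided traffic: the shared channel wins, since nothing sits idle.
print(half_duplex_time(16, 0), full_duplex_time(16, 0))  # 2.0 4.0
```

So neither layout dominates: dedicated links overlap both directions but waste the idle one on one-sided traffic, which is exactly the bandwidth-flexibility tradeoff described above.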

2. It connects to any other bus (Infiniband, PCIe) with no latency penalties.

All interconnection methods introduce latency. How much is dependent on the design, and HT has low latency only because of good supporting logic, not the fact that it's serial. The "latency" in the FSB design is not at the parallel traces, either, but at the memory hub.

3. It provides a direct connection between CPUs/coProcs.

Correct. It's faster to send data directly between processors than to go through the memory hub. And a serial interface is much less likely to overcrowd a motherboard with many interconnected CPU sockets. Intel's design is not directly comparable because it involves one central memory controller handling all RAM access. Were CPUs not to come with any cache, the hub design could be more efficient.

4. It scales very well with additional processors (doubling bandwidth)

Again, the memory scaling of Opterons comes from the IMC found in each chip. The HT links are necessary to keep the SMP system functioning, as one CPU can't directly see all available RAM.

5. It allows ganging (HT3 at least), which is like nVidia's network "teaming"

A good point, and true of serial interfaces. You can put several links in parallel to increase bandwidth, whereas an already-parallel interface can only be widened as board space permits.

HT is a great interconnect and it's funny that Intel owns the patent from Alpha but hasn't used it. Maybe they are stuck in the "in-house" mode of design where they only buy technology to keep others from it or to make money off of licensing.

Intel hasn't used HT because there's nothing to connect with HT. Their CPUs don't have memory controllers, so a separate Northbridge has to handle RAM, and I don't think it's practical to connect CPUs both by Northbridge and by inter-CPU HT. There exist dual x16 SLI boards which indirectly use HT to connect the two PCI-E slots, but the 975X chipset has the Northbridge handling dual GPUs at 8x speed.
November 27, 2006 8:06:01 AM

May I add that, since AMD uses an integrated memory controller, there is no memory-control overhead going through their HT-like CPU/RAM connection? Meaning that at equal bandwidth, Intel's FSB can't transport as much actual data back and forth as AMD's does.
This increases an X2's actual throughput by as much as 25%, as far as I know.
November 27, 2006 10:34:50 AM

Quote:
May I add that, since AMD uses an integrated memory controller, there is no memory-control overhead going through their HT-like CPU/RAM connection? Meaning that at equal bandwidth, Intel's FSB can't transport as much actual data back and forth as AMD's does.
This increases an X2's actual throughput by as much as 25%, as far as I know.


I'm not sure that's relevant. AMD can add as many data lines as necessary to make sure RAM is being fully utilized by the IMC.

Memory control signals go through the address-bus lines on DIMM modules, whereas actual data goes through the data lines. In addition, as you know, memory operates with multi-clock latencies. That's why dual-channel DDR2-800 (64-bit data per channel) doesn't fully saturate a 400MHz quad-pumped FSB (also 64-bit). The MCH apparently holds a small buffer for RAM and juggles bandwidth with other system communication. It's in Intel's interest to minimize any overhead in the data going over the FSB.
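The peak numbers behind that comparison work out like this. These are theoretical maxima only; as noted above, latencies keep real sustained rates lower:

```python
# Peak-bandwidth arithmetic for dual-channel DDR2-800 vs a 400MHz
# quad-pumped FSB (theoretical maxima, not sustained rates).

def gbs(transfers_per_sec, bytes_wide):
    return transfers_per_sec * bytes_wide / 1e9

# Dual-channel DDR2-800: 800 MT/s, two 64-bit (8-byte) channels.
ddr2_800_dual = 2 * gbs(800e6, 8)
# 400 MHz quad-pumped FSB: 4 x 400 = 1600 MT/s, 64 bits (8 bytes) wide.
fsb_1600 = gbs(4 * 400e6, 8)
print(ddr2_800_dual, fsb_1600)   # 12.8 12.8
```

On paper the two peaks match exactly, which is why only latency and command overhead, not raw width, keep the memory from saturating the FSB.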
November 27, 2006 1:46:47 PM

Errr... a single data channel is 64 bits wide (it has been since the i586, which required paired 32-bit FPM/EDO modules), so the bus width is 128 bits for dual channel...
Now, memory control signals may well go through address-bus lines, but I'm not sure those aren't counted in the FSB width and throughput. Do you have any source that would clarify the situation? I'm sorry, I feel a bit lazy today.