
Athlon64 - AMD Learns from Intel

April 20, 2003 3:32:37 PM

The XBit test of the 1.6 GHz. "2800" looks sort of disappointing to me. The new chip won in a few areas but lost dismally in many others due to the low core clock. But hey, this is exactly what Intel does... introduce a dog of a new chip, sell a boat-load to the "just gotta have it first" guys, then improve it a few months later and sell another boat load! This strategy has worked for Intel in the past and it's gonna make a lot of money for AMD in the future!

I have a hard time thinking they can't introduce this chip at 2+ GHz, where it would be competitive... But then, I'm sure the complexities are enormous...

Scout
700 Mflops in SETI!
April 20, 2003 3:36:10 PM

AMD will probably not concentrate on A64 at first, since Opteron is the one that will give them the most money.
April 20, 2003 3:40:12 PM

I know the A64 isn't due out till year end... But I got the impression AMD fully intends to release a 1.6 GHz model and call it a 2800+... is that right?

Scout
700 Mflops in SETI!
April 20, 2003 5:09:07 PM

AMD will change the PR rating and clockspeed 100 times before the final release. The review is using a pre-production chipset and CPU. I would take the whole review with a pinch of salt if I were you. AMD have hinted that A64 will be launched at approx. 2 GHz. They still have a couple of months for tweaking the CPU and drivers, which could affect performance at final release.

Ladies and Gentlemen, it's... Hammer Time!
April 20, 2003 5:12:11 PM

Considering that AMD has nowhere near the name-recognition and market penetration of Intel, they would be extremely, extremely stupid if they were trying to act like Intel. They need to win either on performance or price, and usually both. A crappy product release doesn't really help them at all.

I get the feeling that AMD is teetering on the edge of viability right now, and the next year will probably decide their fate. A-64 better be real good or real cheap. Anything in-between doesn't really help them, IMHO.



Those who live in glass houses shouldn't take showers. :tongue:
April 20, 2003 9:39:26 PM

AMD's lifespan doesn't depend on A64; it depends on Opteron. The server market is about 1000x more lucrative than the desktop market. If Opteron is "revolutionary" and A64 isn't, they still don't lose a big share of their funds.
April 21, 2003 12:40:22 AM

AMD needs at least a 33% lower price or 33% more performance to be able to compete with Intel, and they have neither: against Intel they have the same price at the high end and lower performance.

[-peep-] french
April 21, 2003 1:33:22 AM

If you look carefully, you will see that Athlon 64 struggles most when it comes to SSE2-optimized applications, because it doesn't have enough memory bandwidth. If AMD uses a dual DDR333 controller with the final Athlon 64, it will look much better.

----------------
<b><A HREF="http://geocities.com/spitfire_x86" target="_new"> My Website</A></b>

<b><A HREF="http://geocities.com/spitfire_x86/myrig.html" target="_new"> My Rig</A></b>
April 21, 2003 2:59:38 AM

If you also look carefully, you will notice the date on the CPU: week 1 of 2003. Between the time the CPU was fabbed and the time of the preview, I'm sure AMD has had ample time to make changes to this class of processor, and it still has 5 months left. And AMD isn't lacking memory bandwidth: if the P4 can move 6000 MB/s at a 200 ns latency, and the AMD can do 3000 MB/s at 97 ns, which one is moving more data?
April 21, 2003 4:47:39 AM

According to the memory benchmarks, the P4's FSB is moving more data. However, as explained before, the P4's SSE2 unit was a real focus of the design, unlike its legacy x87 unit (which the majority of software still uses). Using SSE2, there's no reason why the K8 would achieve any better clock-normalized performance than the P4.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
April 21, 2003 5:21:04 AM

Ironically, as discussed in the other thread, the A64 does actually run faster tick for tick in SiSoft Sandra Multi-Media floating point SSE2 benches as long as hyper-threading is disabled.

P4 12949 / 2.53ghz = 5,118 per ghz,
P4 20635 / 2.8ghz = 7,369 per ghz. (800mhz FSB)
A64 9161 / 1.6ghz = 5,725 per ghz.
P4 15618 / 3.06ghz = 5,104 per ghz
P4 21916 / 3.06ghz = 7,162 per ghz

A64 is ~600 marks per ghz faster against non hyper-threading P4s, ~1450 marks slower against hyper-threading P4s.

Not much faster though and the P4 is much faster when hyper-threading is enabled.
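
For anyone who wants to redo the per-GHz arithmetic above, here is a minimal Python sketch. The scores and clock speeds are the ones quoted in this post; the labels and the `marks_per_ghz` helper are only illustrative names, and dividing by clock speed is the same simple normalization used above, not a proper IPC measurement.

```python
# Sandra multimedia FP scores and clock speeds as quoted in the post above.
# The labels are illustrative; the two 3.06 GHz entries are told apart by score.
results = [
    ("P4 2.53", 12949, 2.53),
    ("P4 2.8 (800 MHz FSB)", 20635, 2.8),
    ("A64 1.6", 9161, 1.6),
    ("P4 3.06 (score 15618)", 15618, 3.06),
    ("P4 3.06 (score 21916)", 21916, 3.06),
]


def marks_per_ghz(score, ghz):
    """Naive clock normalization: benchmark score divided by clock in GHz."""
    return score / ghz


per_ghz = {name: marks_per_ghz(score, ghz) for name, score, ghz in results}

for name, value in per_ghz.items():
    print(f"{name:24s} {value:8.1f} marks per GHz")

# Gap between the A64 and the slowest and fastest P4 results, in marks per GHz.
a64 = per_ghz["A64 1.6"]
print(f"A64 vs P4 2.53: {a64 - per_ghz['P4 2.53']:+.0f}")
print(f"A64 vs P4 2.8 (800 MHz FSB): {a64 - per_ghz['P4 2.8 (800 MHz FSB)']:+.0f}")
```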

Dichromatic for your viewing plesure...
April 21, 2003 7:04:55 AM

Quote:
if the P4 can move 6000 MB/s at a 200 ns latency, and the AMD can do 3000 MB/s at 97 ns, which one is moving more data?

I think you answer your question yourself: The P4 moves 6000 MB/s, the A64 3000 MB/s ... So yes, the P4 has a bigger memory bandwidth. Latency has got nothing to do with data throughput, though it is important for performance.

Greetz,
Bikeman

Then again, that's just my opinion
April 21, 2003 9:19:33 AM

Apparently latency has a lot to do with throughput; just look at the <A HREF="http://www.xbitlabs.com/articles/cpu/display/athlon64_7..." target="_new">bandwidth numbers.</A> The A64 comes dramatically close to the maximum throughput of 3.2 GB/s for single-channel DDR400, running at close to 96% of maximum, whereas the P4 2.8 manages only 80% of the 6.4 GB/s possible on a dual-channel DDR400 bus (quad-pumped 200 MHz bus), and the P4 2.53 manages 80% of its 4.2 GB/s bus. The XP2800 with its single-channel DDR333 bus comes in at 87% efficiency of its 2.7 GB/s, and the underclocked XP 1.6 GHz on DDR400 manages 81-90% efficiency of its 3.2 GB/s bus.
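
A rough Python sketch of where those peak figures come from, and what the efficiency percentages imply in absolute terms. It assumes a 64-bit (8-byte) memory bus per channel and 1 GB = 1000 MB; the function names are made up for illustration, and the throughput values are back-calculated from the percentages above rather than read from the benchmark.

```python
def peak_gbs(mega_transfers_per_s, channels=1, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s, assuming an 8-byte bus per channel."""
    return mega_transfers_per_s * bus_bytes * channels / 1000


def implied_throughput_gbs(peak, efficiency):
    """Throughput implied by a peak figure and an efficiency fraction."""
    return peak * efficiency


a64_peak = peak_gbs(400)             # single-channel DDR400
p4_peak = peak_gbs(400, channels=2)  # dual-channel DDR400

a64_actual = implied_throughput_gbs(a64_peak, 0.96)
p4_actual = implied_throughput_gbs(p4_peak, 0.80)

print(f"A64: peak {a64_peak:.1f} GB/s, about {a64_actual:.2f} GB/s at 96%")
print(f"P4 2.8: peak {p4_peak:.1f} GB/s, about {p4_actual:.2f} GB/s at 80%")
```

On these numbers the A64 is the more efficient of the two, but the P4 still moves more data in absolute terms.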

Dichromatic for your viewing plesure...
April 21, 2003 4:06:17 PM

This is a weird result; I want to see a full benchmark of the A64 memory controller. A 97 ns latency is more than 97 CPU cycles.

[-peep-] french
April 21, 2003 4:15:41 PM

Ok, yes, indeed, latency does matter, but not in the way the other guy was thinking ... It is more or less like the access time of hard disks. It also depends very much on the type of memory access that is required ... If a constant stream of data is required, latency will be much less of a deciding factor, but if random places are addressed, it will play a bigger role. I think you get the point.
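
To put rough numbers on that, here is a deliberately simple Python sketch. It models one request as latency plus size divided by bandwidth, with no prefetching or overlapping of requests, which real memory systems do not obey exactly. The 6000 MB/s at 200 ns and 3000 MB/s at 97 ns figures are the ones quoted earlier in the thread; the 64-byte and 1 MB transfer sizes are assumptions picked for illustration.

```python
def transfer_time_ns(size_bytes, latency_ns, bandwidth_mb_s):
    """Time for one non-overlapped request: fixed latency plus streaming time."""
    bytes_per_ns = bandwidth_mb_s * 1e6 / 1e9  # MB/s converted to bytes per ns
    return latency_ns + size_bytes / bytes_per_ns


# Figures quoted earlier in the thread (bandwidth in MB/s, latency in ns).
P4 = {"latency_ns": 200, "bandwidth_mb_s": 6000}
A64 = {"latency_ns": 97, "bandwidth_mb_s": 3000}

# Assumed sizes: a small random access versus a long sequential stream.
for label, size in (("64-byte access", 64), ("1 MB stream", 1_000_000)):
    p4_t = transfer_time_ns(size, **P4)
    a64_t = transfer_time_ns(size, **A64)
    print(f"{label}: P4 {p4_t:.0f} ns, A64 {a64_t:.0f} ns")
```

Under this model the lower-latency system finishes the small access first, while the higher-bandwidth system finishes the long stream first.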

Greetz,
Bikeman

<i>EDIT:</i> Look what I found: <A HREF="http://arstechnica.com/paedia/b/bandwidth-latency/bandw..." target="_new">Understanding Bandwidth and Latency</A> on ArsTechnica ... For the interested people.

Then again, that's just my opinion

Edited by bikeman on 04/21/03 08:59 PM.
April 21, 2003 5:37:43 PM

What? Could you be a bit more specific? Why 97ns and 97 CPU cycles?

Dichromatic for your viewing plesure...
April 21, 2003 6:16:46 PM

Juin said it is more than 97 CPU cycles. Which is logical, since 1.6 GHz means 1.6 cycles per nanosecond.
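
The conversion is a single multiplication, sketched here in Python; the function name is just illustrative.

```python
def latency_in_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to CPU cycles (1 GHz = 1 cycle per ns)."""
    return latency_ns * clock_ghz


a64_cycles = latency_in_cycles(97, 1.6)
print(f"97 ns at 1.6 GHz is about {a64_cycles:.0f} CPU cycles")
```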

--
This post is brought to you by Eden, on a Via Eden, in the garden of Eden. :smile:
April 21, 2003 7:09:00 PM

Memory latency has nothing to do with CPU cycles. It is based on the memory clock and how well it can balance reads vs. random seeks. The 800 MHz P4 has a 200 MHz memory clock, as do the A64 and the underclocked XP 1.6 GHz. The 533 MHz P4 runs on a 133 MHz clock, while the XP2800 runs on a 166 MHz bus.

Dichromatic for your viewing plesure...
April 21, 2003 7:47:04 PM

Quote:
Ironically, as discussed in the other thread, the A64 does actually run faster tick for tick in SiSoft Sandra Multi-Media floating point SSE2 benches as long as hyper-threading is disabled.

P4 12949 / 2.53ghz = 5,118 per ghz,
P4 20635 / 2.8ghz = 7,369 per ghz. (800mhz FSB)
A64 9161 / 1.6ghz = 5,725 per ghz.
P4 15618 / 3.06ghz = 5,104 per ghz
P4 21916 / 3.06ghz = 7,162 per ghz

A64 is ~600 marks per ghz faster against non hyper-threading P4s, ~1450 marks slower against hyper-threading P4s.

Not much faster though and the P4 is much faster when hyper-threading is enabled.

I don't think you could actually compare it like that. Processor performance doesn't scale linearly and it doesn't scale the same way with two dramatically different MPU architectures. We'd really need a P4 Northwood at 1.6 GHz to tell the throughput difference per clock. That or a 2.53 GHz Athlon64. Although I don't think we'll see that anytime soon.

Quote:
Memory latency has nothing to do with CPU cycles. It is based on the memory clock and how well it can balance reads vs. random seeks. The 800 MHz P4 has a 200 MHz memory clock, as do the A64 and the underclocked XP 1.6 GHz. The 533 MHz P4 runs on a 133 MHz clock, while the XP2800 runs on a 166 MHz bus.


The P4 actually sends access data twice per clock, so its "memory clock" is effectively 400MT/s for accesses and 800MT/s for data. Because of the way memory is accessed, actual throughput is very latency-dependent.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
April 21, 2003 9:41:18 PM

The P4 scales pretty darn linearly over half a gigahertz. I mean the 3.06 differs from the 2.53 by only 0.27%. The P4 does seem to get a significant boost from an 800mhz FSB, almost 200 marks per gigahertz. I don't see why it wouldn't be similar for an A64.

BTW the P4 sends data 4 times per clock, thus the famous <A HREF="http://www.tomshardware.com/cpu/20001120/p4-04.html" target="_new">quad pumped bus</A>. However, the latency is always calculated off the base FSB and is fully dependent on the efficiency of the memory controller.

Edit: corrected 5.53 to 2.53; quite a difference when talking gigahertz.

Dichromatic for your viewing plesure...

Edited by Schmide on 04/21/03 02:54 PM.
April 21, 2003 10:38:43 PM

Quote:
The P4 scales pretty darn linearly over half a gigahertz. I mean the 3.06 differs from the 2.53 by only 0.27%. The P4 does seem to get a significant boost from an 800mhz FSB, almost 200 marks per gigahertz. I don't see why it wouldn't be similar for an A64.


Whether the P4 scales linearly is not really the issue. The issue is whether the Athlon64 scales in the same way the P4 did from 1.6 GHz to 2.53 GHz. I.e. if the P4 scaled 50% in performance for a 58.13% increase in clockspeed from 1.6 GHz to 2.53 GHz, would the Athlon64 scale exactly the same amount (a 50% performance increase going from 1.6 GHz to 2.53 GHz)? If not, and if there are variations as great as 10% in scalability (not implausible), then your analysis would be questionable, as the clock-normalized performance difference between the 2.53 P4 and 1.6 Athlon64 was 11.8%.

Quote:
BTW the P4 sends data 4 times per clock, thus the famous quad pumped bus. However, the latency is always calculated off the base FSB and is fully dependent on the efficiency of the memory controller.


While latency is calculated off the base-FSB, accesses are sent twice per clock, meaning instead of just sending one load command, two can be sent (or a pre-fetch command) in each clock iteration. This means that you won't have to wait till the next clock to send the second command which saves time significantly.
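
As a small illustration of the multipliers being discussed (addresses twice per base clock, data four times per base clock), here is a Python sketch. The `fsb_rates` helper is a made-up name, and the base clocks are the nominal 100, 133 and 200 MHz values, so the 133 MHz case comes out at 532 rather than the marketed 533.

```python
def fsb_rates(base_mhz):
    """Effective P4 FSB transfer rates (MT/s) for a given base clock in MHz."""
    return {
        "address": base_mhz * 2,  # addresses sent twice per clock
        "data": base_mhz * 4,     # data sent four times per clock (quad pumped)
    }


for base in (100, 133, 200):
    rates = fsb_rates(base)
    print(f"base {base} MHz: address {rates['address']} MT/s, data {rates['data']} MT/s")
```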

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
April 21, 2003 11:28:01 PM

I was under the impression that prefetch was only for access between the L1 and L2 and loads from main memory use exclusive access to main memory. I attempted to find memory controller details to back up what you are saying but was unsuccessful. If you could point me in the right direction I would like to read about this multi-access external access.

As for the synthetic benchmarks, the XP scales linearly under the same bus conditions as does the P4 and P3. Just look at <A HREF="http://www.tomshardware.com/cpu/20030217/cpu_charts-28...." target="_new">here</A>. I wonder why you would think it would not be so. If you are just trying to cast the P4 in a better light, the fact that it scales linearly and is greatly influenced by hyper-threading is a benefit to the P4.

Dichromatic for your viewing plesure...
April 22, 2003 1:53:20 AM

Quote:
I was under the impression that prefetch was only for access between the L1 and L2 and loads from main memory use exclusive access to main memory. I attempted to find memory controller details to back up what you are saying but was unsuccessful. If you could point me in the right direction I would like to read about this multi-access external access.


The memory is acted on exclusively from the L1/L2 caches. I mentioned that prefetch commands were sent to memory to load data into cache. As for address commands sent twice per clock, that was from Anandtech's 3.0C P4 review, which I will quote:
"In the case of the Pentium 4's FSB, the actual operating frequency of the bus is 100/133MHz (for the 400 and 533MHz FSBs respectively). Addresses are sent twice per clock, which makes the FSB transfer addresses as fast as a 200/266MHz FSB would; and finally we have the quad-pumped data transfer rates, which means that data can be sent 4x per clock, effectively making the FSB transfer data as fast as a 400/533MHz FSB would."

Quote:
As for the synthetic benchmarks, the XP scales linearly under the same bus conditions as does the P4 and P3. Just look at here. I wonder why you would think it would not be so. If you are just trying to cast the P4 in a better light, the fact that it scales linearly and is greatly influenced by hyper-threading is a benefit to the P4.


Looking at the numbers using the same memory subsystem (only the 2.66 and 2.8 had PC1066 memory), a 5% clock increase between the 2.66 and 2.8 GHz P4's yielded a 2.6% increase in SSE2 FP performance.
Looking at the AthlonXP's scaling from 1.67 to 2.13 Ghz (2000+ to 2600+) using an FSB266 and DDR266, we see a roughly linear scaling (27% increase yielded 27% increase in SSE performance.)
Obviously the two MPU's do not scale the same. Examining the two with different memory subsystems would likely yield different results (some with the P4 scaling more linearly and others with the AthlonXP scaling less linearly), however, the point of the matter is, we do not know how well the Athlon64 would scale and we don't know that it'll scale exactly the same as the P4 does. Without knowing such things, you cannot examine the difference of clock-normalized performance using different frequency levels. That is, an Athlon64 at 1.6 GHz may have a certain clock-normalized performance compared to a 2.53 GHz P4, but that doesn't mean a 2.53 GHz Athlon64 would have the same clock-normalized performance difference compared to the 2.53 GHz P4. Scaling isn't always exactly linear on both MPU's and the scaling varies with the different clock frequencies (i.e. a 1.67 GHz Athlon may scale linearly to 2.17 GHz, but a 2.17 GHz Athlon may not scale linearly to 2.53 GHz). The only real way to determine the difference in clock-normalized performance is to take both processors at a specific frequency and compare them. And even then, the clock-normalized performance difference would only be valid at that frequency.
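
The two scaling examples above can be reduced to one ratio, performance gain divided by clock gain, as in this rough Python sketch. The clock speeds and the 2.6% and 27% performance gains are the figures quoted in this post; the function names and the "scaling efficiency" label are only illustrative.

```python
def pct_increase(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100


def scaling_efficiency(clock_gain_pct, perf_gain_pct):
    """Performance gained per unit of clock gained; 1.0 means linear scaling."""
    return perf_gain_pct / clock_gain_pct


# Clock speeds and performance gains as quoted in the post above.
p4_clock_gain = pct_increase(2.66, 2.8)   # P4 2.66 -> 2.8 GHz
xp_clock_gain = pct_increase(1.67, 2.13)  # AthlonXP 2000+ -> 2600+

p4_eff = scaling_efficiency(p4_clock_gain, 2.6)
xp_eff = scaling_efficiency(xp_clock_gain, 27.0)

print(f"P4: {p4_clock_gain:.1f}% more clock, scaling efficiency {p4_eff:.2f}")
print(f"AthlonXP: {xp_clock_gain:.1f}% more clock, scaling efficiency {xp_eff:.2f}")
```

A ratio near 1.0 is linear scaling; the further the two chips' ratios sit from each other, the less a per-GHz comparison at different clocks tells you.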

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
April 22, 2003 2:37:28 AM

After reading that, as I have not seen it elsewhere, it reads as if only the new P4 C 800mhz fsb processor has this 2-transfer ability. Any thoughts?

I totally agree with the varying scaling of processors, especially those of different flavors. I would just restate that it follows a very deterministic model and processors of varying architectures follow similar lines. As stated by xbitlabs, the A64 is just a tweaked XP.

Dichromatic for your viewing plesure...
April 22, 2003 3:30:48 AM

Normally they count the number of CPU cycles. After the request is sent, the CPU continues ticking; once the data has arrived, they count the number of ticks, and that's it, simple.

Why do I say so? An EV8 scores about 70 ns, or 75 CPU clocks, of latency with plain PC800 RDRAM. DDR400, with twice the internal clock, will score higher, no? For Canterwood to have 4x the latency is impossible; it is only because the P4 runs twice as fast, so relate it to ns and you get a real figure.

Also, only read burst scores are given. What about write performance? No read/write figures are given; I so want to see the i850 with PC800 kick their butt on write performance.

[-peep-] french
April 22, 2003 3:36:31 AM

Quote:
After reading that, as I have not seen it elsewhere, it reads as if only the new P4 C 800mhz fsb processor has this 2-transfer ability. Any thoughts?


They specifically mentioned the FSB400 and FSB533 P4's. If it was something specific to the FSB800 P4's, I think they would've mentioned that.

Quote:
I totally agree with the varying scaling of processors, especially those of different flavors. I would just restate that it follows a very deterministic model and processors of varying architectures follow similar lines. As stated by xbitlabs, the A64 is just a tweaked XP.


Well, my whole point was that even comparing the AthlonXP to the P4, scalability varies (they're both close to linear in certain situations, but not completely) and hence, unless there's a huge difference in clock-normalized performance (like 30%), it would be difficult to say there's a definite difference. Not unless you had two clock-normalized MPU's to test. Just dividing the performance number by frequency isn't really going to tell you anything except that the average IPC of one processor is different than the average IPC of another processor at a completely different clockspeed. Hell, even a P4 1.6 GHz would have a different clock-normalized performance than a P4 2.53.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
April 22, 2003 3:40:38 AM

The address/command bus runs at 2x (DDR) and the data bus at 4x. Likewise, DDR uses a single-rate command/address bus and a DDR data bus; RDRAM uses a 2x/2x setting.

All DDR, all P4s. Simple.

Scaling is hard to know; things like HT mix up all the results. Normally, above 2 GHz the K7 core has trouble increasing its performance. That can be due to a too-old subsystem or a lack of cache.

[-peep-] french