Pretty good explanation of x86-64 by HP

Anonymous
December 5, 2004 4:02:11 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

I found this whitepaper from HP to be pretty good, it is surprisingly
candid, considering HP was the coinventor of the Itanium. It does a
pretty good job of explaining and summarizing the similarities and
differences between AMD64 and EM64T, and their comparison to the
Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
compatible", but IA64 is a different animal altogether.

Yousuf Khan

http://h200001.www2.hp.com/bc/docs/support/SupportManua...
Anonymous
December 5, 2004 9:21:33 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Sun, 05 Dec 2004 01:02:11 -0500, Yousuf Khan <bbbl67@ezrs.com> wrote:

>I found this whitepaper from HP to be pretty good, it is surprisingly
>candid, considering HP was the coinventor of the Itanium. It does a
>pretty good job of explaining and summarizing the similarities and
>differences between AMD64 and EM64T, and their comparison to the
>Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
>compatible", but IA64 is a different animal altogether.
>
> Yousuf Khan
>
>http://h200001.www2.hp.com/bc/docs/support/SupportManua...

Hmm and the following quote: "However, the latency difference between local
and remote accesses is actually very small because the memory controller is
integrated into and operates at the core speed of the processor, and
because of the fast interconnect between processors." is relevant to
another discussion here. I wish we could get a firm answer on this one.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
Anonymous
December 5, 2004 9:44:31 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

Yousuf Khan wrote:
> I found this whitepaper from HP to be pretty good, it is surprisingly
> candid, considering HP was the coinventor of the Itanium. It does a
> pretty good job of explaining and summarizing the similarities and
> differences between AMD64 and EM64T, and their comparison to the
> Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
> compatible", but IA64 is a different animal altogether.
>
> Yousuf Khan
>
> http://h200001.www2.hp.com/bc/docs/support/SupportManua...

When did the non-Xeon Prescott P4s start offering EM64T as listed in
the paper? News to me. Does HP know something the rest of the world
doesn't?

Bill
Anonymous
December 5, 2004 5:42:59 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Sun, 05 Dec 2004 06:44:31 GMT, Bill Bradley
<senator2@NOSPAMearthlink.net> wrote:

>Yousuf Khan wrote:
>> I found this whitepaper from HP to be pretty good, it is surprisingly
>> candid, considering HP was the coinventor of the Itanium. It does a
>> pretty good job of explaining and summarizing the similarities and
>> differences between AMD64 and EM64T, and their comparison to the
>> Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
>> compatible", but IA64 is a different animal altogether.
>>
>> Yousuf Khan
>>
>> http://h200001.www2.hp.com/bc/docs/support/SupportManua...
>
>When did the non-Xeon Prescott P4s start offering EM64T as listed in
>the paper? News to me. Does HP know something the rest of the world
>doesn't?

Not that they know something the rest of the world doesn't, just that
they have access to processors that most of us do not. IBM sells them
as well, but for the time being Intel will ONLY sell them for use in
servers. Why? I really don't know. Maybe it's just a bit too much
crow for them to eat after saying (only a bit over a year ago) that
64-bit wouldn't be useful for the desktop until the end of the year?

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca
Anonymous
December 5, 2004 7:25:39 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

Bill Bradley wrote:
> When did the non-Xeon Prescott P4s start offering EM64T as listed in
> the paper? News to me. Does HP know something the rest of the world
> doesn't?

It must have been at least two or three months now, I posted a message
about it in one of these newsgroups.

Google Search: g:thl403337196d
http://groups.google.ca/groups?q=g:thl403337196d&dq=&hl...

or,

http://tinyurl.com/6tnjy

Yousuf Khan
Anonymous
December 5, 2004 7:30:15 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

George Macdonald wrote:
> Hmm and the following quote: "However, the latency difference between local
> and remote accesses is actually very small because the memory controller is
> integrated into and operates at the core speed of the processor, and
> because of the fast interconnect between processors." is relevant to
> another discussion here. I wish we could get a firm answer on this one.

Yeah, but that's why I think AMD insists on calling their multiprocessor
connection scheme SUMO (Sufficiently Uniform Memory Organization),
rather than NUMA. It's not worth headaching over such small differences
in latency, is basically what they're saying.

Yousuf Khan
Anonymous
December 5, 2004 8:37:16 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

George Macdonald wrote:
> On Sun, 05 Dec 2004 01:02:11 -0500, Yousuf Khan <bbbl67@ezrs.com> wrote:
>
>
>>I found this whitepaper from HP to be pretty good, it is surprisingly
>>candid, considering HP was the coinventor of the Itanium. It does a
>>pretty good job of explaining and summarizing the similarities and
>>differences between AMD64 and EM64T, and their comparison to the
>>Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
>>compatible", but IA64 is a different animal altogether.
>>
>> Yousuf Khan
>>
>>http://h200001.www2.hp.com/bc/docs/support/SupportManua...
>
>
> Hmm and the following quote: "However, the latency difference between local
> and remote accesses is actually very small because the memory controller is
> integrated into and operates at the core speed of the processor, and
> because of the fast interconnect between processors." is relevant to
> another discussion here. I wish we could get a firm answer on this one.
>

Not sure if this is exactly what you are looking for in the
way of a "firm answer", but the latencies in an Opteron system are:

0 hops    80 ns   uniprocessor (Local access)
         100 ns   multiprocessor (Local access, with cache snooping on other processors)
1 hop    115 ns
2 hops   150 ns
3 hops   190 ns

I couldn't find my original source for those numbers, and
the two and three hop numbers above are a little higher
than I remembered them as being. This time around I got
them from this thread:
http://www.aceshardware.com/forum?read=80030960

That thread refers to this article:
http://www.digit-life.com/articles2/amd-hammer-family/
which gives slightly different numbers for a 2 GHz Opteron
with DDR333:
Uni-processor system: 45 ns
Dual-processor system: 0-hop - 69 ns, 1-hop - 117 ns.
Four-processor system: 0-hop - 100 ns, 1-hop - 118 ns, 2-hop - 136 ns.


I don't know if any of the numbers above are for cache misses
or if they are averages that include both hits and misses.
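
To put the first set of figures above in relative terms, here is a minimal Python sketch that expresses each hop count's latency as extra cost over the local multiprocessor access. The numbers are the ones quoted above; the names `LATENCY_NS` and `relative_penalty` are made up for illustration only.

```python
# Latencies in ns from the first table above (multiprocessor case).
LATENCY_NS = {0: 100, 1: 115, 2: 150, 3: 190}


def relative_penalty(latency_ns, base_hops=0):
    """Return {hops: extra latency as a fraction of the base_hops latency}."""
    base = latency_ns[base_hops]
    return {hops: (ns - base) / base for hops, ns in latency_ns.items()}


if __name__ == "__main__":
    for hops, extra in sorted(relative_penalty(LATENCY_NS).items()):
        print(f"{hops} hop(s): {LATENCY_NS[hops]} ns, +{extra:.0%} vs local")
```

On these figures a one-hop access costs 15% more than a local one and a three-hop access costs 90% more, which is one way to judge how "small" the difference really is.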
Anonymous
December 5, 2004 9:16:39 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

"Bill Bradley" <senator2@NOSPAMearthlink.net> wrote in message
news:j7ysd.1769$yr1.125@newsread3.news.pas.earthlink.net...
> Yousuf Khan wrote:
>> I found this whitepaper from HP to be pretty good, it is surprisingly
>> candid, considering HP was the coinventor of the Itanium. It does a
>> pretty good job of explaining and summarizing the similarities and
>> differences between AMD64 and EM64T, and their comparison to the
>> Itanium's IA64 instruction set. AMD64 and EM64T are "broadly compatible",
>> but IA64 is a different animal altogether.
>>
>> Yousuf Khan
>>
>> http://h200001.www2.hp.com/bc/docs/support/SupportManua...
>
> When did the non-Xeon Prescott P4s start offering EM64T as listed in the
> paper? News to me. Does HP know something the rest of the world
> doesn't?
>
> Bill

www.overclockers.co.uk had some a few weeks back, and they sold very quickly.
I think there's a few more in now.

hamman
Anonymous
December 5, 2004 9:38:46 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

"George Macdonald" <fammacd=!SPAM^nothanks@tellurian.com> wrote in message
news:hmr5r05drs3hird56j69qs2nbu5mth1b95@4ax.com...

> Hmm and the following quote: "However, the latency difference between
> local
> and remote accesses is actually very small because the memory controller
> is
> integrated into and operates at the core speed of the processor, and
> because of the fast interconnect between processors." is relevant to
> another discussion here. I wish we could get a firm answer on this one.

In typical Opteron setups (2-8 CPUs, using the Opteron's built-in SMP
hardware), the latency difference between local and remote memory accesses
is so small that the benefits of treating it as NUMA are typically
outweighed by the costs. Generally, you just distribute the memory evenly
and interleaved on the nodes (if you can) to avoid overloading one memory
controller channel.
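
As a rough illustration of what even interleaving costs in latency terms, the sketch below averages the four-processor figures quoted earlier in the thread (100/118/136 ns for 0/1/2 hops). It assumes, purely for illustration, that the four nodes are wired as a square, so each node sees one local node, two nodes one hop away, and one node two hops away. It models latency only and ignores the controller-load benefit described above.

```python
# Four-processor figures quoted earlier in the thread (2 GHz Opteron, DDR333).
HOP_LATENCY_NS = {0: 100, 1: 118, 2: 136}

# ASSUMPTION: four nodes wired as a square, so from any node there is
# 1 local node, 2 nodes one hop away, and 1 node two hops away.
NODES_AT_HOPS = {0: 1, 1: 2, 2: 1}


def interleaved_latency(hop_latency_ns, nodes_at_hops):
    """Average latency when pages are spread evenly over all nodes."""
    total_nodes = sum(nodes_at_hops.values())
    weighted = sum(hop_latency_ns[h] * n for h, n in nodes_at_hops.items())
    return weighted / total_nodes


if __name__ == "__main__":
    avg = interleaved_latency(HOP_LATENCY_NS, NODES_AT_HOPS)
    local = HOP_LATENCY_NS[0]
    print(f"interleaved: {avg:.0f} ns, all-local: {local} ns")
```

Under that assumed topology the interleaved average works out to 118 ns against 100 ns for purely local accesses.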

DS
Anonymous
December 5, 2004 10:47:30 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

Tony Hill <hilla_nospam_20@yahoo.ca> writes:

>Not that they know something the rest of the world doesn't, just that
>they have access to processors that most of us do not. IBM sells them
>as well, but for the time being Intel will ONLY sell them for use in
>servers. Why? I really don't know. Maybe it's just a bit too much
>crow for them to eat after saying (only a bit over a year ago) that
>64-bit wouldn't be useful for the desktop until the end of the year?

How much does Intel stockpile? Could it be that they have warehouses
full of already produced non-64-bit processors, and those want to be
sold at the projected prices, not thrown away?

best regards
Patrick
Anonymous
December 5, 2004 10:47:31 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

> Patrick Schaaf <mailer-daemon@bof.de> wrote:

> How much does Intel stockpile?

Well, according to the Reg,
<http://www.theregister.co.uk/2004/12/03/intel_eol_p2/>
they just finally announced EOL for the Pentium-II.

"The Register reveals that you'll be able to continue
ordering the part for a year, with the last trays
leaving the chip giant's Pentium II warehouse on
1 June 2006."

> Could it be that they have warehouses full of already
> produced non-64-bit processors, and those want to be
> sold at the projected prices, not thrown away?

Whether there is any connection between your hypothesis
and the Reg news, is left as an exercise for the reader :-)

--
Regards, Bob Niland mailto:name@ispname.tld
http://www.access-one.com/rjn email4rjn AT yahoo DOT com
NOT speaking for any employer, client or Internet Service Provider.
December 6, 2004 12:44:37 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Sun, 05 Dec 2004 16:30:15 -0500, Yousuf Khan wrote:

> George Macdonald wrote:
>> Hmm and the following quote: "However, the latency difference between local
>> and remote accesses is actually very small because the memory controller is
>> integrated into and operates at the core speed of the processor, and
>> because of the fast interconnect between processors." is relevant to
>> another discussion here. I wish we could get a firm answer on this one.
>
> Yeah, but that's why I think AMD insists on calling their multiprocessor
> connection scheme SUMO (Sufficiently Uniform Memory Organization),
> rather than NUMA. It's not worth headaching over such small differences
> in latency, is basically what they're saying.

I'd say that because in small systems (less than 8 CPUs), Opterons are
coherent in hardware thus sufficiently tightly coupled to be called UMA,
as far as the user is concerned.

--
Keith
December 6, 2004 12:46:42 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Sun, 05 Dec 2004 19:47:30 +0000, Patrick Schaaf wrote:

> Tony Hill <hilla_nospam_20@yahoo.ca> writes:
>
>>Not that they know something the rest of the world doesn't, just that
>>they have access to processors that most of us do not. IBM sells them
>>as well, but for the time being Intel will ONLY sell them for use in
>>servers. Why? I really don't know. Maybe it's just a bit too much
>>crow for them to eat after saying (only a bit over a year ago) that
>>64-bit wouldn't be useful for the desktop until the end of the year?
>
> How much does Intel stockpile? Could it be that they have warehouses
> full of already produced non-64-bit processors, and those want to be
> sold at the projected prices, not thrown away?

Unsold inventory is a very bad thing indeed. The tax man isn't happy.
Stockholders aren't happy. Executives shiver.

--
Keith
December 6, 2004 1:24:56 AM

Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips (More info?)

On Sun, 05 Dec 2004 19:12:44 -0800, Greg Lindahl wrote:

> In article <pan.2004.12.06.02.44.34.997332@att.bizzzz>,
> keith <krw@att.bizzzz> wrote:
>
>>I'd say that because in small systems (less than 8 CPUs), Opterons are
>>coherent in hardware thus sufficiently tightly coupled to be called UMA,
>>as far as the user is concerned.
>
> However, it's not hard to show with benchmarks that paying attention
> to the NUMA nature of the Opteron is a significant win. So you can
> call it what you want, but...

Point well taken. So we have a dessert topping and a floor wax. ;-)

> Newsgroups trimmed.

..chips added back in.

--
Keith
Anonymous
December 6, 2004 3:30:48 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

In comp.arch Tony Hill <hilla_nospam_20@yahoo.ca> wrote:

> Not that they know something the rest of the world doesn't, just that
> they have access to processors that most of us do not. IBM sells them
> as well, but for the time being Intel will ONLY sell them for use in
> servers. Why? I really don't know.

FWIW, Dell are shipping EM64T-equipped non-Xeon P4 workstations (the
Precision 370).

-a
Anonymous
December 6, 2004 5:04:27 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Sun, 05 Dec 2004 17:37:16 GMT, Rob Stow <rob.stow.nospam@shaw.ca> wrote:

>George Macdonald wrote:
>> On Sun, 05 Dec 2004 01:02:11 -0500, Yousuf Khan <bbbl67@ezrs.com> wrote:
>>
>>
>>>I found this whitepaper from HP to be pretty good, it is surprisingly
>>>candid, considering HP was the coinventor of the Itanium. It does a
>>>pretty good job of explaining and summarizing the similarities and
>>>differences between AMD64 and EM64T, and their comparison to the
>>>Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
>>>compatible", but IA64 is a different animal altogether.
>>>
>>> Yousuf Khan
>>>
>>>http://h200001.www2.hp.com/bc/docs/support/SupportManua...
>>
>>
>> Hmm and the following quote: "However, the latency difference between local
>> and remote accesses is actually very small because the memory controller is
>> integrated into and operates at the core speed of the processor, and
>> because of the fast interconnect between processors." is relevant to
>> another discussion here. I wish we could get a firm answer on this one.
>>
>
>Not sure if this is exactly what you are looking for in the
>way of a "firm answer", but the latencies in an Opteron system are:
>
>0 hops    80 ns   uniprocessor (Local access)
>         100 ns   multiprocessor (Local access, with cache snooping on other processors)
>1 hop    115 ns
>2 hops   150 ns
>3 hops   190 ns
>
>I couldn't find my original source for those numbers, and
>the two and three hop numbers above are a little higher
>than I remembered them as being. This time around I got
>them from this thread:
>http://www.aceshardware.com/forum?read=80030960
>
>That thread refers to this article:
> http://www.digit-life.com/articles2/amd-hammer-family/
>which gives slightly different numbers for a 2 GHz Opteron
>with DDR333:
> Uni-processor system: 45 ns
> Dual-processor system: 0-hop - 69 ns, 1-hop - 117 ns.
> Four-processor system: 0-hop - 100 ns, 1-hop - 118 ns, 2-hop - 136 ns.
>
>
>I don't know if any of the numbers above are for cache misses
>or if they are averages that include both hits and misses.

Thanks for the data but no I guess I should have highlighted better what I
was getting at: "the memory controller is integrated into and operates at
the core speed of the processor", which is what was being
discussed/disputed in another thread.

I haven't been able to find any hard data from AMD on where the clock
domain boundaries are in the Opteron/Athlon64 but if the memory controller
is not operating at "core speed" it's now at the stage of Internet
Folklore.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
Anonymous
December 6, 2004 5:04:27 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Sun, 05 Dec 2004 16:30:15 -0500, Yousuf Khan <bbbl67@ezrs.com> wrote:

>George Macdonald wrote:
>> Hmm and the following quote: "However, the latency difference between local
>> and remote accesses is actually very small because the memory controller is
>> integrated into and operates at the core speed of the processor, and
>> because of the fast interconnect between processors." is relevant to
>> another discussion here. I wish we could get a firm answer on this one.
>
>Yeah, but that's why I think AMD insists on calling their multiprocessor
> connection scheme SUMO (Sufficiently Uniform Memory Organization),
>rather than NUMA. It's not worth headaching over such small differences
>in latency, is basically what they're saying.

See my reply to Rob Stow.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
Anonymous
December 6, 2004 5:04:28 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips (More info?)

In article <8c97r0hqh2sqf8sh89ut3153lpdmddfs76@4ax.com>,
George Macdonald <fammacd=!SPAM^nothanks@tellurian.com> wrote:

>I haven't been able to find any hard data from AMD on where the clock
>domain boundaries are in the Opteron/Athlon64 but if the memory controller
>is not operating at "core speed" it's now at the stage of Internet
>Folklore.

Note that the STREAM bandwidth and lmbench latency change with every
CPU speed bump. So clearly part of the memory controller is at the cpu
core frequency, or a related frequency, and not at the HT frequency,
or the SDRAM external bus frequency.

Please reduce the cross-post. Followups set to a group I read.

-- greg
Anonymous
December 6, 2004 7:48:18 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips (More info?)

In article <pqe8r0hsb4vfiqv4uvpk6h2h7cn8gq5q37@4ax.com>,
Tony Hill <hilla_nospam_20@yahoo.ca> wrote:

>It does, but the difference is small, usually less than 10% and often
>much closer to 0%.

No, it's not. The Opteron builds the best 4-cpu SMP system out there
according to the SPECrate2000 cpu benchmark, but in order to get that
best result, you need to pin the individual processes to cpus and
memory using a utility. Without it, the performance is no longer the
best. So people really care about that last bit of performance.
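
The post does not name the utility, so the following is only a minimal sketch of the CPU half of the idea, using Python's Linux-only `os.sched_setaffinity`. The helper name `pin_to_one_cpu` is hypothetical, and the sketch does not bind memory at all; where pages end up is left to the operating system's allocation policy.

```python
import os


def pin_to_one_cpu(pid=0):
    """Pin process `pid` (0 = this process) to a single allowed CPU.

    Returns the new affinity set, or None where the call is unsupported
    or refused. This restricts scheduling only; it does not bind memory.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # affinity calls are not available on this platform
    try:
        allowed = os.sched_getaffinity(pid)
        cpu = min(allowed)  # pick the lowest-numbered CPU we may run on
        os.sched_setaffinity(pid, {cpu})
        return os.sched_getaffinity(pid)
    except OSError:
        return None


if __name__ == "__main__":
    print("affinity after pinning:", pin_to_one_cpu())
```

A process that stays on one CPU keeps touching memory from the same node, which is what makes local placement worthwhile in the first place.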

Now I don't have the direct comparison for that, but here's a
comparison on some benchmarks for a recent competitive bid. "Slow" is
a system without the processor binding and with "node interleave"
turned on. "Fast" is with processor binding and node interleave off,
which lets the processor binding have the best benefit. Note that it's
only a trivial amount of work to get this improvement for a serial
code, so this is a common situation, although these benchmarks are, of
course, particular to this scientific-computing customer. In these
results, the comparison is scaling for 4 processes on a 4 cpu machine.
4.0 would be a perfect score.

               fast   slow   difference
benchmark 1    3.71   3.03   + 22 %
benchmark 2    3.76   3.29   + 14 %
benchmark 3    3.78   3.26   + 16 %
benchmark 4    3.79   3.45   + 10 %
benchmark 5    3.92   3.89   +  1 %
benchmark 6    3.88   3.71   +  5 %
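
For anyone checking the arithmetic, the difference column above is consistent with (fast - slow) / slow, rounded to a whole percent. The short Python sketch below recomputes it from the table and also expresses each scaling score as a fraction of the perfect 4.0; the names `RESULTS`, `gain`, and `efficiency` are illustrative only.

```python
# (fast, slow) scaling scores for 4 processes on 4 CPUs, from the table above.
RESULTS = {
    "benchmark 1": (3.71, 3.03),
    "benchmark 2": (3.76, 3.29),
    "benchmark 3": (3.78, 3.26),
    "benchmark 4": (3.79, 3.45),
    "benchmark 5": (3.92, 3.89),
    "benchmark 6": (3.88, 3.71),
}


def gain(fast, slow):
    """Fractional improvement of the fast run over the slow run."""
    return (fast - slow) / slow


def efficiency(scaling, nprocs=4):
    """Scaling score as a fraction of perfect scaling (nprocs)."""
    return scaling / nprocs


if __name__ == "__main__":
    for name, (fast, slow) in RESULTS.items():
        print(f"{name}: +{gain(fast, slow):.0%}, "
              f"efficiency {efficiency(slow):.0%} -> {efficiency(fast):.0%}")
```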

These benchmarks were run with the best Opteron compiler, so this
scaling improvement was very good to see. And it's bigger than
"usually less than 10%".

> When well over 90% of your memory access is coming
> from cache anyway and (assuming a totally random distribution in a
> strictly UMA setup) 50% of your memory access is going to be local,
> most of the performance difference is lost in the noise.

Handwaving is a bad way to evaluate effects like this.

>I've said it before and I'll say it again: Hardware is cheap,
>software is expensive. It would be a true disservice to your
>customers to tell them to spend thousands upon thousands of dollars
>changing all their software for the small improvement in performance
>equal to a few hundred dollars of hardware costs.

Customers know what 10% or 20% more performance means, as do vendors
who are doing competitive bidding. The fact that I care a lot about
this should give you a clue. And in some cases, such as serial codes,
the benefits are easy to achieve. It took only a moderate amount of
work in our OpenMP compiler and runtime to get these benefits for some
parallel programs, too. Well worth it to our customers.

-- greg
speaking for myself, not PathScale
Anonymous
December 6, 2004 9:53:09 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On 05 Dec 2004 19:47:30 GMT, mailer-daemon@bof.de (Patrick Schaaf)
wrote:

>Tony Hill <hilla_nospam_20@yahoo.ca> writes:
>
>>Not that they know something the rest of the world doesn't, just that
>>they have access to processors that most of us do not. IBM sells them
>>as well, but for the time being Intel will ONLY sell them for use in
>>servers. Why? I really don't know. Maybe it's just a bit too much
>>crow for them to eat after saying (only a bit over a year ago) that
>>64-bit wouldn't be useful for the desktop until the end of the year?
>
>How much does Intel stockpile? Could it be that they have warehouses
>full of already produced non-64-bit processors, and those want to be
>sold at the projected prices, not thrown away?

ALL of the "Prescott" and "Nocona" cores are 64-bit capable excluding
those that would pass a validation as 32-bit chips but fail as 64-bit
chips, but such chips would be rather few and far between. It could
be that Intel still has a reasonable amount of inventory of their old
"Northwood" P4 chips and they want to clear those out first, but that
certainly doesn't seem to be the case looking at Intel's pricing
structure and what is being sold by the major OEMs (Intel seems to be
pushing Prescott VERY hard here).

Long story short, I'm not quite sure what the actual answer is, but
excessive inventory of 32-bit chips doesn't seem to make sense from
what I've seen.

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca
Anonymous
December 6, 2004 9:53:09 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On 6 Dec 2004 00:30:48 GMT, ammonton@cc.full.stop.helsinki.fi wrote:

>In comp.arch Tony Hill <hilla_nospam_20@yahoo.ca> wrote:
>
>> Not that they know something the rest of the world doesn't, just that
>> they have access to processors that most of us do not. IBM sells them
>> as well, but for the time being Intel will ONLY sell them for use in
>> servers. Why? I really don't know.
>
>FWIW, Dell are shipping EM64T-equipped non-Xeon P4 workstations (the
>Precision 370).

Ahh, thanks. When I first wrote the above I had actually included
Dell's name as well, but then removed it when I couldn't find any
EM64T P4 processors in any of their servers (didn't think to check
workstations first). I figured that if anyone was selling 64-bit P4s
it would be Dell!

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca
Anonymous
December 6, 2004 12:51:33 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips (More info?)

"Greg Lindahl" <lindahl@pbm.com> wrote in message
news:41b45512$1@news.meer.net...
> ...
> These benchmarks were run with the best Opteron compiler
> ...

Visual C?

:-)

Thanks,
Eugene
Anonymous
December 6, 2004 5:05:24 PM

Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips (More info?)

Greg Lindahl wrote:

> These benchmarks were run with the best Opteron compiler [...]

Which compiler would that be? PathScale?

:-)
Anonymous
December 6, 2004 9:36:00 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

keith wrote:
> I'd say that because in small systems (less than 8 CPUs), Opterons are
> coherent in hardware thus sufficiently tightly coupled to be called UMA,
> as far as the user is concerned.

Yes, exactly my point, it's more or less UMA in the up to 8 processor
range. After that, then you can start thinking of it as NUMA. But having
up to 8 processors being treated as UMA is quite a lot.

Yousuf Khan
Anonymous
December 6, 2004 9:39:46 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

George Macdonald wrote:
> On Sun, 05 Dec 2004 17:37:16 GMT, Rob Stow <rob.stow.nospam@shaw.ca> wrote:
>
>
>>George Macdonald wrote:
>>
>>>On Sun, 05 Dec 2004 01:02:11 -0500, Yousuf Khan <bbbl67@ezrs.com> wrote:
>>>
>>>
>>>
>>>>I found this whitepaper from HP to be pretty good, it is surprisingly
>>>>candid, considering HP was the coinventor of the Itanium. It does a
>>>>pretty good job of explaining and summarizing the similarities and
>>>>differences between AMD64 and EM64T, and their comparison to the
>>>>Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
>>>>compatible", but IA64 is a different animal altogether.
>>>>
>>>> Yousuf Khan
>>>>
>>>>http://h200001.www2.hp.com/bc/docs/support/SupportManua...
>>>
>>>
>>>Hmm and the following quote: "However, the latency difference between local
>>>and remote accesses is actually very small because the memory controller is
>>>integrated into and operates at the core speed of the processor, and
>>>because of the fast interconnect between processors." is relevant to
>>>another discussion here. I wish we could get a firm answer on this one.
>>>
>>
>>Not sure if this is exactly what you are looking for in the
>>way of a "firm answer", but the latencies in an Opteron system are:
>>
>>0 hops    80 ns   uniprocessor (Local access)
>>         100 ns   multiprocessor (Local access, with cache snooping on other processors)
>>1 hop    115 ns
>>2 hops   150 ns
>>3 hops   190 ns
>>
>>I couldn't find my original source for those numbers, and
>>the two and three hop numbers above are a little higher
>>than I remembered them as being. This time around I got
>>them from this thread:
>>http://www.aceshardware.com/forum?read=80030960
>>
>>That thread refers to this article:
>> http://www.digit-life.com/articles2/amd-hammer-family/
>>which gives slightly different numbers for a 2 GHz Opteron
>>with DDR333:
>> Uni-processor system: 45 ns
>> Dual-processor system: 0-hop - 69 ns, 1-hop - 117 ns.
>> Four-processor system: 0-hop - 100 ns, 1-hop - 118 ns, 2-hop - 136 ns.
>>
>>
>>I don't know if any of the numbers above are for cache misses
>>or if they are averages that include both hits and misses.
>
>
> Thanks for the data but no I guess I should have highlighted better what I
> was getting at: "the memory controller is integrated into and operates at
> the core speed of the processor", which is what was being
> discussed/disputed in another thread.
>
> I haven't been able to find any hard data from AMD on where the clock
> domain boundaries are in the Opteron/Athlon64 but if the memory controller
> is not operating at "core speed" it's now at the stage of Internet
> Folklore.

Ah, that one is much easier to answer. ;-)

Straight from the horse's mouth:
http://www.amd.com/us-en/Processors/ProductInformation/...

"By running at the processor’s core frequency, an integrated
memory controller greatly increases bandwidth directly available
to the processor at significantly reduced latencies."
Anonymous
December 6, 2004 10:19:44 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

In article <gbe8r0pp5ip0dl4d58fvktklbuu35442it@4ax.com>, Tony Hill wrote:
> It could
> be that Intel still has a reasonable amount of inventory of their old
> "Northwood" P4 chips and they want to clear those out first, but that
> certainly doesn't seem to be the case looking at Intel's pricing
> structure and what is being sold by the major OEMs (Intel seems to be
> pushing Prescott VERY hard here).

A friend recently (1 month ago IIRC) wanted a Northwood for his DIY
computer, but he found that none of the usual suspects around here had
them in stock. Eventually he called the importer, who said that
they're out of stock and they're not getting anymore either, buy a
Prescott instead.

> Long story short, I'm not quite sure what the actual answer is, but
> excessive inventory of 32-bit chips doesn't seem to make sense from
> what I've seen.

Considering the rate chips depreciate I guess manufacturers think
pretty hard about what they can do to minimize inventory.


--
Janne Blomqvist
Anonymous
December 6, 2004 10:30:08 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Mon, 06 Dec 2004 18:39:46 GMT, Rob Stow <rob.stow.nospam@shaw.ca> wrote:

>George Macdonald wrote:
>> On Sun, 05 Dec 2004 17:37:16 GMT, Rob Stow <rob.stow.nospam@shaw.ca> wrote:
<<snip>>

>> Thanks for the data but no I guess I should have highlighted better what I
>> was getting at: "the memory controller is integrated into and operates at
>> the core speed of the processor", which is what was being
>> discussed/disputed in another thread.
>>
>> I haven't been able to find any hard data from AMD on where the clock
>> domain boundaries are in the Opteron/Athlon64 but if the memory controller
>> is not operating at "core speed" it's now at the stage of Internet
>> Folklore.
>
>Ah, that one is much easier to answer. ;-)
>
>Straight from the horse's mouth:
>http://www.amd.com/us-en/Processors/ProductInformation/...
>
> "By running at the processor’s core frequency, an integrated
> memory controller greatly increases bandwidth directly available
> to the processor at significantly reduced latencies."

Ah so there we have it... assuming this has been approved by the technical
folks. :-) BTW I notice that AMD seems to be cutting back on the depth of info
in their technical docs - the Product Data Sheets now consist of one
page... a far cry from the excruciating detail on cache operation etc. we
used to get.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
December 6, 2004 11:37:07 PM

Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips (More info?)

On Sun, 05 Dec 2004 23:29:08 -0800, Greg Lindahl wrote:

> In article <8c97r0hqh2sqf8sh89ut3153lpdmddfs76@4ax.com>,
> George Macdonald <fammacd=!SPAM^nothanks@tellurian.com> wrote:
>
>>I haven't been able to find any hard data from AMD on where the clock
>>domain boundaries are in the Opteron/Athlon64 but if the memory controller
>>is not operating at "core speed" it's now at the stage of Internet
>>Folklore.
>
> Note that the STREAM bandwidth and lmbench latency changes with every
> cpuspeedbump. So clearly part of the memory controller is at the cpu
> core frequency, or a related frequency, and not at the HT frequency,
> or the SDRAM external bus frequency.

That does *not* mean that the memory controller runs at the core speed.
It would be nuts to assume such. Would you assume the caches of the
PII run at the I/O bus speed?


> Please reduce the cross-post. Followups set to a group I read.

Isn't this a rather egotistical statement? "I don't read other
groups, so no one else matters!" Hint: Others are reading this thread
from other groups! It's posted to *three* related groups (hardly a breach
of USENET protocol).

--
Keith
Anonymous
December 7, 2004 5:04:47 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Sun, 05 Dec 2004 01:02:11 -0500, Yousuf Khan <bbbl67@ezrs.com> wrote,
in part:

>I found this whitepaper from HP to be pretty good, it is surprisingly
>candid, considering HP was the coinventor of the Itanium. It does a
>pretty good job of explaining and summarizing the similarities and
>differences between AMD64 and EM64T, and their comparison to the
>Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
>compatible", but IA64 is a different animal altogether.

>http://h200001.www2.hp.com/bc/docs/support/SupportManua...

I would have preferred if you had given the URL of a page with a *link*
on it to this manual. That would make it easier to back-navigate for
other items of related interest, and it would have meant that the manual
could be downloaded with a right-click without waiting for the browser
plug-in to display the whole manual.

On page 13, under the heading "Power Considerations", I noticed a real
whopper. Or, at least, what _seemed_ to me to be a real whopper
initially.

It is true that for a given implementation, a higher clock speed means
more power consumption. It takes more power to make gates switch faster.

However, if a higher clock speed is obtained by splitting the pipeline
into more itty-bitty pieces, for the same level of instruction latency,
then one still has the same number of gates, each consuming the same
amount of power. (Except for the overhead of the pipelining process...
and one more thing to be noted later.)

What is the point of splitting up a pipeline into smaller pieces? Is it
to put more megahertz in the ad copy? No, it is so that more
instructions can be executing, in different stages, at once. (Which
means that a Pentium IV ought to have explicit vector instructions. Yes,
it has a separate instruction cache and data cache, but there's still
only one bus to *main memory*, and caches do have to get filled from
somewhere.)

Since CMOS gates only consume power when they are changing state, unused
elements of a non-pipelined ALU are not consuming power, so it may well
be that a 14-stage pipelined ALU can consume twice as much power as a
7-stage pipelined ALU.

But that will be because twice as much of it is in use, not because it
is going "twice as fast".

Since they are still sort of right, even if for the wrong reason,
perhaps all I am criticizing is an oversimplification here. But I think
that this can lead to a profound misconception of how microprocessors
work.

John Savard
http://home.ecn.ab.ca/~jsavard/index.html
Anonymous
December 7, 2004 5:04:48 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

"John Savard" <jsavard@excxn.aNOSPAMb.cdn.invalid> wrote in message
news:41b50db1.4580547@news.ecn.ab.ca...
> On Sun, 05 Dec 2004 01:02:11 -0500, Yousuf Khan <bbbl67@ezrs.com>
wrote,
> in part:

OK, I won't trim the wonderful newsgroup list, all of whose readers are
breathlessly awaiting my immortal prose....
>
> >I found this whitepaper from HP to be pretty good, it is surprisingly
> >candid, considering HP was the coinventor of the Itanium. It does a
> >pretty good job of explaining and summarizing the similarities and
> >differences between AMD64 and EM64T, and their comparison to the
> >Itanium's IA64 instruction set. AMD64 and EM64T are "broadly
> >compatible", but IA64 is a different animal altogether.
>
>
>http://h200001.www2.hp.com/bc/docs/support/SupportManua...
38028.pdf
>
> I would have preferred if you had given the URL of a page with a
*link*
> on it to this manual. That would make it easier to back-navigate for
> other items of related interest, and it would have meant that the
manual
> could be downloaded with a right-click without waiting for the browser
> plug-in to display the whole manual.

What braindamaged newsreader are you using that won't let you right
click the link in the newsreader? Even OE does that. So quit whining
and switch to a decent newsreader.
>
> On page 13, under the heading "Power Considerations", I noticed a real
> whopper. Or, at least, what _seemed_ to me to be a real whopper
> initially.
>
> It is true that for a given implementation, a higher clock speed means
> more power consumption. It takes more power to make gates switch
faster.
>
Probably referring to that esoteric equation P= (sf)*.5*C*V**2 which you
may have encountered. Or perhaps I=Cdv/dt.
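
As a rough sketch of the first equation, the scaling can be checked in a few lines of Python. Reading "(sf)" as switching activity times clock frequency is an assumption, and every numeric value below is hypothetical, chosen only for illustration.

```python
def dynamic_power(activity, freq_hz, cap_farads, volts):
    """Switching power of a CMOS block: P = activity * f * 0.5 * C * V**2.

    Treating "(sf)" as activity * frequency is an assumed reading of the
    formula quoted in the post.
    """
    return activity * freq_hz * 0.5 * cap_farads * volts ** 2


# Hypothetical baseline: 10% of the capacitance switches each cycle.
base = dynamic_power(activity=0.1, freq_hz=1.5e9, cap_farads=1e-9, volts=1.4)

# Same circuit and activity, clock doubled.
double_clock = dynamic_power(activity=0.1, freq_hz=3.0e9, cap_farads=1e-9, volts=1.4)

# Clock doubled and twice as much of the logic switching each cycle.
double_both = dynamic_power(activity=0.2, freq_hz=3.0e9, cap_farads=1e-9, volts=1.4)

print(double_clock / base)  # ratio for doubling the clock alone
print(double_both / base)   # ratio for doubling clock and activity
```

Under this reading, doubling only the clock doubles the power, while doubling both the clock and the fraction of the logic that switches gives a factor of four.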

> However, if a higher clock speed is obtained by splitting the pipeline
> into more itty-bitty pieces, for the same level of instruction
latency,
> then one still has the same number of gates, each consuming the same
> amount of power. (Except for the overhead of the pipelining process...
> and one more thing to be noted later.)
If one adds pipe stages one has more gates and more latches and more
clock drivers. And the power per gate goes up because of the higher
frequency.
>
> What is the point of splitting up a pipeline into smaller pieces? Is
it
> to put more megahertz in the ad copy? No, it is so that more
> instructions can be executing, in different stages, at once. (Which
> means that a Pentium IV ought to have explicit vector instructions.
Yes,
> it has a separate instruction cache and data cache, but there's still
> only one bus to *main memory*, and caches do have to get filled from
> somewhere.)

Actually one reason for Intel to "superpipeline" was to jack up the freq
for the ad copy.
You lost me with the "Pentium IV ought to have explicit vector
instructions" leap.
>
> Since CMOS gates only consume power when they are changing state,
unused
> elements of a non-pipelined ALU are not consuming power, so it may
well
> be that a 14-stage pipelined ALU can consume twice as much power as a
> 7-stage pipelined ALU.
Or maybe 4 times, if the freq is double.
>
> But that will be because twice as much of it is in use, not because it
> is going "twice as fast".

Clearly they are using "twice as fast" to mean "double the frequency".
Why do you find that so hard to understand?
>
> Since they are still sort of right, even if for the wrong reason,
> perhaps all I am criticizing is an oversimplification here. But I
think
> that this can lead to a profound misconception of how microprocessors
> work.

What ARE you talking about?
>
> John Savard
> http://home.ecn.ab.ca/~jsavard/index.html

Del Cecchi.
Anonymous
December 7, 2004 5:15:47 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

George Macdonald wrote:
> On Mon, 06 Dec 2004 18:39:46 GMT, Rob Stow <rob.stow.nospam@shaw.ca> wrote:
>
>
>>George Macdonald wrote:
>>
>>>On Sun, 05 Dec 2004 17:37:16 GMT, Rob Stow <rob.stow.nospam@shaw.ca> wrote:
>
> <<snip>>
>
>>>Thanks for the data but no I guess I should have highlighted better what I
>>>was getting at: "the memory controller is integrated into and operates at
>>>the core speed of the processor", which is what was being
>>>discussed/disputed in another thread.
>>>
>>>I haven't been able to find any hard data from AMD on where the clock
>>>domain boundaries are in the Opteron/Athlon64 but if the memory controller
>>>is not operating at "core speed" it's now at the stage of Internet
>>>Folklore.
>>
>>Ah, that one is much easier to answer. ;-)
>>
>>Straight from the horse's mouth:
>>http://www.amd.com/us-en/Processors/ProductInformation/...
>>
>> "By running at the processor’s core frequency, an integrated
>> memory controller greatly increases bandwidth directly available
>> to the processor at significantly reduced latencies."
>
>
> Ah so there we have it... assuming this has been approved by the technical
> folks. :-) BTW I notice that AMD seems to be cutting back on the depth of info
> in their technical docs - the Product Data Sheets now consist of one
> page... a far cry from the excruciating detail on cache operation etc. we
> used to get.

The "Product Data Sheets" are indeed so brief as to be
virtually useless, but there is still a wealth of PDFs
that provide details about just about everything.

The useless Product Data Sheet heads the list of
"AMD Opteron™ Processor Tech Docs" at
http://www.amd.com/us-en/Processors/TechnicalResources/...
but the other PDFs there have mind numbing details about
every little thing that does not give away trade secrets.
For example, read the "BIOS and Kernel Developer's Guide
for AMD Athlon™ 64 and AMD Opteron™ Processors".
Anonymous
December 7, 2004 12:56:44 PM

Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips (More info?)

In article <pan.2004.12.07.01.37.06.417847@att.bizzzz>,
keith <krw@att.bizzzz> wrote:

>> Note that the STREAM bandwidth and lmbench latency changes with every
>> cpuspeedbump. So clearly part of the memory controller is at the cpu
>> core frequency, or a related frequency, and not at the HT frequency,
>> or the SDRAM external bus frequency.
>
>That does *not* mean that the memory controller runs at the core speed.
>It would be nuts to assume such. Would you assume the caches of the
>PII run at the I/O bus speed?

"or a related frequency", i.e. based on the cpu frequency with a
constant divider.
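
A toy model of that argument, sketched in Python. The cycle count, divider, fixed clock, and DRAM time are hypothetical values chosen only for illustration; they are not AMD figures.

```python
def memory_latency_ns(core_ghz, divider=1, controller_cycles=20, dram_ns=45.0):
    """Latency when the controller clock is the core clock over a constant divider.

    controller_cycles and dram_ns are hypothetical illustration values.
    """
    controller_ghz = core_ghz / divider
    return dram_ns + controller_cycles / controller_ghz


def fixed_clock_latency_ns(controller_ghz=0.2, controller_cycles=20, dram_ns=45.0):
    """Latency if the controller ran on a clock unrelated to the core clock."""
    return dram_ns + controller_cycles / controller_ghz


for core_ghz in (2.0, 2.2, 2.4):
    print(core_ghz,
          memory_latency_ns(core_ghz),
          memory_latency_ns(core_ghz, divider=2),
          fixed_clock_latency_ns())
```

With either divider the modeled latency drops at every speed bump, while the fixed-clock case stays flat. A measured change therefore points to a core-derived clock, but it cannot by itself tell a divider of 1 from any other constant divider.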

>> Please reduce the cross-post. Followups set to a group I read.
>
>Isn't this a rather egotistical statement?

No, it follows Usenet tradition: post only to groups that you read.

But thanks for giving me the benefit of the doubt.

-- greg
Anonymous
December 7, 2004 1:34:02 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips (More info?)

Eugene Nalimov wrote:

> Greg Lindahl wrote:
>
>> These benchmarks were run with the best Opteron compiler
>
> Visual C?
>
> :-)

Maybe he meant GCC!

;-)
Anonymous
December 7, 2004 6:09:53 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

Del Cecchi wrote:

> What braindamaged newsreader are you using that won't let you right
> click the link in the newsreader? Even OE does that. So quit whining
> and switch to a decent newsreader.

Speaking of brain-damaged newsreaders, take a look at the mess yours
did when you quoted John's message. I rest my case.
Anonymous
December 7, 2004 6:09:54 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

"Grumble" <devnull@kma.eu.org> wrote in message
news:cp4djh$vdh$1@news-rocq.inria.fr...
> Del Cecchi wrote:
>
> > What braindamaged newsreader are you using that won't let you right
> > click the link in the newsreader? Even OE does that. So quit whining
> > and switch to a decent newsreader.
>
> Speaking of brain-damaged newsreaders, take a look at the mess yours
> did when you quoted John's message. I rest my case.

A few lines got wrapped. That what you are talking about?

del
Anonymous
December 7, 2004 7:19:26 PM

Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips (More info?)

Del Cecchi wrote:

> Grumble wrote:
>
>> Del Cecchi wrote:
>>
>>> What braindamaged newsreader are you using that won't let you
>>> right click the link in the newsreader? Even OE does that.
>>> So quit whining and switch to a decent newsreader.
>>
>> Speaking of brain-damaged newsreaders, take a look at the mess
>> yours did when you quoted John's message. I rest my case.
>
> A few lines got wrapped. That what you are talking about?

Yessir!

Perhaps OE-QuoteFix might help if you must use OE?
Anonymous
December 7, 2004 11:11:52 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

In article <41B578C0.1000400@sgi.com>, Michael Woodacre wrote:
> Another example would be making sure that people understand that when
> Opteron goes dual core, unless you double the memory bandwidth
> available, you effectively cut the bandwidth per core in half. This will
> impact some workloads quite dramatically. Has AMD made public statements
> about supporting higher local bandwidth for the dual core chip?

No public statements that I know of, but there are rumors that the
90nm Opterons, due Real Soon Now, will support DDR2 in addition to
plain old DDR. See e.g.

http://www.xbitlabs.com/news/cpu/display/20040212022200...

By the time dual core Opterons arrive, I suspect that DDR2-800 will
also be available, thus providing twice the memory BW compared to the
current single core offerings using DDR-400.
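
The arithmetic behind that can be sketched in Python. The figures are theoretical peaks, and the dual-channel, 64-bit-per-channel configuration is an assumption about the systems being compared.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, channels=2, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s.

    Assumes 64-bit (8-byte) channels and a dual-channel controller.
    """
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9


ddr400 = peak_bandwidth_gbs(400)    # DDR-400: 400 MT/s per channel
ddr2_800 = peak_bandwidth_gbs(800)  # DDR2-800: 800 MT/s per channel

per_core = {
    "single core, DDR-400": ddr400 / 1,
    "dual core, DDR-400": ddr400 / 2,
    "dual core, DDR2-800": ddr2_800 / 2,
}

for config, gbs in per_core.items():
    print(f"{config}: {gbs:.1f} GB/s per core")
```

On these assumptions a dual core part on DDR2-800 gets the same peak bandwidth per core as a single core part on DDR-400, and half that if it stays on DDR-400.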


--
Janne Blomqvist
Anonymous
December 8, 2004 3:01:59 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Mon, 6 Dec 2004 20:16:21 -0600, "del cecchi" <dcecchi.nojunk@att.net>
wrote, in part:

>What braindamaged newsreader are you using that won't let you right
>click the link in the newsreader?

Clicking on the link in the newsreader, supposing I could do that, would
simply cause the link to open in a browser window. Which is exactly what
I achieved by cutting and pasting.

Maybe some newsreaders do allow right-clicking links. Such newsreaders
would probably also do dangerous and reckless things like rendering HTML
posts instead of displaying them in all their <angle bracket> glory.

This could result in having a brain-damaged computer, were I to view the
wrong post by accident.

As the posting in question was a text posting, this means that the
newsreader would have to guess at what constituted an URL, as well, with
no doubt occasional hilarious results.

John Savard
http://home.ecn.ab.ca/~jsavard/index.html
Anonymous
December 8, 2004 3:19:59 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On 06 Dec 2004 14:12:20 +0100, Per Ekman <pek@pdc.kth.se> wrote:

>Tony Hill <hilla_nospam_20@yahoo.ca> writes:
>
>> It does, but the difference is small, usually less than 10% and often
>> much closer to 0%.
>
>And sometimes 50%...

Sure, there will be extreme cases in everything.

>> Most users don't use their computer to run STREAM though. Even in the
>> HPC community where memory bandwidth is king, STREAM is still a rather
>> extreme case.
>
>I admit I'm from the HPC-sector and memory bandwidth is very important
>to many applications here.

One thing that you need to keep in mind is that you represent a VERY
small minority here in terms of PC server sales. The fact that it
matters to your application probably has little relevance to
the bulk of the buying public, and it almost certainly isn't going to
have implications for what the marketing people write in the trade
rags.

>> Besides, they do recognize that it is NUMA, just that they are saying
>> you don't NEED to worry about that if you don't want to because for
>> the vast majority of times the performance difference is lost in the
>> noise.
>
>It's a pretty strange argument in my eyes, "If you ignore the
>applications that run poorly because of property X, then it makes
>sense to downplay property X." True, but not helpful if you have such
>an application.

Ahh, but it's VERY helpful if you're in the marketing department! :>

In the end, the people that are going to take a performance hit due to
lack of NUMA optimizations probably already know as much and have
factored it into their buying decisions. The people who are talking
to Dell or HPaq's server sales and are thinking about an Opteron
system but are worried that this here NoooMah thingy might cause their
application to run slow most likely don't have to worry about much.
Hence SUMO.

It's all a matter of perspective.

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca
Anonymous
December 8, 2004 11:09:03 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Wed, 08 Dec 2004 00:19:59 -0500, Tony Hill <hilla_nospam_20@yahoo.ca>
wrote:

>On 06 Dec 2004 14:12:20 +0100, Per Ekman <pek@pdc.kth.se> wrote:
>
>>Tony Hill <hilla_nospam_20@yahoo.ca> writes:
>>
>>> It does, but the difference is small, usually less than 10% and often
>>> much closer to 0%.
>>
>>And sometimes 50%...
>
>Sure, there will be extreme cases in everything.
>
>>> Most users don't use their computer to run STREAM though. Even in the
>>> HPC community where memory bandwidth is king, STREAM is still a rather
>>> extreme case.
>>
>>I admit I'm from the HPC-sector and memory bandwidth is very important
>>to many applications here.
>
>One thing that you need to keep in mind is that you represent a VERY
>small minority here in terms of PC server sales. The fact that it
>matters to your application probably has little relevance to
>the bulk of the buying public, and it almost certainly isn't going to
>have implications for what the marketing people write in the trade
>rags.

I think you're underestimating the size of the "workstation" market, which
will include people finding they can migrate down to PC-grade CPUs to
replace old "higher power" systems as well as people on the lower-end
fringe who may have grown their problem complexity beyond a uni-PC, or who
*could* get by with a fastish PC but like the comfort of the move up to
dual for future growth. Add them to the current established base of CAD,
engineering and modeling etc. applications and there is a decent sized
market.

There are a lot of mathematical/engineering problems out there which are
just part of everyday business computing - many *used* to be considered HPC
and are now quite routine on desktop sized boxes. In many cases,
proprietary (purchased) software is used and the algorithmic methods are
only understood fairly superficially by the user; what that user wants is
response, whether it's measured in minutes, hours or a day or more. The
software vendor thus feels responsible for supplying the best combination
of software and recommended hardware selection.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
December 9, 2004 2:43:53 AM

Archived from groups: comp.arch,comp.sys.ibm.pc.hardware.chips (More info?)

On Tue, 07 Dec 2004 09:56:44 -0800, Greg Lindahl wrote:

> In article <pan.2004.12.07.01.37.06.417847@att.bizzzz>,
> keith <krw@att.bizzzz> wrote:
>
>>> Note that the STREAM bandwidth and lmbench latency changes with every
>>> cpuspeedbump. So clearly part of the memory controller is at the cpu
>>> core frequency, or a related frequency, and not at the HT frequency,
>>> or the SDRAM external bus frequency.
>>
>>That does *not* mean that the memory controller runs at the core speed.
>>It would be nuts to assume such. Would you assume the caches of the
>>PII run at the I/O bus speed?
>
> "or a related frequency", i.e. based on the cpu frequency with a
> constant divider.

Ok, how many "unrelated frequencies" are there in a CPU? Let's get real
here.

>>> Please reduce the cross-post. Followups set to a group I read.
>>
>>Isn't this a rather egotistical statement?
>
> No, it follows Usenet tradition: post only to groups that you read.

No, that is *not* Usenet tradition. The tradition is to limit
cross-postings to on-topic newsgroups. Cross-posting is not expensive
(unless you have a dran-bamaged newsreader).

> But thanks for giving me the benefit of the doubt.

Cutting off your audience, particularly those who *you* have responded to
is rude. Sorry if I've ruffled your feathers!

--
Keith
Anonymous
December 9, 2004 2:15:30 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

David Schwartz wrote:
> The scaling advantage comes largely from the architecture of a single
> processor. The memory controller is on the chip. The main reason this
> matters is that it means that local memory accesses don't have to contend
> with any other inter-CPU or I/O traffic.

That's only partly true. The Opterons still talk to each other even on local
accesses (coherency tokens only, no real data transfer). This both takes
time and adds to the traffic, since such a token needs to get everywhere.

What's missing here is an "exclusive" bit in the page table, for non-coherent
pages. The OS pretty well knows (or can know) which core is accessing a
page, and for a page that's not shared, the coherency token is not
necessary.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/
Anonymous
December 13, 2004 9:58:07 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

In comp.arch David Schwartz <davids@webmaster.com> wrote:
> In typical Opteron setups (2-8 CPUs, using the Opteron's built-in
> SMP hardware), the latency difference between local and remote
> memory accesses is so small that the benefits of treating it as NUMA
> are typically outweighed by the costs.

SPECweb99_SSL is probably atypical then. (Yes, one of my favorite
benchmarks :) The evolution of the tunes for Opteron systems on that
benchmark shows the size of the Zeus tunable "cache_small_file"
increasing to 90000 bytes. That brings many more of the URLs into the
"malloc" cache of Zeus, where they are replicated per Zeus instance and,
in this case, per CPU (things being bound to CPUs). "Normal"
practice is to have cache_small_file be "NBPG"/numCPU to optimize the
memory consumption.

It all depends of course :) Maybe that wasn't done for latency but to
cut down the bandwidth consumed. Who knows - although I am interested
in trying to find out :)

> Generally, you just distribute the memory evenly and interleaved on
> the nodes (if you can) to avoid overloading one memory controller
> channel.

FWIW, I've noticed that Node interleave is (or seems to be, it was set
that way on the first one I saw and had no indication from the source
that it had been altered) disabled by default on the Sun V20z's.
Anyone have data on how Node interleave defaults on other
Opteron-based systems?

rick jones
--
a wide gulf separates "what if" from "if only"
these opinions are mine, all mine; HP might not want them anyway... :) 
feel free to post, OR email to raj in cup.hp.com but NOT BOTH...
Anonymous
December 14, 2004 9:04:29 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

Rick Jones <foo@bar.baz.invalid> writes:

>
>FWIW, I've noticed that Node interleave is (or seems to be, it was set
>that way on the first one I saw and had no indication from the source
>that it had been altered) disabled by default on the Sun V20z's.
>Anyone have data on how Node interleave defaults on other
>Opteron-based systems?

It defaults to "off" on Penguin systems, too.

scott
Anonymous
December 14, 2004 10:46:45 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

Rick Jones <foo@bar.baz.invalid> writes:

> FWIW, I've noticed that Node interleave is (or seems to be, it was set
> that way on the first one I saw and had no indication from the source
> that it had been altered) disabled by default on the Sun V20z's.
> Anyone have data on how Node interleave defaults on other
> Opteron-based systems?

As far as I know it's disabled by default on most shipping Opteron
servers. Only a few build-it-yourself dual motherboards have it
enabled by default.

For Linux use I would recommend always disabling it. The modern
kernel can do page interleaving on demand (with numactl or libnuma),
which is nearly as good, and most programs seem to just prefer
good memory latency.
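
A minimal sketch of the on-demand route, written in Python only to build the command line. `numactl --interleave=all` is a real numactl option; the program name and arguments are hypothetical placeholders.

```python
def interleaved_command(program_args, nodes="all"):
    """Build a numactl command that interleaves a program's pages across nodes.

    program_args is the command to wrap; the example program below is a
    hypothetical placeholder.
    """
    return ["numactl", f"--interleave={nodes}", *program_args]


# Hypothetical bandwidth-bound program that benefits from interleaving.
cmd = interleaved_command(["./my_app", "--input", "data.bin"])
print(" ".join(cmd))

# Interleave over nodes 0 and 1 only.
print(" ".join(interleaved_command(["./my_app"], nodes="0,1")))
```

The printed command can be run from a shell, or passed to `subprocess.run` on a machine with numactl installed. Programs that prefer local latency are simply run without the wrapper, which is the point of leaving BIOS node interleave off.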

-Andi
Anonymous
December 15, 2004 6:27:12 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips (More info?)

lindahl@pbm.com (Greg Lindahl) writes:

> benchmark 1   3.71   3.03   +22 %
> benchmark 2   3.76   3.29   +14 %
> benchmark 3   3.78   3.26   +16 %
> benchmark 4   3.79   3.45   +10 %
> benchmark 5   3.92   3.89   + 1 %
> benchmark 6   3.88   3.71   + 5 %
>
> These benchmarks were run with the best Opteron compiler, so this
> scaling improvement was very good to see. And it's bigger than
> "usually less than 10%".

Averages out to 11%.

Sounds like "usually less than 10%" may be right when talking about non scientific workloads.
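
The quoted percentages and the average can be reproduced with a short Python sketch. The quote does not say what the two numeric columns measure, so treating the gain as first column over second is an assumption; it does match the percentages shown.

```python
# (first column, second column) for benchmarks 1-6, as quoted above.
results = [
    (3.71, 3.03),
    (3.76, 3.29),
    (3.78, 3.26),
    (3.79, 3.45),
    (3.92, 3.89),
    (3.88, 3.71),
]

# Assumed definition: gain = first / second - 1, expressed in percent.
exact_gains = [(first / second - 1) * 100 for first, second in results]
gains = [round(g) for g in exact_gains]
mean_gain = sum(exact_gains) / len(exact_gains)

print(gains)
print(round(mean_gain))
```

Four of the six results are at or above 10%, so the 11% average and "usually less than 10%" describe the same data from different angles.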
Anonymous
December 15, 2004 6:35:43 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

jsavard@excxn.aNOSPAMb.cdn.invalid (John Savard) writes:

> As the posting in question was a text posting, this means that the
> newsreader would have to guess at what constituted an URL, as well, with
> no doubt occasional hilarious results.

Sorry, you don't make sense.
You really should get a decent newsreader.
December 15, 2004 6:35:44 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Wed, 15 Dec 2004 03:35:43 +0000, Israel T wrote:

> jsavard@excxn.aNOSPAMb.cdn.invalid (John Savard) writes:
>
>> As the posting in question was a text posting, this means that the
>> newsreader would have to guess at what constituted an URL, as well, with
>> no doubt occasional hilarious results.
>
> Sorry, you don't make sense.
> You really should get a decent newsreader.

Hmmm, I always thought Agent was fairly good. Perhaps yours can't show
headers? ...oh, another emacs bigot.

--
Keith
Anonymous
December 15, 2004 8:04:44 AM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

keith <krw@att.bizzzz> writes:

> Hmmm, I always thought Agent was fairly good. Perhaps yours can't show
> headers?

I used Agent for some years until its limitations became irritating.

>...oh, another emacs bigot.

It is a matter of using the right tool for the job.
Emacs's mail/news subsystem, Gnus, is superb.
Anonymous
December 15, 2004 7:22:22 PM

Archived from groups: comp.arch,comp.sys.intel,comp.sys.ibm.pc.hardware.chips,alt.comp.hardware.amd.x86-64 (More info?)

On Tue, 14 Dec 2004 23:41:04 -0500, keith <krw@att.bizzzz> wrote:

>On Wed, 15 Dec 2004 03:35:43 +0000, Israel T wrote:
>
>> jsavard@excxn.aNOSPAMb.cdn.invalid (John Savard) writes:
>>
>>> As the posting in question was a text posting, this means that the
>>> newsreader would have to guess at what constituted an URL, as well, with
>>> no doubt occasional hilarious results.
>>
>> Sorry, you don't make sense.
>> You really should get a decent newsreader.
>
>Hmmm, I always thought Agent was fairly good. Perhaps yours can't show
>headers? ...oh, another emacs bigot.

Well jsavard is using an *old* version of Free Agent but even the 1.93 I'm
using doesn't have a right click and "Save Link Target As.." I dunno what
the big deal is on either side here - copy/paste of a URL is always coming
up as a nuisance for file downloads, especially with the Adobe reader 6.0
being so damned slow to get started - the plugin has to load its err,
plugins to get started and then you also have to have it configured to turn
off "fast web view" to get the whole document without paging through the
bugger... all a royal PITA.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??