
RAM Latency, Bandwidth, and Benchmarking. Again!!

May 22, 2002 2:36:07 PM

Here I am again bringing up the discussion of benchmarking memory. Is there any way to start a real discussion of how to benchmark memory architectures? Are there any methods that are acceptable to the computer technology community for benchmarking bandwidth, latency, and other aspects of L1, L2, L3 and main memory?

According to IDC (<A HREF="http://www.idc.com" target="_new">International Data Corporation</A>), there is a new way to rank supercomputers.

Here is the old way...

<A HREF="http://theregus.com/content/53/25003.html" target="_new">In the original rankings announced last year, IDC ranked the performance of single processors used in each HPC server or cluster using Linpack and SPEC benchmarks. Memory bandwidth for single nodes and the complete cluster were based on the STREAM benchmark. A scalability factor was then calculated based on the total processor count and the memory bandwidth of the full HPC cluster.</A>

The new Balanced Rating method sheds new light on how computers have been and now can be ranked.

Here is my question...

If STREAM, Linpack, and SPEC are used by the rest of the world today, why can't we use these and other benchmarks that are widely used elsewhere?

There is interest in testing memory as the basis for the performance of the architectures themselves. Why can't we here at the THGC do some of that testing and figure it out as a group?

Is it that we here just like being lemmings, listening to and following whatever Anand, Dr. Tom and all the others tell us? Whatever the reason for the obstruction to this open discussion, could all the veterans of this community come together? Or would we rather rant about how idiotic writers are and then post our view of their article?

Why can't we as a group discuss which benchmarks and/or procedures to use, then post our figures and discuss the results? Is it really that hard to do?

Back to you...

Please feel free to post your thoughts and comments regarding this issue. Please make intelligent responses.

<b>"Sometimes you can't hear me because I'm talking in parenthesis" - Steven Wright</b> :lol: 
May 22, 2002 4:36:55 PM

Last time I tried this, apparently there was nobody in the forum with a compiler that could produce an executable of STREAM...

<font color=blue>Hi mom!</font color=blue>
May 23, 2002 1:02:46 PM

Maybe Ray could do it? Didn't he compile something for you before?

I think it is sad that you seem to be the only one other than me that is willing to put any effort into this. (For the others that do care, sorry. I said <i>seems</i>.)

<b>"Sometimes you can't hear me because I'm talking in parenthesis" - Steven Wright</b> :lol: 
May 23, 2002 5:08:17 PM

Ray originally compiled Latency2. There should be several others who can do it; maybe I'll ask some friends.

<font color=blue>Hi mom!</font color=blue>
May 23, 2002 6:47:34 PM

What platform are you going to have it compiled for? I can probably get someone to do it, but what makes sense to run it under?

<b>"Sometimes you can't hear me because I'm talking in parenthesis" - Steven Wright</b> :lol: 
May 23, 2002 9:23:12 PM

Windows or DOS. True DOS would have much less overhead than Windows and would produce more accurate results.

<font color=blue>Hi mom!</font color=blue>
May 24, 2002 3:55:35 PM

I haven't tried it in a "True DOS" scenario yet. I'll try it on my laptop while I am here at work and post the results for each environment.

<b>"Sometimes you can't hear me because I'm talking in parenthesis" - Steven Wright</b> :lol: 
May 24, 2002 6:08:17 PM

I ran STREAM on my laptop and here are the results.

PIIIM @ 647.2MHz

PC100 CL2 2-2-2
FSB 100MHz
Chipset is i440BX
MCH is 82443BX


Under Windows 98SE

C:\>STREAMD
DOS/4GW Protected Mode Run-time Version 1.97
Copyright (c) Rational Systems, Inc. 1990-1994

STREAM for DOS v2 by Dennis Lee
===============================
1 MB = 1000000 Bytes in the following measurements.

For accurate results, this benchmark should be executed
in a true DOS session, and not a DOS shell under another OS.

Time      Operation  Mem Speed    Error
----      ---------  ---------    -----
1.92 sec  COPY32     333.33 MB/s  3.2%
2.04 sec  COPY64     313.73 MB/s  3.0%
2.03 sec  SCALE      315.27 MB/s  3.0%
2.64 sec  ADD        363.64 MB/s  2.3%
2.91 sec  TRIAD      329.90 MB/s  2.1%

These results are comparable with those on the STREAM website.
See <http://www.cs.virginia.edu/stream> for info on STREAM.

Type 'streamd ?' for help

C:\>


Under DOS...

C:\>STREAMD
DOS/4GW Protected Mode Run-time Version 1.97
Copyright (c) Rational Systems, Inc. 1990-1994

STREAM for DOS v2 by Dennis Lee
===============================
1 MB = 1000000 Bytes in the following measurements.

For accurate results, this benchmark should be executed
in a true DOS session, and not a DOS shell under another OS.

Time      Operation  Mem Speed    Error
----      ---------  ---------    -----
1.98 sec  COPY32     323.23 MB/s  3.1%
2.03 sec  COPY64     315.27 MB/s  3.0%
2.09 sec  SCALE      306.22 MB/s  3.0%
2.47 sec  ADD        388.66 MB/s  2.5%
2.80 sec  TRIAD      342.86 MB/s  2.2%

These results are comparable with those on the STREAM website.
See <http://www.cs.virginia.edu/stream> for info on STREAM.

Type 'streamd ?' for help

C:\>


The DOS run was in true DOS, not a DOS shell under Windows. I wrote down my results and filled in the blanks.
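
A rough way to compare the two runs above: the sketch below (Python, with dictionary and function names made up for illustration) computes the percent change from the Windows 98SE run to the true DOS run and checks it against the tool's Error column. Treating the Error column as a measurement uncertainty is an assumption; the STREAM for DOS output does not say exactly what it represents.

```python
# Reported STREAM for DOS rates (MB/s), copied from the two runs above.
win98 = {"COPY32": 333.33, "COPY64": 313.73, "SCALE": 315.27,
         "ADD": 363.64, "TRIAD": 329.90}
dos = {"COPY32": 323.23, "COPY64": 315.27, "SCALE": 306.22,
       "ADD": 388.66, "TRIAD": 342.86}

# "Error" column from the true DOS run, in percent.
# Assumption: read here as the tool's own uncertainty estimate.
dos_error = {"COPY32": 3.1, "COPY64": 3.0, "SCALE": 3.0,
             "ADD": 2.5, "TRIAD": 2.2}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Percent change per operation, Windows 98SE -> true DOS.
changes = {op: percent_change(win98[op], dos[op]) for op in win98}

# Flag whether each change is larger than the reported error.
verdict = {
    op: "outside" if abs(changes[op]) > dos_error[op] else "within"
    for op in changes
}

for op in changes:
    print(f"{op:7s} {changes[op]:+6.2f}%  ({verdict[op]} reported error)")
```

If the Error column really is an uncertainty, changes flagged "within" should probably not be read as a real difference between DOS and Windows.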


So back to you FB.

<b>"Sometimes you can't hear me because I'm talking in parenthesis" - Steven Wright</b> :lol: 
May 25, 2002 4:32:50 PM

i440BX @ 112FSB / 2x PIII-800E @ 896MHz / 512MB SDRAM @ 112MHz 2-2-2

Under a Win2000 DOS shell (booting into true DOS would require me to dig up my old Win98SE CD):

Time      Operation  Mem Speed    Error
----      ---------  ---------    -----
1.70 sec  COPY32     376.47 MB/s  3.7%
1.70 sec  COPY64     376.47 MB/s  3.7%
1.76 sec  SCALE      363.64 MB/s  3.5%
1.98 sec  ADD        484.85 MB/s  3.1%
2.31 sec  TRIAD      415.58 MB/s  2.7%

Using the "stream.exe" that someone compiled last month in the other thread:

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 10000000, Offset = 0
Total memory required = 228.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 999 microseconds.
Each test below will take on the order of 425999 microseconds.
(= 426 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:         374.7073     0.4299     0.4270     0.4350
Scale:        375.5869     0.4299     0.4260     0.4360
Add:          458.8910     0.5263     0.5230     0.5350
Triad:        460.6526     0.5273     0.5210     0.5510
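
For anyone wondering how the Rate column follows from the timings: STREAM counts the bytes each kernel reads and writes (two arrays for Copy and Scale, three for Add and Triad), divides by the best (minimum) time, and reports MB with 1 MB = 10^6 bytes. A small Python sketch of that arithmetic, using the array size and Min time values from the output above (variable names are my own, not STREAM's):

```python
# Values taken from the stream.exe output above.
N = 10_000_000        # array size (elements)
BYTES_PER_WORD = 8    # double precision

# Arrays read plus written per kernel, as STREAM counts them.
arrays_touched = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Best (minimum) time per kernel, in seconds.
min_time = {"Copy": 0.4270, "Scale": 0.4260, "Add": 0.5230, "Triad": 0.5210}


def stream_rate(kernel):
    """Rate in MB/s (1 MB = 10**6 bytes) from bytes moved and min time."""
    bytes_moved = arrays_touched[kernel] * BYTES_PER_WORD * N
    return bytes_moved / min_time[kernel] / 1e6


rates = {kernel: stream_rate(kernel) for kernel in min_time}

# Three arrays of N doubles; the "Total memory required" line uses 2**20 bytes per MB.
total_mib = 3 * BYTES_PER_WORD * N / 2**20

for kernel, rate in rates.items():
    print(f"{kernel + ':':8s}{rate:10.4f} MB/s")
print(f"Total memory: {total_mib:.1f} MB")
```

This is why Add and Triad show higher MB/s than Copy and Scale even though they take longer per pass: they are credited with 24 bytes per element instead of 16.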

- JW