Archived from groups: alt.comp.periphs.mainboard.asus
Paul wrote:
> In article <313mofF367g34U1@uni-berlin.de>, "Johnny"
> <repro007@hotmail.com> wrote:
>
>> Paul wrote:
>>> I don't normally top post, but don't want to try to trim the
>>> rest of this down.
>>>
>>> Some random observations:
>>>
>>> 1) Could this be a Hyperthreading problem ? Is Hyperthreading
>>> disabled in the BIOS ? I don't know my Hyperthreading policy
>>> versus OS, but perhaps if you were quitting Passmark between
>>> runs, maybe the program is running on a different virtual
>>> processor each time, and one virtual processor has more load
>>> than the other. If you disable Hyperthreading in the BIOS,
>>> the perf difference might stop.
>>>
>>> In any case, Hyperthreading is not all it is cracked up to
>>> be. In some cases, it is a clear win, but in other cases it
>>> can trash the performance of the memory subsystem, and actually
>>> run slower than without it.
>>>
>> WOW!!! Before altering any voltages or settings, just running the
>> standard [auto] jumperless detection settings and simply setting CPU
>> hyperthreading [disabled] option, the results are now, well,
>> somewhat different!!
>> How thorough or accurate passmark is I know not but for purposes of
>> comparison it's useful. It's difficult to present the results in
>> here but the scores for example of the CPU suite of tests are as
>> follows in my attempt at a table (hope it comes out ok).
>>
>> cpu test          hyperthreading [enabled]   hyperthreading [disabled]
>>
>> integer math      170/246 varies             257 solid
>> floating p math   230                        291
>> mmx               181                        278
>> sse               131                        164
>> compression       1319                       1868
>> encryption        6.8                        10.9
>> image rotation    113                        195.9
>> string sorting    665                        810
>>
>> CPU passmark      322                        467
>>
>> I haven't managed to get anything other than very close to the numbers
>> above with hyperthreading [disabled], it is solid. [disabled]
>> hyperthreading has also affected the memory test benchmark speeds,
>> presumably due to the increased CPU performance.
>>
>> all this before altering any voltages or any other settings, blimey!
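
To put a number on the table above, here is a minimal sketch that turns the quoted Passmark scores into a percentage gain from disabling Hyperthreading. The scores are copied from the post; the dictionary layout and the `gain_percent` helper are just illustrative names. Integer math is left out because its Hyperthreading-enabled score varied between runs.

```python
# Passmark CPU scores quoted above: (HT enabled, HT disabled).
# "integer math" is omitted because its HT-enabled score varied (170/246).
scores = {
    "floating p math": (230, 291),
    "mmx": (181, 278),
    "sse": (131, 164),
    "compression": (1319, 1868),
    "encryption": (6.8, 10.9),
    "image rotation": (113, 195.9),
    "string sorting": (665, 810),
    "CPU passmark": (322, 467),
}


def gain_percent(ht_on, ht_off):
    """Percentage gain in score from disabling Hyperthreading."""
    return (ht_off / ht_on - 1.0) * 100.0


gains = {name: gain_percent(on, off) for name, (on, off) in scores.items()}

for name, gain in gains.items():
    print(f"{name:16s} {gain:5.1f}% higher with HT disabled")
```

Every test in the table gains more than 20% with Hyperthreading disabled, and the overall CPU passmark gains roughly 45%.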
>
> Does the memtest86 memory bandwidth indicator change as a function
> of the BIOS Hyperthreading setting ? It shouldn't. In any case, one
> thing that strikes me, is how negative an effect hyperthreading is
> having on your results.
>
Yeah me too - it's got me flummoxed, this thing. The bandwidth indicators in
memtest are exactly the same with hyperthreading enabled and disabled. The
CPU temperature increases by 10C to 48-50C when hyperthreading is disabled
then drops back to 38-40C when I enable it.
If I select Turbo Mode in the BIOS settings the system still bombs out
despite the memory tweaks.
>>>
>>> 2) Increase Vdimm to the Corsair. DDR400 memory needs 2.6V to
>>> start with, and you may find bumping the memory voltage up
>>> a couple notches stops the errors. If the memory passes memtest86
>>> in an overnight test without errors, use Prime95 torture test
>>> in mixed mode, and see if it runs error free as well. I've had
>>> memory pass memtest86 and fail Prime95.
>>>
>>> 3) Look up your Corsair memory here:
>>>
>>> http://corsairmicro.com/corsair/xms.html
>>>
>>> Click the link and download the datasheet. For example, 3200XL
>>> is rated for 2.75V and you could try that. The datasheet for
>>> 3200XL claims the SPD is loaded with 2-2-2-5, so it shouldn't
>>> start at 2.5-2-2 on its own. If this is some other memory,
>>> you may need to post in this forum, and get some help with
>>> your product - or search for someone having the same system
>>> as you've got:
>>>
>> The product is CMX512-3200XLPT listed on their site under
>> CMX512-3200XL and it clearly states 2.75V. Changing the voltage to
>> 2.75V has stopped the blackouts.
>>
>> For interest here are the passmark memory results before (but with
>> hyperthreading disabled) and after voltage change. The "Configure
>> DRAM Timing by SPD" option is [enabled] in the BIOS.
>>
>> test                   [auto]    2.75V [auto]   2.75V / 2.0-2-2-5
>>
>> allocate small block   1162.8    1163           1164.8
>> read cached            1390      1389.7         1389.9
>> read uncached          1326.6    1328.3         1328.8
>> write                  809.4     809.7          809.4
>
> As the auto and manual settings seem to be doing the same thing, I
> think you can conclude that the SPD on the 3200XL is 2-2-2. You can
> play with the 5 number manually, as by calculation, the 5 number is
> supposed to be the sum of two of the other parameters plus 2 (four
> beats of DDR data taking 2 cycles). On an AMD system, raising that
> number to 10 is best, while on the P4, a lower value is better, but
> play with it a bit, and see what happens.
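
The rule of thumb above can be written out as a small calculation. This is only a sketch: the post does not say which two parameters are summed, so the choice of Tcl and Trcd here is an assumption, and `rule_of_thumb_tras` is a made-up helper name.

```python
def rule_of_thumb_tras(tcl, trcd, burst_cycles=2):
    """Estimate the fourth timing number (active to precharge) in DRAM clocks.

    Assumption: the "two other parameters" are Tcl and Trcd. The extra
    2 cycles are the four beats of DDR data, two beats per clock.
    """
    return tcl + trcd + burst_cycles


estimate = rule_of_thumb_tras(tcl=2, trcd=2)
print(estimate)
```

Under that assumption, 2-2-2 memory works out to 6, so the 5 in 2-2-2-5 is one clock tighter than the rule of thumb suggests.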
>
> In terms of memory bandwidth, your CTIAW and memtest86 bandwidth
> indicators are in the same ballpark as mine, so I don't think you
> are far off from optimal. Certainly, overclocking the memory will
> be the single biggest determinant of memory bandwidth, and the
> nice thing about the 3200XL, is you can play with it a bit. I think
> it can be pushed up to DDR500, at the expense of relaxing the timing
> numbers a bit. My Ballistix doesn't like that quite as much.
>
> These two documents describe some of the things you can do to
> optimize memory bandwidth. But with the Asus hack to enable PAT,
> the rules might be more like an 875 than an 865. The chips, after
> all, are the same die, but with different signals pinned out.
>
> ftp://download.intel.com/design/chipsets/applnots/25273001.pdf (875P)
>
> ftp://download.intel.com/design/chipsets/applnots/25303601.pdf (865PE)
>>
>> Altering the DRAM burst timing between 4 and 8 clocks appeared to
>> make no difference in these tests. Having memory acceleration
>> enabled gave the following: 1165.4, 1389.3, 1340.2, 810, so only read
>> uncached improved slightly but consistently.
>
> When the cache is enabled for a certain area of memory, the memory
> controller likes to fetch cache-line-sized chunks. That might be why
> normally, the 4 versus 8 setting doesn't make a difference. Perhaps
> the memory used by PCI cards for I/O is uncached ? I've left mine
> set at 4. (I think the cache line size is 64 bytes, and with dual
> channel memory, 16 bytes are transferred per beat, so the 4 setting
> would be right for it. If you were in single channel mode, perhaps
> 8 would be the right setting, times 8 bytes per beat.)
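
The burst length reasoning in that parenthesis is simple division, sketched here. It assumes a 64-byte cache line and 8 bytes per beat per channel, as stated in the post; the function name is only illustrative.

```python
def burst_beats(cache_line_bytes, channels, bytes_per_channel_beat=8):
    """Number of data beats needed to fill one cache line."""
    bytes_per_beat = channels * bytes_per_channel_beat
    if cache_line_bytes % bytes_per_beat:
        raise ValueError("cache line is not a whole number of beats")
    return cache_line_bytes // bytes_per_beat


dual_channel = burst_beats(64, channels=2)    # 16 bytes per beat
single_channel = burst_beats(64, channels=1)  # 8 bytes per beat
print(dual_channel, single_channel)
```

Dual channel comes out at 4 beats per cache line and single channel at 8, which matches the two BIOS settings being discussed.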
>
>>
>> **** INTEL/AMD/VIA memory config info, c't/Andreas Stiller V2.7 June 03 ****
>> Kernel Driver: WinNT DIRECTNT.SYS V01.09
>> Pentium 4,(0F34-00)ca 3274 MHz (sleep) 2999 MHz (load)
>> Bus Speed: max=200MHz, ratio=15 => 200 MHz
>> Hostdevice: (2570) Springdale i865 MCH, Vendor: (8086) Intel, Rev:0002h
>> ----------------------------------------------------------------
>> Intel Springdale i865 MCH Rev:02: Bus:0, Device-Nr:0, Function:0
>> System Frequency : FSB533/133 MHz
>> Memory Frequency : DDR266/133 MHz (1:1)
>> IOQ Depth : 12 deep
>> Top of usable Memory : 1024.0 MByte
>> Extended SMRAM (Tseg) : disabled
>> Overflowdevice : disabled and unlocked, ID= 2576h, Rev: 2
>> Memory Delays Base Address : FECF0000 not prefetchable
>> CPU Parking : disabled
>> Memory : row0: 512 MByte/16 KB Pages
>> : row1: 512 MByte/16 KB Pages
>> DRAM-Channels : Dual Channel Linear, DDR
>> ECC & Refresh : Non-ECC, Refresh=7.8 µs
>> PAT-mode : (1) fully enabled
>> Active to Precharge Delay : 5 clocks .. 70 µs
>> Tcl - Trcd -Trp : 2-2-2 T (DRAM Clocks)
>>
>> Memory Read Bandwidth : ca. 5780.5 MBytes/s, Cacheline size= 64
>> >> go on with CR
>>
>>
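
As a rough sanity check on the CTIAW dump, the measured read bandwidth can be compared against theoretical peaks. This sketch assumes 8 bytes per transfer per channel and two channels; `peak_mb_per_s` is an illustrative helper, and the nominal figures 400 and 266 stand in for the exact transfer rates.

```python
def peak_mb_per_s(mega_transfers_per_s, channels, bytes_per_transfer=8):
    """Theoretical peak bandwidth in MBytes/s for DDR memory."""
    return mega_transfers_per_s * channels * bytes_per_transfer


measured = 5780.5                    # "Memory Read Bandwidth" from the dump
ddr400_peak = peak_mb_per_s(400, 2)  # dual channel DDR400
ddr266_peak = peak_mb_per_s(266, 2)  # dual channel DDR266, as CTIAW reports

print(f"DDR400 peak: {ddr400_peak} MB/s, measured is "
      f"{measured / ddr400_peak:.1%} of that")
print(f"DDR266 peak: {ddr266_peak} MB/s")
```

The measured figure is higher than the DDR266 peak, so the "DDR266/133 MHz" line in the dump cannot be the real memory speed. It is about 90% of the dual channel DDR400 peak.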
>>>
>>> http://www.houseofhelp.com/forums/forumdisplay.php?forumid=128
>>>
>>> 4) CTIAW and memtest86 disagree on your PAT setting. I don't know
>>> what to make of that.
>>>
>>> 5) There is a possible reason for CTIAW mis-reporting the bus
>>> speed. An 865PE Northbridge is not supposed to have PAT, but
>>> Asus and others use a trick to enable it. The processor has
>>> two signals called BSEL, and they indicate the bus speed rating
>>> of the processor (400, 533, 800 etc). The BSEL signals are
>>> normally routed from the processor to the Northbridge and to
>>> the clockgen. What Asus did, is they disconnected that link.
>>> Asus sends a fake value of BSEL to the Northbridge - I think
>>> if the FSB is set to 533, PAT is enabled, so by sending the
>>> 533 bit pattern to the Northbridge, but setting the clockgen
>>> to 800, PAT is enabled, and the memory can run at DDR400, just
>>> like on an 875P Northbridge. I think what CTIAW could be doing,
>>> is reading the Northbridge register, instead of checking the
>>> clockgen. This trick is great for fooling the hardware, but
>>> software authors have to be aware of the trick too, to get
>>> the info right.
>>>
>>> 6) I dug up some benchmarks you can try. Maybe these will be
>>> reproducible from run to run.
>>>
>>> http://www.super-computing.org/
>>> ftp://pi.super-computing.org/windows/super_pi.zip
>>>
>>> Super_pi computes PI, and you select the number of digits from
>>> the menu. You double click the .exe, to run a Windows dialog.
>>> Select the number of digits to calculate and then run it.
>>> I just ran 1 million digits, and it takes 48 seconds
>>> on my 2.8C with 2x512MB 2-2-2-6 memory. I did two test runs and
>>> they had exactly the same test time. A file is created in the
>>> install directory with the results of the calculation.
>>> The test time and the amount of memory used increase
>>> with the digits setting. Some people use the 32M setting
>>> as a stability test for new motherboards.
>>>
>> 44 seconds with hyper threading [disabled]
>> 53 seconds with hyper threading [enabled]
>>
>> as you say this test is consistent
>
> I just don't understand why your results are being hammered
> so bad by Hyperthreading. The OS cannot be taking up that much
> memory bandwidth in the background. And, since your processor
> has a 1MB cache, it shouldn't be measurably thrashing the cache
> either. I wonder if Windows is actually using the whole
> cache ? I remember reading a while back, about a situation where
> Windows needed to be manually adjusted to use the whole cache
> (back in the P3 era). Something still isn't right here.
>
>>>
>>> Here is a second test:
>>>
>>> This is some kind of finite element analysis. It was
>>> posted by the author a while back. It uses a good chunk
>>> of memory, and judging by the CPU heating, is not memory
>>> bound, but does a fair amount of computing. To use it,
>>> unzip the file, fire up a MSDOS window, cd to the unzipped
>>> directory, then type "now" into the MSDOS window, to execute
>>> now.bat . After it reaches "step 992", it will finish, and
>>> print the number of "MUPs", which are millions of operations
>>> per second. My computer takes 202 or 203 seconds to run the
>>> benchmark, and achieves a rating of 12.27 MUPs (the number
>>> is printed in scientific notation, so shift the decimal
>>> point as appropriate).
>>>
>> with hyperthreading [enabled]
>>
>> 242 - 244 seconds 10.16 - 10.24 MUPs +/- 0.04% (I assume)
>>
>> with hyperthreading [disabled]
>>
>> 203 seconds 12.21 MUPs +/- 0.06% consistently.
>
> The Hyperthreading penalty seems to be the same here, as
> Super_PI. It seems strange that they would be the same, as
> these programs won't have the same memory access pattern.
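
For comparison, here is a quick sketch of the slowdown in each benchmark, using the run times quoted in this thread. The 3d0 time with Hyperthreading enabled is taken as the midpoint of the 242 - 244 second range, which is an assumption; the variable names are illustrative.

```python
# Run times in seconds, as quoted in the thread.
super_pi_seconds = {"ht_on": 53, "ht_off": 44}
fea_seconds = {"ht_on": (242 + 244) / 2, "ht_off": 203}  # 3d0 benchmark


def slowdown_percent(times):
    """Extra run time with Hyperthreading enabled, as a percentage."""
    return (times["ht_on"] / times["ht_off"] - 1.0) * 100.0


super_pi_penalty = slowdown_percent(super_pi_seconds)
fea_penalty = slowdown_percent(fea_seconds)

print(f"Super_pi: {super_pi_penalty:.1f}% slower with HT enabled")
print(f"3d0:      {fea_penalty:.1f}% slower with HT enabled")
```

Both come out at about 20%, within a percentage point of each other.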
>
>>
>>> http://users.viawest.net/~hwstock/bench/3d0/3d0.zip
>>>
>>> Instructions and some background info are here:
>>> http://www.abxzone.com/forums/showthread.php?t=70142
>>>
>>> Those two tests are reproducible for me. Give them a
>>> try, with and without Hyperthreading turned on in the
>>> BIOS.
>>>
>>> Note: The 3d0 program is a bit unhygienic, and leaves
>>> a bunch of files in its directory. You may want to
>>> dump all but the original files, when the directory
>>> fills up.
>>>
>> Be interested to hear what you make of that lot. Obviously
>> hyperthreading is doing the bulk of the damage but the memory scores
>> seem a little low also. I'll run the memtest and mess with some
>> other BIOS settings later but I have to go make some money now.
>>
>> many thanks,
>> J
>
> <<snip>>
>
> All I can say, is Hyperthreading is doing way more damage than
> it should be. Try memtest86 again, with Hyperthreading enabled
> and then with it disabled. There should be no change in the
> bandwidth readout. If there is, there is some other serious
> problem there.
>
> In my registry, I see an entry called SecondLevelDataCache, but
> it is set to zero. That implies the cache size is detected
> automatically, since if L2 were disabled, you would see the
> performance plummet.
>
> HKEY_LOCAL_MACHINE\SYSTEM\CURRENTCONTROLSET\CONTROL\SESSION MANAGER\MEMORY MANAGEMENT
>
> According to this, changing it shouldn't help:
> http://www.winguides.com/registry/display.php/116/
>
> You might try downloading Sandra Lite 2005 and run the
> "Cache and Memory" benchmark. The 2002 version I've got
> has that benchmark, and the "bumps" in the curve tell
> you where the cache breakpoints are. A Prescott, with
> its 1MB cache, should have a breakpoint at the 1MB mark
> if the cache is working.
>
> http://www.sisoftware.co.uk/index.html?dir=dload&location=sware_dl_all&langx=en&a=
>
> I think if I try to install it, it will remove the older software,
> so I cannot do this right now. I hope the Lite version still has
> that benchmark...
>
> HTH,
> Paul