Skull Trail Board, Hardware Malfunction
OK, I have a Skull Trail board with dual Extreme 3.2 GHz LGA 771 quad-core processors, 8 GB of DDR2 ECC RAM, an Nvidia Quadro 4000 series video card, a 1200 watt Toughpower power supply, a 160 GB Western Digital Raptor drive, and two 500 GB Seagate Barracudas for storage. It runs XP 64 Pro. The machine runs fine for 24 hours, then I get a hardware malfunction error on the screen. The BIOS is the latest and the drivers are the latest. The temperature never gets above 97 F in the case and the processors never reach more than 95 F, yet the system still crashes, and no dump logs are created. I have run every test I can think of on the hardware: removed every piece of RAM and tried each one by itself for 24 hours, swapped out the video card for different brands, and disconnected the extra drives. I'm at a loss, so let's hear it. HIT ME WITH SOME NEW IDEAS, please. I've got just about 6k invested in this beast and it's killing me to find it blue screened once a day.
Thanks for the comment, but it's had an OCZ dual-fan RAM chip cooler on it for a while now. The RAM was hot in the beginning, but it's staying fairly cool now. I am trying all sorts of different configurations now, and have even tried different OSes just for the heck of it. Of course it didn't make any difference, but you never know.
I have the same situation.
2 - QX9775 CPUs
Tried both Kingston 5300 FB-Dimms & Crucial 6400 FB-Dimms
Tried different video cards (7300 GS & 8600 GT)
Currently Using Vista Ultimate but was considering Ubuntu or old Win XP 64.
BIOS settings are default.
My system is fine under "normal" or "idling" conditions, but as soon as I launch a render inside Maya or a batch render, I get the BSOD you've described.
I'm on my third replacement motherboard.
Yeah, it has been a sort of nightmare. I am using XP 64-bit, and when I run a heavy application I go in and set the affinity in the Task Manager to use only 7 processors. It seems to help, as I haven't crashed lately. I know that is not the answer forever, but I'm using the machine 24 hours a day, so there is little downtime for me to test fixes or patches. I'm hoping to come up with a true fix one of these days.
I've done a lot of tests. It seems to me that the system is totally stable when I just have one CPU installed (tested both sockets individually, no BSODs).
Buccanner 7, can you describe what kind of cooling solution you are using? I'm using two Cooler Master Hyper 212s and RAM coolers with heatpipes and fins. My gut tells me that the problem is the northbridge. See, my CPU coolers are so big that I was unable to install a northbridge cooler.
Sometimes this error is mixed with a "memory parity error" message, and though many would jump to a RAM issue, it is in fact the northbridge that controls the communication between the memory and the CPUs (as well as other things):
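For anyone who wants to reason about the same workaround outside of Task Manager: an affinity setting is just a bitmask with one bit per logical processor. Here is a minimal Python sketch of how "7 of 8 processors" translates into such a mask. The function names are made up for illustration, and leaving out the last core (index 7) is an assumption, since the post does not say which one is unchecked.

```python
def affinity_mask(total_cpus, excluded):
    """Build a bitmask with one bit per logical CPU, clearing the excluded ones."""
    mask = (1 << total_cpus) - 1  # start with every CPU allowed
    for cpu in excluded:
        if not 0 <= cpu < total_cpus:
            raise ValueError(f"CPU index {cpu} is out of range")
        mask &= ~(1 << cpu)  # clear the bit for this CPU
    return mask


def cpus_in_mask(mask):
    """List the CPU indices whose bits are set in the mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Two quad-core CPUs give 8 logical processors; leave CPU 7 out (assumed choice).
mask = affinity_mask(8, [7])
print(hex(mask), cpus_in_mask(mask))
```

A mask in this form is what the Win32 `SetProcessAffinityMask` call takes, so the same number could be reused if the workaround ever gets scripted.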
This is the name for the chip in most desktop PC motherboards that handles the data transactions between the CPU, AGP device, Southbridge chip and the main system memory. The performance of the Northbridge chip will have a substantial impact on the power of the entire system.
Sorry if this is obvious...but the point is that the northbridge is critical and the crappy passive aluminum heatsink that comes with the motherboard is simply not up to the task.
I'm in pretty deep already, so I'm going to get a solid water cooling setup and I'll report when I have something to say.
Currently CPU cooling is handled by two ULTRA ChillTEC coolers. They are thermoelectric coolers I got from Xoxide, and they are doing a great job keeping the temperature down. The RAM is cooled by an OCZ RAM cooler, and there are 6 other fans in the box pushing air around, including one on the northbridge cooler. I keep the room at 72 F with about 50% humidity. The box really throws some heat; when I'm really pushing it I have to open the side panel. I'm not sure I'm going to try anything new soon, as I have reached a sort of medium where everything stays stable, but I am continuing my research into why it crashes. I also experienced the same error with the memory parity problem when I first built it, but not any more after I cooled the RAM more. I look forward to seeing how you make out with your liquid cooling. Good luck.
Thanks for the time and willingness to share your trials and solutions. I want to buy a Skull Trail or D5400XS, but am waiting and studying the issues and problems. Why? Because I've jumped on too many bandwagons only to end up being the lab rat troubleshooting the new widget. It doesn't matter if it is Microsoft, Intel, SolidWorks or COSMOS Works, they all release before it's ready to keep the money flowing. The first release is always at the buyer's risk. Don't feel bad, it always works out in the end. As you may have guessed, I have been a 3D SolidWorks designer since 1997, have owned every version of Microsoft and SolidWorks up to 2004, and have bought a bunch of systems, all Intel. I guess you are asking SO WHAT! OK, from the system side: Intel does not ship the CPU with any heat sink. Check out Intel's link http://www.intel.com/support/motherboards/desktop/sb/CS-029095.htm , it is a warning on thermal issues. Be careful with the TEC coolers. They are very cold, and that can cause as much of a problem as being too hot. Why? Thermal expansion, for one thing: hot and cold cycles can cause micro cracks in PCB BGA solder pads and through-holes on multilayer boards. Condensation is another: water collecting on cold surfaces and running under chips can cause corrosion and electrolysis (electric current flowing through a liquid that dissolves metal and re-deposits it in a different place), causing micro shorts between pins. You are doing a good job debugging and troubleshooting. My experience tells me you are on the right track. 1) Changing the number of processors in the Task Manager changes the processor management and loading. This tells me it is an OS problem with XP 64's management of memory threads. Send Microsoft your notes and findings. It may also be in the Intel kernel chip drivers too! Remember, the Intel motherboard is using a new memory controller to control system memory and L1/L2 cache memory, 12MB (2 X 6MB Cache).
Check out Intel links http://www.intel.com/Assets/PDF/prodspec/D5400XS-tps.pdf, http://download.intel.com/support/motherboards/desktop/d5400xs/sb/e30088001us.pdf, http://processorfinder.intel.com/details.aspx?sSpec=SLANY. Yes, cooling the northbridge is critical, too, along with the video card and memory. Speed KILLS. This motherboard is the leading edge in speed and power. Each CPU expends 150 W of thermal energy. Pushing electrons around that fast generates a lot of heat. This motherboard takes overclocking to a new level. Good luck, and I hope this helps a little. I'm working on a cooling system to help address some of these problems. Just pumping liquid nitrogen on the CPU is only part of the thermal design problem. Managing the thermal load is going to take a dynamic active system that continually adjusts to the demand on each heat source. Getting Microsoft and Intel to resolve the complex software and hardware issues is going to take a joint effort in providing information to these giants. Take care.
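To put a rough number on the condensation warning about TEC coolers: water starts collecting on any surface colder than the dew point of the room air. A minimal Python sketch, using the Magnus approximation with commonly quoted constants (an approximation, not an exact figure) and the 72 F / 50% humidity room conditions mentioned earlier in the thread. The function names are invented for this example.

```python
import math


def f_to_c(temp_f):
    """Convert Fahrenheit to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def c_to_f(temp_c):
    """Convert Celsius to Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def dew_point_c(temp_c, rel_humidity_pct):
    """Estimate dew point in Celsius with the Magnus approximation."""
    a, b = 17.62, 243.12  # commonly used Magnus constants (b in degrees C)
    gamma = math.log(rel_humidity_pct / 100.0) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)


# Room conditions quoted in the thread: 72 F at about 50% relative humidity.
room_f, humidity = 72.0, 50.0
dp_c = dew_point_c(f_to_c(room_f), humidity)
print(f"Estimated dew point: {dp_c:.1f} C ({c_to_f(dp_c):.1f} F)")
```

Under these assumptions the estimate comes out a little above 11 C (low 50s F), so a cold plate or nearby surface held below that is where condensation would be expected to begin.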
I'm running Vista 32, XP 32, and Ubuntu 32 using Hyper-X FB-DIMM. Issue occurs when warm cycling to XP only. On a number of occasions the USB controller has been marked as low speed (rare), and for a while some strange ATA driver was installing. I re-install the Intel drivers regularly now.
Happens in Safe Mode?
I am using a Silverstone Zeus 1200 (1300?)W PSU, and have to physically unplug this to fully power down as there is a quiescent supply even when switched off. Disks are Velociraptors (no RAID due to Linux).
2x EVGA GTX 280s, which preclude the Molex power lead on the bottom right-hand side, and also require a non-tall jumper on the BIOS reset pins. I may have to lose one of these. I am not running the Aux fan from here and the cards (bricks?) are independently powered.
Fans everywhere, but these don't idle down on a warm start. FSB is 400; dropping it helps (to 385, say). It should not need to. This system is not stable enough to distribute.
I have issues with the BIOS as supplied. I have 800 MHz RAM, but the BIOS indicates 667. Sandra tells me it is 800, as it should be, so maybe the board does not like 800. The BIOS the board was shipped with says 800?
It would be good if the BIOS maxed out all fans on warm boot, and for a minute or so on shutdown.
Predominantly a reboot problem for me, but when running full loads, say AVG8 (well multi-threaded) "Fast", it could happen on clicking START. No thermal issue with CPUs or MCH (68).
I think this was a side project by Intel that has been left behind a bit. No sign of a DCC yet, and there may never be one. They and Microsoft must know about this issue though. It is not good.
I have also tried with 8500 GTs, to eliminate the cards.
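For a sense of how small that FSB drop is, here is a quick Python sketch of the arithmetic. It assumes QX9775-class CPUs with an 8x multiplier (400 MHz x 8 = 3.2 GHz) and a quad-pumped front-side bus; the post itself does not state the CPU model or multiplier, so treat both as assumptions, and the function names are illustrative only.

```python
def core_clock_mhz(fsb_mhz, multiplier):
    """Core clock is the base FSB clock times the CPU multiplier."""
    return fsb_mhz * multiplier


def effective_fsb_mts(fsb_mhz):
    """A quad-pumped FSB transfers data four times per base clock."""
    return fsb_mhz * 4


MULTIPLIER = 8  # assumed: 3.2 GHz at a 400 MHz FSB

for fsb in (400, 385):
    core = core_clock_mhz(fsb, MULTIPLIER)
    bus = effective_fsb_mts(fsb)
    print(f"FSB {fsb} MHz -> core {core} MHz, bus {bus} MT/s")

# Relative underclock when going from 400 to 385 MHz
drop_pct = (400 - 385) / 400 * 100
print(f"Underclock: {drop_pct:.2f}%")
```

With these assumptions the 385 setting is an underclock of under 4%, which fits the point that a board at stock settings should not need it.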