In desperate need of help, New system is misbehaving

August 6, 2007 3:28:49 PM

A few weeks ago I completed my first build. All went really well and the system has been working wonderfully and without problem for a few weeks now. However, about a week ago the trouble began. I'm stumped, I'm not sure if it's a BIOS setting that I have set incorrectly due to inexperience, or if a piece of hardware has gone bad already. If it's the latter, I'm not sure how to determine which is bad or what my next course of action should be. Overall, the build itself had gone very well for a first-timer and the system had been running without issue. However, now that this strange behavior has popped up....I'm at a loss as to exactly how to troubleshoot this..or what I may be doing wrong.

I'll do my best to post all the details of the behavior and give all the information I can and I really hope folks here will pitch in if they can, and give me a hand. I'm really wet behind the ears at system builds compared to most here...so any help experienced builders can offer would be terrific.

I'll start with my system specs:

CPU: Intel Q6600 2.4 GHz
Mobo: EVGA nVidia NForce 680i SLI (A1 Series)
Memory: 4 x 1024 MB Corsair XMS2 DDR2-800
PSU: Thermaltake Toughpower 1200W
Video: EVGA GeForce 8800 Ultra
CPU Heatsink: ThermalRight Ultra-120 Extreme
Hard Drives: 2 x Seagate 7200.10 320 GB (RAID 0)
Optical Drives: 1 Samsung IDE DVD Burner and 1 Samsung SATA DVD Burner
Floppy Drive: Sony FDD
Soundcard: Soundblaster X-Fi XtremeGamer
Case: CoolerMaster Stacker 830 Evo
OS: Windows XP Professional 32-bit

None of the above parts has been overclocked or modified from stock configuration.

Ok...so now on to attempt to explain what is happening and walk you through this. Story time. I apologize beforehand if this is a long post...I'm just doing my best to go ahead and take the time to type this up in detail so that whoever might attempt to help me has as much info as possible.

The system was assembled, up, and running a few weeks ago. No problems at all. It's been booting reliably, running applications, and playing games just great. No jitters or blurbs...and certainly no smoke or fire pouring out of the back of the machine. For these past weeks, I've had it running around 8 or 10 hours each day...and have closely monitored core temperatures using CoreTemp. The temps have always seemed real good...in a room with an ambient temperature of around 80 Degrees F....when idle in Windows XP the cores have all been around 42-44 Degrees C, and 51-56 Degrees C while gaming. The cores have always been within 2 or 3 degrees of each other.
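Since the room temperature above is in Fahrenheit and the core readings are in Celsius, it can help to put them on one scale. The following is only a minimal Python sketch; the function names are made up for illustration, and the input numbers are the ones quoted in the paragraph above.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def delta_over_ambient(core_temps_c, ambient_f):
    """Return how far each core reading sits above room temperature, in degrees C."""
    ambient_c = fahrenheit_to_celsius(ambient_f)
    return [round(t - ambient_c, 1) for t in core_temps_c]


if __name__ == "__main__":
    ambient_f = 80.0          # room temperature quoted in the post
    idle_range_c = [42, 44]   # idle core range quoted in the post
    load_range_c = [51, 56]   # gaming core range quoted in the post

    print("ambient (C):", round(fahrenheit_to_celsius(ambient_f), 1))
    print("idle delta (C):", delta_over_ambient(idle_range_c, ambient_f))
    print("load delta (C):", delta_over_ambient(load_range_c, ambient_f))
```

Looking at the rise over ambient rather than the raw reading makes it easier to compare numbers taken on a hot day with numbers taken on a cool one.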

The problems started last Wednesday. I got home from work, sat down in my office, and hit the power button. The fans spun up, the system posted with a happy beep as it had for weeks...and then it just plain died. It sounded as if someone stepped behind the machine and yanked the plug out of the wall. Right in the middle of booting up it just powered down, the fans spun down...and it sat there quiet. I looked inside the case...and the little blue and amber LEDs on the Mobo were still lit....it was still receiving juice. Strange. So I hit the power button again....Same thing. System made it past POST and then died. After trying this several times...I discovered that the system didn't seem to "die" in the same place each attempt. Sometimes it would die just after POST. Other times it would boot all the way into Windows XP and run just fine for 5 minutes before croaking. Once it ran for 30 minutes in XP before croaking. Then all of a sudden it wouldn't even make it to POST. You'd hit the power button, the fans would spin up, and it would die less than a second later. I tried just turning the power off at the back of the machine and cutting power to the mobo...and letting it sit for a half hour or so. Tried it again....and same nonsense. I was making it to POST again, for a while...sometimes into Windows XP...and then after 10 or so attempts I could hardly get it to POST anymore.

Next I began troubleshooting the best I could. I removed 2 of the sticks of RAM and unplugged the hard drives and unplugged the SATA DVD drive. It still wasn't any better. I wasn't sure if optical drives, FDDs, video cards, or sound cards could cause this sort of thing so I just left those plugged in for now. When it didn't get any better after unplugging the hard drives, I plugged the HDs and SATA DVD Drive back in and left 2 of the sticks of RAM out and tried again. It might be total random chance, or my imagination, but it almost seemed that the system was doing 'worse' with just the 2 sticks. It was only making it half a second or so after pressing the power button before dying. So I put the 2 sticks of RAM back in for a total of 4 sticks once again. After doing this, it seemed to make it a little further into its boot...a few seconds further maybe...sometimes past POST...sometimes not quite to POST.

I tried resetting the BIOS back to defaults using the jumper on the mobo. It didn't seem to make any difference. So....I gave up for a while and went downstairs to make dinner for my family. After supper, I walked upstairs and gave it another try. It worked fine. Booted just fine. I got into the BIOS and, of course, all the settings were back at defaults. I went through my list of settings I had written down and set everything again. The system kept running...it didn't just 'die' like it had been. It seemed mysteriously stable again. I looked through all my BIOS settings trying to find a 'culprit' that perhaps I had set incorrectly due to inexperience and maybe I had caused this myself.

There is an "SLI Ready Memory" setting in the BIOS that I can enable or disable. My Corsair XMS2 memory is "SLI Ready" memory. If this setting is enabled, it apparently reads manufacturer-provided settings off the memory stick and sets "performance" timings and voltages for the memory as dictated by the manufacturer. A few weeks ago when I fired up the system for the first time...the BIOS defaulted the "SLI Ready Memory" setting to "Disabled", it defaulted my memory timings to 5-5-5-18, and the memory voltage to 1.850 volts. I set that to "Enabled" (CPU OC 0%) and when I did, it set those memory timings to 4-4-4-12 and the memory voltage to 2.10 volts.

Now that my system was booting and the BIOS was reset back to defaults...the memory timings had been reset to 5-5-5-18 timings. The memory voltage was reset to "Auto"...and the "Auto Detect" was setting the memory voltage to 1.850 volts.
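To get a feel for how different 5-5-5-18 and 4-4-4-12 really are, the timings can be converted from clock cycles to nanoseconds. This is a rough Python sketch, not anything taken from the BIOS or from Corsair: it assumes the four numbers are in the usual CL-tRCD-tRP-tRAS order and that DDR2-800 means 800 MT/s on a 400 MHz I/O clock. The function names are made up for illustration.

```python
def cycle_time_ns(data_rate_mt_s: float) -> float:
    """Clock period in ns; DDR memory transfers twice per clock,
    so the I/O clock is half the data rate (DDR2-800 -> 400 MHz)."""
    clock_mhz = data_rate_mt_s / 2.0
    return 1000.0 / clock_mhz


def timings_to_ns(timings, data_rate_mt_s):
    """Convert timings given in clock cycles to nanoseconds."""
    period = cycle_time_ns(data_rate_mt_s)
    return [round(t * period, 2) for t in timings]


if __name__ == "__main__":
    # Timings quoted in the post, both at DDR2-800.
    print("5-5-5-18 @ DDR2-800:", timings_to_ns([5, 5, 5, 18], 800))
    print("4-4-4-12 @ DDR2-800:", timings_to_ns([4, 4, 4, 12], 800))
```

At DDR2-800 one clock is 2.5 ns, so CL5 works out to 12.5 ns and CL4 to 10 ns. The tighter setting asks the memory to respond about 20% faster, which is presumably why it is paired with the higher 2.10 volt setting.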

Whatever had happened, it appeared to be over. The system ran fine for the rest of the night and the rest of the week. Not another blurb, not another problem...not even a hiccup. I wondered if this problem had anything to do with me enabling the SLI-Ready memory. I read several posts on the Corsair forums and read that you shouldn't set "SLI Ready Memory" to enabled if you have 4 sticks of RAM. I don't know "Why"...but that's what I read. So, I left the "SLI Ready Memory" setting disabled and manually set the memory timings to the manufacturer listed 4-4-4-12 and manually set the memory voltage back to 2.10. I ran MemTest86 for around 10 hours...and it didn't find any memory errors. The memory "seemed" fine.

The system ran fine all week, and all weekend. I worked on it, played games, watched movies...you name it...and it didn't even hiccup. I began to think that maybe, that day, maybe the computer gods just didn't like me.

It ran fine....until yesterday (Sunday). The system ran fine all day Sunday. At supper time, I shut it down and killed the power. I came back up to the office an hour or so later, turned the power back on, hit the power-button...the system posted its happy "Beep"...and then died. All I could do is look at my wife and say "Oh No...not again". I tried again, and again, and again. Same exact symptoms. It would boot to seemingly random points....but SEEM to almost steadily get worse...just like before. Initially I could get to windows and it would post every time before dying (Always just one "beep" at post)...then the more times I 'tried'....it seemed to get worse and worse until it wouldn't make it to post and it would die just a second or two after hitting the power button.

So...I thought maybe that this was due to me setting the memory timings manually to 4-4-4-12 and the memory voltage manually to 2.10. Maybe, with the 4 sticks of RAM, these timings were just too aggressive to be stable. Again, I tried resetting the BIOS with the motherboard jumper. That didn't help. Sometimes it would boot far enough for me to hit "Del" and get into the BIOS....but usually I'd only have a few seconds to look around in the BIOS before it died. Again I tried yanking out 2 of the sticks of RAM...and it actually seemed worse...not better...so I put the 2 sticks back in.

In my poking around the BIOS...I did notice what seemed "Odd" to me...although to an experienced builder...it might mean absolutely nothing. Above, in my story, I had mentioned that when the BIOS is at default settings....that the "Auto" detected memory voltage was 1.850 at 5-5-5-18 timings. However, when the system gave me enough time to see the settings and as I looked at the auto-detected memory voltage this time...it was at 1.900 volts. Maybe that's normal? I just thought it odd that it would auto-detect to 1.850 before, and now it was auto-detecting to 1.900. I tried manually setting this up to 2.10 volts...thinking maybe it wasn't enough juice. I had enough time to save my BIOS changes and reboot....but that didn't help...system still died. I rebooted and tried again and again until the system stayed alive long enough for me to get back into the BIOS and I changed the memory voltage back to "Auto" and rebooted. When I finally got back into the BIOS again...I noticed the memory voltage was back to 1.850 again.

This time around...I did NOT set the timings. I left them at the default detected 5-5-5-18 and 1.850 volts. Maybe setting it to 4-4-4-12 and 2.10 volts had made my system unstable. Or maybe that had nothing to do with it.

Now, the above memory stuff may have had NOTHING to do with this and maybe I'm way off base and barking up the wrong tree....but...suddenly....the system was stable again. I was able to get into Windows and the system ran FINE the rest of the night. It 'misbehaved' over a period of about an hour and a half...and then suddenly it was "fine" again. I don't know if the memory settings had anything to do with it....or if it was pure random chance.

So, here I am at work writing this post. I don't know WHAT to do next and I'm stumped, frustrated, and really need some help and advice. It seems to me that the most likely culprits of something like this would be power supply, motherboard, or memory. However, it also seems to me that if it was one of those things...that it would be more 'consistently' bad. That once it 'died'...it would continue to exhibit the bad behavior....it just doesn't seem like it would suddenly get 'better' and work fine for days before doing it again.

I'm not sure what my next course of action should be. I've read a lot about buying and using multimeters and such to test power supplies.....but it seems that would be a dead end, because if the "culprit" just decided to behave FINE when I'm testing it...I might not even see anything wrong. I have no idea how to tell if a mobo is 'bad'....it just seems like it would be 'bad' all the time or not at all. I really dread the thought of having to take the mobo back apart, send it back, wait for a new one, and then reassemble everything yet again. I'm not sure what to do.
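For anyone who does go the multimeter route, a reading is only useful when compared against an allowed range. Below is a small Python sketch of that comparison. The tolerances are the ones commonly cited from the ATX12V design guide (plus or minus 5% on the positive rails, 10% on -12V); treat them as an assumption and check them against the PSU's own documentation. The names `ATX_RAILS`, `rail_limits`, and `check_reading` are made up for illustration.

```python
# Nominal voltage and tolerance per rail, as commonly cited from the
# ATX12V design guide. Assumption: verify against the PSU's own spec sheet.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def rail_limits(rail: str):
    """Return the (low, high) voltage limits for a named rail."""
    nominal, tolerance = ATX_RAILS[rail]
    spread = abs(nominal) * tolerance
    return nominal - spread, nominal + spread


def check_reading(rail: str, measured_volts: float) -> bool:
    """True if a multimeter reading falls inside the rail's allowed range."""
    low, high = rail_limits(rail)
    return low <= measured_volts <= high


if __name__ == "__main__":
    for rail in ATX_RAILS:
        low, high = rail_limits(rail)
        print(f"{rail}: {low:.2f} V to {high:.2f} V")
```

A single in-range reading at idle does not rule out a supply that sags under load or only misbehaves intermittently, which is the concern raised in the paragraph above.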

Help?

Thanks a lot in advance to anyone who read all that stuff above...again...I apologize for the long post. I was trying to include all the details I could think of.

Thanks!
August 7, 2007 12:14:54 AM

that truly is a mighty post, but I can't help you except to hope for your sake the problem has gone
August 7, 2007 12:39:52 AM

It sounds like it is memory related. Run memtest86+ for several hours.
August 7, 2007 1:36:01 AM

Open the case... hit the power button... pull power connector from MB as soon as possible...
Sounds like a power switch (front of case) to me.
August 7, 2007 1:50:54 AM

Why? I mean, can you say if the power switch is OK or not after that? How? Just curious...
August 7, 2007 1:56:20 AM

If the power switch is flakey then if you unplug it from the mb it won't switch the pc off... did that make sense?

Anyway... I only mentioned it because the power switch on my CoolerMaster broke in the first few weeks of having got the case and that's how I figured out what it was...
August 7, 2007 2:56:06 AM

Yup, thanks.
August 7, 2007 7:27:40 AM

I was thumbing through a Cyberguys catalogue yesterday and noticed some inexpensive Post Code Testers. If I was having the problems you are having I would consider one of these.
When I build a computer I start with the motherboard and use the manual to order components. You might check your motherboard manual and check if your components match the motherboard requirements. Maybe you have already done this. Judging by other posts on building it looks like most of you just ask others for recommendations on parts to purchase.
August 7, 2007 6:04:48 PM

Thanks to everyone for chiming in...this problem is certainly a pesky one ;) 

First, an update: Last time the system experienced its...uhm...strange behavior....was Sunday night. The behavior lasted about an hour and a half before it started running stable again. Since Sunday night (Today is Tuesday)...just like before...the system has been running stable. I did several hours of gaming on it last night..and periodically checked the core temps in Coretemp...the highest they got over several hours of gaming was 51-51-55-50. For whatever reason (I have no idea why as I haven't adjusted affinities or anything) but the game I was playing seems to really like to work Core 3 and leave the others be. Anyhow, core temps were fine. I rebooted here and there to pop into the BIOS and check the MCP temps...highest that got was around 57. Overall...system ran rock solid. I'm not getting my hopes up though...as the LAST time it did this and came back to 'life'...it ran rock solid for 3 or 4 days and then decided to go t*ts-up again.

Quote:

that truly is a mighty post, but I can't help you except to hope for your sake the problem has gone


lol...yep...took me a while to type it...but this problem has been driving me nuts and I figured that the more info I add in my post, the better the odds that someone else will have had a similar situation or have an idea what the heck is going on.

Quote:

It sounds like it is memory related. Run memtest86+ for several hours.


Yep....tried that. Ran 11 passes of memtest86...0 errors

Quote:

If the power switch is flakey then if you unplug it from the mb it won't switch the pc off... did that make sense?

Anyway... I only mentioned it because the power switch on my CoolerMaster broke in the first few weeks of having got the case and that's how I figured out what it was...


Now there's something I'd never thought of! My case is a CoolerMaster Stacker 830 Evo. I never thought that perhaps the switch or the electronics behind the switch might be 'sticking' and shutting the system down over and over. I'll definitely add that to my list of things to try when this happens again.

Quote:

I was thumbing through a Cyberguys catalogue yesterday and noticed some inexpensive Post Code Testers. If I was having the problems you are having I would consider one of these.
When I build a computer I start with the motherboard and use the manual to order components. You might check your motherboard manual and check if your components match the motherboard requirements. Maybe you have already done this. Judging by other posts on building it looks like most of you just ask others for recommendations on parts to purchase.


What is a Post Code Tester? How does it work? Is it difficult to use? You don't happen to have a link to one of these thingamabobs do ya?

As for building my system, as this was my first build I spent several months researching parts for it...based on both opinions and recommendations on various forums, user reviews, "approved" equipment lists from the online mobo documentation/manufacturer web sites, slizone, articles, reviews, etc. I did the best I could...and I discovered I missed the ball in a few places...stuff I wish I'd known earlier...but I suppose you can only research for SO long before you have to go ahead and take the first steps.

I wish I'd understood more about the effects of RAM on the system when I started...and had a better grasp on what "SLI Approved" RAM really meant. I wish I'd known that using 4 x 1024 MB sticks of RAM instead of TWO sticks of RAM tended to make systems less stable....and often decreased performance or forced you to back off the memory performance. I understood the limitations of 32-bit OS's with 4 Gigs of memory....but I missed the ball on the rest of it....and didn't realize that 4 gigs versus 2 would have any 'negative' effects.

I also wish I'd known to remove the stock Northbridge and Southbridge heatsinks from the EVGA board and cleaned the putty-stuff off of them, and replaced it with AS5. I didn't read about that procedure until I was researching my recent problems on the EVGA forums. Seems that a lot of folks swear that the first thing to do with a new EVGA board is to clean that junk off, and doing so would dramatically improve the board's stability. Unfortunately those posts were 'buried' in the EVGA forums due to a recent forum migration. It's a bit disappointing as I would have expected that if it was that important to do...that EVGA would have had some notes or documentation on it in their FAQ. My Mobo has been in for a few weeks now...and unless I have to remove it due to problems with this recent issue...I'd rather not take the whole thing apart just to clean off that putty-stuff.

Quote:

it could be your wife wanting to spend time with you?
you figure your first build would be a little less ballsy


maybe you should be plugging the PSU into your OVEN or DRYER socket :sarcastic: 


your RAM voltage should be 2.1V
your timing 4-4-4-12
did you go to http://www.evga.com/ for finding out what the problem is?


http://www.evga.com/support/faq/af [...] aqid=57902


http://www.evga.com/support/faq/af [...] ?topicid=7


hehe...my wife wants it to be diagnosed and cured as much as I do. She's a gamer as well and my system problems are eating into her gaming time ;) 

My build was ballsy? hehe...I actually was thinking it was a bit cowardly ;)  I spent so long reading reviews, articles, and user reviews....and didn't overclock (yet...I'd like to try that soon if I can get this whole system problem straightened out)...and I'd thought about trying watercooling but decided against it, based on advice from others and just not wanting to add any more complication to the build. There were places I could have saved more money in the build...but I was afraid to try those routes for fear of a 'cheaper solution' causing me headaches in the end. I'd rather just do it right the first time ;) 

I can go ahead and set my RAM voltage back to 2.1V and the timing back to 4-4-4-12...should I go ahead and do that? Right now it's at 5-5-5-18 and 1.850 volts (The defaults autodetected and set by the BIOS). I was afraid to go monkeying with those defaults after I got the system running again for fear that my problem had something to do with me setting those timings and voltages. Especially after the RAM GUY over on the Corsair forums suggested that with 4 sticks of RAM...I might have to take steps to make it more stable (He suggested underclocking it from 800 to 667).

As for going over to EVGA.com and checking there...I sure did. I read all the motherboard FAQs, checked the forums, and posted for help there as well.

I didn't install nTune. I'd read that installing nTune or the nVidia Network Management tools was...'bad'. So I did not install either of those. That's too bad as nTune sounded like it had some handy tools and would have worked nicely to monitor temperatures and such on my video card...but after what I'd read about problems with that software...I was afraid to.

In case anyone who has chipped in here is following along to add to their own mental repertoire of tips and tricks for troubleshooting things....after several days of googling and reading....I have a couple more 'leads' as to the cause of this problem. Two of them more 'expected' and one is a bit strange.

Some folks seem to think that this could be memory related. I read over on the EVGA forums that a lot of EVGA boards go bad with what they refer to as the "Dreaded C1 error" and the memory controllers go bad. A lot of folks there seem to attribute that problem to the 'goop' that they put under the Northbridge and Southbridge heatsinks...and that this 'goop' is way caked on and actually stopping them from cooling properly. They say that removing, cleaning, and reapplying AS5 can prevent the problem. However, I haven't received any "C1 errors".

Of course, googling my 'issue' I found many instances of folks with machines that restart randomly or shut down randomly, but after examining these further...most of these instances were not all that similar to my problem. Most of them seemed to be due to overheating or improper cooling. However, I did find a couple instances of problems nearly identical to mine...and in both cases....replacing the PSU solved the problem. I'll be obtaining another PSU so that I've got it around should my system 'croak' again and I can try out the PSU to see if replacing it solves the problem. So far, I'm thinking that of all the information I've read...a bad PSU seems the most likely cause.

The 'strange' one I read was buried in a bunch of engineering forums regarding the design and characteristics of Uninterruptible Power Supplies and PSUs. It seems that many of the newer PSUs require a "True Sine Wave" and usually only the very expensive UPSes produce true sine waves. For example, after talking to APC...only their Smart-UPS line produces a true sine wave. Their cheaper 'home user' Back-UPS series produces "Stepped approximations of sine waves". I'd read a lot of reports of folks with Enermax Galaxy PSUs (the fine print on those says that they require a true sine wave) who experienced problems very similar to what I am experiencing. Once they had taken the UPS out of the picture, their problems ceased. I currently use an APC UPS unit that does not produce a true sine wave....and I suppose it's possible that the Thermaltake Toughpower 1200 could require a true sine wave like the Enermax Galaxy does...and maybe they just don't document that like Enermax does. So, just to rule that out, for the time being...I've taken my UPS out of the equation and am using a run-of-the-mill surge protector. I've got an e-mail in to support at Thermaltake about this, to see if this could be the cause of my problems.

Thanks again to everyone who is trying to help out...hopefully this thing will get figured out and the system will be all shiny-happy again ;) 
August 7, 2007 7:16:58 PM

Hmmm, I wonder if thermal flexing of your mobo has exposed the presence of cracks somewhere. I wouldn't expect a bad PSU switch to be this inconsistent; it's just a momentary-contact pushbutton, and would either be shorted, or not.
Another possibility is something loose, that the vibration of fans or drives is jiggling around. Any possibility of FOD (foreign object / debris) under or on top of the mobo?
August 8, 2007 9:31:42 AM

Okay, I finally managed to read through all the posts and I find it a bit strange that only 1 person mentioned it may be PSU/power related.

First thing is that I've never been a fan of killing the power to my PC, when I shut it down it still pulls juice but that is mainly because I use the wake on lan feature. But that's the first thing I recommend you try.

Also if possible try to borrow a different PSU from a friend, I'm guessing that your PSU may be dying or might have a manufacturing defect.

In my experience it usually is PSU/power related when a machine just outright dies instead of hanging or restarting.

If you do intend to purchase another PSU I would recommend PC Power and Cooling as they have a single rail with a high amperage rating.
August 8, 2007 8:30:56 PM

I also think the issue is power related, the power supply. If you go to www.cyberguys.com and type product number 204 0502 or 204 0503 into the search box, you'll find the testers. These testers plug into a standard ISA or PCI slot. I haven't purchased one because I haven't run into any problems I can't figure out on my own. My systems are usually set up with ASUS motherboards and AMD processors and I don't overclock. You appear to know a lot more about computers than I do.
I did run into a few problems with setting up my Windows Media Center PC, but that is all ironed out now.
August 8, 2007 8:53:56 PM

after jtt283's FOD check {checking for a short}, try running with 2 sticks of ram {before you start underclocking}
are all the cables arranged neatly and out of the way, with all the plugs seated nicely?
August 10, 2007 5:03:03 PM

Update:

The system behavior discussed above has been continuing. Wednesday night was the last time it occurred....same thing. I fired it up after work and it ran fine. It ran and I played games and worked on it all night....then shut it down. Just to test it out...about 2 hours after I shut it down I tried to start it back up...and again...it wouldn't even post. Also again, when the system starts doing this....I hit the power button....it proceeds through the boot routine and dies at various points. Sometimes it dies right after I press the power button. Sometimes it dies just after post. Sometimes it dies after it gets to Windows. Sometimes it runs for 5 minutes before it dies. I've been watching the LED codes on the motherboard while it does this and it doesn't seem to be any one 'code' or point in the bootup cycle that it stops. It's random.
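With a fault this intermittent, writing down every power-on attempt can show whether it really is random or tied to how long the machine sat powered off. The following is only a sketch of one way to keep such a log in Python. The stage labels, the `BootAttempt` record, and the `summarize` helper are all hypothetical names chosen for illustration, and no readings from this system are included.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels for how far a power-on attempt got before dying.
# "stable" means the system stayed up.
STAGES = ("no_post", "post", "windows", "stable")


@dataclass
class BootAttempt:
    hours_powered_off: float  # time since the previous shutdown
    furthest_stage: str       # one of STAGES


def summarize(attempts, bucket_hours=1.0):
    """Group attempts by how long the machine had been off.

    Returns {bucket_start_hours: (failures, total_attempts)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for attempt in attempts:
        if attempt.furthest_stage not in STAGES:
            raise ValueError(f"unknown stage: {attempt.furthest_stage}")
        bucket = int(attempt.hours_powered_off // bucket_hours) * bucket_hours
        counts[bucket][1] += 1
        if attempt.furthest_stage != "stable":
            counts[bucket][0] += 1
    return {bucket: tuple(pair) for bucket, pair in sorted(counts.items())}
```

Each press of the power button becomes one `BootAttempt`. If the failures pile up in one off-time bucket, that points toward something that changes as the machine cools or discharges, rather than pure chance.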

So...continuing...

Quote:

Hmmm, I wonder if thermal flexing of your mobo has exposed the presence of cracks somewhere. I wouldn't expect a bad PSU switch to be this inconsistent; it's just a momentary-contact pushbutton, and would either be shorted, or not.
Another possibility is something loose, that the vibration of fans or drives is jiggling around. Any possibility of FOD (foreign object / debris) under or on top of the mobo?


Thanks jtt283. I opened it up and took a good look for any FOD. Visual inspection, tilted the case slightly and listened for anything 'sliding' or 'rolling'. Even used some compressed air to see if anything I missed would move. No FOD there.

Quote:

Okay, I finally managed to read through all the posts and I find it a bit strange that only 1 person mentioned it may be PSU/power related.

First thing is that I've never been a fan of killing the power to my PC, when I shut it down it still pulls juice but that is mainly because I use the wake on lan feature. But that's the first thing I recommend you try.

Also if possible try to borrow a different PSU from a friend, I'm guessing that your PSU may be dying or might have a manufacturing defect.

In my experience it usually is PSU/power related when a machine just outright dies instead of hanging or restarting.

If you do intend to purchase another PSU I would recommend PC Power and Cooling as they have a single rail with a high amperage rating.


Thanks ffchocobo.

Hoping you're right and it's a bad PSU, I ordered a new PSU (Identical model to the one I was using as I wanted modular cabling). I received it yesterday and swapped in the new PSU. Started it up and the system started just fine. Booted into Windows and shut it down. Restarted it. Ran fine. Did this several times....it was fine each time. Started it up and ran it for a few minutes then shut it down. Went down and watched TV for a couple hours then went back up before bed and tried to start it up again just because I was paranoid by this time...and sure enough...

It died. Just like before.

So....I guess it's not the PSU. Unless I got two bad ones in a row :( 

Any other ideas?

I'm probably stretching here as I'm about out of ideas.....but I'll ask some more questions just for my own peace of mind.

With regards to the configuration of the motherboard:
On the EVGA 680i SLI motherboard there are 5 fan headers. I've got one 120mm exhaust fan plugged into the VREG fan header just in front of the rear exhaust fan port. I've got one 120mm exhaust Fan in the Exhaust port at the TOP of the case plugged into one of the Chassis fan headers at the front of the board. I've got two 120mm intake fans in the front of the machine. One of the two fans is plugged into the Chassis2 Fan header on the motherboard....and the other fan is plugged into the string of Molex connectors from the PSU (The same cable that is powering the peripherals).

I notice that when I power on the machine...the rear exhaust fan plugged into the VREG fan header takes maybe 2 or 3 seconds to 'start' spinning compared to the rest of the case fans....

Is it possible for too many fans to be causing this problem? I mean...I guess that probably sounds lame but I'm out of ideas. It's a 1200 W PSU so I wouldn't think I'd be low on juice. I'm only using 4 ports on the back of the PSU. Two of the ports feed PCIe cables that plug directly into the 8800 Ultra card. One of the peripheral PSU ports powers the SATA devices (2 hard drives and a DVD Drive) and another peripheral PSU port powers the IDE DVD Drive, the Floppy Drive, and one 120mm fan connected via Molex.

Is there some other 'connection' or 'configuration' item on the board or the BIOS that I could have done out of ignorance to cause this?

It's strange because once this problem occurs....if I give up and just let the computer sit for a few hours or overnight.....it'll start back up. However, once the problem starts to happen...no matter how many times I stab that power button and try to start it up....it will continue to happen.

That sounds sort of like a heat problem to me....but my 4 cores are 40 degrees in the BIOS and 40 41 38 40 as reported by coreTemp. My MCP temp in the BIOS is 47. The hottest those temps have ever gotten was 49 52 54 50 and MCP 55. That doesn't seem like enough heat to cause problems. Seems pretty cool to me. The case is extremely well ventilated with plenty of nice cool airflow. So heat doesn't really make sense. Not to mention, this problem has occurred when the PC was 'cold' after sitting overnight....and there are times when I've played demanding games all night long, and the PC would shut down and restart as it is supposed to...without any problems. Again, it seems totally random.

Or...do you think that it's probably nothing that I did...and it's probably just a bad motherboard...and that I should just cut bait and RMA the board?

Quote:

I also think the issue is power related, the power supply. If you go to www.cyberguys.com and type product number 204 0502 or 204 0503 into the search box, you'll find the testers. These testers plug into a standard ISA or PCI slot. I haven't purchased one because I haven't run into any problems I can't figure out on my own. My systems are usually set up with ASUS motherboards and AMD processors and I don't overclock. You appear to know a lot more about computers than I do.
I did run into a few problems with setting up my Windows Media Center PC, but that is all ironed out now.


Thanks fferree. That looks terrific! Maybe that would help. At this point though....I'm thinking I'll have to RMA the board and the board is just bad. I just wish I had more experience at these things so I'd have a better idea if its a bad board...or what....or some way to test the board to make sure its bad or good. The device you linked me to looks like an LED-code display thingie. The board I'm using has something similar on it...displays LED codes to indicate its current procedures during bootup...unfortunately...I've been watching all the codes it displays and have been unable to find any errors or patterns.

Quote:

after jtt283's FOD check {checking for a short}, try running with 2 sticks of ram {before you start underclocking}
are all the cables arranged neatly and out of the way, with all the plugs seated nicely?


Thanks coldmast. I tried that too. I ran memtest against the memory as well, overnight over many passes, and no errors. I've checked and rechecked all the plugs...everything seems good there.


I have no idea what else to try or where else to look...I'm guessing that at this point...it's probably best to just RMA the board and try a new one.

So...today I requested an RMA for the board and I'll ship it out on Monday....and hope to get my new one from NewEgg soon :) 

Thanks to everyone for your ideas....I'm stumped and really appreciate the help! Thanks guys!
August 10, 2007 7:15:47 PM

Is the case you are using new?
August 10, 2007 8:04:30 PM

Something to try.

I had an ASUS mb that would do the exact thing you are describing when I had all 4 memory DIMMs in use. I bumped the voltage up a tad over what it was rated at (from 2.1 to 2.2) and it worked flawlessly afterward. Before you try bumping up the voltage, though, maybe try just running the system on 2 sticks of memory at the correct settings. (I will admit I skimmed your posts; if you tried this already, I apologize.)
August 11, 2007 8:22:55 PM

Quote:

Is the case you are using new?


Case is brand new, its a CoolerMaster Stacker 830 Evo. There's a ton of space inside so I have plenty of room, and even with the big ThermalRight Ultra-120 Extreme heatsink....there's still plenty of room and nothing is even close to making contact with the side-panel.

Quote:

Something to try.

I had an ASUS mb that would do the exact thing you are describing when I had all 4 memory DIMMs in use. I bumped the voltage up a tad over what it was rated at (from 2.1 to 2.2) and it worked flawlessly afterward. Before you try bumping up the voltage, though, maybe try just running the system on 2 sticks of memory at the correct settings. (I will admit I skimmed your posts; if you tried this already, I apologize.)


Thanks for chiming in tdank :)  At this point...I'm willing to try about anything, including but not limited to, doing strange indian tribal dances around the PC and burning incense while praying to the almighty system gods.

As far as memory goes, I've tried running on both 2 and 4 of the sticks...and at both the Corsair recommended 2.1 volts and the BIOS auto-detected default 1.850 volts. I've tried the manufacturer recommended 4-4-4-12 timings and the relaxed BIOS default 5-5-5-18 timings. However, perhaps my testing procedure was 'flawed'?
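To keep track of which memory combinations have actually been tried, the settings above can be laid out as a small test matrix. This is only a rough sketch in Python: the stick counts, voltages, and timings are the ones mentioned in this post, while the function name and the idea of ticking off each combination one at a time are just an illustration, not anything the BIOS or Corsair prescribes.

```python
from itertools import product

# Values taken from the post: 2 or 4 sticks, BIOS default vs. Corsair
# recommended voltage, and relaxed vs. recommended timings.
STICK_COUNTS = (2, 4)
VOLTAGES = (1.85, 2.1)
TIMINGS = ("5-5-5-18", "4-4-4-12")


def memory_test_matrix():
    """Return every (sticks, volts, timings) combination as a list of tuples."""
    return list(product(STICK_COUNTS, VOLTAGES, TIMINGS))


if __name__ == "__main__":
    for sticks, volts, timings in memory_test_matrix():
        # Hypothetical checklist line; fill in the result by hand after each boot test.
        print(f"[ ] {sticks} sticks @ {volts:.2f} V, timings {timings}")
```

That gives 8 combinations in total, which makes it easier to see whether any one of them was skipped.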

The last time my system was starting up, and then died and I decided to try 2 instead of 4 sticks...I simply went in, pulled 2 of the sticks (Pulled them from slots 1 and 3, left the sticks in slots 0 and 2). The system still was doing the 'random shutdowns' at startup thing. However, when I did that...should I have yanked the CMOS battery and let the system sit for an hour, after pulling 2 of the sticks and before trying again? Was my test inaccurate because I didn't pull the battery and wait?

Don't worry about just skimming my posts...I REALLY appreciate you even taking the time to skim them...as this is a frustrating problem and my posts on it are really really long. I have to expect that anyone willing to help may not have the time or inclination to read all that garbage I posted....but on the other hand...I wanted to post as complete information as possible so it would be available to those who did read it all...and needed the info to help me :) 

So anyhow...I have some NEW info to post...here goes:

Thanks again guys...I can't begin to tell you all how much I appreciate your help. It just goes to figure that on my FIRST system build...I'd get some oddball problem that is difficult to troubleshoot.

Ok...here goes. After reading the posts on other forums where I'd also asked for help to try some ideas there...after work yesterday I picked up a few new CMOS batteries. I came home and unplugged my UPS and got that out of the picture, and used only a quality surge protector that I knew would work. I also pulled the CMOS battery and inserted the new battery.

I hit the power button, system powered up, posted, I got it into the BIOS so I could load defaults...and then it died. As usual, it was as if someone yanked the power...however the blue LED on the mobo was still lit so there was power...something just told my system not to run.

This leads me to my first questions. I am still learning and don't understand a lot about the way CMOS works.

When I yanked the old battery and inserted the new one, should I have waited an hour before I inserted the new one? When I'm troubleshooting something and removing parts from the system...do I need to yank the CMOS battery out and wait an hour each time? If I don't do that...is the problem that my 'testing' isn't really accurate because the system is attempting to load on 'old' BIOS settings (for example, BIOS settings that include hardware that I had removed)? I'm just trying to understand what clearing the CMOS does so that I know when I do and do not need to do it...and how long I need to wait.

Next, I started pulling hardware out until the Mobo was naked except for just one memory stick. I tried powering up the system and the same "shut off" problem was still occuring. So it didn't appear to be any of the hardware. I didn't have any fans or anything plugged in.

So....I again pulled the CMOS battery and reconnected everything. A couple hours later...I put the CMOS battery back in and powered up. System started up just fine and I was able to get into the BIOS and configure my settings. After that, I booted into windows and it was still running fine.

Here is where it gets really weird:

Once I was in windows...I decided to replace the sidepanel on the case (CoolerMaster Stacker 830 Evo). I slid the side-panel 90% on (Up to the point where you have to apply pressure to get it to 'snap' into place). As it 'snapped' into place...bam....system died. Now...I'm thinking to myself..."Was this a coincidence"?

Later....after many many attempts at pushing the power button and attempting a startup...I got it running again. Once again, I got into windows and the system was running fine. Once again, I slid the side-panel back onto the case and when it 'snapped' into position...that slight 'jar' to the machine SEEMED to cause it to shut down. Again, this COULD be a coincidence...but the fact that it happened twice is suspicious.

So I opened up the machine and started looking for any shorts or contact points that could be related to the side-panel. Nothing. There is nowhere that I could see any sort of 'problem' with the sidepanel. There is PLENTY of space in that huge case....and the side panel is well away from everything...even the large Thermalright Ultra-120 Extreme heatsink.

That heatsink gave me another thought. Everyone says those Thermalright Ultra-120 Extremes are the best thing since sliced bread and get great temps on them. I started thinking though....this heatsink is heavy.....could my problem be related to this thing flexing the motherboard? Could a microscopic 'crack' have opened in some circuit and the stress from the heatsink, exaggerated when I close that side-panel, cause that crack to 'open' and shut down the machine? Could the weight of that heatsink be causing a problem with my processor? I didn't THINK it would as so many folks have such great success with the heatsink and don't report any 'processor weight' problems. But...I'm 'new'...so what do I know?

Aside from that, once I get the system up and running 'stable'....once I'm in windows it tends to continue to run stable just fine. However, once I 'shut down' from windows after a night's gaming and turn off the power to the machine....the next day...24 hours later...when I power back up is when many times the system tends to have "the problem" again. Sometimes it's almost as if my system doesn't like to be 'cold'...maybe the thing has a mind of its own and prefers to be really warm and toasty ;) 

The other possibly 'odd' thing I noticed...and I have NO idea if this is 'normal' or not...but when the system is running...if I look in the BIOS under 'system voltages', my CPU voltage defaults to 1.25 Volts. However, as I watch the voltage readouts....every few seconds it fluctuates between 1.26 Volts and 1.25 Volts. Is that normal and nothing to worry about? Or could this be an indicator of the problem I'm having? Perhaps a damaged Mobo?

So here is my dilemma....do I RMA the mobo? What are the odds that this is a mobo problem? I'd hate to RMA a perfectly good stable mobo if the Mobo isn't the problem, go to all the work (especially for someone new at this...replacing a mobo may be easy for some of you....but it's still a lot of work for a new guy like me) of replacing the mobo...only to have this problem continue. Or....is the general consensus here that the problem is something 'else' and I should hang onto the mobo and keep trying other things?
August 11, 2007 9:47:54 PM

Major Update!!!

I've posted for 'help' in several forums...and over at the EVGA forums...a fella pitched in his thoughts:

Quote:

The fact that twice when you tried putting the side panel on the system shut down is not a coincidence. Without putting the side panel on, reach in the case and slightly jiggle the Heat Sink as well as other main components on the board. It's possible that movement is causing a momentary short to ground.


and here is what happened when I attempted his idea and what I posted there afterwards:

Quote:

trs32505, I think I owe you a six-pack!!!!

You win the prize. You were right on the money and your troubleshooting tip uncovered the problem.

I did what you said, and when the system was up and stable and running...I gently pressed the Thermalright Ultra-120 Extreme up. Nothing 'bad' happened. Ok...now I gently applied a little pressure 'down'...bingo...system died. That's what was causing it!

That damned stupid Thermalright Ultra-120 Extreme cooler is causing/aggravating one of 3 things:

A) It's too much stress and/or weight on the processor and, I have NO idea how this could work....or why...but that stress and/or weight is making the processor mad at me and my system shuts down.

B) It's too much stress and/or weight on the motherboard...maybe the board is even flexing...and that is shutting it down.

C) There were already microscopic cracks in the circuits around the board...and the weight of that heatsink is 'opening' the cracks when jiggled, causing a short, and shutting the system down.

So...to 'test' the theory. When my system shut down....I tried to fire it back up right away. Nothing. Starts up, and randomly shuts down.

Next, I built a 'suspension system'....its certainly nothing fancy and I HOPE it will do the job. I took a LONG strip of velcro and ran it through one of the available 'slots' in the case just beneath the PSU, then down beneath the Heatsink towards the 'end' of it to support its weight, and then up to loop around another 'slot' towards the front of the case making a big 'suspender'. I gently pulled this tight, and watched as the weight from gravity came off the top end of that heatsink...and it 'lifted' that end just a fraction of an inch. Enough to see. With my velcro pulled tight with no slack and fastened off.....and the case still open....I fired up the machine.

Bingo...it started FINE. Posted. Booted into Windows...no errors...no problems...nothing. It's been running stable now for a while and undergone several restarts.

Next....to test my 'suspender'...I snapped the side-panel back on....and what do you know...system stayed running :) 

I rolled the case out of its spot and a foot or so on the carpet...system stayed on and stable.

My 'velcro' suspender SEEMS to be doing its job. I just hope it doesn't melt or catch fire or something awful!

So there you have it. It looks like Thermalright didn't look very closely at the sheer gravity-induced top-heavy weight of this thing (unless, of course, they did their testing just FINE and my motherboard or processor already HAD a fault before I installed the cooler).

So...finally that brings me to my question and I really hope you guys can help me out. I need to borrow on your collective experience and best guesses:

What should I do now?

Do you think the motherboard is 'bad' and I should still RMA it? In that case, it was a mobo that was ALREADY bad and the 'weight' of the cooler was aggravating an otherwise invisible condition.

Do you think my motherboard was 'fine' and it was the sheer weight and imbalance of the cooler itself causing all the problems...and now...when suspended....my mobo and my processor will be just fine?

Do you think there could be something damaged or wrong with my processor from this? If so, what should I do about it and how would I tell?

etc. etc.

What should my next steps be? What would you do if you were me?

At the bare minimum...after ALL this typing and ALL these headaches...and ALL the help folks on this forum pitched in with their ideas....I really hope that if anyone ELSE ever has a problem like this...they'll stumble on this thread.

Thank you thank you thank you! Thank you trs32505 :) 
August 11, 2007 10:15:52 PM

In regard to the power supply possibly being the problem, try this: unplug everything from the PSU except the fans. If you have them plugged into the mobo ports, plug them into a 4 pin molex connector instead. You should have adapters that came with the fans/case. After that, short the power-on pin to a ground pin with a paper clip or wire. This will turn the power supply on manually. That way, you can see if the fans continue running without incident. If they do, I would say your next best shot is to RMA the mobo on Monday.

Here is a link to help

http://www.techpowerup.com/articles/other/22

Your mobo manual should have a picture of the connector that will allow you to see which pin is the power and which pin is the ground. Power should be green and ground is, of course, any of the black pins.
August 11, 2007 10:22:15 PM

Sorry, just caught your last post. Glad the problem is solved. I would recommend using zip ties to support the weight instead of velcro. They are less flammable, although your heatsink should never get that hot. Also, I would still RMA the board. That crack or whatever is causing that problem was probably there from the start. I have an Ultra 120 Extreme and don't have that problem (yet). My system has been fine for about a week with it on the mobo. Just be careful if you ever move the case. 2 pounds is a lot of weight for the mobo, and sticking out that far it becomes a very effective lever. I don't think I need to explain the physics of what that means.
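For anyone who does want rough numbers on the lever effect, here is a minimal Python sketch. The 2-pound weight is the figure quoted above; the lever-arm distances are made-up values for illustration only, not measurements of the Ultra-120 Extreme, and the function name is hypothetical. It assumes a tower case where the board is vertical, so gravity pulls at a right angle to the heatsink.

```python
LB_TO_KG = 0.45359237   # exact definition of the avoirdupois pound
G = 9.80665             # standard gravity in m/s^2


def heatsink_torque_nm(weight_lb, arm_m):
    """Torque (N*m) on the mount from a weight hanging arm_m out from the board."""
    force_n = weight_lb * LB_TO_KG * G
    return force_n * arm_m


if __name__ == "__main__":
    # Assumed distances from the board to the heatsink's center of mass.
    for arm_cm in (4, 8, 12):
        torque = heatsink_torque_nm(2.0, arm_cm / 100)
        print(f"arm {arm_cm:>2} cm -> {torque:.2f} N*m")
```

The point is only that the twisting load grows in direct proportion to how far the mass sits from the board, and a bump while moving the case adds to it on top of that.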
August 12, 2007 11:33:09 PM

That's great news! Well, sorta.

Quote:
I didn't THINK it would as so many folks have such great success with the heatsink and don't report any 'processor weight' problems. But...I'm 'new'...so what do I know?


No, it appears your earlier hunch was correct.
There are cases of this; a lot of mobo makers have the exact weight limitations specified.