GA-890XA-UD3 failures: Down to mobo or RAM!

deepruntramp

Distinguished
Jan 7, 2010
51
0
18,630
Hello again,

Let's just do a full hardware rundown:

-GA-890XA-UD3 (revision 1) mobo, default BIOS.
-2x G.SKILL ECO DDR3-1600 2GB sticks.
-Sapphire Radeon HD 5870
-Phenom x4 965 BE w/ stock HSF.
-Sony Optisomething ODD
-Samsung 1TB drive (gotta find the exact model of this).
-XFX 750W PSU (again, gotta find the exact model).
-Windows 7 Home Premium 64-bit

After all kinds of problems when I installed the drivers in a less than professional order, I reformatted and am in the process of starting over from the top.

Here's my history over the past 72 hours.

**ROUND THE FIRST**

-Hardware setup. Everything went fine; the only "hmm" moment was the HSF 'gliding' on the CPU die (it was noiseless and smooth, and it definitely felt like the thermal grease was completely in between the two when I went to lock it down). Cable management isn't great due to no extension cords.

INSTALL ORDER:

-Windows Update + all optionals (potentially included some of the later drivers...)
-Microsoft Security Essentials
-Video Drivers (generic Cata 10.7).
-AMD chipset drivers from my mobo's page.

>>Starcraft 2, get a BSOD (dxgmms1.sys, the DirectX 11 VRAM manager, I think?) and freak out, start digging around...<<

-USB 3.0 drivers from my mobo's page.

>>BSOD after USB 3.0 drivers install but before restart can occur, citing BUGCODE_USB_DRIVER. Opcode says software/driver based, thankfully. Figure it's growing pains.<<

-Dropped in the Gigabyte starter CD, installed the SATA and LAN drivers (LAN drivers were newer than the ones Windows had, the SATA ones were generics).

>>BSOD on shutdown/restart, citing a PINBALL-type error (apparently something to do with a file system?) Run dskchk overnight, schedule a driver integrity check.<<

>>Wake up to a shut down system which doesn't cough up video until two power cycles. Restart to Windows 7 running the driver integrity checker. BSODs at NTFS.sys. Decide to nuke the site from orbit, just to be sure.<<

**ROUND THE SECOND**

-Set up BIOS to Optimized Defaults. Set manual on the RAM and change timings, realize it's running at 1333MHz instead of 1600. Change multiplier to fix that, reset.

>>Infinite restart loop. Pop open the case, reset CMOS.<<

-Restarts fine. Set timings but don't touch multiplier (so we're back at 1333MHz). Everything looks OK.


^^ The above bolded for emphasis -- this is of course a bona fide hardware issue.

-Format/reinstall Windows 7.

INSTALL ORDER #2:

-Gigabyte CD auto-install, minus the bloatware (no Eco-whatever, no "online games"). Reset cycles.

>>No BSODs after USB this time! Plug in internet.<<

-All Windows *Critical* updates. Reset cycles.
-Microsoft Security Essentials.
-*Optional* updates detect only my monitor hardware-wise. Download it + the other Windows-related stuff. Reset cycles.

CURRENTLY SITTING SCARED AT:

-AMD Chipset from my mobo page. Do I need this, or did the driver CD take care of it (it said "Chipset Drivers" on the front).

-Video Drivers (have the ones off Sapphire's site this time, it's 10.7 cata, which is the latest anyway).

-Perhaps something I'm forgetting?

-Perhaps voltage settings that the auto-Optimize isn't behaving with?

-Prime95 + CoreTemp are sitting in the wings (my idle is at ~40 with a Phenom x4 965 BE; from minor research this seems high, so I feel like I need to stress it... though one of my case fans is off due to not having enough fan plugins on the mobo; a molex -> 3-pin adapter is in the mail, though!)

-----

ANY help or advice would be GREATLY appreciated. I'm hoping I just flubbed the software the first time, but I'm dreading I botched the CPU (installing HSFs scares me to death) or that the mobo is bad (PSU voltage seems OK, memtest86+ was OK). Again, I have yet to update to the newest BIOS, but due to the software-nature of the BSODs I sided with "don't fix what isn't proven to be broken."

THANKS SO MUCH! :D
 

bilbat

Splendid
1600 is a non-standard speed for DDR3. G.SKILL ECO Series says: "Designed Specifically for Socket LGA 1156 Intel Core i5 & i7 CPUs" :(

Your BIOS loads your memory parameters from a little 'table' in EEPROM called an SPD - the first three entries for DDR3 typically contain settings for 800, 1066, & 1333. There are many more timing parameters than the 'big four' you are normally shown by the manufacturers - CAS-tRCD-tRP-tRAS... If you want to get 1600, you will need to determine and set them all. The best way I am aware of is CPU-Tweaker (current version: CPU-Tweaker 1.5); with one DIMM in, and a fresh LoadOpt, run it, click on the SPD button, and it will give you a list of the majority of your RAM timings - scribble 'em down - go into the BIOS & set 'em. You will see the three 'standard' tables to the left; the one far to the right is the one you want... If there are some your BIOS needs, that are not shown, post back - some can be calculated, plus there's another (albeit more complicated) method...
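
To show how one SPD entry can turn into a different set of cycle counts at each speed, here is a minimal Python sketch. It assumes the SPD holds minimum times in nanoseconds and that each is rounded up to whole clocks at the selected speed. The nanosecond values are hypothetical placeholders, not values read from these G.SKILL sticks, and `ns_to_cycles` is just an illustrative helper name.

```python
from fractions import Fraction
from math import ceil

# Clock period (tCK) in nanoseconds for common DDR3 data rates.
TCK_NS = {
    800: Fraction(5, 2),    # 2.5 ns
    1066: Fraction(15, 8),  # 1.875 ns
    1333: Fraction(3, 2),   # 1.5 ns
    1600: Fraction(5, 4),   # 1.25 ns
}

def ns_to_cycles(t_ns, data_rate):
    """Round a minimum timing given in ns up to whole clock cycles."""
    return ceil(Fraction(t_ns) / TCK_NS[data_rate])

# Hypothetical minimum timings in ns (assumed for illustration only).
spd_ns = {"CAS": "13.125", "tRCD": "13.125", "tRP": "13.125", "tRAS": "36"}

for rate in (800, 1066, 1333):
    cycles = {name: ns_to_cycles(t, rate) for name, t in spd_ns.items()}
    print(rate, cycles)
```

With these assumed inputs the same table gives 6-6-6-15 at 800, 7-7-7-20 at 1066 and 9-9-9-24 at 1333, which is why the 'big four' change whenever the memory multiplier changes.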

This may help you as well: AMD's tuning guide...
 

deepruntramp


The RAM was actually recommended to me by a member of these boards, knowing full well I'd be using this very mobo. Several others backed up the suggestion, so I went with it.

The Intel line is there because the minimum voltage the memory wants is 1.35v, and AMD boards apparently can't do that. My BIOS lets me drop DRAM voltage to 1.32 or 1.42, but not 1.35. For now, I've left it at the mobo default of 1.50. Maybe someday I'll learn how to tweak it to where it's using the extra volts. Or I might just return/scrap the stuff and buy RAM that's actually appropriate for my system. :??:

(I can't seem to find the link where I learned about the above, but it was a G.SKILL representative that was explaining it...)
 

bilbat

I think you have every chance in the world of getting it working - the BE's have far and away the best track record with higher speed RAM - but - you'll have to tweak it; as I said, there's no standard BIOS 'machinery' for RAM above 1333. I, too, have recommended some 1600 to AMD users, with two caveats - it takes hand setup; and you are limited to one DIMM per channel (two DIMMs total) - far as I know, AMDs don't do above 1333 with two DIMMs/channel; and some are limited to 1066 then... When they say "Intel", all it means is that the 'extra' memory timing info for the higher than JEDEC speed is in Intel's 'XMP' format - which 'kind of' can 'BIOS load' higher speed info - but - it has many drawbacks, too...

If you'd like a bit of help, d/l & run the program I pointed out - click the 'SPD' button, and post those screens... I have your manual here - we can take a stab at it... (If you've never captured & posted screens, I have a tutorial posted here...)
 

deepruntramp

First, thank you so much for your advice and help.

Secondly, I would definitely follow your advice, but the system's dead. Prime95 (64-bit) crashed with a Win32k.sys BSOD, and the Core Monitor showed temperatures climbing toward 60C (idle 42C, about 10 sec in 52 and rising) when I got the bluescreen. I'm going to save myself the time and grief of making extra special sure this isn't software -- this isn't Windows 95 we're dealing with. BSOD isn't something you brush off as business as usual.

Five BSODs, new system, each citing a new issue, each under stress, persisting after format. All kinds of mobo hardware oddities which I probably shouldn't have overlooked (back USB slots just plain not working at times, strange behavior after non-consensual resets, etc). Stick a fork in this one, it's done. Sad that it's my first Gigabyte too... but with how complex these things are, I imagine DOAs are probably pretty frequent.

Sure, it could be RAM, but memtest86+ checked out...

CPU temperatures are kind of scary too. What do I do about those?

Finally... where should I go on these forums to ask about how to determine what to return/RMA?

Maybe I've lost my touch with all this. Or I'm just getting too old, one of the two...
 

bilbat

First thing I gotta get a look at is what the 'stock' HSF on a BE is - for intels, that's the one aftermarket part I insist on 'fore I'll help with an overclock - doesn't matter what kind, needn't be pricey, just anything but the stock rotary postage stamp... My understanding is that the AMD sockets are pretty much a b!tch to hang the coolers on - may be loose, not level, bad pasting? Usually problems with the thermals are either HSF-related, or too much voltage - and you haven't 'cranked 'er up' yet...

Made my day with the 'nonconsensual resets'! Love literate people - unfortunately, becoming a thing of the past :cry:

I'd certainly be willing to help you take 'er back to 'ground zero', start with just the board and the basics, build & test from there... Here's how I start every build:
[Image: 0270n.jpg]
I don't like people to have a 'bad time' with their GIGABYTEs - think they're one of, if not the best in the business. I've likely done a dozen or more systems in the past year - not only never got a malfunctioning MOBO, but got one bad part out of hundreds (a digital temp readout), and I dropped that before getting to the install - so I don't think I can count it! ('course, like with everything else - better to be lucky than good)

Can't imagine anyone doing this stuff is older'n me. I learned FORTRAN4 on a mainframe (fondly remember the systems guy bragging about their new 5 meg harddrive - size of a chest-freezer), from a nun, working off punch-cards!
 

deepruntramp

I'd appreciate it. I'm still in the frustrated "return it all and maybe buy a Cyberpower" stage (turns out it would have cost about the same :sweat: ) . The only thing holding me back at this point is the 15% restocking fee. And pride, of course.

My major reasoning for the mobo being in some state of disrepair is that the back USB ports wouldn't always work. For instance, when the system powered on, the connected keyboard/mouse would not come on with it. After Windows 7 booted, I had to dance the two across various USB ports until one finally worked. The problem inexplicably went away, though, so I forgot about it.

I still haven't tried updating the BIOS yet. I didn't see anything in the changelogs that would help, plus I'd rather not risk bricking it when I'm still considering a return.

There's always the chance I botched something during construction. I'd like to believe that would cause big huge honking in-your-face problems, but that's probably not the case...

But again, any help you could provide would be appreciated. I'd love to avoid the hassle, and potential disappointment, of an RMA (or watching cash go down the drain with a restock + Cyberpower build).
 

deepruntramp

Small update: after a bunch of digging (on Newegg, sigh) this RAM apparently doesn't play nice with (some/most) AMD boards at DDR3-1600/CL7. (Advertised specs are 7-8-7-24, DDR3-1600, 1.35V) Which could explain the reset loop at least.

Downside is that I've been running 7-8-7-24 in 1333 and still had all these problems. Apparently loosening the CL to 8 will allow 1600 -- I wonder if it'll increase stability in general?
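
For a rough sense of what loosening CL costs, here is a small Python sketch that turns CL and data rate into an absolute CAS delay in nanoseconds. It only covers the CAS figure (not tRCD, tRP or tRAS) and says nothing about stability; `cas_latency_ns` is just an illustrative helper name.

```python
def cas_latency_ns(cl, data_rate_mts):
    """CAS delay in ns: CL clock cycles at a clock of data_rate/2 MHz."""
    return cl * 2000.0 / data_rate_mts

for label, cl, rate in [
    ("DDR3-1600 CL7", 7, 1600),
    ("DDR3-1600 CL8", 8, 1600),
    ("DDR3-1333 CL7", 7, 1333.33),
]:
    print(f"{label}: {cas_latency_ns(cl, rate):.2f} ns")
# DDR3-1600 CL7: 8.75 ns
# DDR3-1600 CL8: 10.00 ns
# DDR3-1333 CL7: 10.50 ns
```

By this arithmetic, 1600 at CL8 would still have a slightly shorter CAS delay than 1333 at CL7, so the looser setting is not necessarily a step backwards on paper.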
 

bilbat

Well - there's light at the end of the tunnel - and for once, it isn't the usual freight train coming at us! Take a peek at this, from Extreme, and this, from TechReaction...

I'm busily 'rooting around' in the current AMD processor errata - there are a lot of AMD's with memory 'incapacities', but yours (which is, I think, an RB-C2 - even figuring that out takes a while!) doesn't appear to have any that might affect this situation...

Have a few irons in the fire at this point - will, hopefully, be back in a little bit with more info
 

deepruntramp

Sounds great. Thanks for your help!

I've got a multimeter and a voltage tester coming in from an electrician friend today to also test out the PSU, which could certainly explain the problems.

Also, all of my components are still well within the return period at newegg, so if worse comes to worst and anything has to get sent back, I'll return the RAM as well and get something that plays a little more friendly with AMD!
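
For the multimeter check on the PSU, a minimal Python sketch of the pass/fail logic, assuming the usual ATX tolerance of +/-5% on the positive rails. The readings below are hypothetical placeholders, not measurements from this XFX unit, and `check_rail` is just an illustrative helper name.

```python
# Nominal ATX rail voltages; +/-5% tolerance assumed for the positive rails.
RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05

def check_rail(name, measured):
    """Return True if the measured voltage is within tolerance of nominal."""
    nominal = RAILS[name]
    low, high = nominal * (1 - TOLERANCE), nominal * (1 + TOLERANCE)
    return low <= measured <= high

# Hypothetical multimeter readings (assumed for illustration only).
readings = {"+3.3V": 3.31, "+5V": 5.08, "+12V": 11.2}
for rail, volts in readings.items():
    print(rail, volts, "OK" if check_rail(rail, volts) else "OUT OF SPEC")
```

Readings taken at idle can look fine while the rails sag under load, so a check like this is probably more telling while Prime95 or Furmark is running.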
 

deepruntramp

Eh, everything there is listed as Intel-favored. Why not just get a different brand? Isn't Mushkin a modern heavyweight? Last time I built, Kingston was the cheap but OK stuff, Corsair was the reliable but boring stuff, and OCZ was the cutting-edge, overclocked, overpriced stuff.

I know that's an irrational statement as hardware is hardware is hardware, but still, these shenanigans about the memory just straight up not working at the manufacturer's settings -- and covering their rear with a "well it's designed for Intel" line -- is kind of disheartening.

But, like you mentioned in your guide, that may as well be standard practice. At least G.SKILL has the decency to tell us all the settings, right?
 

bilbat

I'm still trying to find out if the 'Intel' designation actually means anything in reality... As I note in the memory "candidate for sticky", the difference between XMP (Intel) and EPP (AMD - actually nVidia) is the physical location of the 'extra' SPD info that provides the setup data for the 'added' speed(s) above the JEDEC's top 1333. I also note that, as far as I can tell, the DIMM's SPD could contain both sets (EPP & XMP), as the areas designated do not overlap! Intel BIOS have a mechanism to load the XMP; I have not been able to confirm that there is any such mechanism in any AMDs since the last nVidia piece - the GeForce 7025/nForce 630a pair.

As I mentioned, the AMD's do have a lot of 'limitations' on what will work in the way of memory; some are documented (if you can 'decipher' them, in the document I showed you: at least items 264, 278, 293, 295, 355, 370, and 379, affect memory function in some of the 'families' - and 'deciphering' the families themselves can be daunting!); some are not.

I've had excellent luck with G.Skills since I started doing GB boards, but I don't 'do' AMD! My complaint there is not about the products, it's the 'paucity' of documentation. When I first started tackling this, my 'mentor' (LSDmeASAP, over at TweakTown) told me that GB boards love Mushkin, and that it will nearly always start right up, without any manual 'tweaking' - but he 'doesn't do' AMDs, either! G.Skill was 'next on the list'. As I found Mushkin to be pretty pricey, and I wanted to buy a bunch of extra sticks, to do my own 'speed-binning' (test several, pick out the four fastest...), I went with the G.Skill - have done every 'client' machine since with them, and they've never let me down yet!

When it was mostly DDR2, answering questions here, I learned that there were differences; Kingstons seemed to not work more often than they would; Corsairs have, on a couple occasions, appeared to 'degrade' over time, requiring 're-tweaking' of a once-working setup; OCZ's seemed to be kind of 'voltage-happy', needing bigger 'bumps' of both Vddr and Vmch; however, since the advent of DDR3, these differences seem to have 'smoothed out' - what I note now (and am writing the 'memory sticky' to combat), is the fact that what people pay for their RAM, and their ability to get it working seem to be inversely proportional! People are impressed with 'high numbers', and seldom know what those numbers actually mean!

I maintain that, with a very few and specific exceptions in use, low latency will always beat high clocking in actual operation... If you hang around Extreme for any amount of time, you will discover a 'curiosity': significant numbers of people running very expensive, very fast RAM, at, say, half its rated speed, brutally over-volted, at ridiculously low latencies!
 

deepruntramp

An update:

--Yesterday--

*Updated BIOS, seemed to stop a BUGCODE_USB_DRIVER bluescreen on shutdown.

Start memory testing...

STICK ONE: Stable at 8CL, Bank1 (no errors in memtest86+, ran Prime95 for 30min without issue, regular browsing without issue).

STICK TWO: Stable at 8CL, Bank1 (same story).

TWO STICKS, BANK 1-2: Stable at 8CL (memtest left untested, ran Prime95 for 2 hours without issue, regular browsing without issue). Full chkdsk, scanning/recovering bad sectors across the whole drive. No errors.

Reset to Win7 desktop and shut down for the night.

--Today--

Cold Start:

TWO STICKS, BANK 1-2, CL8: Infinite reset cycle. Remove wireless USB mouse, keyboard still plugged in. Infinite reset cycle. Pop CMOS battery. CL is now... 9?. Starts up, head into BIOS. Lockup. Reset, try again. Lockup. Reset, keyboard is non-responsive so it starts heading into Win7. Lockup before it finishes loading.

TWO STICKS, BANK 3-4, CL9?: Load into BIOS, select Optimal Defaults. Save & Quit. Load into BIOS, tweak the RAM down to CL8, disable the FDD + various sleep settings + Cool&Quiet (random throttling due to heat, causes miserable stability in games)-type stuff. Save & Quit. Load into Win7. *Almost* makes it to desktop before flickered bluescreen and reset. Loads into Win7, wants to know if I'd like to do the diagnostic. Sure, why not. Resets 25% of the way in. Shut down.

(yes, these dual-channel bank setups are the correct ones according to the manual).

ONE STICK, BANK 1, CL8. Plug wireless mouse back in. Loads Win7 fine. Typing this to you right now, in fact... though the mouse does have some lag moments that it doesn't have on my laptop. Another strike in the "bad mobo" column.

Current conclusions:

-There's a chance I'm screwing something up in regards to dual-channel settings. Right now it's all on Auto (unganged) save for timing stuff.

-Regardless of what hardware is bad here, (and a BIOS freeze is definitively some flavor of hardware), the RAM sucks and is getting returned for a refund one way or another. I'll be picking up a different brand that doesn't have some dubious intel bullshit and less than honest timings associated with it. Mushkin sounds Giga-friendly.

And maybe you're right, there might be a chance I could get this less than compatible stuff working with a day of tweaking. But honestly, I just really don't want to do that. Tweaking the square peg and the round hole so they fit seems like a fool's errand when you can just, you know, return the square peg and get a round one instead. If I had a system up and this was an issue with a system I didn't really care about, sure, why not play around with it just to learn something? But I'm down a desktop system right now.

-Still feeling that the mobo is, in some flavor, bad. I'm not sure if I should send it back or not, but considering I'm already packing a newegg care package, tossing it in and asking for a new one doesn't seem unreasonable. Maybe it hates just this RAM, or maybe the memory controller is less than happy with DDR. If Gigabyte is as high quality as you say it is, wouldn't an RMA generate just as friendly a product?

-I'm no electrician and have no clue how to run a multimeter, so I'm sort of thinking popping the PSU and testing it would have an equal chance of electrocuting me as it does actually finding out whether it's stable or not. Voltages in the BIOS seem ok, but I know those can be unreliable. That said, I *am* making a care package...

As for "doing" AMD or not, I'm no loyalist but one way or another I always end up with an AMD. The last four personal systems I've built have been AMD and the laptop I bought was also AMD. Again, it's not a theological issue, it's just what ends up being the most reasonable choice, most often because of the price :: performance ratio.

UPDATE:

Still running from the last setup (One Stick/Bank 1/CL8). Installed Furmark, it's been running for two and a half hours without issue. I think it's safe to rule out the video card.

So, from my personal diagnostics, we're down to:

-Memory Timings?
-Mobo
-Memory
-PSU

In order of highest to lowest potential failure, imo.
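
To see which variable the failures track, the test log above can be tallied with a short Python sketch. The run list is my own transcription of that log, reduced to pass/fail, and `pass_rate_by` is just an illustrative helper name.

```python
from collections import defaultdict

# (dimm_count, slots, day, passed), transcribed from the log above.
runs = [
    (1, "bank 1", "yesterday", True),      # stick one
    (1, "bank 1", "yesterday", True),      # stick two
    (2, "banks 1-2", "yesterday", True),   # Prime95 for 2 hours
    (2, "banks 1-2", "today", False),      # reset loop, BIOS lockups
    (2, "banks 3-4", "today", False),      # bluescreen before desktop
    (1, "bank 1", "today", True),          # current setup
]

def pass_rate_by(runs, key_index):
    """Group runs by one field and count (passed, total) for each value."""
    totals = defaultdict(lambda: [0, 0])
    for run in runs:
        entry = totals[run[key_index]]
        entry[0] += run[3]  # True counts as 1
        entry[1] += 1
    return {key: tuple(counts) for key, counts in totals.items()}

print(pass_rate_by(runs, 0))  # {1: (3, 3), 2: (1, 3)}
print(pass_rate_by(runs, 2))  # {'yesterday': (3, 3), 'today': (1, 3)}
```

With only six runs this proves nothing, but every failure so far involves two sticks after a cold start, which is at least consistent with memory settings or the memory controller ranking above the PSU.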
 

bilbat

"-There's a chance I'm screwing something up in regards to dual-channel settings. Right now it's all on Auto (unganged) save for timing stuff."
This whole issue is somewhat 'bathed in mystery'! AMD say this, in "UNLEASH THE DRAGON - AMD “Dragon” Platform Technology Performance Tuning Guide":
"The Memory controller of AMD Phenom and AMD Phenom II CPUs can be set to run in Ganged mode or in Unganged mode. Ganged mode means that there is a single 128bit wide dual-channel DRAM Controller (DCT) enabled. Unganged mode enables two 64bit wide DRAM Controllers (DCT0 and DCT1). The recommended setting in most cases is the Unganged memory mode. Ganged mode may allow slightly higher Memory performance tuning and performs well in single-threaded benchmarks."

"Depending on the motherboard and BIOS, it may be required manually setting the timing parameters for each DCT (in Unganged mode) when performance tuning the memory or fine tuning the timings. Some BIOS versions apply the same timings automatically for both DCTs in an Unganged mode."
Interestingly, they also tell you the exact opposite of what the GB memory installation recommendations say:
"The DIMM slots furthest away from the CPU socket should be equipped first (usually marked as DIMM slot 2&3 or A2&B2)."
Mightn't be a bad idea to read all of page 15: AMD's tuning guide...

I kind of am an Intel 'loyalist', but it's because of one single issue - documentation. I do industrials, for the main part, and I want to know how that southbridge actually works! AMD does often have the edge in price/performance, but when I went hunting for bridge docs for the last two generations, what I found were guidelines for how you were allowed to reproduce their logos!! I've posted this before:
I know a lot of arcane BS about Intel processors - cause they're the only ones who document everything! If you wanna know how many Lahore pigeons crap on the roof of the Santa Clara fab each year, not only can you find it in a PDF somewhere (but where - that's the skill!) on their web site, but there's probably a three year plan documented to change their feeding habits, so they crap a lighter color, causing the roof to reflect more sunlight, and cut down on the air-conditioning costs... Every time I try to find out something about an AMD BIOS for someone, I see this business about "update AGESA three point five point three point nine point more digits than pi", and I've been randomly trying for months just to find out what 'AGESA' is - bah - no luck! (I hate acronyms anyway - the only one that ever sticks in my head is back from the days when they finally got completely out of hand with 'PCMCIA' - people can't memorize computer industry acronyms!) And you don't wanna even get me started about nVidia! As far as I can figure, nVidia is actually a front company for the CIA/NSA - if you go looking there for documentation, they'll have you investigated to find out why the hell you're looking for their documents!

Can't hurt to RMA, and I know I've found G.Skill documented to be 'for' AMD platforms - I just can't remember how I found 'em! Too old, too much BS going on; last night, saw an old Peter Gunn episode, and realized the bad guy had been Wilbur Post's next-door neighbor on "Mr. ED"!! No wonder half the time I go in the basement, I wind up standing at the bottom of the stairs wondering "now, why did I come down here?"

As I said, the field seems to have 'leveled out' considerably in the DDR3 arena - nobody documents worth a damn, and they all seem to perform about equally, as far as I've heard here...
 

deepruntramp

The problem turned out to be the memory timings. After digging around it finally dawned on me to go to G.SKILL's forums and see if they had anything to say about it.

I found this thread. The original post is quoted below.

"Regarding my previous post from Nov 17th about Phenom II memory controller problems...page 89 of this AMD Support Document (written back in Feb 09!) http://support.amd.com/us/Processor_TechDocs/41322.pdf outlines the problem. Seems that those of you that have been able to run stably at speeds of 1333 or higher are the lucky ones as the Phenom II CPU (particularly stepping 2, revision RB-C2) memory controller is known by AMD to have a stability issue with more than 1 DIMM on a channel. So, if all else fails, drop your memory down to 1066MHz and tighten the timings. I did just that with my (4) 2GB F3-10666CL8D modules. I simply down-clocked to 1066MHz and tightened my timings from 8-8-8-21-2T to 7-8-8-19-1T. My system is now stable and performance is virtually identical at 1066MHz compared to what it was at 1333MHz with the looser timings."

So I left the timings where they were (8-8-8-24 at 1333MHz) and -- here's the kicker -- stopped using dual-channel, putting one DIMM on each separate channel. Bam, stable.

I opted to stay with 1333MHz in single-channel mode instead of attempting 1066 in dual (which seemed like my choices according to that thread).
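
For a back-of-the-envelope look at that trade-off, here is a small Python sketch of theoretical peak bandwidth, assuming 8 bytes per transfer on each 64-bit channel. These are paper numbers only, real-world throughput will be lower, and `peak_bandwidth_gbs` is just an illustrative helper name.

```python
def peak_bandwidth_gbs(data_rate_mts, channels):
    """Theoretical peak in GB/s: transfers per second x 8 bytes x channels."""
    return data_rate_mts * 1e6 * 8 * channels / 1e9

for label, rate, channels in [
    ("DDR3-1333, one channel", 1333.33, 1),
    ("DDR3-1066, two channels", 1066.67, 2),
    ("DDR3-1333, two channels", 1333.33, 2),
]:
    print(f"{label}: {peak_bandwidth_gbs(rate, channels):.2f} GB/s")
# DDR3-1333, one channel: 10.67 GB/s
# DDR3-1066, two channels: 17.07 GB/s
# DDR3-1333, two channels: 21.33 GB/s
```

On paper the 1066 dual-channel option has more headroom, though for everyday use and most games the difference may be hard to notice.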

The only annoyance left with the system at this point is the stock CPU fan's whine (but with the stock averaging at 3000RPM I don't know what I was expecting). That should be remedied tomorrow, assuming I don't flub a new heatsink install (fingers crossed, wish me luck)!

Anyway, you're a memory guru, so you may want to take a look at that thread and the AMD document linked in it. It might be worth throwing something in your guide about this too -- I mean, every AM3 board theoretically has the problem, but Gigabyte seems to be the most popular choice and I'm sure someone else will eventually run into the same problems I have...
 

bilbat

Ahh - got all excited, thought "a new piece of (rare) AMD documentation!" D/L'd it, same one I quoted earlier - remember, "at least items 264, 278, 293, 295, 355, 370, and 379, affect memory function"? However, thanks much for the thought - I am always on the lookout for any tidbit of AMD info...