Stump the Tech - AMD system video errors

rdtsc

Distinguished
Dec 16, 2010
16
0
18,510
I sure hope someone more skilled than myself (into computers since 1983, electronics engineer) can give this a read, because it's stumped me for years. OK, a few years ago I bought an ECS micro ATX board, an AMD X2 64 4000+ BH-G1 Brisbane, and SuperTalent 2x 1GB non-ECC DDR2-800. (Bear with me.) Also 2x SATA 80GB drives in a RAID0, a CoolerMaster Real Power Pro 650W PFC supply, and a GeForce 8600GTS. Was on WinXP Pro, now Win7 Pro. Tried both 32-bit and 64-bit, same deal. Never overclocked any of it, and from the very first day, experienced similar problems. The issue is rather convoluted:

Windows will run 2D graphics without issue. And 3D graphics apps like games or whatnot run fine, as long as they are windowed. However, running *anything* full-screen (requiring a graphics mode change) will start the issue. And that is:
1. Random video "blackouts" - mouse cursor still visible and movable, audio playing, but nothing else seen. Happens maybe once every 6 hours on average but can go days without occurring. Can alt-tab out and back to a blacked-out screen. Sometimes partially-blacked out, and/or "cutting planes" exaggerated, and move with character. Exiting the game usually returns to desktop, upon which time navigating an explorer window (or doing just about anything really) may cause a "video display driver stopped responding" error, which Win7 can often recover from, but not always. Infrequently, "snow" also appears on the desktop, but usually clears itself up. Once any of these things manifest, system stability is significantly compromised, and a BSOD may occur doing almost anything, but never without provocation (User must click on something.)
2. Video "scrambling" and artifacts. More frequently, various parts of the video frame become scrambled, both textures and color. Vertices may be distorted in 3D. It may do it for a second, clear up, then stop for hours, or stay scrambled for a while, or do it a few times rapidly then crash. Once it does any of this, the chances of a "video driver stopped responding" or "video driver in infinite loop" BSOD dramatically increase, and one may occur at any time, without provocation.
3. Video "stuttering" and much increased time changing video modes. Alt-tabbing from a (working) fullscreen game, or experiencing any of the previous issues, might cause further alt-tabs to take an exorbitant amount of time, up to 30 or 45 seconds or more, to complete. During this time, the display flickers on and off about six times as it struggles to change modes. A BSOD can occur in this process.

Now this all sounds definable as a power issue, old drivers, or RAM... until I talk of what has been replaced in this system. EVERYTHING but the CPU, RAID, and RAM has been replaced with different components, yet the system exhibits the same results. I've tried everything from WHQL to Guru3D drivers, and update frequently. Even have a new monitor (only one, nothing fancy there.) Temps are good, voltages are good. Even under-clocked the CPU and RAM to half their rated speeds, toyed with their voltages across the safe range, tried running on each RAM stick separately, and nothing has EVER helped one iota. I don't have any other DDR2 RAM to test with, but ran memory tests for DAYS without a fault. Went through two mainboards, tried both onboard videos and the 8600GTS, all did the same thing.

The 8600GTS recently died, because the old ECS board had bad caps (tried replacing but lead-free solder proved to be a bitch to work with.) New board is a Biostar TA780G mini-ATX with ATI onboard vid... and get this, the ATI video driver crashes in place of the nVidia driver! Baffling.

At this point, I'm thinking of buying 2 Corsair 2GB DDR2-800 sticks, a better CPU (Phenom II x4 920 AM2+), and a GeForce GTS 450. I want to know what the culprit is that has been plaguing me all these years, and squash it like a little bug!

What do you think this problem could be, defective CPU? I can't find any links between these components and the stated video errors. Then again, defining the "error" is hard in itself. But over the years, I've logged many hundreds of BSOD's... sometimes a handful a day. Very annoying. The only reason this has gone on, is because I'm not an avid gamer. When I do play, I like to heal, so imagine how frustrating it is when the party healer goes linkdead multiple times a night... same with tanking.

Thanks for looking, have a great day.
Regards,
Mark
 

suteck

Distinguished
What refresh rate is the card set to, and what refresh rate is the monitor rated for? 60Hz? What resolutions are you trying to run at? Have you tried reducing them? Does your system meet the card's requirements:
Minimum of a 350 Watt power supply.
(Minimum recommended power supply with +12 Volt current rating of 22 Amps.)
Minimum 450 Watt for SLI mode system.
(Minimum recommended power supply with +12 Volt current rating of 24 Amps.)
An available 6-pin PCI-E power connector (hard drive power dongle to PCI-E 6-pin adapter included with card)

You might just not have enough system RAM to run those games. You said anything - does that mean if you opened Excel in full-window mode it would crash? Sorry to be so thick, I'm just trying to wrap my head around it.

You said you tried different cards, right? Were they all nVidia cards? You are running an AMD/ATI chipset on both those boards; have you tried an ATI card? Did it do the same thing with the onboard video on both boards? (Yes, I saw this: "with ATI onboard vid... and get this, the ATI video driver crashes in place of the nVidia driver!" But I'm asking about both boards.)
 

Wolfshadw

Titan
Moderator
Given all the changes that you've made and you're still running into the same issue, I'd almost guess (and it is just a guess) that you may have a dirty power issue. There's nothing wrong with your PC, but the power that comes into it is fluctuating too severely. Is the system plugged into a wall socket or surge protector?

Off the top of my head, I don't know what you can use to test this, but I'm sure there are tools available.

-Wolf sends
 

jack_attack

Distinguished
Aug 26, 2009
751
0
19,060
Awesome! Sounds like you've got a good one on your hands. By default, the GPU would sound like a culprit, but you've ruled that out.


I have worked through one with similar issues that did end up being a bad core on the CPU causing graphical issues, so it's always a possibility. If your board has a BIOS option of disabling multicore, give that a shot. Some do, some don't. I know, it sounds silly, but that's how I discovered the failing core.

How about signal cord? DVI, VGA? I'm not sure how that would cause a black screen, but that's about the last thing you haven't ruled out!

Have you breadboarded the machine?


Ah, you beat me wolf! I was just coming back to add, it might be dirty power from the start...
 

rdtsc


Hello, thanks for replying. Old monitor was a Panasonic S70, still works. New is a 19" LCD. On either, did desktop at 1280x1024 and 3D at 1024x768 @ 60Hz max. Using the old 8600GTS before it died, I could get about 50FPS in WoW at full settings. Using the current onboard ATI HD3200 and lowest settings, I get about 30FPS. Nothing was ever overclocked.

I've tried a generic 450W, 600W, and the stated 650W power supply, all seemed to have no influence. Never ran SLI or Crossfire. Used the 4-pin GPU power connector from the power supply when the 8600GTS was installed (didn't need to use an adapter, but tried one anyways, same result.) Also remembered to connect the four-pin power to the mainboard. :)


No, only when the graphics mode changes. Full-screening Excel is still using the desktop graphics mode; it has to be something that changes to a different video mode. I've tried many games, all of them will run well if windowed (running under desktop resolution), however will misbehave if run full-screen. Tried games in 800x600 a few times too, but that didn't seem to change anything.


The first ECS mainboard was an nVidia 6100 chipset, and had really poor onboard video. I tried that briefly, but it was too slow and suffered much worse from the same video issues. So I installed an nVidia 8600 GTS. The ECS died (capacitors failed due to bad electrolyte issue) and that must have damaged the VRM. I tried to replace the capacitors (successfully repaired an Iwill board a few years prior.) But it wouldn't P.O.S.T.

So I swapped the ECS for a new Biostar TA780G board with onboard ATI HD3200, and was surprised the 8600GTS still worked, but that was only for a short while. This too suffered from the same video limitations. Then the 8600 GTS croaked. (I believe it was weakened from the ECS failure.)

So removed that, and reverted to the Biostar's onboard vid temporarily. Even using that, the same problem still exists, now it just crashes in atixxxx.sys instead of the nvidia driver. So it doesn't seem to matter what configuration was used... it's like the issue has nothing to do with video at all.

Two other things I forgot to mention:
1. I've kept up on BIOS updates and whatnot, including the 8600GTS video BIOS. (Nothing ever seemed to make a difference.)
2. With the video "stuttering" that occurs when alt-tabbing, the audio loops repeatedly (echo) and the entire system is affected -- for instance, I can look on the back while it is happening and watch the NIC Tx/Rx light freeze multiple times.

This video stuttering and echoing likes to happen frequently (repeatable) with the free game "Vindictus." In the starting town, there is a marketplace. Accessing it brings up a window in-game that somehow uses Internet Explorer to render content from a webpage. (The marketplace is a webpage.) I know this because several script errors on that page have popped up the familiar-looking IE scripting error dialog. In any case, accessing the "marketplace" always causes about 10-15 seconds of stuttering, and again when closing it. Very odd.
 

rdtsc


Interesting! It's plugged into the wall. Hmmm, come to think of it, there is both a refrigerator from 1972 and a chest freezer from before that, both within 20 feet of this location, and I know their motors are getting old and cranky... perhaps that could be it?

Wow... It should be possible to detect such a spike at the PC when an "event" occurs. I'll see if I can do that.
 

suteck

This is an online game? "Vindictus"? What are your upload and download speeds? The video might be stuttering in this instance because of lower-than-required bandwidth. Have you had a chance to check the power source yet, as Wolfshadw suggested? One thing you can try is a software-based stress-testing program. Download OCCT HERE and test the different components. I would test the CPU, GPU and PSU and see what happens. I saw that jack_attack suggested it might be a CPU core; this will help determine whether that is the culprit, since that is a likely cause in your case.
 

rdtsc

Yes suteck, vindictus.nexon.net

Running some OCCT tests now, thanks for the link, things are working normally so far. I've run hours and hours of SiSoft Sandra tests in the past, with inconclusive results. But OCCT looks really robust, maybe it can find something Sandra didn't.

Currently ironing out a testing methodology for analyzing the power coming into the PC. Using a digital-storage oscilloscope, will probably start testing with a +/- 2.5% peak detection trigger, also considering methods for triggering on frequency deviation (60Hz reference --> PLL --> error amplifier?)
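For anyone who wants to try the same idea on captured samples instead of in hardware, here is a rough software sketch of the two triggers described above: a +/- 2.5% peak window and a frequency-deviation check against the 60Hz reference. It is not the scope setup itself. The function names, the 120V RMS nominal, the 10kHz sample rate in the usage note, and the 0.5Hz frequency tolerance are all assumptions made for illustration, and the zero-crossing timing stands in for the PLL approach.

```python
import math


def rising_zero_crossings(samples, sample_rate):
    """Return interpolated times (seconds) of upward zero crossings."""
    times = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Linear interpolation between the two samples around the crossing.
            frac = -prev / (cur - prev)
            times.append((i - 1 + frac) / sample_rate)
    return times


def cycle_report(samples, sample_rate, nominal_peak,
                 nominal_hz=60.0, peak_tol=0.025, freq_tol_hz=0.5):
    """Return one (peak_volts, freq_hz, flagged) tuple per full mains cycle.

    peak_tol is the +/- 2.5% window from the post; freq_tol_hz is an
    assumed value, since no frequency tolerance was stated.
    """
    crossings = rising_zero_crossings(samples, sample_rate)
    report = []
    for t0, t1 in zip(crossings, crossings[1:]):
        first = math.ceil(t0 * sample_rate)
        last = math.ceil(t1 * sample_rate)
        peak = max(abs(v) for v in samples[first:last])
        freq = 1.0 / (t1 - t0)
        flagged = (abs(peak - nominal_peak) > peak_tol * nominal_peak
                   or abs(freq - nominal_hz) > freq_tol_hz)
        report.append((peak, freq, flagged))
    return report
```

Usage would be something like `cycle_report(samples, 10_000, 120.0 * math.sqrt(2))` on a list of voltage samples exported from the scope. Real captures are noisy, so some filtering or hysteresis around the zero crossing would probably be needed before trusting the per-cycle frequency.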

Ordered a Phenom II x4 940 Black CPU (max this AM2+ board supports), 2x2GB of Mushkin Silverline DDR2-800 heat-spreader RAM (good reviews, fastest this board supports), and an nVidia GTS 450 1GB (best bang for the buck, and IMO nVidia drivers are better than ATI.)

One way or another, I'm gonna get to the bottom of this! :)
 

rdtsc

Actually the OCCT graphs are showing an interesting anomaly already. On the Power Supply test, CPU1 is consistently running 10°C hotter than CPU2, at a peak of 49.5°C, compared to 39.25°C for CPU2. This is of course, one physical processor. No crashes have occurred yet, but I had no idea the first core was running that hot, if this is indeed the real temperature. CPU1 also seems to have a much wider temperature deviation. Interesting.


 

rdtsc


I aborted prematurely due to time constraints. Ran a 3hr fullscreen 1024x768 test after with similar results.

Keen eye, friend! Goof on my part, it is a Biostar as the graph shows. I've been looking at Gigabyte boards lately, so must have had that stuck in my head. Correcting the earlier posts now.

As for the dips to 50% loading on both cores, I've always thought this was a little strange, but just assumed it was particular to this specific make and model of processor.
 

suteck

Mine drops down to 30% where yours are only dropping to 50%. Don't know if that's because I have 4 cores or what. My other machine does it also, but it's the same type - i7 965, but it's an engineering sample, and they're both Extreme Editions.
 

rdtsc

Update, I can't seem to pinpoint incoming power as having any correlation. I can see some spikes and whatnot, but that's about it. Considering monitoring the board power instead... would be a time-consuming task, only two scope probes at a time.

I'd received my new CPU, RAM, and video card. This mainboard said "125W CPU capable" up to a Phenom II 945, so I bought a Phenom II 940 Black 125W die assuming it would work, and guess what... the board only supports one 125W processor, and this isn't it... So still running on the old CPU. Anyone want to buy a new HDZ940XCJ4DGI? /sigh

But I replaced the cheap ram with the 4GB of Mushkin 991557 RAM and installed the GTS 450. Things have been running better, played some games and whatnot with reduced problems and no BSOD's. I say "reduced" because there was still the same "stuttering" when opening the marketplace in Vindictus (even with new nVidia drivers and the GTS as opposed to the board's built-in ATI HD3200 I was using prior.) Alt-tabbing still behaves somewhat strangely, but seemed to be cooperating much better.

I've learned from this experience never to rush into judgement, which is why I've been silent for a while, waiting to see what would eventually happen... well, just a little while ago the screen went black several times, the GPU fan spinning up to max, and Win7 eventually responding with an "nVidia driver stopped responding, but system was able to recover" message. Things were quite unstable at this point so I (soft) rebooted. Startup seemed fine, and I tried to start WoW, and was welcomed with a scrambled log-on screen. Polygons all over the place.

Ran a disk check, no errors on that (or any other) drive.

Gonna hard-boot and try this again...

Guess this really is the issue that will never be fixed... it's a curse!
 

suteck

Won't the company you bought it from take it back as an exchange? I see a whole list of compatible ones, they ought to let you trade for one. Given how you've changed out everything else and still have the same problem, a bad CPU, as jack_attack suggested, might have been the problem all along. Other than that, I'm as stumped as you. But that's the most likely cause at this point, seeing as this is the last component.
 

jack_attack

Returning CPU's can sometimes be a wash. I know Microcenter won't let you, even if it's 100% unopened. They're funny like that. You could always buy a little Sempron and see if that's a fix, then go from there. They're under $50 everywhere.
 

rdtsc

Got the parts from Newegg, and they are picky about CPU's. Trying to negotiate a return RMA with them, but doubt they're gonna allow it. Prolly end up selling it on eBay or something.

P.S. Last night, a hard-reset (via reset pushbutton during P.O.S.T.) STILL would not fix all the polygon-misplacement issues... it took an actual power-off, wait 5 minutes, and power-on to "clear" whatever funk was going on. But now it seems better again. For now. Can still tell something is fishy though.

I bet you're right, it must be the CPU. With the difference in core temps, I was wondering if it could be some kind of thermal migration, internal thermal stress, or some other physical defect or anomaly inside the die itself which manifests the issue.

Curious, anyone know, are "cores" stacked on top of each other, such that core0 is closer to the heatsink and thus warmer, while core1 is closer to the socket, and thus cooler? I assume they are all fabricated on one planar, single-layer chunk of silicon, but soon 8 and 12-core versions will be coming out. Seems to me, "stacking" cores would be great to save space, but a nightmare when it comes to thermal gradients.
 

jack_attack

Nope, they're next to each other. The temp differences you see are because core0 gets the most instructions. If there was a way to give a CPU truly 0 work, they'd all be the same temperature. My core0 usually is 2-4 degrees warmer than my others.

It's strange though that a reset fixes it, but almost certain that's your issue. Semiconductors do "break" over time, but it's varied and not very predictable.
 

rdtsc

This CPU has exhibited this behavior from the moment I got it. Was my first "ECS Mobo + Athlon X2 64 4000+ CPU + Supertalent 2x1GB RAM" combo purchase, and they installed the CPU and cooler (didn't give an option not to.) Think I even put in the comments that I wanted to do it, but it came installed, so perhaps the damage occurred then. Of course it was so obscure of an issue, I assumed it was drivers or something until way after the return period expired.

Another artifact manifested itself today, something new... tried to start WoW in its normal 1024x768 fullscreen mode (was working previously), and now it's a 1024x768 screen in a letterboxed 1280x1024 desktop resolution. (For some reason, it is refusing to switch video modes into 1024x768, so just displays that size in the center of the screen.) Tried auto and manual-configuring the LCD monitor, assuming it "lost" the screen dimensions or something, but it insists the display is 1280x1024. Soft reset, hard reset, power-down: no effect, repeatable. /sigh

Working on getting a new (correct) CPU and selling the (wrong) one. Will update more then. Thanks for following. :)
 

rdtsc

Update... Bought a Phenom II x4 945 (95W TDP, 3GHz, biggest CPU this board supports) and installed it... and am experiencing repeatable freezing issues. Both underclocking and over-volting the CPU/Chipset/HT/RAM seem to have zero effect on hard freezes...

Since the CPU jumped from a 65W to a 95W but the cooler remained the same, I had to bump up the fan settings to bring down the temps from 45-60°C to 30-50°C. (Only hit 60°C for a few seconds.) CPU is rated for 71°C. Using the latest BIOS.

Often it will freeze right after entering the Windows user password. If lucky enough to get to the desktop, it seems to work alright, but refuses to run any games. Once I was able to get Vindictus to run for about 5 minutes, by underclocking from 3GHz to 2GHz. Even then, the same "stuttering" effect was briefly seen in the game before it froze.

I'm thinking not enough power. Thought about load balancing; PSU has two 6-pin graphics power cords (for SLI), so I shut down and swapped which one was going to the GPU, reboot, no change. PSU purports "One +12V rail." Now wondering if there is any way I can run a second supply for the GPU, RAID, floppy, and DVD drive temporarily, just to try and see if this helps.

According to http://www.coolermaster.outervision.com/PSUEngine I need at least a 361W supply for my config, although this seems artificially low. Also, they don't seem to offer the iGreen 600W (700W max) PSU any longer, and finding their page on it was difficult: http://www.coolermaster.com/product.php?product_id=39

It gets good reviews, can't dig up much dirt on it:
http://www.viperlair.com/reviews/cases/coolermaster/psu/iG600/

But could it still be power if both underclocking and/or over-volting didn't help? (I mean, underclocking to 1GHz and it runs at 25°C, so is drawing very little power, and it still freezes?)

See, this system is cursed I tell ya!

3GHz, over-volted 0.1v:
2011011319h59cpu1.png


3GHz, regular voltage:
2011011400h05cpu1.png
 

suteck

Is your PSU supplying enough for the video card (minimum recommended power supply with +12 Volt current rating of 22 Amps)? I see the iGreen Power you're looking at only has 16A on the biggest rail. The review your link takes me to says - "The iGreen Power also has triple +12v rails (peaks of 16A, 14A and 8A respectively). Additional rails will aid in keeping a system stable as you can separate devices based on power consumption into the rails of your choosing. This is extremely important these days with water cooling, and multiple video card setups." So I'm wondering if your current PSU is supplying enough juice to the video card. I know I mentioned this in my first post as a suspect, and reading back through the posts I don't see whether you checked that or not. Are you using the CoolerMaster Real Power Pro 650W? Because I found this on that unit -

AC INPUT: 115/230V ~ 10/5A 60/50Hz

DC OUTPUT       +3.3V   +5V     +12V    +12V    +12V    -12V    +5V
MAX LOAD (A)    25      25      19      19      19      0.8     3.5
MAX POWER       191W (+3.3V and +5V)    540W (all +12V)         9.6W    17.5W
TOTAL           650W

HERE'S A LINK in case it gets screwed up when it posts. Just scroll down to almost the bottom of the page.

I'm not sure if it means anything, but this also looks like it doesn't supply enough Amps to the video card. I know it has PCIe 6-pin card connections on it, but they don't seem to be supplying enough power.
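To put numbers on that, here is a quick arithmetic sketch using only the figures from the label above (19A per +12V output, 540W across all +12V) and the 22 Amp minimum quoted for the video card. The rail names and function names are made up for illustration, and the thread does not say whether the 22A figure applies per rail or to the combined +12V capacity, so the sketch reports both readings rather than picking one.

```python
# Figures copied from the PSU label quoted in this post.
RAIL_LIMITS_A = {"+12V1": 19.0, "+12V2": 19.0, "+12V3": 19.0}  # rail names are assumed
COMBINED_12V_WATTS = 540.0

# Minimum +12V current quoted for the video card earlier in the thread.
CARD_MIN_12V_AMPS = 22.0


def combined_amps(watts, volts=12.0):
    """Convert a combined power limit into amps at the given voltage."""
    return watts / volts


def rail_summary(rail_limits, combined_watts, required_amps):
    """Compare the requirement against the combined limit and the largest rail."""
    total = combined_amps(combined_watts)
    largest = max(rail_limits.values())
    return {
        "combined_12v_amps": total,
        "largest_single_rail_amps": largest,
        "meets_requirement_combined": total >= required_amps,
        "meets_requirement_single_rail": largest >= required_amps,
    }


if __name__ == "__main__":
    summary = rail_summary(RAIL_LIMITS_A, COMBINED_12V_WATTS, CARD_MIN_12V_AMPS)
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Read as a combined rating, 540W at 12V works out to 45A, which clears 22A. Read per rail, 19A falls short. Which reading is right depends on how the card maker defines the requirement and how the PSU splits its +12V outputs.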

Something like THIS or THIS. And THIS ONE is a good one. Just to give you an idea.