(There are actually countless forums out there with threads like this, including at least 4 threads in our own forums. I'm concentrating on the AMD forums because these guys, between them, have collected nearly 40 pages of possible causes involving everything from Windows, to mobos, to RAM.)
So far it's unclear what is causing the problem. Users report that grey, brown or colored stripes or screens appear while playing games, watching movies and, in some cases, while idle. The problem seems to be confined to the HD 5xxx series, although there are a couple of mentions of 4xxx cards.
Poster Jogob9 says:
There are 3 big categories of problems:
#1 is people experiencing 2d crashes: to these people, a good fix that seems to be working very well for most is to set your idle clocks higher (most suggest 400MHz for core and 900MHz for memory, but any value between that and 725/1000 should, in theory, be fine).
#2 is people experiencing 3d crashes: to these, it's a little bit more complicated. A lot of people had success by setting the voltage higher, or downclocking the core and memory for more stability. It certainly is a more complicated problem than the 2d crashes.
#3 is people (like me...) getting both. In this case, as far as I know, the only thing you can do is severely downclocking your core and memory in order to get more stability, but still... there are crashes (less often though...)
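For readers trying to place their own system in one of these three categories, the logic can be sketched in a few lines of Python. This is only an illustration of the poster's summary: the function names and the workaround strings are our own, and the clock window simply restates the 400/900 to 725/1000 MHz figures quoted above. None of it is an official AMD recommendation.

```python
# Idle-clock window quoted by the poster, in MHz: from 400/900 up to 725/1000.
SUGGESTED_IDLE_CORE_MHZ = (400, 725)
SUGGESTED_IDLE_MEMORY_MHZ = (900, 1000)

# Workarounds as summarized in the forum post, keyed by category number.
# These are forum suggestions, not confirmed fixes.
WORKAROUNDS = {
    1: "raise idle clocks",
    2: "raise voltage, or downclock core and memory",
    3: "severely downclock core and memory",
}


def classify(crashes_in_2d, crashes_in_3d):
    """Return the poster's category (1, 2 or 3), or None if nothing crashes."""
    if crashes_in_2d and crashes_in_3d:
        return 3
    if crashes_in_3d:
        return 2
    if crashes_in_2d:
        return 1
    return None


def idle_clocks_in_suggested_window(core_mhz, memory_mhz):
    """Check whether idle clocks fall inside the window the poster quotes."""
    core_lo, core_hi = SUGGESTED_IDLE_CORE_MHZ
    mem_lo, mem_hi = SUGGESTED_IDLE_MEMORY_MHZ
    return core_lo <= core_mhz <= core_hi and mem_lo <= memory_mhz <= mem_hi
```

For example, `WORKAROUNDS[classify(True, False)]` returns the idle-clock suggestion for someone who only sees crashes on the desktop.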
Several have complained to ATI and received word back from the company that the problem is a result of a Windows 7 update. Pasted below is the response one user received after filing a complaint about his HD 5870:
"Thank you for your feedback. We are aware of this issue, and it has to do with Windows 7 update. We are working on a solution for this problem.
In the mean time we recommend you do a clean install of the Graphic card driver by removing all ATI and or other Graphic card software from Windows Control Panel> Program and Features in safe mode. From feedback on our forums some people have successfully solved the issue by doing this.
In order to update this service request, please respond, leaving the service request reference intact.
AMD Global Customer Care"
However, posters don't seem convinced. One user (roadhead) said he first had to do a fresh install of Windows 7 x64 Ultimate to get his card's driver installer to stop crashing, proving that a clean install can have the exact same issue. Further, the poster on the receiving end of the email above (Fl00D) said:
"This Win7 update thing seems rubbish to me... several people solved the issue by raising the voltage, then I think it has nothing to do with Win7 update..."
Windows Vista and XP users are also experiencing problems, so writing it off as a Windows 7 issue is definitely inaccurate.
Users with AMD systems think it could be a memory addressing bug, as changing the RAM from unganged to ganged mode seems to help. One poster says, "I think a lot of the problems people are having is to do with their system memory! Most people think we are crazy saying that!" Another says:
"Well that's peculiar... I have never had problems with my system memory and I have also tested it not only by memtest86+ (at least 4 passes) but by Prime95 Blend test which is a lot more demanding. My old Giga mobo could not handle the memory at 1600MHz and every time I ran Prime95 I got errors. Now with my new msi mobo even at 1600Mhz everything is absolutely prime95 stable. So no CPU rounding errors, no RAM errors. I suppose if setting the RAM to ganged mode REALLY solves the problems then there must be sg wrong with the drivers/Win updates etc etc. Maybe some sw bug which only occurs when memory is set to unganged."
Yet more posters (including our tipster, jogob9) suggest it could be a mobo issue because, although there are people using the same CPU and same card, not all of them are experiencing the problem.
Listed below are six possible causes that jogob9 gleaned from the problems he and everyone else are having:
#1 - Bad cards:
Some people who RMA their cards got new functional cards; which might mean that there is a huge amount of bad cards on the market. --- I think it is wrong, because as I have said before, a LOT of people did not have problems one day, and next it was hell.
#2 - Bad system alchemy:
It is very possible and has happened often in the past that simply, some parts are not meant to be together. And as you can guess, the graphics card is usually the girl: causing problems with the guy (motherboard) and his deficient brother: the PSU. --- For the same reason than with #1, I don't think it is the problem.
#3 - Memory problem:
A few people, including myself; have noticed improvement in system's stability by changing the "Ganged" feature on the BIOS, or by removing memory chips. It certainly is possible that the problem is memory related, but I think I can safely say that we have tried everything that possibly can be done with a memory chip, and it did not work!
#4 - OS problem:
Some think that the problem might be related to the OS used, which could make sense... But it has been tested by myself and others on: XP x86, XP x64, Vista x86, Vista x64, 7 x86 and 7 x64. Results: x86 versions seems a bit more stable in 3d, but crashes more often in 2d. XP is slightly better than Vista and 7, where I saw absolutely no difference. I tested each of these WITH windows updates done, and WITHOUT; without any change. --- In ALL case, the problem persists, so I doubt it is the OS.
#5 - Voltage problem:
As I said before, some people got their system fixed for the moment by tweaking the voltage settings. It is possible that drivers included a voltage drop setting, perhaps to consume less energy, but it turned out that it made the system unstable. I believe it is a probable cause for the issue, but I still have doubts, because It did not change a thing for a lot of people, including myself.
#6 - Sensor problem:
I have not seen anyone talking about it yet, but as you can read in the forums, I have noticed that sensors indicate that my card consumes INSANE amounts of power when loaded. Numbers that are so insane in fact that it becomes ABSOLUTELY certain that sensors are doing something wrong.
My first theory about it is that maybe sensors get insane values and try to "slow" the card, in order to get normal values. These major changes in voltage, core clock and memory clock can easily destabilize a card to a point where you get artifacting and even a lockup. This would also explain why we are experiencing so BAD performance. I have read everywhere, and I also have myself VERY BAD FPS in benchmark and games, FPS drops when playing, short freezes under windows, etc. When a graphics card downclocks itself (overclockers will know what I'm talking about) this is exactly what happens. --- What I like about this theory is that it FITS to the problem perfectly. It explains all kinds of bugs we can get! And even better, it is VERY easy to fix! If it is indeed the problem, ATI just has to check and repair sensors in the drivers, which takes very few time!
My second theory is that the card does consume a lot of energy. Of course, not 4000W, but enough to trigger the same safety protocols as in theory #1, leading to the same effect. This can be caused by a corrupt function in drivers; for example: shaders, or memory managing. The corrupt function doesn't work, so it loops and tries again, but doesn't work, and tries again, and etc... Consequently, the card would need much more energy, because the GPU would be busy trying to do this non-working thing, and there we go. --- As in theory #1, this fits very well, but it would also mean that it could take a LOT of time for guys at ATI to find what part of the drivers cause the problem...
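To make the sensor theory a little more concrete, here is a rough Python sketch of the kind of plausibility check that would separate believable power readings from obviously broken ones, such as the 4000W figure jogob9 mentions. The function name and the 400W ceiling are our own assumptions, chosen purely for illustration; they are not taken from any ATI driver or card specification, and nothing here reflects how the drivers actually handle sensor data.

```python
def split_power_readings(readings_watts, ceiling_watts=400.0):
    """Split readings into (plausible, suspect) lists.

    ceiling_watts is an assumed upper bound for a single graphics card,
    picked for illustration only; it is not an official figure.
    """
    plausible, suspect = [], []
    for watts in readings_watts:
        # Negative values or values above the assumed ceiling are treated
        # as sensor errors rather than real power draw.
        if 0.0 <= watts <= ceiling_watts:
            plausible.append(watts)
        else:
            suspect.append(watts)
    return plausible, suspect
```

If the first theory is right, a driver that throttles the card based on the suspect readings would be reacting to values that should have been discarded in the first place.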
Long story short, no one has a clue what's wrong; people are RMAing their cards and getting a second faulty card back, while others are getting replacement cards that work just fine. One poster on the X-Treme Systems forums says NewEgg had no problems giving him a new card and suggests it's because the retailer knows there is a problem with the cards.
"Just sent my second XFX 5870 back to newegg today and theyre giving me a full refund for it. i let them know that many people are having major issues with these cards and they clearly must know it too. not sure what ill do when i get my money back, either buy a gtx 295 or just keep going with my trusty 4870 lol"
We contacted AMD and received the following response on Tuesday:
"The answer I have received so far on this is that we are aware of forum posts relating to this issue. As with any issue of this kind, we are testing to determine under what conditions the issue manifests itself, at which point we will be able to determine how to fix the problem if it is related to the graphics card or driver."
We are still awaiting further details from AMD, which we will post on Tom's Hardware as soon as we get them.
If you've read this far, congratulations! It's a long news post but we feel it's something our readers need to be aware of.
Big thanks to both Kewlguy and Jo_gobeil!
[UPDATE] So after huge amounts of forum trawling, we're seeing the following model numbers crop up again and again: 4770, 4850, 4870, 5770, 5850 and 5870. The cards seem to be coming from all different companies; the following are the ones we've seen crop up more than a few times from different users: XFX, Sapphire, Diamond, ASUS and HIS.