Very vexing problem... Tbird 1.4 ghz / Abit KG7

Guest
First off, my specs:

AMD Thunderbird 1.4 ghz AYHJA core CPU
Abit KG7 (no raid) running 4J (latest bios dated 8/10/01)
512 MB PC2100 Crucial DDR
IBM Deskstar 60 GXP
Elsa Gladiac Geforce 2 GTS 32 MB
SB Live 5.1 Value
3Com 3C905-BTX NIC
300W power supply (Tornado 1000 model case)

Nothing tricky enabled in the BIOS.

Two OSes on separate drives:

Win98SE
WinXP RTM (2600)

The problem:

I'm a pretty good troubleshooter, and this has got me stumped. I've had this machine up and running for about 1 to 2 weeks with no trouble at all. Then last night, I was doing some tweaks to my WinXP setup and decided that they weren't working, so I went and returned to a restore point. They weren't anything special, just some MTU / RWIN tweaks for my network card, so I can't bark up that tree. I rebooted, and my system was restored fine, except that I had a ! on my SB Live. Hmm, that's odd, I thought -- but I knew that there were some issues between WinXP and the SB Live 5.1 so I figured it was just an intermittent thing. I rebooted, and everything was cool. At some point later that night, I rebooted again and I got a blue screen Windows STOP error on boot:

"Windows has detected that your motherboard is not fully ACPI compliant. Please visit www.hardware-update.com for updates blah blah blah.... If this is the first time you've seen this error, reboot your machine and try again. If you see it again, then (some stuff, basically like -- you're screwed)."

Huh? I'd been running this thing fine for a couple weeks with no trouble. So I figure that something's up with the soundcard so I take it out and try again. Same error. VERY weird. I shut down, pull my XP drive, and switch over to my Win98 drive.

I get THIS error when booting Win98:

"While loading device MTRR: Windows protection error. You need to restart your computer."

Now I'm really hot. WTF is wrong with this thing? I restart and try to boot into Win98 safe mode. It makes it through. I'm like, hmm, maybe it's something it's loading that it doesn't like? I pull the SB drivers in safe mode, reboot, and try again. This time I don't make it in. On subsequent tries it would look like it was loading into safe mode and then just reboot in the middle of Windows loading. All my attempts to get a boot log failed too, because somehow the log just isn't getting written before the machine errors out / reboots itself. I'm frustrated by this point. Could this be heat related? Not enough voltage to the CPU maybe? I check the heat -- only 45 degrees Celsius. That's not bad. I have a Thermaltake Volcano cooler on it. So it doesn't seem to be heat. I up the voltage a tad. Reboot. Same deal. I up the IO voltage a tad. Same thing again. I reset the voltages and load the BIOS "Fail Safe Defaults" -- setting the CPU to 1050 (10.5 x 100), AGP 2x, no DMA, blah blah blah.

Success! The machine reboots fine into Win98. I begin to wonder if it's a fluke. I reboot 10 more times to make sure. It makes it in fine each time. Now I'm beginning to wonder what part of my system is flaky enough that I can boot fine at 1050 but not at 1400. My first thought is the memory is screwy -- but then I check the BIOS and see that the memory timings and stuff are all the same in Fail Safe as they were in my regular BIOS setup -- the only thing that's different is that the bus is running at 100 mhz instead of 133. Curious, I go in and change all of my items in the BIOS back to normal EXCEPT for the CPU speed. I set AGP back to 4x, re-enable DMA modes, IDE prefetch, etc -- basically turning all the good stuff back on. I reboot. Success again! Everything is working fine. Now I'm puzzled.

I go back into the BIOS and start playing with CPU settings. Because I have an AYHJA, I can change the multiplier on the CPU to test. I change to lower mhz ratings, all with 133 as the FSB. I had no luck with any of them -- every single one gave me the same MTRR error on bootup. Then I start toying with multipliers and 100 mhz busses -- and everything works fine. I am able to boot into 13x100 okay a couple of times (it crashed once later in the night though).
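The clock arithmetic behind these experiments is just multiplier times bus speed. A minimal Python sketch, using only the three settings spelled out above; the function and variable names are made up for illustration, and 133 is the nominal BIOS figure (the real bus is closer to 133.33 mhz, which is how 10.5x lands on 1400):

```python
from collections import defaultdict

def core_clock_mhz(multiplier: float, fsb_mhz: float) -> float:
    """Core clock is simply multiplier times front-side bus speed."""
    return multiplier * fsb_mhz

# (multiplier, FSB in mhz, booted OK?) -- only the settings named in the post.
observations = [
    (10.5, 100, True),   # BIOS fail-safe defaults
    (13.0, 100, True),   # booted, though it crashed once later that night
    (10.5, 133, False),  # rated speed: MTRR / ACPI errors
]

def boots_by_fsb(obs):
    """Group boot results by bus speed to see what the failures track."""
    grouped = defaultdict(list)
    for multiplier, fsb, ok in obs:
        grouped[fsb].append((core_clock_mhz(multiplier, fsb), ok))
    return dict(grouped)

for fsb, results in sorted(boots_by_fsb(observations).items()):
    print(fsb, "mhz bus:", results)
```

Grouped this way, 1300 mhz on a 100 mhz bus boots while the 133 mhz bus fails, so the failures seem to follow the bus speed rather than the core clock.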

By this point I'm beginning to wonder if heat is indeed a factor, so I shut the machine off and go to bed. I wake up this morning and immediately switch back to my XP drive and give it a shot at 1400. No dice, same ACPI blue screen error. Doesn't seem to be heat related, because the CPU's at 26 degrees and I'm still getting the same problem.

I reload the fail safes and boot fine into WinXP at 1050 mhz, which is where I am writing this now.

To put it mildly, I don't know where to point the finger. I don't have any spare parts to test with unfortunately so I'm unable to isolate the point of failure.

What do you think?

Is it the ... CPU? Strange that this thing worked fine for a couple weeks and all of a sudden any attempt to run the thing at its rated speed makes it cry?

Is it the ... KG7 motherboard? I think this might be -- it's strange that I can run my computer fine at multiples of 100 mhz but not 133... indicating that there might be some kind of problem on the board that prevents correct operation at 133.

Is it the ... memory? I don't know ... PC2100 from Crucial seems pretty good to me. Rare that they'd have a bum stick, but I suppose it's possible... It's funny, because with errors like these (weird boot errors at rated bus speeds) the first thing I would USUALLY point to is the memory, but I've disabled Quick POST and had it test itself like 3-4 times with no problems there. I also ran a Sandra memory benchmark and it didn't lock up. I suppose it's possible, but I tend to doubt it's the memory with quality like Crucial.

Is it the ... power supply? I've had this 300 watter for a while and it's always done me right, so again, I doubt it. I don't have an unusual amount of stuff in my box to power, so...

Someone mentioned something in another thread here about possible incompatibilities between the 1.4 and KG7? Is this true?

As you can see, I'm at my wits' end. Any help would be MOST appreciated.
 
Guest
Update:

I checked my WinXP system event log, and I have this all over it, even from when my machine booted fine:

AMLI: ACPI BIOS is attempting to read from an illegal IO port address (0xcfc), which lies in the 0xcf8 - 0xcff protected address range. This could lead to system instability. Please contact your system vendor for technical assistance.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
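For what it's worth, ports 0xcf8 through 0xcff are the standard PCI configuration ports (address register at 0xcf8, data register at 0xcfc), which would explain why XP guards them. A quick Python sketch of the range check the message describes; the names are made up for illustration, and this is not how the ACPI driver itself is written:

```python
# Bounds of the protected I/O range quoted in the event log message.
PROTECTED_FIRST = 0xCF8
PROTECTED_LAST = 0xCFF

def in_protected_range(port: int) -> bool:
    """True if the I/O port falls inside the protected range (inclusive)."""
    return PROTECTED_FIRST <= port <= PROTECTED_LAST

# The port the ACPI BIOS was caught reading from.
print(hex(0xCFC), in_protected_range(0xCFC))
```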

I don't know how this would explain the inability to boot in 133, but ... it's something else to think about.

Thanks!
 

phsstpok

Splendid
Dec 31, 2007
You might try Paul's Unofficial Abit KG7-RAID and KR7-RAID FAQ: http://www.viahardware.com/faq/kg7kr7/kg7kr7faq.htm

Would you like a Quarter Pounder?
No, thank you. Just give me the BIG heatsink. It's an Athlon.
 

MeTaLrOcKeR

Distinguished
May 2, 2001
I had the same experience as you.....but the difference was I was using a Duron 750MHz CPU and an A7V133...and it only happened when I was trying to overclock it.......

I might be way off, but I know it's possible that you could indeed have the 1.4GHz part but the one that isn't rated at 133MHz FSB.....even though it SHOULD work fine @ that......I recommend taking off your HSF and checking the core of your CPU to see what specs are written on it...yes, I know that still doesn't explain why it ran fine for 2 weeks.....but in a way it does...it just...gave up running out of spec (assuming it is a Type 'B' 1.4 T-Bird).
The other possibility, even though you seem to disagree, is your PSU.....I know you've had it for a while...but is it an AMD approved unit?? Keep in mind it's not the wattage that really matters for Athlons.....it's the amperage on the voltage rails (+5 and +12 if I'm not mistaken). Besides, get a new 431 Enermax anyway....decently cheap...and definitely good for future upgrades....a 1.4 T-Bird sucks I think 74 watts of power.....not 100% sure about that one, but it's somewhere around there.......so it could indeed be that......
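To put rough numbers on the amperage point: current is just watts divided by volts. A small Python sketch using the 74 watt figure quoted above, which is itself unconfirmed; the function name is made up, regulator losses are ignored, and which rail actually feeds the CPU depends on the motherboard:

```python
def amps_required(load_watts: float, rail_volts: float) -> float:
    """Current a rail must supply for a given load (I = P / V), ignoring losses."""
    if rail_volts <= 0:
        raise ValueError("rail voltage must be positive")
    return load_watts / rail_volts

CPU_WATTS = 74.0  # the unverified estimate quoted above for a 1.4 T-Bird

# Compare what the same load would demand from each rail.
for rail in (5.0, 12.0):
    print(f"+{rail:g}V rail: {amps_required(CPU_WATTS, rail):.1f} A")
```

So two supplies with the same total wattage can differ a lot in whether a single rail can carry the CPU.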

BTW........in your BIOS try disabling all your ACPI features...I'm pretty sure it's possible...see if it makes a difference...... =)

-MeTaL RoCkEr
My Z28 can take your P4 off the line!
 
Guest
Yeah I've never even tried to overclock this thing. I'm going to take off the HSF and check the part... I ordered it from newegg.com so hopefully if it is indeed the wrong CPU then I can return it.

I'll list the ID on here when I do it, but damn, the HSF is hard to take off once it's on...

I'll think about getting a new Enermax regardless.

Actually it's not possible to disable ACPI in the bios on this particular rev. I can use modbin to edit the feature and make it visible, but with an already unstable machine I'm not sure if I want to go that route. I might just flash to an earlier rev and see if that clears up the problem.

I'm beginning to believe it's a motherboard issue rather than a CPU issue, but I really can't be sure.
 

Kelledin

Distinguished
Mar 1, 2001
It's also possible that the memory is just not stable at 133MHz. Just out of curiosity, what brand/type of memory do you have? How does it behave at CAS2.5 v. CAS2?

Kelledin

"/join #hackerz. See the Web. DoS interesting people."

P.S. D'oh...just noticed the memory brand...sorry 'bout that. What about CAS latency, though?
 

Schmide

Distinguished
Aug 2, 2001
This may sound nitpicky, but did you check your CPU temps under load? Idle temps mean nothing. An HSF has to dissipate enough heat to keep the CPU within its temperature limits under load. I’ve put together machines that are fine for a while, then a bit of the thermal paste drips away and it can’t handle the thermal load. You should have a very, very small amount between the CPU and HS. What type of goo are you using? Arctic Silver? Zinc oxide? Thermal tape is useless for anything over 1ghz. You won’t believe how quickly you can pull a plug when you start up the Sandra CPU benchmark and watch the temps climb past 60c.

PS: clean your surfaces with alcohol and do not get any finger grease on the components. Forehead grease is good for beer foam. Use an inert object like plastic or cardboard to spread the goo. Wash your hands, as Arctic Silver is not good for human consumption. You can use the zinc stuff on your nose but that would be a waste of money.

Schmide
 
Guest
I think I've found the culprit, and of course it figures it's the item I least suspected.

It was a stick of my Crucial PC2100 DDR causing the problem. I pulled every component from my rig except for 1 stick and it errored again. I switched that stick for the other one and presto, everything worked!
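The one-stick-at-a-time test generalizes to any number of DIMMs. A minimal Python sketch, where `boots_ok` is a hypothetical stand-in for actually powering on with only that stick installed, and the stick labels are invented:

```python
from typing import Callable, Iterable, List

def find_bad_sticks(sticks: Iterable[str],
                    boots_ok: Callable[[str], bool]) -> List[str]:
    """Test each stick on its own and return the ones that fail to boot."""
    return [stick for stick in sticks if not boots_ok(stick)]

# Hypothetical stand-in for the manual test: pretend stick "A" is the bad one.
def fake_boot(stick: str) -> bool:
    return stick != "A"

print(find_bad_sticks(["A", "B"], fake_boot))
```

With everything else pulled from the machine, a stick that fails alone is a much stronger suspect than one that fails in a full system.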

I did it again just to be sure, and sure enough, it was easily reproducible. I'm now running down 256 mb, but happy at least. This little bastard's going back to Crucial tomorrow!

As far as the ACPI error in the Windows XP event log, I still got it even after pulling the bad DIMM. Not wanting to tempt fate, I just reinstalled XP using a standard HAL and everything's cool now.

Thanks for all the help everyone, you guys were great.
 

FUGGER

Distinguished
Dec 31, 2007
How much time did you waste before you found the problem?

All I can say is wow!! high QA on that motherboard.
As far as the ACPI error, just ignore it and hopefully it goes away.
 

FatBurger

Illustrious
Right, and bad RAM is always because of AMD. Stop being an [-peep-], FUGGER. If you actually know what you're talking about someday, give me a call.

In memory of those who died, simply because they lived in America.
Rest in peace.