First Outage After Three Days
The stress test of an AMD system and an Intel system has been running since Friday, Dec. 17, 11 am EST. Not everything has gone smoothly so far.
THG’s stress test, We "Stress Out" AMD and Intel, has now been running for over three weeks. The response from our loyal readers has been overwhelming. Several hundred emails reached our staff within a short time, and nearly all of them raised the same concern: why doesn’t THG use a PC case for the stress test? Questions on our message boards pointed in the same direction. Our answer: yes, we will install both systems in high-quality cases, as is usual in everyday use.
As a result, both systems will face a tougher environment, since temperatures will be higher than in an open setup. To keep the display visually appealing, readers will still be able to see all components.
A brief note about the Intel system: so far we have not been able to display the 5V supply voltage, because the monitoring software we currently use is not compatible with the motherboard. We are working to resolve this issue.
Here are the latest updates on the stress test:
Monday, Jan. 10, 2005: The test ends today! Both test systems made it into the new year without problems. After crashes early on, both systems achieved flawless uninterrupted runs: 11 days, 22 hours and 53 minutes for AMD, and 17 days, 14 hours and 28 minutes for Intel. However, we recorded three reboots for Intel and two for AMD.
We can draw the following conclusions from the stress test: users of Socket 775 (Intel) systems should pay particular attention to installing the CPU cooler correctly, so that it can effectively dissipate the processor’s enormous heat output. The AMD system surprised us positively: in contrast to bad experiences in the past, these platforms can now be considered stable. That said, the popular Windows XP operating system contributes considerably to overall system stability. In our assessment, such running times would not have been possible with older, less reliable software such as Windows 98.
The valuable feedback and strong participation of our readers encourage us to run further stress tests in THG’s Munich lab, including tests of other components.
Finally, two notes for our attentive readers: the webcam software we used, Active Webcam from PYSoft, was not always reliable. And to date, power supply manufacturer Tagan has not been able to tell us why the TG480-U22 power supply used in the Intel system failed.
Wednesday, Dec. 29, 2004: Moving the two test systems into identical cases went without any problems. After a short break, both systems were running under full load again. Our expectation of higher temperatures and a greater risk of failure did not come true; instead, the systems ran for six days without incident. Installing the Athlon 64 system in a case actually lowered the CPU’s average temperature from 60°C to 57°C. The CPU fan speed stayed constant at 4000 rpm, and the system temperature of the AMD system fell from 38°C to 36°C.
The CPU temperature of the Intel system held steady at 62°C, but its CPU fan speed rose from 3100 rpm (no case) to 5000 rpm (in the case). The system temperature rose from 38°C to 45°C.
Despite these favorable readings, the AMD system rebooted today. A quick check of the log files revealed no helpful clues. We will continue to investigate.
Thursday, Dec. 23, 2004: Both systems are running without outages. This evening we moved both systems into transparent cases. The goal is to create a realistic operating environment, like the one every user experiences. We expect higher temperatures as a result, because the components are enclosed and packed more closely together.
Both systems were put into a transparent case.
Wednesday, Dec. 22, 2004: After we had replaced the failed Tagan power supply with an Antec unit, we noticed yet another problem. The temperature of the Intel CPU rose from 65°C to a worrying 75°C (as read from the CPU’s thermal diode). At the same time, the fan slowed from 4000 to 3500 rpm. What happened? While the power supply was being swapped, the cooler shifted slightly on the CPU, which reduced its contact pressure. As a result, the thermal resistance increased and cooling performance decreased. The cooler carries an additional temperature sensor of its own; because that sensor registered a lower temperature, the motherboard automatically reduced the fan speed. To put it simply, the cooler’s sensor reacted to the reduced heat flow.
This block diagram taken from Intel’s "Thermal Design Guide" explains the control loop of a CPU cooler.
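To make this feedback loop concrete, here is a toy steady-state model in Python. It is only an illustrative sketch, not Intel’s control algorithm, and it uses no data from our test systems: the CPU power, the thermal resistances, the fan curve, and the assumed parallel heat path through the socket and board are all hypothetical values chosen to show the mechanism. The key assumption is that some heat leaks past the cooler into the board, so when the contact resistance between CPU and cooler rises, less heat reaches the cooler, its sensor reads cooler, and the fan curve lowers the fan speed.

```python
"""Toy steady-state model of a temperature-controlled CPU cooler fan.

Illustrative only: every parameter below is a hypothetical value chosen to
show the feedback mechanism, not a measurement from the THG test systems.
"""
from dataclasses import dataclass

T_AMB = 40.0                      # case air temperature, deg C (assumed)
P_CPU = 100.0                     # CPU power dissipation, W (assumed)
R_BOARD = 2.0                     # K/W, parallel heat path via socket/board (assumed)
RPM_MIN, RPM_MAX = 1500.0, 5000.0 # fan speed limits (assumed)
FAN_GAIN = 200.0                  # rpm per kelvin of sensor temperature rise (assumed)


def r_heatsink(rpm: float) -> float:
    """Heatsink-to-air resistance in K/W; falls as the fan spins faster (assumed form)."""
    return 0.1 + 400.0 / rpm


def fan_curve(t_sensor: float) -> float:
    """Fan controller: speed rises linearly with the cooler sensor temperature."""
    rpm = RPM_MIN + FAN_GAIN * (t_sensor - T_AMB)
    return min(max(rpm, RPM_MIN), RPM_MAX)


@dataclass
class OperatingPoint:
    rpm: float
    t_sensor: float  # cooler (heatsink) sensor temperature, deg C
    t_cpu: float     # CPU thermal diode temperature, deg C


def temps_at(rpm: float, r_contact: float) -> tuple[float, float]:
    """Return (cooler sensor temp, CPU temp) for a fixed fan speed."""
    r_hs = r_heatsink(rpm)
    r_cooler = r_contact + r_hs                         # CPU -> interface -> heatsink -> air
    q_cooler = P_CPU * R_BOARD / (R_BOARD + r_cooler)   # share of heat taking the cooler path
    t_cpu = T_AMB + q_cooler * r_cooler
    t_sensor = T_AMB + q_cooler * r_hs                  # sensor sits on the heatsink
    return t_sensor, t_cpu


def steady_state(r_contact: float) -> OperatingPoint:
    """Find where the control loop settles by bisecting on fan speed."""
    lo, hi = RPM_MIN, RPM_MAX
    for _ in range(60):
        mid = (lo + hi) / 2
        t_sensor, _ = temps_at(mid, r_contact)
        if fan_curve(t_sensor) > mid:
            lo = mid  # controller would spin faster than mid: operating point is higher
        else:
            hi = mid
    rpm = (lo + hi) / 2
    t_sensor, t_cpu = temps_at(rpm, r_contact)
    return OperatingPoint(rpm, t_sensor, t_cpu)


if __name__ == "__main__":
    good = steady_state(r_contact=0.05)  # cooler seated correctly (assumed value)
    bad = steady_state(r_contact=0.25)   # cooler shifted, less contact pressure (assumed value)
    for label, op in (("good contact", good), ("bad contact", bad)):
        print(f"{label}: fan {op.rpm:.0f} rpm, sensor {op.t_sensor:.1f} C, CPU {op.t_cpu:.1f} C")
```

With these made-up values, the poorly seated cooler settles at a lower fan speed and a CPU temperature roughly 15 K higher, while the cooler’s own sensor reads slightly cooler, the same pattern we observed. Bisection works here because, in this model, the temperature the controller sees falls steadily as fan speed rises, so the loop has exactly one operating point between the minimum and maximum fan speed.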
These observations apply only to motherboards with Socket 775 and a 4-pin fan connector. Boards with Socket 478 are far less prone to this problem, because the cooler is not mounted directly on the board but held by a retention module, so handling like that described above is less likely to have any effect. In addition, those boards do not actively control the fan speed. Readers reported similar experiences to us several weeks ago.
There is little to report about the AMD system: all components have now been running for five days without problems.
Tuesday, Dec. 21, 2004: The day started without issues. Both systems were running perfectly and did not report any errors. However, the calm did not last long. At 11:31 am, the Intel system failed again and switched itself off.
After a thorough analysis of all components, we identified the source of the problem. The Tagan TG480-U22 power supply still delivered its 5V standby voltage but could not be switched on again. Our test engineers had already suspected the Tagan unit the day before, since the system had behaved similarly then. Our power supply test station removed any remaining doubt and confirmed the unit as the cause. The failed Tagan unit will remain in THG’s Munich test lab until the end of the test, where the manufacturer can examine it.
Tagan’s weaknesses are nothing new to us: five Tagan power supplies failed suddenly in our labs over the past year. A newly designed, rubberized housing is unlikely to make the units more reliable.
We replaced the TG480-U22 with an Antec NEO480. The system has been running again since 12:12 pm EST.
Responsible for the Intel system’s outage: the Tagan TG480-U22 power supply with its revised housing design. It sells for about $100 in stores.
Monday, Dec. 20, 2004: The day begins with outside temperatures of about -10°C (14°F). Several problems occurred. First, the software we use to transfer pictures from the cameras faltered and stopped working, fortunately only briefly, because a staff member quickly located the fault. The big surprise was the Intel system, which failed at 11:15 am EST: the platform simply switched itself off. All details are shown in the published charts. The Intel platform is now running again.
Sunday, Dec. 19, 2004: Heavy snowfall in Munich, nicely documented by the outdoor camera. Both systems are running without complaints.
Saturday, Dec. 18, 2004: An unexpected error occurred in our in-house tool that displays system running time. For a while, the display did not show the correct time. The problem has been resolved and did not affect the stress test.
Friday, Dec. 17, 2004: Launch of both platforms, AMD and Intel, running at continuous full load. All data are displayed in charts. No problems so far.
Full Load All The Way: AMD Vs. Intel
Around this same time last holiday season, we started a project that ended up setting a world record for processor speed. Originally, we had planned to content ourselves with the already ambitious "5 GHz Project: CPU Cooling With Liquid Nitrogen." In the end, we achieved a maximum speed of 5.25 GHz at -190°C. Chipmakers aren’t likely to offer production models capable of such speeds until 2006 at the earliest!
So what about this year? No new speed record? After discussing the issue ad nauseam, we finally decided to shelve record-breaking in favor of revisiting the systems people actually use, where stability and product quality are the primary factors in a buying decision. This is especially true in business environments, where the decisive factor for any investment is the TCO (total cost of ownership) over time. So the obvious thing to do, we thought, was to subject the latest PC components to a demanding stress test under real-life conditions.
Unlike most Tom’s Hardware test articles, where we perform a test and then report the results, this test is just starting, and you get to participate! We will run the test over the next several days, streaming data and pictures live to the web so you can watch the action and see the results in real time. Let’s take a look at the hardware we’ll be using and how the test will be conducted.
Not all that easy to get your hands on: high-end graphics cards with ATI and Nvidia chips