AMD Issues "Stop Ship" Order for Opterons; TLB Errata Cripples K10

According to a 12/3/2007 Tech Report article, a "Stop Ship" order has been given to vendors that sell Barcelona Opteron systems.

http://www.techreport.com/discussions.x/13721

AMD's quad-core "Barcelona" Opterons have been notably difficult to find since their introduction two months ago, and The Tech Report has learned that a chip-level problem has impacted the supply of these chips to both server OEMs and distribution channel customers.

Chipmakers refer to chip-level problems as errata. Errata are fairly common in microprocessors, though they vary in nature and severity. This particular erratum first became widely known when AMD attributed the delay of the 2.4GHz version of its Phenom desktop processor to the problem. Not much is known about the specifics of the erratum, but it is related to the translation lookaside buffer (TLB) in the processor's L3 cache. The erratum can cause a system hang with certain software workloads. The issue occurs very rarely, and thus was not caught by AMD's usual qualification testing.


An industry source at a tier-two reseller told The Tech Report that the TLB erratum has led to a "stop ship" order on all Barcelona Opterons. When asked for comment, spokesman Phil Hughes said AMD is shipping Barcelona Opterons now, but only for "specific customer deals." Industry sources have suggested to TR that those deals are high-volume situations involving supercomputing clusters. Such customers may run workloads less likely to be affected by any workarounds for the erratum that reduce L3 cache performance, and those customers could potentially consume hundreds of thousands of CPUs. Our sources indicate, and the current availability picture would seem to confirm, that quad-core Opterons are not shipping to OEMs or the channel more generally.

News of this problem is notable because it confirms that the TLB erratum affects Barcelona server processors as well as Phenom desktop CPUs, and that the problem impacts AMD's quad-core processors at lower clock speeds. AMD's initial public statements about the erratum and the delay of the 2.4GHz Phenom seemed to imply that the issue was closely related to clock frequencies. The Opteron 2300 lineup spans clock speeds from 1.7GHz to 2.0GHz. Those CPUs' north bridge clocks, which determine the frequency of the L3 cache, range from 1.4GHz to 1.8GHz.

The erratum is present in all AMD quad-core processors up to the current B2 revision. AMD has said a revision B3 is in the works and expected in Q1. One source told TR that large quantities of B3 chips might not be available until the end of Q1.

The potential for instability with the TLB erratum can be corrected via a BIOS-based workaround, but multiple sources have suggested the BIOS fix involves a substantial performance hit. AMD has publicly estimated the performance penalty for the BIOS fix could be around 10%, and one source pegged the penalty at 10-20%. AMD has acknowledged that the TLB erratum particularly affects virtualization, and industry sources say the performance hit from the fix may be most severe with virtualization, as well. Server administrators responsible for virtualized environments will probably want to wait for the B3-rev CPUs before upgrading.

TR has attempted to confirm the impact of the BIOS-based fix, but the BIOS for the SuperMicro H8DMU+ motherboard used in our review of the Barcelona Opterons has not been updated since mid-September and doesn't appear to include the TLB erratum workaround.

Linux users may have another option in the form of a patch for that operating system's kernel. Sources estimate this patch's performance hit at less than one percent, but it comes with several caveats. At present, the patch purportedly only applies to the 64-bit version of Red Hat Enterprise Linux, Update 4. Customers must sign a non-disclosure agreement in order to obtain the patch, and will be responsible for supporting it themselves. The patch doesn't currently appear to be available via Red Hat's regular support channels.

At present, Microsoft doesn't offer a Windows hotfix to address the problem, and our sources were doubtful about the prospects for such a patch. CPU makers have oftentimes addressed errata via updates to the processor's microcode, but such a fix for this problem also appears to be unlikely.

What a disaster.
 
Thanks for posting the link.



http://www.overclockers.com/tips01260/

To summarize and emphasize:

10. Barcelonas have not been available to the mainstream markets.

9. This is because AMD has stopped selling them to the mainstream markets, though they have sold some to some mystery buyers.

8. They stopped selling them because all the Barcelonas have the TLB bug.

7. No, the TLB bug doesn't just affect processors running above 2.3GHz, so previous statements saying so were not so.

6. That also means there must have been another reason why AMD pulled the 2.4GHz models, probably because they couldn't make them.

5. Back to TLB, AMD said they told the mobo makers prior to launch about the problem and that it was up to the mobo makers to fix the problem.

4. Tech Report says that they couldn't find anyone who had, so this might not be so, either.

3. It appears that initial testing was done with B3-stepping 2.6GHz chips, which did not have the erratum.

2. Those 2.6GHz chips also ran using an HT speed higher than the real 2.2/2.3GHz Phenoms, which inflated the benchmark scores.


And, last but not least.

1. When asked why they had not updated their technical documentation to include an erratum which has caused AMD to pull product from the marketplace and stop sales of products already in the marketplace, the AMD response was "the guy who does that is on vacation, he'll do it in a few weeks."

Oh.

To paraphrase a famous statement from a certain U.S. scandal called Watergate, "What did AMD know, and when did they know it?"

At the moment, this is unclear. The TR article says that AMD knew about the TLB erratum before the Phenom product launch, but not exactly when. We also don't know when AMD restricted/ended mainstream Barcelona sales, or even if they initiated them.

We do not know if, or how many, Barcelonas were sold before AMD became aware of this problem. Much more critically, we do not know how many Barcelonas were sold after AMD became aware of this problem, but before they stopped selling, at least to the unaware.

It's bad enough if Barcelona sales have been halted, but it will be much worse if they weren't halted once this problem emerged.
 
Good lord, a Pentium-div-bug-sized problem from AMD. Not sure if AMD can afford a slip-up of this size in their current market position. This could push them below $5 a share. Some wealthy company will swoop in and buy them out, no question, once this occurs.
 

sailer

Splendid
Not sure what to think. Oh yeah, it's easy to think that AMD really screwed up, probably knew about this problem since last spring (which was why Barcy and Phenom weren't released last spring as originally scheduled), and has set itself up for a big fall. Anyone remember the line from Humpty Dumpty? "All the king's horses and all the king's men couldn't put Humpty together again." Well, just put AMD in place of Humpty and that may be the case.

As some may know, I own a bit of AMD stock. Not because I think it's a great company, but because I've made a lot trading it for the past several months. If AMD goes into a tailspin, like elbert says, someone, hopefully IBM, might step in and pick up the pieces. Then the stock value may go back up. Otherwise, well, gambling is what Nevada is known for, and some gambles involve losses.
 

Easy solution: Short the stock.
 

sailer

Splendid


I sold my holdings rather than go down with the ship. Shorting the stock seems difficult at this point. It seems so many people are doing it that there isn't enough available stock to sell short.

I did get to wondering about something else. Since it appears AMD knew the Phenom chips were defective before they shipped, as well as possibly Barcelona, might AMD have opened itself up to a class action lawsuit? Don't know the answer to that. My opinion at the moment is that it would have been better if AMD threw itself down on a sword rather than knowingly ship defective chips. But that's just my opinion.
 

AMD needs to save face here fast and issue a statement saying they will replace B2 Barcys and Phenoms with B3s when they launch in January.
 

enewmen

Distinguished
The X2s used to be great; the Radeon 9k, Xk, and X1k series were great.
What happened?
Problems will be fixed and I still hope AMD will get back on track.
Cars get recalled all the time, how is this different?
 


Are you serious?

A. Cars can be fixed, processors cannot
B. Imagine if after your car gets "patched" due to a defect that it accelerates 15% slower, gets 15% worse gas mileage, and has a cylinder disabled?
 
The HD2x series is not a bad card; it just seems that the 512-bit bus (on the 2900 series) is not properly utilized by the drivers. I remember when the R300 (9700 Pro) came out with the 256-bit bus, which was 2x what NVidia had, and it pounded it like no tomorrow. Love that card.

I read this elsewhere and this is not good considering the Barcys are clocked relatively lower than Phenoms and the error is coming at those clock speeds on better silicon compared to the desktop chips.

Also I was looking at AMDs stock today at work and they are down almost 4% again today. At this rate it would look as if there is something wrong going on.

It almost seems as if Ruiz wants to sell AMD to make a little profit for himself.

The difference between a car getting recalled and a CPU is that the car is usually already owned and just taken to a dealer to be fixed. This is AMD stopping entire shipments, which means that at some point there will be no profit coming in from them. And AMD needs as many sales as possible right now. This will also stop their laptop CPUs from coming out soon.

I prefer Intel but am kinda worried that AMD is screwing up so bad.
 

sailer

Splendid


When a car gets recalled, it's because a part that was previously thought to be good failed. What AMD apparently did was ship parts that it knew to have problems and misrepresent them as having no problems, saying that the problems only occurred in higher-speed chips which hadn't been released, though they were occurring in all the chips, even the slowest. That may open up AMD to prosecution for fraud. Not sure there, as I'm not a lawyer.

One thing I think I can predict is that a lot of companies who either bought or were thinking of buying Barcelona or Phenom chips will decide to go with Intel instead. The same with many in the enthusiast market. I myself had been borderline with the idea of buying a 790FX mobo and putting an AM2 chip in it at first and then putting a Phenom in later when the bugs got worked out. I think my reaction to all this can be safely predicted.
 

xrider

Distinguished
Sorry, but AMD reminds me of this company:
3dfx Interactive was a company that specialized in the manufacturing of cutting-edge 3D graphics processing units and, later, graphics cards. After dominating the field for several years in the late 1990s, by the end of 2000 it underwent one of the most high-profile demises in the history of the PC industry. It was headquartered in San Jose, California until, on the verge of bankruptcy, many of its intellectual assets (and many employees) were acquired by its rival, NVIDIA Corporation. 3dfx Interactive filed for bankruptcy on October 15, 2002. 3dfx pursued lengthy, ambitious development cycles, and NVIDIA and ATI cards eventually ended up with better overall performance,

Yes, they may have a better design than Intel (native quad), but who cares? It doesn't work, and it takes forever to release it to the market. It might be better by design, but what does that mean?
Just my 2 cents.
 

ryman554

Distinguished


Fair enough -- he might be seen to be pro-intel these days.

But, do me a favor. Go ahead and read what he wrote just this once.. where do you think he is spinning the truth? What alternative conclusions would you draw?
 

zenmaster

Splendid


But what do you think would happen if Toyota all of a sudden was unable to ship any Camrys in 2007?
What would happen if the Camry's 2007 V6 with 240hp all of a sudden had only 170hp?

 

cruiseoveride

Distinguished
I'm a systems design engineer, and I thought I'd explain to you guys what a TLB is.

In a computer program, each instruction has an address that the CPU uses to find and execute it. This address is called a logical address, e.g.:

ProgramA.exe
Address - Code
1 - Print Hello on screen
2 - read file from hard drive
3 - print file on screen
4 - quit

Before the cpu can execute a line of code, it has to be moved into memory.

Memory, too, has addresses. To map the logical address of a program to a real physical address in memory, you need an MMU (a memory management unit, which is hardware). In the simplest scheme, it converts the logical address of a program to a physical address by adding a relocation offset.

So in our example, address 3 might actually be address 40003, where 40000 is the offset for our program.

Before a CPU can execute an instruction, it needs to find the instruction's physical address. On a system with paging, the CPU looks up the logical address in a page table, which in return gives a physical address to the CPU, so that it can go and fetch the particular instruction.

Now this requires 2 accesses to memory,
1st to get the physical address of the instruction
2nd to go and get the instruction to execute

This is where a TLB comes into play.

A TLB is basically a special fast-lookup hardware cache, usually called "associative memory" or a "translation lookaside buffer (TLB)". A TLB is a cache for the page table. So if addresses on the same page are being accessed more than once in quick succession (for example, code that keeps loading the image of a bullet in Crysis), the CPU doesn't need to go to the page table; it can get the physical address (frame) from the TLB.

TLBs can also do other marvelous things, but if you really want to know, I suggest you take up a bachelor's in comp eng or systems eng.

In conclusion, a TLB acts like a local cache for the actual page table residing in main memory.
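
For anyone who prefers code to prose, here is a rough Python sketch of the lookup described above. It is only a toy model under stated assumptions: the ToyMMU class, the 4096-byte page size, the 4-entry TLB, and the least-recently-used eviction are all made up for illustration and say nothing about how AMD's hardware actually works.

```python
# Toy model of address translation with a TLB in front of a page table.
# Every name and size here is an illustrative assumption, not AMD's design.
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page (assumed for this sketch)


class ToyMMU:
    def __init__(self, page_table, tlb_entries=4):
        self.page_table = page_table  # virtual page number -> physical frame number
        self.tlb = OrderedDict()      # small cache of recently used translations
        self.tlb_entries = tlb_entries
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        # Split the logical address into a page number and an offset within the page.
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            # TLB hit: no page table access needed.
            self.hits += 1
            self.tlb.move_to_end(vpn)  # mark as most recently used
            frame = self.tlb[vpn]
        else:
            # TLB miss: this lookup stands in for the extra memory access.
            self.misses += 1
            frame = self.page_table[vpn]
            self.tlb[vpn] = frame
            if len(self.tlb) > self.tlb_entries:
                self.tlb.popitem(last=False)  # evict the least recently used entry
        return frame * PAGE_SIZE + offset


mmu = ToyMMU({0: 10, 1: 7, 2: 3})
addresses = [3, 8, 4100, 12, 4200]
physical = [mmu.translate(a) for a in addresses]
print(physical, "hits:", mmu.hits, "misses:", mmu.misses)
```

The first access to each page misses and goes to the page table; later accesses to the same page are served from the TLB, which is the whole point of having one.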
 

cruiseoveride

Distinguished
The TLB is critical to high-performance systems. Even a slight mistake in its policies can cripple an operating system under heavy load.

So basically, if you design TLBs for AMD, you have a house in the Florida Keys, your own jet, and your dog drives a Lamborghini.
 

BaronMatrix

Splendid


"Price war" decimates profits. Next gen architecture suffers.
 

ryman554

Distinguished


Do you think that Intel's price war, then, is the direct cause of AMD's execution problems?
 

Harrisson

Distinguished

IMO it's the sum of all factors. Miscalculation in execution, bad timing of the ATI purchase, and a price war alongside that purchase drained resources which could have been used for faster development of better next-gen CPUs/GPUs/chipsets, you name it. Engineers spread thin over loads of projects doesn't help either, and AMD doesn't have the money to hire the best minds in the industry to help with it.