
Supermicro X8DAi can't install 2nd cpu

July 13, 2011 7:07:29 AM

My backordered 2nd CPU arrived, but when I install it in my 7046A-T (X8DAi motherboard), the power LED stays off and I get a flashing red LED indicating a chassis fan failure. The monitor never lights up. I have been running WHS 2011 fine with one CPU for almost a week. The CPUs are Intel Xeon E5506 Quad Core 2.13GHz LGA1366 4MB 4.8GT/s Nehalem retail processors. The machine runs fine with one CPU, and I've even swapped the CPUs and heatsinks around: each one works on its own, but not when both are installed. I can't figure out what the problem is. Does anyone know?

Edit to add: I updated my BIOS with a beta from Supermicro and it didn't help. I've swapped the CPUs, and both work fine on their own, whether in the CPU 1 or CPU 2 socket, so it's not bent pins. I'm at a loss here and hoping to hear back from Supermicro soon.

Edit: Supermicro has issued me an RMA. It's going to cost a bundle to ship this server back!
October 7, 2011 5:26:45 AM

KansasA said:
My backordered 2nd cpu came and when I try to install it my 7046a-t with mb x8DAi it shows no power led and I get a flashing red led indicating chassis fan failure. ...

I have the same problem with a Supermicro X8DT6 motherboard and SC743 chassis. My server has run just fine since early this year with just one processor. Today I added a second processor and populated the extra memory slots.

I have two Xeon 5650 processors with 48GB of 1333 ECC registered server memory. The server comes on, and the monitor light stays orange and never turns blue. All the fans spin and the hard drives spin up, but nothing comes up on the screen. My red fan-failure light is blinking as well.

Please let me know if you have resolved this issue, or if anyone else has run into this problem. I was so excited to get my package today, and I was expecting a faster computer, not one that doesn't work.

I have an aftermarket graphics card, so I might try switching back to the onboard VGA (8 MB of video memory). Were you using onboard graphics or an aftermarket card, and could that even matter?
October 7, 2011 7:00:51 AM

photolurp2 said:
I have the same problem with a Supermicro X8DT6 motherboard and SC743 chassis. ... Were you using onboard graphics, or an aftermarket card, and could that even matter?


I never did get the two CPUs to work together, and Supermicro ended up sending me another server, which works fine. I also have an aftermarket graphics card (Nvidia), and my motherboard didn't come with onboard video. I'm using the same graphics card in the new server and both CPUs work. I spoke with a tech and installed a beta BIOS, and nothing worked on the first server. Supermicro was okay about sending me another server, although there were a few glitches. Don't let FedEx bully you into paying any kind of duty/taxes/etc.; make Supermicro pay all costs for shipping to you. I paid about 70 bucks to ship the broken server back, and that was cheap because I had a family member mail it from the States (I live in Canada), but I still wasn't happy that I had to pay any shipping.
October 8, 2011 7:23:17 PM

I spoke to tech support and tried everything you did, i.e. switching sockets, etc. Each CPU works by itself but not together. I purchased all of the items separately, so all I should have to send back is the motherboard. The first tech was pretty cool and seemed to know what he was talking about; the second one was OK, but a bit of a know-it-all jackass. Both CPUs were identical and had the same revision and stepping. He tried to convince me that the QPI link on one of the CPUs might not be communicating with the other. Yeah, right.

He was like, well, if you don't believe me, then just send it back for RMA, which is what I am going to do. I asked him how I was supposed to check a QPI link, since I didn't know of a way. He just said to send it back. Since this is a production server, I am going to try to get a new one sent out first. Do you think I should ask them to test two similar CPUs on the board before they send it out? What was the turnaround like for you? Since I live in the US, I seriously doubt there will be any of those types of fees; at least I hope not.
October 8, 2011 7:38:14 PM

One requirement for dual CPUs to work is that each CPU must support dual QPI. Some Xeons only have a single QPI link, and those will not work in dual-socket motherboards.
http://ark.intel.com/compare/37096,47922

October 8, 2011 8:47:54 PM

photolurp2 said:
I spoke to tech support and tried everything you did, ie switching sockets, etc. Each CPU will work by itself but not together. ... Do you think I should ask them to test two similar CPUs on the board before they send it out? What was the turnaround like for you? ...


After getting my RMA server I put in both processors and they worked beautifully, so it was definitely a problem with the motherboard and not the CPUs. I also bought all my items separately, starting from a barebones server, but I still didn't want to take out the motherboard, so I opted for a whole new server. I was able to keep my server while waiting for the new one because one of the techs suggested a "cross ship"; he was quite concerned that I not be without my server, and that worked out well. It would've taken only 3 days to arrive if FedEx hadn't tried to get money out of me, which is pretty good considering it came from California all the way up to BC, so in the end it was four days. I took a lot longer to ship it back, waiting for my relative to come up from the States and take it across the border, but it was still under the 30 days. One thing I was concerned about was that they charged my credit card well over the price I originally paid, since I got it on sale at newegg.ca, and I figured with the few glitches I had they would stall on crediting my card back. No worries, though: even after I kept two of the fans (my server came with only 2 instead of 4 behind the hard drives, and after all the headache I went through I figured I was keeping them), not only did they credit me back almost immediately, they also gave me an extra 35 dollars! I have no idea why (maybe they refunded part of the shipping?), but I wasn't going to b*tch. :) All in all Supermicro did a great job, and I would buy their products without hesitation.
October 9, 2011 5:56:52 AM

lp231 said:
One requirement for dual CPU to work is the CPU must support dual QPI. Some Xeons only have a single QPI and that will not work with dual socket motherboards.
http://ark.intel.com/compare/37096,47922

I think I have that taken care of.
Thanks

October 9, 2011 7:32:55 PM

KansasA said:
After getting my RMA server I put in both processors and they worked beautifully so it was definitely a problem with the MB and not the cpu's. ...

I am glad yours worked out. Maybe I should do that, because I want the two extra fans as well... So you were able to have them build you a server identical to the one you built yourself? Did you have to transfer the data over to your new computer, or did you just switch drives? I did put in an RMA request, and they got back quickly with the form and said that an RMA number would soon follow. So it sounds like they charged your credit card immediately and kept the charge on there until they got the old one back? My motherboard is a strange example. It was listed at Newegg as new, but it was $100 off the regular price. I think it was an open-box return, and since most everything was in the box, they sold it as new, but at a hefty discount. I paid $349.99 for it, and it's now back up to $449.00. Maybe they knew it had a problem, who knows. I wonder what they will charge me. I am going to go with UPS because they are a better company, and the shipping is much cheaper.
October 9, 2011 8:16:50 PM

lp231 said:
One requirement for dual CPU to work is the CPU must support dual QPI. Some Xeons only have a single QPI and that will not work with dual socket motherboards.
http://ark.intel.com/compare/37096,47922

Without looking at all of them, I would say that, for the most part, all of the 5000-series and higher Xeons with QPI have 2 links. Intel does not list how many QPI links the 7000-series MP CPUs have, but I would imagine probably 4. The 3000 series, on the other hand, is designed for single-socket motherboards, which I think is silly. Let's do a comparison, shall we:


If you look, they are all functionally the same: TDP, clock, max Turbo frequency. The Extreme 990X and the Xeon 3690 are basically the same, except the Xeon has slightly higher memory bandwidth, even though they can only address up to 24 GB of memory. The 3690 uses ECC; the 990X does not. They are fundamentally the same, and both are around $1,000. Now let's look at the 5690. It can address up to 288 GB of memory, and it has 2 QPI links. The 3690 and 990X have only one QPI link, so you can only put one on each board. With the 5690, you are paying 70% more ($1,700) just for that second QPI link and the ability to address substantially more memory.
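The 70% premium is just (higher price - lower price) / lower price on the rounded prices quoted in the post (about $1,000 versus about $1,700). A quick Python check using only those numbers:

```python
def price_premium(base: float, other: float):
    """Return (extra dollars, extra percent) of `other` relative to `base`."""
    extra = other - base
    return extra, 100.0 * extra / base


# Rounded prices from the post: ~$1,000 for the 3690/990X, ~$1,700 for the 5690.
extra, pct = price_premium(1000, 1700)
print(f"${extra:.0f} more, {pct:.0f}% premium")
```

This prints a $700 difference and a 70% premium, matching the figures in the post.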

You are going to have to spend substantially more on a board that supports ECC memory, and the ECC memory itself is going to cost about twice what the same non-ECC memory would. It just does not make any sense to me. 3000-series processors seem like a joke to me. There is no real need for them unless you need slightly higher memory bandwidth and ECC memory. By the time you have error correction in the memory, it probably negates the higher bandwidth. I just do not understand why anyone would pay more for a single-CPU server solution when they can get better or equal performance for substantially less with the second-generation processors, at least when you compare apples to apples.

All in all, these three are essentially the Intel Extreme 990X with minor differences, mainly the second QPI link for an extra $700. And many of the newer processors are going back to DMI instead of QPI. Go figure.

I could not wait for Sandy Bridge server chips to come out, but it will be interesting to see what they are all about. I have right at five grand pumped into this machine now, so I am hoping it will be future-proof for a while with two 2.66 GHz processors, which run at 3.06 GHz with Turbo Boost. They have 256 KB of L2 cache per core (12 × 256 KB across both chips) and 12 MB of shared L3 Smart Cache each (24 MB of L3 in total), for a total of 12 cores and 24 threads (assuming I can get both processors to work together). Add to that 48 GB of 1333 ECC registered server memory and four 15,000 RPM 15K.7 SAS hard drives, three in a RAID-5 array and one as a hot spare.
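For the drive layout above, RAID-5 spends one drive's worth of capacity on parity, so usable space is (n - 1) × drive size, and a hot spare adds nothing until the controller pulls it into a rebuild. A minimal Python sketch; the 300 GB drive size is a hypothetical example, since the post doesn't say which capacity of 15K.7 is installed:

```python
def raid5_usable_gb(drives_in_array: int, drive_gb: float) -> float:
    """Usable capacity of a RAID-5 array: one drive's worth goes to parity."""
    if drives_in_array < 3:
        raise ValueError("RAID-5 needs at least 3 drives")
    return (drives_in_array - 1) * drive_gb


DRIVE_GB = 300  # hypothetical drive size; substitute the real 15K.7 capacity

# Current layout: 3 drives in RAID-5 plus 1 hot spare (the spare adds no capacity).
print(raid5_usable_gb(3, DRIVE_GB))
# If all 4 drives were in the RAID-5 array instead.
print(raid5_usable_gb(4, DRIVE_GB))
```

With the hypothetical 300 GB drives, that is 600 GB usable today versus 900 GB with all four drives in the array (at the cost of no longer having a spare).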

I got the most powerful video card that does not require an auxiliary power cable, the HIS IceQ Radeon 5670. As this serves as a home computer as well as a server, I wanted a little more than the 8 MB video that came with the board. The only things left are 4-8 or more hard drives. I am thinking of getting an SSD and the CacheCade 2.0 software, and perhaps a more powerful video card, but I doubt it, as it runs fine for the racing sims that I do.
October 9, 2011 8:30:43 PM

photolurp2 said:
I am glad yours worked out. Maybe I should do that, because I want the two extra fans as well... So did you just have to transfer the data over to your new computer, or did you just switch drives? ... So it sounds like they charged your credit card immediately, and they kept the charge on there until they got them back? ... I wonder what they will charge me. ...


Actually, the server I bought from Newegg was a barebones one that included the motherboard and dual NIC; I had to add the video card, memory, CPUs, hard drives, CD/DVD, etc. I waited for the RMA server to come to me and just swapped everything, including the drives. That meant needing some Arctic Silver, but I already had some from a different job, so I was good there.
Yes, they charged my credit card right away and didn't credit it back until I sent them the broken server, but as I said before, they actually credited me more than they charged (and I was worried they would ding me for the two missing fans, lol). If I had known, I would have been tempted to keep a few of the hard drive trays, but I didn't want to push my luck! :)
You'll probably be charged the going rate they sell it for, and I doubt you'll get any discount, but it shouldn't matter, because when you ship the old one back you'll be credited.
October 10, 2011 1:00:41 AM

How much of a temperature reduction have you noticed in the system, CPU(s), graphics card, and hard drives since you added the extra two fans? Is there a great deal more noise? I imagine that because of the positive pressure, you would have less dust buildup over time inside the case. Mine came with two real fans and two dummy fans in the space between the backplane and the central compartment. I have looked up the part number for the drop-in fans, and they are neither common nor cheap. They seem to range from $15 to $30 or more per unit, but I start questioning companies that are substantially cheaper than the competition. Most are not in stock and are special order. All I actually need, I think, is two of the SAN 80 fans, but I cannot find that exact model anywhere I look. Must be a vendor thing.

I am thinking of just getting a couple of 80x80x25 fans that draw the same amperage and have the same speed, CFM, and dB ratings. I wrote to the manufacturer, and they did not have the SKU for the actual fan in their database, only one very similar. Strange, seeing as how they made the fan. All of the specs seemed the same, and tech support said they did not see any reason it would not work.

I imagine I would just have to press the fan into the two housings and correctly position the 4 pins. Perhaps I will look around locally for similar fans. Since this is designed as a DP server chassis, and it is the SQ (Super Quiet) model, maybe I won't even need them. What was your overall impression of two versus four fans? The CPU heatsinks have fans on the front blowing toward the rear, the power supply blows the hot air out the back, and there is a 92mm exhaust fan on the rear of the case, so I wonder if I really need a fan behind the backplane. On the other hand, after filling the other socket and the extra 6 memory slots, maybe I should, but the cheap part of me keeps thinking that they wouldn't have put only two fans in there if two were not adequate.

I think I will just carry one of my dummy fan holders to Micro Center and see if I can find a fan that would work. A) Would you have paid for the fans if they had charged you for them, and B) how much would you be willing to pay? The system usually stays below 40°C, as does the processor (well, technically "Low"), and never gets above 60-62°C when running Prime95 with all 12 threads at 100% in Task Manager. The hard drives stay at 34-37°C with low disk I/O, although they have reached 55°C (their stated limit) when our air conditioner quit working and it was 85°F+ in the room. I keep my case in immaculate condition (dust = the devil).

My graphics card stays low at idle, around 40°C as well. Basically, with the ambient air temperature around 72-74°F (about 22-23°C), everything usually stays at or near 40°C, except the power supply, which tends to run a little warmer, say 50-53°C at idle and in the mid-to-upper 50s at high load. I wish I could get that down a bit more, but with fans that run at 572-687 RPM and an 865-watt power supply, nothing but another fan right behind it will help. SpeedFan doesn't seem to be able to control my power supply fans, even with the fan setting in the BIOS at the High Performance level. Maybe I have just answered my own question.

Percentage-wise, how much above the normal Newegg selling price did they charge you? As I said, I paid $350 for a board that normally sells for $450. Should I expect a $600 hold on my credit card? I have the money; I just hate to have it tied up. I also have a hard drive that MegaRAID Storage Manager says might be going out, so I am going to have to RMA it the same way. I just don't want a combined $1,000 hold on my account. Here are the errors and information it gives me; if anyone knows what those error messages mean, let me know.

2960 [Information, 0] 33seconds from reboot Controller ID: 0 Unexpected sense: PD = --:--:2 - Failure prediction threshold exceeded, CDB = 0x03 0x00 0x00 0x00 0x40 0x00 , Sense = 0x70 0x00 0x00 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x5d 0x00 0x10 0x00 0x00 0x00
2959 [Warning, 1] 33seconds from reboot Controller ID: 0 PD Predictive failure: --:--:2
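The bytes after "Sense =" in that log look like standard SCSI fixed-format sense data (response code 0x70). In that format the sense key is the low four bits of byte 2, byte 7 is the additional sense length, and bytes 12 and 13 are the additional sense code and qualifier (ASC/ASCQ). ASC/ASCQ 0x5D/0x00 is the SCSI code for "failure prediction threshold exceeded", which matches the text MegaRAID printed, and the logged CDB (opcode 0x03) is a REQUEST SENSE command. A minimal Python decoder, with lookup tables that cover only a handful of codes rather than the full SCSI tables:

```python
# Minimal decoder for SCSI fixed-format sense data (response codes 0x70/0x71).
# The lookup tables only include a few codes; they are not a complete SCSI table.
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x3: "MEDIUM ERROR",
              0x4: "HARDWARE ERROR", 0x6: "UNIT ATTENTION"}
ASC_ASCQ = {(0x5D, 0x00): "FAILURE PREDICTION THRESHOLD EXCEEDED"}


def decode_fixed_sense(data: bytes) -> dict:
    response_code = data[0] & 0x7F  # bit 7 is the VALID flag, not part of the code
    if response_code not in (0x70, 0x71):
        raise ValueError(f"not fixed-format sense data: 0x{response_code:02x}")
    key = data[2] & 0x0F  # sense key is the low nibble of byte 2
    asc, ascq = data[12], data[13]  # additional sense code and qualifier
    return {
        "current": response_code == 0x70,  # 0x70 = current error, 0x71 = deferred
        "sense_key": SENSE_KEYS.get(key, f"0x{key:x}"),
        "additional_length": data[7],
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"0x{asc:02x}/0x{ascq:02x}"),
        "fru": data[14],  # field-replaceable unit code (vendor specific)
    }


# Sense bytes copied from the MegaRAID log entry above.
raw = bytes([0x70, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0A, 0x00,
             0x00, 0x00, 0x00, 0x5D, 0x00, 0x10, 0x00, 0x00, 0x00])
print(decode_fixed_sense(raw))
```

For this entry it reports sense key NO SENSE with ASC/ASCQ "FAILURE PREDICTION THRESHOLD EXCEEDED": the drive is warning that it expects to fail, rather than reporting a command that actually failed, which is consistent with MegaRAID flagging it as a predictive failure.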

Most things seem fine, but the read speeds are down from the glorious levels they once were at. Look at the comparison:

8 months ago, when brand new:
Random Access: 5.5 ms, CPU utilization 1%, Average Read 347.8 MB/s, Burst Speed 855.5 MB/s.
Not too bad for a rotating hard drive array of three disks in RAID-5, if I do say so myself.
Today:
Random Access: 5.6 ms, CPU utilization 2%, Average Read 313.4 MB/s, Burst Speed 473.6 MB/s.
Obviously I am most worried about the burst speed. It is down about 45%, to only 55% of what it was.
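Comparing the two benchmark runs as a percent change, (new - old) / old, makes the size of each drop explicit. A short Python check using the figures quoted above:

```python
def pct_change(old: float, new: float) -> float:
    """Percent change from old to new (negative means it went down)."""
    return 100.0 * (new - old) / old


# Figures from the two benchmark runs in the post (old = 8 months ago, new = today).
runs = {
    "random access (ms)": (5.5, 5.6),
    "average read (MB/s)": (347.8, 313.4),
    "burst speed (MB/s)": (855.5, 473.6),
}
for name, (old, new) in runs.items():
    print(f"{name}: {pct_change(old, new):+.1f}%")
```

Burst speed comes out at about -44.6% and average read at about -9.9%. Random access time is a latency, so its +1.8% change is also a (small) slowdown.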

Seeing as how the LSI 9260-4i MegaRAID controller card has 512 MB of onboard cache, set to predictive read-ahead, it is making me wonder about the card itself. No, I can try to fool myself all I want, but I should RMA that hard drive. Maybe I should see if there is a way to transfer the data from the supposedly failing disk to the hot spare without having to remove the bad one and rebuild the array as if it had failed. Do you have any advice about this? I know I should have started a new topic for this, but I'm just too lazy.

With the hot spare in there, maybe I should just let it go until, and if, it fails. The array has three partitions, and the total combined usage is almost exactly 50%, but I keep my hard drives well cleaned of junk and defragmented. I did not notice slowdowns as much with my old first-generation 36GB Raptors on my last PC, but I do not know whether using 50% of the drive should cut peak performance in half on SAS drives (which kill Raptors and VelociRaptors, by the way). I wonder if frequent defragmenting could be causing the disk failure? My old Raptors are basically the same as when they were new 5-6 years ago.

While I am at it, I have an APC 950VA battery backup. The PowerChute software says I am using about 170 watts at idle. With Prime95 running, Supero Doctor 3 reports CPU1 at "Medium", the system at 44°C, and the power supply at 60°C. APC PowerChute goes from 170 watts at idle to 286 watts with Prime95 running, and it says that connecting more equipment to the battery backup is not recommended, even though the bar goes to 540 watts, which I assume is the most it can put out. I suppose it does not like it when you go over 50% of the available wattage. With another processor, that will be 286 + 95 = 381 watts. The software says the estimated battery time is 13 minutes, which in reality is probably less than 10, based on my experience. Various other programs read the CPU sensors at up to 67°C, which, while not catastrophic, is certainly more than I want. I guess I need the fans.
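The load percentages behind that PowerChute warning are easy to check. A Python sketch, assuming (as the post guesses) that the 540 W end of the PowerChute bar is the unit's real output limit, and using the second CPU's 95 W TDP only as a rough stand-in for its extra draw; TDP is a thermal design figure, not a measured wall-power number, so the real increase could be higher or lower:

```python
UPS_LIMIT_W = 540  # assumption: the top of the PowerChute load bar is the output limit


def load_pct(watts: float, limit_w: float = UPS_LIMIT_W) -> float:
    """UPS load as a percentage of its output limit."""
    return 100.0 * watts / limit_w


idle_w, prime95_w = 170, 286  # PowerChute readings from the post
second_cpu_tdp_w = 95  # Xeon 5650 TDP, used only as a rough estimate of extra draw

for label, watts in [("idle", idle_w),
                     ("Prime95", prime95_w),
                     ("Prime95 + 2nd CPU (est.)", prime95_w + second_cpu_tdp_w)]:
    print(f"{label}: {watts} W = {load_pct(watts):.0f}% of {UPS_LIMIT_W} W")
```

Under that assumption the Prime95 reading is already about 53% of the limit, which fits the "over 50%" guess in the post, and the two-CPU estimate lands around 71%.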

I imagine with the extra RAM installed we might be looking at over 400 watts. When I RMA the bad drive, I am going to add the new one to the array, for a total of four 15K.7 SAS hard drives in a RAID-5 array. I wonder what size UPS I need. Will a 1500VA unit do it, or am I looking at 2,000VA+? I have always used APC; do you have any input on the cheaper brands out there?
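One way to frame the sizing question is in watts rather than VA: pick a unit whose watt rating keeps the expected load under some headroom target. A minimal sketch; both the ~400 W load (the post's own estimate) and the headroom targets are assumptions to adjust:

```python
import math


def min_ups_watts(load_w: float, max_load_pct: int) -> int:
    """Smallest UPS watt rating that keeps load_w at or below max_load_pct percent."""
    if not 0 < max_load_pct <= 100:
        raise ValueError("max_load_pct must be in (0, 100]")
    return math.ceil(load_w * 100 / max_load_pct)


estimated_load_w = 400  # the post's rough estimate with the 2nd CPU and extra RAM

# Hypothetical headroom targets; 50% is roughly where PowerChute started warning in the post.
for pct in (50, 80):
    print(f"keep load <= {pct}%: need a UPS rated for at least "
          f"{min_ups_watts(estimated_load_w, pct)} W")
```

That gives 800 W for a 50% target and 500 W for 80%. UPS spec sheets usually list both a VA and a watt rating, and it is the watt figure that has to clear the load.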

Sorry for such a long post. Please feel free to answer any or all of my questions.
October 10, 2011 5:39:03 PM

photolurp2 said:
How much reduction in temperature have you noticed in the system, CPU(s), Graphics card and Hard Drives since you added the extra two fans? Is there a great deal more noise? I imagine because of the positive pressure,...


I have to say this is the quietest system I have ever had; my desktop computer right beside it is noisier. If I'd had to pay for the fans I probably would have, at about 20 dollars each, give or take. I feel it's better to be safe than sorry, and heat is an issue for sure. Where my computers are located I have two large windows, and on the ranch we have nothing but gravel roads, so dust is a big problem; I'm always opening cases and blowing air into my machines. As for actual temps, sorry, I hadn't checked (see below). I run 9 internal hard drives and also have an eSATA/USB box with two more drives installed; I could add two more hard drives, but those bays are empty (for now, lol). Everything just seems to work, and work well together. I do have a RAID card installed, but I currently don't have RAID set up; I mainly use the whole system as JBOD and run backup software, which works for now because of the amount of space I have. I also run an HP MediaSmart that backs up all the computers in the house every night (something I bought years ago, but I only recently installed WHS 2011 on it); it too has a 4-drive eSATA box connected, for a total of 8 drives (but right now I only have one drive installed).

I have an Xbox 360 and a WDTV box that the server streams movies to, and other than occasional glitches with the MyMovies software, streaming works great; even with our hot summers (no air conditioning at all), heat seems to be okay. I too have everything (probably too much) connected to an APC UPS XS 1500 (PowerChute says I am using 432 watts and can connect more), but I only get about 17 minutes when our power goes out. I immediately put my server and desktop in sleep mode and my time goes way up; I'll usually grab my netbook if I need to go online while waiting for the power to come back on. If it's off for a long time (and in the area I live in that's quite common), I grab our gas generator, run an extension cord through our doggy door, and I'm good for hours.

It's been a while since I did the RMA, and if I remember correctly they charged me about 200 extra, but most manufacturers charge more than any store I've seen. For instance, on the Western Digital webpage you'd pay a lot more for any product than at Future Shop, Costco, Newegg, etc. It's not often you get a "deal" from a manufacturer's website.

As for your hard drive problem, have you gone to the manufacturer's website to see if they have software to check the drive? Most of them do, although I recently came across a laptop with a Toshiba drive, and Toshiba did not.

You got me curious about temps, so I installed CPUID Hardware Monitor, but one thing I noticed is that it does not include all my hard drives (most likely the ones connected to my RAID card). I snipped a picture and uploaded it to Flickr for you to compare to your system: http://www.flickr.com/photos/kansasa/6230947745/ I remember checking the fan settings in the BIOS when I first installed the two CPUs; temps were a bit high, but I chalked it up to the Arctic Silver settling in, and from the numbers now, things look good.

Well I hope I answered most of your questions as best I could.

October 10, 2011 9:42:12 PM

Well, your system does seem to run cooler, at least the hard drives do. It is strange that the max TDP of your E5506 is 80 watts, while the 5650 is 95 watts and runs at a higher frequency, yet the temperatures of your cores are higher than mine; but I don't know what kind of load you have on it. One thing is odd, and I can't say whether it is usual or not: one of your CPUs seems to run about 5°C hotter than the other. Are you using the same heatsinks? Perhaps one CPU is closer to something hot in your system.

When I first bought my system I had a hell of a time with cooling. This being the first server processor I had ordered, I did not know that the boxed CPUs did not come with heatsinks. I understand why, because there are so many different configurations. I first bought a big heatsink, but it would not work because the X frame covered the holes, or really, there just was not enough clearance between the edge of the metal bracket and a capacitor, so I ended up having to take that one back.

I finally found a Dynatron cooler at Micro Center that would work because it screwed straight down, but it was the old type with the copper core, more like a Pentium 4-style cooler blowing hot air upward instead of backward toward the rear of the system. When the price of the memory I was using dropped substantially (from $97.99 a stick to $38.99 a stick), I ordered enough 4GB modules to populate all 12 slots, plus a couple of Supermicro heatpipe-type heatsinks, the SNK-P0040AP4, which runs at 2800 RPM and 24 dB and is really designed for workstations, but the reviews I read said it was very quiet and performed very well. The CPU overheat alarm would go off if I ran the Prime95 torture test with the Dynatron, but when I switched to the Supermicro cooler, the status only went from Low to Warm, never to overheating. It kept the CPU about 15°F cooler!
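One thing worth keeping straight when converting the temperatures in this thread: an absolute temperature converts as °C = (°F - 32) × 5/9, but a temperature difference such as "15°F cooler" drops the 32° offset, so it is only about 8.3°C. A short Python illustration using the ambient range and the cooler difference quoted in the thread:

```python
def f_to_c(temp_f: float) -> float:
    """Convert an absolute temperature from Fahrenheit to Celsius."""
    return (temp_f - 32) * 5 / 9


def f_delta_to_c(delta_f: float) -> float:
    """Convert a temperature *difference* from Fahrenheit to Celsius (no offset)."""
    return delta_f * 5 / 9


# Ambient room temperature of 72-74 F mentioned earlier in the thread.
print(f"{f_to_c(72):.1f}-{f_to_c(74):.1f} C ambient")
# "About 15 F cooler" with the Supermicro heatsink is a difference, not a reading.
print(f"{f_delta_to_c(15):.1f} C cooler")
```

This prints an ambient range of 22.2-23.3°C and a difference of 8.3°C.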

It seems like your hard drives run cooler, but that might be due to rotational speed. I tried every disk tool from the manufacturer, but they could only see the array: it is hardware RAID on an add-on card, and the BIOS presents it as a single drive, so there is no way to test the drives individually. I could shut the system down and boot from a floppy, but the tools probably would not recognize the drives either, since they sit behind a controller.

This just in. Here is what the jackasses are saying (cut and pasted):

• The Credit Card holder is authorizing $536.00 for 1pc _X8DT6 to be charged by Supermicro as a security deposit (sales tax may be applied for CA & UT residents).

• Warranties may be voided and/or full credit of $536.00 will not be refunded if inspection finds that the returned product has been abused or altered.

• Along with the security deposit for the replacement product, customer will incur a $120.00 non-refundable service fee as the requested product is over the standard cross-shipment period based on Supermicro’s original invoice date. Please note that this service fee is non-refundable once replacement has been shipped.

I don't know what to do now. I either pay hundreds of dollars in shipping, pay a non-refundable $120 service fee, or go two weeks without my server running. I am a little pissed, to put it mildly. I have no idea what the original invoice date has to do with it. Well, I just looked at their warranty page, and cross-ship is only 30 days. Lesson learned: do not order a dual-processor motherboard from Supermicro with just one CPU with the idea of buying a second one later. If you do, you will lose an extra $120.00, which is f*cking ridiculous. Perhaps I should demand overnight RMA both ways, with them paying for shipping. I could live with that to a certain point, but UPS is 5 days for standard Ground, so I would be without a server for half a month if I did it that way.

I guess I need to call someone and complain. $656 (of which I will only get $536 back) for a product that I paid $350 for in the first place. F*cking outstanding. You did yours through phone support, if I remember correctly. I am so pissed this might be the last Supermicro product I ever purchase, depending on what the competition offers. They know that a production server can't be out for half a month. It's not MY fault that I discovered the defect later, just because I could not afford both CPUs at the same time. There are still ASUS, Intel, TYAN and other reputable companies that make good server boards.
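Spelling out the numbers from Supermicro's terms: the cross-ship puts the $536 deposit plus the $120 fee on the card up front, and only the deposit comes back (and only if the returned board passes inspection, per the quoted terms). A quick Python tally, with the board's Newegg prices from earlier in the thread for comparison:

```python
deposit = 536.00  # refundable security deposit quoted by Supermicro
service_fee = 120.00  # non-refundable cross-ship service fee
paid = 349.99  # what the board originally cost at Newegg
list_price = 449.00  # Newegg's current price for the board

upfront = deposit + service_fee
print(f"charged up front: ${upfront:.2f}")
print(f"refundable after return: ${deposit:.2f}")
print(f"non-refundable fee: ${service_fee:.2f} "
      f"({service_fee / paid:.0%} of the original price)")
print(f"deposit vs. current Newegg price: {deposit / list_price - 1:+.0%}")
```

So the hold is $656, the fee alone is about a third of what the board cost, and the deposit is roughly 19% above Newegg's current price.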
October 10, 2011 10:23:29 PM

photolurp2 said:


...on what the competition offers. They know that since this is a production server that it can't be out for half a month. It's not MY fault that I discovered the defect later, just because I could not really afford both CPUs at the same time. There is still ASUS, Intel, TYAN and other reputable companies that make good server boards....


Fortunately my server came with two heatsinks; I just had to order the two CPUs. I too almost bought one CPU thinking I would buy the second one later, but my brother talked me into buying both, and we both came to the same conclusion: "what if there was a problem later on and I was out of warranty?" I'm now thankful I did, because otherwise I'd be sitting in your position now (sorry). I didn't know the cross-ship window was only 30 days, and luckily my second CPU, which was back-ordered, only took a couple of weeks, but I must have been awfully close to the 30-day limit. What I found frustrating was paying shipping charges when I had so meticulously put together a server with the best prices I could find, which included ordering from several different companies. The extra money spent on shipping the defective server back I could have used somewhere else. I checked into shipping from Canada and was told Canada Post wouldn't even take it because it was too heavy, and the gal at the FedEx courtesy drop said I was probably looking at about 200 dollars from any shipping company! I waited for my relative to come up from the States and was glad she came up when she did.

If I were in your position I would still exchange it, because if you are anything like me it will bug the hell out of you. Complain loudly and you just might get away without paying the non-refundable fee; after all, it is their fault and not yours. I dealt with them by both phone and email, and Jason (I believe) was the one guy I got along with pretty well. Have you checked with Newegg to see if you can exchange it there with no extra fee? I remember checking newegg.ca, and I would still have had to pay return shipping, so I just went with SuperMicro because I didn't want to start all over with Newegg.

The picture I clipped was my server under no load. The CPUs are identical other than one being backordered; I had to compare the boxes when I first spoke with SuperMicro, and everything was identical, so I don't know why one is running hotter than the other. Like you say, it could be closer to something (my RAID card?) than the other one.
Do let me know how things turn out, I'm curious that way. :D 
October 11, 2011 8:51:33 AM

I called the RMA department, and they basically told me that since it was an online RMA, it would be better to reply to the email, especially since I didn't even have an RMA number yet. They are waiting to see how I want to handle it before they issue the number. I have gotten Newegg to work with me on returns and restocking fees in the past. I mentioned to them that I was going to do some upgrades in the future and would likely spend another few thousand on this machine, and that may have helped them decide. I have right around $5,000 in this machine, so I am not their biggest account, but at the same time that is a whole lot more than someone who builds a $250 PC.

True to my word, I did purchase the rest of my memory, the other CPU, two heatsinks, and another SAS 15K.7 HD from them. As far as returning things to Newegg, you only have 30 days generally speaking, and I have had my MB since February, so I seriously doubt they would take it back. The memory I could return (with a re-stocking fee of course), but the policy for CPUs is replacement only, so unless it is defective they will not take it back. So I am stuck with a useless $1,000 CPU unless I get a new board. That is a pretty expensive mantelpiece.

The name of the document they sent me is xship_with_fee.doc, with a bullet point that states "• Along with the security deposit for the replacement product, customer will incur a $120.00 non-refundable service fee as the requested product is over the standard cross-shipment period based on Supermicro’s original invoice date. Please note that this service fee is non-refundable once replacement has been shipped." I wonder if I renamed the document from xship_with_fee.doc to xship.doc, and deleted that bullet point line, if they would notice or remember.

Also, there is a line in the email that reads "This is over cross-ship period and qualify for repair only. However, cross-ship is still available upon request with a non-refundable service." I guess I would have to delete that line as well. I wonder if they would catch on... Just kidding (or am I?). Wow, I really do have a criminal mind.

I really pleaded my case to them with the basic premise I outlined earlier: I should not be punished because I could not afford the second CPU at the time, and had I purchased both CPUs at the same time, I would have noticed the problem on day 1. I told them that I had Newegg invoices with dates for the motherboard and CPUs to verify my claim. I think this is an extenuating circumstance, and that they should waive the fee.

Who knows what will happen, but I am praying and hoping that they will do the right thing. I also asked them to test it with two six-core Xeon 56xx CPUs, preferably the X5650 if they have it. No sense in getting another defective board. I hope they don't just send it out without testing it, have it fail too, and then claim it worked fine when they shipped it. Hopefully they are not that crooked. I will probably hear something back later today. Maybe I should ask them to do it as a repair instead, and ask that they pay for same-day shipping both ways. I wonder, if I sent it out early in the morning, whether they could get it the same day, repair or replace it, and get it back to me the same day. They probably could, but it would most likely cost more in shipping than the MB is worth. I will keep you posted. I think if they refuse my first offer, I will ask for the second, just to see what they say.
October 11, 2011 12:02:42 PM

As a side note, I have already had to send a motherboard back once, when I was building the machine. I ordered a SUPERMICRO MBD-X8DAH+-F-O Dual LGA 1366 Intel 5520 Enhanced Extended ATX Dual Intel Xeon 5500 and 5600 Series Server Motherboard. When it got there, it would not fit in my SUPERMICRO CSE-743TQ-865B-SQ Black Pedestal Server Case, which I had just ordered. I thought extended ATX was extended ATX, but I was wrong. It was an ENHANCED Extended ATX board, which is 13.68" x 13", while REGULAR Extended ATX boards are 12" x 13". Since I liked the case, and figured shipping the case back would cost much more, I sent the first MB back, had to pay a restocking fee, and of course had to wait some more. That, along with the cooling fans, the potentially failing drive, and now this, has been more work than I ever imagined.
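The mix-up comes down to board dimensions. Here is a minimal sketch that compares a board's footprint against the largest board a case accepts, using the dimensions quoted above (Enhanced Extended ATX at 13.68" x 13", Extended ATX at 12" x 13"). Treating a case's limit as a simple width/depth rectangle, and assuming the case tops out at standard Extended ATX as happened here, are simplifications: real fit also depends on standoff positions.

```python
from typing import NamedTuple


class FormFactor(NamedTuple):
    name: str
    width_in: float
    depth_in: float


# Dimensions as quoted in the post above.
EATX = FormFactor("Extended ATX", 12.0, 13.0)
EEATX = FormFactor("Enhanced Extended ATX", 13.68, 13.0)


def board_fits(board: FormFactor, case_max: FormFactor) -> bool:
    """True if the board is no larger than the case's maximum board in either dimension."""
    return board.width_in <= case_max.width_in and board.depth_in <= case_max.depth_in


print(board_fits(EEATX, EATX))  # False: 13.68" is wider than 12"
print(board_fits(EATX, EATX))   # True
```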

My first server that I built was a P4 3.0GHz with 1MB L2 cache and an 800MHz FSB, 2 GB of non-ECC DDR400, and three 36 GB Raptor HDs in RAID0 with a Radeon 9200, on an ASUS P4P800-Deluxe MB in a cheap case, running Server 2003, Enterprise Edition.

I will tell you that this has been a very steep learning curve with Server 2008 Enterprise Edition, along with "real" server hardware. The damn thing is going to be obsolete before I finish getting it built! Now I see servers that take up to 192GB of 1333 / 1066 / 800MHz DDR3 ECC Registered DIMMs, but as a new feature can instead take up to 384GB of 1066MHz ECC LRDIMMs, whatever those are. I guess this new RAM that I have never heard of is why the memory went down from $98 to $38 in 8 months. Now I have to figure out what LRDIMMs are, what makes them so special, and how expensive they are. Maybe I will have to take my memory back after all. At least I bought at the top end of the Intel processor line, so it is the most modern of the server chips that are going to be phased out.

Just an off-the-wall question: even with the new Sandy Bridge stuff coming out, and with the Extreme Edition 990X out there, will there be an Extreme 1000X, or will they go with the 2xxx series line? They went back down to dual-channel memory instead of triple-channel when they switched from LGA 1366 to LGA 1155. And why 1155 when they just had 1156? Just cut off the corner pin on your 2600K and it will fit. Just kidding, don't anyone try that. The rumor mill has it that they will have quad-channel memory with the new
October 11, 2011 10:35:56 PM

Well they said no to waiving the fee, and I replied that I wished that they would lower the price, but if not it is OK. I just need to get it here without any further delay, so I signed the agreement form and emailed it back to them. I hope it gets here by the end of the week, and I think it can if they send it out today. I hate having 24GB of memory and a $1003.99 processor as paperweights.

I have been thinking about what you said. I guess I should get a bigger battery backup, as well as a battery backup for my RAID controller, but I have been thinking about buying a cheap generator just for that purpose. With an 865 watt power supply and other items, I am sure I should be fine with a 1,000 watt generator. I don't have a lot to spend, and I was wondering what you thought about generator quality. I guess as long as I don't overload it and have as much gas as I need, it should keep on going.

I specifically requested that they test it with two processors installed, all memory banks populated, and to flash the motherboard BIOS to the latest version. Oh well, there goes $120 that I will never see again, and $536 that I will not have available until I get the new MB and ship the old one back, which could be up to two weeks. Oh well, you live and you learn. I am waiting to hear back from them with an RMA number. That is the latest for now.
October 15, 2011 7:28:40 PM

photolurp2 said:
Well they said no to waiving the fee, and I replied that I wished that they would lower the price, but if not it is OK. I just need to get it here without any further delay, so I signed the agreement form and emailed it back to them. I hope it gets here by the end of the week, and I think it can if they send it out today. I hate having 24GB of memory and a $1003.99 processor as paperweights.


Well, if they do the testing and flash the MB, at least you'll feel like the 120 dollars goes towards something, although I'd still be ticked.

I really don't like the generator I have right now. It's about 10 years old with a manual pull start, and sometimes it's a b*tch to start. I've had to drag it into the house and put it in front of the woodstove to warm it up so it will fire. I've been seriously looking at ones with a battery and no manual start, but they are a bit pricey. Wheels would be a nice feature too, because the thing weighs a ton.
October 15, 2011 11:23:28 PM

KansasA said:
Well if they do the testing and flash the MB at least you'll feel like the 120 dollars goes towards something, although I'd still be ticked.

Contrast the service of Supermicro with that of a company with outstanding customer support: Seagate. My MegaRAID controller keeps predicting drive failure for a certain drive, so I communicated by email with Seagate about it, and they told me to RMA it. I chose to do an advance return, and for this they send you a new or reconditioned drive by second day air. You then have, I think, 25 days to return the old one, or they will charge your card $240, not too far from the retail price. And the $9.95 service fee INCLUDES return shipping! I will get the new drive first, and only be out of pocket around $11.04. The strange thing is that they included tax for a grand total of $10.55, but it is showing up in online banking as $11.04, so it is no real big deal, just more of a curiosity thing. They only bill you the $240 if you don't return the product, instead of treating you like a criminal who is out to screw them.

LSI is another good example. I think I read somewhere that they offer advance returns for the duration of their 3-year warranty. ASUS, Tyan and Intel have 3-year warranties on many of their server boards. I know Supermicro has a reputation for being reliable, but I wonder if they don't just market themselves that way. They only offer a 1-year parts warranty, and shitty (or at the very least overpriced) service. I will think long and hard before I buy another Supermicro product, if I ever do.

I am going to get myself that 800 rated watt / 900 max watt generator from Harbor Fright. Even though they are not known for having the highest quality products money can buy, I read almost exclusively good ratings. It is rated 4.5 out of 5 stars across 139 reviews, and 89% said they would recommend it to a friend. I am sure with my APC it should keep the power good and clean. Have you noticed any drastic changes in voltage when running on the generator, or does it stay pretty much within spec?

It runs on 89-octane gas mixed 50:1 with oil. I'm not too thrilled about that, as I would rather use straight gas with separate oil, but on sale it is $89.99, so I will give it a shot. It only has a 1.1 gallon gas tank, but is supposed to run for 5 hours at 50% load. I think if I start to offer web hosting on a larger scale, customers would love to hear that I have battery backups and generators to keep things going. Now if only I could rely on Comcast to be mission critical. I love their speeds, but there have been times when it goes down monthly or more often, whereas when I had DSL, I think I had 2 outages in three years, but the speed was terrible. I wish we could get Verizon FiOS, but that is not part of the monopoly agreement.
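For a 50:1 mix, the oil quantity per tank and the refueling schedule are simple arithmetic. Here is a minimal sketch using the 1.1-gallon tank and 5-hours-at-50%-load figures quoted above. It assumes US gallons (128 fl oz) and that runtime scales linearly with tank count at that load; both function names are hypothetical.

```python
import math

FL_OZ_PER_US_GALLON = 128


def oil_for_mix(gallons_gas: float, ratio: float = 50.0) -> float:
    """Fluid ounces of two-stroke oil to mix into `gallons_gas` US gallons at ratio:1."""
    return gallons_gas * FL_OZ_PER_US_GALLON / ratio


def tanks_for_outage(outage_hours: float, hours_per_tank: float = 5.0) -> int:
    """Full tanks needed to ride out an outage, assuming the quoted 5 h per tank at 50% load."""
    return math.ceil(outage_hours / hours_per_tank)


print(round(oil_for_mix(1.1), 2))  # 2.82 (fl oz for one full 1.1-gallon tank)
print(tanks_for_outage(12))        # 3
```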
October 19, 2011 6:46:25 AM

Yeah, I got my replacement motherboard today, and both CPUs work! But on this board, the one I paid $120 to cross-ship, the MEMORY doesn't work right. All 48GB worked fine (24GB per CPU) on my old motherboard, but now the BIOS registers at most 28GB total, and when I add more, the total goes down. If I install all the memory, it usually drops to 16 GB, but it ranges between 16 and 28. There are 12 slots, 6 per CPU, and each identical module is 4 GB ECC Registered Server Memory from the SUPERMICRO approved memory list.

Sometimes during POST I get the error message: Un-correctable DRAM ECC Error detected at CPU01/DIMM1A. Press F1 to continue. The funny thing is that it happens with different modules, the same ones that all worked on the other board. Is SUPERMICRO trying to drive me insane in addition to ripping me off? Do I now have to decide whether I want a motherboard that supports two CPUs, or a motherboard that works with the RAM in it? I have been trying different things for the last 12 hours, and quite frankly I am at my wits' end.

Is it too much to ask that a motherboard allow two identical CPUs to work simultaneously and allow all of the RAM to be used, especially since the modules are all the same size, manufacturer and specifications? Am I asking too much? Am I being unrealistic? Am I setting my goals too high?

If technical support cannot fix it over the phone tomorrow, is it not reasonable to expect SUPERMICRO to pay for next-day air return of their defective motherboard, and upon receipt, send out a brand new one that has been tested with two X5650 CPUs and all 12 memory slots filled with Kingston ValueRAM 4 GB modules, running the Prime95 torture test on all 24 threads for 24 hours?

What the hell is it going to take? An act of Congress? An act of God? A lemon law? At this point, if they cannot fix this within the next three business days, I just want a full refund of the original purchase price, a full refund of the cross-shipment fee, and the return of my money they are holding hostage. It takes a good hour or so to shut down a server, completely disassemble it, exchange the motherboard, and then hook everything back up.

I am seriously wondering if they didn't do this on purpose. Maybe it's because I didn't write "test with memory" on the original work order (I only wrote "test with 2 CPUs"), but after that I told them in every correspondence that I wanted it tested with memory. I can think of a lot of things I would like to do to these worthless communist Chinese a$$holes, but all of them would end with me in the electric chair. Besides, I don't want to drive all the way to California.

All kidding aside, is this going to take two more weeks, and then they say I didn't get their original motherboard back in time, so they are keeping my $536? I am not writing this company off yet, for the sole reason that I had heard they produce reliable products. I am at a loss as to what to do next. They know that only about $900 of the parts in my $5,000 computer came from them, so really, why should they give a damn? I am sure there are plenty of corporate customers that buy tens of thousands of dollars of their products.

I am running out of options. I am not bragging when I say my computer cost $5,000; I am just putting things into perspective. Neither of our family vehicles is worth as much as this computer, and they are 11 and 15 years old. I have a broken-down $5,000 pile of junk, and all of it works except the SUPERMICRO part. I am beginning to wish I had never bought that second processor, but if not, then why did I waste all of this money?
October 19, 2011 7:01:02 AM

photolurp2 said:
Yeah, I got my replacement motherboard today, and both CPUs work! You know, the one I paid $120 for cross shipment, well now the MEMORY doesn't work right. All 48GB worked fine (24GB per CPU) on my old motherboard, but now I have had Bios register up to 28GB total, then when I add more the total goes down. If I install all memory, it usually goes down to 16 GB, but ranges between 16 and 28. There are 12 slots, 6 per CPU, and each identical module is 4 GB ECC Registered Server Memory, from SUPERMICRO approved memory list.
...just putting things into perspective. Neither of our family vehicles is worth as much as this computer, and they are 11 and 15 years old. I have a broken down $5,000 pile of junk, and all of it works except the SUPERMICRO part. I am beginning to wish I had never bought that second processor, but if not, then why did I waste all of this money?


Well, that's some good news and some bad! Good that you got the board, but crappy that you're having problems with it. I googled your error and found a guy who was having similar problems with a Supermicro board and documented everything he did. He ended up updating the BIOS, and that fixed it. You can read his notes here: http://dev.overthere.org/wiki/Xen_X8STE_Un-Correctable_...

I know you've probably already updated the BIOS but I thought I'd send the info of what he did. I'd be feeling frustrated at this point because that's exactly how I felt when I went through the 2nd cpu issue, hopefully things will work out and this will be an unpleasant memory in the near future. :) 
October 19, 2011 7:23:00 PM

Well, I already had that BIOS, but just to be sure I downloaded it as you suggested and re-flashed it. That changed the beep code from four beeps to three beeps during POST, but it really varies. Four beeps is not listed in the Supermicro documentation. According to this:

3 short beeps: Base 64K memory failure. A memory failure has occurred in the first 64K of RAM. The RAM IC is probably bad.

4 short beeps: System timer failure. The system clock/timer IC has failed or there is a memory error in the first bank of memory.

It has to be the board. The memory worked in the other computer. Maybe I will take the motherboard out and set it on the wooden desk and see if that helps, but I doubt it would. I guess I need to call Supermicro.
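For reference while counting beeps, the two codes quoted above can be kept in a small lookup table. Here is a minimal sketch. Only the 3- and 4-short-beep entries come from the list above; the dictionary and function names are hypothetical, and any other beep count should be checked against the board manual.

```python
# Short-beep POST codes exactly as quoted in the post; other counts are not covered.
BEEP_CODES = {
    3: ("Base 64K memory failure",
        "A memory failure has occurred in the first 64K of RAM. The RAM IC is probably bad."),
    4: ("System timer failure",
        "The system clock/timer IC has failed or there is a memory error "
        "in the first bank of memory."),
}


def describe_beeps(short_beeps: int) -> str:
    """Return the quoted meaning of a short-beep count, or a fallback message."""
    entry = BEEP_CODES.get(short_beeps)
    if entry is None:
        return f"{short_beeps} short beeps: not in this table; check the board manual"
    title, detail = entry
    return f"{short_beeps} short beeps: {title}. {detail}"


print(describe_beeps(3))
print(describe_beeps(4))
```

Both quoted entries point at the first bank of memory, which is consistent with the Un-correctable DRAM ECC error at CPU01/DIMM1A reported earlier.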
October 23, 2011 12:05:55 AM

photolurp2 said:
Well I already had that bios, but just to be sure I downloaded it as you suggested and re-flashed the BIOS. That caused the beep code to go from four beeps to three beeps during POST, but it really varies. Four beeps is not listed in Supermicro Documentation. According to this:

...guess I need to call Supermicro.


This has got to be pretty darn frustrating and I don't think I'd be able to hold my temper while speaking with Supermicro! I hope they pay for all costs involving the recent mb... any word yet?
October 23, 2011 3:09:14 AM

No, I have been busy with other stuff, including a Kingston RMA as well. It seems as though Supermicro's motherboard fried three of my memory modules. The strange thing is that with two of the sticks nothing happens and the system does not POST (tested on the new board, as I wasn't going to risk ruining the known good memory I had left in there), and the other one gives the 5 beeps and then the long beep, indicating no memory. I need to get this taken care of ASAP so I don't get charged for it. I don't know whether to write or call, seeing as how it was an online cross-shipment return.

I don't know how it happened, because all the memory worked before. I have 9 out of 12 good sticks remaining, but I have to wait on my RMA from Kingston to get the three replacement modules (even though that is not critical at the moment), and I also need another motherboard before the memory will do me any good. It seems like I am having all kinds of trouble with my enterprise-class equipment. I never had nearly as many problems on consumer hardware.
October 23, 2011 3:27:23 AM

And to top it all off, I had to replace a hard drive that my LSI 9260-4i MegaRAID controller card predicted was failing. After getting a replacement and doing a rebuild, I moved the old drive to the onboard LSI SAS2008 controller, ran all the Seagate SeaTools tests on it, and it found no errors. The MegaRAID Manager shows the PredFail Count at 0 on the new controller. What the hell. Now I guess I wipe the drive, sanitize it well, and send it back to Seagate. Then they can test it, find nothing wrong with it, take off the label, replace it with one that says re-certified, and send it out to the next customer.

What gives? Sorry for the rant, I just do not understand what is going on. Was the original adapter right, or are the onboard adapter and SeaTools right? Who knows. There were a lot of extra codes when I checked the SMART status of the supposedly bad drive with Hard Disk Sentinel, and those codes were not present on the other drives, so maybe something is awry. That PredFail Count got up to about 9, but kept resetting after reboots. This is all one big mess that I hope to get straightened out soon. I will try a pre-boot (non-Windows) version of the tool and see what comes up, and then install my floppy drive and run the SCSIMax tool as well, even though I am not sure it works on SAS drives. I guess I will try Western Digital's tools too. I hate to send a good drive back.
m
0
l
November 2, 2011 12:58:10 AM

Well they are finally shipping me a replacement board. It seems as though they have thoroughly tested it, and everything seems OK. I will keep my fingers crossed and pray that it makes it here fine and still works OK. It is scheduled to be delivered by 3 PM tomorrow. Maybe the motherboard will help with some of the other problems.

My LSI 9260-4i MegaRAID controller card with 512 MB DDR2 onboard cache has an alarm present, but I have disabled it. In MegaRAID manager, it says that the Virtual Disk State is Optimal, and that all four drives in the RAID5 array are Online. There are no Media Count Errors or Predicted Failure Counts on any of them. The MRM does not list any errors. Any idea what could be going on? Is it safe to ignore the alarm (which is set to disabled by default I do believe), or should I be worried?

I will keep everyone updated.
November 2, 2011 5:56:14 AM

photolurp2 said:
Well they are finally shipping me a replacement board. It seems as though they have thoroughly tested it, and everything seems OK. I will keep my fingers crossed and pray that it makes it here fine and still works OK. It is scheduled to be delivered by 3 PM tomorrow. Maybe the motherboard will help with some of the other problems.


Wow you have really been around the block with this haven't you? On the plus side you will know the system inside and out. :)  But what a hassle for sure.
No idea why an alarm would be present, especially if no error is listed. Google might help? A firmware update maybe? It would bug me knowing there's something there.
November 6, 2011 6:42:09 AM

I did in fact flash the MegaRAID controller with the newest BIOS. It is getting worse and worse. The maximum read speeds from the RAID5 array have gone from around 850 MB/s to around 400 MB/s, the sustained read speeds have gone from around 350 MB/s to 275 MB/s, and I have been having system reboots with fatal cache errors. The event log said that the file system on a certain partition had become corrupt and unusable. I ran chkdsk and that seemingly fixed the problem, but I am still having unexplained reboots. I am going to RMA the controller.

As far as good news goes, my replacement motherboard is working just fine with both physical CPUs and all 24 threads. I did some more testing on the bank of memory, but it was only one stick that was bad, so I am RMAing it too. I have the other two 4GB modules, but because this machine uses triple-channel memory, I will wait until I get the replacement before I add it and the other two. I am hoping that the failing LSI MegaRAID card is the cause of the random reboots. Other than the reboots, the MB has been working fine, and I think they may have shipped me a brand new one, but I am not certain.
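Holding back the extra sticks until the replacement arrives is about keeping each CPU's channels evenly filled. Here is a minimal sketch, assuming the layout described in this thread (12 slots, 6 per CPU, 4GB modules) and three memory channels per CPU with two DIMMs per channel, as is usual for Xeon 5500/5600 boards. The "multiple of three per CPU" rule is a simplification of the board manual's population tables, not a quote from them, and the function name is hypothetical.

```python
MODULE_GB = 4          # each module in this build is 4 GB
CHANNELS_PER_CPU = 3   # triple-channel memory controller per CPU (assumption)
SLOTS_PER_CPU = 6      # 2 DIMMs per channel x 3 channels


def check_population(dimms_per_cpu: list[int]) -> tuple[int, list[str]]:
    """Return total capacity in GB and warnings for fills that can't be spread evenly."""
    warnings = []
    for cpu, count in enumerate(dimms_per_cpu, start=1):
        if count > SLOTS_PER_CPU:
            warnings.append(f"CPU{cpu}: {count} DIMMs exceeds {SLOTS_PER_CPU} slots")
        elif count % CHANNELS_PER_CPU:
            warnings.append(f"CPU{cpu}: {count} DIMMs cannot fill 3 channels evenly")
    return sum(dimms_per_cpu) * MODULE_GB, warnings


print(check_population([6, 3]))  # (36, [])
print(check_population([6, 4]))  # (40, ['CPU2: 4 DIMMs cannot fill 3 channels evenly'])
print(check_population([6, 6]))  # (48, [])
```

Nine good sticks split as 6 and 3 give the 36GB the BIOS reports below, while adding just one more stick to the second CPU would leave its channels unbalanced. Waiting for the replacement lets all three go in together for the full 48GB.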

There is another strange occurrence. BIOS, CPU-Z, System Information, and other applications all register 36GB. Task Manager registers 36GB under Total Physical Memory, but the Commit Charge is 10/35, for example, depending on how much memory I am using, not 10/36. No big deal but I still wonder why. I can't wait to add the other three modules for a total of 48GB.

I made another purchase recently, since I do racing sims and listen to music occasionally (I usually listen to talk radio, so that doesn't really justify it, but even that sounds notably better). The main thing I couldn't stand was that the wires always needed adjusting to get sound or to get rid of static. The old speakers were a cheap model, and I replaced them with the Bose Companion 3 Series II.

I bought it mainly because my old speakers sucked, and I wanted decent audio quality. I have always wanted to try Bose. The deciding factor is that Sam's Club had it on their discount rack for $135.79, instead of the $194.74 regular price. The only reason for the markdown was it was open box. I cut it open and looked inside, and it was in pristine condition, and didn't even look like it had been used. For a savings of over $60 (with tax), I couldn't go wrong. They usually sell closer to $200 online.

I know it is not the absolute best sound system in the world, but it is excellent and more than met my expectations. The sound is truly amazing for the size and price. I would recommend this system to anyone. The desktop speakers take up very little space, and the Acoustimass Module hides away, and the effect is amazing. It really spreads out the sound and makes it come to life. If you close your eyes, you can visualize where all of the musicians are, and so forth. If you don't mind paying full price, go ahead and find one online, or better yet, check for deals.

I have several rules for deciding on online purchases. If it is a well-known retailer such as Newegg or Amazon, I look for a high customer service rating, or how many eggs a product has. I prefer at least 98% satisfaction for sellers and around 80% of reviews at 5 and 4 eggs combined; for example, out of 100 reviews, I would prefer at least 70 with 5 eggs and 10 or more with 4 eggs. For unknown sellers or companies, I first use Netcraft to determine how long they have had a website up and such. If a deal seems too good to be true, it probably is. Then I check RipoffReport, BBB, and ConsumerReports.org (I have a subscription, and I renewed for about $19.00 for one year) to start with. I think the normal price is $26 a year. It is well worth it; in many cases it pays for itself in one purchase.
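The rating thresholds above can be written as a simple filter. Here is a minimal sketch. The thresholds (at least 98% seller satisfaction, at least 70% five-egg and 10% four-egg reviews) are the ones stated above, and the function and parameter names are hypothetical. The 80% combined figure follows automatically from the other two, so it is not checked separately. Integer arithmetic is used so that an exact 70/10 split is not rejected by floating-point rounding.

```python
def passes_review_rules(seller_satisfaction_pct: float, five_egg: int,
                        four_egg: int, total_reviews: int) -> bool:
    """Apply the rule-of-thumb thresholds for a Newegg-style listing.

    seller_satisfaction_pct is a percentage (98.0 == 98%); egg counts are review counts.
    """
    if total_reviews <= 0:
        return False  # no reviews, nothing to judge
    return (
        seller_satisfaction_pct >= 98.0
        and five_egg * 100 >= 70 * total_reviews   # at least 70% five-egg
        and four_egg * 100 >= 10 * total_reviews   # at least 10% four-egg
    )


print(passes_review_rules(99.0, 72, 11, 100))  # True
print(passes_review_rules(99.0, 65, 20, 100))  # False: too few 5-egg reviews
```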

My parents chose not to use it, or maybe I didn't offer it first. They purchased an inexpensive dryer, and out of about 90 models, they picked one that had only three models worse than it. That means about 85 models were better than this one, and a Consumer Reports Best Buy model that was way up on the scale only cost around $100 extra. How stupid on their part. Oh well, it is their dryer and their money. OK, enough of my rants. I am starting to sound like a salesman.

Hopefully when I get my controller and memory module replaced, I will have the perfect system (until I can afford an 8-way server). I have over $5,750 in the whole system, including everything, so I just want everything to work properly. I wonder if the motherboard was responsible for some of the other problems. Oh well, I see light at the end of the tunnel. Hopefully, with the new stick of RAM and the new controller, I will be golden.

Next is an 800 watt Honeywell inverter-type generator, which is designed for sensitive electronics, and an onboard battery backup unit for the LSI MegaRAID card; then I can turn off Windows write-cache buffer flushing altogether. An SSD for my pagefile and CacheCade software are also on the wish list. I have room for 4 more 3.5" hard drives in the backplane, but I am not certain whether I can use the onboard LSI 2008 SAS controller for the other 4 drives.
November 14, 2011 4:43:00 AM

Supermicro got their board back last Thursday, and I am waiting on my $536.00 refund. Won't that be nice? Everything seems to be working great, except that I get random reboots from time to time. Seeing as how I have already RMA'd two motherboards, and this one works with both physical CPUs (all 12 cores and all 24 threads) and all 48 GB of ECC Registered Server Memory, I think I will keep it and hope to find a solution elsewhere. I got my replacement memory module and RAID card. The memory works fine, but I will get into the controller card later.

I have had 10 reboots since 10/1/2011, and they do not seem to be related, but I am coming up with a theory. Yesterday it crashed while I was playing F1 2010 (a racing sim), and today, during a GPU stress test, the screen turned white a little after an hour. The GPU never got over 60C; it is a Radeon IceQ 5670, the only piece of consumer-grade hardware in my server. I could not do anything. It locked up, and Ctrl+Alt+Del did nothing, nor did Ctrl+Shift+Esc. The strange thing is that Num Lock worked, but Scroll Lock and Caps Lock did not. I had to press the power button manually, and I got exactly the same event in the event log as the other nine: Event 41. Then, during or after POST, I get this message from the LSI controller:

Cache data was lost due to an unexpected power-off or reboot during a write operation, but the adapter has recovered. This could be due to memory problems, bad battery, or you may not have a battery installed. Press any key to continue, or 'C' to load the configuration utility.

Well, I don't have a BBU for the controller yet, so the battery is not the problem; I am going to get one soon, hopefully this week. The system memory is just fine, so I assume the message refers to the memory on the card, and if the RMA replacement card has memory problems, it needs to go back too. Could the controller card work with bad memory, with all of its speed coming just from the four-disk RAID 5 array? It has a sticker on it that does not instill a great deal of confidence. It reads, "Serviceable Used Part." To me, that means it was ready for the trash and someone decided it still works.

The funny thing is that it is slower than the card I was planning to RMA. I moved it from the lowest slot, closest to the bottom of the case, where there was less than half an inch between the heat spreader and the case, to the PCI Express slot above it, and the speeds went up dramatically. My guess is that it got better airflow there, or that for some reason that slot performs better, although it is still not as fast as my original card was when new. I thought the slots were all X8. It makes no sense to me.

I have run Windows Memory Diagnostic several times, including the option where you press F1 and can run the more advanced tests, which takes several hours. It came back fine with no problems. The other day I ran the AIDA64 System Stability Test (trial version) for over an hour and a half, with the options checked to stress the CPU, FPU, cache, and system memory. None of the cores got much hotter than 65C, maybe 68C for a second, and there were no problems. I ran Prime95 with 24 threads for 3 hours, 16 minutes, during which it completed 78 tests with 0 errors and 0 warnings. I monitor the voltages in Supermicro SD3, and they are always well within tolerance (although I have never been staring at that screen when a crash happened).


Even though I have told the computer not to restart automatically on system crashes, there is still no blue screen, which makes me all warm and fuzzy inside but keeps me from figuring out what the problem is. Do power switches ever go bad? I know they can quit working, but have you ever known one to shut down a computer by itself? I am just trying to think of anything. I want to offer web hosting, but I must have a rock-solid machine that does not crash in order to do so.

Gag me with a spoon: I could shut the machine down, change the jumper, take out my 1GB Radeon 5670, switch from HDMI back to VGA, and hook the monitor up to the motherboard's onboard 8MB video. If it ran fine for a week or so, I think I could safely say the video card is the culprit. I don't want to do that, because it would suck. My buddy said I should put it in a colo, VPN in when I need to, and otherwise just let it sit there and run. I know this is a server, but I built it as a dual-purpose machine. With this much money in it, I want to play with it too!

Here is the simple error that I get:

The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.

Here is the more technical log (in which I have replaced some numbers and letters with X):


<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Power" Guid="{XXXXXXXX-XXXX-XXX-XXXX-XXXXXXXXXXXX}" />
    <EventID>41</EventID>
    <Version>2</Version>
    <Level>1</Level>
    <Task>63</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000002</Keywords>
    <TimeCreated SystemTime="2011-11-14T01:34:35.938850000Z" />
    <EventRecordID>278807</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="8" />
    <Channel>System</Channel>
    <Computer>XXXX</Computer>
    <Security UserID="X-X-X-XX" />
  </System>
  <EventData>
    <Data Name="BugcheckCode">0</Data>
    <Data Name="BugcheckParameter1">0x0</Data>
    <Data Name="BugcheckParameter2">0x0</Data>
    <Data Name="BugcheckParameter3">0x0</Data>
    <Data Name="BugcheckParameter4">0x0</Data>
    <Data Name="SleepInProgress">false</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
  </EventData>
</Event>
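To check these fields across several crashes rather than eyeballing them one at a time, here is a minimal Python sketch that reads an event exported in the XML view above. The `SAMPLE` string is a trimmed copy of the event above with most System fields removed, and the helper names are my own. Per Microsoft's documentation for Kernel-Power Event 41, a nonzero BugcheckCode means a stop error was recorded (the value is stored in decimal); treating a nonzero PowerButtonTimestamp as a recorded power-button press is my assumption.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Windows event schema (from the xmlns attribute above).
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the event above; most System fields are omitted for brevity.
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Power" />
    <EventID>41</EventID>
    <TimeCreated SystemTime="2011-11-14T01:34:35.938850000Z" />
    <EventRecordID>278807</EventRecordID>
  </System>
  <EventData>
    <Data Name="BugcheckCode">0</Data>
    <Data Name="BugcheckParameter1">0x0</Data>
    <Data Name="SleepInProgress">false</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
  </EventData>
</Event>"""


def parse_event(xml_text):
    """Return the event ID, timestamp, record ID and EventData fields of one event."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    data = {d.get("Name"): d.text for d in root.findall("e:EventData/e:Data", NS)}
    return {
        "event_id": int(system.find("e:EventID", NS).text),
        "time": system.find("e:TimeCreated", NS).get("SystemTime"),
        "record_id": int(system.find("e:EventRecordID", NS).text),
        "data": data,
    }


def describe_kernel_power_41(event):
    """Summarise what a Kernel-Power 41 event did (or did not) record."""
    data = event["data"]
    code = int(data.get("BugcheckCode", "0"))  # stored as a decimal number
    if code != 0:
        return f"stop error 0x{code:X} recorded"
    if data.get("PowerButtonTimestamp", "0") != "0":
        return "power-button press recorded"  # assumption, see lead-in
    return "no bug check and no power-button press recorded"


print(describe_kernel_power_41(parse_event(SAMPLE)))
```

With every field at zero, as here, the event only says that Windows did not shut down cleanly; it does not say why, which matches the missing blue screen.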

Here is the friendly view of an earlier occurrence of the same event:

System
  Provider
    [ Name] Microsoft-Windows-Kernel-Power
    [ Guid] {XXXXXXXX-XXXX-XXX-XXXX-XXXXXXXXXXXX}
  EventID 41
  Version 2
  Level 1
  Task 63
  Opcode 0
  Keywords 0x8000000000000002
  TimeCreated
    [ SystemTime] 2011-11-04T04:34:56.531651000Z
  EventRecordID 267944
  Correlation
  Execution
    [ ProcessID] 4
    [ ThreadID] 8
  Channel System
  Computer XXXX
  Security
    [ UserID] X-X-X-XX

EventData
  BugcheckCode 0
  BugcheckParameter1 0x0
  BugcheckParameter2 0x0
  BugcheckParameter3 0x0
  BugcheckParameter4 0x0
  SleepInProgress false
  PowerButtonTimestamp 0
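Because the two views above come from two different crashes, their SystemTime values at least show the spacing between crashes. Here is a minimal Python sketch; the function names are hypothetical, and only the two timestamps posted here are used. Feeding in all ten Event 41 timestamps (for example, exported with `wevtutil qe System /q:"*[System[(EventID=41)]]" /f:xml`) would show whether the crashes cluster at particular times or intervals.

```python
from datetime import datetime


def parse_system_time(stamp):
    """Parse an event SystemTime such as '2011-11-14T01:34:35.938850000Z' (UTC).

    strptime's %f accepts at most 6 fractional digits and the log uses 9,
    so only the whole-second part (first 19 characters) is parsed.
    """
    return datetime.strptime(stamp[:19], "%Y-%m-%dT%H:%M:%S")


def gaps_between(stamps):
    """Sort the crash times and return the interval between consecutive crashes."""
    times = sorted(parse_system_time(s) for s in stamps)
    return [later - earlier for earlier, later in zip(times, times[1:])]


crashes = [
    "2011-11-14T01:34:35.938850000Z",  # XML view above
    "2011-11-04T04:34:56.531651000Z",  # friendly view above
]
for gap in gaps_between(crashes):
    print(gap)
```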

This is driving me nuts. The main reason I spent this kind of money on this machine was to make it bulletproof, but my old P4, which cost me about a third as much five years ago when parts were expensive, has proven much more reliable in some ways. I know I should have started a new thread, but someone please help! I am at my wits' end on this. It is not right to have over $5,000 in an enterprise server and have all these problems.

With a generator and a battery backup unit for my LSI 9260-4i controller, I will be up to $6,250 for everything. Let's not even start thinking about adding four more hard drives, which would push it past the $7K mark! What have I gotten myself into? I am not bragging; I am seriously starting to wonder why I put so much money into this system.

I could enable hibernation at 100% and use a full-sized pagefile; I wonder whether that would help. That would only waste 98 GB of hard drive space, which is 2 GB less than my boot, page file, and crash dump partition (where Windows is located). My System Reserved partition is a measly 100 MB. I don't suppose I can put the hibernation file and pagefile on a different partition and expect it to work properly; I am under the impression that they have to be on the root partition.
November 14, 2011 12:46:59 PM

Sorry, I'm at a loss as to what it could be. How long is the system on for before you get the error? Could it be a heat issue? Have you considered installing more fans?
November 16, 2011 11:42:14 AM

It happened again yesterday as I was replying to this message. I think it might have to do with simultaneous access to the SMBus, but I am not sure how; SpeedFan's Exotics page killed it. Today I had a game paused and was writing an email when it happened. Whatever the cause, I am getting sick of it. Intel lists TCASE as 81.3°C, although I am not exactly certain what that means. I could probably benefit from two more fans and will most likely get them, but nothing in my system has ever gotten anywhere near that temperature. In the most severe CPU testing, with all 24 threads running at 100%, the cores rarely go above 65C, and at rest they are usually below the system temperature. That is why Intel started reporting "Low": I think they said that below around 45 or 50C, the core temperature measurements are not that accurate. Almost everything in my system stays at or below 45C during general use, with the power supply closer to 50C (it is the super-quiet 875-watt model).

Just Event 41 again: the system did not shut down cleanly. I think I might have another bad LSI adapter. What are the odds of that? Who knows. It took three motherboards from Supermicro to get it right. CPUID's HWMonitor and CPU-Z seem to work fine, but their PC Wizard 2010 makes it crash (I think so, anyway, and I don't want to try it again). For good measure, I have seen this exact problem posted only one other time; maybe I need to leave out some information. There was no resolution, although it eventually came out that he had components running in the 120C range. Gee, I wonder why increased CPU usage caused his computer to slow down. The guy asked for answers for weeks before that was discovered, and there was not another post after that. Either his computer bit the dust, or he wised up and purchased some canned air. I found this in Device Manager:

General tab, Device status: No drivers are installed for this device.

The Driver tab lists the following as installed:

Intel(R) 7500/5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller - 342D

Resource settings:
Memory Range 00000000FEC8A000 - 00000000FEC8AFFF

Conflicting device list:
Memory Range 00000000FEC8A000 - 00000000FEC8AFFF used by:
ACPI x64-based PC
System board

Under the details tab, with Power Data selected, it lists:
Current power state:
D3

Power capabilities:
00000099
PDCAP_D0_SUPPORTED
PDCAP_D3_SUPPORTED
PDCAP_WAKE_FROM_D0_SUPPORTED
PDCAP_WAKE_FROM_D3_SUPPORTED

Power state mappings:
S0 -> D0
S1 -> D3
S2 -> Unspecified
S3 -> Unspecified
S4 -> D3
S5 -> D3

I do not know if I am looking squarely at the problem, or if this is pretty much standard. I am generally of the opinion that conflicting devices are not a good thing, although I am not certain that anything can be done in this case. I also don't know about the S2 and S3 states. This might be worth a further look, although I do not know how I could fix such a thing. Is the computer blaming itself?
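The "Power capabilities" value is a hexadecimal bit mask, and each named line under it corresponds to one bit. As a cross-check, here is a small Python sketch that decodes the mask; the bit values are the PDCAP_* constants from the Windows Driver Kit header wdm.h, and the function name is my own. 0x99 decodes to exactly the four flags listed, so that part of the device's report is at least self-consistent.

```python
# PDCAP_* bit values as defined in the Windows Driver Kit (wdm.h).
PDCAP_FLAGS = {
    0x01: "PDCAP_D0_SUPPORTED",
    0x02: "PDCAP_D1_SUPPORTED",
    0x04: "PDCAP_D2_SUPPORTED",
    0x08: "PDCAP_D3_SUPPORTED",
    0x10: "PDCAP_WAKE_FROM_D0_SUPPORTED",
    0x20: "PDCAP_WAKE_FROM_D1_SUPPORTED",
    0x40: "PDCAP_WAKE_FROM_D2_SUPPORTED",
    0x80: "PDCAP_WAKE_FROM_D3_SUPPORTED",
}


def decode_power_capabilities(mask):
    """Turn a Device Manager 'Power capabilities' value into flag names.

    Accepts either an int or the hex string shown in Device Manager.
    """
    if isinstance(mask, str):
        mask = int(mask, 16)
    return [name for bit, name in PDCAP_FLAGS.items() if mask & bit]


for flag in decode_power_capabilities("00000099"):
    print(flag)
```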
November 16, 2011 11:49:04 AM

In System Information, the following conflicts/sharing are listed:

I/O Port 0x00000000-0x0000000F Direct memory access controller
I/O Port 0x00000000-0x0000000F PCI bus

I/O Port 0x000003C0-0x000003DF ATI Radeon HD 5600 Series
I/O Port 0x000003C0-0x000003DF Intel(R) 7500/5520/X58 I/O Hub PCI Express Root Port 5 - 340C

IRQ 10 Intel(R) Chipset QuickData Technology device - 3431
IRQ 10 Intel(R) Chipset QuickData Technology device - 342A

IRQ 11 Intel(R) Chipset QuickData Technology device - 3429
IRQ 11 Intel(R) Chipset QuickData Technology device - 3430

IRQ 23 Intel(R) ICH10 Family USB Enhanced Host Controller - 3A3A
IRQ 23 Intel(R) ICH10 Family USB Universal Host Controller - 3A34

IRQ 14 Intel(R) Chipset QuickData Technology device - 3432
IRQ 14 Intel(R) Chipset QuickData Technology device - 342B
IRQ 14 Intel(R) ICH10 Family SMBus Controller - 3A30

IRQ 15 Intel(R) Chipset QuickData Technology device - 3433
IRQ 15 Intel(R) Chipset QuickData Technology device - 342C

IRQ 16 Intel(R) ICH10 Family USB Universal Host Controller - 3A37
IRQ 16 Intel(R) ICH10 Family PCI Express Root Port 6 - 3A4A

IRQ 17 Intel(R) ICH10 Family PCI Express Root Port 1 - 3A40
IRQ 17 Intel(R) ICH10 Family PCI Express Root Port 5 - 3A48

Memory Address 0xD0000000-0xDFFFFFFF ATI Radeon HD 5600 Series
Memory Address 0xD0000000-0xDFFFFFFF Intel(R) 7500/5520/X58 I/O Hub PCI Express Root Port 5 - 340C

IRQ 18 Intel(R) ICH10 Family USB Enhanced Host Controller - 3A3C
IRQ 18 Intel(R) ICH10 Family USB Universal Host Controller - 3A36

IRQ 19 Intel(R) ICH10 Family 4 port Serial ATA Storage Controller 1 - 3A20
IRQ 19 Intel(R) ICH10 Family USB Universal Host Controller - 3A39
IRQ 19 Intel(R) ICH10 Family 2 port Serial ATA Storage Controller 2 - 3A26
IRQ 19 Intel(R) ICH10 Family USB Universal Host Controller - 3A35

Memory Address 0xA0000-0xBFFFF ATI Radeon HD 5600 Series
Memory Address 0xA0000-0xBFFFF PCI bus
Memory Address 0xA0000-0xBFFFF Intel(R) 7500/5520/X58 I/O Hub PCI Express Root Port 5 - 340C

I/O Port 0x000003B0-0x000003BB ATI Radeon HD 5600 Series
I/O Port 0x000003B0-0x000003BB Intel(R) 7500/5520/X58 I/O Hub PCI Express Root Port 5 - 340C

Memory Address 0xFEC8A000-0xFEC8AFFF Intel(R) 7500/5520/5500/X58 I/O Hub I/OxAPIC Interrupt Controller - 342D
Memory Address 0xFEC8A000-0xFEC8AFFF System board

I/O Port 0x0000D000-0x0000D0FF ATI Radeon HD 5600 Series
I/O Port 0x0000D000-0x0000D0FF Intel(R) 7500/5520/X58 I/O Hub PCI Express Root Port 5 - 340C

Does that look bad? Is there an easy fix? I am not sure how to assign an IRQ, or if I should just let the system handle it. It looks to my untrained eye like the graphics card might be wreaking havoc, but I am no expert. What do you think?
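To read a listing like this, it helps to group the lines by resource and look at who shares each one. Here is a minimal Python sketch of that; the regular expression assumes the exact "resource, then device name" layout pasted above, and the labels are my own rough heuristics rather than anything Windows reports. In my understanding, PCI and PCI Express devices are allowed to share IRQs, and a PCI Express root port lists the same I/O and memory ranges as the card behind it because the bridge forwards those ranges, so most pairs above (the Radeon and Root Port 5, for example) look like normal sharing rather than a true conflict. That alone does not rule the graphics card out as the cause of the crashes.

```python
import re
from collections import defaultdict

# "IRQ 14 <device>", "I/O Port 0x...-0x... <device>" or "Memory Address 0x...-0x... <device>"
ENTRY = re.compile(r"^(IRQ \d+|I/O Port \S+|Memory Address \S+)\s+(.+)$")

# Device-name fragments that usually indicate a bridge or a reserved board resource.
BRIDGE_HINTS = ("Root Port", "PCI bus", "System board")


def group_by_resource(listing):
    """Map each resource (IRQ, port range, memory range) to the devices using it."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            groups[match.group(1)].append(match.group(2))
    return dict(groups)


def classify(resource, devices):
    """Rough heuristic label for one shared resource."""
    if resource.startswith("IRQ"):
        return "shared interrupt"
    if any(hint in device for device in devices for hint in BRIDGE_HINTS):
        return "range forwarded by a bridge or reserved by the board"
    return "overlap between unrelated devices"


LISTING = """
IRQ 14 Intel(R) Chipset QuickData Technology device - 3432
IRQ 14 Intel(R) Chipset QuickData Technology device - 342B
IRQ 14 Intel(R) ICH10 Family SMBus Controller - 3A30
Memory Address 0xD0000000-0xDFFFFFFF ATI Radeon HD 5600 Series
Memory Address 0xD0000000-0xDFFFFFFF Intel(R) 7500/5520/X58 I/O Hub PCI Express Root Port 5 - 340C
"""

for resource, devices in group_by_resource(LISTING).items():
    print(f"{resource}: {classify(resource, devices)} ({len(devices)} devices)")
```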
November 20, 2011 7:28:26 AM

Casually looking over the MegaRAID SAS 9260-4i RAID Controllers Quick Installation Guide, I found this on page 2 of 4, at the bottom of the right-hand column: "Step 4 Insert the RAID controller in a PCI Express slot on the motherboard, as shown in figure 2......
Note: This is a PCI Express X8 card and it can operate in X8 or X16 slots."

Well, silly old me somehow got the mistaken idea that all of my PCI Express slots were X8. According to the manual, there are four X8 slots and two X4 slots, but in reality my motherboard has five slots in total, three of which are X4. I had originally moved the card because it was sandwiched between the CPU heatsink and the double-width graphics card, with about half an inch on either side.

Performance has improved drastically, and I have not had any problems since. I think I will give it a few more days, then decide whether to keep it or send back the RMA replacement part. I cannot believe I did not think of that. It may also explain why my video card (or system) has hung during game play. I know you are wondering what that has to do with anything, so here is my theory: the graphics card is an X16 card in an X8 slot that is open at the back, while the RAID controller should be installed in an X8 or X16 slot. The graphics adapter takes to the narrower slot better than the LSI MegaRAID card does. I am praying that this is the answer.
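The slot-width point can be put in numbers. A PCI Express link trains to the narrower of the card and the slot's electrical width, so an X8 card in an X4 slot runs with four lanes. Here is a minimal Python sketch of the theoretical one-direction bandwidth for first- and second-generation PCI Express, both of which use 8b/10b encoding; real throughput is lower because of protocol overhead. Which generation and electrical width each slot on this board actually runs at is an assumption to check against the motherboard manual, and the function names are hypothetical.

```python
# Per-lane signalling rate in megatransfers per second.
# PCIe 1.x and 2.0 both use 8b/10b encoding (8 payload bits per 10 bits sent).
RATE_MT_S = {1: 2500, 2: 5000}


def negotiated_lanes(card_lanes, slot_lanes):
    """The link runs at the narrower of the card and the slot's electrical width."""
    return min(card_lanes, slot_lanes)


def link_bandwidth_mb_s(gen, card_lanes, slot_lanes):
    """Theoretical one-direction bandwidth in MB/s, before protocol overhead."""
    payload_mbit_per_lane = RATE_MT_S[gen] * 8 // 10  # strip 8b/10b overhead
    per_lane_mb_s = payload_mbit_per_lane // 8        # bits -> bytes
    return negotiated_lanes(card_lanes, slot_lanes) * per_lane_mb_s


# An X8 card such as the 9260-4i in an X8 slot versus an X4 slot:
for gen in (1, 2):
    for slot in (8, 4):
        print(f"Gen {gen}, X{slot} slot: {link_bandwidth_mb_s(gen, 8, slot)} MB/s")
```

Halving the lanes halves the ceiling, and the same arithmetic applies to the X16 graphics card in an X8 slot: it gets half the lanes it was designed for.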

As I said before, I have an APC 950 VA UPS, but it does nothing when the computer just shuts down with Event 41 and no record in the event log of what caused it. I am considering purchasing an LSI00161 MegaRAID LSIiBBU07 Battery Backup Unit for my array. My question is twofold. First, is there a need for both, or is the APC enough? The BBU can preserve cache data for up to 72 hours during an extended power outage, so that may answer the question. After the crashes, the controller reports that either the memory is bad, the battery is bad, or no battery is installed, and that the cache data was lost but the adapter has recovered. If it could not recover, I might be in trouble. What I am saying is that, apart from the problems I have experienced, this machine runs well, and I do not want to redo anything if I can avoid it.

So here is the second and probably most important question. Say the machine reboots on its own again while the LSI battery backup unit is attached to the MegaRAID card. Would the BBU save the cached data in that kind of unexpected shutdown, where the UPS does not help because the machine just shuts down uncleanly?

I am seriously considering purchasing it, but if it is not going to do anything better than my UPS, why waste the money? As a calculated risk, even though I do not have the LSI BBU installed yet, I have already set the write cache policy on my server to Always Write Back. There are three choices in WebBIOS or MegaRAID Storage Manager: Write Through, Always Write Back, and Write Back With BBU. I have heard folks say that it goes faster with the BBU installed and set to Write Back With BBU. Would there really be any difference between that and Always Write Back, or are they just trying to sell their batteries?
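My understanding of LSI's documentation is that Write Back With BBU falls back to Write Through whenever the battery is missing, failed, or discharged, while Always Write Back keeps caching writes regardless. With a healthy BBU installed, both settings cache writes the same way, so I would not expect a speed difference between them; the difference is what happens to unwritten data when the controller loses power without a working battery. A UPS keeps mains power on during an outage, but when a hung machine is switched off with the power button, the controller loses power anyway, and only a BBU keeps the cache memory alive until the next boot. Here is a minimal Python sketch of that reasoning; the enum and function names are my own, and the fallback rule is the assumption just described.

```python
from enum import Enum


class WritePolicy(Enum):
    WRITE_THROUGH = "Write Through"
    ALWAYS_WRITE_BACK = "Always Write Back"
    WRITE_BACK_WITH_BBU = "Write Back With BBU"


def effective_policy(configured, bbu_healthy):
    """Caching mode actually in effect (assumed LSI fallback behaviour)."""
    if configured is WritePolicy.WRITE_BACK_WITH_BBU and not bbu_healthy:
        return WritePolicy.WRITE_THROUGH
    return configured


def cached_writes_lost(configured, bbu_healthy, controller_lost_power):
    """True if acknowledged writes could vanish from the controller's cache.

    In write-through mode a write is acknowledged only after it reaches the
    disks, so nothing is held only in cache. In write-back mode, cached data
    survives a power loss only if a healthy BBU keeps the cache memory powered.
    """
    if effective_policy(configured, bbu_healthy) is WritePolicy.WRITE_THROUGH:
        return False
    return controller_lost_power and not bbu_healthy


# A hang followed by holding the power button: the UPS cannot keep the card powered.
for bbu in (False, True):
    lost = cached_writes_lost(WritePolicy.ALWAYS_WRITE_BACK, bbu, controller_lost_power=True)
    print(f"Always Write Back, BBU healthy={bbu}: cached writes lost={lost}")
```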