I'm running Ubuntu Server 14.04 on a Supermicro X10SLM-F with a Xeon E3-1271 v3.
Memory: SuperTalent 32GB DDR3 1600 ECC
About every four days, the Ubuntu logs show this:
Code:
{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
{1}[Hardware Error]: It has been corrected by h/w and requires no further action
{1}[Hardware Error]: event severity: corrected
{1}[Hardware Error]: Error 0, type: corrected
{1}[Hardware Error]: fru_text: CorrectedErr
{1}[Hardware Error]: section_type: memory error
[Firmware Warn]: error section length is too small
Immediately after this, the server reboots itself as if it had been power-cycled.
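Because the event recurs every few days, it may help to count how many of these APEI records show up in a saved kernel log (for example a copy of /var/log/kern.log). The following is a minimal Python sketch. It assumes the kernel lines keep the `{N}[Hardware Error]:` format shown above (any syslog timestamp prefix is ignored) and that each record begins with the "Hardware error from APEI" line. `summarize_hw_errors` is a hypothetical helper, not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines like "{1}[Hardware Error]: section_type: memory error".
# re.search (not re.match) tolerates a syslog prefix such as a timestamp,
# hostname and "kernel:" in front of "{N}".
HW_LINE = re.compile(r"\{(\d+)\}\[Hardware Error\]:\s*(.*)")


def summarize_hw_errors(lines):
    """Tally APEI hardware-error records by (event severity, section_type).

    Assumption: each record starts with a "Hardware error from APEI ..." line.
    The {N} number is not relied on as a record key.
    """
    events = []
    current = None
    for line in lines:
        m = HW_LINE.search(line)
        if not m:
            continue  # e.g. the "[Firmware Warn]" line
        text = m.group(2).strip()
        if text.startswith("Hardware error from APEI"):
            current = {}  # start a new record
            events.append(current)
        elif current is not None and ":" in text:
            key, _, value = text.partition(":")
            current[key.strip()] = value.strip()
    return Counter((e.get("event severity"), e.get("section_type")) for e in events)


# The excerpt from the post, used as sample input.
SAMPLE = """\
{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
{1}[Hardware Error]: It has been corrected by h/w and requires no further action
{1}[Hardware Error]: event severity: corrected
{1}[Hardware Error]: Error 0, type: corrected
{1}[Hardware Error]: fru_text: CorrectedErr
{1}[Hardware Error]: section_type: memory error
[Firmware Warn]: error section length is too small
""".splitlines()

if __name__ == "__main__":
    print(summarize_hw_errors(SAMPLE))
    # prints Counter({('corrected', 'memory error'): 1})
```

To scan a real log, pass an open file object instead of `SAMPLE`, since iterating a file yields its lines.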
When I look in the BIOS event log, I see this:
Code:
DATE TIME ERROR CODE SEVERITY
06/13/15 13:13:38 Smbios 0x02 P1-DIMMB2
And the description of the error is:
Code:
Single Bit ECC Memory Error
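The BIOS entry names a specific slot (P1-DIMMB2). If more entries accumulate, a small Python sketch like the one below could tally them per slot label. It assumes the rows are whitespace-separated exactly as copied above, with a two-word code (`Smbios 0x02`) followed by the slot label, even though the header lists only DATE, TIME, ERROR CODE and SEVERITY. It also assumes the date is MM/DD/YY. `parse_bios_event` and `errors_per_slot` are hypothetical helpers.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class BiosEvent(NamedTuple):
    when: datetime
    code: str  # e.g. "Smbios 0x02"
    slot: str  # e.g. "P1-DIMMB2"


def parse_bios_event(row: str) -> BiosEvent:
    # Assumption: five whitespace-separated fields, as in the row above:
    # date, time, two-word error code, DIMM slot label.
    date, time, source, code, slot = row.split()
    when = datetime.strptime(f"{date} {time}", "%m/%d/%y %H:%M:%S")
    return BiosEvent(when, f"{source} {code}", slot)


def errors_per_slot(rows):
    """Count BIOS event-log entries per DIMM slot label."""
    counts = Counter()
    for row in rows:
        if not row.strip() or row.startswith("DATE"):
            continue  # skip blank lines and the column header
        counts[parse_bios_event(row).slot] += 1
    return counts


if __name__ == "__main__":
    log = [
        "DATE TIME ERROR CODE SEVERITY",
        "06/13/15 13:13:38 Smbios 0x02 P1-DIMMB2",
    ]
    print(errors_per_slot(log))  # prints Counter({'P1-DIMMB2': 1})
```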
A few questions:
1. If ECC memory is self-correcting, why does the machine reboot itself?
2. Am I missing a BIOS setting that would stop the box from rebooting itself?
3. Is this clearly a problem with the memory stick, or could it be a slot or CPU issue?
Thank you for any advice.