5Big Network 2 - RAID will not rebuild after latest failure

cstokes86

Reputable
Sep 15, 2015
7
0
4,510
Hi Everyone,

I hope this forum is still pretty active. I have an issue that I've never run into before.

I have five 2TB drives in RAID 5 and it has worked flawlessly over the 3-4 years I've had it. Drives have failed in the past and I simply swapped them out with the same size and model number as before. The array resynced, no problem.

On 8/31/15 the drive in bay 5 failed. No big deal! I had to order a new 2TB drive because the replacement I had on standby was dead.

I have Seagates, model ST2000DM001, but I got a slightly updated model when I set up an RMA for the failed drive, most likely because they didn't have the exact model on hand. The replacement has the same specs from what I can tell.

While waiting for a replacement from Seagate to come in I ordered and installed a WD20EFRX to get me over the hump. The plan was to keep the replaced Seagate as a backup for when another drive fails down the road.

The RAID started to rebuild normally. After a few days it hit about 70% and the array completely died. Screenshot here.

I received a notification in my email: The RAID array is inactive

All shares were offline and the drives showed as inactive. After checking the logs I didn't see that any other drive had failed, so the entire data set couldn't have been blown away. I thought "It must be a glitch in the OS, right?"

I rebooted the NAS and it came back up normally. All the drives were online, the shares were back, and all the data was there. The sync restarted on drive 5. All is good, right?

After waiting another few days the same exact thing happened!

Thinking the replacement WD drive I installed was bad, I took it out, replaced it with the Seagate replacement, and let it go.

Waited another few days and the same thing happened again. It seems to happen right at 69.3% complete. I was watching it this morning when it happened.

So it appears the RAID fails to rebuild at or around the same place with drives of the same capacity and speed. What could possibly be the issue?

I could buy another ST2000DM001 (which will fail after a year and be out of warranty by then) and see if that makes a difference, but shouldn't I be able to mix and match different model numbers and brands as long as the drive size is the same? Any ideas/suggestions would be awesome and greatly appreciated.
 
drtweak

Mixing and matching is frowned upon in the RAID world, but for the most part, yes, as long as the drive is the same size or bigger it should work.

If they both seem to crash at the same place, maybe something is wrong with the RAID itself? A corrupt file or something like that could be causing this.

Do you have a backup, or a drive big enough to hold all your data? I know that would be a pretty big drive.

I'm thinking maybe back up all your data, delete the RAID, remake it, and copy it all back.

I don't know how much data you have, but that could take a long time depending on file sizes (fewer, bigger files will transfer much faster than a lot of smaller files) and how fast your connection is.

Also if you don't have a backup GET ONE NOW! You don't want to wait until a second drive dies.
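
If you do end up copying everything off to one big drive first, something as simple as the line below gets you a restartable copy (the mount points are just examples, adjust them for wherever the share and the backup drive actually live):

# -a keeps permissions/timestamps; re-running the same command resumes/updates the copy
rsync -av /mnt/nas-share/ /mnt/backup-drive/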

Now, what if you put back in the other drive, or take out the drive you put in? Does the RAID come back online? What if you just reboot it? Does it still stay offline?
 

S Haran

Distinguished
Jul 12, 2013
219
0
18,910
I agree with drtweak, make a backup asap. Seagate drives with "DM" in the name are notoriously problematic.

That said, if the rebuild fails again, then SSH into the LaCie's Linux OS and capture the output of mdadm --examine on the data partition of all 5 drives. This should give you a clue as to what is going on.

Also check the SMART data on all 5; the Linux command is smartctl -a /dev/sdX
where X is the drive letter, likely a, b, c, d, e.
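
Something along these lines would grab everything in one go (assuming the drives show up as /dev/sda through /dev/sde and the data partition is number 2; both of those are guesses and may differ on your unit):

for d in a b c d e; do
    smartctl -a /dev/sd$d         # SMART health and error counters for each disk
    mdadm --examine /dev/sd${d}2  # mdadm superblock on the data partition
done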
 

cstokes86

Reputable
Sep 15, 2015
7
0
4,510
This is odd, I thought I responded to drtweak last night, but I don't see my response. :/

drtweak,

You may be right that something on the array is corrupt and causing the rebuild to fail. I ordered a new ST2000DM001 and will throw that in once it arrives to see if the NAS just doesn't like the drives I was using to replace the failed one.

I did make a backup of all the really important stuff already in the event I do lose another drive.

I ordered a new Synology DS1515+ to act as a new primary storage location. This has been my plan for a long time, but these recent happenings kind of forced my hand.

I do plan to use the Lacie as a backup destination for the rsync jobs coming from the Synology once I've got everything setup. This means I may end up blowing away the array on the Lacie anyway and starting over.
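
The rsync job itself would presumably be something simple like the line below, run from the Synology side (the share path and the LaCie's address are just placeholders until I have everything set up):

# push the Synology share to the LaCie over SSH (paths and host are examples only)
rsync -av --delete /volume1/share/ root@192.168.1.50:/shares/backup/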

When I reinstalled the original drive, the RAID came back online without any issues, but the drive would fail after a couple of hours and send the array into a degraded state. Even after rebooting, the array would be OK until drive 5 failed again.

I've since sent back the failed drive so I cannot test any further with it.


S Haran,

Thanks for the information to research this further. I have never setup SSH access to the Lacie, but I did find this:

http://lacie.nas-central.org/wiki/Category:2big_Network_2#Getting_root_SSH_access_.28without_disassembling_your_Lacie.29

Is that a good tutorial to utilize to enable SSH access?
 

cstokes86

Reputable
Sep 15, 2015
7
0
4,510
S Haran,

I looked through the entire manual but didn't find anything explaining how one would connect via SSH from PuTTY or a terminal. While there is a mention of SSH FTP (SFTP), that is just a secured version of FTP and, from what I can tell, isn't a means of using SSH to get into the underlying OS to do more digging.

I think they just mention SSH to show they include an encrypted/secured method of transferring files. Just a guess.
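
In other words, if SFTP is all they expose, the best I could do is the first command below, whereas what S Haran is suggesting needs an actual shell like the second (the IP and usernames are placeholders, and I'm not sure yet which accounts the LaCie accepts):

sftp admin@192.168.1.50   # file transfers only, no shell access
ssh root@192.168.1.50     # interactive shell, needed to run mdadm/smartctl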
 

cstokes86

Reputable
Sep 15, 2015
7
0
4,510
Thanks, S Haran. But without the drives being attached to the RAID controller they were configured through, I do not believe I would be able to access any of the data on the drives.

Are you saying the individual drives contain some sort of file that the raid controller sets up when the drive is added to the array?
 

S Haran

Distinguished
Jul 12, 2013
219
0
18,910
Looking at my notes from a LaCie 5Big Network 2 that I worked on, I can confirm there is no hardware RAID controller. Instead it uses Linux software RAID (mdadm).

Each drive has an mdadm superblock and you can see its contents using the command: mdadm --examine ...

Note partition 2 will be the data partition that you are interested in.
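
Once you have a shell, the commands would look roughly like this (the /dev/sdX names and the md device number are assumptions; check what your unit actually uses):

cat /proc/mdstat              # overall array and rebuild status
mdadm --examine /dev/sdb2     # superblock on one drive's data partition
mdadm --detail /dev/md0       # view of the assembled array (md device name assumed)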
 
The Synology disk stations are nice. Just installed one for a client (415+) and the features they have are amazing. Domain user syncing, phone and PC apps, or use a web browser and access your files from anywhere (if my RAID server ever fails I'm getting one of these). The one you are getting will be a very nice addition.

Also, get WD Red drives. They are more reliable and designed for RAID. You don't need to get the Red Pros, just the regular Reds.

As for the SSH part, I haven't done that before on these guys (and it has been a long time since I have even used SSH, so yeah, I'm no help haha), but finding out the SMART status of all the drives might help. Who knows, there could be another drive that is about to fail that is causing this error on the rebuild as well.
 
Solution

cstokes86

Reputable
Sep 15, 2015
7
0
4,510
drtweak,

Thanks for the props. It's definitely a huge upgrade. I'm installing extra memory AND WD Red Pros :)

I'm going to see about gaining SSH access into the Lacie and checking the SMART status and mdadm if the rebuild fails with the new replacement I ordered.

Hopefully it will just all pick back up and rebuild like a charm without doing anything once the replacement drive is in. Who knows. I will keep you guys posted.

Thanks again for your help and input.
 

cstokes86

Reputable
Sep 15, 2015
7
0
4,510
Just got my new Synology. Can't wait to get this beast setup: https://lh3.googleusercontent.com/-FOIwZa9U5sk/VfsOfaI9CwI/AAAAAAADzjA/w2o4tCuGlPA/s576-Ic42/IMG_20150917_142422.jpg

It's a smaller footprint than what I was expecting, but that's good for me! I already installed the extra 4GB RAM stick.

The Lacie failed at exactly 69.2% a few moments ago. Since it's so consistent I think there must be something corrupt within the mdadm file. I will check that out more in the future, but for now I will work getting this new beast setup. Can't wait to test.
 

jetich

Reputable
Sep 25, 2015
1
0
4,510


Did you ever recover your data? I'm crying the blues right now with an RX1211RP that dropped two drives then the entire volume.
 


OUCH! That sucks! You may want to use something like R-Studio, which is designed for RAID recovery (just pulling files, not fixing the RAID), to get your files if you don't have a backup.

After that GET A BACKUP!!!