NVMe SSD vs 2x M.2 RAID 0 SSD

CX

Commendable
Hello,

I'm building a workstation and I'm looking for a solution that gives fast read speeds for data I can afford to lose. In particular, I will be using it as swap space for data that doesn't fit in my RAM, so fast reads are paramount.

The options I'm looking at are:
- NVMe SSD
- 2x M.2 SSD in RAID 0

I don't need a huge amount of storage; even 256 GB will probably be enough. But I'm wondering: which of these is generally faster?

Thank you in advance,

CX
 

CX

Commendable


Thanks for the suggestion, but I'm not sure I want to spend > €1000 on RAM alone, as I will require at least around 100 GB, and even that may not be enough as my data grows. The SSD route therefore seems like a better fit for my use case.
 

kanewolf

Titan
Moderator
I don't know what software you are using that you believe will keep that much in RAM. But I believe you will be disappointed in the performance you get as soon as you have to swap.

Can you better describe your use? You might have the right solution, but so far you have presented an implementation rather than a problem.
 

CX

Commendable


The reason for this build is that I'm disappointed with my current swap, and I don't think there is a way to really avoid swapping.

I'll be using it for machine learning in Python. The full feature set I'm working with is a matrix of 20M rows by about 450 columns, mostly double-precision numbers. With no overhead, that alone is about 70 GB of data. I've run on partial data, but running on all of it should give better results.
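Just to sanity-check that number (back-of-the-envelope only; the real footprint depends on the dtype and loading overhead):

```python
# Rough size of the full feature matrix: 20M rows x 450 columns of 8-byte doubles.
rows, cols, bytes_per_double = 20_000_000, 450, 8
size_bytes = rows * cols * bytes_per_double
print(size_bytes / 1e9)    # 72.0  -> ~72 GB, roughly the 70 GB quoted above
print(size_bytes / 2**30)  # ~67.1 GiB
```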

I'm not going the Hadoop route used for big data. I'm fine with a tenth of the speed of my RAM, as long as it will finish within a week or so. With my current setup, I will be dead before it's done.
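For illustration, the kind of out-of-core access I have in mind would look something like this (a sketch only, with a made-up file path; numpy's memmap pulls pages from the SSD on demand instead of relying on OS swap):

```python
import numpy as np

# Hypothetical raw-double file sitting on the fast SSD.
X = np.memmap("/mnt/fast_ssd/features.dat", dtype=np.float64,
              mode="r", shape=(20_000_000, 450))

# Process in row chunks so only one slice has to be resident in RAM at a time;
# the mean here is just a stand-in for the real per-chunk work.
chunk = 1_000_000
for start in range(0, X.shape[0], chunk):
    block = X[start:start + chunk]   # triggers reads from the SSD
    partial = block.mean(axis=0)
```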
 

kanewolf

Titan
Moderator
I don't know what your budget is, but have you thought about using AWS EC2 or Google Compute Engine for your LARGE runs? For a few dollars per hour you can run your data on a host with 100+ GB of RAM. An m4.10xlarge ($2/hour) has 160 GB RAM and 40 vCPUs; an r4.4xlarge ($1.064/hr) has 16 vCPUs and 122 GB RAM.
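For scale, a very rough cost estimate (assuming the job really does run for a full week at the on-demand rate quoted above; actual pricing varies by region):

```python
# One week of m4.10xlarge at $2/hour -- illustration only.
hours = 7 * 24
print(hours * 2.00)  # 336.0 -> roughly $336 for a full week
```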
 
Solution

CX

Commendable


Hey man, that's an awesome solution! I never knew this was so cheap! Thank you and have a great day.
 

kanewolf

Titan
Moderator
This is why describing your actual problem, rather than just a choice of solutions, allows for much more creative options.
A few links
EC2 instance types -- https://aws.amazon.com/ec2/instance-types/
EC2 pricing -- https://aws.amazon.com/ec2/pricing/on-demand/

Remember that you will be billed for as long as your VM is running. You need to be disciplined about killing your VMs when you aren't using them. If you leave an inactive VM up overnight (because your job died, but you expected it to run overnight), then you will be billed for those hours.
BUT for an infrequent surge of required processing, it is a good option.
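If it helps, here is a rough sketch of how you could check for forgotten instances from Python with boto3 (the region is an assumption, and terminate_instances permanently destroys the instance and any instance-store data, so use it carefully):

```python
import boto3

# List anything still running in the region and shut it down.
ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region
resp = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)
ids = [inst["InstanceId"]
       for res in resp["Reservations"]
       for inst in res["Instances"]]
if ids:
    ec2.terminate_instances(InstanceIds=ids)  # irreversible: instances are destroyed
```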

Do all your "practice" on a small VM that is similar to your target. If you are going to use an M4 instance then use the smallest M4 instance for your testing and debug.