I would like to build a fairly high-end system to render my insanely huge files.
Combined budget: $10,000 (I have a partner).
I'm not a very technical person when it comes to hardware, so I need some help.
I'm a professional photographer and am currently doing large-scale panoramic images. One very large image will consist of 2000+ 21-megapixel images, and the resulting image will be at least 20,000 megapixels (20 gigapixels).
This has been done before by rendering on a server board with 2 Xeon processors and 8 GB RAM (other specs unknown); rendering took 48 hours. The source files were only 12 MP each, and the resulting file was 96 GB at 13 gigapixels. I estimate my file will be 200 GB+.
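A rough way to sanity-check these numbers is to scale linearly from the earlier 13-gigapixel job. This is only a sketch under a loud assumption: it treats file size and render time as proportional to output pixel count on comparable hardware, which ignores differences in hardware, software settings and layer structure.

```python
# Hypothetical linear-scaling model: assumes file size and render time grow in
# proportion to output pixel count on comparable hardware.
PREV_GIGAPIXELS = 13.0   # earlier job, figures from the post above
PREV_FILE_GB = 96.0
PREV_RENDER_HOURS = 48.0


def scale_estimate(target_gigapixels: float) -> tuple[float, float]:
    """Return (estimated file size in GB, estimated render time in hours)."""
    ratio = target_gigapixels / PREV_GIGAPIXELS
    return PREV_FILE_GB * ratio, PREV_RENDER_HOURS * ratio


size_gb, hours = scale_estimate(20.0)
print(f"~{size_gb:.0f} GB, ~{hours:.0f} hours")
```

Pure scaling gives roughly 148 GB and 74 hours on the old hardware, so the 200 GB+ estimate above is on the conservative side; the real figures depend on the final pixel count and how the new machine performs.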
The rig should be damn stable, as this huge file needs to be edited in Photoshop CS4 afterwards.
I don't want an overclocked system; it should be as stable as possible.
Hardware I'm looking at:
I'm currently looking at a Gigabyte server board - GA-7TEWH (1.0) - http://www.gigabyte.com.tw/Products/Networking/Products...
I will go for 2 Intel Xeon X5570 processors http://ark.intel.com/Product.aspx?id=37111&processor=X5...
For memory, I will go with Corsair Dominator, two 24 GB kits (48 GB total).
I think solid state drives will be a must, but they will push up the overall price quite a bit. Will consider.
I can add several 2 TB drives in a RAID 0 config, as I have a few drives already.
For graphics I would like a Quadro FX 5600, but I will have to look at my budget options again :-(
I will get whatever power supply is needed, including extra cooling. Options?
I will be running Windows 7 Ultimate 64-bit.
What I need to know:
Am I looking at the right specs for my task at hand?
The motherboard is probably the most important part, so I am open to better suggestions.
Also, this Gigabyte board only supports SATA 2. Is there a server board that supports SATA 3 with RAID 0? That would be the way to go.
Extra features like USB 3 would also be a bonus.
Any other options/suggestions as to how I can improve on this would be greatly appreciated.
1) Many cores and high compute capability. I understand that CS4 is multi-core enabled and benefits from a high-clocked CPU.
To that end, a server board with two high-end Xeon CPUs is probably the best you can do today. The new 32nm 6-core Gulftown CPUs are coming this year, and they will be a major improvement. If you can wait, fine; if not, make certain that the motherboard you get can accept Gulftown CPUs as a later upgrade.
2) High hard drive performance, particularly in sequential operations.
I think a very good hardware-based RAID controller or two will serve you well here. I think it will be the key component; research these cards carefully, as they are expensive. Onboard motherboard SATA, 6 Gb/s or not, will not be adequate.
Set it up with two RAID 0 arrays, one for input and one for output.
Populate each array with multiple hard drives that have dense platters, which give better data transfer rates. These will usually be the 1 TB and larger drives.
A good SSD may be useful for the OS, and perhaps some other functions. I do not see it as a critical component unless your applications do lots of small I/Os to smallish files.
USB 3.0 may not be important to you. I see it as useful for home users doing external backups.
You will probably want to be able to do backups faster, at eSATA speeds.
Thanks for your input. I reviewed our budget, and it seems that with the specs we want, $10K won't be enough (looking into that now). Gulftown CPUs definitely would be better, but they won't be cheap either... and we can't really put our projects on hold to wait for a CPU to be released.
Any boards you can recommend for future CPU upgrades?
I fully agree about drive performance. Any suggestions on good cards I should be looking into? I have not dealt with this in the past.
Will onboard RAID not be good enough? SATA 3 in RAID 0 on the Gigabyte board claims speeds of up to 24GB/s. Your thoughts on this?
...but I can't find this feature on a server board.
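The quoted "24" figure only makes sense in gigabits per second: one plausible reading is 4 SATA 3 ports x 6 Gb/s = 24 Gb/s, which is about 2.4 GB/s of payload after 8b/10b encoding. That is the ceiling of the ports, not what spinning drives actually deliver. A small sketch, where the port count, one drive per port, and the ~150 MB/s per-drive rate are assumptions for illustration:

```python
# Assumption: the "24" figure means 4 SATA 3 ports x 6 Gb/s (gigabits, not gigabytes).
SATA3_LINE_RATE_GBIT = 6.0   # per port, raw line rate
SATA3_PAYLOAD_MB_S = 600.0   # per port, after 8b/10b encoding (10 line bits per byte)


def interface_ceiling_mb_s(ports: int) -> float:
    """Best case the SATA ports themselves can carry."""
    return ports * SATA3_PAYLOAD_MB_S


def raid0_effective_mb_s(drives: int, per_drive_mb_s: float) -> float:
    """Ideal RAID 0 sequential rate with one drive per port: the slower of drives or ports."""
    return min(drives * per_drive_mb_s, interface_ceiling_mb_s(drives))


ports = 4
print(f"{ports * SATA3_LINE_RATE_GBIT:.0f} Gb/s line rate")
print(f"{interface_ceiling_mb_s(ports):.0f} MB/s interface ceiling")
print(f"{raid0_effective_mb_s(ports, 150.0):.0f} MB/s from four ~150 MB/s hard drives")
```

With hard drives, the drives rather than the SATA 3 interface are the limit, so the headline number says little about real throughput.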
Corsair Dominator still sounds like pretty damn good memory to me, and my supplier has it in stock.
They also have ECC server memory in stock: Kingston ECC Registered with Parity, 3 x 8 GB kit, DDR3-1333.
With these kits I can fill all the RAM slots for a total of 96 GB of RAM, but it will be expensive.
My rendering application is memory intensive. Server memory is slower but more stable. Should I start with 2 kits (48 GB) of this ECC RAM and add more later when I can afford it?
The current X58-chipset-based motherboards will support Gulftown. A BIOS update may be required.
I expect the server equivalents to be the same. Check whether the server motherboard makes any statement about such future support.
Some server motherboards may have onboard RAID powered by a hardware RAID chip, but I do not think typical onboard RAID will suffice.
I am not experienced with high-performance discrete RAID. I do know that the best cards have hardware RAID processors, perhaps several. They also have large caches, and battery backup so that deferred writes can be used safely. They are installed in an x16 PCIe slot, which gives them many times the data transfer capability of SATA 3.0.
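To put "many times" in numbers, here is a back-of-the-envelope comparison of PCIe slot bandwidth with a single SATA 3 port. It assumes PCIe 2.0 (about 500 MB/s per lane per direction after encoding); many RAID cards use x8 connectors, so both widths are shown.

```python
# Usable per-lane, per-direction bandwidth after 8b/10b encoding.
PCIE_LANE_MB_S = {"1.x": 250.0, "2.0": 500.0}
SATA3_PORT_MB_S = 600.0


def slot_mb_s(gen: str, lanes: int) -> float:
    """Payload bandwidth of a PCIe slot of the given generation and width."""
    return PCIE_LANE_MB_S[gen] * lanes


for lanes in (8, 16):
    bw = slot_mb_s("2.0", lanes)
    print(f"PCIe 2.0 x{lanes}: {bw:.0f} MB/s, {bw / SATA3_PORT_MB_S:.1f}x one SATA 3 port")
```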
I would suggest that you call customer support at some of the major manufacturers, such as Adaptec, 3ware, Areca, LSI and HighPoint.
You will get better informed advice. You will also be better able to assess how well the vendor supports their cards. In the past, I have called Adaptec, and was well satisfied.
The best hard drives will have a maximum continuous data transfer rate of 200 MB/s or so. That is on the fastest, outer parts of the drive. As the drive fills, that rate will go down, perhaps to 100 MB/s. With a good RAID card you can gang together a good number of drives and get an impressive sequential throughput rate. It is not clear to me how much is justified; it is an area you should test. If more drives turn out to be very helpful, it does not cost that much to add more later.
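The drive-count trade-off can be sketched with ideal RAID 0 scaling, using the 100-200 MB/s per-drive figures above. Real arrays scale less than perfectly, and the optional controller cap is a hypothetical parameter you would fill in from the card's actual limits.

```python
from typing import Optional


def raid0_mb_s(drives: int, per_drive_mb_s: float,
               controller_cap_mb_s: Optional[float] = None) -> float:
    """Ideal striped sequential rate: sum of the drives, optionally capped by the controller."""
    total = drives * per_drive_mb_s
    if controller_cap_mb_s is not None:
        total = min(total, controller_cap_mb_s)
    return total


def seconds_to_read(file_gb: float, rate_mb_s: float) -> float:
    """Time for one sequential pass over a file (decimal units, 1 GB = 1000 MB)."""
    return file_gb * 1000.0 / rate_mb_s


for n in (2, 4, 8):
    for rate in (100.0, 200.0):  # inner (full drive) vs outer (empty drive) tracks
        mbs = raid0_mb_s(n, rate)
        minutes = seconds_to_read(200, mbs) / 60
        print(f"{n} drives @ {rate:.0f} MB/s: {mbs:.0f} MB/s, 200 GB in {minutes:.1f} min")
```

Even a modest array reads a 200 GB file in minutes, which is why testing whether extra drives actually shorten the render is worthwhile before buying them.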
The Nehalem architecture's onboard memory controller is very good. It feeds the CPU well at any RAM speed. From all the tests I have seen, RAM speed or timings make very little difference to real application throughput (vs. synthetic benchmarks): on the order of 2-4% between the slowest and fastest RAM. Faster RAM may be good for record-level overclockers. Don't pay much more for fast RAM.
It would be better to get more RAM for your application. RAM components can vary from one batch to the next, even within the same brand, specs and part number. Some motherboards are very sensitive to this. That is why RAM is sold in kits of multiple sticks. I would try to buy all of the RAM in one batch if possible.
ECC RAM has error correction capabilities. I see it as good for systems that must be up 24/7 and that handle life-critical applications. If RAM should fail on your system, no fatal harm will be done. I see no reason that normal RAM will have any stability issues, as long as your system has adequate cooling. Enthusiast RAM is sold with fancy heat sinks, but they are really not necessary, particularly if the RAM is not overclocked.
With all these hard drives, ram, and possibly a strong graphics card, pay attention to the case you get. It should have lots of room and good cooling.
Geofelt, no disrespect, but you are focusing too much on hard drive throughput. I have experience with panoramic images, albeit not nearly as large as the OP's. Last year, I used Photoshop CS4 64-bit for twenty 10 MP images with a Q6600, 8 GB RAM, 4 10k rpm Raptors in RAID 0 for OS, apps and scratch disk (onboard Intel RAID), 4 7200.11s in RAID 5 containing the source images (on a 3ware 9650SE controller), and 2 Raptors in RAID 0 for output. Then I upgraded to an i7 920, 12 GB RAM and an Intel 80 GB SSD for OS/apps/scratch disk, and changed the source to 2 7200.12s in RAID 1. I created the same pano image and watched Task Manager for RAM and CPU usage. The new system took less than half the time, even though the source disk throughput was a lot slower. Also, Photoshop didn't use more than 4-4.5 GB of RAM, and CPU utilization wasn't very high. My conclusion from this test is that the scratch disk is the most important component, and its random read/write performance is the most critical factor for rendering.
Here is another reason why sequential read/write is not very important: if the final file is 200 GB and it takes 24 hours to create, that equates to 2.31 MB per second.
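The 2.31 MB/s figure is just output size divided by elapsed time, in decimal units. A quick sketch of the same arithmetic, also applied to the earlier 96 GB / 48 hour job from the first post. Note this is an average over the whole run; it says nothing about bursts or about scratch-disk traffic, which is not counted here.

```python
SECONDS_PER_HOUR = 3600


def avg_write_mb_s(output_gb: float, hours: float) -> float:
    """Average output rate in MB/s, using decimal units (1 GB = 1000 MB)."""
    return output_gb * 1000.0 / (hours * SECONDS_PER_HOUR)


print(f"{avg_write_mb_s(200, 24):.2f} MB/s")  # the 200 GB / 24 h example
print(f"{avg_write_mb_s(96, 48):.2f} MB/s")   # the earlier 96 GB / 48 h job
```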
Here are my recommendations:
Motherboard: Supermicro is known for quality and reliability in the server market.
RAM: with a motherboard that has 12 RAM slots, you can do some testing. Try 48 GB of RAM (6 x 8 GB ECC Registered), and test it together with the next part, the SSDs.
Hard drives: three 80 GB Intel X25 SSDs in RAID 0 for the scratch disk. This will work great for both rendering and editing. Depending on the size of the final images and the speed of 48 GB RAM + 3 SSDs in RAID 0, you could add another 80 GB Intel SSD, which should allow the full image to be loaded onto the scratch disk.
RAID: Adaptec or Areca. Areca is known for reliability, which is why I use their 1680ix in my video editing/compositing workstation. Some say a BBU (battery backup unit) is necessary, but a UPS is absolutely critical. For hard drives, Seagates have been perfect for me (I currently have 11 in my video workstation), but Samsung F3s are faster and appear to be very reliable according to their Newegg reviews. For data protection, I would use RAID 6, since it allows 2 drives to fail without data loss. With as much data as you will have, RAID 5 is very risky: if a drive fails in RAID 5, it could take 24+ hours to rebuild the array, which leaves you totally vulnerable during that time.
Graphics card: I don't see an advantage to using an FX 5600 or even an FX 4800. The OpenGL acceleration in Photoshop CS4 has little to no effect on extremely large images, because the graphics card only has so much RAM to store image data. A GTX 285 2GB or the new ATI 58xx with 2GB would be the best choices.
Cases: I prefer Lian-Li, as they are top quality and provide storage for up to 10 drives and up to 2 power supplies.
More thoughts: you need drives to store source images (raw or JPEG?) and drives to store output for editing. For maximum reliability, the source and output drives should use RAID 1, 5, 6 or 10. I think you could get away with using 2 drives in RAID 1 for the source images, but the output/editing drives should be RAID 6 using at least 5 drives. More drives increase throughput, and editing is where higher throughput helps (for reading the 200+ GB file).
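To see what the suggested layouts cost in capacity, and why rebuilds take so long, here is a small sketch. It assumes 2 TB drives (the OP already owns some) and a ~100 MB/s sustained rebuild rate; the rebuild figure is only a floor, since a real rebuild on a busy array runs slower, consistent with the 24+ hours mentioned above.

```python
def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity for common RAID levels (ignores formatting overhead)."""
    if level == "0":
        return drives * drive_tb
    if level == "1":
        return drive_tb                 # mirror: one drive's worth of space
    if level == "5":
        return (drives - 1) * drive_tb  # one drive's worth of parity
    if level == "6":
        return (drives - 2) * drive_tb  # two drives' worth of parity
    if level == "10":
        return (drives // 2) * drive_tb
    raise ValueError(f"unsupported RAID level: {level}")


def rebuild_floor_hours(drive_tb: float, mb_s: float) -> float:
    """Lower bound: time to write one full replacement drive at mb_s (decimal units)."""
    return drive_tb * 1_000_000 / mb_s / 3600


print(usable_tb("1", 2, 2.0), "TB usable for source (2 x 2 TB, RAID 1)")
print(usable_tb("6", 5, 2.0), "TB usable for output (5 x 2 TB, RAID 6)")
print(f"{rebuild_floor_hours(2.0, 100.0):.1f} h minimum to rebuild one 2 TB drive")
```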
I like to hear from those with actual experience. I will readily admit to having no experience with CS4.
However, I am experienced with performance issues, and I have a few questions about CS4 in particular:
1) Does CS4 process the twenty 10 MP images one at a time, consecutively, or does it somehow take all of them together?
2) Exactly what is CS4 doing with the images?
3) How big (in bytes) are the images? Is it reasonable that the complete image can fit in RAM?
4) I understand that CS4 is multi-core enabled. Is there a limit to the number of cores/threads that can be usefully used?
5) Does the source have to be read in its entirety before any processing is done, or can processing start as soon as the first part of the source has been read?
6) Similarly, is the output written as parts are completed, or must output wait until all the processing is done?
7) In your example, the output was 200 GB. What was the source size?
8) Where did the source originate? I presume from a camera SD card. If so, how fast can such an SD card be read?
How important is protective RAID to this application?
If a hard drive should fail, what is the impact? Is there any more impact than the frustration of rerunning the process?
I ask because hard drives are quite reliable, claiming 1 million hours MTBF. That is over 100 years. The price you pay for any of the protective RAID levels is extra writes.
You may recoup some of that overhead with parallel reads from an excellent controller.
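The "over 100 years" figure is 1,000,000 hours divided by 8,760 hours per year. An MTBF is a population statistic, not a lifetime, so a more useful number is the chance that at least one drive in an array fails in a year. The sketch below assumes a constant failure rate (exponential model) and takes the 1 million hour claim at face value, both simplifications; it shows that the risk grows with the number of drives, which matters most for RAID 0, where any single failure loses the array.

```python
import math

HOURS_PER_YEAR = 8760


def mtbf_years(mtbf_hours: float) -> float:
    """Convert an MTBF in hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def p_any_failure(drives: int, mtbf_hours: float, years: float = 1.0) -> float:
    """P(at least one of `drives` fails within `years`), constant-failure-rate model."""
    rate_per_hour = 1.0 / mtbf_hours
    return 1.0 - math.exp(-drives * rate_per_hour * years * HOURS_PER_YEAR)


print(f"{mtbf_years(1_000_000):.0f} years")
for n in (1, 4, 8):
    print(f"{n} drive(s): {p_any_failure(n, 1_000_000):.1%} chance of a failure per year")
```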
I suspect that you are correct that the scratch disk is very important, but perhaps for a different reason. A scratch disk is necessary to store partial results if they cannot be held in RAM. Therefore, the more RAM the better, particularly if the whole job can be kept in RAM. The change from 8 GB to 12 GB might explain a large part of your performance increase. An SSD reads very quickly, in part due to its negligible latency, but lots of writes can overwhelm the NAND chips.
It is not clear to me that today's SSD is the best device for a scratch disk. Perhaps a test is in order. I have tried X25-M SSDs in RAID 0; it was not helpful for normal OS-type operation. I did it to get a single 160 GB volume for my OS drive. I replaced it with the 160 GB SSD, and things do seem faster. Larger SSDs do an internal form of RAID 0, so the larger they are, the better.
If RAM is insufficient and a scratch drive is necessary, look at Fusion-io drives. They attach via PCIe slots and have higher bandwidth. Currently, one drawback of some of them is that they are not bootable.
One reason CPU utilization is not maxed out is that the CPU is waiting on something, most likely I/O of some sort.
Also, Task Manager can be confusing to interpret. CPU utilization is shown as a percentage of the whole system, so if only one core is doing all the work, the figure may not look very high. Similarly, the RAM usage is confusing: RAM data may be kept around after use in anticipation of reuse later. I like the concept of the working set, but I have no idea how Performance Monitor calculates it.
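The single-core effect is easy to quantify: one fully busy thread shows up as 100% divided by the number of logical CPUs. A sketch, assuming the proposed dual X5570 build (2 sockets x 4 cores x 2 Hyper-Threading threads = 16 logical CPUs):

```python
def task_manager_percent(busy_threads: float, logical_cpus: int) -> float:
    """Overall CPU % shown when `busy_threads` threads each saturate one logical CPU."""
    return 100.0 * min(busy_threads, logical_cpus) / logical_cpus


# Two quad-core X5570s with Hyper-Threading: 2 sockets x 4 cores x 2 threads.
LOGICAL = 2 * 4 * 2
for busy in (1, 4, 8, 16):
    print(f"{busy:2d} busy thread(s) -> {task_manager_percent(busy, LOGICAL):.2f}% overall")
```

On that machine a completely CPU-bound single-threaded step reads as only 6.25%; on the i7 920 from the test above (4 cores, 8 threads) it would read as 12.5%. Looking at the per-core graphs avoids this trap.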
Could you possibly run some tests to determine exactly what resources are actually the critical factors?
In particular, can you test performance where the input and output files can be kept entirely within RAM?
Does the resource usage change during the run? Perhaps it is I/O-bound at the input and output stages, but CPU-bound in the middle?
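One way to answer that question is to log per-core CPU utilization and disk throughput at intervals during a run (for example with Windows Performance Monitor) and label each sample. The sketch below does only the labeling step; the thresholds are hypothetical and the three samples are made-up values purely to show the idea. Using the busiest core rather than the overall average catches a single hot thread.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t_s: float                 # seconds since the start of the render
    per_core_pct: list[float]  # utilization of each logical CPU
    disk_mb_s: float           # combined disk read + write rate


# Hypothetical thresholds; tune them for the machine being measured.
CORE_BUSY_PCT = 90.0
DISK_BUSY_MB_S = 50.0


def classify(s: Sample) -> str:
    cpu_busy = max(s.per_core_pct) >= CORE_BUSY_PCT  # per core, so one hot thread counts
    disk_busy = s.disk_mb_s >= DISK_BUSY_MB_S
    if cpu_busy and disk_busy:
        return "mixed"
    if cpu_busy:
        return "cpu-bound"
    if disk_busy:
        return "io-bound"
    return "other"  # neither is busy: idle, or limited by something else


# Made-up samples, only to illustrate the labels.
log = [
    Sample(0, [10, 5, 8, 3], 180.0),
    Sample(60, [99, 40, 35, 20], 5.0),
    Sample(120, [15, 10, 12, 9], 2.0),
]
for s in log:
    print(s.t_s, classify(s))
```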
Before we get too far into this topic with CS4...
Let's first look at the job at hand.
Stitching the images: my 2000+ source images will be on a separate drive and vary in size from 20 to 35 MB each (2000 images at ~30 MB is about 60 GB).
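For scale, here is the source-size range and how long one sequential pass over it would take. This assumes a single pass at ~100 MB/s (the low end of the drive figures quoted earlier); the stitcher may well read files more than once or less sequentially.

```python
def source_size_gb(images: int, mb_each: float) -> float:
    """Total source size in GB (decimal units)."""
    return images * mb_each / 1000.0


def read_minutes(size_gb: float, mb_s: float) -> float:
    """Minutes for one sequential pass at mb_s."""
    return size_gb * 1000.0 / mb_s / 60.0


for mb in (20, 30, 35):
    gb = source_size_gb(2000, mb)
    print(f"2000 x {mb} MB = {gb:.0f} GB, one pass at 100 MB/s takes {read_minutes(gb, 100):.0f} min")
```

Reading the whole source is a matter of minutes, compared with the 48-hour render of the earlier job.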
I use Autopano Giga Pro 2 64-bit to stitch the images. It is the best program so far; I tested PTGui, but it was too slow.
It handles up to 32 cores and uses the GPU for processing.
That being said, CS4 is not practical for this type of stitching.
Autopano: the program reads all the data from each file and puts the images together in a preview screen to allow some manual changes to be made. This is basic work that any computer can handle.
The real job starts with rendering these images and creating the actual image. The app uses the maximum RAM that has been allocated for blending the images. As soon as 2 images have been blended, the CPU is maxed out until that image has been placed in its respective position. CPU usage then drops and RAM takes over for the next blending step. This continues until the full image has been rendered.
Thereafter, Autopano colour-blends all the images. From what I have seen, this is not really CPU- or RAM-intensive; it just takes the program a very long time to complete this task... still need to figure that out.
I feel that the scratch disk is critical here, as all changes are written directly to the temp folder. The final file/image is also written to this folder, in smaller chunks of 4 GB. The final Photoshop file goes to the output directory.
Taking the answers above into consideration, opening the final image in CS4 should be much smoother than with previous loads. I have worked on other files before, 30-60 GB each, all containing layers of each image, and these take a while to load.
Would love to see what the new results will be.