F@H - My Experiences Using both AMD and Intel

Do you fold on an AMD or Intel based Rig?

  • AMD

    Votes: 4 19.0%
  • Intel

    Votes: 3 14.3%
  • Both

    Votes: 3 14.3%
  • Intel + GPU2

    Votes: 5 23.8%
  • AMD + GPU2

    Votes: 4 19.0%
  • Neither/Some other processor/GPU2 client Alone

    Votes: 2 9.5%

  • Total voters
    21

ElMoIsEviL

Distinguished
One of my personal guilty pleasures is folding. There is some sense of accomplishment I get when my PC crunches a Work Unit and uploads the results to Stanford. It feels almost like goodness and altruism, as I am helping our scientists understand the way in which proteins fold and thus unlock the building blocks of the cells themselves in order to find cures for many illnesses.

I used to have a wide array of rigs folding at once but have since dropped down to 3 rigs.

1. Core i7 920 @ 4GHz with HT (NH-D14 cooled) folding EL WUs.
2. Core i7 920 @ 4.4GHz with HT (Watercooled) folding EL WUs.
3. Core 2 Quad Q9550 @ 3.7GHz + nVIDIA 8800 Ultra folding SMP2 + GPU2 WUs.

You might be asking why the lack of AMD rigs? Well, I have found that when it comes to folding, AMD rigs are just not worth the hassle (Performance/Watt wise). You see, I was folding on a Phenom II X4 955 BE @ 3.8GHz for a while doing VMWare SMP2 work units. What I found is that the Phenom II X4 at 3.8GHz was only able to do a little over half the PPD (Points per Day) of the Core 2 Quad Q9550 @ 3.7GHz despite being more power hungry.

In fact it is my experience that the PPD performance of a Core i7 rig simply nullifies AMD from any sort of competition at all under Folding@Home.

I want to know what you think...

So I have a few questions... do you think Bulldozer will rectify this apparent lack in Execution performance and Caching performance in AMD's current architectures?

And the second question is the poll question... Do you fold on an AMD or Intel based Rig?
 

ElMoIsEviL

Distinguished

Current GPGPU implementations (mostly on the software front) have not caught on with the newly available GPGPU features found in the Radeon HD 5000 series (primarily the 5850 and 5870 as the 5700 series and lower don't support Double Precision FP64 results).

GPU3, not in its first iteration with the new OpenMM core written in CUDA but in its second iteration with the OpenMM core written in OpenCL, will allow for a far wider range of support out of the entire C++ libraries (which are written specifically with x86 in mind).

This will allow the GPU3 client to use both available GPU (from multiple cards) and CPU resources together, balancing the load based on what libraries are supported by what hardware and delivering truly remarkable new levels of performance.

In the end we will have a single client rather than several different clients thanks to OpenCL's ability to support a wide variety of differing processing platforms and architectures.

But until then, EL WU folding (on a Core i7 with HT) is the way to go for the ultimate in PPD.
 


I actually got started folding when a biochemistry TA in undergrad was teaching us about protein folding and mentioned F@H by name in class. I thought it was a good use for my otherwise idling machines and let 'er rip ever since.

Here's what I currently use:

1. Athlon 64 X2 4200+ Manchester @ stock folding SMP WUs- roughly 1400 ppd
2. Dual 3.20 GHz/2 MB L3 Xeon Gallatins (same core as the first Intel Extreme Edition CPU :sol: ) @ stock, HT on folding four UP WUs- 500-750 ppd depending on WUs
3. Core 2 U7500 in a broken laptop @ stock folding SMP WUs- roughly 850-900 ppd

All in all, I average around 3000 ppd or so. It's not great but I don't see it changing until I upgrade my machine in a year or so. Then...watch out as I have a real beast planned :D Think "doesn't fit in any case currently on Newegg" and you probably have a pretty good idea.

I used to fold on my Athlon XP 3200+ HTPC but the A7N8X-E Deluxe motherboard the 3200+ sits in draws CPU power from the +5V rail instead of the +12V rail. I quit folding after I blew an ATX12V PSU and learned the hard way about the goofy power arrangement on the A7N8X-E. It folded UP WUs at about 200-220 ppd, depending on WU.

You might be asking why the lack of AMD rigs? Well, I have found that when it comes to folding, AMD rigs are just not worth the hassle (Performance/Watt wise). You see, I was folding on a Phenom II X4 955 BE @ 3.8GHz for a while doing VMWare SMP2 work units. What I found is that the Phenom II X4 at 3.8GHz was only able to do a little over half the PPD (Points per Day) of the Core 2 Quad Q9550 @ 3.7GHz despite being more power hungry.

A lot of F@H WUs tend to favor Intel chips for some reason; I wonder if they specifically optimize for Intel CPUs like they used to. F@H used to have the famously fast QMD WUs that they could only run on Intel CPUs due to some licensing restriction with Intel, which is why the Linux client (which didn't use to grab your CPUID like the Windows client did) never gave out QMDs. The best we got was Double Gromacs WUs, which were still faster than any UP WUs out now. I was pushing 250 ppd on a 2.2 Northwood-A, while a single typical Gromacs WU (a 2480 or 2490 series WU) on each 3.2 Gallatin gets about 280-290 ppd. But I suppose Linux users get the last laugh as the native Linux SMP client is about twice as fast as the Windows one :sol:

In fact it is my experience that the PPD performance of a Core i7 rig simply nullifies AMD from any sort of competition at all under Folding@Home.

...except when you can buy a quad-socket Barcelona unit for roughly what a good highly-overclocked i7 machine would go for. A guy I know put together an 8-way Opteron 8356 unit and he reported it got about 52,000 ppd with the -bigadv WUs after the early-turn-in bonus :eek: A quad would do less than half of that because of a lower bonus, but it will still do very, very well. I think I've seen i7s get in the 8000 ppd or so range, but I am not sure of the clock rate required to get that score.

I want to know what you think...

So I have a few questions... do you think Bulldozer will rectify this apparent lack in Execution performance and Caching performance in AMD's current architectures?

Bulldozer looks like it will be much better than K10 in FP performance due to the massive FMAC FPU, so I expect it will do very well in F@H. If the reason that the Intel CPUs do so well is sheer cache size (this has been debated a lot as the SMP clients have been evolving), then AMD may not catch Intel. Intel historically puts larger caches on its chips than AMD does, and I don't see that changing a lot. Also, if Pande et al. are specifically optimizing for Intel CPUs, then AMD may also not do as well as expected.

And the second question is the poll question... Do you fold on an AMD or Intel based Rig?

I run what I got, which is a little of both. :kaola:
 

ElMoIsEviL

Distinguished

A single Core i7 with HT on at a modest clock speed (around 3.2GHz) will net you around 25-26K PPD using EL WUs (around 61-65K points every 2.5 days).
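For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes PPD is simply the points credited for one WU divided by the days it takes to turn that WU in; the function name `points_per_day` is made up for illustration and is not part of any F@H tool.

```python
def points_per_day(points_per_wu: float, days_per_wu: float) -> float:
    """Average PPD for a rig that finishes one WU every `days_per_wu` days."""
    if days_per_wu <= 0:
        raise ValueError("days_per_wu must be positive")
    return points_per_wu / days_per_wu


# Figures quoted in the post: 61-65K points per EL WU, one WU every 2.5 days.
low = points_per_day(61_000, 2.5)
high = points_per_day(65_000, 2.5)
print(f"{low:,.0f} to {high:,.0f} PPD")
```

That works out to a range of about 24.4K to 26K PPD, which is roughly in line with the 25-26K quoted above.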

From what I've seen an 8 core AMD Opteron setup is much slower than a Core i7. I could be mistaken but that has been my experience thus far.

EDIT:
That's what I thought (http://foldingforum.org/viewtopic.php?f=55&t=10733&start=15). Here is a 16-core Opteron 8358 system:

Project #: 2681
Average time/frame: 29.25 Mins ( +- 5 sec )
CPU: opteron 8358 @ 2.4 GHz
# of CPU sockets: 4
# of cores: 16
# of fahCore_A2 processes running: 15

RAM installed: 16gb
RAM used by FAH: 6gb
OS / Linux kernel: centos 5.3 x64 2.6.18-128 el5


8 cores for AMD:

Project #: 2681
Average time/frame: 42.4 Mins
CPU: AMD Opteron 2380 @ 2.5 GHz (stock speed)
# of CPU sockets: 2
# of cores: 8
# of fahCore_A2 processes running: 8

RAM installed: 8 GiB
RAM used by FAH: 3.28 GiB
OS / Linux kernel: linux 2.6.28.10-vanilla


Compare that with a Core i7 @ 3.8GHz:

Project #: 2681
Average time/frame: 35 minutes 20 seconds
CPU: Core i7 920 @ 3.8GHz
# of CPU sockets: 1
# of cores: 4 physical / 8 virtual
# of fahCore_A2 processes running: 8

RAM installed: 6GB
RAM used by FAH: 3.4GB

OS / Linux kernel: Fedora 11 kernel-2.6.29.6-213.fc11.x86_64

So 16 Opteron cores at 2.4GHz each best a Core i7 @ 3.8GHz, but an 8-core Opteron setup doesn't come close to the i7's 4 cores in terms of performance.

To be frank I am seeing a 34mins frame time at 3.6GHz and a 27mins frame time at 4GHz. So a Core i7 setup can quite easily cream a 16-core Opteron setup... 4 cores vs. 16 cores (clock speeds being 4GHz vs. 2.4GHz).
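To put those frame times in terms of turnaround, here is a rough sketch that converts minutes per frame into days per WU. It assumes a WU is 100 frames (one frame per 1% of progress), which is not stated in the posts above, and `days_per_wu` is just an illustrative helper name.

```python
FRAMES_PER_WU = 100  # assumption: one frame per 1% of WU progress


def days_per_wu(minutes_per_frame: float, frames: int = FRAMES_PER_WU) -> float:
    """Days needed to finish one WU at a steady time per frame."""
    return minutes_per_frame * frames / (60 * 24)


# Project 2681 frame times quoted above, in minutes.
frame_times = {
    "16-core Opteron 8358 @ 2.4GHz": 29.25,
    "Core i7 920 @ 3.8GHz": 35 + 20 / 60,
    "8-core Opteron 2380 @ 2.5GHz": 42.4,
}

# Fastest rig first.
for rig, minutes in sorted(frame_times.items(), key=lambda kv: kv[1]):
    print(f"{rig}: {days_per_wu(minutes):.2f} days per WU")
```

Under that assumption the three rigs come out at roughly 2.0, 2.5 and 2.9 days per WU respectively.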
 

Cryslayer80

Distinguished
Aug 28, 2009
433
0
18,810

I'm getting hungry with all of this creaming and jamming that's being talked about here... Hey, don't you just think a Core i7 would easily cream all of the AMD CPUs in the world combined? I guess it wouldn't be too hard...
 


http://folding.stanford.edu/

http://en.wikipedia.org/wiki/Folding@home

Well, define what you mean by "accomplish something". If you mean, does it really help anyone:

then yes, it does accomplish something. It may not benefit you directly, but what F@H does is fold proteins.

F@H's goal in folding proteins is to help treat diseases like Alzheimer's disease, Parkinson's disease, mad cow disease, cancer, and many more.
 

Kewlx25

Distinguished
I'm actually using BOINC (http://boinc.berkeley.edu/).

Same idea as folding, but it is more general-purpose and is used by a lot more projects, ranging from cancer to AIDS to enhanced rice to micro fluid interactions in zero gravity.

It claims my i7 can do 2.7 billion FLOPS per virtual core, or 5.4 billion FLOPS per physical core, which means 2 FLOPS per cycle per core. You should see it on SSE-optimized apps; it's even faster.
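A quick sketch of where the "2 FLOPS per cycle" figure comes from. The post does not give the clock speed, so the 2.7 GHz below is an assumption (it is the clock that the quoted numbers imply, and close to a stock i7 920); `flops_per_cycle` is just an illustrative name.

```python
def flops_per_cycle(flops_per_second: float, clock_hz: float) -> float:
    """Floating-point operations completed per clock cycle."""
    return flops_per_second / clock_hz


per_virtual_core = 2.7e9   # FLOPS per virtual core, as reported in the post
threads_per_core = 2       # Hyper-Threading: two virtual cores per physical core
per_physical_core = per_virtual_core * threads_per_core  # 5.4e9 FLOPS

assumed_clock_hz = 2.7e9   # assumption: clock speed is not stated in the post

print(flops_per_cycle(per_physical_core, assumed_clock_hz))
```

If the chip were actually overclocked, the same benchmark figure would work out to fewer FLOPS per cycle, so treat the result as tied to that assumed clock.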
 


Actually, if you look at the data, four Barcelonas beat even ridiculously overclocked i7s. Jack57000's quad Opteron 8358 setup gets 26 min 42 sec frame times on p2681s. The highest-performing i7 system on there was ParrLeyne's i7 975XE running at 4.37 GHz, which gave 27 min 44 sec frame times on p2681s. The highest-clocked i7 on there was road-runner's i7 920 @ 4.50 GHz, which got 28 min 18 sec times. And last but not least, the highest-performing system on there was Slash's glorious 8-socket Opteron 8356 unit, which knocks out a p2681 frame in a mere 14 min 51 sec, which is six and a half minutes less per frame than the next guy, who was running Xeon X5560s. The best part about his unit is that he built the whole thing for $2500, while the X5560s cost that just for the two CPUs, let alone the board and RAM.

So yes, core for core an i7 will outperform an Opteron, particularly when the i7 is heavily overclocked and the Opteron is a 2007-vintage 65 nm unit. But a lot of slower cores trumps a small number of fast cores in highly parallel code like F@H, which is why the "Opterons are crap, AMD can't compete against the Xeons" crowd is wrong. It will be interesting to see how well a dual Magny-Cours with twice the core count of a Westmere Xeon at approximately the same price tag will perform in F@H. I predict that Intel won't even be competitive unless those Westmeres get massively overclocked on that EVGA dual LGA1366 motherboard while the Magny-Cours run at stock on a Tyan or Supermicro board. And even then, you're just comparing how well a non-overclocked system compares to an overclocked system. That's not even a fair fight as the Magny-Cours units could be overclocked too. An NDA-breaking guy got an early multiplier-unlocked but non-VID-adjustable Magny-Cours EX up to 3.2 GHz.
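Here is a small sketch of the "core for core" versus "whole box" comparison, using the p2681 frame times quoted above. The core counts are assumptions on my part (quad-core Opterons, so 16 and 32 cores, and 4 physical cores per i7), and the `Rig` class is only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rig:
    name: str
    cores: int          # physical cores (assumed, see lead-in)
    frame_seconds: int  # p2681 time per frame, from the post

    @property
    def frames_per_hour(self) -> float:
        return 3600 / self.frame_seconds

    @property
    def frames_per_hour_per_core(self) -> float:
        return self.frames_per_hour / self.cores


def mmss(minutes: int, seconds: int) -> int:
    """Convert a 'M min S sec' frame time into seconds."""
    return minutes * 60 + seconds


rigs = [
    Rig("8x Opteron 8356", 32, mmss(14, 51)),
    Rig("4x Opteron 8358", 16, mmss(26, 42)),
    Rig("i7 975XE @ 4.37 GHz", 4, mmss(27, 44)),
    Rig("i7 920 @ 4.50 GHz", 4, mmss(28, 18)),
]

fastest_overall = max(rigs, key=lambda r: r.frames_per_hour)
fastest_per_core = max(rigs, key=lambda r: r.frames_per_hour_per_core)

for rig in rigs:
    print(f"{rig.name}: {rig.frames_per_hour:.2f} frames/h, "
          f"{rig.frames_per_hour_per_core:.3f} frames/h per core")
print("Fastest box:", fastest_overall.name)
print("Fastest per core:", fastest_per_core.name)
```

With those numbers the 8-socket Opteron box is the fastest overall while the i7 975XE is the fastest per core, which is the point being made: per-core speed and total throughput are different contests.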
 

BadTrip

Distinguished
Mar 9, 2006
1,699
0
19,810



Dude, you seriously need to go somewhere. That is the most ignorant statement you have made here.
 

Cryslayer80

Distinguished
BadTrip, you need to stop tripping. Ignorant? Wait, wait, wait... Now if anyone believes that there is not a single person who does this only for himself to be on a top list of noobs that other noobs will praise then he can immediately go to a mental institution... I'll even pay his trip...
 

randomizer

Champion
Moderator

I bet it wins in big power consumption numbers too :D



I happen to know at least one person, possibly more, and I don't pay much attention to folding.
 

theholylancer

Distinguished
Jun 10, 2005
1,953
0
19,810




Has anyone tried that EVGA setup with 2x Xeon 5520s running at 4 GHz (the Xeon versions of the i7)?
 

jennyh

Splendid
I ran SETI for a while too, then I figured out that aliens probably don't exist, amusingly enough when I stumbled upon the Fermi Paradox.

OT - Didn't Elmo link something about Intel trying to manipulate the folding results recently?
 


The 8-socket unit used a fair bit of juice. Slash had two PSUs hooked up to it and IIRC it consumed something around 800 W full-load. However, the quad-socket units take 400-450 W, which is not all that different from the single i7 systems in the 4.3-4.5 GHz range.
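As a rough sketch of what those wattages mean per unit of work, the snippet below multiplies the power draw by the p2681 frame times quoted earlier in the thread (14 min 51 sec for the 8-socket unit, 26 min 42 sec for the quad). It assumes the rigs sit at the quoted full-load draw for the whole frame; `energy_per_frame_wh` is a made-up helper name.

```python
def energy_per_frame_wh(watts: float, frame_seconds: float) -> float:
    """Watt-hours consumed while computing one frame at a constant draw."""
    return watts * frame_seconds / 3600


# 8-socket Opteron 8356: about 800 W, 14 min 51 sec per frame.
eight_socket = energy_per_frame_wh(800, 14 * 60 + 51)

# Quad-socket Opteron 8358: 400-450 W, 26 min 42 sec per frame.
quad_low = energy_per_frame_wh(400, 26 * 60 + 42)
quad_high = energy_per_frame_wh(450, 26 * 60 + 42)

print(f"8-socket: {eight_socket:.0f} Wh per frame")
print(f"Quad-socket: {quad_low:.0f} to {quad_high:.0f} Wh per frame")
```

On those assumptions the two Opteron boxes land in the same ballpark per frame (roughly 180 to 200 Wh), so the 8-socket unit draws about twice the power but finishes frames in a bit more than half the time.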



I didn't see anything in particular, but it's rather widely accepted that F@H runs faster on Intel CPUs than on AMD CPUs of the same general performance. I don't know if Intel is trying any funny business (I doubt it, that would screw with the scientific results) but Pande's lab certainly did have some Intel-specific optimizations in the past (such as the QMD cores). They may still do that. Unless you can get one of the guys in the Pande Lab to spill the beans, I don't think we'll know 100% what's going on WRT that.