
Compile Benchmark request: please help

April 28, 2007 2:31:11 PM

I am mostly interested in people with an Intel Core 2 Extreme QX6700,
but I see no reason not to use this thread to create our own
Linux Compilation Benchmark.

Please follow the instructions below, and then post the following
information:

1) The cpu that you are using.
2) How many cores were USED for the compilation (I expect just 1)
3) Optional: as much information about your hardware as you know.
4) The linux distribution that you are using.
5) The compiler version that you used.
6) The time it takes to finish the compilation.

Make sure that your machine has no other load!

Here is how to perform the benchmark:

wget http://downloads.sourceforge.net/libcwd/libcwd-0.99.45....
tar xzf libcwd-0.99.45.tar.gz
cd libcwd-0.99.45
./configure --enable-maintainer-mode --disable-pch
time make
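
If it helps, most of the details asked for above can be collected in one go. This is only a rough sketch: `report_system` is a made-up helper name, it assumes a Linux-style /proc/cpuinfo and a `g++` on the PATH, and it reports how many logical CPUs are present, not how many were actually used for the build.

```shell
# Rough sketch: print the CPU model, the number of logical CPUs and the
# compiler version. report_system is a hypothetical helper; each lookup
# falls back to "unknown" if /proc/cpuinfo or g++ is not available.
report_system() {
    cpu=$(grep -m1 'model name' /proc/cpuinfo 2>/dev/null | cut -d: -f2- | sed 's/^ *//' || true)
    cores=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null || true)
    compiler=$(g++ --version 2>/dev/null | head -n1 || true)
    echo "CPU:      ${cpu:-unknown}"
    echo "Cores:    ${cores:-unknown}"
    echo "Compiler: ${compiler:-unknown}"
}

report_system
```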
April 28, 2007 2:37:07 PM

Quote:
1) The cpu that you are using.

$ cat /proc/cpuinfo | egrep '(vendor_id|model name|Hz|cache size|bogomips)'
vendor_id : GenuineIntel
model name : Intel(R) Pentium(R) 4 CPU 1.70GHz
cpu MHz : 1708.705
cache size : 256 KB
bogomips : 3420.70
Quote:
2) How many cores were USED for the compilation (I expect just 1)

1 (only has one core anyway)
Quote:
3) Optional: as much information about your hardware as you know.

I'll add this later in an edit.
Quote:
4) The linux distribution that you are using.

Debian testing: Lenny, last updated on Apr 26 2007.
Quote:
5) The compiler version that you used.

$ g++ -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --with-tune=i686 --enable-checking=release i486-linux-gnu
Thread model: posix
gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
Quote:
6) The time it takes to finish the compilation.

real 2m54.545s
user 1m46.567s
sys 0m18.877s
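
For comparing results between posts it is handy to have these values in plain seconds. A small sketch, assuming the default bash `time` format shown above (`to_seconds` is just an illustrative helper name):

```shell
# Convert a bash `time` value such as "2m54.545s" into plain seconds.
# to_seconds is a hypothetical helper, not part of the benchmark itself.
to_seconds() {
    # Split on "m" and "s": field 1 is minutes, field 2 is seconds.
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

to_seconds 2m54.545s   # real
to_seconds 1m46.567s   # user
to_seconds 0m18.877s   # sys
```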
April 29, 2007 5:13:39 AM

It fails to compile:

[code:1:a442efd178]
make[2]: Entering directory `/home/pgkbnd/Programs/libcwd-0.99.45'
source='elf32.cc' object='libcwd_la-elf32.lo' libtool=yes \
DEPDIR=.deps depmode=pch /bin/sh ./depcomp \
/bin/sh ./libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c -o libcwd_la-elf32.lo `test -f 'elf32.cc' || echo './'`elf32.cc
source='environ.cc' object='libcwd_la-environ.lo' libtool=yes \
DEPDIR=.deps depmode=pch /bin/sh ./depcomp \
/bin/sh ./libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c -o libcwd_la-environ.lo `test -f 'environ.cc' || echo './'`environ.cc
source='bfd.cc' object='libcwd_la-bfd.lo' libtool=yes \
DEPDIR=.deps depmode=pch /bin/sh ./depcomp \
/bin/sh ./libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c -o libcwd_la-bfd.lo `test -f 'bfd.cc' || echo './'`bfd.cc
mkdir .libs
g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c elf32.cc -fPIC -DPIC -o .libs/libcwd_la-elf32.o
g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c environ.cc -fPIC -DPIC -o .libs/libcwd_la-environ.o
g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c bfd.cc -fPIC -DPIC -o .libs/libcwd_la-bfd.o
source='debug.cc' object='libcwd_la-debug.lo' libtool=yes \
DEPDIR=.deps depmode=pch /bin/sh ./depcomp \
/bin/sh ./libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c -o libcwd_la-debug.lo `test -f 'debug.cc' || echo './'`debug.cc
source='debugmalloc.cc' object='libcwd_la-debugmalloc.lo' libtool=yes \
DEPDIR=.deps depmode=pch /bin/sh ./depcomp \
/bin/sh ./libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c -o libcwd_la-debugmalloc.lo `test -f 'debugmalloc.cc' || echo './'`debugmalloc.cc
source='private_allocator.cc' object='libcwd_la-private_allocator.lo' libtool=yes \
DEPDIR=.deps depmode=pch /bin/sh ./depcomp \
/bin/sh ./libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c -o libcwd_la-private_allocator.lo `test -f 'private_allocator.cc' || echo './'`private_allocator.cc
g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c debug.cc -fPIC -DPIC -o .libs/libcwd_la-debug.o
g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c debugmalloc.cc -fPIC -DPIC -o .libs/libcwd_la-debugmalloc.o
source='demangle3.cc' object='libcwd_la-demangle3.lo' libtool=yes \
DEPDIR=.deps depmode=pch /bin/sh ./depcomp \
/bin/sh ./libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c -o libcwd_la-demangle3.lo `test -f 'demangle3.cc' || echo './'`demangle3.cc
g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c private_allocator.cc -fPIC -DPIC -o .libs/libcwd_la-private_allocator.o
cc1plus: warnings being treated as errors
./include/zone.h:38: warning: overflow in implicit constant conversion
make[2]: *** [libcwd_la-private_allocator.lo] Error 1
make[2]: *** Waiting for unfinished jobs....
cc1plus: warnings being treated as errors
./include/zone.h:38: warning: overflow in implicit constant conversion
make[2]: *** [libcwd_la-debugmalloc.lo] Error 1
g++ -DHAVE_CONFIG_H -I./include -I./include -DCWDEBUG -Wall -Woverloaded-virtual -Wundef -Wpointer-arith -Wwrite-strings -Werror -Winline -g -c demangle3.cc -fPIC -DPIC -o .libs/libcwd_la-demangle3.o
make[2]: Leaving directory `/home/pgkbnd/Programs/libcwd-0.99.45'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/pgkbnd/Programs/libcwd-0.99.45'
make: *** [all] Error 2[/code:1:a442efd178]

It took 11.6 seconds to get that far on my cpu:
vendor_id : AuthenticAMD
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
cpu MHz : 2200.000
cache size : 512 KB
bogomips : 4421.67
vendor_id : AuthenticAMD
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
cpu MHz : 2200.000
cache size : 512 KB
bogomips : 4421.67

I was using both cores with "time make -j3". I've got them, so why not use them? I run Gentoo and thus compile things very often and the second core helps a lot.
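
For anyone who wants to pick the job count automatically: the -j3 on two cores above corresponds to "cores plus one", which is only a common rule of thumb and not something make requires. A minimal sketch, assuming `nproc` from GNU coreutils is installed (`make_jobs` is a made-up helper name):

```shell
# Suggest a -j value of "number of cores + 1" (a rule of thumb only).
# make_jobs is a hypothetical helper; pass a core count to override
# the detected value, e.g. "make_jobs 1" for a near-serial build.
make_jobs() {
    cores="${1:-$(nproc 2>/dev/null || echo 1)}"
    echo $(( cores + 1 ))
}

# Print the command instead of running it, so nothing is built by accident.
echo "time make -j$(make_jobs)"
```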

I am running Gentoo 2007.0 amd64, which might explain why the code is not compiling as wonderfully as it should. Numbers are handled slightly differently in 32 vs. 64 bit and I don't have a chroot jail set up, so I can't prove that.

I am running GCC 4.1.1:
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.1.1-r3/work/gcc-4.1.1/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.1.1 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.1.1/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.1 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.1/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.1/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.1.1/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --enable-multilib --disable-libmudflap --disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
Thread model: posix
gcc version 4.1.1 (Gentoo 4.1.1-r3)

It took me 11.683 sec to get to the error message. Sorry I couldn't get any further, but this library IS still pre-1.0, so there might be some bugs left in her. However, compiling without the "--enable-maintainer-mode" flag lets it compile. It takes 42.035 seconds to compile. If you're aiming for <25, then a QX6700 should certainly get you there as it has twice as many cores as my 4200+ and each is faster. Should be under 20.

My question is why did you pick this program? I noticed that you asked the same question over in the CPU forum too. Are you a dev working on this and have to compile it many times per day?
April 29, 2007 5:54:35 AM

Quote:
It fails to compile:
[code:1:f22763e3ab]
cc1plus: warnings being treated as errors
./include/zone.h:38: warning: overflow in implicit constant conversion
[/code:1:f22763e3ab]


Yeah, it's just a warning - but --enable-maintainer-mode adds
-Werror, so it barfs on that.

Quote:

It took me 11.683 sec to get to the error message. Sorry I couldn't get any further, but this library IS still pre-1.0, so there might be some bugs left in her. However, compiling without the "--enable-maintainer-mode" flag lets it compile. It takes 42.035 seconds to compile. If you're aiming for <25, then a QX6700 should certainly get you there as it has twice as many cores as my 4200+ and each is faster. Should be under 20.


Leaving out "--enable-maintainer-mode" (you still used --disable-pch, I hope) should also cause it to compile with -O3 -g (instead of just -g).
If I do that on my pentium-4 1.7GHz it takes 231 seconds! So, you are
already 5.5 times faster... however, with two cores :p . It's hard to
deduce from that how fast it would be with one core, because two cores
might interfere with each other in many ways. I'd rather people tested it
with a single core for better comparison.

But, even in the worst case that two cores double the speed, then
your IPC is still twice as high as the Pentium-4 (5.5/2/2.2 * 1.7 = 2.1),
and hence, comparable to the IPC of the athlon 900. So, if I buy a
comparable cpu (the QX6700) at 2.66 GHz, I can expect it to be (only)
2.66/0.9 = 3 times as fast - which is pretty disappointing. Of course, I'd
have four cores like that - but I might not always be able to use all
four; for example, when I compile only a single compilation unit.
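
The worst-case estimate above can be reproduced like this. It is only a sketch of the same arithmetic: `per_clock_ratio` and its argument order are made up for illustration, and dividing by the number of cores assumes perfect scaling, which is exactly the pessimistic assumption made above.

```shell
# per_clock_ratio SLOW_SECONDS FAST_SECONDS SLOW_GHZ FAST_GHZ [CORES_USED]
# Hypothetical helper: how much more work per clock tick the faster
# machine does, assuming its CORES_USED cores scaled perfectly.
per_clock_ratio() {
    awk -v ts="$1" -v tf="$2" -v fs="$3" -v ff="$4" -v n="${5:-1}" \
        'BEGIN { printf "%.2f\n", (ts / tf) / n * (fs / ff) }'
}

# 231 s on the 1.7 GHz Pentium 4 versus 42.035 s on two 2.2 GHz cores.
per_clock_ratio 231 42.035 1.7 2.2 2
```

This prints 2.12, in line with the 2.1 quoted above.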

Quote:
My question is why did you pick this program? I noticed that you asked the same question over in the CPU forum too. Are you a dev working on this and have to compile it many times per day?


Um, yeah - I'm the developer. But it's probably not a good benchmark
because libcwd doesn't support 64bits yet (because I don't have 64bits yet). I am dependent on others to send me bug reports - and well, I don't get any... It might be that it doesn't work on Gentoo because I don't use Gentoo myself - it might also be that nobody ever tries --enable-maintainer-mode, and that the fact that I never get reports from people just means that it works.

Anyway - I picked it mainly because it's a C++ program that reflects the type of code I'd be dealing with. Compiling C code is less useful, because it is much faster and therefore would be influenced by disk access more.

Perhaps some other piece of code, like something heavily using Boost,
would be better to use as a benchmark - but I wouldn't know of a
tarball that is simple to download and takes just a few commands to compile.
April 29, 2007 1:21:21 PM

Yes, I did have to use "--disable-pch" as it didn't find PCH on my system when I left out that switch and promptly died.

Redoing the compile with a single thread from GCC yields a 1 minute, 6.758 second (66.758 sec) build time. So I get roughly a 59% gain from compiling with both cores (66.758/42.035 = 1.59). This is a little low compared to what most packages I compile see, but smaller packages like libcwd often have only a few dozen or so object files that need to be compiled, so GCC doesn't really get the chance to distribute the work to all cores for a long time. Usually I seem to keep both cores pretty well loaded. I'd expect that the QX6700 will see worse scaling than I did, as scaling always goes down with more cores: GCC might not always find enough work to keep all cores lit at all times. And you are correct, compiling a single file uses only one core.
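
For reference, here is how the gain from the second core can be worked out from the two build times. A minimal sketch; `parallel_speedup` is a made-up helper, and "efficiency" here simply means speedup divided by the number of cores.

```shell
# parallel_speedup SERIAL_SECONDS PARALLEL_SECONDS CORES
# Hypothetical helper: speedup of the parallel build over the serial one,
# and that speedup as a percentage of the ideal (one times CORES).
parallel_speedup() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
        s = t1 / tn
        printf "speedup %.2fx, efficiency %.0f%%\n", s, 100 * s / n
    }'
}

# 66.758 s with one job versus 42.035 s with -j3 on two cores.
parallel_speedup 66.758 42.035 2
```

With these two times it prints "speedup 1.59x, efficiency 79%".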

A single 2.2 GHz Athlon 64 core appears to be (231/66.758)*(1.7/2.2) or about 2.7 times as fast per clock tick. I also have a 2.2 GHz Mobile Pentium 4-M laptop which is based on the slightly newer Northwood A Pentium 4 chip rather than your Willamette P4:

vendor_id : GenuineIntel
model : 2
model name : Mobile Intel (R) Pentium (R) 4 - M CPU 2.20GHz
cache size : 512 KB
bogomips : 4388.92

It runs Gentoo 2007.0 x86 (32-bit) and has the same GCC 4.1.1 compiler and settings as my desktop does, except the target is i686-pc-linux-gnu instead of x86_64-pc-linux-gnu. It took the P4-M 1 minute 58.714 seconds (118.714 sec) to compile libcwd, so the P4-M is approximately 1.77 times slower than the Athlon 64 clock-for-clock and core-for-core. My P4-M is also (231/118.714)*(1.7/2.2) = 1.50 times as fast as your P4 Willamette clock-for-clock.
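
To put the single-core numbers side by side, the same clock-for-clock normalisation can be applied to a small table. This is just a sketch: `per_clock_table` and the short machine labels are made up, the baseline is the 231 second run on the 1.7 GHz Pentium 4, and it ignores differences in distribution and 32 vs. 64 bit.

```shell
# per_clock_table reads lines of "LABEL SECONDS GHZ" on stdin and prints
# each machine's per-clock speed relative to the 1.7 GHz Pentium 4 that
# needed 231 s. Hypothetical helper; labels are illustrative only.
per_clock_table() {
    awk -v base_s=231 -v base_ghz=1.7 '
        { printf "%-12s %.2f\n", $1, (base_s / $2) * (base_ghz / $3) }'
}

per_clock_table <<'EOF'
P4-1.7       231      1.7
Athlon64-X2  66.758   2.2
P4-M         118.714  2.2
EOF
```

The three rows come out as 1.00, 2.67 and 1.50 respectively.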

The P4 Northwood is a slightly reworked design over the original P4 Willamette that you have. The cache size is doubled from 256 KB to 512 KB, which helps to mask the latency issues from the FSB. (This is why the Core 2 Duos have 4 MB L2 cache and FSB-less Athlon 64 X2s have 1 MB.) I think that the discrepancy between the Northwood and the Willy isn't quite 50% clock-for-clock, so I think that you have other issues. What RAM do you run with that Willy? My P4-M has DDR 266. Some Willys had DDR 266 as well, but many used old PC100 SDRAM or Rambus RDRAM, both of which are slower than DDR 266, especially the PC100.
May 1, 2007 8:13:16 PM

Quote:
1) The cpu that you are using.

vendor_id : GenuineIntel
model name : Intel(R) Celeron(R) M processor 1.50GHz
cpu MHz : 1496.412
cache size : 1024 KB
bogomips : 2996.06

Quote:
2) How many cores were USED for the compilation (I expect just 1)

1

Quote:
3) Optional: as much information about your hardware as you know.

I have 1 GB of DDR2-533 RAM.
It uses a VIA VN800 chipset.

Quote:
4) The linux distribution that you are using.

Ubuntu 7.04

Quote:
5) The compiler version that you used.

4.1

Quote:
6) The time it takes to finish the compilation.

real 0m56.424s
user 0m42.531s
sys 0m6.944s
May 3, 2007 4:39:47 AM

Note: I also had to disable maintainer mode because warnings were stopping the compilation.

Quote:

1) The cpu that you are using.

~/libcwd-0.99.45 $ grep "model name" /proc/cpuinfo
model name : Dual Core AMD Opteron(tm) Processor 285
model name : Dual Core AMD Opteron(tm) Processor 285
model name : Dual Core AMD Opteron(tm) Processor 285
model name : Dual Core AMD Opteron(tm) Processor 285
Quote:

2) How many cores were USED for the compilation (I expect just 1)

All four (-j 6)
Quote:

3) Optional: as much information about your hardware as you know.

Dual AMD Opteron 285's, Tyan Thunder K8WE, 4 gigs of memory (2gigs per proc, 2x1gig for dual channel to each processor), main disk is a set of 74GB WD Raptors on a 3Ware 9500S PCI-X card. That should be the pertinent info.
Quote:

4) The linux distribution that you are using.

A fairly up-to-date and patched Gentoo AMD64 affair
Quote:

5) The compiler version that you used.

~/libcwd-0.99.45 $ gcc -v
Using built-in specs.
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/gcc-4.1.1-r1/work/gcc-4.1.1/configure --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.1.1 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.1.1/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.1 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.1/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.1/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.1.1/include/g++-v4 --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --disable-libunwind-exceptions --enable-multilib --disable-libmudflap --disable-libssp --disable-libgcj --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
Thread model: posix
gcc version 4.1.1 (Gentoo 4.1.1-r1)
Quote:

6) The time it takes to finish the compilation.

time make -j 6
...
real 0m17.277s
user 0m49.779s
sys 0m6.712s

17 seconds, not too shabby for an old machine :) 

Just for completeness, here's with 1 core

time make
...
real 0m53.484s
user 0m46.691s
sys 0m6.604s
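
A quick way to see how well the cores were kept busy is to divide the CPU time (user + sys) by the wall-clock time (real). A rough sketch; `busy_cores` is a made-up helper and expects all three values in seconds.

```shell
# busy_cores REAL USER SYS  (all in seconds)
# Hypothetical helper: average number of cores that were busy during the
# build, i.e. total CPU time divided by elapsed wall-clock time.
busy_cores() {
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN { printf "%.2f\n", (u + s) / r }'
}

busy_cores 17.277 49.779 6.712   # the -j 6 run
busy_cores 53.484 46.691 6.604   # the single-job run
```

For the -j 6 run this gives 3.27, so on average a bit more than three of the four cores were doing work.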
May 8, 2007 6:00:10 PM

Edit: did not use maintainer mode due to problems discussed previously.

Quote:
1) The cpu that you are using.


(from same command you used. 2 cores, so it came out twice)

vendor_id : AuthenticAMD
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
cpu MHz : 2000.000
cache size : 512 KB
bogomips : 4019.74
vendor_id : AuthenticAMD
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
cpu MHz : 2000.000
cache size : 512 KB
bogomips : 4019.74


Quote:
2) How many cores were USED for the compilation (I expect just 1)


I think I did this right, correct me if I'm wrong, but 'time make' should just use one core, and 'time make -j3' should use both.

Quote:
3) Optional: as much information about your hardware as you know.


See my sig. It's pretty straightforward

Quote:
4) The linux distribution that you are using.


@localhost ~ $ cat /proc/version
Linux version 2.6.19-gentoo-r4 (root@lxnaydesign.net) (gcc version 4.1.1 (Gentoo 4.1.1-r1)) #1 SMP Sat Jan 6 01:54:52 UTC 2007

Quote:
5) The compiler version that you used.


@localhost ~ $ gcc -v
Using built-in specs.
Target: i586-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.1.1-r1/work/gcc-4.1.1/configure --prefix=/usr --bindir=/usr/i586-pc-linux-gnu/gcc-bin/4.1.1 --includedir=/usr/lib/gcc/i586-pc-linux-gnu/4.1.1/include --datadir=/usr/share/gcc-data/i586-pc-linux-gnu/4.1.1 --mandir=/usr/share/gcc-data/i586-pc-linux-gnu/4.1.1/man --infodir=/usr/share/gcc-data/i586-pc-linux-gnu/4.1.1/info --with-gxx-include-dir=/usr/lib/gcc/i586-pc-linux-gnu/4.1.1/include/g++-v4 --host=i586-pc-linux-gnu --build=i586-pc-linux-gnu --disable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --disable-libunwind-exceptions --disable-multilib --disable-libmudflap --disable-libssp --enable-java-awt=gtk --enable-languages=c,c++,java,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
Thread model: posix
gcc version 4.1.1 (Gentoo 4.1.1-r1)

Quote:
6) The time it takes to finish the compilation.


time make:

real 1m14.206s
user 1m6.990s
sys 0m6.726s

time make -j3:

real 0m39.191s
user 1m7.399s
sys 0m6.617s

Results may be slightly skewed because X is running, and I'm connected via SSH running make in a terminal window. Everything should be idling, though, so I'll bet I'm pretty close.
May 26, 2007 1:05:50 AM

Hey, I got my new PC!

The result is VERY satisfying!

hikaru:~>grep "model name" /proc/cpuinfo
model name : Intel(R) Core(TM)2 Quad CPU @ 2.66GHz
model name : Intel(R) Core(TM)2 Quad CPU @ 2.66GHz
model name : Intel(R) Core(TM)2 Quad CPU @ 2.66GHz
model name : Intel(R) Core(TM)2 Quad CPU @ 2.66GHz

CPU: Intel Core 2 Extreme Processor QX6700, 2.66 GHz, 1066 MHz FSB, 8 MB L2 Cache
Motherboard: ASUS P5B Deluxe
Memory: Dual channel (2 times 2 times 1 GB) 4 GB pc6400 (800 MHz) low latency DIMMs (CL4-4-4), Kingston KHX6400D2LLK2/2GN
Disks: Three Western Digital Raptors 74.0GB, 10 krpm, SATA in RAID5 (software RAID), hdparm -tT gives 165 MB/s.

Operating System: Debian Testing amd64 4.0rc0

Benchmark:

Using ./configure --disable-pch

time make -j 6
[...]
real 0m12.555s
user 0m38.098s
sys 0m4.716s


So, that is EIGHTEEN times as fast as my old PC :D 

Using one cpu:

real 0m39.848s
user 0m35.458s
sys 0m3.960s


which is still 5.8 times as fast! Man, that means I'd have paid
this money for just one cpu ;) . I am very happy with this speed beast! :) 