I've built a ray tracer in Java. It generates images like this:
(this is not mine, it's just an example). I multithreaded it by cutting up the image horizontally and telling each thread to render a chunk of the final image. I did quite a few tests and these are the average times:
I own an Intel Wolfdale @ 3.0 GHz (dual core). The improvement when going to 2 threads is obvious, but I can't make sense of why going to 4 threads makes it go even faster. I thought Intel hadn't had hyperthreading since the Pentium 4, so my 2 cores mean that I can only run 2 threads simultaneously. The dip at 8 threads makes sense because of the extra context switching.
------------------------------Gigabyte ga-p35-ds3l mobo, Wolfdale E8400 3.0Ghz, Evga GeForce 8800GT 600Mhz, Seagate 7200.11 500GB HDD, G.Skill 800 2GB DDR2, 500 W Enermax PSU, Windows XP 32 bit, Acer 22' LCD, Logitech X-540 5.1 Speakers, NZXT Apollo case.
You don't have hyperthreading - that would give a bigger gain than that. I don't know why 4 would be faster, but that isn't a big enough difference to really matter.
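If you want to double-check from inside the ray tracer itself, Java can report how many logical processors the JVM sees. This is only a small sketch; the class and method names (`CoreCount`, `logicalProcessors`) are made up for illustration, and only `Runtime.availableProcessors()` is the real API call.

```java
class CoreCount {
    // Number of logical processors the JVM can schedule threads on.
    // On a dual core without hyperthreading this would be expected to be 2.
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Logical processors: " + logicalProcessors());
    }
}
```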
------------------------------Asus P6T deluxe
i7 965 @ 4.2GHz (200*21), 1.384V
12GB Corsair Dominator DDR3-1600 CAS 7
It's possible that one thread takes longer to run when you do 2, and total processing time will be based on the longest running thread. When you run 4 threads, the work is better balanced, resulting in the slightly faster run time.
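To make that concrete, here is a rough sketch with made-up numbers. It assumes a simplified model in which each strip runs to completion on whichever core frees up first, which is not exactly how the OS time-slices threads, and the per-row costs are hypothetical (say the bottom half of the image is three times as expensive to trace as the top half). The names `StripBalance`, `stripCosts` and `finishTime` are mine, not from your code.

```java
import java.util.Arrays;

class StripBalance {
    // Total cost of each contiguous horizontal strip when the rows are
    // split into `strips` (nearly) equal parts.
    static long[] stripCosts(int[] rowCost, int strips) {
        long[] costs = new long[strips];
        int rows = rowCost.length;
        for (int s = 0; s < strips; s++) {
            int start = (int) ((long) rows * s / strips);
            int end = (int) ((long) rows * (s + 1) / strips);
            for (int y = start; y < end; y++) {
                costs[s] += rowCost[y];
            }
        }
        return costs;
    }

    // Finish time when strips are handed out in order to whichever of
    // `cores` workers becomes free first (simplified scheduling model).
    static long finishTime(long[] costs, int cores) {
        long[] busyUntil = new long[cores];
        for (long c : costs) {
            int idle = 0;
            for (int i = 1; i < cores; i++) {
                if (busyUntil[i] < busyUntil[idle]) {
                    idle = i;
                }
            }
            busyUntil[idle] += c;
        }
        return Arrays.stream(busyUntil).max().getAsLong();
    }

    public static void main(String[] args) {
        // Hypothetical per-row costs: cheap top half, expensive bottom half.
        int[] rowCost = {1, 1, 1, 1, 3, 3, 3, 3};
        for (int strips : new int[] {2, 4}) {
            long t = finishTime(stripCosts(rowCost, strips), 2);
            System.out.println(strips + " strips on 2 cores: " + t);
        }
        // prints:
        // 2 strips on 2 cores: 12
        // 4 strips on 2 cores: 8
    }
}
```

With 2 strips one core finishes early and sits idle while the other grinds through the expensive half; with 4 strips the expensive rows end up shared between both cores. If this is what is happening in your renderer, a fixed pool of 2 threads pulling many small strips from a queue would probably do at least as well as 4 threads.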
It is possible that caching, either on the disk or in the CPU, is giving 4 threads a slight benefit over 2.
One thing to remember is that even with a single core, the processor does not run a thread to completion before starting the next one. Instead, it runs part of one thread, then part of the next, and so on until everything is done. So it may simply happen to be more efficient to run the first part of several threads and then their second parts than to run parts one and two of the first thread followed by parts one and two of the second.
The replay/prefetch/cache mechanisms are considerably more complex than that, not to mention the vectorial protocols.
But anyway, that 101 Informatics post gets my seal of approval !!!!
------------------------------Rock journalism is people who can't write interviewing people who can't talk for people who can't read - Frank Zappa