HT?? Something for the Gamers??

November 14, 2002 11:07:38 AM

Hi there,
Just would like to know...
Will HT help games in the future.....

November 14, 2002 1:32:16 PM

Not yet. UT2k3 shows an HT speed increase of almost 5%, which is really not that much. Only heavily CPU-dependent games like Comanche 4 can benefit from HT.

In any case, there is no reason to upgrade to a 3.06 GHz if you own a 2.0-2.8 GHz P4. The 3.06 is, for the time being, just too expensive for the private user, and there is no game out there which needs such a fast CPU, except maybe Doom 3 :) 
November 15, 2002 12:34:43 AM

Unlikely that it will do much for future games, except help the system along if you have another demanding app running while you're gaming.

You're posting in a forum with class. It may be third class, but it's still class!
November 15, 2002 12:39:13 AM

As imgod2u said, there is little to be expected out of games, EVER. Even optimizing may not do anything; simply put, games are not suited for SMT. I am guessing it's because everything is sent as one thread constantly, or simply because GPUs are more important now than CPUs, so the CPU plays a much smaller role.

BTW, Comanche 4's results were barely beneficial; the maximum you got was 4 frames more. That game should be out of the benching list. It makes the 3.06 GHz look like crap with 68 FPS, which is utterly untrue! Lousy programmers...

--
*You can do anything you set your mind to man. -Eminem
November 15, 2002 7:25:03 AM

Games don't benefit from SMT for the same reason they don't benefit from SIMD. They're simply serial by nature. One frame comes after another, and each frame depends on what the frame before it was. You simply have one huge job where you have to do one thing after another. There is very little opportunity for parallelism (doing more than one thing at once). You can put the keyboard/mouse commands in a different thread, but it would benefit very little, as those don't take up much CPU time anyway. There is very little that can be done independently at the software level, not just the code level. You'd have a hard time producing threads that can execute at the same time.
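A rough sketch of that serial dependency, purely for illustration (WorldState, stepSimulation and the rest are made-up names, not any real engine's API):

struct WorldState { /* positions, velocities, AI state... */ };

// Frame N is computed from frame N-1; you can't start N+1 until N is finished.
WorldState stepSimulation(const WorldState& previous, float input)
{
    WorldState next = previous;            // start from last frame's result
    (void)input;                           // ...physics, AI, collision would go here...
    return next;
}

int main()
{
    WorldState state{};
    for (int frame = 0; frame < 3; ++frame)    // stand-in for the real game loop
    {
        float input = 0.0f;                    // stand-in for keyboard/mouse polling
        state = stepSimulation(state, input);  // strictly one frame after another
    }
}

Each iteration needs the previous one's output, so there's no obvious way to hand frame N and frame N+1 to two threads at once.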

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
November 15, 2002 12:13:36 PM

I'm no game programmer, but I could easily envision a separate thread of execution for deciding what the computer-generated bad guys are going to do. At its extreme, each enemy could have its own thread used to compute what it's going to do. They would need to share some common store of info about what was where. Of course, up to now, programming a game that way would have been pretty pointless.
November 15, 2002 1:29:52 PM

The AI is very dependent. It has to react to every move you make which means it has to wait for keyboard/mouse commands. It also depends on each other (i.e. one AI moves one place, another has to move a certain way based on that). It's all dependent so you can't really do it all at once, you have to do it one after the other. That's the whole problem, dependency. How does the computer know where the A.I. bot is going to shoot unless it knows your coordinates and the coordinates of all the other bots?

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
November 15, 2002 2:41:21 PM

Quote:
How does the computer know where the A.I. bot is going to shoot unless it knows your coordinates and the coordinates of all the other bots?

Actually, I could see a considerable potential here for any sort of multi-threading. By using a pointer to a more or less 'global' repository of information on bot/character/object/misc data (such as location, action, movement, facing, etc.) you could easily set up numerous threads for all of the AIs that wouldn't have to be directly tied into the main engine's thread at all and still be pretty darn accurate.
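Something like the sketch below, say. All the names are invented for illustration, and it uses std::thread for brevity; in 2002 this would be Win32 threads or pthreads instead.

#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical "global repository": one shared store of bot data that every
// AI thread reads through a pointer instead of going through the engine thread.
struct BotInfo { float x, y, z; int action; };
struct WorldRepository { std::vector<BotInfo> bots; };

void runBotAI(WorldRepository* world, std::size_t myIndex)
{
    const BotInfo& me = world->bots[myIndex];
    (void)me;   // ...target selection / pathfinding using everyone's positions...
}

int main()
{
    WorldRepository world;
    world.bots.resize(8);                    // eight bots, one AI thread each

    std::vector<std::thread> aiThreads;
    for (std::size_t i = 0; i < world.bots.size(); ++i)
        aiThreads.emplace_back(runBotAI, &world, i);
    for (std::thread& t : aiThreads)
        t.join();                            // wait for every AI to finish its think step
}

(Whether the AI threads also write to the repository, and how that access is locked, is exactly the sticking point argued later in this thread.)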

<b>To all:</b> The reason why games and, hell, MOST apps that you run by themselves won't gain from HT is that unless they were written by monkeys, they're going to be highly optimized to max out the usage of the CPU whenever possible to begin with. HT <i>only</i> becomes useful when the applications that you're running aren't all designed to suck as much as inhumanly possible out of the CPU. Thus, with any well-written game, compression software, rendering software, etc. that is designed to use as much of the CPU's resources as possible, HT won't do jack, because there are no unused resources to take advantage of in the first place.

On top of that, ANY multi-threaded code adds a little bit of overhead, as well as sometimes wasted CPU cycles for timing and synchronization purposes. So even the best-written multi-threaded code will still be slightly slower than single-threaded code when run on a single-CPU system.

(That is, again, unless that single-threaded code was written by monkeys. ;)  Profilers were invented for a reason after all...)

The purpose of multi-threaded code is <i>not</i> to optimize a single-CPU system. It's to actually take advantage of several physical CPUs simultaneously, something which single-threaded code just can't do. Hence a multi-threaded application should run almost twice as fast on a dualie system as it does on a single-CPU system. (Though for games this is less true because there's no second graphics card.)

HT, though, makes matters even worse. Not only is there that minor hit for making code multi-threaded, but you're still technically using one single CPU, which means you're now having to share the cache and quite possibly make two cache reads and writes simultaneously. And if that wasn't bad enough, because you're splitting up that cache you're more likely to need to dredge up the data from main system memory, which slows things down MUCH more.

So only multi-threading and multi-tasking which <i>isn't</i> originally designed to maximize CPU usage will gain <i>anything</i> from a single CPU system with HT. So yeah, running Photoshop, Powerpoint, Word, WinAmp, and Internet Explorer simultaneously will run better with HT. But no, running a well-written game by itself shouldn't see any gain from HT. (In fact, it should see a loss of performance if the game was actually written well and optimized efficiently in the first place.)

Well ... that's not entirely true either. Because Windows, OpenGL, DirectX, and numerous other background processes and threads do in fact use up a tiny amount of CPU time to do things that often don't fully utilize the CPU, single-threaded software could very well see a <i>tiny</i> performance gain from these tasks being more intelligently run on the CPU while the massive single-threaded app is sucking away CPU cycles with impunity. It's a toss-up, and it really depends on how many of these background tasks you have running while you're trying to run the massive app.

Really, HT only makes sense for office-type folk or people running software that isn't optimized worth a darn. Otherwise its usefulness is highly questionable.

Now, had Intel done the <i>smart</i> thing and duplicated a second set of caches and interfaces to and from them so that the second imaginary processor would in no way impede the processing of the true processor (though the memory controller would still be a possible bottleneck, there's not much that could be done for that) then we would see HT making a lot more sense. Hell, then it might have even given a performance boost in games.

The way that Intel implemented HT though, it's almost worthless. Granted, I can understand why they did it since they didn't want to increase the CPU die size just to implement HT. Still, half-arsed is half-arsed, no buts about it. ;)  So presently, HT is only useful for half-arsed applications.

<A HREF="http://forumz.tomshardware.com/community/modules.php?na..." target="_new">Join the THGC LAN Party!</A>
November 15, 2002 2:47:45 PM

Quote:
Hi there,
Just would like to know...
Will HT help games in the future.....

As presently implemented by Intel, only poorly-written games or games run on systems with a lot of background tasks that aren't using the CPU's resources very well.

<A HREF="http://forumz.tomshardware.com/community/modules.php?na..." target="_new">Join the THGC LAN Party!</A>
November 15, 2002 5:18:29 PM

The cache thing isn't an issue; it was stated in Ace's review and AnandTech's, I think, that the threads are tagged so that they don't fight with each other.

Also, just a question: you stated that apps that are well coded to use as much CPU power as possible won't benefit from HT, so then how come DivX encoding and 3D Studio went through the roof, "figuratively speaking"?

Sounds like you are just speculating a lot of things, Silver. Perhaps taking a breather and rethinking your points will stem my curiosity about your "half-arsed speculations"; it's not like you to say stupid things.

-Jeremy


Just some advice from your friendly neighborhood blue man :smile:
November 15, 2002 6:46:49 PM

While the idea of using multiple AI bots in different threads sounds good, they each are codependent. That is, each bot depends on the movements/positions and actions of other bots. This causes dependencies which prevent parallelism. You simply can't do an instruction unless you know the location of the operands (the data you're operating on) in the registers. Nor can you afford to put them in separate threads and risk a skew in the instruction flow. Simply put, it's a very one at a time thing.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
November 15, 2002 6:47:29 PM

Quote:
The cache thing isn't an issue; it was stated in Ace's review and AnandTech's, I think, that the threads are tagged so that they don't fight with each other.

It's not about threads fighting with each other. It's about the cache not holding as much information and needing to access system memory more often because of it.

Quote:
Also, just a question: you stated that apps that are well coded to use as much CPU power as possible won't benefit from HT, so then how come DivX encoding and 3D Studio went through the roof, "figuratively speaking"?

Through the roof? Ha! I haven't seen any benchmark that I'd say would substantiate a claim like that yet. But besides that, as I said, apps that are coded to maximize their use of the CPU won't see much (if any) improvement. Obviously these benchmarks are ideal for pointing out just which companies hire programmers who use profilers to code to the CPU's maximum potential and which companies hire lazy programmers. While I'm not at all surprised about DivX, I have to admit that I am a little surprised about 3DSM. Yet when you consider programmers who are <i>known</i> to fully optimize code and stretch things to their absolute limits for every last ounce of performance, such as Id Software, then look at the Quake3 results and see just where the truth lies.

Id Software engineers kick much arse. They don't just strive for quality, they define good programming. They were making games multi-threaded before people even thought of running games on dualie systems, and they've innovated more optimization tips and tricks than any other software house combined. (Except for maybe Intel themselves.) Now look at Quake3's performance. Quake3 is a multi-threaded application. This means that it <b>should</b> run at its absolute best with HT enabled, because it is <i>designed</i> to take advantage of multi-threading.

Yet what happens? With HT enabled, Q3 performance <b>drops</b>. Why? Well, I've already said why. Q3 was <i>already</i> written to squeeze every last drop of performance out of your CPU. Throw in the half-arsed implementation of HT and you end up getting screwed by 'innovation'.

I fully stand by my statements. Any software that is written well will see little or no performance improvement. It's only software that either doesn't challenge your CPU (such as Word) or that was written by monkeys that will see good improvements from HT.

Quote:
Sounds like you are just speculating a lot of things, Silver. Perhaps taking a breather and rethinking your points will stem my curiosity about your "half-arsed speculations"; it's not like you to say stupid things.

Sorry, but anyone who even halfway researches this will see how much truth there is. I've been saying for years that the increases in CPU speed and memory have made programmers lazy and wasteful. Hell, LOOK AT JAVA! You can't get much more of a wasteful and lazy language than that, except for maybe Visual BASIC. Most programmers have either forgotten or never learned the fine art of optimization. I doubt most programmers even know what a profiler is, not to mention how to use one.

So in the face of this, yes, I bet there are an awful lot of programs that <b>do</b> gain from HT. I never said there weren't. I do however fully stand behind the very simple and obvious. If a processor-intensive program is written well in the first place, it gains nothing and can even lose performance from HT.

I guess it's just good for Intel that so many programmers out there are unskilled hacks born from the dot-com era and unleashed upon the world when that era came to the obvious and unavoidable messy end that you get with so many clueless hacks trying to cash in whenever possible.

I doubt even 20% of the programmers employed today have ever written code in Assembly. Hell, I doubt even 30% have ever coded in ANSI C++ or would even know a vector if one bit them in the arse. Between Sun and Microsoft programmers have just gotten <i>way</i> too lazy.

So will HT benefit a good number of programs? Sure! Will HT benefit a program that was actually written well in the first place? Not a chance! If saying that is stupid, then I'll proudly be the dumbest ignoramus to ever walk the face of the Earth.

When everyone was saying how Java would one day replace C++, I just laughed and laughed, even when they meant it. Gee, who was right there? I wonder... So if anyone says HT will boost a well-designed application's performance, well... can you hear my laughter all the way from Wisconsin? I'll take good programming over half-arsed products developed by monkeys any day of the week.

<A HREF="http://forumz.tomshardware.com/community/modules.php?na..." target="_new">Join the THGC LAN Party!</A>
November 15, 2002 6:57:31 PM

What sort of CS program wouldn't teach C++ (or C) and assembly language? I'm a math/CS major, and these are some of the topics that will get covered (or are being covered). Or are there a lot of programmers out there who don't have CS degrees???

It's always darkest just before it goes pitch black.
November 15, 2002 7:43:20 PM

Silver, I never thought about it this way, and your argument is really true. I still disagree, however, in that it is not that MUCH half-arsed from a multitasking point of view, as opposed to being a performance enhancer within a single program. I've not tried dual-CPU setups, but HT sure rocks when using several heavy-duty programs at once. It does, however, make me wonder: if MPEG encoding uses 100% of the CPU, will HT help you multitask if it can't get many resources at all during that time?!

But your explanation makes sense, and it in fact corrects Dark_archonis' claim that I wouldn't see a benefit in Word or IE; now it seems I will! :smile:

--
*You can do anything you set your mind to man. -Eminem

Edited by Eden on 11/15/02 04:44 PM.
November 15, 2002 7:45:14 PM

Quote:
While the idea of using multiple AI bots in different threads sounds good, they each are codependent. That is, each bot depends on the movements/positions and actions of other bots. This causes dependencies which prevent parallelism. You simply can't do an instruction unless you know the location of the operands (the data you're operating on) in the registers. Nor can you afford to put them in separate threads and risk a skew in the instruction flow. Simply put, it's a very one at a time thing.

I disagree completely. Through the proper use of pointers, data can be shared. Through the proper use of stacks and/or queues, execution of events can be synchronized. Through the proper use of thread priorities, AI threads <i>should</i> receive the same amount of execution time.

A good programmer <i>could</i> easily develop an engine where each AI was run under a unique thread. Most of the time the effort wouldn't be worth it, though, because multiple-CPU systems just aren't common enough to warrant the extra coding and the increased difficulty in debugging that it would take. That aside, just because it would take more work, it is still a far cry from being impossible.

Just as it is entirely possible to create a generic 'human' model with generic 'human' skins and then develop a standardized system for warping that model and using that warping information to warp the skins so that you only have to develop one single set of skins for all humans.

Just because something isn't done doesn't make it impossible. :) 

<A HREF="http://forumz.tomshardware.com/community/modules.php?na..." target="_new">Join the THGC LAN Party!</A>
November 16, 2002 9:17:00 AM

Quote:
I disagree completely. Through the proper use of pointers, data can be shared. Through the proper use of stacks and/or queues, execution of events can be synchronized. Through the proper use of thread priorities, AI threads should receive the same amount of execution time.

Data can't be shared because each object uses different data; very few static members are used for AI bots. There is absolutely no way to correctly synchronize threads, because the application is not in control of thread management; the OS is.

Quote:
A good programmer could easily develop an engine where each AI was run under a unique thread. Most of the time the effort wouldn't be worth it, though, because multiple-CPU systems just aren't common enough to warrant the extra coding and the increased difficulty in debugging that it would take. That aside, just because it would take more work, it is still a far cry from being impossible.

I disagree. John Carmack has tried to "optimize" Q3A quite a bit for multithreaded solutions, and yet it has shown very little improvement. As I said, games aren't very instruction intensive nor data intensive; it's about even, so data is not shared very much, nor are instructions. It is very one-by-one by nature.

Quote:
Through the roof? Ha! I haven't seen any benchmark that I'd say would substantiate a claim like that yet. But besides that, as I said, apps that are coded to maximize their use of the CPU won't see much (if any) improvement. Obviously these benchmarks are ideal for pointing out just which companies hire programmers who use profilers to code to the CPU's maximum potential and which companies hire lazy programmers. While I'm not at all surprised about DivX, I have to admit that I am a little surprised about 3DSM. Yet when you consider programmers who are known to fully optimize code and stretch things to their absolute limits for every last ounce of performance, such as Id Software, then look at the Quake3 results and see just where the truth lies.

I'd consider a 20% improvement in per-clock performance for 3dsmax "through the roof". I would also have to disagree with your statement about Q3A. 3DSMax simply has a lot more things that can be done independently. With a game, you have to wait for user input before doing tasks. With 3D rendering, you know everything you are about to render beforehand and can easily extract tasks to do. Hence the huge opportunity for parallelism. I doubt very much that Q3A is "maxing out" any modern MPU. If it were, improving memory bandwidth would not cause so much of a performance increase. Remember, CPU idle cycles due to memory latency are a huge part of performance loss. Hyperthreading for multithreaded applications attempts to relieve this as well, because if one thread's data is not in cache, you can easily use instructions from another thread that is in cache while waiting for a cacheline to be fetched from memory.

Quote:
Id Software engineers kick much arse. They don't just strive for quality, they define good programming. They were making games multi-threaded before people even thought of running games on dualie systems, and they've innovated more optimization tips and tricks than any other software house combined. (Except for maybe Intel themselves.) Now look at Quake3's performance. Quake3 is a multi-threaded application. This means that it should run at its absolute best with HT enabled, because it is designed to take advantage of multi-threading.

And yet, Q3A doesn't benefit much from even dual-processor solutions, a situation in which you have double the processing resources, so the "the processor is already maxed out" argument doesn't apply.
<A HREF="http://www.tech-report.com/reviews/2001q4/athlonmp-1900..." target="_new">As seen in this TechReport benchmark</A>

The dual Duron system actually ran the test SLOWER than the single Duron system.

Quote:
Yet what happens? With HT enabled, Q3 performance drops. Why? Well, I've already said why. Q3 was already written to squeeze every last drop of performance out of your CPU. Throw in the half-arsed implementation of HT and you end up getting screwed by 'innovation'.

<A HREF="http://www.tech-report.com/reviews/2002q4/pentium4-3.06..." target="_new">Tech-report</A>, a 0.15% difference.
<A HREF="http://www.tomshardware.com/cpu/02q4/021114/p4_306ht-11..." target="_new">Toms</A>, a 2.24% decrease in the most CPU-intensive Q3A benchmark.
Both are well within the margin of error.

Quote:
I fully stand by my statements. Any software that is written well will see little or no performance improvement. It's only software that either doesn't challenge your CPU (such as Word) or that was written by monkeys that will see good improvements from HT.

Sorry, but x86 code just simply isn't that good at filling the processor. You can make your program the best in the world but you won't be able to solve the problems of data dependencies and memory latency. Eventually, you'll get stalls due to cache misses, branch conditions, etc. that will not allow one thread to max out the processor's resources.

Quote:
Sorry, but anyone who even halfway researches this will see how much truth there is.


Follow your own advice. Look at some dual processor benchmarks for these "well-written software" that you claim to be multithreaded well but simply max out the processor's resources (such as Quake 3). In dual processor solutions, you have twice the processing power, yet Q3A actually runs SLOWER. Why?

Quote:
I've been saying for years that the increases in CPU speed and memory have made programmers lazy and wasteful. Hell, LOOK AT JAVA! You can't get much more of a wasteful and lazy language than that, except for maybe Visual BASIC.


Java is probably one of the most ambitious languages out there. Don't confuse Netscape and M$'s bitchy JavaScript with the Java language. It runs very well considering it emulates absolutely everything. But such is the price of being able to run the exact same program equally well on any platform, be it Unix, Linux, Windows, MacOS, BeOS, etc., without any recompilation.
And if you bother to look into well-written Java compilers such as IBM's, it actually performs very well such as <A HREF="http://www.aceshardware.com/read.jsp?id=153" target="_new">this Ace's article shows</A>. The difference is, you don't have to recompile, rewrite or optimize the program at all when switching from Windows to Linux to whatever. The exact same program will run on all of them equally.

Quote:
Most programmers have either forgotten or never learned the fine art of optimization. I doubt most programmers even know what a profiler is, not to mention how to use one.


Simply not true. The first thing they teach you in any CS class is proper programming techniques. Programs such as 3dsmax and other high end programs are very well optimized. The problem is, you can't be too specific with optimizations otherwise your program will only run well on one specific type of computer. You need to make it so your program runs great on an Athlon or a P4 or a P3 or a Mac or Windows 98 or Windows NT or 2k or Unix or Linux, etc. etc. etc. You simply can't rewrite your whole code for each platform. The most you can do is use differently optimized libraries and a different compiler. The only exception I may make to this rule is M$ programmers.

Quote:
So in the face of this, yes, I bet there are an awful lot of programs that do gain from HT. I never said there weren't. I do however fully stand behind the very simple and obvious. If a processor-intensive program is written well in the first place, it gains nothing and can even lose performance from HT.


Simply not true. No amount of optimization makes up for the latent deficiencies of x86: data dependencies, hard-to-decode instructions, and a problem that is not specific to x86 (memory latency).

Quote:
I guess it's just good for Intel that so many programmers out there are unskilled hacks born from the dot-com era and unleashed upon the world when that era came to the obvious and unavoidable messy end that you get with so many clueless hacks trying to cash in whenever possible.


Just out of curiosity, how many large projects have you worked on?

Quote:
I doubt even 20% of the programmers employed today have ever written code in Assembly. Hell, I doubt even 30% have ever coded in ANSI C++ or would even know a vector if one bit them in the arse. Between Sun and Microsoft programmers have just gotten way too lazy.


Assembly really is overrated with modern compilers. Microsoft has moved on to their own C# language and Sun mainly uses Java, so you're probably right, very few employees write programs in ANSI C++ or assembly anymore at those companies. But you know what? That's not actually a bad thing (in the case of Sun, at least). Java is what higher-order languages are supposed to be: programmers worry about algorithm-level performance and leave the hairy details to the compiler writers, who know best how to optimize at the asm level. An n log n algorithm will benefit your program a lot more than saving a few cycles by reordering a few instructions in asm.

Quote:
So will HT benefit a good number of programs? Sure! Will HT benefit a program that was actually written well in the first place? Not a chance! If saying that is stupid, then I'll proudly be the dumbest ignoramus to ever walk the face of the Earth.


No comment.

Quote:
When everyone was saying how Java would one day replace C++, I just laughed and laughed, even when they meant it.


Java is slowly but surely making its way into high-end applications. HP has embraced it with their Dynamo, and IBM has embraced it with their own JVM. The only people bitching about it are Microsoft, who don't like anyone else creating a standard. Instead they wanna replace C++ with their C# and .NET infrastructure. Sadly, they've actually done a great job at stopping Java from becoming widely accepted. Just look at how they strong-armed Intel out of building a high-performance JIT platform.

Quote:
Gee, who was right there? I wonder... So if anyone says HT will boost a well-designed application's performance, well... can you hear my laughter all the way from Wisconsin? I'll take good programming over half-arsed products developed by monkeys any day of the week.


Wisconsin......that explains a lot.

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
November 16, 2002 4:14:24 PM

Quote:
... and see just where the truth lies.

I like that one ... :cool:

Seriously, though ...

I think HT <i>can</i> bring performance improvements to games, and even to very assembly-optimized programs in general. Why? Even when you get to the most efficient way of asm-coding a certain algorithm, the chances are very small that you get to, and maintain, having the CPU's pipeline(s) filled. This is EXTREMELY difficult, certainly looking at today's most advanced processor architecture (IMHO, the NetBurst one ... guess this kinda blows my objectivity ... :redface: ). Branch-prediction mistakes, data dependencies and everything else one can think of make instructions stop flowing through it. But these are events that occur at the instruction level.
If you go one step higher (or many; I actually am not a C(omputer)S(cience?) student, far from it), you get to thread parallelism. This is an entirely different level, where entirely different dependencies occur, almost always at different moments than where the instruction-level parallelism fails (I guess, statistically seen, this makes sense). So the instructions of the other thread can hop into the gaps that are made by the instruction dependencies at that lower level, and let the P4 regain some of its rather low IPC, according to many.
That's, in a nutshell, my vision on this topic. As usual, I've been wondering, though ... How does HT work? Is it actually doing its thing like I described above, by running one thread and making another fill the gaps the first one leaves in the pipeline? Or does HT have, besides the OS's, a separate thread-priority system that determines which thread gets priority and which one doesn't? On the other hand, it is possible that the threads are pushed through the pipeline together, but then I can't understand, at least not without breaking my poor little brain on it, how HT gets this efficient ... Maybe one of you knows ... Or maybe I should just check that kick-ass site, <A HREF="http://www.arstechnica.com" target="_new">http://www.arstechnica.com</A> ...

EDIT: Hmmm ... Maybe an article titled <A HREF="http://arstechnica.com/paedia/h/hyperthreading/hyperthr..." target="_new">Introduction to Multithreading, Superthreading and Hyperthreading</A> could do the trick ...

Greetz
Bikeman

<i>Then again, that's just my opinion</i>

Edited by bikeman on 11/16/02 07:16 PM.
November 16, 2002 5:04:42 PM

Just a few disagreements:
Quote:
Data can't be shared because each object uses different data; very few static members are used for AI bots. There is absolutely no way to correctly synchronize threads, because the application is not in control of thread management; the OS is.

Data can be shared via the use of mutexes. This allows the programmer to "lock" certain data that one thread is using, thus disallowing other threads from using that same data. That keeps everything happy, until you start using too many threads and then it gets messy. Programmers shy away from threads because they get really complicated to synchronize when there are several of them.
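For what it's worth, a minimal sketch of that locking idea. The names are illustrative, and std::mutex stands in for whatever the platform actually offers (a Win32 CRITICAL_SECTION, a pthread mutex, etc.):

#include <cstddef>
#include <mutex>
#include <vector>

struct BotState { float x, y; };

std::vector<BotState> g_bots(8);   // data shared by every AI thread
std::mutex g_botsMutex;            // guards g_bots

void moveBot(std::size_t index, float dx, float dy)
{
    std::lock_guard<std::mutex> lock(g_botsMutex);  // other threads block here
    g_bots[index].x += dx;
    g_bots[index].y += dy;
}   // lock is released automatically when it goes out of scope

The messiness shows up exactly as described: once several threads want several locks, ordering them so nothing deadlocks gets hard fast.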

Quote:
The problem is, you can't be too specific with optimizations otherwise your program will only run well on one specific type of computer. You need to make it so your program runs great on an Athlon or a P4 or a P3 or a Mac or Windows 98 or Windows NT or 2k or Unix or Linux, etc. etc. etc. You simply can't rewrite your whole code for each platform. The most you can do is use differently optimized libraries and a different compiler.

Well, a lot of the time, if you want to make your game multiplatform, you are gonna be doing quite a bit of rewriting of the code. It is true that you can use libraries that are ready for multi-platform, but I think that might come with a bit of overhead and a longer development cycle. It's a choice that you have to make. IMHO, I think that knowing optimizations is very helpful, because without them your game just won't run well on low-end systems. You can program for any platform, but your game will end up only in the hands of the people with high-end computers. (But really, knowing both is important!)

Quote:
An n log n algorithm will benefit your program a lot more than saving a few cycles by reordering a few instructions in asm.

Well, that is true in many cases, but a counter case would be if you had a section of code that is run a hundred times a frame; if you could get rid of just 3 CPU instructions, that would save you 3000 CPU cycles a second. This is where profiling is very important. Find out where most of the CPU time is being spent and then optimize the hell out of that section of code.
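As a toy illustration of that profiling step, here is the crudest possible version: time the suspect section yourself. A real profiler does this per function automatically; everything below is made up for the example.

#include <chrono>
#include <cstdio>

int main()
{
    using Clock = std::chrono::steady_clock;

    Clock::time_point start = Clock::now();
    volatile double sum = 0.0;
    for (int i = 0; i < 1000000; ++i)      // stand-in for the suspected hot loop
        sum = sum + i * 0.5;
    Clock::time_point stop = Clock::now();

    long long us =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
    std::printf("hot section took %lld microseconds\n", us);
}

Only after a measurement like this do you know whether shaving those 3 instructions is actually worth the effort.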


-Your Friendly Neighborhood Mathematician!
Edited by deBoor on 11/16/02 02:54 PM.
November 16, 2002 6:38:43 PM

Quote:
...for the same reason they don't benefit from SIMD

I know I'm a bit late to this thread but SIMD is a great tool for improving calculations. I'm curious why you would make such a statement?

One of the biggest misconceptions in 3D programming is about the lighting and transformation engine. People seem to believe that you just throw your hierarchy at it and it does all the work. It only does the final transformations; you still have to concatenate all your matrices and packetize everything before you send it off.

In a modern 3D game, timing and organization are everything. Due to the nature of the processor and the geometry engine, there are times when the processor has to do as much as it can in the shortest time possible and other times when it has nothing to do. During these so-called dead spots, other less time-specific actions can be performed. If the transformation engine has another thread competing for its time-critical slice, it's not going to get its data to the graphics card as fast as it can. Once the data is sent to the graphics card and it takes over, the processor can do less time-critical operations before it starts the physics and geometry calculations for the next frame.
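A stripped-down illustration of the "concatenate all your matrices" step; Mat4, Node and the parent-before-child layout are assumptions of this sketch, not any particular engine:

#include <cstddef>
#include <vector>

struct Mat4 { float m[16]; };              // row-major 4x4 transform

Mat4 multiply(const Mat4& a, const Mat4& b)
{
    Mat4 r = {};                           // zero, then accumulate
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            for (int k = 0; k < 4; ++k)
                r.m[row * 4 + col] += a.m[row * 4 + k] * b.m[k * 4 + col];
    return r;
}

struct Node { Mat4 local; int parent; };   // parent == -1 for the root

// Build world matrices for a hierarchy stored parent-before-child; this is the
// CPU work that has to finish before anything is packetized and sent to the card.
std::vector<Mat4> concatenateHierarchy(const std::vector<Node>& nodes)
{
    std::vector<Mat4> world(nodes.size());
    for (std::size_t i = 0; i < nodes.size(); ++i)
        world[i] = (nodes[i].parent < 0)
                       ? nodes[i].local
                       : multiply(world[nodes[i].parent], nodes[i].local);
    return world;
}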

Dichromatic for your viewing plesure...
November 16, 2002 7:08:13 PM

Quote:
I know I'm a bit late to this thread but SIMD is a great tool for improving calculations. I'm curious why you would make such a statement?


The concept of SIMD is to better balance out the instruction-to-data ratio. That is, you have one instruction operating on 4 different pieces of data, saving the need to decode the same instruction over and over again for all those pieces of data. However, games aren't very SIMD friendly. As I've stated before, they're more one-on-one as far as instructions and data go. There are very few cases in which you can find one instruction operating on multiple data sets in parallel. Rather, you have one instruction working on one piece of data, another instruction working on another piece of data, etc. That doesn't leave a lot of opportunities to use SIMD.
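A tiny example of that one-instruction-many-data idea, using the SSE intrinsics the P3/P4 already had (the function names are just for illustration):

#include <xmmintrin.h>   // SSE intrinsics

// Four separate scalar adds...
void add4_scalar(const float* a, const float* b, float* out)
{
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
}

// ...versus a single addps doing all four sums at once. This only pays off when
// the data really does arrive in independent groups of four, which is the point above.
void add4_sse(const float* a, const float* b, float* out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}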



"We are Microsoft, resistance is futile." - Bill Gates, 2015.
November 16, 2002 7:13:34 PM

Quote:
Well, that is true in many cases, but a counter case would be if you had a section of code that is run a hundred times a frame; if you could get rid of just 3 CPU instructions, that would save you 3000 CPU cycles a second. This is where profiling is very important. Find out where most of the CPU time is being spent and then optimize the hell out of that section of code.


The whole point of better algorithms is that you reduce how many times you actually have to repeat sections of code. So yes, while shaving 3 instructions off a critical loop may be very helpful, the savings in terms of growth rate would be much better if you made it so the loop only had to run 1000 times instead of 3000 times for a given number of inputs.
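A small made-up example of that growth-rate point: both functions below answer the same question, but the second does far fewer iterations as the input grows, which dwarfs any instruction-level tweak inside the loop body.

#include <algorithm>
#include <cstddef>
#include <vector>

// O(n^2): compare every element against every other element.
bool hasDuplicateQuadratic(const std::vector<int>& v)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j])
                return true;
    return false;
}

// O(n log n): sort a copy once, then make one linear pass over the neighbours.
bool hasDuplicateSorted(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    return std::adjacent_find(v.begin(), v.end()) != v.end();
}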

"We are Microsoft, resistance is futile." - Bill Gates, 2015.
November 16, 2002 7:33:15 PM

I look at it this way.....

Hyperthreading = +~200 MHz

It really isn't THAT big of a deal. Nothing a lil overclock can't match up to :p 

As far as scalability goes, I'm not sure if faster revisions will increase that rough estimate of 200 MHz, but we all know, at the rate we are travelling, who cares if a CPU is hyperthreaded or not; it's not a mind-boggling increase.

What would you rather have: a 3.06 GHz hyperthreaded processor, or a 3400 MHz processor without it? :/ 

Progress is good, however. Right now I'm PRO Intel, but I WANT AMD to quickly make up some ground, because I love healthy competition, and who knows where we'd still be if AMD had never made their big push.

We'd likely still be paying $700+ for a P4 1.6 processor :p 
November 16, 2002 9:13:04 PM

I'd take the 3.06 GHz anytime, because it multitasks at a rate 2 to 5 times faster.

--
*You can do anything you set your mind to man. -Eminem
November 16, 2002 10:31:57 PM

Quote:
The whole point of better algorithms is that you reduce how many times you actually have to repeat sections of code. So yes, while shaving 3 instructions off a critical loop may be very helpful, the savings in terms of growth rate would be much better if you made it so the loop only had to run 1000 times instead of 3000 times for a given number of inputs.

Not really what I was saying. You say that an n log n algorithm is better than saving a few CPU instructions here and there. Well, what if there was a section of code that was not an algorithm? Let's say that there's something that is done to every object in the game each frame and that there are 100 objects (i.e., throwing them onto the graphics pipeline). If you could save yourself 3 CPU cycles for each object, you would save 300 CPU cycles right there! An easy example of a non-assembly optimization is using the pre-increment versus the post-increment. The counter variable in a loop should almost always use the pre-increment.
i.e.:
for(int i = 0; i < 100; ++i)
If you were to use i++, there would be overhead in creating a temporary i value. (You'd only get this if you are a programmer...) So right there you save yourself tons of CPU cycles that could be used elsewhere. So there really was no n log n algorithm that could save you here, just better-optimized code! That's the case in a lot of programming situations. There's not an n log n algorithm out there for everything... or not yet at least.
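(For a plain int, the compiler will usually generate identical code either way, as the next reply points out. Where the habit genuinely pays off is with class types such as iterators, where post-increment has to return a copy. A made-up iterator as a sketch:)

struct Iterator {
    int* p;

    Iterator& operator++()      // pre-increment: advance, return *this
    {
        ++p;
        return *this;
    }

    Iterator operator++(int)    // post-increment: copy the old value, advance, return the copy
    {
        Iterator old = *this;   // the extra temporary object
        ++p;
        return old;
    }
};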

-Your Friendly Neighborhood Mathematician!
November 17, 2002 12:13:57 AM

I can't think of a single 4-item operation when doing calculations in 3-space. Remember, no vector is ever really complete without its w component. Swizzling may be a little costly, but optimizing throughput has its advantages as well.

Dichromatic for your viewing plesure...
November 17, 2002 12:16:04 AM

Quote:
i.e.:
for(int i = 0; i < 100; ++i)
If you were to use i++, there would be overhead in creating a temporary i value. (You'd only get this if you are a programmer...)

"i" is the primary variable of the loop. It was created a long time before that statement is ever executed. It makes no difference either way. Think about it, if you're changing the value of "i" you better be ready to use it sometime after it is incremented/decremented. The only time the compiler would ever generate different opcodes with pre/post operators, is when they're used on the right side of the assignment operator.

i.e..

a=++i; // pre
a=i++; // post

Dichromatic for your viewing plesure...
November 17, 2002 12:19:46 AM

I'm sorry. I agree with every other thing you said...

Dichromatic for your viewing plesure...