32-bit/64-bit Explained (Ignorami read here)

April 23, 2007 9:16:54 PM

Many comments have been made stating opinions like Intel is not a true 64-bit, Intel copied AMD, Intel is keeping 64-bit down, etc. I would like to offer some information.

Keep in mind that AMD64/EM64T is an INSTRUCTION SET and as such must be identical to maintain compatibility between programs. If they were different, then a 64-bit program made for AMD64 would not work on an Intel processor supporting EM64T, or vice versa.

There can be performance differences. This is because the instructions themselves merely tell the processor to do things, and don't require that the processor do them in a certain amount of time. The internal workings of every processor can be different, even when they both adhere to the same instruction set.

Different companies rarely (if ever) share instruction sets. The Intel/AMD situation is unique because of their history, which is as follows:

When Intel was starting out in the CPU market, they made processors for the IBM PC. At that time IBM was a corporate giant compared to Intel, and they called the shots. For supply/security purposes, IBM had a policy requiring that the processors they bought from Intel come from two sources. That meant Intel had to sign an agreement allowing another company to make their processors too (they chose AMD), so that if anything happened to IBM's primary source (shortages, etc.) they would have a backup.

This of course meant Intel had to give all their technical documents, lithography designs, instruction set reference, everything related to their processors to AMD. AMD then began producing Intel processors (mainly the Intel 8088 model). At that time the processors were made to support the 16-bit instruction set designed by Intel.

This was a huge growth period for Intel. When they began designing the 80386 instruction set (generally referred to as 80x86 or simply x86), it supported both 32-bit and 16-bit modes of processing, which made it fully backward compatible with older software at its time of release. In fact, as of this date in 2007 you can buy a processor that will run code written more than 20 years ago. When Intel made this chip, they did not release any documentation to AMD, since they were big enough that they no longer needed AMD to produce their chips. However, given the original terms of the contract, they were technically not allowed to cut AMD off at that point. AMD therefore sued Intel and later won the case. They then settled the contract so that AMD could no longer produce exact replicas of Intel chips (after which AMD started making their own K5, K6, etc. lines of chips). However, AMD could still develop chips that follow Intel's 80x86 instruction set, so they could be compatible.

Fast forward to AMD64. AMD pre-empts Intel's development of a 64-bit instruction set. It supports all the old instructions but adds new operating modes, new instructions that support 64-bit operands, etc. Was it necessary at that time? Absolutely not. Was it useful in some cases? Sure. Was it inevitable anyway? Of course. Just as 128-bit instruction sets are inevitable. That doesn't mean we need them any time soon.

In order to maintain compatibility, Intel updated all of their processing cores to be 64-bit, and extended their instruction set to support all of the instructions added by AMD64. They weren't very much behind the curve, considering all of their new CPUs (even their $50 Celeron chips) were 64-bit BEFORE Microsoft launched the 64-bit edition of Windows XP. They were a few years later than AMD but still in time for mass market use (if you could call Win XP-64 mass market considering even > 95% of Vista users still run 32-bit).

Now that I've explained the history, here are the technical details of what 64-bit means to you:

Each processor has different execution units that perform different tasks: integer math, floating point (decimal) math, and vector math (MMX, 3DNow!, SSE/SSE2/SSE3/SSE4).

AMD64/EM64T only affect the integer math portion of the processor. This is the main unit used for basic tasks such as conditional branching, looping, addressing of virtual memory, and basic math. In a 32-bit processor, this portion can only deal with 32-bit numbers. Each program that runs on the system therefore has its address space limited to 4GB, since 32 bits can only represent about 4 billion different states. So the main reason to move to 64-bit is so that a single program can address more than 4GB of memory. Let's say you're doing video editing and you open a raw video file that's 20GB in size. You must use special techniques to address it, such as splitting the data into chunks of at most 4GB. The program itself may use less than 100MB for its resources and code, but the data can be enormous. Splitting up the data like this causes a lot of overhead in the program itself.
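The arithmetic behind the 4GB limit can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and it assumes "20GB" means 20 GiB.

```python
# Sketch: why a 32-bit pointer tops out at 4 GiB, and how many
# 4 GiB windows a 20 GiB file would need. Names are illustrative.
import math

GIB = 2 ** 30

ADDRESS_BITS = 32
address_space = 2 ** ADDRESS_BITS      # number of distinct byte addresses

file_size = 20 * GIB                   # assumption: "20GB" means 20 GiB
windows_needed = math.ceil(file_size / address_space)

print(address_space // GIB)   # 4
print(windows_needed)         # 5
```

In practice a program can't use its whole address space for one window (code, stack, and the OS take their share), so the real number of chunks would be higher.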

Aside from that, there is very little need for 64-bit. For the normal flow of programs, 32 bits is enough to represent any number of possible states, logic, and code-related math. In fact, even 32-bit processing is overkill for 80-90% of what a basic program does, because we often use 32 bits to store 1 bit of information. The default operand size for instructions in 32-bit mode is 32 bits, so even 1-bit, 8-bit, or 16-bit values usually travel in full 32-bit registers, even though your average routine only reports a 1-bit result (0 or 1) to indicate failure or success. There is no need to code routines to use 64-bit numbers for anything but addressing memory. 32-bit code can use numbers to represent 4294967296 different states, and most code only wants to represent a handful. 64-bit numbers can represent 18446744073709551616 different states, which is roughly FOUR BILLION TIMES the number of states that could be used before (not DOUBLE, as you might think going from 32 to 64). If you need, for some rare scientific/multimedia purpose, to perform math on 64-bit numbers, you can use the floating point instructions or the SSE instructions which all modern processors have. You can even perform all types of arithmetic on 128-bit numbers if you so choose.
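The state counts quoted above are easy to verify. A minimal Python check (Python integers are arbitrary precision, so nothing overflows here):

```python
# Sketch: number of distinct values in 32 and 64 bits, and their ratio.
states_32 = 2 ** 32
states_64 = 2 ** 64
ratio = states_64 // states_32   # how many times more states 64 bits gives

print(states_32)   # 4294967296
print(states_64)   # 18446744073709551616
print(ratio)       # 4294967296
```

The ratio is itself 2^32, which is where "four billion times" comes from.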

My main point is that unless you are addressing more than 4GB of data or plan to have more than 3GB of system memory, you do not need 64-bit processing.

Even the aforementioned SSE extensions, which allow processing of 128-bit numbers, are almost never used for that. The computations that SSE is used for are called SIMD computations: Single Instruction, Multiple Data. Since we usually only need to work on groups of 8-bit, 16-bit, or 32-bit data, we can pack these values into a 128-bit register and perform up to 2, 4, 8, or 16 calculations using a single instruction. Hence, the need for bigger numbers is not the purpose of SSE; it's the need for repetitive calculations. Take graphics applications that use a 32-bit color palette. If you wanted to perform a math function or a series of math functions on a large texture, you would have to deal with each 32-bit number millions of times, once for each pixel. If you could shorten the process by packing those 32-bit numbers into a 128-bit register, you could do them 4 at a time. This doesn't increase the speed by a full factor of 4 because of other considerations (overhead of the SSE instructions, keeping the data in the cache, etc). But a well designed program can still see a huge benefit.
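To make the packing idea concrete, here is a small Python model of a 128-bit register holding four 32-bit lanes. It is only a simulation (real SIMD code would use the processor's packed-add instructions, and the helper names here are invented for the example), but it shows the key property: each lane wraps around on its own and no carry crosses into its neighbour.

```python
# Sketch: model a 128-bit register as a Python int holding four 32-bit lanes.
# pack/unpack/packed_add are illustrative helpers, not a real SIMD API.
LANE_BITS = 32
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1

def pack(values):
    """Pack four unsigned 32-bit values into one 128-bit integer (lane 0 lowest)."""
    assert len(values) == LANES
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word

def unpack(word):
    """Split a 128-bit integer back into its four 32-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]

def packed_add(a, b):
    """Add lane by lane; each lane wraps at 2**32 and carries never cross lanes."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])

a = pack([1, 2, 3, 0xFFFFFFFF])
b = pack([10, 20, 30, 1])
print(unpack(packed_add(a, b)))   # [11, 22, 33, 0]
```

Note the last lane: 0xFFFFFFFF + 1 wraps to 0 instead of spilling over, which is exactly why a packed add is not the same thing as a 128-bit add.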

None of this applies to gamers (whose heavy math is done on their GPU which has its own 64-bit/128-bit processing ability), business app users, or the average desktop PC user.

Then again, with Vista being the resource hog it is, I can see in a couple years most gamers having 4GB of RAM, which would require having a 64-bit OS to be able to fully access.

But the point is, while it may be necessary in 2008/2009 for certain users, it certainly wasn't necessary back in 2001 or even now for most people.

Also, keep in mind that in using Physical Address Extension which most modern processors support, you can address 64GB of memory. This is often used on servers and workstations.
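The 64GB figure follows from PAE widening physical addresses to 36 bits, while each program still sees 32-bit virtual addresses. A quick check in Python (constant names are illustrative):

```python
# Sketch: PAE widens *physical* addresses to 36 bits; each process
# still uses 32-bit *virtual* addresses.
GIB = 2 ** 30

PAE_PHYSICAL_BITS = 36
VIRTUAL_BITS = 32

physical_limit_gib = 2 ** PAE_PHYSICAL_BITS // GIB
per_process_limit_gib = 2 ** VIRTUAL_BITS // GIB

print(physical_limit_gib)      # 64
print(per_process_limit_gib)   # 4
```

So PAE helps a machine running many programs, but it does not lift the 4GB ceiling for any single one of them.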

If any of you has more than 3GB of RAM in your system, run a program that shows/logs your memory usage, and play the most memory intensive game you have at maximum quality settings, with your taskbar/system tray full of applications. Is your memory full? 99% of the time it won't be.

I do personally run 64-bit Vista, however, because I could see myself using 4GB of RAM when I build a new PC next year. But the reality is that 19 out of every 20 programs I use do not have a 64-bit version and do not need one, and my Vista is running those 32-bit apps through a compatibility layer (WoW64), which hurts the performance slightly. It's almost not worth it.
April 23, 2007 9:32:08 PM

Very interesting read and very accurate, thanks! For some reason it makes me wonder if AMD has advanced AMD64 further in K10 though given that they've been on their current instruction set for 4 years now. I would imagine they'll be pushing this come launch time.
April 23, 2007 9:46:29 PM

piesquared:

They might make it faster behind the scenes, but they won't change the AMD64 instruction set. Changing the instruction set is bad. Old programs won't work.

You can ADD instruction sets, that's fine. They'll probably add SSSE3 or SSE4 in their new architectures. But these must be supported by new programs in order to be useful, and software development takes time.

Companies like Adobe look at it this way. When a new lineup of processors comes out that uses a new instruction set, say SSSE3 as with the Core 2 processors... They would test it for performance gains and rate the performance. Say 20% performance gain over regular SSE3. Then they would factor how much of their customer base is going to have a processor supporting that instruction set at a given time. Initially a very small < 5%. They can use a simple formula to calculate the cost benefits, factoring in the cost of paying programmers to rewrite the code.

In general, most extensions aren't used for at least a few years. It all depends on the software companies.

They won't make any changes to the core AMD64 instructions until... 128-bit processing. That won't be necessary until we need to address more than 16 exabytes of data in a single linear address space. We aren't even into terabytes yet. Petabytes come after terabytes. Then finally there are exabytes. So it's a ways off. ;) 
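For anyone who wants to see where "16 exabytes" comes from, here is a rough Python sketch that walks up the unit ladder in steps of 1024 (the `human` helper is made up for this example):

```python
# Sketch: express a byte count in the largest binary unit that fits.
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]

def human(n):
    """Divide by 1024 until the value is small or we run out of units."""
    idx = 0
    while n >= 1024 and idx < len(UNITS) - 1:
        n //= 1024
        idx += 1
    return f"{n} {UNITS[idx]}"

print(human(2 ** 32))   # 4 GiB
print(human(2 ** 64))   # 16 EiB
```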

The important thing is, architectures like K10 will improve performance using the exact same instructions as the K8 processors.
April 23, 2007 9:53:19 PM

Quote:
Oblivion will use almost all 3GB of my memory with background services running, on Windows XP.


Impressive!

I can only run up to 1600x1200 so I've never gone over 2GB in XP.

You would definitely need a 64-bit version of Vista if you upgrade to it.
April 23, 2007 10:14:07 PM

Quote:
Quarls texture pack 3.0 beats my system to a pulp every time I play. Res is 1024x768. QTP is several gigs of textures that replace Oblivion's textures. Massive texture maps.


Does that make the distant landscape look more realistic?

That's the only thing that turned me off about Oblivion. Everything close-up looked awesome but if you looked 100m away it was a fraction of the quality (even when cranking draw distances up).
April 23, 2007 10:18:03 PM

Two things you did not mention that are different between x86_64 and x86 are that the old x87 FPU is not used in x86_64 long mode; instead the FP math is done in SSE. This results in a big speed-up on certain chips. Also, there are double the number of general-purpose registers in x86_64 chips as there are in x86 chips.
April 23, 2007 10:23:49 PM

Well, for instance, is it possible that K10's form of "macro fusion" will be ported over to AMD64 properly? This is the kind of thing I was referring to, not changing the instruction set per se. Could be an advantage to AMD in 64-bit computing. Either way, I'm running 64-bit Vista right now, and find it extremely responsive. I haven't compared it to the 32-bit OS, but I might as well do what I can to help advance the technology. If nobody offered it before it was needed, then what happens when the time comes that it is mandatory, which is inevitable? That's how I look at it anyway. :) 
April 23, 2007 10:32:41 PM

Quote:
Two things you did not mention that are different between x86_64 and x86 are that the old x87 FPU is not used in x86_64 long mode; instead the FP math is done in SSE.


I'm not quite sure what you mean here, but the way I understand this statement, it is wrong. On page 8-2 of the "IA-32 Intel Architecture Software Developer's Manual, Vol. 1," section 8.1.1 says

Quote:
In compatibility mode and 64-bit mode, x87 FPU instructions function like they do in protected mode.


If, in another sense, you are referring to the fact that architectural (x87) instructions are broken down into u-ops and sent to the floating point cluster, which also handles both SIMD and SISD ops, then you are correct. But this is unrelated to 64-bit mode, and was true well before the advent of x86-64.
April 23, 2007 10:33:03 PM

Quote:
Two things you did not mention that are different between x86_64 and x86 are that the old x87 FPU is not used in x86_64 long mode; instead the FP math is done in SSE. This results in a big speed-up on certain chips. Also, there are double the number of general-purpose registers in x86_64 chips as there are in x86 chips.


True. This can prove a pretty decent benefit in some routines, although most compiler-generated x86 code is so stack-oriented that I don't know if the extra registers would see a whole lot of use except in low-level code. I will do a simple test when I get home from work by compiling some C++ code for x86-64 and telling my compiler to print out the generated assembly code. The result would vary from compiler to compiler, but it would show which registers are used.

One negative factor to 64-bit code is that the code size itself increases. Which means less room for code branches in the L1/L2 cache. Meaning more of a chance for cache misses and pipeline stalls. It's a delicate balancing act.
April 23, 2007 10:42:51 PM

Quote:

One negative factor to 64-bit code is that the code size itself increases. Which means less room for code branches in the L1/L2 cache. Meaning more of a chance for cache misses and pipeline stalls. It's a delicate balancing act.


This is not as huge of a deal as it is in RISC architectures. When instructions deal with 64-bit registers, it adds the REX prefix to the instruction, which is only 1 byte. I would say the average instruction encoding is around 4 bytes, so this is only a 25% increase. The other instance of code size increase is for 64-bit immediate values, which aren't too common.
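As a concrete case, a register-to-register `mov` shows the one-byte cost. The sketch below just holds the raw encodings as Python bytes and does the arithmetic; the 4-byte average is the estimate from this post, not a measured figure.

```python
# Sketch: the same register move encoded for 32-bit and 64-bit operands.
# The only difference is the leading REX.W prefix byte (0x48).
REX_W = 0x48

mov_eax_ebx = bytes([0x89, 0xD8])          # mov eax, ebx  (2 bytes)
mov_rax_rbx = bytes([REX_W, 0x89, 0xD8])   # mov rax, rbx  (3 bytes)

extra_bytes = len(mov_rax_rbx) - len(mov_eax_ebx)

AVG_INSTRUCTION_LEN = 4                    # assumption: estimate from the post
overhead = extra_bytes / AVG_INSTRUCTION_LEN

print(extra_bytes)          # 1
print(f"{overhead:.0%}")    # 25%
```

That 25% is a worst case for instructions that actually need the prefix; instructions that keep using 32-bit operands and the original eight registers don't pay it.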

You are right in stating that there is a performance penalty, but it is not 2x. I know that you didn't suggest this, but one might think 2x from 32->64.
April 23, 2007 11:05:00 PM

:trophy:

Most excellent summarization of the history. Great read for both the informed and those who aren't.
April 23, 2007 11:15:11 PM

I stand corrected after reading the appropriate section in my AMD Architecture Programmer's Manual. FP math in 64-bit mode is generally done in SSE because it's usually faster (all x86_64 chips support at least SSE2), but the x87 stuff still works in 64-bit long mode. Thanks for the correction.
April 23, 2007 11:27:26 PM

Darn good read. :trophy:
April 23, 2007 11:31:40 PM

Just a nitpicky note, Intel's Itanium came out before the Opteron, so Intel actually had a 64-bit proc before AMD, it just wasn't x86 compatible.
April 24, 2007 12:42:31 AM

Nice primer !!!!

I would like to add an overlooked item though, maybe the most important.....

I agree completely that the extra address space that 64-bit procs offer isn't much needed at the moment. What's missing here is the performance boost that AMD64 has over x86 because of the new ABI. It is not true that the code gets bigger; in fact it gets smaller most of the time and hence faster. Why? Well, having that many registers not only allows you to pass more arguments in registers but also frees a lot of stack allocation for local function variables, reducing stack access. Just try an objdump on any code you like and you'll see the big advantage of AMD64: the calling convention is almost forced to fastcall. And the whole thing about possible states, well, most booleans and immediates are still represented as int32 so they all keep their size. It's all clearly specified in the ABI, where the real performance gains of AMD64 lie. BTW, Microsoft uses a different ABI than gcc, for example. You get more performance from x86-64 than from x64, but anyway serious stuff and MS are mutually exclusive :) 
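A rough way to picture the ABI difference is to list where the first few integer or pointer arguments of a call end up. The Python sketch below is heavily simplified (it ignores floating-point arguments, structs, and the stack space each convention reserves), and `place_args` is a made-up helper, but the register orders are the ones used by the System V AMD64 ABI (gcc on Linux) and the Microsoft x64 convention.

```python
# Sketch: where integer/pointer arguments go under three calling conventions.
# Simplified on purpose: no floating-point args, no structs, no varargs.
SYSV_AMD64_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]   # gcc on Linux
MS_X64_REGS = ["rcx", "rdx", "r8", "r9"]                     # Microsoft x64
CDECL_X86_REGS = []                                          # 32-bit cdecl: all on the stack

def place_args(n_args, regs):
    """Return the location of each argument: a register name or 'stack'."""
    return [regs[i] if i < len(regs) else "stack" for i in range(n_args)]

print(place_args(5, CDECL_X86_REGS))    # ['stack', 'stack', 'stack', 'stack', 'stack']
print(place_args(5, MS_X64_REGS))       # ['rcx', 'rdx', 'r8', 'r9', 'stack']
print(place_args(5, SYSV_AMD64_REGS))   # ['rdi', 'rsi', 'rdx', 'rcx', 'r8']
```

Every 'stack' entry is a memory write by the caller and a memory read by the callee, which is the traffic the extra registers avoid.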

Itanium 64 bits?? mmmmmm I remember some 64bit MIPS cores in the 90s.....
April 24, 2007 12:47:04 AM

Quote:
Itanium 64 bits?? mmmmmm I remember some 64bit MIPS cores in the 90s.....


Yeah but this was specifically about Intel-AMD, 64-bit goes back to the 60's if you include every architecture.
April 24, 2007 6:23:41 AM

18,446,744,073,709,551,616 bytes, and how it was explained to me:

Imagine an elephant eating away for a full year in lush fields, peanuts galore, and then taking a shit on your lawn. It's that big.

Overall I liked the layman's terms of the post. Made it easier to comprehend.
April 24, 2007 7:05:23 AM

A very good read. :wink:

@verndewd
Where can I DL Quarls texture pack 3.0?
April 24, 2007 8:27:19 AM

First off, I agree that most gamers currently don't have much use for 64-bit instructions aside from being able to address more than 3-4GB.

I strongly disagree with you overall though. Many scientific and server related applications need to do computations with very large numbers very fast. Take databases or cryptography for example.

Also, floating point instructions are much much slower than integer instructions.

I think your post is very misleading and all of those "Ignorami" you are now addressing supposedly think that the only thing 64 bit is good for is addressing memory.

Sure, if all you do is play games on Windows you don't need it and it doesn't do you much good. I do most of my research on an all 64-bit Gentoo system and it makes my life a lot easier.
April 24, 2007 8:41:33 AM

Quote:
First off, I agree that most gamers currently don't have much use for 64-bit instructions aside from being able to address more than 3-4GB.

I strongly disagree with you overall though. Many scientific and server related applications need to do computations with very large numbers very fast. Take databases or cryptography for example.

Also, floating point instructions are much much slower than integer instructions.

I think your post is very misleading and all of those "Ignorami" you are now addressing supposedly think that the only thing 64 bit is good for is addressing memory.

Sure, if all you do is play games on Windows you don't need it and it doesn't do you much good. I do most of my research on an all 64-bit Gentoo system and it makes my life a lot easier.


My guess is, that was the objective of the post. I'm also guessing that Intel realizes the further disadvantage they're going to have in 64bit computing, and doing everything they can to preemptively downplay that advantage. And WOW! 12 5 star votes! You don't see that happen every day.
April 24, 2007 8:49:18 AM

Piesquared, thanks for that post.

It isn't every day I read a thread that long and don't get bored after the first scroll! I had the "4GB in BIOS, 3GB in a 32-bit OS" problem quite a while ago, and managed to research it and come up with what you said, but I really wish I'd had this article to read, as it puts it in plain English.

Cheers Pi^2...
April 24, 2007 8:52:16 AM

Quote:
Piesquared, thanks for that post.

It isn't every day I read a thread that long and don't get bored after the first scroll! I had the "4GB in BIOS, 3GB in a 32-bit OS" problem quite a while ago, and managed to research it and come up with what you said, but I really wish I'd had this article to read, as it puts it in plain English.

Cheers Pi^2...


lol. Ok, hold the phone here fellas. First, I never wrote the original post. Secondly, I completely disagree with it. :lol:  Although it also sounds like you have a pretty decent handle on how it works judging from your wording, and I'm not really sure which part you agree with?