But the compiler can't vectorize every piece of high-level code you write. The coder has to give the compiler some keywords telling it to vectorize a specific piece of code, or change their loops to be more data parallel so the compiler can optimise them, or change their coding style to make it more accessible to compilers, or use a library that already has this implemented. But then someone has to write that library as well. So why aren't library developers and low-level coders using AVX?
1: Because it's a waste of time for very little performance gain in applications that are for the most part not performance sensitive.
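As a rough illustration of the "keywords" and "more data parallel loops" mentioned above, here is a minimal sketch in C. The function name scale_add is hypothetical. It assumes a C99 compiler; the restrict qualifier is standard C99, and the omp simd pragma is only a hint that takes effect when the compiler is run with OpenMP SIMD support enabled (it is ignored otherwise). Whether the loop actually gets vectorized still depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: dst[i] += k * src[i] for n elements.
   'restrict' promises the compiler that dst and src never overlap,
   which removes an aliasing concern that often blocks auto-vectorization. */
static void scale_add(float *restrict dst, const float *restrict src,
                      float k, size_t n)
{
    assert((dst != NULL && src != NULL) || n == 0);

    /* Hint only: honored when OpenMP SIMD is enabled, ignored otherwise. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}
```

Each iteration touches only its own element, so there is no dependency between iterations, which is the property a vectorizer looks for.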
2: Because most developers don't know HOW.
3: Lack of compiler support (again, Visual C++ 6 is probably still the most used Windows compiler, which is SAD, but true).
You aren't going to see developers waste their time developing something like this:
if AVX_Supported then
else if SSE_42_Supported then
else if SSE_4a_Supported then
else if SSE_41_Supported then
else if SSE_3_Supported then
and so on and so forth. It's a waste of time that could otherwise be used to hammer out the other outstanding bugs that exist. Sure, you have a handful of applications (games/encoding) that actually need the performance gain, but for everything else, what's the point?
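For anyone wondering what that chain looks like in real code, here is a minimal sketch in C. It assumes GCC or Clang on x86, where __builtin_cpu_init and __builtin_cpu_supports are available; on any other compiler or architecture it simply falls back to the scalar path. The names sum_fn, sum_scalar, sum_sse42, sum_avx and pick_sum are all hypothetical, and the "optimized" variants are stand-ins that just call the scalar version, because writing and testing each real variant is exactly the extra work being described.

```c
#include <assert.h>
#include <stddef.h>

typedef float (*sum_fn)(const float *, size_t);

/* Portable baseline that works everywhere. */
static float sum_scalar(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Hypothetical stand-ins: a real build would put hand-written
   SSE4.2 and AVX code here, one separately tested variant per level. */
static float sum_sse42(const float *x, size_t n) { return sum_scalar(x, n); }
static float sum_avx(const float *x, size_t n)   { return sum_scalar(x, n); }

/* Pick the best variant the running CPU supports, newest first. */
static sum_fn pick_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return sum_avx;
    if (__builtin_cpu_supports("sse4.2"))
        return sum_sse42;
#endif
    return sum_scalar; /* assumption: unknown platforms use the baseline */
}
```

Every extra branch is another code path to write, test and maintain, which is the cost being weighed against the speedup.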
This is the exact same reason why, if you have a 'for' loop that looks like this:
for i in 1 to Some_Really_Huge_Number
a[i] = Some_Number
you don't see developers thread it, even though this example would thread well: because it's a waste of time for very little performance benefit. (At least in this case, you can use OpenMP to automatically parallelize this construct, which is an improvement... if you're using OpenMP.)
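A minimal sketch of the OpenMP route in C, assuming the loop above is meant to store into one array element per iteration. The function name fill is hypothetical. The pragma only has an effect when the compiler is run with OpenMP enabled; without it the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: set every element of a[0..n-1] to value.
   Iterations are independent, so OpenMP can split them across threads. */
static void fill(int *a, size_t n, int value)
{
    assert(a != NULL || n == 0);

    /* A signed loop index keeps this valid for older OpenMP versions too. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i)
        a[i] = value;
}
```

For a loop body this cheap, thread startup and memory bandwidth may eat most of the gain, which fits the point about very little performance benefit.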
Seriously, developers are often working unpaid overtime as deadlines approach, trying desperately to quash every remaining major bug prior to some corporate-mandated release date, then working more overtime to get the first patch out when customers yell and cause a PR nightmare, all while being blamed by upper management for the state of the product, despite having warned that months more development time were needed. (I have stories, let's leave it at that.) Then I get people who complain that I don't use the latest and greatest CPU opcodes to squeeze an extra 0.01% of performance out of an application that is not performance sensitive!