# Part 2: 2D, Acceleration, And Windows: Aren't All Graphics Cards Equal?

## 2D Graphics Output Using GDI: Direct Or Buffered?

From a programming standpoint, it doesn’t matter how XP, Vista, or Windows 7 interact with GDI; the coding methods all remain the same. How those implementations differ, particularly when it comes to the graphics card, is what we covered in the preceding section of this story. How drawing actually works on these OSes is what we cover in this section.

### Line-drawing Commands

Whatever else we might say about 2D output from GDI, everything is based on a well-established collection of standardized drawing instructions. Because the details of those instructions fall outside the scope of this story, we have only this much to say about them here: each of the graphics primitives, including lines, curves, polygons, rectangles, and ellipses, has its own well-defined command, with properties like area fill, line width, color, and so forth. What matters here is that these commands, along with any parameters they take, are passed to GDI as they are. Anything that happens after that falls outside the active application’s control anyway.

### Direct and Buffered Drawing: Comparing Ants to Elephants

In principle, it shouldn't make a difference if a million ants carry grains of sand 100 feet from point A to point B, or if you load a big container full of sand onto the back of an elephant and move all that sand in a single transfer. Both approaches achieve the same goal.

Nevertheless, let’s examine the differences between these approaches: the elephant involves a lot less traffic between the two endpoints. Coordinating the efforts of a million ants takes more time and effort than loading and hoisting a single container onto the elephant.

The benefit of the ant method is that no additional container (or buffer) is needed to convey the materials. Also, when only a few grains of sand need to be moved, ants are more efficient and flexible than the elephant. Which side wins always comes down to the type of activity and the amount of data involved. Now, let’s look at how drawing proceeds when GDI paints to a display device (monitor).

Direct drawing, the ant method, is slow because individual graphics objects are rendered one at a time. It is well suited only to certain cases.

Indirect drawing, the elephant method, means assembling all visible elements within some rectangular region inside a buffer, then rendering them all in one go once the buffer is filled.
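The contrast between the two approaches can be sketched in a few lines of Python. This is not GDI code: `CountingDevice`, `hline`, `draw_direct`, and `draw_buffered` are hypothetical names invented for illustration, and the "device" is simply a dictionary that counts how many separate transfers reach it.

```python
class CountingDevice:
    """Stand-in for a display device; counts the transfers it receives."""

    def __init__(self):
        self.transfers = 0
        self.pixels = {}

    def write(self, pixels):
        # One call models one trip to the device, however many pixels it carries.
        self.transfers += 1
        self.pixels.update(pixels)


def hline(x0, x1, y, color):
    """Return a horizontal line as a {(x, y): color} mapping."""
    return {(x, y): color for x in range(x0, x1 + 1)}


def draw_direct(device, primitives):
    # Ant method: every primitive goes to the device on its own.
    for prim in primitives:
        device.write(prim)


def draw_buffered(device, primitives):
    # Elephant method: assemble everything off-screen, then send it once.
    buffer = {}
    for prim in primitives:
        buffer.update(prim)
    device.write(buffer)


primitives = [hline(0, 9, y, 0xFFFFFF) for y in range(100)]

direct = CountingDevice()
draw_direct(direct, primitives)

buffered = CountingDevice()
draw_buffered(buffered, primitives)

print(direct.transfers, buffered.transfers)  # 100 1
```

Both devices end up with identical pixels; only the number of trips differs. The sketch says nothing about how expensive a real transfer is, which depends on the driver and the operating system.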

It’s easy to see that as soon as more complex drawing commands must be rendered, the buffer method works noticeably faster. The disadvantage inherent to this method is that it requires temporary storage (a device-independent bitmap, or DIB) equal in size to the visible display region.
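To put a rough number on that storage cost, here is a small Python sketch. It assumes the usual DIB layout, in which each scanline is padded to a multiple of four bytes; the function name `dib_size_bytes` is hypothetical, and the 1280x1024, 32-bit mode is used only as an example.

```python
def dib_size_bytes(width, height, bits_per_pixel=32):
    """Estimate the pixel storage needed for an uncompressed DIB.

    Assumes each scanline is padded to a 4-byte (DWORD) boundary.
    """
    stride = ((width * bits_per_pixel + 31) // 32) * 4  # bytes per scanline
    return stride * height


print(dib_size_bytes(1280, 1024, 32))  # 5242880
```

That works out to exactly 5 MiB for a buffer covering a 1280x1024 region at 32 bits per pixel, not counting the bitmap header.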

[Figure: how two identical objects look when overlaid using XOR]

Fortunately, the resource cost is usually more than offset by the increase in rendering speed that this approach affords. Of course, this also means that even when only small changes are needed, the entire buffer must still be populated and then copied to the graphics card or window manager. Let’s look at one particular case in which direct output is used instead.

### Real-time Output when Positioning and Editing 2D Objects

For example, suppose you want to use the mouse to move a geometric shape, such as a polygon, from position A to position B on the drawing surface. It wouldn’t make sense to redraw that object for each point along the cursor path between those two points, because each such rendering requires filling the buffer and then rendering its contents. With the help of a ROP (raster operation), it’s much more straightforward to proceed using XOR (exclusive OR) rendering techniques.

[Figure: moving an object with direct drawing, using the output from the ROP (XOR) function]

First, you must redraw the object using XOR at its prior location, directly on the display device. This causes the original object to “disappear” from the display surface, as if by magic. Next, you must draw that object in the new position sans XOR to make it appear in its new location. Repeat this process for each individual mouse movement, and it’s possible to render anywhere from 10 to 50 position changes every second. The human eye sees this kind of movement as smooth and flicker-free. Only when the final position is reached will the buffer be completely refilled and then redrawn on-screen.
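The property that makes the “disappearing” step work is that XOR is its own inverse: applying the same XOR twice restores the original pixels. The following Python sketch demonstrates this on a tiny in-memory frame. It is not GDI code, and the names `make_background` and `xor_rect` are hypothetical. As a simplifying assumption, it also draws the shape with XOR at the new position, the common “rubber band” variant, rather than drawing it without XOR as described above.

```python
WIDTH, HEIGHT = 16, 8


def make_background():
    """Build an arbitrary, non-uniform 8-bit background."""
    return [[(x * 7 + y * 13) & 0xFF for x in range(WIDTH)]
            for y in range(HEIGHT)]


def xor_rect(frame, left, top, w, h, pen=0xFF):
    """XOR a filled rectangle into the frame, in place."""
    for y in range(top, top + h):
        for x in range(left, left + w):
            frame[y][x] ^= pen


frame = make_background()
original = [row[:] for row in frame]

xor_rect(frame, 1, 1, 4, 3)       # shape appears at position A
visible_at_a = frame != original

xor_rect(frame, 1, 1, 4, 3)       # same XOR again: shape vanishes at A
xor_rect(frame, 9, 3, 4, 3)       # shape appears at position B
xor_rect(frame, 9, 3, 4, 3)       # final erase before the full redraw

restored = frame == original
print(visible_at_a, restored)  # True True
```

No copy of the background is ever saved, yet it comes back intact after each erase, which is why this technique needs no buffer refill for intermediate positions.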

This method aside, direct drawing to the display device is called “floating drawing.” Please take note of this process, because we will refer back to it in the next section, when we explain the 2D behavior of the ATI Radeon HD 5000-series graphics cards as they stand today.

Another point of discussion comes from the rendering of so-called “floating objects.” This subject includes all of the marking points used to guide how drawings are displayed and oriented when they’re rendered on-screen, along with the graphics primitives involved. As the number of such objects or values gets progressively larger, graphics problems may manifest themselves. These objects are not a constant element of the on-screen drawing, and in the vast majority of programs they aren’t buffered.

### Conclusions

Looking back at the various diagrams in the preceding section, we can see that 2D hardware acceleration is supported in Windows XP and involves no detours for direct graphics output. In Vista, it really doesn’t matter whether we use a buffer or attempt to send each drawing instruction directly to the display device; the whole window gets buffered along the output path anyway. For Windows 7 with WDDM 1.1 drivers, we lose the second buffer, so only changes need be updated on-screen.

Comment from the forums
• Anonymous
It would be great if you can run the test on some "pro" cards (quadroFX, quadroNVS, firePro & fireMV). Just to see if the "pro" drivers change standard UI rendering or the optimizations are only for the professional DCC software.
• wxj
I’ve always preferred GDI operations over those of the NOD. GDI has a more basic operation set versus NOD’s more complex and sometimes unreliable operations.
• mdm08
I have a 5850 with 10.1 drivers and it seems Photoshop CS4 doesn't recognize it as a graphics card that can improve performance, so all those cool new features like animated zoom, kinetic panning, and such seem to be disabled. Also, when you have a very complex group of objects and you try to nudge it (move it one pixel with the arrow keys), the computer actually shows the spinning wheel and has to process this instead of being instantaneous like it was on my older 7600GT. Is this an issue related to what this article is saying about apps written for GDI, or is this a different issue I'm experiencing?
• jrharbort
Scores on 9600M GT and T9600 Core 2 Duo with Windows XP and latest graphics drivers. Only 11 active background processes, not including the benchmark, and themes disabled.

BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

Text: 8556 chars/sec
Line: 47513 lines/sec
Polygon: 7757 polygons/sec
Rectangle: 6564 rects/sec
Arc/Ellipse: 3874 ellipses/sec
Blitting: 13974 operations/sec
Stretching: 266 operations/sec
Splines/Bézier: 10510 splines/sec
Score: 984
• liquidsnake718
mdm08 wrote: I have a 5850 with 10.1 drivers and it seems Photoshop CS4 doesn't recognize it as a graphics card that can improve performance, so all those cool new features like animated zoom, kinetic panning, and such seem to be disabled. Also, when you have a very complex group of objects and you try to nudge it (move it one pixel with the arrow keys), the computer actually shows the spinning wheel and has to process this instead of being instantaneous like it was on my older 7600GT. Is this an issue related to what this article is saying about apps written for GDI, or is this a different issue I'm experiencing?

Oh great, more news on a 5xxx series not being able to handle simple apps like CS4.... I have yet to use CS4 on my desktop with my 5850..... I hope Ati comes out with more patches if this is a problem.
• taltamir
windows XP is dead... get on the windows 7 64bit bandwagon already you Luddites! (not referring to the authors of the article, they raise good points; I am referring to those customers who insist that XP is some sort of holy grail of windows bliss never seen before or after)
• Anonymous
Scores on P4 2.8 HT Northwood W ati 2600 pro drivers 10.1 aero Win 7 :
BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

Text: 8106 chars/sec
Line: 6528 lines/sec
Polygon: 249 polygons/sec
Rectangle: 1484 rects/sec
Arc/Ellipse: 6127 ellipses/sec
Blitting: 379 operations/sec
Stretching: 80 operations/sec
Splines/Bézier: 5263 splines/sec
Score: 362
• Anonymous
Scores on P4 2.8 HT Northwood W ati 2600 pro drivers 10.1 aero Win 7 :

BENCHMARK: DIB-BUFFER AND BLIT

Text: 12633 chars/sec
Line: 21067 lines/sec
Polygon: 4087 polygons/sec
Rectangle: 535 rects/sec
Arc/Ellipse: 5604 ellipses/sec
Blitting: 1443 operations/sec
Stretching: 213 operations/sec
Splines/Bézier: 12213 splines/sec
Score: 607
• giovanni86
BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

Text: 54466 chars/sec
Line: 73135 lines/sec
Polygon: 23943 polygons/sec
Rectangle: 3927 rects/sec
Arc/Ellipse: 26911 ellipses/sec
Blitting: 9827 operations/sec
Stretching: 464 operations/sec
Splines/Bézier: 41911 splines/sec
Score: 2600
• helle040
Radeon 4670, AMD 7750BE, WinXP, drivers 10.1, resolution 1280x1024, 32-bit
Text: 45746
Line: 40508
Splines/Bézier: 20466
Polygon: 322
Rectangle: 1954
Arc/Ellipse: 3494
Blitting: 2406
Stretching: 211
Score: 1150
• snemarch
A couple of ideas...

First, try adding a DDB (device dependent bitmap) test mode as well - even if your DIBs are using the same colordepth as the display mode, perhaps some drivers fail to optimize for this?

Second, what about the power savings mode modern GPUs tend to run in, in 2D mode? I'm not sure what it takes to kick out of the power-savings mode, but perhaps it could be as simple as creating a D3D context and displaying a single frame?
• helle040
I have two more scores, 950 and 1002, and the first was 1150. So how is this possible, more than a 10% difference?
• JonathanDeane
BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

Text: 45496 chars/sec
Line: 44352 lines/sec
Polygon: 12293 polygons/sec
Rectangle: 7013 rects/sec
Arc/Ellipse: 9854 ellipses/sec
Blitting: 7265 operations/sec
Stretching: 642 operations/sec
Splines/Bézier: 36955 splines/sec
Score: 1853

4870 Windows 7 running Cat 9.12
When will they get curves/ellipses fixed?
• proofhitter
BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

Text: 57405 chars/sec
Line: 42421 lines/sec
Polygon: 14198 polygons/sec
Rectangle: 10724 rects/sec
Arc/Ellipse: 18070 ellipses/sec
Blitting: 16297 operations/sec
Stretching: 615 operations/sec
Splines/Bézier: 37779 splines/sec
Score: 2382

BENCHMARK: DIB-BUFFER AND BLIT

Text: 38521 chars/sec
Line: 145631 lines/sec
Polygon: 22599 polygons/sec
Rectangle: 2572 rects/sec
Arc/Ellipse: 30395 ellipses/sec
Blitting: 10595 operations/sec
Stretching: 1082 operations/sec
Splines/Bézier: 47304 splines/sec
Score: 2904

Windows 7 core i7 920@3.8 ati 4890 Catalyst 10.1
• snemarch
ljbade wrote: When will they get curves/ellipses fixed?
Probably not the easiest thing to accelerate on a GPU - the prime candidates would be filled polys, rectangles (subset of polys) and blits (including stretched ones - easy to do with a 3D quad or two tris).

Points, lines, bezier curves, arcs (and circles, a subset of arc), ellipses, unfilled polys/rectangles ... all those aren't easily/efficiently implemented with the core 3D primitive: filled polygons. Perhaps some of it can be efficiently implemented with shaders, but that's uncharted territory for me
• amdfangirl
BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

Text: 25419 chars/sec
Line: 26376 lines/sec
Polygon: 8153 polygons/sec
Rectangle: 1152 rects/sec
Arc/Ellipse: 8672 ellipses/sec
Blitting: 3359 operations/sec
Stretching: 455 operations/sec
Splines/Bézier: 15977 splines/sec
Score: 1011

CPU: C2D e4300
GFX: GMA 3100
MB: G31
OS: Win 7
• Anonymous
BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE (AMD FUSION: Max Performance)
Text: 17806 chars/sec
Line: 20636 lines/sec
Polygon: 17975 polygons/sec
Rectangle: 3312 rects/sec
Arc/Ellipse: 23234 ellipses/sec
Blitting: 6199 operations/sec
Stretching: 321 operations/sec
Splines/Bézier: 20157 splines/sec
Score: 1433

BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE (AMD FUSION: Productivity)
Text: 6140 chars/sec
Line: 5109 lines/sec
Polygon: 5233 polygons/sec
Rectangle: 1017 rects/sec
Arc/Ellipse: 6819 ellipses/sec
Blitting: 1851 operations/sec
Stretching: 104 operations/sec
Splines/Bézier: 5673 splines/sec
Score: 427

BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE
Text: 43898 chars/sec
Line: 73421 lines/sec
Polygon: 17483 polygons/sec
Rectangle: 2989 rects/sec
Arc/Ellipse: 21805 ellipses/sec
Blitting: 6082 operations/sec
Stretching: 363 operations/sec
Splines/Bézier: 35804 splines/sec
Score: 2138

Windows vista 64, Phenom2 x3 720 (4 core enabled), Catalyst 10.1, ati 5850
• rickzor
Wow, voodoo4 actually made better than newer gpu proposals in some tests, despite the fact that it was running under win98.
• gamerk316
*Sigh*, again, why I prefer OpenGL instead; having two separate operation pipelines for 2D/3D space is just madness...