Part 2: 2D, Acceleration, And Windows: Aren't All Graphics Cards Equal?

The 2D GDI For Windows XP Through Windows 7, In Detail

XP: Clear Sailing without Competition

Up until (and including) Windows XP, GDI played a key role in rendering 2D graphics. The simplicity of the procedures involved makes this obvious. The mouse movements used to draw a line are transmitted to win32k.sys, the central clearinghouse for graphics input. It doesn’t matter whether we’re using the mouse, keystrokes, or other graphical inputs; all such data congregates in this routine and goes directly to the 2D graphics rendering modules from there. Our user actions include only 2D graphics information, which gets translated immediately into GDI drawing instructions. These are forwarded to the GDI, as illustrated by the purple arrows in the following diagram.

Running XP, the old 2D world remains orderly and predictable

These simple procedures used to handle 2D graphics in software also explain why it’s so easy to convert them to hardware acceleration, provided that the graphics card offers the necessary capabilities to render them independently. The blue arrow in the preceding figure shows how information returns to the calling application, so that it may be notified that window contents have changed (for example, when other windows may no longer obscure some of its visible content), thereby forcing a redraw.

Radeon HD 1000-series and GeForce 7000-series cards all included discrete 2D circuitry. This disappeared with the introduction of DX10 cards, with their unified shader architectures.

Windows Vista: CPU instead of GPU, and Buffering instead of Direct Delivery

As we explained in Part 1 of this story, Vista introduces a completely new path for graphics data through the OS. Using the GDI, all versions of Windows up to and including XP handled 2D drawing through outputs from win32k.sys to manage window contents on-screen.

In Vista, the DWM (Desktop Window Manager) takes over this role. As a consequence, Vista uses only Direct3D to manage windows. Every window of every application is written to texture storage on the graphics card as a 3D texture map. This is a practical evolution for more modern graphics cards, but it also means that GDI can no longer read from or write to this data. The communications chain appears to be broken in this situation.

At this point, the double buffering of window content that we explored in Part 1 of this story comes into play.

Excessive memory consumption and lengthy code paths lead to perceptible sluggishness for 2D graphics in Vista

What exactly is going on here? Look at the red arrows in the preceding diagram. In place of a unified graphics driver (in XP this is called the DDI Display Driver), the new CDD (Canonical Display Driver) is addressed. This module is independent of the graphics card in Vista. While the pending window content is stored as a texture map in the graphics card RAM, each window must also be stored in an equivalent buffer in system RAM (its size equals window width times window height times four bytes for 32-bit color data).
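To put that formula in concrete terms, here is a small arithmetic sketch. Only the width times height times four bytes rule comes from the text above; the window sizes and function names are illustrative assumptions, not figures from our testing.

```python
BYTES_PER_PIXEL = 4  # 32-bit color, as stated in the text


def window_buffer_bytes(width: int, height: int) -> int:
    """Size of one uncompressed 32-bit buffer for a window of this size."""
    return width * height * BYTES_PER_PIXEL


def vista_total_bytes(width: int, height: int) -> int:
    """Vista holds the window twice: a system RAM bitmap plus a video RAM texture."""
    return 2 * window_buffer_bytes(width, height)


if __name__ == "__main__":
    # Hypothetical window sizes, chosen only for illustration.
    for w, h in [(800, 600), (1280, 1024), (1920, 1080)]:
        single = window_buffer_bytes(w, h)
        both = vista_total_bytes(w, h)
        print(f"{w}x{h}: {single / 2**20:.2f} MiB per copy, "
              f"{both / 2**20:.2f} MiB for both copies")
```

A single maximized 1920x1080 window works out to 8,294,400 bytes (roughly 7.9 MiB) per copy, which suggests how quickly the cost adds up once many windows are open.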

The most current rendering of each window gets transformed into a bitmap in the system RAM buffer, after which it is converted into a 3D texture map in the video RAM on the graphics card. Throughout, the DWM manages all windows and moves their contents around using Direct3D. The DWM also contains data about which portions of each window are visible on-screen, so that when any region in a window becomes obscured or gets revealed it can be redrawn (shown by the blue arrow in the preceding figure). At this moment, the DWM copies the contents of system RAM into the video RAM, re-rendering the window using Direct3D. The applications no longer need to redraw the window (in contrast to the way things worked under Windows XP).
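The difference between the two redraw paths can be sketched as a toy model. Everything here is a simplification under stated assumptions: the class and function names are hypothetical, a Python `bytes` object stands in for a video RAM texture, and no real Windows API is involved.

```python
class Window:
    """A window whose latest rendering is kept as a bitmap in system RAM."""

    def __init__(self, name: str, width: int, height: int) -> None:
        self.name = name
        self.width = width
        self.height = height
        self.system_bitmap = bytearray(width * height * 4)  # 32-bit pixels
        self.redraw_requests = 0  # times the application was asked to repaint

    def gdi_draw(self, value: int) -> None:
        """Stand-in for software GDI drawing into the system RAM bitmap."""
        self.system_bitmap = bytearray([value]) * (self.width * self.height * 4)


class ToyDwm:
    """Keeps one 'texture' per window, standing in for video RAM (Vista path)."""

    def __init__(self) -> None:
        self.video_textures = {}

    def on_region_revealed(self, window: Window) -> None:
        # Copy the system RAM bitmap into the texture; the application
        # itself is not asked to repaint anything.
        self.video_textures[window.name] = bytes(window.system_bitmap)


def xp_on_region_revealed(window: Window) -> None:
    """XP path: the application is notified and has to redraw its content."""
    window.redraw_requests += 1


if __name__ == "__main__":
    win = Window("editor", 4, 4)
    win.gdi_draw(0x7F)

    dwm = ToyDwm()
    dwm.on_region_revealed(win)
    print("redraw requests after Vista-style reveal:", win.redraw_requests)

    xp_on_region_revealed(win)
    print("redraw requests after XP-style reveal:", win.redraw_requests)
```

The model leaves out everything that makes the real pipeline expensive, but it shows where the extra copy comes from: each reveal under the Vista-style path duplicates the whole bitmap, whereas the XP-style path only sends a notification.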

The aforementioned approach effectively disables 2D hardware acceleration, resulting in a significant performance reduction compared to Windows XP. This manifests itself most clearly in Vista’s well-documented tendency to drag on 2D graphics and to consume large amounts of RAM.

Windows 7: Hardware Acceleration in Minuscule Doses

Even in our initial testing for Part 1 of this story, we could tell that Windows 7 once again offered at least partial support for hardware acceleration of GDI commands—that is, for cards with WDDM 1.1 drivers. Where such drivers are not available (for example, on some Intel graphics chipsets), Windows 7 behaves more or less like Vista. What does this mean for us exactly? Let’s take a look at a diagram of graphics flow in Windows 7:

Tips and tricks to lighten the ballast

At first glance, things look pretty much the same as they did under Vista. We can see, however, that it’s no longer necessary to double-buffer every window’s contents. Instead of system RAM, the term aperture memory now comes into play. This refers to a specific region within the normal system memory that the graphics card can access directly. If a window area changes because of movement or overlays, those window contents may be copied directly from this memory range to the video RAM on the graphics card.

By comparison with Windows XP, only a subset of the GDI commands are supported in the GPU—namely, ClearType, ColorFill, BitBlt, AlphaBlend, TransparentBlt, and StretchBlt. Here’s the skinny for those not already in the know: this means direct text output, surface area fills with simple colors, and copying of image contents and transparent overlays. Whereas rendering of complex geometrical figures isn’t supported at all, copies of image contents and area fills can easily be transferred from aperture memory directly to video RAM.
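As a rough way to keep that split in mind, the list above can be written down as a lookup. This is a sketch, not a description of how the driver actually dispatches work: the function name and the "GPU"/"CPU" labels are our own assumptions, and the operations outside the accelerated set (`Polygon`, `Ellipse`, `PolyBezier`) are simply examples of the geometric drawing the text says is not supported.

```python
# Operations this article lists as GPU-accelerated in Windows 7 (WDDM 1.1 drivers).
ACCELERATED_IN_WIN7 = frozenset({
    "ClearType",
    "ColorFill",
    "BitBlt",
    "AlphaBlend",
    "TransparentBlt",
    "StretchBlt",
})


def render_path(operation: str, wddm_1_1: bool = True) -> str:
    """Return 'GPU' if the article lists the operation as accelerated, else 'CPU'.

    Without a WDDM 1.1 driver, Windows 7 behaves more or less like Vista,
    so everything falls back to the CPU in this model.
    """
    if wddm_1_1 and operation in ACCELERATED_IN_WIN7:
        return "GPU"
    return "CPU"


if __name__ == "__main__":
    for op in ["BitBlt", "StretchBlt", "ColorFill", "Polygon", "Ellipse", "PolyBezier"]:
        print(f"{op:12s} -> {render_path(op)}")
```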

Summary

Windows 7 reduces memory usage by eliminating most of the double buffering of window contents. Even Vista benefits from some of the same effects, thanks to advances resulting from the newer WDDM driver model. That’s why hardware acceleration is once again possible, thanks to the new platform update (which occurred in tandem with the introduction of DirectX 11) for Windows 7. Those specifics are what we hope to chase down in the rest of this story.

  • mdm08
    I have a 5850 with 10.1 drivers and it seems Photoshop CS4 doesn't recognize it as a graphics card that can improve performance, so all those cool new features like animated zoom, kinetic panning, and such seem to be disabled. Also, when you have a very complex group of objects and you try to nudge it (move it one pixel with arrow keys), the computer actually shows the spinning wheel and has to process this instead of being instantaneous like it was on my older 7600GT. Is this an issue related to what this article is saying about apps written for GDI, or is this a different issue I'm experiencing?
  • jrharbort
    Scores on 9600M GT and T9600 Core 2 Duo with Windows XP and latest graphics drivers. Only 11 active background processes, not including the benchmark, and themes disabled.

    BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

    Text: 8556 chars/sec
    Line: 47513 lines/sec
    Polygon: 7757 polygons/sec
    Rectangle: 6564 rects/sec
    Arc/Ellipse: 3874 ellipses/sec
    Blitting: 13974 operations/sec
    Stretching: 266 operations/sec
    Splines/Bézier: 10510 splines/sec
    Score: 984
  • It would be great if you can run the test on some "pro" cards (quadroFX, quadroNVS, firePro & fireMV). Just to see if the "pro" drivers change standard UI rendering or the optimizations are only for the professional DCC software.
  • liquidsnake718
    mdm08 wrote: "I have a 5850 with 10.1 drivers and it seems Photoshop CS4 doesn't recognize it as a graphics card that can improve performance so all those cool new features like animated zoom, kinetic panning, and such seem to be disabled. Also, when you have a very complex group of objects and you try to nudge it (move it one pixel with arrow keys) the computer actually shows the spinning wheel and has to process this instead of being instantaneous like it was on my older 7600GT. Is this an issue related with what this article is saying about apps written for GDI or is this a different issue i'm experiencing?"

    Oh great, more news on a 5xxx series not being able to handle simple apps like CS4.... I have yet to use CS4 on my desktop with my 5850..... I hope ATI comes out with more patches if this is a problem.
  • taltamir
    windows XP is dead... get on the windows 7 64bit bandwagon already you Luddites! (not referring to the authors of the article, they raise good points; I am referring to those customers who insist that XP is some sort of holy grail of windows bliss never seen before or after)
  • Scores on P4 2.8 HT Northwood W ati 2600 pro drivers 10.1 aero Win 7 :
    BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

    Text: 8106 chars/sec
    Line: 6528 lines/sec
    Polygon: 249 polygons/sec
    Rectangle: 1484 rects/sec
    Arc/Ellipse: 6127 ellipses/sec
    Blitting: 379 operations/sec
    Stretching: 80 operations/sec
    Splines/Bézier: 5263 splines/sec
    Score: 362
  • Scores on P4 2.8 HT Northwood W ati 2600 pro drivers 10.1 aero Win 7 :

    BENCHMARK: DIB-BUFFER AND BLIT

    Text: 12633 chars/sec
    Line: 21067 lines/sec
    Polygon: 4087 polygons/sec
    Rectangle: 535 rects/sec
    Arc/Ellipse: 5604 ellipses/sec
    Blitting: 1443 operations/sec
    Stretching: 213 operations/sec
    Splines/Bézier: 12213 splines/sec
    Score: 607
  • giovanni86
    BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

    Text: 54466 chars/sec
    Line: 73135 lines/sec
    Polygon: 23943 polygons/sec
    Rectangle: 3927 rects/sec
    Arc/Ellipse: 26911 ellipses/sec
    Blitting: 9827 operations/sec
    Stretching: 464 operations/sec
    Splines/Bézier: 41911 splines/sec
    Score: 2600
  • helle040
    Radeon 4670, AMD 7750BE, WinXP, drivers 10.1, resolution 1280x1024, 32-bit
    Text: 45746
    Line: 40508
    Splines/Bézier: 20466
    Polygon: 322
    Rectangle: 1954
    Arc/Ellipse: 3494
    Blitting: 2406
    Stretching: 211
    Score: 1150
  • wxj
    I’ve always preferred GDI operations over those of the NOD. GDI has a more basic operation set versus NOD’s more complex and sometimes unreliable operations.