The 2D GDI For Windows XP Through Windows 7, In Detail

Part 2: 2D, Acceleration, And Windows: Aren't All Graphics Cards Equal?
By Igor Wallossek

XP: Clear Sailing without Competition

Up to and including Windows XP, GDI played the key role in rendering 2D graphics, and the simplicity of the procedures involved makes this easy to see. The mouse movements used to draw a line are transmitted to win32k.sys, the central clearinghouse for graphics input. It doesn’t matter whether we’re using the mouse, keystrokes, or other graphical inputs; all such data congregates in this routine and goes directly to the 2D graphics rendering modules from there. Our user actions include only 2D graphics information, which gets translated immediately into GDI drawing instructions. These are forwarded to the GDI, as illustrated by the purple arrows in the following diagram.

Running XP, the old 2D world remains orderly and predictable

These simple procedures used to handle 2D graphics in software also explain why it’s so easy to convert them to hardware acceleration, provided that the graphics card offers the necessary capabilities to render them independently. The blue arrow in the preceding figure shows how information returns to the calling application, so that it may be notified that window contents have changed (for example, when other windows may no longer obscure some of its visible content), thereby forcing a redraw.
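To make the redraw notification more concrete, here is a minimal sketch in Python (not actual Windows code) of how the region that needs repainting could be derived once an overlapping window moves away. The `Rect` type and the `exposed_area` helper are hypothetical names chosen for illustration; the real bookkeeping inside Windows is considerably more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in screen pixels; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def area(self) -> int:
        return (self.right - self.left) * (self.bottom - self.top)


def exposed_area(window: Rect, cover: Rect) -> Optional[Rect]:
    """Return the part of `window` that `cover` was hiding, or None if they never overlapped."""
    left = max(window.left, cover.left)
    top = max(window.top, cover.top)
    right = min(window.right, cover.right)
    bottom = min(window.bottom, cover.bottom)
    if left >= right or top >= bottom:
        return None
    return Rect(left, top, right, bottom)


# Hypothetical layout: an 800x600 application window partly hidden by another window.
app_window = Rect(0, 0, 800, 600)
covering_window = Rect(600, 400, 1000, 900)
dirty = exposed_area(app_window, covering_window)
print(dirty)
```

In the XP model described above, this rectangle is the part of the window the application would be asked to redraw once the covering window is gone.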

Radeon HD 1000-series and GeForce 7000-series cards all included discrete 2D circuitry. This disappeared with the introduction of DX10 cards, with their unified shader architectures.

Windows Vista: CPU instead of GPU, and Buffering instead of Direct Delivery

As we explained in Part 1 of this story, Vista introduces a completely new path for graphics data through the OS. Using the GDI, all versions of Windows up to and including XP handled 2D drawing through outputs from win32k.sys to manage window contents on-screen.

In Vista, the DWM (Desktop Window Manager) takes over this role. As a consequence, Vista uses only Direct3D to manage windows. Every window of every application is written to texture storage on the graphics card as a 3D texture map. This is a practical evolution for more modern graphics cards, but it also means that GDI can no longer read from or write to this data. The communications chain appears to be broken in this situation.

At this point, the double buffering of window content that we explored in Part 1 of this story comes into play.

Excessive memory consumption and lengthy code paths lead to perceptible sluggishness for 2D graphics in Vista

What exactly is going on here? Look at the red arrows in the preceding diagram. In place of a unified graphics driver (in XP this is called the DDI Display Driver), the new CDD (Canonical Display Driver) is addressed. This module is independent of the graphics card in Vista. While the pending window content is stored as a texture map in the graphics card RAM, each window must also be stored in an equivalent buffer in system RAM (its size equals window width times window height times four bytes for 32-bit color data).
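The buffer arithmetic is easy to check for yourself. The following Python sketch applies the width times height times four bytes rule quoted above; the function names and the example window sizes are our own assumptions, picked only to show the order of magnitude.

```python
BYTES_PER_PIXEL = 4  # 32-bit color, as stated in the text


def window_buffer_bytes(width: int, height: int) -> int:
    """System-RAM buffer size for one window: width x height x 4 bytes."""
    return width * height * BYTES_PER_PIXEL


def total_buffer_mib(windows) -> float:
    """Sum the buffers of several (width, height) windows, in MiB."""
    total = sum(window_buffer_bytes(w, h) for w, h in windows)
    return total / (1024 * 1024)


# Hypothetical desktop: ten open windows of 1280x1024 pixels each.
open_windows = [(1280, 1024)] * 10
print(window_buffer_bytes(1280, 1024))  # bytes for a single window
print(total_buffer_mib(open_windows))   # MiB for all ten
```

A single 1280x1024 window works out to 5 MiB, so ten of them already tie up 50 MiB of system RAM on top of the matching textures in video memory.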

The most current rendering of each window gets transformed into a bitmap in the system RAM buffer, after which it is converted into a 3D texture map in the video RAM on the graphics card. Throughout, the DWM manages all windows and moves their contents around using Direct3D. The DWM also contains data about which portions of each window are visible on-screen, so that when any region in a window becomes obscured or gets revealed it can be redrawn (shown by the blue arrow in the preceding figure). At this moment, the DWM copies the contents of system RAM into the video RAM, re-rendering the window using Direct3D. The applications no longer need to redraw the window (in contrast to the way things worked under Windows XP).
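As a rough illustration of this flow, consider the toy model below. It is a deliberately simplified Python sketch, not a description of the real DWM: `ToyWindow`, `ToyCompositor`, and their methods are hypothetical names, and plain byte arrays stand in for bitmaps and textures. The point is only that a revealed region is refreshed from the copy in system RAM, without the application painting again.

```python
class ToyWindow:
    """An application window with its bitmap copy in system RAM."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        # Four bytes per pixel (32-bit color).
        self.system_buffer = bytearray(width * height * 4)
        self.paint_calls = 0

    def paint(self, fill_byte: int) -> None:
        """Stand-in for the application's drawing: fill the whole bitmap."""
        self.paint_calls += 1
        self.system_buffer[:] = bytes([fill_byte]) * len(self.system_buffer)


class ToyCompositor:
    """Stand-in for the DWM: keeps one 'texture' per window in 'video RAM'."""

    def __init__(self) -> None:
        self.video_ram = {}

    def upload(self, name: str, window: ToyWindow) -> None:
        """Copy the system-RAM bitmap into the texture store."""
        self.video_ram[name] = bytes(window.system_buffer)

    def region_revealed(self, name: str, window: ToyWindow) -> None:
        """Refresh the texture from system RAM; the application is not asked to paint."""
        self.upload(name, window)


editor = ToyWindow(4, 2)
dwm = ToyCompositor()
editor.paint(0xFF)                     # the application draws once
dwm.upload("editor", editor)           # bitmap becomes a texture
dwm.region_revealed("editor", editor)  # another window moved away
print(editor.paint_calls)
```

The paint counter stays at one after the reveal, which mirrors the contrast with XP drawn in the text: the compositor works from its stored copy instead of calling back into the application.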

The aforementioned approach effectively disables 2D hardware acceleration, resulting in a significant performance reduction compared to Windows XP. This manifests itself most clearly in Vista’s well-documented tendency to drag on 2D graphics and to consume large amounts of RAM.

Windows 7: Hardware Acceleration in Minuscule Doses

Even in our initial testing for Part 1 of this story, we could tell that Windows 7 once again offered at least partial support for hardware acceleration of GDI commands—that is, for cards with WDDM 1.1 drivers. Where such drivers are not available (for example, on some Intel graphics chipsets), Windows 7 behaves more or less like Vista. What does this mean for us exactly? Let’s take a look at a diagram of graphics flow in Windows 7:

Tips and tricks to lighten the ballast

At first glance, things look pretty much the same as they did under Vista. We can see, however, that it’s no longer necessary to double-buffer every window’s contents. Instead of system RAM, the term aperture memory now comes into play. This refers to a specific region within the normal system memory that the graphics card can access directly. If a window area changes because of movement or overlays, those window contents may be copied directly from this memory range to the video RAM on the graphics card.

By comparison with Windows XP, only a subset of the GDI commands is supported on the GPU, namely ClearType, ColorFill, BitBlt, AlphaBlend, TransparentBlt, and StretchBlt. Here’s the skinny for those not already in the know: this means direct text output, surface area fills with simple colors, and copying of image contents and transparent overlays. Whereas rendering of complex geometrical figures isn’t supported at all, copies of image contents and area fills can easily be transferred from aperture memory directly to video RAM.
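Put another way, the list above behaves like a simple lookup table. The Python sketch below encodes it as a set and sorts a few operation names into a GPU path or a CPU path. The set contents come straight from the list in the text; the `rendering_path` helper and the choice of `Ellipse` and `PolyBezier` as examples of complex figures are our own illustrative assumptions, and the sketch says nothing about how a driver actually dispatches these calls.

```python
# Operations named in the text as GPU-accelerated under Windows 7 with WDDM 1.1 drivers.
WIN7_ACCELERATED_GDI = frozenset({
    "ClearType",
    "ColorFill",
    "BitBlt",
    "AlphaBlend",
    "TransparentBlt",
    "StretchBlt",
})


def rendering_path(operation: str) -> str:
    """Return 'GPU' for operations on the accelerated list, otherwise 'CPU'."""
    return "GPU" if operation in WIN7_ACCELERATED_GDI else "CPU"


# Ellipse and PolyBezier stand in for the "complex geometrical figures" mentioned above.
for op in ("BitBlt", "StretchBlt", "Ellipse", "PolyBezier"):
    print(op, rendering_path(op))
```

This split is worth keeping in mind when reading benchmark results: blits and fills can take the accelerated route, while curves and ellipses fall back to the CPU.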

Summary

Windows 7 reduces memory usage by eliminating most of the double buffering of window contents. Even Vista benefits from some of the same effects, thanks to advances resulting from the newer WDDM driver model. That’s why hardware acceleration is once again possible, thanks to the new platform update (which occurred in tandem with the introduction of DirectX 11) for Windows 7. Those specifics are what we hope to chase down in the rest of this story.

Top Comments
  • 19
    Anonymous, February 16, 2010 6:45 AM
    It would be great if you can run the test on some "pro" cards (quadroFX, quadroNVS, firePro & fireMV). Just to see if the "pro" drivers change standard UI rendering or the optimizations are only for the professional DCC software.
  • 18
    wxj, February 16, 2010 8:20 AM
    I’ve always preferred GDI operations over those of the NOD. GDI has a more basic operation set versus NOD’s more complex and sometimes unreliable operations.
Other Comments
  • 0
    mdm08, February 16, 2010 6:01 AM
    I have a 5850 with 10.1 drivers and it seems Photoshop CS4 doesn't recognize it as a graphics card that can improve performance, so all those cool new features like animated zoom, kinetic panning, and such seem to be disabled. Also, when you have a very complex group of objects and you try to nudge it (move it one pixel with arrow keys), the computer actually shows the spinning wheel and has to process this instead of being instantaneous like it was on my older 7600GT. Is this an issue related to what this article is saying about apps written for GDI, or is this a different issue I'm experiencing?
  • 0
    jrharbort, February 16, 2010 6:03 AM
    Scores on 9600M GT and T9600 Core 2 Duo with Windows XP and latest graphics drivers. Only 11 active background processes, not including the benchmark, and themes disabled.

    BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

    Text: 8556 chars/sec
    Line: 47513 lines/sec
    Polygon: 7757 polygons/sec
    Rectangle: 6564 rects/sec
    Arc/Ellipse: 3874 ellipses/sec
    Blitting: 13974 operations/sec
    Stretching: 266 operations/sec
    Splines/Bézier: 10510 splines/sec
    Score: 984
  • -1
    liquidsnake718, February 16, 2010 7:02 AM
    mdm08: I have a 5850 with 10.1 drivers and it seems Photoshop CS4 doesn't recognize it as a graphics card that can improve performance, so all those cool new features like animated zoom, kinetic panning, and such seem to be disabled. Also, when you have a very complex group of objects and you try to nudge it (move it one pixel with arrow keys), the computer actually shows the spinning wheel and has to process this instead of being instantaneous like it was on my older 7600GT. Is this an issue related to what this article is saying about apps written for GDI, or is this a different issue I'm experiencing?

    Oh great, more news on a 5xxx series not being able to handle simple apps like CS4.... I have yet to use CS4 on my desktop with my 5850..... I hope Ati comes out with more patches if this is a problem.
  • -7
    taltamir, February 16, 2010 7:25 AM
    windows XP is dead... get on the windows 7 64bit bandwagon already you Luddites! (not referring to the authors of the article, they raise good points; I am referring to those customers who insist that XP is some sort of holy grail of windows bliss never seen before or after)
  • 2
    Anonymous, February 16, 2010 7:48 AM
    Scores on P4 2.8 HT Northwood W ati 2600 pro drivers 10.1 aero Win 7 :
    BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

    Text: 8106 chars/sec
    Line: 6528 lines/sec
    Polygon: 249 polygons/sec
    Rectangle: 1484 rects/sec
    Arc/Ellipse: 6127 ellipses/sec
    Blitting: 379 operations/sec
    Stretching: 80 operations/sec
    Splines/Bézier: 5263 splines/sec
    Score: 362
  • -4
    Anonymous, February 16, 2010 7:55 AM
    Scores on P4 2.8 HT Northwood W ati 2600 pro drivers 10.1 aero Win 7 :

    BENCHMARK: DIB-BUFFER AND BLIT

    Text: 12633 chars/sec
    Line: 21067 lines/sec
    Polygon: 4087 polygons/sec
    Rectangle: 535 rects/sec
    Arc/Ellipse: 5604 ellipses/sec
    Blitting: 1443 operations/sec
    Stretching: 213 operations/sec
    Splines/Bézier: 12213 splines/sec
    Score: 607
  • -4
    giovanni86, February 16, 2010 7:58 AM
    BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

    Text: 54466 chars/sec
    Line: 73135 lines/sec
    Polygon: 23943 polygons/sec
    Rectangle: 3927 rects/sec
    Arc/Ellipse: 26911 ellipses/sec
    Blitting: 9827 operations/sec
    Stretching: 464 operations/sec
    Splines/Bézier: 41911 splines/sec
    Score: 2600
  • 1
    helle040, February 16, 2010 8:10 AM
    Radeon 4670, AMD 7750BE, WinXP, drivers 10.1, resolution 1280x1024, 32-bit
    Text: 45746
    Line: 40508
    Splines/Bézier: 20466
    Polygon: 322
    Rectangle: 1954
    Arc/Ellipse: 3494
    Blitting: 2406
    Stretching: 211
    Score: 1150
  • 1
    snemarch, February 16, 2010 8:46 AM
    A couple of ideas...

    First, try adding a DDB (device dependent bitmap) test mode as well - even if your DIBs are using the same colordepth as the display mode, perhaps some drivers fail to optimize for this?

    Second, what about the power savings mode modern GPUs tend to run in, in 2D mode? I'm not sure what it takes to kick out of the power-savings mode, but perhaps it could be as simple as creating a D3D context and displaying a single frame?
  • 0
    helle040, February 16, 2010 8:55 AM
    I have two more scores, 950 and 1002, and the first was 1150. So how is this possible, more than 10% difference?
  • 0
    JonathanDeane, February 16, 2010 8:58 AM
    BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

    Text: 45496 chars/sec
    Line: 44352 lines/sec
    Polygon: 12293 polygons/sec
    Rectangle: 7013 rects/sec
    Arc/Ellipse: 9854 ellipses/sec
    Blitting: 7265 operations/sec
    Stretching: 642 operations/sec
    Splines/Bézier: 36955 splines/sec
    Score: 1853

    4870 Windows 7 running Cat 9.12
  • 0
    ljbade, February 16, 2010 9:18 AM
    When will they get curves/ellipses fixed?
  • 0
    proofhitter, February 16, 2010 9:22 AM
    BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

    Text: 57405 chars/sec
    Line: 42421 lines/sec
    Polygon: 14198 polygons/sec
    Rectangle: 10724 rects/sec
    Arc/Ellipse: 18070 ellipses/sec
    Blitting: 16297 operations/sec
    Stretching: 615 operations/sec
    Splines/Bézier: 37779 splines/sec
    Score: 2382

    BENCHMARK: DIB-BUFFER AND BLIT

    Text: 38521 chars/sec
    Line: 145631 lines/sec
    Polygon: 22599 polygons/sec
    Rectangle: 2572 rects/sec
    Arc/Ellipse: 30395 ellipses/sec
    Blitting: 10595 operations/sec
    Stretching: 1082 operations/sec
    Splines/Bézier: 47304 splines/sec
    Score: 2904

    Windows 7 core i7 920@3.8 ati 4890 Catalyst 10.1
  • 0
    snemarch, February 16, 2010 9:32 AM
    ljbade: When will they get curves/ellipses fixed?
    Probably not the easiest thing to accelerate on a GPU - the prime candidates would be filled polys, rectangles (subset of polys) and blits (including stretched ones - easy to do with a 3D quad or two tris).

    Points, lines, bezier curves, arcs (and circles, a subset of arc), ellipses, unfilled polys/rectangles ... all those aren't easily/efficiently implemented with the core 3D primitive: filled polygons. Perhaps some of it can be efficiently implemented with shaders, but that's uncharted territory for me :) 
  • 0
    amdfangirl, February 16, 2010 9:36 AM
    BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE

    Text: 25419 chars/sec
    Line: 26376 lines/sec
    Polygon: 8153 polygons/sec
    Rectangle: 1152 rects/sec
    Arc/Ellipse: 8672 ellipses/sec
    Blitting: 3359 operations/sec
    Stretching: 455 operations/sec
    Splines/Bézier: 15977 splines/sec
    Score: 1011

    CPU: C2D e4300
    GFX: GMA 3100
    MB: G31
    OS: Win 7
  • 0
    Anonymous, February 16, 2010 10:17 AM
    BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE (AMD FUSION: Max Performance)
    Text: 17806 chars/sec
    Line: 20636 lines/sec
    Polygon: 17975 polygons/sec
    Rectangle: 3312 rects/sec
    Arc/Ellipse: 23234 ellipses/sec
    Blitting: 6199 operations/sec
    Stretching: 321 operations/sec
    Splines/Bézier: 20157 splines/sec
    Score: 1433

    BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE (AMD FUSION: Productivity)
    Text: 6140 chars/sec
    Line: 5109 lines/sec
    Polygon: 5233 polygons/sec
    Rectangle: 1017 rects/sec
    Arc/Ellipse: 6819 ellipses/sec
    Blitting: 1851 operations/sec
    Stretching: 104 operations/sec
    Splines/Bézier: 5673 splines/sec
    Score: 427

    BENCHMARK: DIRECT DRAWING TO VISIBLE DEVICE
    Text: 43898 chars/sec
    Line: 73421 lines/sec
    Polygon: 17483 polygons/sec
    Rectangle: 2989 rects/sec
    Arc/Ellipse: 21805 ellipses/sec
    Blitting: 6082 operations/sec
    Stretching: 363 operations/sec
    Splines/Bézier: 35804 splines/sec
    Score: 2138

    Windows vista 64, Phenom2 x3 720 (4 core enabled), Catalyst 10.1, ati 5850
  • 4
    rickzor, February 16, 2010 10:23 AM
    Wow, the Voodoo4 actually did better than newer GPUs in some tests, despite the fact that it was running under Win98.
  • 7
    gamerk316, February 16, 2010 10:45 AM
    *Sigh*, again, this is why I prefer OpenGL instead; having two separate operation pipelines for 2D/3D space is just madness...