HyperThreading Threads Its Way into Application

Video Editing Threads Run Riot (But In Synch)

Video editing is a computationally intensive desktop application for which HT can offer noticeable performance benefits. In a white paper, Intel describes a hypothetical application in which the CPU reads a stream of uncompressed video, applies special effects in real time, and stores the processed stream on disk. The problem is particularly performance sensitive if the special effects have to be applied to a live video stream, Intel says: the time available to process each frame is finite, and each frame should be processed before the next one arrives.

With threading, a few pieces of information are crucial to the success of the threaded version. If the special effects to be performed on each pixel of the video frame are complex, for example, then the function meets the computationally intensive criterion to which HT applies. Depending on the size of the video frame, the processing of each frame can be divided into multiple parts, each of which can be processed concurrently by a separate thread. This translates into what Intel calls a "data decomposition problem," which applies to the time allotted for each thread-processing task. In any threaded design, the first areas targeted are the ones that consume the most processor time. In the hypothetical video editing example, applying special effects to the video frame is the most time-consuming task, followed by the I/O to read and write a frame. The main thread acts as a master thread and divides the current video frame into four parts in the setup phase, as illustrated in Figure 7(a). Once the data has been set up, the master thread wakes up the three other threads, and all four threads, including the master, each operate on their own section of the video frame. Once the threads are done processing their share of the data, they wait at a barrier for all threads to complete their sections of the frame. The master then suspends the worker threads and writes the processed frame to disk before reading the next available frame from the stream, as shown in Figure 7(b) and (c).
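The master/worker pattern described above can be sketched in a few lines of Python. This is only an illustration of the structure, not Intel's code: the effect (inverting 8-bit pixel values), the row-wise split, and the names apply_effect, row_range and process_stream are all assumptions made for the example, and a list stands in for the disk. Note also that in standard CPython the global interpreter lock generally keeps pure-Python threads from running in parallel, so the sketch shows the synchronization, not the speedup.

```python
import threading

NUM_THREADS = 4  # the master plus three workers, as in the text


def apply_effect(frame, first_row, last_row):
    """Hypothetical effect: invert 8-bit pixels in rows [first_row, last_row)."""
    for r in range(first_row, last_row):
        frame[r] = [255 - p for p in frame[r]]


def row_range(num_rows, part):
    """Rows assigned to slice `part` out of NUM_THREADS roughly equal slices."""
    return (part * num_rows // NUM_THREADS,
            (part + 1) * num_rows // NUM_THREADS)


def process_stream(frames):
    """Return processed copies of `frames` (each a list of rows of pixels)."""
    barrier = threading.Barrier(NUM_THREADS)
    shared = {"frame": None}  # None at a start barrier tells workers to exit

    def worker(part):
        while True:
            barrier.wait()  # sleep until the master has set up a frame
            frame = shared["frame"]
            if frame is None:
                return
            apply_effect(frame, *row_range(len(frame), part))
            barrier.wait()  # wait until every section is finished

    workers = [threading.Thread(target=worker, args=(part,))
               for part in range(1, NUM_THREADS)]
    for t in workers:
        t.start()

    output = []
    for source in frames:
        frame = [list(row) for row in source]  # stands in for reading a frame
        shared["frame"] = frame                # setup phase
        barrier.wait()                         # wake the three workers
        apply_effect(frame, *row_range(len(frame), 0))  # master does slice 0
        barrier.wait()                         # all four sections are done
        output.append(frame)                   # stands in for the disk write

    shared["frame"] = None
    barrier.wait()  # release the workers one last time so they can exit
    for t in workers:
        t.join()
    return output
```

A single reusable barrier serves both as the wake-up signal and as the end-of-frame rendezvous, because all four threads reach it in the same order on every frame. The read and write steps stay on the master thread alone, which is the serial portion of the run.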

Measuring Performance Boosts

The HT special effects application hinges on a balancing act: the main video algorithm must not saturate the processor, so that the unused portion of the pipeline can process another thread. The special effects processing in the video editing application, for example, might account for 80% of the processing time and the I/O for the remaining 20%, with frame reads accounting for one half of the I/O time and processed frame writes the other half. Even assuming perfect scaling for the threaded portion of the run, the speedup of this type of application can never be higher than five, Intel says, because the serial 20% remains no matter how many threads are used. Using four threads, the speedup can never be greater than 2.5: the serial portion still accounts for 20% of the original run time, and the parallel portion, split four ways, accounts for another 20% (80% divided by four), so the run takes 40% of the original time. This is the upper limit on scaling when four threads are used. In practice, because threading the application introduces system overheads, the speedup can be expected to be lower than 2.5.
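The two limits quoted above follow from the relationship usually called Amdahl's law. The short Python sketch below reproduces the arithmetic; the function names are made up for the example, and the 80/20 split is the hypothetical one from the text, not a measurement.

```python
def speedup(parallel_fraction, num_threads):
    """Best-case speedup when only `parallel_fraction` of the run scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_threads)


def speedup_limit(parallel_fraction):
    """Ceiling on speedup as the thread count grows without bound."""
    return 1.0 / (1.0 - parallel_fraction)


# 80% special effects (parallel), 20% frame I/O (serial)
print(f"{speedup(0.8, 4):.2f}")     # prints 2.50
print(f"{speedup_limit(0.8):.2f}")  # prints 5.00
```

Neither figure accounts for threading overhead, so both are ceilings, not predictions.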