Instead of traditional disk-based queries, an approach whose performance suffers from memory latency and from processors waiting for data to be fetched from memory, IBM envisions in-GPU-memory tables as a technology that could, in addition to disk tables, significantly accelerate database processing. According to a patent application filed by the company, "GPU enabled programs are well suited to problems that involve data-parallel computations where the same program is executed on different data with high arithmetic intensity."
Surprisingly, the filing does not describe open high-level software architectures such as OpenCL, and mentions Nvidia's Compute Unified Device Architecture (CUDA) as the only specific example for running GPU-accelerated databases: "CUDA toolkit exposes the underlying target hardware via a C-like API. CUDA is designed to increase performance in many data parallel applications by supporting execution of thousands of threads in parallel. As such, a compiled CUDA program may scale with the advances in hardware. There are at least two ways a GPU-enabled database may be implemented--a) in one embodiment, a full-fledged database system may be enabled, and b) a scratch pad for accelerating other database queries may be enabled, that is, a GPU-memory database for just executing database queries may be enabled."
Not surprisingly, however, IBM is trying to ensure that the patent, if granted, also covers other programming languages in the key areas of:
- starting a GPU kernel
- hash partitioning the database relations by the GPU kernel
- loading the partitioned database relations into the GPU memory
- loading keyed partitions corresponding to the hash partitioned database relations into the GPU memory
- building a hash table for the smaller of the hash partitioned database relations
- executing the query.
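The data flow behind these steps can be illustrated with a minimal sketch of a partitioned hash join. This is not IBM's implementation: it runs on the CPU in plain Python, whereas the filing describes the partitioning and join being carried out by a GPU kernel on relations held in GPU memory. The function names, the tuple-based row layout, and the choice of four partitions are all assumptions made for illustration.

```python
from collections import defaultdict


def hash_partition(rows, key_index, num_partitions):
    """Split rows into buckets by hashing the join key (hypothetical helper)."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key_index]) % num_partitions].append(row)
    return partitions


def join_partition(left, right, left_key, right_key):
    """Join one pair of partitions: build a hash table on the smaller side,
    then probe it with every row of the larger side."""
    build_is_left = len(left) <= len(right)
    if build_is_left:
        build, probe, build_key, probe_key = left, right, left_key, right_key
    else:
        build, probe, build_key, probe_key = right, left, right_key, left_key

    table = defaultdict(list)
    for row in build:
        table[row[build_key]].append(row)

    joined = []
    for row in probe:
        for match in table.get(row[probe_key], ()):
            # Always emit left-relation columns first, whichever side was built.
            joined.append(match + row if build_is_left else row + match)
    return joined


def partitioned_hash_join(left_rows, right_rows, left_key=0, right_key=0,
                          num_partitions=4):
    """Partition both relations with the same hash, then join bucket by bucket.
    Rows with equal keys always land in the same bucket index, so each pair of
    buckets can be joined independently (the part a GPU would run in parallel)."""
    left_parts = hash_partition(left_rows, left_key, num_partitions)
    right_parts = hash_partition(right_rows, right_key, num_partitions)
    result = []
    for left_part, right_part in zip(left_parts, right_parts):
        result.extend(join_partition(left_part, right_part, left_key, right_key))
    return result


# Made-up example relations, joined on the first column.
customers = [(1, "Ada"), (2, "Grace"), (3, "Edsger")]
orders = [(1, "disk"), (3, "gpu"), (3, "ram"), (4, "tape")]
matches = sorted(partitioned_hash_join(customers, orders))
```

Because every bucket pair is independent of the others, this is the kind of data-parallel workload the filing describes: the same join routine applied to different slices of the data.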
According to the patent application, the program code for GPU-accelerated databases "may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the 'C' programming language or similar programming languages." To cover all of its bases, IBM also states that the "program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server."