The next step in the evolution of GPUs was the GeForce 3, released in 2001. It was the first chip to introduce custom programmable features into the otherwise fixed graphics pipeline, and with it the concept of vertex and pixel shaders was born. Direct3D 8 was the first graphics API to expose customisable parts in an otherwise fixed pipeline. Even though the feature set of pixel and vertex shaders was very limited at that time, it laid the basis of a paradigm that is still common today. Both vertex and pixel inputs can be seen as a single stream in which as many vertices or pixels come out of the processing unit as go in. Vertices and pixels cannot be duplicated or added during processing, but they can be destroyed under certain circumstances and at some cost. Each vertex or pixel can in theory be processed independently of the others, but all of them run through the same program code, similar to the behaviour of a traditional stream processor. Pixel and vertex programmes in Direct3D 8 were written in a kind of pseudo-assembly language with specific vector (4D), scalar and texture load operations. There were no jumps, labels, loops or branches in the instruction set, which made for a very simple, non-interruptible execution model that allowed maximum throughput.
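The stream model described above can be illustrated with a small sketch in Python. It is not shader code and does not reproduce the Direct3D 8 instruction set; the helper names (`mul`, `dp3`, `saturate`, `lit_pixel`, `run_stream`) and the `LIGHT_DIR` constant are assumptions made up for this illustration. It only shows the shape of the model: one straight-line program built from 4D vector operations is applied to every element independently, exactly one result slot exists per input, and an element can be destroyed but never duplicated.

```python
from typing import Callable, List, Optional, Tuple

Vec4 = Tuple[float, float, float, float]
Pixel = Tuple[Vec4, Vec4]  # (base colour, surface normal), hypothetical layout


def mul(a: Vec4, b: Vec4) -> Vec4:
    """Component-wise 4D multiply."""
    return (a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3])


def dp3(a: Vec4, b: Vec4) -> Vec4:
    """3-component dot product, result copied to all four components."""
    d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return (d, d, d, d)


def saturate(a: Vec4) -> Vec4:
    """Clamp every component to the range [0, 1]."""
    return (
        min(max(a[0], 0.0), 1.0),
        min(max(a[1], 0.0), 1.0),
        min(max(a[2], 0.0), 1.0),
        min(max(a[3], 0.0), 1.0),
    )


# Hypothetical constant, playing the role of a constant register.
LIGHT_DIR: Vec4 = (0.0, 0.0, 1.0, 0.0)


def lit_pixel(element: Pixel) -> Optional[Vec4]:
    """Straight-line 'pixel program' without loops or jumps.

    The conditional below only stands in for a kill operation that
    destroys the element; it is an assumption of this sketch.
    """
    base_colour, normal = element
    if base_colour[3] < 0.0:
        return None  # element destroyed, nothing is written for it
    intensity = saturate(dp3(normal, LIGHT_DIR))
    return mul(base_colour, intensity)


def run_stream(
    program: Callable[[Pixel], Optional[Vec4]], stream: List[Pixel]
) -> List[Optional[Vec4]]:
    """Apply the same program to every element, independently of the others.

    The output has one slot per input element; None marks a destroyed one.
    """
    return [program(element) for element in stream]


pixels: List[Pixel] = [
    ((1.0, 0.5, 0.25, 1.0), (0.0, 0.0, 1.0, 0.0)),   # facing the light
    ((1.0, 1.0, 1.0, 1.0), (0.0, 0.0, -1.0, 0.0)),   # facing away
    ((1.0, 1.0, 1.0, -1.0), (0.0, 0.0, 1.0, 0.0)),   # will be destroyed
]
shaded = run_stream(lit_pixel, pixels)
print(shaded)
# [(1.0, 0.5, 0.25, 1.0), (0.0, 0.0, 0.0, 0.0), None]
```

Because no element reads the result of another, the list comprehension in `run_stream` could be evaluated in any order or in parallel, which is the property that lets the hardware reach high throughput.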
The ATI Radeon 8500 added further extensions to these two new programmable units, which were exposed in Direct3D 8.1. With this release the API covered pixel shader versions 1.1, 1.2, 1.3 and 1.4 and vertex shader version 1.1; the later versions changed the assembly language slightly, added new instructions and allowed longer programmes and more texture operations.
Another landmark in the programmability of GPUs was the release of Direct3D 9 and later Direct3D 9.0c. It extended pixel and vertex programs even further, to pixel shader versions 2.0, 2+ and 3.0 and vertex shader versions 2.0, 2+ and 3.0. In addition to the assembly-style language of the previous vertex and pixel programmes, it added a high-level, C-style, RenderMan-like language called HLSL, which allowed vertex and pixel programs to be written in a more abstract, readable and reusable fashion.
To achieve this it brought the concept of code compilation to the world of GPUs, as had happened for CPUs decades earlier, but unfortunately without the maturity that CPU compilers have since reached. During that time OpenGL, with the decline and exit of SGI as its driving force, more or less played catch-up with the significant and radical changes in the industry. It struggled for quite a long time with vendor-specific extensions and solutions until finally, with OpenGL 1.5 and later OpenGL 2.0, a common high-level shading language for vertex and pixel programs called GLSL was introduced.
In 2006 Microsoft again led the way for more radical changes in the world of graphics APIs with the release of Direct3D 10, available only through its new operating system Windows Vista. It tried to combine the zoo of different shader versions and GPU capabilities into one single entity by forcing the vendors to adopt a unified shading core. This means that both vertex and pixel processing can be handled by a single type of programmable unit instead of separate programmable units for the vertex and pixel pipelines, as used to be the case in Direct3D 9. This has the advantage that more of the total processing power can automatically be dedicated to whichever stage carries the highest workload in a graphical application, theoretically removing any processing bottleneck. Of course this unification exists only on paper; it is left to the vendors to implement it adequately. In addition to the unified shading model, Direct3D 10 added another programmable stage to the pipeline, the geometry shader (see Diagram), as well as a new virtual memory model.
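The load-balancing argument for a unified shading core can be made concrete with a deliberately idealised calculation, sketched here in Python. The unit counts and workload figures are hypothetical and not taken from any real GPU, and the model assumes that work divides perfectly across units and ignores scheduling overhead. With separate units the slower of the two stages determines the frame time, whereas a unified pool of the same total size shares all work evenly.

```python
def frame_time_split(
    vertex_work: float, pixel_work: float, vertex_units: int, pixel_units: int
) -> float:
    """Frame time with dedicated units: the busier stage is the bottleneck."""
    return max(vertex_work / vertex_units, pixel_work / pixel_units)


def frame_time_unified(
    vertex_work: float, pixel_work: float, total_units: int
) -> float:
    """Frame time when every unit can take either kind of work."""
    return (vertex_work + pixel_work) / total_units


# Hypothetical chip: 8 vertex units and 24 pixel units, or 32 unified units.
VERTEX_UNITS, PIXEL_UNITS = 8, 24
TOTAL_UNITS = VERTEX_UNITS + PIXEL_UNITS

# Hypothetical workloads as (vertex work, pixel work) in arbitrary units.
workloads = {
    "balanced": (80.0, 240.0),
    "pixel-heavy": (80.0, 960.0),
    "geometry-heavy": (640.0, 240.0),
}

results = {}
for name, (v_work, p_work) in workloads.items():
    split = frame_time_split(v_work, p_work, VERTEX_UNITS, PIXEL_UNITS)
    unified = frame_time_unified(v_work, p_work, TOTAL_UNITS)
    results[name] = (split, unified)
    print(f"{name}: split={split}, unified={unified}")
# balanced: split=10.0, unified=10.0
# pixel-heavy: split=40.0, unified=32.5
# geometry-heavy: split=80.0, unified=27.5
```

In this model the unified design matches the split design only when the workload happens to fit the fixed ratio of units, and it is faster otherwise. How close real hardware comes to this ideal depends on the vendor's implementation.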
Meanwhile the two major hardware vendors on the market, NVIDIA and AMD/ATI, introduced their own APIs (CUDA from NVIDIA, and CAL/CTM and now the Stream SDK from AMD/ATI) to break out of the black-box abstraction layer that graphics APIs impose on GPU programmability and to give the programmer finer-grained access to the underlying hardware functionality. Aimed at the growing GPGPU (general-purpose GPU) / HPC market, these APIs allow the programmer to take more advantage of the compute power that modern GPUs provide, at the price of reduced portability and increased code complexity.