It's amazing how fast this corner of IT is evolving. Just five years ago only a handful of people knew what "shaders" were, games barely used them, and hardly anyone had heard of scary technologies like HDR and tessellation. Today even the most run-of-the-mill game is saturated with shaders, HDR is used everywhere, and tessellation is already looming on the horizon. Anyway, that's just a short lyrical introduction to an article about DX11.
__________________________________________________________________________________
DirectX 11: Sooner than You Think
(...)
Building on DirectX 10
DirectX 10 only works with Windows Vista, and offers no backward compatibility with Windows XP. Microsoft cites the fundamentally new display driver model in Vista as a requirement for DX10. DirectX 11 will also work with Vista, but is also targeted to work with Windows 7, the working title for Microsoft's next operating system release. (Windows 7 is expected to use a fundamentally similar display driver model to Vista, if not exactly the same.)
Let's take a look at the DirectX 11 pipeline.
There are three new stages in the standard graphics pipeline: the Hull Shader, Tessellator, and Domain Shader. In addition, changes have been made to the pixel shader to enable compute shaders (for general purpose applications). We'll touch on those shortly.
In addition to the new pipeline stages, DirectX is being tweaked to fully support multithreading. So DirectX 11 DLLs will spawn threads as appropriate on multicore and SMT-enabled CPUs.
Another key new feature is a set of new texture compression formats, which enable better image quality and will support high dynamic range. Again, we'll touch on this in more detail a bit later.
A host of lesser features are also being implemented; most don't require new hardware. They include raising the resource limit to 2GB, increasing texture limits to 16K, and adding support for double-precision floating point (this last one is optional, and is aimed at compute shaders).
Now let's drill down on some of the features.
Hardware Tessellation
One of the key goals for DirectX 11 is to enable more robust character authoring, while reducing the time to create complex and realistic characters. The trend has been to build characters with dense triangle meshes, then reduce the complexity depending on the target platform.
This creates a problem: The end result doesn't really jibe with the artist's conception.
Artists and game designers have been pushing for characters with denser triangle meshes, which enable more detailed characters. Animation complexity is also increasing. The net result is fewer pointy heads and moonwalking characters.
More detailed characters with increasingly complex animation eat into memory and storage budgets. This creates bandwidth issues: load times increase, and memory demands on graphics cards go up.
The answer is to use the power of the GPU to generate this additional complexity—hardware tessellation. Industry watchers were a little disappointed that hardware tessellation didn't make it into DX10, but it will be fully implemented in DX11. Note that this is the one feature that absolutely requires DirectX 11 hardware. When Gee was asked if the hardware tessellator currently built into AMD Radeon HD series GPUs would support DX11 tessellation, the answer was "No."
Gee went on to explain that DX11 tessellation is more robust and general than the solution built into current AMD GPUs. The AMD hardware is essentially the same as the tessellation unit in the Xbox 360; DX11 tessellation is a superset of the AMD approach.
The hull shader takes control points for a patch as an input. Note that this is the first appearance of patch-based data used in DirectX. The output of the hull shader essentially tells the tessellator stage how much to tessellate. The tessellator itself is a fixed function unit, taking the outputs from the hull shader and generating the added geometry. The domain shader calculates the vertex positions from the tessellation data, which is passed to the geometry shader.
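A minimal HLSL sketch of the two new programmable stages may make this concrete. Everything here is illustrative: the patch layout, the fixed tessellation factor, and the g_ViewProj constant are my assumptions, not code from the article.

```hlsl
// Assumed view-projection matrix; lives in the global constant buffer.
float4x4 g_ViewProj;

// Control point coming out of the vertex shader (illustrative layout).
struct ControlPoint
{
    float3 pos : POSITION;
};

// Per-patch constants: the factors that tell the fixed-function
// tessellator how finely to subdivide this patch.
struct PatchConstants
{
    float edges[3] : SV_TessFactor;       // one factor per triangle edge
    float inside   : SV_InsideTessFactor; // interior density
};

PatchConstants ConstantsHS(InputPatch<ControlPoint, 3> patch)
{
    PatchConstants pc;
    // A real shader would derive these from camera distance or
    // screen-space edge length; a constant keeps the sketch simple.
    pc.edges[0] = pc.edges[1] = pc.edges[2] = 8.0;
    pc.inside = 8.0;
    return pc;
}

// Hull shader proper: runs once per output control point.
[domain("tri")]
[partitioning("fractional_odd")]
[outputtopology("triangle_cw")]
[outputcontrolpoints(3)]
[patchconstantfunc("ConstantsHS")]
ControlPoint MainHS(InputPatch<ControlPoint, 3> patch,
                    uint i : SV_OutputControlPointID)
{
    return patch[i]; // simple pass-through; a basis change could go here
}

// Domain shader: runs for every vertex the tessellator emits, turning
// a parametric (here barycentric) location into a final vertex.
[domain("tri")]
float4 MainDS(PatchConstants pc,
              float3 bary : SV_DomainLocation,
              const OutputPatch<ControlPoint, 3> patch) : SV_Position
{
    float3 p = bary.x * patch[0].pos
             + bary.y * patch[1].pos
             + bary.z * patch[2].pos;
    // A displacement-map fetch or curved-surface evaluation goes here.
    return mul(float4(p, 1.0), g_ViewProj);
}
```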
It's important to recognize that the key primitive used in the tessellator is no longer a triangle: It's a patch. A patch represents a curve or region, and can be represented by a triangle, but the more common representation is a quad, used in many 3D authoring applications.
What all this means is that fully compliant DirectX 11 hardware can procedurally generate complex geometry from relatively sparse data sets, reducing bandwidth and storage requirements. This also affects animation: changes in the patch's control points carry through to the final output in each frame.
The cool thing about hardware tessellation is that it's scalable: low-end hardware could simply generate less complex models than high-end hardware, while the actual data fed to the GPUs remains the same.
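One plausible way to express that scaling is inside the hull shader's patch-constant function. The helper below (all names, distances, and the falloff are invented) would replace the constant factor in the earlier sketch:

```hlsl
// Hypothetical level-of-detail helper: nearer patches get more
// triangles, and maxFactor acts as a per-platform quality budget,
// so low-end hardware can run the same assets at lower density.
float ComputeTessFactor(float3 patchCenter, float3 cameraPos,
                        float maxFactor)
{
    float dist = distance(patchCenter, cameraPos);
    // Full detail inside 10 units, fading to none at 100 (made up).
    float t = saturate((100.0 - dist) / 90.0);
    return lerp(1.0, maxFactor, t);
}
```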
Compute Shader
Nvidia and AMD have been pushing GP-GPU for several years now. Nvidia's GeForce 8 series GPUs actually had hardware in place to better support general purpose computing, and the latest GeForce 200 series expands on that.
Companies are sitting up and taking notice of the performance gains possible in certain classes of applications when using the highly parallel compute engines that are part of a modern GPU. Apple is working with the Khronos Group on OpenCL, a standards-based method for general purpose GPU computing, modeled on OpenGL. AMD's Stream SDK enables GP-GPU support for Radeon HD series hardware, across multiple operating systems. Nvidia is probably the furthest along, with its CUDA technology; a host of applications using CUDA is starting to emerge.
DirectX 11 weighs in with compute shaders. The compute shader uses the resources of the GPU to perform post-processing chores, such as blur effects. This required adding syntax and constructs to the DirectX HLSL (high level shading language). The graphics pipeline can now generate data structures that are better suited to general purpose applications, which then can be operated on by the compute shader.
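For flavor, here is what a trivial post-processing kernel might look like with the new compute syntax: a naive 3x3 box blur. The thread-group size, register bindings, and names are my own assumptions, not from the article.

```hlsl
Texture2D<float4>   Input  : register(t0); // source image
RWTexture2D<float4> Output : register(u0); // unordered-access output

// One thread per pixel, dispatched in 8x8 groups.
[numthreads(8, 8, 1)]
void BoxBlurCS(uint3 id : SV_DispatchThreadID)
{
    float4 sum = 0;
    // Naive 3x3 box filter; ignores image borders for brevity.
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += Input[id.xy + int2(dx, dy)];
    Output[id.xy] = sum / 9.0;
}
```

The application would bind the source texture and an unordered-access view of the destination, then launch one thread group per 8x8 tile of pixels with a Dispatch call.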
Note that the diagram doesn't imply that the compute shader is somehow part of the pixel shader. Rather, it's a shader that can take output from the graphics pipeline, after that data has passed through the pixel shader.
It's great that Microsoft is implementing compute shaders in DirectX. Once DX11 ships, GPU programmers will have a full array of tools to support general purpose applications on the GPU:
* CUDA on Windows, MacOS, and Linux on Nvidia GPUs and Intel (and presumably AMD) CPUs
* Stream SDK for AMD GPUs and CPUs on Windows and Linux
* OpenCL on MacOS (and possibly other OSes) on both Nvidia and AMD
* DirectX 11 compute shaders on both Nvidia and ATI GPUs and, presumably, Intel and AMD CPUs in Windows.
Windows and Mac OS programmers in particular won't have to choose which hardware they'll run on; the respective GP-GPU APIs will abstract away the hardware. It's likely that OpenCL will also show up on open source platforms (BSD and Linux). At that point, the future for CUDA and Stream SDK may be limited to vertical applications requiring "closer to the metal" performance.
Multithreading and Dynamic Shader Linkage
Multithreading is a hot topic. Today, dual-core CPUs are mainstream, and if Intel's announcement of the Q8300 quad-core CPU is any indication, a future where four cores are mainstream isn't that far off.
Both AMD and Nvidia have built better multithreading support into their respective graphics drivers, but the majority of DirectX is still single threaded. Microsoft will rectify this shortfall in DX11, and those benefits will even accrue to applications running on DirectX 10 hardware.
Multithreading support will include asynchronous resource loading, which can actually happen while rendering threads are executing. Draw and state submission will also be threaded, which will allow rendering work to be spread out across multiple threads.
To facilitate all this, DirectX 11 splits the device into device, immediate context, and deferred context interfaces. The immediate context is the current device for state and drawing, while deferred contexts are per-thread contexts that record rendering work for later execution. Each device interface can spawn thread resources as needed. A deferred context supports a type of per-object display list. Note that the rendering is actually deferred; this is not the same as drawing to a back buffer and flipping. Rather, each deferred context holds a display list (draw calls) ready for rendering when appropriate.
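In the D3D11 API as it eventually shipped, this pattern appears as CreateDeferredContext, FinishCommandList, and ExecuteCommandList. A bare-bones sketch of the pattern, with error handling and the actual draw state omitted:

```cpp
#include <d3d11.h>

// Worker thread: record rendering work on a deferred context. Nothing
// reaches the GPU here; calls are captured into a command list.
ID3D11CommandList* RecordScenePart(ID3D11Device* device)
{
    ID3D11DeviceContext* deferred = nullptr;
    device->CreateDeferredContext(0, &deferred);

    // State setup and draw calls would go here, e.g.:
    // deferred->IASetVertexBuffers(...);
    // deferred->Draw(vertexCount, 0);

    ID3D11CommandList* commands = nullptr;
    deferred->FinishCommandList(FALSE /* don't restore state */, &commands);
    deferred->Release();
    return commands;
}

// Render thread: play back the lists the workers recorded.
void Submit(ID3D11DeviceContext* immediate, ID3D11CommandList* commands)
{
    immediate->ExecuteCommandList(commands, FALSE);
    commands->Release();
}
```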
Dynamic Shader Linkage
Shader linkage is just another step along the way to make DirectX a more flexible and general purpose compute environment. Today, if multiple shaders need to be invoked, a large "uber shader" is created. This contains all the conditional statements needed to invoke whichever individual shader may be needed for a particular situation.
The problem is that this can create huge, unwieldy shaders that are difficult to debug. They also make less efficient use of available hardware resources.
Microsoft's solution is to introduce object-oriented features, interfaces and classes, to HLSL. This lets graphics programmers create shaders that behave like subroutines that are linked in only when needed.
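In HLSL terms the idea looks roughly like the sketch below. The interface, the classes, and the shading math are invented for illustration; the mechanism (interfaces, classes, and bind-time selection) is the DX11 feature being described.

```hlsl
// An interface declares the contract; concrete classes implement it.
interface ILightModel
{
    float3 Shade(float3 normal, float3 albedo);
};

class DiffuseLight : ILightModel
{
    float3 dir;   // filled from a constant buffer
    float3 color;
    float3 Shade(float3 normal, float3 albedo)
    {
        return albedo * color * saturate(dot(normal, -dir));
    }
};

class UnlitModel : ILightModel
{
    float3 Shade(float3 normal, float3 albedo) { return albedo; }
};

// Which class this resolves to is chosen by the application at bind
// time, not by branching inside an uber shader.
ILightModel g_lightModel;

float4 MainPS(float3 normal : NORMAL, float3 albedo : COLOR) : SV_Target
{
    return float4(g_lightModel.Shade(normal, albedo), 1.0);
}
```

On the application side, the concrete class is picked through a class-linkage object; in the shipped API that means ID3D11ClassLinkage and the class-instance array passed to PSSetShader, with no conditional branching left on the GPU.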
Improved Texture Compression and Hardware Support
Today's DirectX texture compression is showing its age. When multiple textures are decompressed and displayed, the results are often blocky-looking, even when the textures themselves are high resolution. On top of that, there's no support for compressing high dynamic range textures.
DirectX 11 introduces two new texture formats, BC6 (sometimes called BC6H) and BC7. BC6 supports HDR textures (16 bits per channel) with 6:1 lossy compression. This allows for high visual quality, but it's not lossless.
BC7 works with LDR (low dynamic range) formats, and can include alpha. It offers 3:1 compression for RGB or 4:1 for RGB + alpha. Visual quality should be very high with this format.
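Those ratios fall straight out of the block layout: both new formats pack a 4x4 texel block into 128 bits, or 8 bits per texel. BC6's source data is 48 bits per texel (three 16-bit channels), giving 48/8 = 6:1; BC7's is 24 bits per texel for RGB (3:1) or 32 bits with alpha (4:1).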
Microsoft will now require DX11 hardware to decompress textures in a way that exactly matches the DX11 spec. Currently, there's some room for "interpretation" in how DX10 and earlier hardware handles texture decompression.
The block types are designed to offer smoother gradients and much less blocky results.
Support for DirectX 10 Hardware
Quite a few DX11 features, with the notable exception of hardware tessellation, will be supported on DX10 hardware. Of course, DX10 hardware will continue to run games and apps in DX10 mode. But unlike DX10, which only runs on DX10-compliant hardware, many DX11-specific features will also run on DX10 hardware.
Multithreading will work, although deferred contexts will have to be implemented at the API (software) level rather than in hardware. The object-oriented HLSL features should also work, though how efficiently is anyone's guess. The new texture compression formats could be implemented at the driver level, though that would be slower than dedicated hardware.
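The article predates the final SDK, but for reference, in the D3D11 API as it shipped this capability surfaced as "feature levels": you hand D3D11CreateDevice a list of levels and get back the best one the hardware supports. A minimal sketch:

```cpp
#include <d3d11.h>

// Create a device, asking for DX11 first and falling back to DX10-class
// hardware. On a DX10 GPU the DX11 API (and its multithreading) still
// works, but hull/domain shaders, i.e. tessellation, are unavailable.
HRESULT CreateBestDevice(ID3D11Device** device,
                         ID3D11DeviceContext** immediate,
                         D3D_FEATURE_LEVEL* achieved)
{
    const D3D_FEATURE_LEVEL requested[] = {
        D3D_FEATURE_LEVEL_11_0,   // full DX11, hardware tessellation
        D3D_FEATURE_LEVEL_10_1,   // DX10.1-class hardware
        D3D_FEATURE_LEVEL_10_0,   // DX10-class hardware
    };
    return D3D11CreateDevice(
        nullptr,                  // default adapter
        D3D_DRIVER_TYPE_HARDWARE,
        nullptr, 0,               // no software rasterizer, no flags
        requested, sizeof(requested) / sizeof(requested[0]),
        D3D11_SDK_VERSION,
        device, achieved, immediate);
}
```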
Taken from: http://www.extremetech.com/article2/0,2 ... 314,00.asp
Russian translation (Pavel Mikhailov): http://www.winline.ru/articles/4151.php