GPUs have become general-purpose processors, or are at least becoming more and more general. The existence of GPGPU APIs such as DirectCompute, CUDA, and OpenCL is evidence of this. It's time to start learning Compute Shaders (CS), in this case DirectCompute from D3D11.

Past GPGPU Coders...
Believe it or not, GPGPU existed before Compute Shaders arrived. However, you had to structure everything in terms of graphics: to launch a GPGPU computation you had to render geometry, and you basically used Pixel Shaders to do the computation.

While this style of GPGPU coding still works today, we can do much better! Compute Shaders let us program the GPU much like regular code. The first benefit is that you don't need to care about the graphics pipeline; you just dispatch your Compute Shaders and that's it. In addition, Compute Shaders bypass the graphics pipeline stages (primitive assembly, rasterization, etc.), so they have the potential to run faster than GPGPU done with Pixel Shaders, or at least in theory.

Setting Up Simple Framework
In order to start learning Compute Shaders, we need a framework: a simple one that allows us to focus on writing Compute Shaders and learning their performance characteristics. A good place to start is the BasicCompute11 sample from the DirectX SDK.

I'd start from that sample. However, we need a little bit more. We need to upgrade to VS2012+ so that we can potentially use VSGD (Visual Studio Graphics Debugger) to profile our application. In addition, since we want to learn the performance characteristics of Compute Shaders, we need to be able to time them. There are a few references on how to do this:
  1. Nathan Reed: GPU Profiling 101 - http://www.reedbeta.com/blog/2011/10/12/gpu-profiling-101/
  2. MJP: Profiling in DX11 with Queries - http://mynameismjp.wordpress.com/2011/10/13/profiling-in-dx11-with-queries/
  3. OpenVIDIA: Events: Basic Profiling and Synchronization - http://openvidia.sourceforge.net/index.php/DirectCompute#Events:_Basic_Profiling_.26_Synchronization
I prefer doing it via D3D11 queries, specifically D3D11_QUERY_TIMESTAMP_DISJOINT and D3D11_QUERY_TIMESTAMP. However, don't forget to wait for the data to become available before calculating the elapsed time of the compute shader. Basically, here's how I profile the compute shaders:
void RunComputeShader(...)
{
    // Bracket all timestamp queries with the disjoint query
    pContext->Begin(g_pQueryDisjoint);

    // Do some CS init, i.e. setting shader, resource, constant buffer

    // Timestamp queries are issued with End() only
    pContext->End(g_pQueryBeginCS);
    pContext->Dispatch(x, y, z);
    pContext->End(g_pQueryEndCS);

    // Do some CS uninit

    pContext->End(g_pQueryDisjoint);


    //********************************************************************
    // Collect time stamps
    //********************************************************************

    // Wait for data to become available
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT tsDisjoint;
    while (pContext->GetData(g_pQueryDisjoint, &tsDisjoint, sizeof(tsDisjoint), 0) == S_FALSE) {}
    // Timestamps are unreliable if the counter was disjoint; skip this sample
    if (tsDisjoint.Disjoint)
        return;

    UINT64 beginCSTimeStamp;
    UINT64 endCSTimeStamp;

    while (pContext->GetData(g_pQueryBeginCS, &beginCSTimeStamp, sizeof(UINT64), 0) == S_FALSE) {}
    while (pContext->GetData(g_pQueryEndCS, &endCSTimeStamp, sizeof(UINT64), 0) == S_FALSE) {}

    // Convert to real time
    float computeShaderElapsed = float(endCSTimeStamp - beginCSTimeStamp) / float(tsDisjoint.Frequency) * 1000.0f;
    printf("Compute shader done in %f ms\n", computeShaderElapsed);
}
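The conversion at the end of RunComputeShader can also be pulled out into a small helper that is easy to check without a GPU. The following is only a sketch: TicksToMilliseconds and TryElapsedMilliseconds are hypothetical names of my own (not D3D11 APIs), and the inputs are assumed to be the values read back from the three queries above. It uses double instead of float so that large tick counts lose less precision.

```cpp
#include <cassert>
#include <cstdint>

// Converts a pair of GPU timestamp ticks into elapsed milliseconds.
// 'frequency' is in ticks per second, as reported by
// D3D11_QUERY_DATA_TIMESTAMP_DISJOINT::Frequency.
// Hypothetical helper, not part of D3D11.
double TicksToMilliseconds(std::uint64_t beginTicks,
                           std::uint64_t endTicks,
                           std::uint64_t frequency)
{
    assert(frequency != 0);
    assert(endTicks >= beginTicks);

    // Subtract in integer space first, then convert to floating point.
    const std::uint64_t deltaTicks = endTicks - beginTicks;
    return static_cast<double>(deltaTicks) * 1000.0 / static_cast<double>(frequency);
}

// Mirrors the logic in RunComputeShader: a disjoint interval means the
// timestamps cannot be trusted, so no time is reported.
// Returns true and writes 'outMs' only when the sample is usable.
bool TryElapsedMilliseconds(bool disjoint,
                            std::uint64_t frequency,
                            std::uint64_t beginTicks,
                            std::uint64_t endTicks,
                            double& outMs)
{
    if (disjoint || frequency == 0 || endTicks < beginTicks)
        return false;

    outMs = TicksToMilliseconds(beginTicks, endTicks, frequency);
    return true;
}
```

With this in place, the last part of RunComputeShader would pass tsDisjoint.Disjoint, tsDisjoint.Frequency and the two timestamps to TryElapsedMilliseconds and print the result only when it returns true.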
For completeness, here's how I create and destroy the queries:
    // create
    D3D11_QUERY_DESC queryDisjointDesc;
    queryDisjointDesc.Query     = D3D11_QUERY_TIMESTAMP_DISJOINT;
    queryDisjointDesc.MiscFlags = 0;

    if (FAILED(g_pDevice->CreateQuery(&queryDisjointDesc, &g_pQueryDisjoint)))
    {
        printf("Could not create timestamp disjoint query!");
        exit(-1);
    }


    D3D11_QUERY_DESC queryDesc;
    queryDesc.Query     = D3D11_QUERY_TIMESTAMP;
    queryDesc.MiscFlags = 0;

    if (FAILED(g_pDevice->CreateQuery(&queryDesc, &g_pQueryBeginCS)))
    {
        printf("Could not create begin-CS timestamp query");
        exit(-1);
    }

    if (FAILED(g_pDevice->CreateQuery(&queryDesc, &g_pQueryEndCS)))
    {
        printf("Could not create end-CS timestamp query");
        exit(-1);
    }
    // destroy
    SAFE_RELEASE( g_pQueryDisjoint );
    SAFE_RELEASE( g_pQueryBeginCS );
    SAFE_RELEASE( g_pQueryEndCS );    

That will allow us to start plunging into the world of Compute Shaders!

Just wanted to post my own list of Xcode keyboard shortcuts.

Xcode Editor:

  • Command + Shift + B - Build
  • Command + Ctrl + J - Jump to definition
  • Command + Shift + O - Open file...
  • Command + Ctrl + Up/Down - Switch between header/implementation file
  • Command + Ctrl + Left/Right - Go back/forward on opened files
  • Command + ] - Indent multiple lines
  • Command + [ - Unindent multiple lines
  • Command + / - Comment/uncomment multiple lines

Collection of useful data structures

HashMap / HashSet

This page discusses and links to next-generation rendering topics. "Next-generation" is vague, so to be specific: what I mean by next-generation is the PS4/Xbone generation.

Update: This is becoming my links dumping ground...

Linear Space Lighting

HDR Rendering / Tonemapping / Color Management

Physically Based Rendering

Sparse Voxel Octree

Global Illumination/Area Lights

Order Independent Transparency

Order Independent Transparency (OIT) is a family of rendering techniques that produce correct transparency without requiring the transparent geometry to be rendered in sorted order.
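To see why sorting matters in the first place, here is a tiny CPU-side sketch. Note that this is not an OIT algorithm; it only shows that the conventional "over" alpha blend gives different results depending on the order in which fragments are drawn, which is the problem OIT techniques set out to solve. The Fragment struct and the function names are made up for illustration, and colors are reduced to a single channel to keep it short.

```cpp
#include <cassert>

// One color channel plus opacity, kept scalar for brevity (illustrative type).
struct Fragment
{
    float color;  // non-premultiplied intensity in [0, 1]
    float alpha;  // opacity in [0, 1]
};

// Classic "over" blending of a fragment on top of a destination value:
//   dst' = src.color * src.alpha + dst * (1 - src.alpha)
float BlendOver(float dst, const Fragment& src)
{
    assert(src.alpha >= 0.0f && src.alpha <= 1.0f);
    return src.color * src.alpha + dst * (1.0f - src.alpha);
}

// Blends two fragments over a background in the order given:
// 'first' is drawn first, 'second' is drawn on top of it.
float Composite(float background, const Fragment& first, const Fragment& second)
{
    return BlendOver(BlendOver(background, first), second);
}
```

For example, with a black background, a half-transparent white fragment and a half-transparent black fragment, drawing white then black yields 0.25, while drawing black then white yields 0.5. Same fragments, different image, hence the need either to sort or to use an order-independent scheme.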