I was in need to create a minimum bounding sphere for a frustum (truncated pyramid). The easiest way is to find the "center" of this pyramid. I got it by calculating the middle point of "the center of the near plane" and "the center of the far plane". The radius will be the length between this middle point and one of the vertices of the far plane. This works however, this is not an optimal bounding sphere for frustum.

I sat down and tried to figure out this problem. It turns out there is a simple way of doing this. Since the frustum is created with perspective projection in mind, this frustum is symmetrical. Furthermore, if we temporarily forget about the aspect ratio, the problem can be reduced into 2D problem: "Given an isosceles trapezoid, find the circumscribed circle".

I got the image from http://mathcentral.uregina.ca/QQ/database/QQ.09.09/h/abby1.html. The first thing to realize is that the center of the enclosing circle is the intersection of the bisecting lines of each side.

If the length of the parallel sides are a (short) and b (long) and the height is h, and we want to find the length of CD, we can construct two equations:
  1. AB^2 + BC^2 = AC^2
  2. CD^2 + DE^2 = CE^2
Since AC and CE and essentially the radius of the circle, they should be the same. So we are left with:
(a/2)^2 + (h - CD)^2 = (b/2)^2 + (CD)^2

Solve the equation and you will get

CD = ( h + (a - b)*(a + b)/(4*h) ) / 2

Voillaaaa :)

I will left the reader with the exercise of finding the minimum bounding sphere of a frustum. It should be straightforward.

I've just started playing StarCraft 2 yesterday with my friend. Multi player is definitely more fun than single player mode. I am still learning the tactics and strategies to win a game. Currently, I'm using Terran and starting to get familiar with the game. I'm quite excited now!
When you create a point light in Maya 2010, there isn't any option to fine-tune the point light decay rate. Maya provided several options for decay rate: None, Linear, Quadratic, Cubic. I believe they have some reason to do this but this is actually quite annoying for developing games.

In game development, we often want to control the decay rate's radius of the point light. I spent some time looking around the Internet for the solution. And, I've finally found it! There's a script in creativecrash.com called rampLight (http://www.creativecrash.com/maya/downloads/scripts-plugins/rendering/misc/c/ramplight) that basically does this. It creates a point light with a sphere manipulator attached. You can then just scale the light, it will scale the sphere radius and display correctly! Really useful script!

In the last post, I argued how OOP focuses on interface an abstraction. OOP can be very useful for flexibility and maintenance. This time round, let's focus on other kind of paradigm: Data Oriented Programming (DOP) to squeeze out more performance out of your code.

Data Layout
As I hinted in the last post, with deep pipelines and cache system of current processor design, it is more beneficial if we layout the data contiguously in memory. So, instead of having array of pointer to object, we will layout the data as array of objects. The problem with that is we have different types of objects and each object can have varying size. Thank God, C/C++ allow us to solve this problem with union!

Here's the code that defines the data layout:
class Shape
    enum ShapeType
    struct Circle
        float radius;
    struct Rectangle
        float width;
        float height;

    ShapeType m_type;
        Circle circle;
        Rectangle rect;
    } m_data;

    ShapeType GetType() const { return m_type; }

    void CreateCircle(float radius)
        m_type = CIRCLE;
        m_data.circle.radius = radius;

    void CreateRectangle(float width, float height)
        m_type = RECTANGLE;
        m_data.rect.width = width;
        m_data.rect.height = height;

    void Update()
        switch (GetType())
        case CIRCLE: ...
        case RECTANGLE: ...
If you pay attention, there are several things that we need to take care when transforming inheritance hierarchy into union:
  1. Instance Type
    Since we are basically flattening the hierarchy, we now have to embed the type of the instance in the class. This is simply done by creating type enums.
  2. Instance Data as Structs
    For each instance type, we have to declare the instance data as structs. This allow us to easily put the data together in union.
  3. Union for Flattening Instance Data
    This is C/C++ language feature that's useful for interpreting a block of memory as different types.
  4. Factory Methods
    We need to create at least one different factory methods for each instance type. You can think this as flattening the constructors.
  5. Switch Statement for Polymorphism MethodThe effect of flattening virtual functions is having a switch statement. We now have to manually choose the appropriate actions depending of the type of the instance.
Class Size
Another implication we have to pay attention is the size of the class. It is actually: sizeof(m_type) + sizeof(union). This is actually the downside of using union to flatten out inheritance hierarchy: i.e. more space required. The more varied each type's size, the more space we are wasting. For example, if we have a type that only have 1 float and another type that has 10 floats, we can potentially waste 9 floats for each instance.

From one side, it seems these conflicting paradigm is again the battle between space vs time. The most important thing is actually understanding the hardware. By making sure the objects are contiguous in memory, we can take advantage of modern CPU cache system. Some people may argue that the programming paradigm described above is not elegant; just look how we have to have "type" and "switch" statements and adding a new type looks painful (because we have to change several stuffs). However, if you really want to squeeze out performance, you might consider the above technique. The choice depends on the problem you are tackling. And again, the choice is in YOUR hand!
This morning I received my copy of  GPU Pro book. I am currently reading and trying to understand NPR Effects Using the Geometry Shader by Pedro Hermosilla and Pere-Pau Vazquez. In Sillhouette Detection and Geometry Generation section, there's a piece of code that describes how to check if a triangle is front facing. This is actually simple stuff but I just want to extract some tips from it.

The code is like this:
// Calculate the triangle normal and view direction.
float3 normalTrian = getNormal( input[0].Pos.xyz, input[2].Pos.xyz, input[4].Pos.xyz );

float3 viewDirect = normalize( -input[0].Pos.xyz - input[2].Pos.xyz - input[4].Pos.xyz );

// If the triangle is frontfacing
if (dot(normalTrian, viewDirect) > 0.0f)
Initially, I was wondering how did they get viewDirect? I finally realized, that it's just the view vector. In order to get the view vector from the triangle to the camera position, we need to subtract the camera position with triangle position. Since the calculation is in view space, we can derive:
// In view space, so the position is in the origin 
float3 cameraPos = float3(0,0,0);

// We can assume the triangle position to be the average of the three vertices position
float3 trianglePos = (input[0].Pos.xyz + input[2].Pos.xyz + input[4].Pos.xyz) / 3;

// The view direct would be
float4 viewDirect = cameraPos - trianglePos;
viewDirect = normalize(viewDirect);

// Now work simplify above calculation and you will get
float3 viewDirect = normalize( -input[0].Pos.xyz - input[2].Pos.xyz - input[4].Pos.xyz );
Another thing to note is that: to determine if a triangle is front facing, we just need to know the sign. So actually we don't really need to normalize the viewDirect vector. For instance:
float3 viewDirect = -input[0].Pos.xyz - input[2].Pos.xyz - input[4].Pos.xyz;
I will share more tips as I'm reading this article.
Recently, I've been thinking about two (not necessarily) opposing programming paradigms: elegant programming vs simple programming. Let's start with a motivating problem.

We want to store a collection of shapes. Let's assume these shapes can be of type circle or rectangle. Design a class/structure to contain these shapes so that each shape can be updated. The update function is pressumably different for each shape.

I don't know about you, but with my brain so ingrained with Object Oriented Programming; my initial reaction was "This is so easy! The problem description fits very nicely to OOP concept!". As most people know, OOP can be a very elegant solution to this problem.

Elegant Solution
Let see the typical OOP solution to this problem. The shape can be written as:
class Shape
    virtual void Update() = 0;

class Circle : public Shape
    float m_radius;
    void Update() { ... }

class Rectangle : public Shape
    float m_width;
    float m_height;
    void Update() { ... }
The shape can be extended elegantly. If we have a new shape, we can just create a new class derived from class Shape. It is also easy to understand. Now, let's take a look the class that defines the collection of shapes.
class ShapeMgr
    Shape* m_shapes[SHAPES_COUNT];
    int m_shapeCount;
    void Update()
        // This is very elegant, we don't need to care about 
        // the type of the shape.
        for (int i=0; i < m_shapeCount; ++i)
Again the manager class can be written very elegantly. It is abstracted and any addition of new shape type won't change the manager's Update function.

A Closer look to OOP
Depending on what kind of application you are working on, the solution above may very well be perfect. However, there are several points I want to highlight:
  1. The manager class HOLDS a collection of POINTERS to Shape.
    In other words, the POINTERS are contiguous in memory but the ACTUAL INSTANCES are not. We are actually chasing the pointer to get the actual instance to Update it. There is a high chance we will get a lot of cache misses. With the current design of processor, this can increase latency.
  2. Virtual functions require V-Table
    Whenever we call Update() in the base class pointer, it actually tries to find the real function Update() in v-table. V-table actually increases the class size.
  3. (Arguably) harder to debug
    Imagine you have lots of shapes. The code with abstraction is actually harder to debug. You can't just easily inspect what's going on with the code.
OOP arguably focuses more on the interface and abstraction such that it sometimes sacrifices the data layout. Next, we will take a look at a more simplistic approach that focuses more on data. By focusing more on data, we can get more performance.
Once upon a time, a friend of mine gave some of us a challenge. It's not exactly the same but quite similar ;) Anyway, think of ways you can generate alternating 0 and 1. If you have an Update() loop, the first time you call update it will generate 0 and the next time will be 1, and then 0,1,0,1,.. you get the idea. I found 3 ways to do this (assuming initial value of i is 0).

Math + Bit
i = (i+1) & 1;      // You might think of this i = (i+1) % 2.
Simple Math
i = 1 - i;
Bits Operations
i = i ^ 1;
Can you come up with more ways?
We are always taught that when declaring constants in C++, it's preferred to use static const rather than macro. However, today I found that this rule does not really apply to me.

If you are a PS3 developer, you might want to DMA (read: copy) data in and out of SPU. When DMA-ing a class instance, the static variables are not copied because they actually reside outside the class data area. When I was debugging, I found out that all my static variables contain weird values. Subsequently, I realized: "Oh! It's the static variables >.<".

The solution is quite simple. Change all those static constants to macros!
// Don't use static const for this case.
//int MyClass::MY_CONSTANT = 100;
#define MY_CONSTANT 100
Depending on which platform and target hardware, it can be a good idea to eliminate branches in shader. Here's two techniques with samples.

Lerp, a.k.a. linear interpolation, is a useful function to select between two things. If you have two vectors v1, v2 and you want to select one of them based on some condition, lerp can be used. Make sure that the result of the condition (conditionMask)  is always 0 or 1.  You can then do this:
result = lerp(v1, v2, conditionMask);
If your condition is 0, it will return v1. If your condition is 1, it will return v2.

Min and max is very useful in some cases. For example, let say you want to have one shader to switch between lit and not-lit. Typically, we will multiply the lighting value with color. For instance:
light = CalcLighting();
color *= light;
So, the condition would be, if there's no lighting return 1; otherwise return the lighting value. We can easily do this with Lerp.
light = lerp(1, CalcLighting(), isLit);
color *= light;
Try to convince yourself that above pseudo-code works. The question is can we do better? Yes!! Use max():
light = max(CalcLighting(), isNotLit);
color *= light;
There's some condition though. CalcLighting() should return a value between 0..1 (as it should be the case disregard hdr, intensity, etc2). Check by yourself what happens when isNotLit=0 and isNotLit=1. The advantage of this method is that it requires 1 less instruction/cycle.
Today, I am playing around with noise. The basic idea is quite simple: generate a noice with varying frequency and sum them up. Hopefully, it will create a good-looking noise texture.

So, I created several 256x256 arrays and fill them with random values [0..255]. Furthermore, the values are smoothed based on bilinear interpolation by taking samples with varying step length. The idea is actually taken from http://www.iquilezles.org/www/articles/dynclouds/dynclouds.htm.

These are the textures with varying frequency (high frequency to low frequency):

and here's the result of their summation:

Interesting! :)
There are many ways to read/parse XML with Python.  I found at least 2 methods: DOM and SAX. Document Object Model (DOM) is a cross-language API from W3C for accessing or modifying XML; whereas SAX stands for Simple API for XML.

Most of the time, we don't need to understand the whole XML vocabularies; and most of the time we want to parse simple stuff like:

I think the simplest way to go is to use python minidom implementation that looks like this:
from xml.dom import minidom

# parse the xml
theXml = minidom.parse('data.xml')

# iterate through the root
rootList = theXml.getElementsByTagName('root')
for root in rootList:
    # you can get the element name by: root.localName

    # iterate through person
    personList = root.getElementsByTagName('person')
    for person in personList:

        # get the attribute
        nameAttribute = person.attributes["name"]
        print nameAttribute.name
        print nameAttribute.value  
I used MEL for our existing Maya pipeline and have decided to slowly move to Python. Python is more powerful than MEL and I think it's more organized. It also has a lot of useful standard libraries. I don't see any reason to use MEL anymore.

Anyway, there are several things that I did to setup Python for Maya:
  1. Decide where to put all my python scripts
    I don't want to put it in Maya's default python scripts directory in order to consolidate all the engine stuff in one place. Subsequently, I added the path to PYTHONPATH environment variable. This ensures that Maya can import my modules.
  2. Python Initialization
    There are some stuff that you need to do when Maya first loaded. In order to do that, I created UserScript.py in my python scripts folder. Maya will automatically execute UserScript.py in PYTHONPATH.
  3. Add Shelf Button to Execute Python Script
    In order to "source" Python script conveniently, you can check out the following link http://www.rtrowbridge.com/blog/2008/11/maya-python-import-scripts/. You can then just do this to source a python script (in PYTHONPATH) python("pySource('')");
Another useful link to understand Maya Path: http://david.buttress.me.uk/blog/2009/05/maya-paths/

In order to debug Maya Environment Variable, you can use a script found here: http://david.buttress.me.uk/blog/2009/05/envbrowser-environment-variable-browser-mel/

Finally, I can now click a shelf button and it will execute a python script! It took me sometime to figure out the steps above. But now, I can start working with python :)
Missing the flexibility of C/C++ way in interpreting bits? In C/C++, you can reinterpret the bits by simply casting it. For example if you have an int and you want to reinterpret it as float, you can do the following:
int   i = 1234;
float f = *( (float*) &i );
The question is how can we do this in C#? It turns out there's a class in C# to do this. Meet BitConverter class.
int   i = 1234;
float f = BitConverter.ToSingle( BitConverter.GetBytes(i), 0 );
With this class, you can reinterpret bits from primitive data types. Which should be enough. The only problem is that the class is not suitable to be called every frames. Why? Using Reflector.NET, you can easily tell that BitConverter.GetBytes() is actually allocating memory!
public static unsafe byte[] GetBytes(int value)
    byte[] buffer = new byte[4];
    fixed (byte* numRef = buffer)
        *((int*) numRef) = value;
    return buffer;
One solution is just to keep in mind not to call this every frame. If you are willing to compile your C# project with /unsafe option, you can actually do this:
public static unsafe float Int32BitsToSingle(int value)
    return *(((float*)&value));
The solution above (and including the C/C++ solution) is actually not a cross-platform solution. If you are writing XNA game for Xbox 360, you might already know that Xbox360 uses big endian and Intel PC uses little endian. You might actually screw things up!

I think the best solution is to write a class that mimic GetBytes() in BitConverter, but make sure we don't allocate memory. Here's my solution to the problem:
using System;

namespace CastTest
    public static class Cast
        private static byte[] tempBuffer = new byte[8];

        public static unsafe void GetBytesTo(int value, byte[] buffer)
            fixed (byte* numRef = buffer)
                *((int*)numRef) = value;

        public static float ReinterpretToSingle(int value)
            GetBytesTo(value, tempBuffer);
            return BitConverter.ToSingle(tempBuffer, 0);
I am very excited to find a new baby in Visual Studio 2008! It is called T4: Text Template Transformation Toolkit. Before I start talking about T4, let me share the background story.

Basically, I was trying to find a way to enable C/C++ preprocessor to C#. It is possible to use "cl /E" command line to pre-process C/C++ file. I thought C# is close enough to C/C++, so surely I can somehow utilize C/C++ preprocessor.

There are several problems to this:
  • Integration to Visual Studio
    I wanted the process work seamlessly with Visual Studio. User should just add preprocessor directives to their C# files without worrying about anything else or doing additional steps. I thought that I could somehow use custom task/hack/hijack the msbuild system  to make this work. However, I have limited knowledge of msbuild system and I was unable to find an easy way to work around this.
  • C# Directives Incompatibility
    C# has #region/#endregion directives that's incompatible with C/C++ preprocessor.
I was about to give up when I found T4!

Text Template Transformation Toolkit, a.k.a T4, is a code generation utility that's built right in Visual Studio! I believe it's available starting from Visual Studio 2008. I saw somewhere on the net that it can be made to work with VS2005. Note that it's not available within the Express Editions of Visual Studio.

I am still learning about T4. However, it enables me to use C# to generate some other code and it integrates with Visual Studio IDE seamlessly. Basically, the way it works is that you just add a new file with "tt" extension (it will prompt you about some warning). I believe if you are in C# by default the "tt" file will generate "cs" file. You can specify within the tt file to generate other extension if you like want.

So excited~~!! This can be potentially used for generating shader files/shader permutation.

Adding T4 file to Visual Studio:

TT file:

The generated CS file:

Believe it or not, I was an actor before :P. Although, I should admit, I wasn't a real actor. When I was in college, I needed to create special effects video and it happened that I became the main actor. Anyway, enjoy the video:

We used Maya + Real Flow for create the sandman. Very nostalgic!
Like everyone else, I'm working on Screen Space Ambient Occlusion. Initially, I implemented Crytek's version of SSAO which is listed in ShaderX7. However, there's a lightening near the edges and it just bugs me. So, I went on experimenting Starcraft 2 SSAO technique by considering normal + depth.

I think the result looks better although it is very noisy. When the scene is moving, the AO is shimmering which is not good. In addition, since I render the AO in a half resolution buffer, I got some problem near the edge when combining AO with the real scene. Anyway, check out my current SSAO implementation:

Definitely, there are a lot of things to fix...