This morning I received my copy of the GPU Pro book. I am currently reading and trying to understand "NPR Effects Using the Geometry Shader" by Pedro Hermosilla and Pere-Pau Vazquez. In the "Silhouette Detection and Geometry Generation" section, there's a piece of code that checks whether a triangle is front facing. This is simple stuff, but I want to extract some tips from it.

The code is like this:
// Calculate the triangle normal and view direction.
float3 normalTrian = getNormal( input[0].Pos.xyz, input[2].Pos.xyz, input[4].Pos.xyz );

float3 viewDirect = normalize( -input[0].Pos.xyz - input[2].Pos.xyz - input[4].Pos.xyz );

// If the triangle is frontfacing
if (dot(normalTrian, viewDirect) > 0.0f)
{
...
}
Initially, I wondered how they got viewDirect. I finally realized that it's just the view vector. To get the view vector from the triangle to the camera, we subtract the triangle position from the camera position. Since the calculation is in view space, we can derive:
// In view space, the camera sits at the origin.
float3 cameraPos = float3(0,0,0);

// We can take the triangle position to be the average of its three vertex positions.
float3 trianglePos = (input[0].Pos.xyz + input[2].Pos.xyz + input[4].Pos.xyz) / 3;

// The view direction would be
float3 viewDirect = cameraPos - trianglePos;
viewDirect = normalize(viewDirect);

// Now simplify the calculation above (normalize() cancels the division by 3) and you will get
float3 viewDirect = normalize( -input[0].Pos.xyz - input[2].Pos.xyz - input[4].Pos.xyz );
Another thing to note: to determine whether a triangle is front facing, we only need the sign of the dot product, and scaling a vector by a positive length does not change that sign. So we don't really need to normalize the viewDirect vector. For instance:
float3 viewDirect = -input[0].Pos.xyz - input[2].Pos.xyz - input[4].Pos.xyz;
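To convince myself, here is a small Python sketch of the same test. It is only an illustration under assumptions: the book's getNormal() is not shown here, so get_normal below assumes a normalized cross product of two edges, and the sample triangle and its winding order are made up.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)

def get_normal(p0, p1, p2):
    # Assumed definition: normalized cross product of two triangle edges.
    return normalize(cross(sub(p1, p0), sub(p2, p0)))

def view_direct(p0, p1, p2):
    # Unnormalized -(p0 + p1 + p2): points from the triangle toward the
    # camera, which sits at the origin in view space.
    return (-(p0[0] + p1[0] + p2[0]),
            -(p0[1] + p1[1] + p2[1]),
            -(p0[2] + p1[2] + p2[2]))

def is_front_facing(p0, p1, p2):
    # Only the sign of the dot product matters, so no normalize() here.
    return dot(get_normal(p0, p1, p2), view_direct(p0, p1, p2)) > 0.0

# Hypothetical triangle in front of the camera (view space, z = -5).
a, b, c = (0.0, 0.0, -5.0), (1.0, 0.0, -5.0), (0.0, 1.0, -5.0)
print(is_front_facing(a, b, c))  # normal points toward the camera
print(is_front_facing(a, c, b))  # reversed winding flips the normal
```

Normalizing view_direct before the dot product gives the same answers, because it only divides the result by a positive number.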
I will share more tips as I'm reading this article.
Recently, I've been thinking about two (not necessarily) opposing programming paradigms: elegant programming vs simple programming. Let's start with a motivating problem.

Problems
We want to store a collection of shapes. Let's assume these shapes can be of type circle or rectangle. Design a class/structure to contain these shapes so that each shape can be updated. The update function is presumably different for each shape.

Discussion
I don't know about you, but with Object Oriented Programming so ingrained in my brain, my initial reaction was "This is so easy! The problem description fits the OOP concept very nicely!". As most people know, OOP can be a very elegant solution to this problem.

Elegant Solution
Let's see the typical OOP solution to this problem. The shape can be written as:
class Shape
{
public:
    virtual void Update() = 0;
};

class Circle : public Shape
{
private:
    float m_radius;
public:
    void Update() { ... }
};

class Rectangle : public Shape
{
private:
    float m_width;
    float m_height;
public:
    void Update() { ... }
};
The shape can be extended elegantly. If we have a new shape, we can just create a new class derived from class Shape. It is also easy to understand. Now, let's take a look at the class that defines the collection of shapes.
class ShapeMgr
{
private:
    Shape* m_shapes[SHAPES_COUNT];
    int m_shapeCount;
public:
    void Update()
    {
        // This is very elegant, we don't need to care about 
        // the type of the shape.
        for (int i=0; i < m_shapeCount; ++i)
            m_shapes[i]->Update();
    }
};
Again, the manager class can be written very elegantly. It is abstracted, and adding a new shape type won't change the manager's Update function.

A Closer Look at OOP
Depending on what kind of application you are working on, the solution above may very well be perfect. However, there are several points I want to highlight:
  1. The manager class HOLDS a collection of POINTERS to Shape.
    In other words, the POINTERS are contiguous in memory but the ACTUAL INSTANCES are not. We have to chase each pointer to reach the instance before we can Update it, so there is a high chance of getting a lot of cache misses. On current processors, where main memory is much slower than the CPU, every miss adds latency.
  2. Virtual functions require a v-table
    Whenever we call Update() through a base class pointer, the call has to look up the real Update() function in the v-table. The hidden v-table pointer also increases the size of every instance.
  3. (Arguably) harder to debug
    Imagine you have lots of shapes. The code with abstraction is harder to debug. You can't easily inspect what's going on in the code.
Next...
OOP arguably focuses more on the interface and abstraction such that it sometimes sacrifices the data layout. Next, we will take a look at a more simplistic approach that focuses more on data. By focusing more on data, we can get more performance.
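As a rough preview of what "focusing on data" could look like, here is a minimal Python sketch. It is only one possible shape of such a design, not the definitive one: ShapeStore, its field names, and the "scale everything" update rule are all hypothetical, and Python's array module stands in for tightly packed C++ arrays (it shows the layout, not the performance).

```python
from array import array

class ShapeStore:
    """Keeps each kind of shape data in its own contiguous array of floats."""

    def __init__(self):
        self.circle_radii = array('f')
        self.rect_widths = array('f')
        self.rect_heights = array('f')

    def add_circle(self, radius):
        self.circle_radii.append(radius)

    def add_rectangle(self, width, height):
        self.rect_widths.append(width)
        self.rect_heights.append(height)

    def update(self, scale):
        # Hypothetical update rule: scale every shape.
        # One plain loop per shape type, no virtual call and no pointer chasing.
        for i in range(len(self.circle_radii)):
            self.circle_radii[i] *= scale
        for i in range(len(self.rect_widths)):
            self.rect_widths[i] *= scale
            self.rect_heights[i] *= scale

store = ShapeStore()
store.add_circle(1.0)
store.add_rectangle(2.0, 3.0)
store.update(2.0)
print(list(store.circle_radii), list(store.rect_widths), list(store.rect_heights))
```

The trade-off is visible right away: adding a new shape type now means touching the store and its update loop, which is exactly what the OOP version avoided.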
Once upon a time, a friend of mine gave some of us a challenge. It's not exactly the same but quite similar ;) Anyway, think of ways you can generate alternating 0 and 1. If you have an Update() loop, the first time you call update it will generate 0 and the next time will be 1, and then 0,1,0,1,.. you get the idea. I found 3 ways to do this (assuming initial value of i is 0).

Math + Bit
i = (i+1) & 1;      // Same result as i = (i+1) % 2 for non-negative i.
Simple Math
i = 1 - i;
Bit Operations
i = i ^ 1;
Can you come up with more ways?
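The three updates above can be checked side by side with a small Python sketch. The helper names (step_and, step_sub, step_xor, sequence) are mine, and I am assuming each Update() emits the current value of i and then applies the update.

```python
def step_and(i):
    return (i + 1) & 1   # Math + Bit

def step_sub(i):
    return 1 - i         # Simple Math

def step_xor(i):
    return i ^ 1         # Bit Operations

def sequence(step, count, start=0):
    """Emit the current value, then apply the update, `count` times."""
    values = []
    i = start
    for _ in range(count):
        values.append(i)
        i = step(i)
    return values

for step in (step_and, step_sub, step_xor):
    print(step.__name__, sequence(step, 6))  # each prints [0, 1, 0, 1, 0, 1]
```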
We are always taught that when declaring constants in C++, it's preferred to use static const rather than a macro. However, today I found that this rule does not really apply to me.

If you are a PS3 developer, you might want to DMA (read: copy) data in and out of SPU. When DMA-ing a class instance, the static variables are not copied because they actually reside outside the class data area. When I was debugging, I found out that all my static variables contain weird values. Subsequently, I realized: "Oh! It's the static variables >.<".

The solution is quite simple. Change all those static constants to macros!
// Don't use static const for this case.
//const int MyClass::MY_CONSTANT = 100;
#define MY_CONSTANT 100
Depending on the platform and target hardware, it can be a good idea to eliminate branches in a shader. Here are two techniques with samples.

Lerp
Lerp, a.k.a. linear interpolation, is a useful function to select between two things. If you have two vectors v1, v2 and you want to select one of them based on some condition, lerp can be used. Make sure that the result of the condition (conditionMask) is always 0 or 1. You can then do this:
result = lerp(v1, v2, conditionMask);
If your condition is 0, it will return v1. If your condition is 1, it will return v2.
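The selection can be checked outside a shader with a few lines of Python. This is just a sketch of the arithmetic: lerp here uses the usual a*(1-t) + b*t form, and select is a hypothetical helper that applies it per component.

```python
def lerp(a, b, t):
    # Linear interpolation: t = 0 gives a, t = 1 gives b.
    return a * (1 - t) + b * t

def select(v1, v2, condition_mask):
    """Pick v1 when condition_mask is 0 and v2 when it is 1, without a branch."""
    return tuple(lerp(a, b, condition_mask) for a, b in zip(v1, v2))

v1 = (1.0, 2.0, 3.0)
v2 = (4.0, 5.0, 6.0)
print(select(v1, v2, 0))  # (1.0, 2.0, 3.0)
print(select(v1, v2, 1))  # (4.0, 5.0, 6.0)
```

Any mask value strictly between 0 and 1 would blend the two vectors, which is why the mask has to be exactly 0 or 1 for this to act as a select.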

Min/Max
Min and max are very useful in some cases. For example, let's say you want one shader to switch between lit and not-lit. Typically, we multiply the color by the lighting value. For instance:
light = CalcLighting();
color *= light;
So, the condition would be, if there's no lighting return 1; otherwise return the lighting value. We can easily do this with Lerp.
light = lerp(1, CalcLighting(), isLit);
color *= light;
Try to convince yourself that the pseudo-code above works. The question is: can we do better? Yes! Use max():
light = max(CalcLighting(), isNotLit);
color *= light;
There is a condition, though: CalcLighting() should return a value in the range 0..1 (as should be the case if we disregard HDR, intensity, etc.). Check for yourself what happens when isNotLit=0 and isNotLit=1. The advantage of this method is that it requires one less instruction/cycle.
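Here is the same check written out as a Python sketch, with a plain number standing in for the result of CalcLighting(). The helper names are mine, and I am assuming isNotLit is simply 1 - isLit.

```python
def lerp(a, b, t):
    return a * (1 - t) + b * t

def light_lerp(lighting, is_lit):
    # Not lit (is_lit = 0) gives 1, lit (is_lit = 1) gives the lighting value.
    return lerp(1.0, lighting, is_lit)

def light_max(lighting, is_not_lit):
    # Same result using max(), valid only while lighting stays in 0..1.
    return max(lighting, is_not_lit)

for lighting in (0.0, 0.25, 0.5, 1.0):
    for is_lit in (0, 1):
        assert light_lerp(lighting, is_lit) == light_max(lighting, 1 - is_lit)

# Outside 0..1 the two versions disagree: the unlit case should return 1.
print(light_lerp(1.5, 0), light_max(1.5, 1))  # 1.0 1.5
```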
Today, I am playing around with noise. The basic idea is quite simple: generate noise at varying frequencies and sum the layers up. Hopefully, this creates a good-looking noise texture.

So, I created several 256x256 arrays and filled them with random values [0..255]. The values are then smoothed with bilinear interpolation, taking samples with a varying step length. The idea is taken from http://www.iquilezles.org/www/articles/dynclouds/dynclouds.htm.
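A minimal Python sketch of the steps above, under a few assumptions of mine: the arrays wrap around at the edges, the step lengths are powers of two that divide the array size, and the layers are averaged instead of summed so the result stays in 0..255. The function names are made up for illustration.

```python
import random

def make_grid(size, rng):
    """A size x size array of random values in 0..255."""
    return [[rng.randint(0, 255) for _ in range(size)] for _ in range(size)]

def smooth(grid, step):
    """Bilinearly interpolate between samples taken every `step` texels."""
    size = len(grid)
    out = [[0.0] * size for _ in range(size)]
    for y in range(size):
        y0 = (y // step) * step          # sample row above
        y1 = (y0 + step) % size          # sample row below (wraps around)
        fy = (y - y0) / step
        for x in range(size):
            x0 = (x // step) * step
            x1 = (x0 + step) % size
            fx = (x - x0) / step
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            out[y][x] = top * (1 - fy) + bottom * fy
    return out

def combine_layers(size, steps, seed=0):
    """One random array per step length, smoothed, then averaged."""
    rng = random.Random(seed)
    layers = [smooth(make_grid(size, rng), step) for step in steps]
    return [[sum(layer[y][x] for layer in layers) / len(layers)
             for x in range(size)] for y in range(size)]

# Small size for a quick run; the post uses 256x256.
noise = combine_layers(16, (1, 2, 4, 8), seed=1)
print(len(noise), len(noise[0]))
```

A step of 1 leaves the array untouched (high frequency), while larger steps give smoother, lower-frequency layers.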

These are the textures with varying frequency (high frequency to low frequency):



and here's the result of their summation:


Interesting! :)
There are many ways to read/parse XML with Python. I found at least two methods: DOM and SAX. The Document Object Model (DOM) is a cross-language API from the W3C for accessing or modifying XML, whereas SAX stands for Simple API for XML.

Most of the time, we don't need to understand the whole XML vocabulary; we just want to parse simple stuff like:

I think the simplest way to go is to use Python's minidom implementation, which looks like this:
from xml.dom import minidom

# parse the xml
theXml = minidom.parse('data.xml')

# iterate through the root
rootList = theXml.getElementsByTagName('root')
for root in rootList:
    # you can get the element name by: root.localName

    # iterate through person
    personList = root.getElementsByTagName('person')
    for person in personList:

        # get the attribute
        nameAttribute = person.attributes["name"]
        print(nameAttribute.name)
        print(nameAttribute.value)
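For a version that runs without a data.xml file on disk, here is a small self-contained sketch using minidom.parseString. The XML content (a root element containing person elements with a name attribute) is a made-up sample that only matches the tag and attribute names used in the code above; the actual file may look different.

```python
from xml.dom import minidom

# Hypothetical sample document, for illustration only.
SAMPLE_XML = """<root>
    <person name="Alice"/>
    <person name="Bob"/>
</root>"""

def person_names(xml_text):
    """Return the name attribute of every person element under root."""
    document = minidom.parseString(xml_text)
    names = []
    for root in document.getElementsByTagName('root'):
        for person in root.getElementsByTagName('person'):
            # getAttribute returns the attribute value as a string.
            names.append(person.getAttribute('name'))
    return names

print(person_names(SAMPLE_XML))  # ['Alice', 'Bob']
```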