Managing Meshes and Vertex Attributes

Meshes often need more than one normal for the same vertex. Imagine a cube: it has eight distinct vertices, six unique normals (one per face) and only four unique texcoords. Even so, each vertex needs to be rendered with three different normals, depending on which cube face it is considered to be part of.


Wrong: Lighting with one normal per vertex, interpolated from adjacent face normals.
Correct: Each vertex is rendered with three different normals, depending on which face is rendered.
Another fine mess, this time with texcoords: Each vertex has only one texcoord. But the vertices on the zero meridian need two coordinates, 0.0 and 1.0.

In the old days of immediate-mode rendering, this was not an issue: you would simply design your rendering loops to provide one normal per face instead of one per vertex. With modern OpenGL, this is a no-go if you want to see at least some performance.

The way to go with modern OpenGL is to cache as much data as possible in objects residing in GPU memory. Typically, vertex buffer objects (VBOs) keep all kinds of vertex attributes, like coordinates, normals and colors, in GPU memory. Index buffer objects (IBOs) do the same for index data, which refers to the arrays of attribute vectors.
Unfortunately, the OpenGL render calls for indexed data, like glDrawElements(), support only one index common to all attributes. This means that a given index always refers to the same combination of coordinate, normal, texture coordinate, etc. If you need different normals for the same vertex, you have to duplicate the coordinate. For the cube example, this boils down to rendering 24 vertices with 24 normals and 24 texture coordinates, not using indices at all.
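
To see where the 24 comes from, here is a minimal Python sketch that merges a coordinate index and a normal index into the single index glDrawElements() expects. The index arrays describe a cube with one normal per face; the helper name single_index_vertices is made up for this illustration and is not part of tinySG or OpenGL.

```python
# Cube as six quads: 8 coordinates and 6 face normals, indexed separately.
COORD_INDEX = [
    [0, 1, 2, 3], [7, 6, 5, 4], [3, 2, 6, 7],
    [1, 0, 4, 5], [0, 3, 7, 4], [2, 1, 5, 6],
]
NORMAL_INDEX = [[f] * 4 for f in range(6)]  # every corner of face f uses normal f


def single_index_vertices(coord_index, normal_index):
    """Merge two index streams into one.

    Returns (vertices, indices): vertices lists the unique
    (coordinate index, normal index) pairs, indices refers to that list.
    """
    lookup = {}    # (coord, normal) pair -> position in vertices
    vertices = []
    indices = []
    for face_coords, face_normals in zip(coord_index, normal_index):
        for pair in zip(face_coords, face_normals):
            if pair not in lookup:
                lookup[pair] = len(vertices)
                vertices.append(pair)
            indices.append(lookup[pair])
    return vertices, indices


vertices, indices = single_index_vertices(COORD_INDEX, NORMAL_INDEX)
print(len(vertices))  # 24: each of the 8 corners appears once per adjacent face
```

No (coordinate, normal) pair occurs twice, so the merged index is just 0 to 23 in order and saves nothing for the cube.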

Handling multiple indices

Much like OpenInventor, tinySG maintains a separate index for each of the (up to) 16 vertex attribute arrays, so each vertex can have multiple normals, colors or texture coordinates. This strategy saves a lot of main memory and is especially handy for mesh manipulations, because you manipulate one vertex instance instead of several copies (just think of dragging one single vertex instead of the three identical copies needed to render three different normals at that location).

However, when uploading mesh data to GPU memory, tinySG "unrolls" all indices and expands all attributes into an index-less, flat VBO to please OpenGL, as mentioned above. This of course means that the GPU memory footprint is higher than the footprint in main memory. A cube is then again made of 24 vertices, normals and texture coordinates.
Doing so also makes it possible to organise the data in an interleaved layout when submitting it to the driver. This layout renders faster than arrays storing attributes sequentially (all coordinates before all normals before...). As soon as the application changes the attribute or index data, the VBO is rebuilt and updated.

    #Inventor V2.0 ascii
    Coordinate3 {
      point   [ 
      -1.0 -1.0 -1.0, 1.0 -1.0 -1.0,  1.0 -1.0 1.0,  
      -1.0 -1.0 1.0, -1.0 1.0 -1.0,  1.0 1.0 -1.0,  
      1.0 1.0 1.0, -1.0 1.0 1.0
      ]
    }
    Normal {
      vector   [
      0.0 -1.0 0.0, -0.0 1.0 0.0, 0.0 0.0 1.0,
      0.0 0.0 -1.0, -1.0 0.0 0.0, 1.0 0.0 0.0
      ]
    }
    NormalBinding {
      value PER_VERTEX_INDEXED
    }
    IndexedFaceSet {
      coordIndex [  
           0, 1, 2, 3, -1,  7, 6, 5, 4, -1,
           3, 2, 6, 7, -1,  1, 0, 4, 5, -1,
           0, 3, 7, 4, -1,  2, 1, 5, 6, -1 
      ]
      normalIndex [  
           0, 0, 0, 0, -1,  1, 1, 1, 1, -1,
           2, 2, 2, 2, -1,  3, 3, 3, 3, -1,
           4, 4, 4, 4, -1,  5, 5, 5, 5, -1 
      ]
    }
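
The unrolling step described above can be sketched in a few lines of Python, using the data of the cube file. This is only an illustration of the idea, not tinySG's implementation: the function name unroll_interleaved and the x, y, z, nx, ny, nz layout are assumptions, and the sketch keeps the quads as they are instead of triangulating them.

```python
COORDS = [
    (-1.0, -1.0, -1.0), (1.0, -1.0, -1.0), (1.0, -1.0, 1.0), (-1.0, -1.0, 1.0),
    (-1.0, 1.0, -1.0), (1.0, 1.0, -1.0), (1.0, 1.0, 1.0), (-1.0, 1.0, 1.0),
]
NORMALS = [
    (0.0, -1.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
    (0.0, 0.0, -1.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0),
]
COORD_INDEX = [0, 1, 2, 3, -1, 7, 6, 5, 4, -1, 3, 2, 6, 7, -1,
               1, 0, 4, 5, -1, 0, 3, 7, 4, -1, 2, 1, 5, 6, -1]
NORMAL_INDEX = [0, 0, 0, 0, -1, 1, 1, 1, 1, -1, 2, 2, 2, 2, -1,
                3, 3, 3, 3, -1, 4, 4, 4, 4, -1, 5, 5, 5, 5, -1]


def unroll_interleaved(coords, normals, coord_index, normal_index):
    """Expand per-attribute indices into one flat, interleaved float list.

    Layout per vertex: x, y, z, nx, ny, nz. The -1 face separators are
    dropped, so the result can be drawn without any index buffer.
    """
    flat = []
    for ci, ni in zip(coord_index, normal_index):
        if ci < 0:  # end-of-face marker
            continue
        flat.extend(coords[ci])
        flat.extend(normals[ni])
    return flat


vbo = unroll_interleaved(COORDS, NORMALS, COORD_INDEX, NORMAL_INDEX)
print(len(vbo) // 6)  # 24 vertices
print(vbo[:6])        # [-1.0, -1.0, -1.0, 0.0, -1.0, 0.0]
```

The multi-index cube stores 42 floats (8 coordinates and 6 normals) plus its two index arrays, while the unrolled buffer holds 144 floats. That is the price of the larger GPU footprint mentioned above.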

Calculating Per-Vertex Normals - Crease Angle

Meshes exported from CAD applications often span several "surfaces". If vertex normals are calculated by just averaging the face normals of all faces the vertex is part of, the result is poor. The images on the right show the Bismarck battleship dataset with just one averaged normal per vertex (top) and with multiple normals wherever the angle between two adjacent polygons is larger than 30 degrees (bottom).

Thus, most packages let you define a crease angle: the maximum angle between two face normals for which they are still averaged at a shared vertex. If the angle between the face normals is larger, a crease is assumed to lie between the two faces and multiple normals are created for the shared vertices. tinySG uses the following algorithm for calculating per-vertex normals:

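   foreach face F of mesh:
      foreach vertex V of F:
         N = (0,0,0)                       // normal of V as seen from F
         foreach adjacent face Fa of V:    // includes F itself
            if angle(Fa.normal, F.normal) < creaseAngle:
               N += Fa.normal * Fa.area    // area-weighted average
            // else: Fa lies across a crease, ignore its normal

         normalise N
         append N to the normal array, store its index for (F, V)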

The vertex normals are kept in an array, and a normal index for each vertex references its normal. The above loops produce as many normals for a vertex as there are faces the vertex is part of, and usually many of them are identical. So the next loop reworks the normal index and eliminates duplicate normals to reduce the memory footprint of any csgIndexedShape node.
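
For readers who want to experiment, the following is a small runnable Python version of the loops above, including the duplicate elimination. It is a sketch under a few assumptions rather than tinySG's code: faces are planar convex polygons, the crease angle is larger than zero, and duplicates are detected by rounding to six decimals. All function names are made up for this example.

```python
import math

COORDS = [
    (-1.0, -1.0, -1.0), (1.0, -1.0, -1.0), (1.0, -1.0, 1.0), (-1.0, -1.0, 1.0),
    (-1.0, 1.0, -1.0), (1.0, 1.0, -1.0), (1.0, 1.0, 1.0), (-1.0, 1.0, 1.0),
]
FACES = [[0, 1, 2, 3], [7, 6, 5, 4], [3, 2, 6, 7],
         [1, 0, 4, 5], [0, 3, 7, 4], [2, 1, 5, 6]]


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalise(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)


def face_normal_and_area(coords, face):
    """Unit normal and area of a planar convex polygon (fan triangulation)."""
    total = (0.0, 0.0, 0.0)
    p0 = coords[face[0]]
    for i in range(1, len(face) - 1):
        c = cross(sub(coords[face[i]], p0), sub(coords[face[i + 1]], p0))
        total = (total[0] + c[0], total[1] + c[1], total[2] + c[2])
    return normalise(total), 0.5 * math.sqrt(dot(total, total))


def crease_normals(coords, faces, crease_angle_deg):
    """Return (normals, normal_index), one normal index per face corner."""
    info = [face_normal_and_area(coords, f) for f in faces]
    adjacent = {}  # vertex -> list of faces using it
    for fi, face in enumerate(faces):
        for v in face:
            adjacent.setdefault(v, []).append(fi)
    # angle < creaseAngle is the same as cos(angle) > cos(creaseAngle)
    cos_limit = math.cos(math.radians(crease_angle_deg))
    normals, lookup, normal_index = [], {}, []
    for fi, face in enumerate(faces):
        n_f = info[fi][0]
        face_indices = []
        for v in face:
            acc = [0.0, 0.0, 0.0]
            for fa in adjacent[v]:
                n_a, area = info[fa]
                if dot(n_a, n_f) > cos_limit:
                    for k in range(3):
                        acc[k] += n_a[k] * area  # area-weighted sum
            n = normalise(acc)
            key = tuple(round(x, 6) for x in n)  # merge duplicate normals
            if key not in lookup:
                lookup[key] = len(normals)
                normals.append(n)
            face_indices.append(lookup[key])
        normal_index.append(face_indices)
    return normals, normal_index


hard, hard_index = crease_normals(COORDS, FACES, 30.0)
print(len(hard))    # 6: every cube edge is a crease, one normal per face
smooth, smooth_index = crease_normals(COORDS, FACES, 100.0)
print(len(smooth))  # 8: nothing counts as a crease, one normal per corner
```

With a crease angle of 30 degrees the cube keeps its six face normals. Raising the angle above 90 degrees averages the three faces at every corner and yields eight smooth normals instead.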

Performance

Indexing vertex data may deliver great performance, because graphics drivers or even the hardware have a transformation cache that holds several already transformed vertices. However, these caches are small (around 10-20 vertices). Thus, the application needs to be careful about when to send which index in order to take advantage of a vertex being found in the cache.
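
The effect of index order on such a cache can be estimated with a toy model. The Python sketch below assumes a plain FIFO cache with 16 entries; real drivers and GPUs use other sizes and replacement strategies, so the numbers only illustrate the principle. The function names cache_misses and grid_triangles are invented for this example.

```python
from collections import deque


def cache_misses(indices, cache_size=16):
    """Count how many vertices must be transformed, given a FIFO cache."""
    cache = deque(maxlen=cache_size)
    misses = 0
    for i in indices:
        if i not in cache:
            misses += 1
            cache.append(i)  # the oldest entry drops out once the cache is full
    return misses


def grid_triangles(cols, rows):
    """Index list for a cols x rows vertex grid, two triangles per cell,
    emitted row by row."""
    indices = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            a = r * cols + c  # top-left corner of the cell
            b = a + cols      # bottom-left corner of the cell
            indices += [a, a + 1, b, a + 1, b + 1, b]
    return indices


grid = grid_triangles(100, 100)
print(cache_misses(grid, cache_size=16))       # small cache
print(cache_misses(grid, cache_size=100 * 100))  # cache large enough for everything
```

In this model, a row of 100 vertices is far too long for 16 cache entries, so by the time the next row of cells is drawn the shared vertices have been evicted and nearly every vertex is transformed twice. Reordering the triangles into narrow strips would keep shared vertices in the cache.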

With the transformation cache in mind, the following recent benchmark result with tinySG came as a surprise: Unrolling the geometry of the Bismarck dataset increased the render performance by up to 4x for synthetic benchmark datasets, compared to indexed rendering using VBOs/IBOs. The draw calls changed from glMultiDrawElements() to glMultiDrawArrays(). Both AMD/ATI and NVIDIA hardware benefit from the change.


The Stanford Dragon: 0.87M triangles in 4 nodes.
Terrain dataset created with tinyTerrain generator: 400x400 hexagons, 1.3M triangles in 9 nodes.
Performance comparison: IBOs (red) vs. unrolled indices, flat attributes (blue). Values show frames per second.

The table above shows a performance comparison done on a 2.8GHz Core i7-860 with an AMD FirePro W8000 running Windows 7. For large meshes (terrain, dragon), performance gains by unrolling indices are huge. Real datasets (axle, mountaineer) still show a significant improvement, although other factors influence performance here as well, like 500-900 material state changes, traversal of 1500-4000 scenegraph nodes and binding of well over 1000 VBOs each frame.

Keep rendering,
Christian

Acknowledgements:

The model of the Bismarck battleship was created by a Chinese artist. Unfortunately, I'm unable to reproduce the Chinese characters here. The dataset is available as a SolidWorks model at GrabCAD for non-commercial use and has been exported to VRML to be loadable by tinySG.
The earth texture is courtesy of NASA.
The dragon dataset is courtesy of the Stanford Computer Graphics Laboratory, which kindly provides it for research purposes and for publishing rendered images.
The terrain mesh was created with tinySG's tinyTerrain generator.