Performance Conscious WebGL

Disclaimer: I am not a JavaScript developer. I just happen to find myself using it more than expected.

Recently, I began to dabble in WebGL. This is not unexpected as I have a history of both hobbyist and professional work in graphics programming, as you may tell from the name of this website. And though my journey began with OpenGL, much of my most recent work has been in Direct3D.

As there is no WebD3D, and I am not entirely sure I would want to venture down that path if it existed, I found myself more or less back at my roots. However, in my recent experiments I have come across a number of performance-critical surprises, which are documented in this article.

And as my collection of experiments and demos grows, so assuredly will this list of interesting, and at times perplexing, performance gotchas. Hopefully some of these will help out others who find themselves in the wild and typeless world of JavaScript-based graphics programming.

Float32Array.set is inefficient

A Float32Array is often used as the source buffer for calls to bufferData, though other ArrayBufferView implementations can be used (and they likely suffer the same performance penalty as described below).

In my instanced rendering pipeline, I store material property data for a large number of objects in a single Float32Array. The exact amount of data varies, based on the underlying type (vec3 vs mat4, etc.) and the upper limit of objects in a single instance. The important part to note is that subsets of these arrays are regularly updated.

Initially, these updates were performed with the built-in Float32Array.set method:

set(index, value)
{
    this.float32Array.set(value, index);
}

However, according to these performance results, this is 40% slower than setting the values manually:

set(index, value)
{
    let length = value.length;

    for(let i = 0; i < length; ++i)
    {
        this.float32Array[index + i] = value[i];
    }
}

Alternatively, using a while loop instead of a for loop is marginally faster still:

set(index, value)
{
    let i = value.length;

    while(i--)
    {
        this.float32Array[index + i] = value[i];
    }
}
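To check this on your own target browser, the two approaches can be put side by side. The following is only a rough sketch: the PropertyBuffer class, its method names, and the timeIt helper are hypothetical and not part of my pipeline, and the timings will vary with the engine and with the size of the copy, so treat the output as a hint rather than a verdict.

```javascript
// Hypothetical wrapper; the class and method names are illustrative only.
class PropertyBuffer
{
    constructor(length)
    {
        this.float32Array = new Float32Array(length);
    }

    // Built-in copy: writes `value` into the buffer starting at `index`.
    setNative(index, value)
    {
        this.float32Array.set(value, index);
    }

    // Manual copy, walking from the last element down to the first.
    setManual(index, value)
    {
        let i = value.length;

        while(i--)
        {
            this.float32Array[index + i] = value[i];
        }
    }
}

// Crude timing helper (an assumption, not a real benchmark harness).
function timeIt(label, fn, iterations)
{
    const start = performance.now();

    for(let i = 0; i < iterations; ++i)
    {
        fn(i);
    }

    const elapsed = performance.now() - start;
    console.log(`${label}: ${elapsed.toFixed(2)}ms`);
    return elapsed;
}

// Room for 1024 mat4-sized entries, each overwritten many times.
const buffer = new PropertyBuffer(16 * 1024);
const mat4 = new Float32Array(16).fill(1.0);

timeIt('Float32Array.set', (i) => buffer.setNative((i % 1024) * 16, mat4), 1e6);
timeIt('manual loop', (i) => buffer.setManual((i % 1024) * 16, mat4), 1e6);
```

Both methods leave the buffer in the same state, so swapping one for the other is a pure performance decision.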

Never use strings as keys to a Map

Using a Map with a string key is really nice as it is a simple way to store relational data that is easy to understand. Especially so when initially setting up a framework and you just want to get things working ASAP.

As an example, as part of my instanced rendering pipeline I organize objects based on their material and mesh. Objects that use the same material and mesh combination are rendered in the same instance. A simple way to describe this relationship is a string such as "<material_id>:<mesh_id>". Easy, right? Of course, using a string as a key is not optimal, but it can’t be that slow, can it?

Well, it is.

When rendering 50,000 objects, my addRenderObject method, which constructed the above string and inserted the object into a Map, took 17ms according to the Chrome profiler. So much for doing the entire frame in 16.67ms and hitting 60 FPS.

After putting in the little bit of effort required to generate integer IDs for my materials and meshes, and combining those into a Cantor pair to use as the key into the Map instead, the time spent in addRenderObject over the same 50,000 objects dropped to 1ms, a reduction of roughly 94%.

And if you don’t believe me, you can check the performance results: ~35 million ops/sec for string-based IDs vs ~110 million ops/sec for Cantor pair integer-based IDs.
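For anyone unfamiliar with it, the Cantor pairing function maps two non-negative integers a and b to the single integer (a + b)(a + b + 1) / 2 + b, and no two distinct pairs share a result. Below is a minimal sketch of using it as a Map key. The instances map, the object shape, and the body of addRenderObject are assumptions made for illustration and are not my actual implementation.

```javascript
// Cantor pairing: maps two non-negative integers to one unique
// non-negative integer. (a + b) * (a + b + 1) is always even, so
// the division by 2 is exact.
function cantorPair(a, b)
{
    return ((a + b) * (a + b + 1)) / 2 + b;
}

// Hypothetical registry of render objects, keyed by the paired IDs.
const instances = new Map();

function addRenderObject(object)
{
    const key = cantorPair(object.materialId, object.meshId);

    let bucket = instances.get(key);

    if(bucket === undefined)
    {
        bucket = [];
        instances.set(key, bucket);
    }

    bucket.push(object);
}

// Two objects share a material/mesh combination, the third swaps the IDs.
addRenderObject({ materialId: 2, meshId: 3 });
addRenderObject({ materialId: 2, meshId: 3 });
addRenderObject({ materialId: 3, meshId: 2 });

console.log(cantorPair(2, 3)); // 18
console.log(cantorPair(3, 2)); // 17
console.log(instances.size);   // 2
```

The pairing is order-sensitive, so (material, mesh) and (mesh, material) land in different buckets. It only stays exact while the result is below Number.MAX_SAFE_INTEGER, which is far more headroom than a typical count of materials and meshes needs.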

Super is super slow

This was the most perplexing performance penalty during my initial tune-up effort.

While profiling the code, I noticed that a good chunk of time was spent in the update method for my flashing quads. This led to a lot of optimizations in how material properties were structured and handled. But even after those efforts I was still seeing 15ms or more spent in update over 150,000 objects. The method looked similar to this:

class SceneObject
{
    // ...

    update(delta)
    {
        this.timeElapsed += delta;
    }
}

class FlashingQuad extends SceneObject
{
    update(delta)
    {
        super.update(delta);

        this.transform.translate(0.0, delta, 0.0);
    }
}

That is barely anything, even for 150,000 objects. Incrementing one variable through the super call, and then another three through translate. So what is taking so long?

After a little bit of tinkering I began to suspect the super call itself. Changing it to the equivalent, if a bit more verbose, SceneObject.prototype.update.call(this, delta) cut 8ms. Then, when I decided that calling into the parent wasn’t even necessary and instead updated timeElapsed inside FlashingQuad itself, there was an additional improvement of 5ms, for a total of 13ms.

Yes, that is right. Simply invoking super over those 150,000 objects hit me for a 13ms penalty, each frame.
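For reference, here are the three variants side by side. This is a simplified sketch rather than my real classes: the QuadSuper, QuadPrototype, and QuadInline names are made up for the comparison, and a plain y field stands in for the transform.translate call.

```javascript
class SceneObject
{
    constructor()
    {
        this.timeElapsed = 0.0;
        this.y = 0.0; // stand-in for the transform (an assumption)
    }

    update(delta)
    {
        this.timeElapsed += delta;
    }
}

// Variant 1: the original, calling the parent through super.
class QuadSuper extends SceneObject
{
    update(delta)
    {
        super.update(delta);
        this.y += delta;
    }
}

// Variant 2: calling the parent method explicitly through its prototype.
class QuadPrototype extends SceneObject
{
    update(delta)
    {
        SceneObject.prototype.update.call(this, delta);
        this.y += delta;
    }
}

// Variant 3: no parent call at all; the parent's work is done inline.
class QuadInline extends SceneObject
{
    update(delta)
    {
        this.timeElapsed += delta;
        this.y += delta;
    }
}

// All three end up in the same state after the same updates.
const quads = [new QuadSuper(), new QuadPrototype(), new QuadInline()];

for(const quad of quads)
{
    for(let frame = 0; frame < 4; ++frame)
    {
        quad.update(0.5);
    }

    console.log(quad.constructor.name, quad.timeElapsed, quad.y);
}
```

The inline variant trades away some maintainability, since any change to SceneObject.update now has to be mirrored by hand, so it is probably only worth it for methods that run once per object per frame.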

Looking at the performance results we see the same thing:

  • super: 13.7 million ops/sec
  • prototype: 121.6 million ops/sec
  • self: 125.8 million ops/sec

Though these results speak more favorably of Function.prototype.call than my own experience did, they agree that super is super slow.