Don't trust the compiler: part #1

We are starting a series about compiler myths: what people assume the compiler is doing, when the reality is quite different. This first part is about static class initialization.

When writing 3D code, programmers usually create some kind of 3D vector class to abstract part of the 3D math. Such a class will generally be declared like this:

class VECTOR3D {
public:
    VECTOR3D(float fX, float fY, float fZ, float fW);
    alignas(16) float _data[4];  // four floats, aligned on 16 bytes
};

Because 3D math relies heavily on predefined vectors such as the origin (0.0f, 0.0f, 0.0f, 0.0f) or the unit vectors, the programmer will tend to declare these multi-purpose constants like this:

const VECTOR3D someReferenceVector(1.0f, 0.0f, 0.0f, 0.0f);

Or potentially, in the case of arrays:

const VECTOR3D someReferenceArray[] = {
    VECTOR3D(1.0f, 0.0f, 0.0f, 0.0f),
    VECTOR3D(0.0f, 1.0f, 0.0f, 0.0f)
};

Here, the programmer expects those vectors to be pre-initialized in read-only memory, but is that what the compiler will actually generate?

Assuming the VECTOR3D class is made of an array of 4 floats (quite logical) per vector, aligned on 16 bytes, one would assume that the compiler will generate just that: an array of 4 floats, aligned on 16 bytes, and in .rodata (read only memory) because it is const. The reality is way different. Looking at the compiler output, we will discover with horror that the compiler will generate:

  • an array of 4 floats aligned on 16 bytes in the .bss area (i.e., R/W uninitialized memory)
  • a second array of floats holding the initial values
  • a bunch of code, executed before main, that copies the array of floats into the array located in .bss.
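The hidden start-up work can be made observable with a small experiment. The following is only a sketch: the counter g_constructorCalls and the two helper functions are hypothetical names added for illustration, and alignas(16) stands in for whatever alignment syntax your compiler uses. The counter is an artificial side effect that forces the dynamic path; with a trivial inline constructor, an optimizing compiler may or may not fold the initialization away, which is exactly why the output is worth checking.

```cpp
#include <cassert>

// Hypothetical counter, added only to make the start-up code visible.
// A plain int initialized with 0 is set up statically, before any
// constructor of a global object runs.
static int g_constructorCalls = 0;

class VECTOR3D {
public:
    VECTOR3D(float fX, float fY, float fZ, float fW) {
        _data[0] = fX;
        _data[1] = fY;
        _data[2] = fZ;
        _data[3] = fW;
        ++g_constructorCalls;  // side effect: cannot be done at compile time
    }
    alignas(16) float _data[4];
};

// Declared const, yet each element is built by a constructor call at start-up.
const VECTOR3D someReferenceArray[] = {
    VECTOR3D(1.0f, 0.0f, 0.0f, 0.0f),
    VECTOR3D(0.0f, 1.0f, 0.0f, 0.0f)
};

// Hypothetical helper: number of constructor calls made so far.
int constructorCallsSoFar() {
    return g_constructorCalls;
}

// Hypothetical helper: reads one component so the array is actually used.
float referenceComponent(int vector, int component) {
    assert(vector >= 0 && vector < 2 && component >= 0 && component < 4);
    return someReferenceArray[vector]._data[component];
}
```

Calling constructorCallsSoFar() as the first statement of main should report one call per array element, even though main itself never constructs a VECTOR3D. Asking the compiler for assembly output (for example with g++ -S) is a more direct way to see the generated initialization routine.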

Thus, the code will be larger, initialization will take longer, and the const becomes a gimmick (the vector doesn't reside in read-only memory, and is not protected as we expected). It may not seem like much, but when working with very limited memory (like the 256 KB of local store on a Cell SPU), it can become an issue if you have a lot of arrays like that one.

The actual solution to this problem is to create a class that has neither a constructor nor a destructor, and whose members are publicly accessible:

class VECTOR3D {
public:
    alignas(16) float _data[4];  // public member, no constructor, no destructor
};

Now, we can declare an actual constant vector:

const VECTOR3D someReferenceVector = {1.0f, 0.0f, 0.0f, 0.0f};

or for an array:

const VECTOR3D someReferenceArray[] = {
    {1.0f, 0.0f, 0.0f, 0.0f},
    {0.0f, 1.0f, 0.0f, 0.0f}
};

The difference from before is that the compiler will generate the vectors in read-only memory as we first expected, and no initialization code will be generated and called from the application's global constructors. It may not seem like a big win, but at the scale of a SIMD-intensive application (thus, with lots of pre-initialized read-only vectors), this can save you a few hundred KB of memory. Also, on the SPU, where the initialization code might get called each time the code is loaded, this can save you a lot of cycles.
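One way to have the compiler confirm that no start-up code is needed is sketched below. It assumes a C++11 (or later) compiler, which the original snippets do not require: constexpr only compiles if the initializer is a compile-time constant, and the static_assert lines check the properties the text relies on. The helper referenceComponent is a hypothetical name added for illustration, and the size check assumes 4-byte floats.

```cpp
#include <cassert>
#include <type_traits>

// No constructor, no destructor, public data: a plain aggregate.
struct VECTOR3D {
    alignas(16) float _data[4];
};

// Properties the article depends on, checked at compile time.
static_assert(std::is_trivially_default_constructible<VECTOR3D>::value,
              "VECTOR3D must not need a constructor call");
static_assert(std::is_standard_layout<VECTOR3D>::value,
              "VECTOR3D must have a plain C-like layout");
static_assert(alignof(VECTOR3D) == 16, "expected 16-byte alignment");
static_assert(sizeof(VECTOR3D) == 16, "expected four 4-byte floats");

// constexpr (implies const) is rejected by the compiler unless the
// initializer can be evaluated entirely at compile time.
constexpr VECTOR3D someReferenceArray[] = {
    {1.0f, 0.0f, 0.0f, 0.0f},
    {0.0f, 1.0f, 0.0f, 0.0f}
};

// The values are already known during compilation.
static_assert(someReferenceArray[1]._data[1] == 1.0f,
              "array is initialized at compile time");

// Hypothetical helper: run-time read access to the constant data.
float referenceComponent(int vector, int component) {
    assert(vector >= 0 && vector < 2 && component >= 0 && component < 4);
    return someReferenceArray[vector]._data[component];
}
```

If someone later adds a non-constexpr constructor to VECTOR3D, this file stops compiling instead of silently falling back to start-up initialization. Which section the data finally lands in still depends on the toolchain, so checking the generated output remains worthwhile.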

To conclude, never trust the compiler. It will not magically understand what you have in mind and produce optimal code. This is especially true in C++, where a simple line of code can hide a maze of complex operations. Be curious about what the compiler outputs: the better you know and understand the compiler, the better the code you can write, hinting to the compiler what you actually want it to generate.