Smaller binary size with C++ on baremetal (g++) – Part 2: Templates

Following up on my previous post regarding the same topic, this post will specifically discuss reducing code bloat resulting from the use of templates.

I like templates. They simplify type safety, and they are a large contributor to my using C++ on baremetal. Aside from the bootloader, all of the isostick code is written in C++.

That said, templates can chew up precious code space. Recently a lot of trace/logging code (the subject of a future blog post) has been added to the isostick firmware. It makes heavy use of my vector, map, and circular buffer classes–all of which are templates.

The problem…

… arises when using template classes or functions with many different types. The compiler helps us out a bit here:

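  • Member functions of a class template are only instantiated when something actually uses them, so you only end up with the methods you actually use for each template instance. Assuming you have unused function removal enabled, anything else left unreferenced is dropped at link time.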
  • Because template definitions normally have to be visible wherever they are used (in practice, in a header), they are often thought of as inline. However, when the same template instance is used across many compilation units, the linker merges the duplicates, so you should end up with just one copy of the relevant code in your final binary.
    The compiler may still inline them depending on its usual inlining heuristics, but at least larger functions are not duplicated per compilation unit.

The code is still duplicated per template instance, however, and that is what eats all your bits. For example, let’s say you have a template class Foo with a method doStuff(), and let’s say it’s a very large method. For every type T for which you call Foo<T>::doStuff(), you end up with a new copy of doStuff(). This can add up quickly.
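To make the duplication concrete, here is a minimal sketch. Foo and doStuff() are the names from the example above; the body and the call counter are made up for illustration. The function-local static is a visible symptom: each instance of the template gets its own function, and therefore its own counter.

```cpp
#include <cassert>
#include <cstddef>

template <typename T>
struct Foo {
    // Stand-in for a "very large" method. Returns how many times THIS
    // instantiation has been called, including the current call.
    static std::size_t doStuff(const T* data, std::size_t count) {
        static std::size_t calls = 0;  // one counter per Foo<T>, not one overall
        assert(data != nullptr || count == 0);
        ++calls;
        // ... imagine a few hundred bytes of real work on data[0..count) ...
        (void)data;
        (void)count;
        return calls;
    }
};

// Using Foo<char>, Foo<short> and Foo<long> anywhere in the program emits
// three separate functions: Foo<char>::doStuff, Foo<short>::doStuff and
// Foo<long>::doStuff, each carrying a full copy of the body.
```

If you want to see what this costs in a real build, listing the symbols of the linked ELF by size (for example with GNU nm and its --size-sort and --demangle options) should show one doStuff entry per type.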

My solution…

… is a C-style approach: create a “generic” mixin class or function which handles things as you would without templates. Usually this just means passing an extra argument to specify the size of the data being handled. From there you can operate on the data as per usual.

A template class would inherit the generic class, for example, while a template function would simply call the generic function. The idea is to reduce the template code as much as possible, ideally down to simply calling a generic method.
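For a function, the pattern might look something like the sketch below. The names reverseElementsGeneric and reverseElements are hypothetical, not taken from the isostick code. The generic function only knows the element size in bytes, and the template does nothing but supply sizeof(T).

```cpp
#include <cassert>
#include <cstddef>

// Generic worker: compiled once, knows nothing about T.
// In a real project this would live in a single .cpp file.
void reverseElementsGeneric(void* data, std::size_t count, std::size_t elemSize) {
    assert(data != nullptr || count == 0);
    if (count < 2) return;
    unsigned char* base = static_cast<unsigned char*>(data);
    for (std::size_t lo = 0, hi = count - 1; lo < hi; ++lo, --hi) {
        unsigned char* a = base + lo * elemSize;
        unsigned char* b = base + hi * elemSize;
        for (std::size_t k = 0; k < elemSize; ++k) {  // swap byte by byte
            unsigned char tmp = a[k];
            a[k] = b[k];
            b[k] = tmp;
        }
    }
}

// Thin template wrapper: keeps the type-safe interface and adds no code of
// its own beyond passing sizeof(T) along. Only safe for types that can be
// moved around as raw bytes.
template <typename T>
inline void reverseElements(T* data, std::size_t count) {
    reverseElementsGeneric(data, count, sizeof(T));
}
```

Calling reverseElements() on an int array and on a short array then shares one copy of the loop instead of producing two.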

If the template method is nothing more than a call to a generic method, then with optimization enabled the wrapper should get inlined away, and the original call simply becomes a direct call to the generic method.
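Applied to a class, the same idea could be sketched roughly as follows. GenericVector and Vector are hypothetical names, and this is a simplified illustration under the assumptions stated in the comments rather than my actual container code. All of the growth logic sits in the non-template base; the template methods are one-line forwards that an optimizer can collapse into direct calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Non-template base: the real work lives here and is compiled exactly once.
class GenericVector {
public:
    std::size_t size() const { return count_; }

protected:
    explicit GenericVector(std::size_t elemSize) : elemSize_(elemSize) {}
    ~GenericVector() { std::free(data_); }
    GenericVector(const GenericVector&) = delete;
    GenericVector& operator=(const GenericVector&) = delete;

    // Appends one element as raw bytes; returns false if realloc fails.
    // (Sketch only: no overflow check on the capacity arithmetic.)
    bool pushBytes(const void* elem) {
        if (count_ == capacity_) {
            std::size_t newCap = capacity_ ? capacity_ * 2 : 4;
            void* p = std::realloc(data_, newCap * elemSize_);
            if (!p) return false;
            data_ = static_cast<unsigned char*>(p);
            capacity_ = newCap;
        }
        std::memcpy(data_ + count_ * elemSize_, elem, elemSize_);
        ++count_;
        return true;
    }

    void* at(std::size_t i) {
        assert(i < count_);
        return data_ + i * elemSize_;
    }

private:
    unsigned char* data_ = nullptr;
    std::size_t elemSize_;
    std::size_t count_ = 0;
    std::size_t capacity_ = 0;
};

// Template layer: type safety only, no logic of its own.
template <typename T>
class Vector : public GenericVector {
    static_assert(std::is_trivially_copyable<T>::value,
                  "Vector<T> moves elements with memcpy/realloc");
public:
    Vector() : GenericVector(sizeof(T)) {}
    bool push(const T& v) { return pushBytes(&v); }
    T& operator[](std::size_t i) { return *static_cast<T*>(at(i)); }
};
```

With this layout, Vector<int> and Vector<short> share a single pushBytes(), so adding another element type costs very little extra code.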

Word of warning

My resizable container classes use realloc as opposed to new[]/delete[] in hopes of speeding up resizes. So, I am already using a type traits struct to restrict them to containing POD types–they must be memcpyable.

It probably goes without saying, but be sure the types you operate on in your generic methods are safe to manipulate in a generic way!

To make your container classes safe for non-POD types, such as objects that overload the assignment operator or require destruction, you may need to use new[]/delete[] and avoid memcpy and friends.

A good example is a string class holding a pointer to the buffer where its characters are stored. If you made a byte-wise copy of a list of such strings, you would now have two lists of identical strings sharing the same buffers. Even assuming you properly destruct the objects in your container, you’re going to double-free that memory when the second list is destroyed.

For this reason, I strongly suggest using type traits and only allowing POD in your containers. You can always store pointers to objects in your containers, which usually ends up more convenient anyhow.
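As a sketch of what such a check can look like with the standard <type_traits> header (C++11 or later): OwningString, Sample and RequireMemcpyable are hypothetical names for illustration, and std::is_trivially_copyable is used here as an assumption about what "memcpyable" should mean. It is a somewhat looser test than strict POD, so pick the trait that matches what your generic code actually does.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical string class that owns its buffer, like the example above.
class OwningString {
public:
    explicit OwningString(std::size_t n) : buf_(new char[n]()) {}
    ~OwningString() { delete[] buf_; }
    OwningString(const OwningString&) = delete;
    OwningString& operator=(const OwningString&) = delete;
private:
    char* buf_;
};

// Hypothetical plain-data record.
struct Sample {
    int channel;
    short value;
};

// Compile-time gate: instantiating this with an unsuitable type stops the
// build with a readable message instead of corrupting memory at run time.
template <typename T>
struct RequireMemcpyable {
    static_assert(std::is_trivially_copyable<T>::value,
                  "element type must be safe to copy with memcpy/realloc");
};

static_assert(std::is_trivially_copyable<Sample>::value,
              "plain structs are accepted");
static_assert(std::is_trivially_copyable<OwningString*>::value,
              "pointers to any object are accepted");
static_assert(!std::is_trivially_copyable<OwningString>::value,
              "a class with its own destructor is rejected");
```

A container can hold a RequireMemcpyable<T> member or simply repeat the static_assert in its own class body. Storing OwningString* instead of OwningString passes the check, which is the "store pointers" option mentioned above.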
