Flux RSS

Redeeming C++ (or not)

A few days ago, I was talking with Yazoo about Lua, and I realized it'd be cool to statically hash strings in order to export hashes of function names to Lua, and not plain names. Then we discussed about the feasibility to of this, and how there was already a few articles on the web talking about this, but I realized that with C++11, it should be possible to do something easier than most solutions presented, and I wanted to do some researches anyway.

So I set up into trying to achieve the goal of hashing strings at compilation time in C++, using C++11. I have to say I'm unimpressed by the results. That's going to be a pretty long read, but I've tried to describe all the steps I've taken in order to show exactly what I mean.

First, if you haven't, you'd want to read some explanations about what C++11 brings to the table. There is a few things that particularly caught my attention for doing this static hashing: the constexpr keyword, and the user-defined literals.
So, since I usually like to work backwards, and that I read documentations too fast, I first saw the keywords "must recurse", "only return statement" about constexpr, then I saw this extract on the wikipedia paragraph about user-defined literals:
template<char...> OutputType operator "" _suffix();
OutputType some_variable = 1234_suffix;
Followed by the words: operator "" _suffix<'1', '2', '3', '4'>(). Variadic templates! Yay! Another thing I got to try in C++.
So I thought to myself "oooh, that kind of looks like CAML's way of doing things! Love it. Let's try!"
Alright, so I decided to start with a recursion-friendly string hashing function: djb's hash. Here's the source code for the plain-C, non-recursive version:
uint32_t hash_djb2(const char * str) {
    uint32_t hash;
    for (hash = 5381; *str; str++)
        hash = ((hash << 5) + hash) ^ *str;
    return hash;
}
And here's a recursive re-write of it:
uint32_t hash_djb2_r(uint32_t hash, const char * str) {
    return *str ? hash_djb2_r(((hash << 5) + hash) ^ *str, str + 1) : hash;
}
uint32_t hash_djb2(const char * str) {
    return hash_djb2_r(5381, str);
}
Now, from here, based on the documentation on variadic templates, I tried to write the template<char...> version. Here what I first got:
static inline uint32_t djbHashOne(uint32_t hash, char c) {
    return ((hash << 5) + hash) ^ c;
}
template<char c>
inline constexpr uint32_t djbProcess(uint32_t hash) {
    return djbHashOne(hash, c);
}
template<char c, char... rest>
inline constexpr uint32_t djbProcess(uint32_t hash) {
    return djbProcess<rest...>(djbHashOne(hash, c));
}
template<char... str>
inline constexpr uint32_t ctHash() {
    return djbProcess<str...>(5381);
}
Sounds reasonable enough, although a bit big. Anyway, I tried to instantiate it:
int main() {
    printf("%08x\n", ctHash<'T', 'e', 's', 't'>());
}
And here's what clang++ spit on me:
cthash1.cpp:15:12: error: call to 'djbProcess' is ambiguous
    return djbProcess<rest...>(djbHashOne(hash, c));
           ^~~~~~~~~~~~~~~~~~~
cthash1.cpp:15:12: note: in instantiation of function template specialization 'djbProcess<'s', 't'>' requested here
    return djbProcess<rest...>(djbHashOne(hash, c));
           ^
cthash1.cpp:15:12: note: in instantiation of function template specialization 'djbProcess<'e', 's', 't'>' requested here
    return djbProcess<rest...>(djbHashOne(hash, c));
           ^
cthash1.cpp:20:12: note: in instantiation of function template specialization 'djbProcess<'T', 'e', 's', 't'>' requested here
    return djbProcess<str...>(5381);
           ^
cthash1.cpp:24:22: note: in instantiation of function template specialization 'ctHash<'T', 'e', 's', 't'>' requested here
    printf("%08x\n", ctHash<'T', 'e', 's', 't'>());
                     ^
cthash1.cpp:9:27: note: candidate function [with c = 't']
inline constexpr uint32_t djbProcess(uint32_t hash) {
                          ^
cthash1.cpp:14:27: note: candidate function [with c = 't', rest = <>]
inline constexpr uint32_t djbProcess(uint32_t hash) {
                          ^
1 error generated.
My first thought was "gosh, so many errors ?" when I realized this was just one single error, spread on 21 different lines. Jeez... Anyway, the error is that it can't differentiate the two versions of the same template, because it seems that the empty list is actually an acceptable argument... erm... well... that actually doesn't define the end of the recursion properly. Or maybe it does ? So I tried writing the template to be "empty list", but I had no idea how to. Here's my first attempt at it:
template<>
inline constexpr uint32_t djbProcess(uint32_t hash) {
    return hash;
}
But that gave me that error:
cthash1.cpp:9:27: error: no function template matches function template specialization 'djbProcess'
inline constexpr uint32_t djbProcess(uint32_t hash) {
                          ^
cthash1.cpp:15:12: error: no matching function for call to 'djbProcess'
    return djbProcess<rest...>(djbHashOne(hash, c));
           ^~~~~~~~~~~~~~~~~~~
cthash1.cpp:15:12: note: in instantiation of function template specialization 'djbProcess<'t', >' requested here
    return djbProcess<rest...>(djbHashOne(hash, c));
           ^
cthash1.cpp:15:12: note: in instantiation of function template specialization 'djbProcess<'s', 't'>' requested here
    return djbProcess<rest...>(djbHashOne(hash, c));
           ^
cthash1.cpp:15:12: note: in instantiation of function template specialization 'djbProcess<'e', 's', 't'>' requested here
    return djbProcess<rest...>(djbHashOne(hash, c));
           ^
cthash1.cpp:20:12: note: in instantiation of function template specialization 'djbProcess<'T', 'e', 's', 't'>' requested here
    return djbProcess<str...>(5381);
           ^
cthash1.cpp:24:22: note: in instantiation of function template specialization 'ctHash<'T', 'e', 's', 't'>' requested here
    printf("%08x\n", ctHash<'T', 'e', 's', 't'>());
                     ^
cthash1.cpp:14:27: note: candidate template ignored: couldn't infer template argument 'c'
inline constexpr uint32_t djbProcess(uint32_t hash) {
                          ^
2 errors generated.
So I tried this next:
inline constexpr uint32_t djbProcess(uint32_t hash) {
    return hash;
}
Without template that is. Which didn't work either.
Honestly, at that point, I was already cursing at Stroustrup. All of the variadic templates examples wouldn't cover such usage, and there was no example of the template version of the user-defined literals. Anyway, I decided to "outsmart" the compiler, and go for this version instead:
template<char c>
inline constexpr uint32_t djbProcess(uint32_t hash) {
    return djbHashOne(hash, c);
}
template<char c, char n, char... rest>
inline constexpr uint32_t djbProcess(uint32_t hash) {
    return djbProcess<n, rest...>(djbHashOne(hash, c));
}
Which, in my humble opinion, is utterly broken and retarded. Anybody who ever programmed in any functional language such as CAML would be laughing at C++ at that very moment. But at least, this version now worked. The advantage of the template made sure that this was computed at compilation-time by the compiler, and dumping the assembly output confirmed so:
leaq    L_.str(%rip), %rdi
movl $2089308947, %esi xorb    %al, %al
callq   _printf
Now for using a user-defined literal... well, as it turns out, I read the wikipedia page too fast. The templated version works only for numbers. Not strings. The string version of a user-defined literal's operator looks like this:
OutputType operator "" _suffix(const char * string_values, size_t num_chars);
*sigh*... well, back to coding this from scratch. Now it actually looks easier to be honest. Here's the C++11 code for the hash function with such a prototype:
inline constexpr uint32_t djbProcess(uint32_t hash, const char * str, size_t n) {
    return n ? djbProcess(djbHashOne(hash, *str), str + 1, n - 1) : hash;
}
inline constexpr uint32_t ctHash(const char * str, size_t n) {
    return djbProcess(5381, str, n);
}
And here's to try to instantiate this:
int main() {
    printf("%08x\n", ctHash("Test", 4));
}
Well, that actually works properly, without problems, and the assembly output still looks the same. Time to write the user-defined literal now. It should looks like this, right ?
uint32_t operator "" _hash(const char * str, size_t n) {
    return djbProcess(5381, str, n);
}
int main() {
    uint32_t val = "Test"_hash;
    printf("%08x\n", val);
}
But, as it turns out, this doesn't even compile... clang++ on my MacBook Pro gives me this error:
cthash2.cpp:21:14: error: cannot initialize a variable of type 'uint32_t' (aka 'unsigned int') with an lvalue of type 'const char [5]'
    uint32_t val = "Test"_hash;
             ^     ~~~~~~
cthash2.cpp:21:26: error: expected ';' at end of declaration
    uint32_t val = "Test"_hash;
                         ^
                         ;
2 errors generated.
And gcc 4.6 on my Debian gives me this:
cthash2.cpp:16:19: error: expected type-specifier before string constant
cthash2.cpp: In function ‘int main()’:
cthash2.cpp:21:20: error: invalid conversion from ‘const char*’ to ‘uint32_t {aka unsigned int}’ [-fpermissive]
cthash2.cpp:21:26: error: expected ‘,’ or ‘;’ before ‘_hash’
After double checking, even the examples from wikipedia won't compile. So I even wonder if the user-defined literals will work at all in the current versions of my compilers. Dah!
Oh well. At least, I should be able to define a template that'll work in the current form, right ? Something along the lines of:
template<size_t S>
inline constexpr uint32_t ctHash(const char str[S]) {
    return djbProcess(5381, str, S - 1);
}
But when I try to instantiate it, this is the error I get from clang++:
cthash2.cpp:18:22: error: no matching function for call to 'ctHash'
    printf("%08x\n", ctHash("Test"));
                     ^~~~~~
cthash2.cpp:13:27: note: candidate template ignored: couldn't infer template argument 'S'
inline constexpr uint32_t ctHash(const char str[S]) {
                          ^
1 error generated.
Graaaaaaaah! I have to admit I gave up at that point, unable to figure out nor remember how to do this properly. Thankfully, I have a C++ expert at the office, and the next day, he was able to help me out with this (after some probing). Here's the proper syntax:
template<size_t S>
inline constexpr uint32_t ctHash(const char (&str)[S]) {
    return djbProcess(5381, str, S - 1);
}
Pfew! This works, with both clang and gcc, and their output in assembly code are perfectly computed at compilation-time.
So, as a conclusion, yes, there are things you can do in C++11 that are definitely superior to plain-C, such as compilation-time hashing of strings. Also, think about how advanced the compiler became, able to do this kind of work at compilation-time. This is very impressive. But the required amount of trickery and mysterious arcane wording you have to write in order to achieve this is really a huge rebuttal. I mean, even when I am trolling at the office with my colleagues who have more experience than me in C++'s obscure arcane coding, there's not always a clear answer for most of the difficult problems. There's always a fair enough of research and probing and fiddling with the compiler to get your way with it. And that's really not appealing at all.
Bonus: at some point, I tried to do it this way:
inline constexpr uint32_t djbProcess(uint32_t hash, const char * str) {
    return *str ? djbProcess(djbHashOne(hash, *str), str + 1) : hash;
}
inline constexpr uint32_t ctHash(const char * str) {
    return djbProcess(5381, str);
}
int main() {
    printf("%08x\n", ctHash("Test"));
}
And although it worked perfectly fine with clang, gcc couldn't understand the recursion properly at compilation-time, and generated the loop to compute the hash at run-time. Meh. Which brings a last point: it's very easy to write code thinking that it's correct, and does what you want, but only a careful examination of the assembly output can guarantee that you're getting exactly what you want. And even better: this is compiler-dependent. Quite a trap waiting to trigger on  your feet.