If you haven't been keeping up with "Meltdown," I'm not going to link to it from here. I don't want to fuck up your world. No, no, don't look it up. Live a little first.

So, the first results are in, and things are definitely slower. One of the Riak guys says their CPU usage is up by 50%. As such, I thought I would do my part to help.

Crockford v1.0.0:

crockford [bench] cargo bench -- decoding::benchmarks
   Compiling crockford v1.0.0 (file:///foo)
    Finished release [optimized] target(s) in 1.35 secs
     Running target/release/deps/crockford-d3630e6cfdca8a67

running 4 tests
fzq                         ... bench:          26 ns/iter (+/- 4)
fzq_uppercase               ... bench:          26 ns/iter (+/- 1)
fzzzzzzzzzzzz               ... bench:          95 ns/iter (+/- 17)
fzzzzzzzzzzzz_uppercase     ... bench:         112 ns/iter (+/- 21)

Crockford v1.0.1:

crockford [bench] cargo bench -- decoding::benchmarks
   Compiling crockford v1.0.1 (file:///foo)
    Finished release [optimized] target(s) in 1.30 secs
     Running target/release/deps/crockford-2027f81b7006416a

running 4 tests
fzq                         ... bench:           5 ns/iter (+/- 0)
fzq_uppercase               ... bench:           5 ns/iter (+/- 0)
fzzzzzzzzzzzz               ... bench:          14 ns/iter (+/- 0)
fzzzzzzzzzzzz_uppercase     ... bench:          14 ns/iter (+/- 0)

Crockford is now five to eight times faster, depending on the input: 26/5 is about 5x on the short strings, and 112/14 is a full 8x on the long uppercase one.

How?

The short answer is that I was pretty lazy in how I wrote it.

The longer answer is that I nerd-sniped a good friend of mine, and he provided one of the missing pieces when I didn't want to do it myself.

You may recall that the last article compared the performance of binary search against two different hashmap implementations. My friend (for the purposes of this article, we will call him Sparky) had just one question: what are you using those for?

The answer is that I was using them to determine the validity of UTF-8 input to the function. Only 32-ish characters are valid symbols in Crockford encoding, and I put those 32 in an array that you can, incidentally, still find in the source code because it's used in encoding as well. I treated the (ordered) array as a set and used binary search to determine membership, which mostly works fine; it's just a little slower than a hashmap with a fast hash.
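In other words, every byte of input paid for a binary search. Roughly like this (a sketch with my own names, not the crate's actual code):

    // Old approach, roughly: keep the 32 valid symbols in a sorted array
    // and binary-search it for every byte of input. O(log n) per byte.
    // (Case handling omitted; the real code also accepts uppercase input.)
    const SYMBOLS: &[u8] = b"0123456789abcdefghjkmnpqrstvwxyz";

    fn is_valid_symbol(byte: u8) -> bool {
        SYMBOLS.binary_search(&byte).is_ok()
    }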

But then he asked, "Ok, but why do either of those things? Why not put all valid inputs into a sparse array that matches the size of your problem space?"

(Ok, he probably said something that sounded less cool, but I want to play him up for the audience.)

Lucky for me, I had a ready answer. "Well, Sparky, I just don't want to type an array containing 256 items."

You lazy asshole!
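He had a point, though. The input is bytes, so the whole problem space is exactly 256 values, and membership checking collapses to a single array load. Something like this (again a sketch with my own names; case handling and Crockford's decode aliases are omitted here):

    // Sparky's suggestion, roughly: a 256-entry table indexed directly
    // by the input byte. One load per byte, no searching or hashing.
    // (Built in a const block for brevity.)
    const IS_VALID: [bool; 256] = {
        let symbols = b"0123456789abcdefghjkmnpqrstvwxyz";
        let mut table = [false; 256];
        let mut i = 0;
        while i < symbols.len() {
            table[symbols[i] as usize] = true;
            i += 1;
        }
        table
    };

    fn is_valid_symbol(byte: u8) -> bool {
        IS_VALID[byte as usize]
    }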

Of course, by this point he had basically managed to guilt me into trying it; my plan was to steal the #to_normal_digit function from my source code and use it to generate the appropriate array. (If I had any brains at all, maybe that would happen in a build script or something, but it doesn't.) Anyway, as I was working on that, he sent me pretty well everything I wanted; by the time I was partway done, he'd already written a Ruby script to do the work!
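I won't reproduce his script here, but a Rust stand-in for it fits in a few lines. This is my sketch, not his code, and the table name and layout are made up:

    // A stand-in generator: prints the 256-entry lookup table as Rust
    // source. `normalize` encodes the Crockford decode rules directly:
    // case-insensitive, with o/O read as 0 and i/I/l/L read as 1.
    fn normalize(byte: u8) -> u8 {
        match byte.to_ascii_lowercase() {
            b'0' | b'o' => 0,
            b'1' | b'i' | b'l' => 1,
            b @ b'2'..=b'9' => b - b'0',
            b @ b'a'..=b'h' => b - b'a' + 10, // a-h -> 10-17 (no i)
            b'j' => 18,
            b'k' => 19,                       // j, k (no l)
            b'm' => 20,
            b'n' => 21,                       // m, n (no o)
            b @ b'p'..=b't' => b - b'p' + 22, // p-t -> 22-26 (no u)
            b @ b'v'..=b'z' => b - b'v' + 27, // v-z -> 27-31
            _ => 255,                         // invalid sentinel
        }
    }

    fn main() {
        println!("static VALUE_MAPPING: [u8; 256] = [");
        for row in 0..16u16 {
            let line: Vec<String> = (0..16u16)
                .map(|col| format!("{:>3}", normalize((row * 16 + col) as u8)))
                .collect();
            println!("    {},", line.join(", "));
        }
        println!("];");
    }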

So, thanks, Sparky.

For reference, here is the digit normalization function currently in use, which is a classic example of trading memory for performance. (You know, 256 bytes of it, anyway.)
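Or here is its shape, anyway; the version in the crate indexes 256 literal values spelled out in the source, and these names are approximate. The sketch builds the same table in a const block to stay short:

    // Table-based normalization: one load and one compare per input byte.
    const INVALID: u8 = 255;

    const VALUE_MAPPING: [u8; 256] = {
        let alphabet = b"0123456789abcdefghjkmnpqrstvwxyz";
        let mut table = [INVALID; 256];
        let mut i = 0;
        while i < alphabet.len() {
            table[alphabet[i] as usize] = i as u8;
            if alphabet[i] >= b'a' {
                table[(alphabet[i] - 32) as usize] = i as u8; // uppercase twin
            }
            i += 1;
        }
        // Crockford's aliases: o/O decode as 0; i/I/l/L decode as 1.
        table[b'o' as usize] = 0;
        table[b'O' as usize] = 0;
        table[b'i' as usize] = 1;
        table[b'I' as usize] = 1;
        table[b'l' as usize] = 1;
        table[b'L' as usize] = 1;
        table
    };

    fn to_normal_digit(byte: u8) -> Option<u8> {
        match VALUE_MAPPING[byte as usize] {
            INVALID => None,
            value => Some(value),
        }
    }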

So, between Meltdown and this, your computer should still be a little more than four times faster than it was before. I mean, assuming all it ever does is decode identifiers from Crockford Base32...