Bigger isn't always better. Sometimes, the small number wins. For instance, the US Army monster pictured above makes 11,000 horsepower and has a redline of 8500 RPM, but goes from zero to sixty in about two tenths of a second.

Because of this truth, I have expended an inordinate amount of time ensuring that crockford does what it does as fast as it can. In my mind, the only real advantage to crockford encoding over, say, hashids (I have a library for that, too), is raw performance, so that's what I'm trying to offer. It's been a long ride; if you want to know more about it, you can always check the repository history. Right now, we want to talk about my latest round of tinkering.

Background

Crockford's encoding is based on an array:

static UPPERCASE_ENCODING: &[u8] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

...That's the whole thing. (No joke.) When I was first figuring this out, I thought I could just go with some offsets (and, in fact, that is what I do for numeric values), but offsets don't work for letters because Crockford Base32 skips some of them at random. (Ok, not at random, but you can look up the real details here.) Anyway, the easiest solution I found was to put the characters in an array and find the index of the one I need with a binary search. Anything that doesn't appear in that array is (clearly) not a valid digit. Here's how that part of the code looks:

// Numerals decode directly via their offset from b'0'.
u @ b'0'...b'9' => Ok(u - INT_OFFSET),
// For everything else, `u & !32` folds lowercase onto uppercase,
// and the binary search yields the letter's value as its index.
u => match UPPERCASE_ENCODING.binary_search(&(u & !32)) {
    Ok(idx) => Ok(idx as u8),
    // This `idx` is not the shadowed binding above; it's the
    // digit's position in the input, from the enclosing scope.
    _ => Err(Error::new(
        Kind::InvalidDigit(idx, u),
        "Invalid encoded digit.",
    )),
},
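
A quick illustration of why that works (my own toy example, not from the crate): the alphabet is sorted by byte value, so binary_search hands back each character's decoded value as its index, and anything the encoding skips comes back as an Err.

static UPPERCASE_ENCODING: &[u8] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ"; // same alphabet as above

fn main() {
    // 'A' is the eleventh character, so it's worth 10.
    assert_eq!(UPPERCASE_ENCODING.binary_search(&b'A'), Ok(10));
    // 'J' is worth 18, not 19, because 'I' is skipped entirely.
    assert_eq!(UPPERCASE_ENCODING.binary_search(&b'J'), Ok(18));
    assert_eq!(UPPERCASE_ENCODING.binary_search(&b'Z'), Ok(31));
    // 'U' isn't in the alphabet at all, so it isn't a valid digit.
    assert!(UPPERCASE_ENCODING.binary_search(&b'U').is_err());
}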

There are plenty of places I could still shave off a little extra time, and I haven't explored most of them. For instance, the decoder only ever searches the second half of that array--everything after the numeral 9--because numerals are already handled by the offset above. Still, this is a pretty good solution for the following reasons:

  1. I know how big it is. (It's 32 bytes, ok? Duh.)
  2. I know how long it takes to initialize. (It's static. Get over it.)
  3. It does not require the user to initialize anything.

Now, those three things have a pretty positive effect on performance, so I wasn't too worried about this being significantly slower than HashMap, and as a result I never really gave HashMap a fair shake. Well, today, I addressed that. To start things off, here's the bench output for the most recent version of crockford:

fzq                     ... bench:   28 ns/iter (+/- 9)
fzq_uppercase           ... bench:   26 ns/iter (+/- 7)
fzzzzzzzzzzzz           ... bench:   94 ns/iter (+/- 30)
fzzzzzzzzzzzz_uppercase ... bench:  107 ns/iter (+/- 34)

std::collections::HashMap

The short version is that HashMap is slow as molasses.

To be fair, the standard library is pretty conservative, and the reason the standard HashMap is so slow is that it's meant to shield software against a potential DoS vector involving maliciously chosen keys: SipHash has good worst-case performance, but it's a little slow to start up and not all that fast for lookups. Even so...

fzq                     ... bench:   67 ns/iter (+/- 10)
fzq_uppercase           ... bench:   69 ns/iter (+/- 11)
fzzzzzzzzzzzz           ... bench:  277 ns/iter (+/- 96)
fzzzzzzzzzzzz_uppercase ... bench:  281 ns/iter (+/- 115)

...The variation between runs here is nearly as large as the entire runtime of the binary search version. Clearly, the code involved is not identical, but I tested that code with and without the hashmap as the source of encoded values, and it's definitely the map that's making the difference. Even so, here's the code in question. The biggest difference is that a map returns an Option instead of a Result. (I will be the first to admit that can be a non-trivial difference; Option can be significantly slower.)

// Numerals decode directly via their offset from b'0', as before.
u @ b'0'...b'9' => Ok(u - INT_OFFSET),
// `lookup` maps each (uppercased) encoded byte to its value.
u => match lookup.get(&(u & !32)) {
    Some(&idx) => Ok(idx as u8),
    // As before, this `idx` is the digit's position in the input.
    _ => Err(Error::new(
        Kind::InvalidDigit(idx, u),
        "Invalid encoded digit.",
    )),
},
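
The snippet doesn't show where `lookup` comes from, so here's a minimal sketch of how a map like that could be built (my construction, not necessarily what the crate does): every byte in the alphabet maps to its own index.

use std::collections::HashMap;

static UPPERCASE_ENCODING: &[u8] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Each encoded byte maps to its decoded value, which is just its
// index in the alphabet.
fn build_lookup() -> HashMap<u8, u8> {
    UPPERCASE_ENCODING
        .iter()
        .enumerate()
        .map(|(i, &u)| (u, i as u8))
        .collect()
}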

I'll admit it: this kind of gave me some warm fuzzies, because I don't want to expand my API all that much, and this means that pursuit of a hash-friendly way of doing things isn't worthwhile. But, you know... I'm fully aware that HashMap is slow. I know there are much faster things I could use. So... Dammit. >.<

fxhash::FxHashMap

...Enter the challenger: fxhash.

Ok, that's bullshit. Fxhash is not the challenger; fxhash is the reigning champeen, as far as I know. If I need to use a hashmap in any performance-sensitive context, this is my go-to tool. Developed for use in Firefox, this stupid thing is stupid fast--I mean, I know it's fast, and I'm still consistently surprised by the results I get. Even better, with this package, it's basically a drop-in replacement:

use fxhash::FxHashMap as HashMap;

I'm not kidding when I say that was almost the only code change; FxHashMap is just a type alias for the standard HashMap with a faster hasher plugged in, so the whole API carries over. Performance, however, could not be more different.

fzq                     ... bench:   22 ns/iter (+/- 6)
fzq_uppercase           ... bench:   21 ns/iter (+/- 6)
fzzzzzzzzzzzz           ... bench:   72 ns/iter (+/- 20)
fzzzzzzzzzzzz_uppercase ... bench:   80 ns/iter (+/- 10)


It's faster. Than everything. If what you're looking for is to uncrockinate all the things and you're ok with initializing something first, this is your best option. But is that worth the effort?

How would you use this?

Well, you'd need some kind of UnCrocker object. I'd prefer that you didn't try to talk me into making the thing use some kind of static back end so they can all share one--to me, it just makes more sense to let the consumer make one and use it however they want to use it. But the thing is, making one has a non-zero cost, and I'm betting (I'm so sure I'm not even going to bother testing it) that this non-zero cost is enough to push the binary search variant well out in front again if you have to make a new UnCrocker for every value you want to uncrock, which...
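
To pin down the shape I have in mind, here's a sketch--the name UnCrocker and this entire API are hypothetical, not something the crate ships:

use fxhash::FxHashMap;

static UPPERCASE_ENCODING: &[u8] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

pub struct UnCrocker {
    lookup: FxHashMap<u8, u8>,
}

impl UnCrocker {
    // The non-zero cost: the map gets built once per instance.
    pub fn new() -> UnCrocker {
        let lookup = UPPERCASE_ENCODING
            .iter()
            .enumerate()
            .map(|(i, &u)| (u, i as u8))
            .collect();
        UnCrocker { lookup }
    }

    // Decodes one character, folding lowercase onto uppercase.
    pub fn digit(&self, u: u8) -> Option<u8> {
        match u {
            u @ b'0'..=b'9' => Some(u - b'0'),
            u => self.lookup.get(&(u & !32)).cloned(),
        }
    }
}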

Enough rambling. You'd want a singleton. Thing is, a singleton itself has some cost in the sense that you'd end up branching every time you used it in any real application. What do I mean?

Well, you want to share it across your whole app, because it's read-only and therefore effectively thread-safe, and there's no good reason not to, but almost no application on earth is going to be happy taking your word for that. I know Rocket sure as hell isn't. If I remember correctly, none of the others I have tried are, either. You're going to end up doing one of two things:

  1. Locking/unwrapping/whatever this damned thing over and over again.
  2. Setting it up as a fake static in some unsafe block somewhere.

Even lazy_static involves a single branch every time you want to access a "static" item, and that's just downright annoying overhead to apply to something like this, you know? Whether it's enough to offset the benefit of using our hypothetical super-fast uncrockinator is anyone's guess, because I'm pretty much done writing benchmarks for the day.
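
For what it's worth, here's what that usually looks like (a sketch, assuming the hypothetical UnCrocker from the last snippet); the branch I'm complaining about is hidden inside the lazy_static machinery:

#[macro_use]
extern crate lazy_static;

lazy_static! {
    // Built on first use; every later access still pays the
    // "has this been initialized?" check.
    static ref UNCROCKER: UnCrocker = UnCrocker::new();
}

fn decode_digit(u: u8) -> Option<u8> {
    UNCROCKER.digit(u)
}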

Conclusion

I'm a libertarian. I have to be; my car is too fast, too loud, and burns too much gasoline for me to be anything else. What that means in terms of software is that I want to offer you the tools and let you decide what you do with them. I guess I'll probably explore creating a way to do this--probably behind a feature flag, because one of my favorite things about crockford is that it doesn't depend on anything else.