Warning! Testing, like politics, religion, and how to butter your toast, is one of those subjects that will get you thrown out of better dinner parties and churches throughout the South. What you read here may shock you! You have been warned.

The image above is the Hanged Man's tree, from Witcher III: The Wild Hunt. Appropriate, I thought. I'm not sure if this is my manifesto or my death warrant.

What are we talking about?

Should he wish to conquer, the general must prepare the ground.

When I say "testing," I refer to the practice of writing automated tests in tandem with program code; that is, of building programmatic tests into the codebase alongside the program itself. This discussion will include Test-Driven Development, but that is not my focus; I mean to discuss the religion as a whole, not any one sect.

Perhaps I've just given away the punchline: that I regard testing as largely a religious practice on about the level of erecting bamboo control towers and making radio antennae out of leaves in hopes that the great silver birds will return. I like to think my position is a little more nuanced than that, but your mileage may vary.

What is your problem with testing?

You.

To be less glib about it, me.

You and me both. What I dislike about testing is that tests are usually written by fallible (and fallen) men. I mean, I like you a lot, but I know you make mistakes. So do I.

Locked away in a vault somewhere in France is the meter. There are two of them, actually, that I am aware of: an original and a replacement created in 1889. And just so you know I'm not full of shit, I wasn't aware of that until a minute ago; I had to look this up.


This object—now superseded by the distance traveled by light in a vacuum in one 299,792,458th of a second—was the divine standard by which all other meters were measured, from the one standing in the corner of your elementary teacher's classroom (though mine used a yard stick) to the 1.524 meters separating you from the nearest spider. That's the great thing about meters, feet, smoots, and the thumb's-width rod you're supposed to use on your wife: we have objective standards for these things.

What we don't have is an objective standard for how many non-consecutive purchases of cod liver oil are required to earn one free pass to a non-weekend matinee at the Royal Theatre (note the arrangement of the letters R and E) in downtown Gingles, Wisconsin, at the Pick-a-dilly Market Square Café, General Store, and Postal Emporium. This means, unfortunately, that when Mrs. E. Bailey requests that particular enhancement for her point of sale system, someone needs to invent that out of whole cloth.

// Warning: this code may or may not do anything, and certainly
// does nothing useful.
public bool EarnsFreeMatineePasses(Purchase purchase, PurchaseHistory purchaseHistory)
{
    // The lambda parameter gets its own name so it does not collide
    // with the method's `purchase` parameter.
    var purchaseCount = purchaseHistory
        .Count(pastPurchase => pastPurchase.Contains(CodLiverOil));

    return purchaseCount > 0
        && purchaseCount % 10 == 0
        && purchase.Amount > 5.00M;
}

So, for example, the code above looks reasonable, considering I wrote it off the top of my head and without the aid of a compiler. There are just a few problems:

  1. No one told me that 10 was the magic number.
  2. No one said that a minimum purchase was required in order to be eligible.
  3. Now that I think about it, Mrs. Bailey said they could have one pass, and I've named this method something about "free passes."

...the point being, there is no right way to do this, and there is also no wrong way to do this, and in any case I would have written a test that passed on the basis of the method I wrote.
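To make that circularity concrete, here is a minimal sketch, in Rust this time and with hypothetical names of my own (`earns_free_matinee_pass`, `PASS_INTERVAL`, `MINIMUM_PURCHASE_CENTS`). None of it comes from Mrs. Bailey. The test restates the same invented numbers as the function, so it goes green whether or not those numbers are what she wanted.

```rust
// Hypothetical rule, invented by the programmer: nobody specified these numbers.
const PASS_INTERVAL: u32 = 10;
const MINIMUM_PURCHASE_CENTS: u32 = 500;

/// Returns true when the current purchase earns a free matinee pass.
/// `cod_liver_purchases` is the number of purchases containing cod liver oil.
fn earns_free_matinee_pass(cod_liver_purchases: u32, purchase_cents: u32) -> bool {
    cod_liver_purchases > 0
        && cod_liver_purchases % PASS_INTERVAL == 0
        && purchase_cents > MINIMUM_PURCHASE_CENTS
}

#[test]
fn tenth_purchase_over_five_dollars_earns_a_pass() {
    // Every expectation below is copied from the implementation's own
    // assumptions, so this test can only confirm that I agree with myself.
    assert!(earns_free_matinee_pass(10, 501));
    assert!(!earns_free_matinee_pass(9, 501));
    assert!(!earns_free_matinee_pass(10, 500));
}
```

If the real rule turns out to be every twelfth purchase with no minimum, both the function and the test are wrong together, and nothing fails.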

Tests are also pathological in a lot of other, possibly more important ways that I'm sure other authors have discussed in greater depth, but I'm not interested in those. I do not write pathological code, as a rule; I write code that I think is pretty good, otherwise I would not write it. As a result, what truly bothers me about tests is their limited utility.

To illustrate this, I will use an example from a conversation I had earlier today. Given the following code, or something similar:

static void Insert(Foo foo)
{
    Execute();

    if (foo.Bar?.Any() ?? false)
    {
        Execute(foo.Bar);
    }

    if (foo.Baz?.Any() ?? false)
    {
        Execute(foo.Baz);
    }
}

The gentleman in question, for whom I have a great deal of respect resulting from his conscientious, exacting, and cooperative nature, suggested that I construct a harness for counting the number of times Execute is called, and with what arguments, to ensure that it is called when appropriate.

...That is, to ensure that it is called when Bar and Baz are non-null and non-empty.

On the one hand, ok, I can see where that test would have some limited value in asserting that, yes, those methods are called under the prescribed circumstances, but I can assert that with my eyes and mouth. What I need to know is whether calling that method achieves the desired result, because there is not a doubt in my mind about whether and when that method will be called.
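For reference, the kind of harness he had in mind might look something like the sketch below. It is a loose Rust translation, and every name in it (`Foo`, `Executor`, `Recorder`, `insert`) is a hypothetical stand-in rather than the code we were actually discussing.

```rust
// Hypothetical stand-in for the Foo in the snippet above.
struct Foo {
    bar: Option<Vec<i32>>,
    baz: Option<Vec<i32>>,
}

/// Anything that can "execute"; the real one would do actual work.
trait Executor {
    fn execute(&mut self, args: &[i32]);
}

/// Test double that records the arguments of every call it receives.
#[derive(Default)]
struct Recorder {
    calls: Vec<Vec<i32>>,
}

impl Executor for Recorder {
    fn execute(&mut self, args: &[i32]) {
        self.calls.push(args.to_vec());
    }
}

/// Same shape as Insert: one unconditional call, then one call per
/// collection that is present and non-empty.
fn insert<E: Executor>(foo: &Foo, executor: &mut E) {
    executor.execute(&[]);

    if let Some(bar) = &foo.bar {
        if !bar.is_empty() {
            executor.execute(bar);
        }
    }

    if let Some(baz) = &foo.baz {
        if !baz.is_empty() {
            executor.execute(baz);
        }
    }
}

#[test]
fn execute_is_called_once_per_non_empty_collection() {
    let foo = Foo { bar: Some(vec![1, 2]), baz: Some(vec![]) };
    let mut recorder = Recorder::default();

    insert(&foo, &mut recorder);

    let expected: Vec<Vec<i32>> = vec![vec![], vec![1, 2]];
    assert_eq!(recorder.calls, expected);
}
```

Notice that the test is a restatement of the two `if` blocks. It says nothing about whether the calls it counts did the right thing.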

So you just don't write tests?

I do, in fact. Perhaps not as many as some, but out of a handful of small libraries available for viewing on my GitHub profile, I just counted more than one hundred tests. With that in mind, is this a case of "do as I say, not as I do"?

Not at all. I didn't say not to write tests.

Here, let me put that in bold. I never said not to write tests.

What I said was that they often have limited value. So, why don't we look at some cases where that isn't true?

When there's a meter

To me, the most obvious time to write tests is when there is a golden meter stick sitting in some snail-scarfing cretin's closet. A good example of this is harsh, my Rust port of hashids. In order for my port to be useful, it must perfectly duplicate the behavior of the original, and it does. I know it does, because my tests say so.

Another similar example is crockford, which implements Big Doug Crockford's base 32 encoding algorithm. Again, there is a standard, and my code does not define that standard, and therefore my code must conform perfectly to that objective, external standard. I use automated testing to help ensure I meet that goal.
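A rough sketch of what testing against a meter looks like, assuming a toy encoder of my own invention and not the actual crockford API: the alphabet comes from the published spec (digits, then letters minus I, L, O, and U), and the expected strings are worked out by hand from that alphabet, not from the code.

```rust
// Crockford's Base32 alphabet: digits, then letters without I, L, O, U.
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Toy encoder (hypothetical, for illustration): u64 to Crockford Base32,
/// most significant symbol first.
fn encode(mut n: u64) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut symbols = Vec::new();
    while n > 0 {
        symbols.push(ALPHABET[(n % 32) as usize]);
        n /= 32;
    }
    symbols.reverse();
    String::from_utf8(symbols).expect("alphabet is ASCII")
}

#[test]
fn matches_the_external_standard() {
    // Expected values follow from the alphabet alone:
    // 31 is the last symbol, 32 rolls over, and 1234 = 1*1024 + 6*32 + 18.
    assert_eq!(encode(0), "0");
    assert_eq!(encode(31), "Z");
    assert_eq!(encode(32), "10");
    assert_eq!(encode(1234), "16J");
}
```

The difference from the matinee example is where the expectations come from. Here, if the code and the test disagree, the code is wrong, full stop.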

Incidentally, both of these libraries are representative of another important factor.

When changes should not impact behavior

The hashids algorithm is inherently slow, and harsh does little to combat that other than having been written in Rust, which means it is naturally going to be faster than anything written in JavaScript. I wrote and champion crockford as an alternative because Douglas Crockford's Base32 encoding is significantly faster. As a result, I naturally felt that it was important to emphasize performance in my implementation. For this purpose, tests were vital.

It's one thing to have Billy Codemonkey making changes to your shopping cart's coupon algorithm: he's going to mess up the way you count cod liver items, and he's going to adjust the tests to reflect that, and you're going to get a green checkbox in your CI environment, and you're not going to know that entire families are seeing movies at the Tuesday matinee on your dime until it's too late.


But it's another thing entirely when there is only one correct answer, and as a result there is no reason to ever modify those tests. This was the case when optimizing crockford: my tests allowed me to make sweeping changes to the strategy I used to arrive at the required values without concern that I might be screwing up everything in return for shaving off just a few more nanoseconds.

The critical point here is that, when changes to code do not involve commensurate changes in the test suite, tests can be very valuable in detecting aberrant behavior. This applies only when the tests themselves are not changed.
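Here is a small sketch of that workflow, again with made-up function names (`encode_simple`, `encode_fast`) and not the real crockford internals. The strategy changes from division to shifts and a fixed buffer; the expectations stay frozen.

```rust
const ALPHABET: &[u8; 32] = b"0123456789ABCDEFGHJKMNPQRSTVWXYZ";

/// Straightforward version: divide by 32, collect remainders, reverse.
fn encode_simple(mut n: u64) -> String {
    let mut symbols = Vec::new();
    loop {
        symbols.push(ALPHABET[(n % 32) as usize]);
        n /= 32;
        if n == 0 {
            break;
        }
    }
    symbols.reverse();
    String::from_utf8(symbols).expect("alphabet is ASCII")
}

/// Rewritten version: fills a fixed buffer from the back with shifts and masks.
fn encode_fast(mut n: u64) -> String {
    // 13 symbols of 5 bits each are enough to cover all 64 bits.
    let mut buffer = [0u8; 13];
    let mut start = buffer.len();
    loop {
        start -= 1;
        buffer[start] = ALPHABET[(n & 31) as usize];
        n >>= 5;
        if n == 0 {
            break;
        }
    }
    String::from_utf8(buffer[start..].to_vec()).expect("alphabet is ASCII")
}

#[test]
fn rewrite_does_not_change_behavior() {
    // The fixed vectors were written before the rewrite and are not edited.
    assert_eq!(encode_fast(0), "0");
    assert_eq!(encode_fast(32), "10");
    assert_eq!(encode_fast(1234), "16J");

    // The old implementation doubles as an oracle for everything else.
    for n in 0..10_000u64 {
        assert_eq!(encode_fast(n), encode_simple(n));
    }
    assert_eq!(encode_fast(u64::MAX), encode_simple(u64::MAX));
}
```

Whether the second version is actually faster is a question for a benchmark, not for this test. All the test promises is that the answers did not move.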

When you're about to mess something up

One of the articles that started this hullabaloo, recently reposted, talks about how to do testing right, and one of the points it makes repeatedly is that tests can outstay their welcome. I'm less interested in talking about how to do testing right, because I think other people do a better job, but I agree that tests can present a better value proposition as a short-term investment.

For instance, say you need to make some changes to Component A, which is tangentially related to Component B. It may behoove you to write a small test suite for B before you tinker with A, because that would allow you a far greater degree of confidence that your changes have not been destructive. This is basically a special case of the last reason: we want tests now, but we're not going to keep them, because we don't want to pay their maintenance costs down the road.

When your type system is just no help at all

C# has its holes. Honestly, every language does. You can find the following code or its equivalent peppered through a lot of my Rust projects:

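use regex::Regex;
use std::hint::black_box;

fn make_pattern() -> Regex {
    // Regex::new only fails at run time, when the pattern does not compile.
    Regex::new("pattern here").unwrap()
}

#[test]
fn pattern_isnt_bullshit() {
    // Building the pattern is the whole test: unwrap panics if it is invalid.
    black_box(make_pattern());
}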

Like a lot of things, regular expressions are effectively a run-time-only concern, which means that, if I want to know for sure they work, they have to be tested.

When you just honestly don't know what you're doing

I find that, depending on your language and tooling, tests are a more pleasant alternative to old fashioned log-to-console debugging, and I will often leave tests of that kind in place once the desired behavior is achieved. As mentioned above, the maintenance cost of tests does significant damage to the value proposition, and so that cost will need to be evaluated—but, in the meantime, have some free tests. Why not?

But you try not to write tests otherwise?

As a rule, I avoid writing tests for things that I define and understand, because they have so little value. I always come back to the coupon example because it's a real example from earlier in my career: I worked on a system that handled loyalty rewards for a retail chain.

Our head (and only) QA guy would change his mind about how to calculate rewards on at least a bi-weekly basis. I had tests for these systems, and the tests never failed, but I would still get called into his cubicle repeatedly so that he could explain what we were doing wrong. Every single time, I would go back and change both A) the behavior and B) the tests.

What good were my tests?

That's kind of the thought I want to leave you with. What do your tests buy you? Now, ok, you may be writing JavaScript or Python or something and you may say, "Well, my tests are invaluable!" In that case, I invite you to ask what your language choice buys you, but otherwise stick with my first question. What do you get from your tests that isn't effectively a side effect of testing?

Do you get better code organization? A greater understanding of your program flow? Do your tests allow you to better communicate your intent? Well, those are side effects. If that's what you get from testing, all you're really getting is a green check box, because you could get all the rest of that without ever writing a test.