Is it Possible to Write Bug-Free Code? A Case for Thorough Specifications

When I started as a server developer at Vena in 2013, we didn't have test-driven development. Sure, we designed our codebase to be testable by using dependency injection from the start, with the expectation of writing unit tests with mocks. But we needed to build out quickly, and unit tests were quickly forgotten. We had one automation engineer to write black-box tests for 3-4 developers. As one would expect, that didn't scale. After the team doubled in size, we mandated that all developers also write automated tests. Since then, we've been steadily increasing our test coverage: writing black-box (API) tests, unit tests, stress tests, and now UI tests. We've replaced our perpetually delayed biweekly QA+release cycles with daily releases by implementing pre-merge testing (automated and manual QA) and code reviews using Github Enterprise. We put in place a staggered release process from development to canary to production environments. All this helps us release faster, with fewer bugs.

By all accounts, it's been a success. But it's easy to get lost in the race for higher test coverage. Testing itself is not the end goal -- code correctness is! All the tests and processes above are in place to try to catch bugs in our code. But can one write code in a way that doesn't contain bugs in the first place? Can you prove that code is correct without testing it?

When Code Coverage Isn't Enough

Let's say you want to write a log parser, where each line of the log is in this format "[time] message". You start by writing a Java function that takes a line of input and produces a Record object.

static Record parseRecord(String input) {  
    int i = input.indexOf('[');
    int j = input.indexOf(']');

    String time = input.substring(i+1, j);
    String message = input.substring(j+1);

    return new Record(time, message.trim());
}

You also write a simple unit test for it. It passes, the code coverage is 100%, and you're all good right?

@Test
void testParseRecord() {  
    Record r = parseRecord("[15:30] foo");
    assertEquals(r.getTime(), "15:30");
    assertEquals(r.getMessage(), "foo");
}

Well, the first thing a sharp QA mind would try is: What about null input? Or input that doesn't contain any square brackets? It throws an exception! At this point, the developer might protest, "Well this function is only meant to read lines of a specific format. They should never run on such bad input!"

There are two problems here:

  1. We've not defined what kind of inputs this function is designed to accept. It is under-specified.
  2. Code coverage was 100% because the one unit test written, testParseRecord, exercises all 5 lines of the function parseRecord. If we accept that the invalid input handling is a bug despite the coverage, then how do we know when we've sufficiently tested all cases?

In my opinion, the solution to both is better code specifications. OK, I hear your groans already. But I'm not talking about the high-level feature requirements document that you have to wait for a product manager to write up. I'm talking about the short, quick documentation that any developer can write for every low-level unit of code (i.e. function, method, class).

Problem #1 can be solved directly by specifying the kinds of inputs parseRecord() should accept (we'll get back to this later). What about problem #2? Well, we can use the well-specified behaviour of the methods that our code calls - like String.indexOf() and String.substring() - to do a code analysis.

Using Assertions to Verify Code

Disclaimer: I am not an expert in formal methods for verifying software. I will however show you one way to do it. The method is to write assertions in between each line of code. The assertions are logical statements (not code that runs), and they must always be true at that point of the code's execution.

The first assertion we write will be at the entrance of the function, before any lines of code are executed. Thanks to Java being a strictly-typed language, we know there is only one local variable input, and it is of type String. We'll now write the most obviously true statement ever:

static Record parseRecord(String input) {  
    // ASSERT: input is null OR input is not null

The goal of code analysis is to narrow down the state space as much as possible. Already we can see that input being null is a problematic state. This guides us to writing some extra bit of real code, after which we can write some more useful assertions:

    if (input == null) {
        throw new IllegalArgumentException("input may not be null");
    }
    // ASSERT: input is not null
    // ASSERT: 0 <= input.length() <= Integer.MAX_VALUE

I took the liberty of asserting the value of input.length() based on what we know about Java Strings. They can never be shorter than 0 or longer than the max value of a signed integer (231 - 1). A lot of assertions will state obvious things. Get used to that.

For the next line of code, let's read the JavaDoc for String.indexOf(int ch):

Returns: the index of the first occurrence of the character in the character sequence represented by this object, or -1 if the character does not occur.

We know something about String indexes: they can range from 0 to the length of the string minus 1, inclusive. Combining this information with the assertion about input.length() above, we can write the next two assertions:

    int i = input.indexOf('[');
    // ASSERT: -1 <= i < input.length() <= Integer.MAX_VALUE

    int j = input.indexOf(']');
    // ASSERT: -1 <= j < input.length() <= Integer.MAX_VALUE

So far, so good.

Working Backwards

For the next two lines, we'll work backwards. Why? Reading the documentation, we find that the two substring methods we use can throw IndexOutOfBoundsException!

substring(int beginIndex, int endIndex):

Throws: IndexOutOfBoundsException - if the beginIndex is negative, or endIndex is larger than the length of this String object, or beginIndex is larger than endIndex

substring(int beginIndex):

Throws: IndexOutOfBoundsException - if beginIndex is negative or larger than the length of this String object.

Let's say that we don't want to throw IndexOutOfBoundsException from our function. (That's not to say exceptions can't be thrown, but we only want to throw when necessary, and even then, only throw specific exceptions that tell the caller exactly what went wrong.)

So how do we ensure that input.substring() does not throw that exception? We can write a requirement by substituting in our values for beginIndex and endIndex parameters:

// REQUIRE: 0 <= i+1 <= j <= input.length()
String time = input.substring(i+1, j);  
// REQUIRE: 0 <= j+1 <= input.length()
String message = input.substring(j+1);  

Let's break that first requirement down into 3 parts:

  • REQUIRE: 0 <= i+1 is SATISFIED by ASSERT -1 <= i
  • REQUIRE: i+1 <= j is NOT SATISFIED
  • REQUIRE: j <= input.length() is SATISFIED by ASSERT j < input.length()

Similarly, breaking down the second requirement:

  • REQUIRE: 0 <= j+1 is SATISFIED by ASSERT -1 <= j
  • REQUIRE: j+1 <= input.length() is SATISFIED by ASSERT j < input.length()

The requirement i+1 <= j is not satisfied, which means some inputs could throw IndexOutOfBoundsException from the code input.substring(i+1, j). Example: when input is "x][foo", then i is 2, i+1 is 3, and j is 1.

How to solve?

Once again, the failed requirement guides us to the solution. We need to make i+1 <= j, which means we should rewrite the code that assigns j.

Let's use int indexOf(int ch, int fromIndex):

Returns:
the index of the first occurrence of the character in the character sequence represented by this object that is greater than or equal to fromIndex, or -1 if the character does not occur.

So if we pass i+1 as fromIndex, then we get:

    int j = input.indexOf(']', i+1);
    // ASSERT: i+1 <= j OR j == -1

Well, that almost works. If j is -1, then it's always going to be smaller than i+1 because we know -1 <= i from our assertion before, and j = -1 <= i < i+1, i.e. j < i+1, which fails our requirement.

Let's add another check.

    int j = input.indexOf(']', i+1);
    // ASSERT: i+1 <= j OR j == -1

    if (j == -1) {
        throw new InvalidFormatException("] not found");
    }
    // ASSERT: i+1 <= j

Now we've got what we need!

Putting it all together, the final annotated code (up to before the return statement) might look like this:

static Record parseRecord(String input) {  
    // ASSERT: input is null OR input is not null

    if (input == null) {
        throw new IllegalArgumentException("input may not be null");
    }
    // ASSERT: input is not null
    // ASSERT: 0 <= input.length() <= Integer.MAX_VALUE

    int i = input.indexOf('[');
    // ASSERT: -1 <= i < input.length()

    int j = input.indexOf(']', i+1);
    // ASSERT: j < input.length()
    // ASSERT: i+1 <= j OR j == -1

    if (j == -1) {
        throw new InvalidFormatException("] not found");
    }
    // ASSERT: i+1 <= j < input.length()

    String time = input.substring(i+1, j);
    // ASSERT: No exception thrown by above line
    // ASSERT: time is not null AND time.length() == j - (i+1)

    // ASSERT: j+1 <= input.length()

    String message = input.substring(j+1);
    // ASSERT: No exception thrown by above line
    // ASSERT: message is not null AND message.length() == input.length() - (j+1)

    return new Record(time, message);
}

We've now modified the code and proved that it will never throw any IndexOutOfBoundsException from the calls to substring(). But is it correct? Astute readers will notice that we never added a check for i == -1. This means input such as "7:00] hello" would be accepted, even though it doesn't match the expected format.

Why did our analysis miss this? The reason is that we only tried to prove a certain exception wouldn't get thrown. We didn't try to prove anything else. Actually, we didn't know what to prove yet, because we didn't specify what this function is supposed to do. That leads us back to problem #1 at the top of this page...

Specifying the Behaviour

During our exercise, we already started to make decisions on what exceptions the function should throw. Specifically, it should never throw IndexOutOfBoundsException but it could throw InvalidFormatException or IllegalArgumentException. This should be written into the function specification.

As we start to define what the function should do in non-error cases, we find that the original function parseRecord() did two separate things: it did some string parsing to find two substrings, and then it created a Java object out of them. It's easier to write a spec when the function only does one thing. So let's extract the first string parsing part into its own function findInBrackets().

That specification might look something like this (in Javadoc style):

/**
 * Looks for the first occurrence of [.*] in the string, and returns the 
 * substring between the [ and ] characters.
 * @param input - a string to parse
 * @throws IllegalArgumentException if input is null
 * @throws InvalidFormatException if the '[' character is not found or 
 *         the ']' character is not found anywhere to the right of the 
 *         first '['.
 * @returns a substring of input, specifically the string that is 
 *          in between the first '[' symbol and the first ']' symbol 
 *          that follows that '[' in the input string.
 */
String findInBrackets(String input);  

This is now a very specific specification! It tells us exactly which inputs it accepts, and it tells us what happens if you feed it something it doesn't accept. It also tells us exactly what it returns. With this specification, it's easy to write the correct code.

It's now clear that we should add code to follow the specification "throws InvalidFormatException if the '[' character is not found," which was missing. It looks like this:

if (i == -1) {  
    throw new InvalidFormatException("[ not found");
}

The spec "throws InvalidFormatException if ... the ']' character is not found anywhere to the right of '['" is already handled by the if (j == -1) check.

Now we have a provably correct function, and we did it all without writing or running another test. (Fun fact: while writing this post, I never once compiled or ran the code!) Of course, you should still write tests. A nice bonus is that the spec tells us exactly what the expected result of each test should be. Here are some cases that I came up with in 30 seconds:

  • "abc [ foo] bar" -> " foo"
  • "[a[b[c[d]e]f]g]" -> "a[b[c[d"
  • "]x[" -> InvalidFormatException
  • "][]" -> ""
  • "abc (xyz)" -> InvalidFormatException
  • null -> IllegalArgumentException

Addressing Questions

Writing assertions looks like a lot of work. Can't a tool automate this for me?

Probably! Static analysis tools do exactly this. They can analyze your code to narrow down all the possible return values and alert you if you end up trying to dereference a null pointer for example. But keep in mind it has the same limitations shown above in Using Assertions to Verify Code. A tool doesn't know what your intentions are, and it wouldn't catch the i == -1 case either. You still need a human to design the specifications (which I think is a good thing!)

Personally, I mainly use assertions in a particularly long, complex, or critical piece of code.

As an experienced coder, I already know how to avoid those traps with indexOf and substring.

Great! It's probably easy because you know those methods well. It's made easier because those String library methods are well-documented and well-defined! How many times have you run into methods in your own development work that aren't so well-defined? Here's an example inspired by true events:

buildQuery(Map<Key, Collection<Value>> filters)  

This function returns a query for data that matches the provided filters. What kind of query do you get when filters is an empty map? Some callers might assume the query returns no data (nothing matches), while others might think it returns all data (nothing is filtered out). Without clear specifications, it's anybody's guess as to how the buildQuery() method should handle an empty input. Or a null input. Or an input where a key maps to an empty set vs a null set. Pass the result of this to a method like deleteByQuery(buildQuery(filters)) and there's a nasty bug waiting to happen.

What ends up happening is that a developer who wants to use this method has to go to great lengths to read the actual code to find out what it does, or spend time testing code that they didn't write, or both. (So do the code reviewers.) All this wasted time could be avoided with a few short sentences in the method's documentation...

Takeaways

  • Don't let code coverage fool you into thinking your tests are complete; you want tests that cover all possible states, not just all codepaths.
  • Writing the specification first can help you write better code.
  • A good specification makes writing test cases trivial.
  • A good specification makes the lives of developers who use your code easier.
  • Assertions can help you reason about what cases can or cannot occur, (hopefully) leading to bug-free code!

Happy coding!

Discuss on Hacker News

Vena is hiring in Toronto!
Learn about our culture, if you think you're a good fit, apply!