Improve your code with Testing


Any developer of scientific software has, at some point, been in the horrible situation of having a bug in the code and no idea how to fix it.

The situation is always made worse by the lack of time. In this article, we will show you how to avoid such a situation, but since errors and bugs happen anyway, we will also provide a simple, structured approach to explore the problem and eventually find the bug.

Using tests is fundamental to quickly and efficiently resolve issues in your codebase. Several approaches are possible and each fits a different use case.

In this blog post, we will start by introducing what is meant by testing code, then we will explore two different kinds of tests, and finally we will look at different ways to ensure that your code works.

We will focus on good methodologies for scientists, while briefly covering the good methodologies for engineers.

The suggestions in this post are easy to apply if you have followed the two previous posts, so make sure to write “good” functions and to respect the Single Responsibility Principle (SRP).

What is a test

We consider a test to be any automatic procedure that verifies the correctness of a computer program.

The important part is “automatic”. If you run your software and manually verify that the output is correct, this is a manual test and we are not talking about those now.

Manual tests can be very important, but they are carried out by humans, and humans make mistakes, check the results too quickly, or don't check everything every time. The value of automatic tests is precisely that they are carried out by computers, which don’t make the same errors as humans.

There are two main kinds of tests: integration tests and unit tests.

Integration Tests

Integration tests are the slowest and the most difficult type of test to write correctly. For small programs or analyses they are often overkill, since they can take a long time to run and are not trivial to write correctly.

The goal of an integration test is to verify that the whole pipeline of the program works. Let’s look at an example.

Suppose you want to test a simple program that reads a text file, counts the occurrences of each word in the text, and writes the result to a sorted output text file in CSV format.

So if we pass as input a file that contains the text “aaa bbb ccc aaa bbb aaa”, the output file would be:

3,aaa
2,bbb
1,ccc

A possible integration test for this program would be to create a file and fill it with data such that we know the occurrences of each word in advance (after all, we created the file ourselves). Then we set up a script that runs our program against the known data set and checks the result. If the result is the one we expect, the test passes; if it is different, the test fails and our program has some problem.
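The procedure described above could be sketched as follows. This is only an illustration under stated assumptions: the function name count_words_in_file, the file names, and the sorting rule (descending count, then alphabetical) are hypothetical and are not taken from the actual program.

```python
import csv
import tempfile
import unittest
from collections import Counter
from pathlib import Path


def count_words_in_file(input_path, output_path):
    """Hypothetical entry point: read text, count words, write a sorted CSV."""
    words = Path(input_path).read_text().split()
    counts = Counter(words)
    # Assumed ordering: descending count, then alphabetical, so output is deterministic.
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    with open(output_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for word, count in rows:
            writer.writerow([count, word])


class CountWordsIntegrationTest(unittest.TestCase):
    def test_known_input_gives_known_output(self):
        with tempfile.TemporaryDirectory() as tmp:
            input_path = Path(tmp) / "input.txt"
            output_path = Path(tmp) / "output.csv"
            # We create the input ourselves, so the expected counts are known.
            input_path.write_text("aaa bbb ccc aaa bbb aaa")

            # Run the whole pipeline: file in, file out.
            count_words_in_file(input_path, output_path)

            with open(output_path, newline="") as handle:
                rows = list(csv.reader(handle))
            # Values read back from a CSV file are strings.
            self.assertEqual(rows, [["3", "aaa"], ["2", "bbb"], ["1", "ccc"]])
```

Note that the test only touches the program through its input and output files, which is what makes it an integration test rather than a unit test.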

It is possible to find problems such as wrongly computed counts this way, but these kinds of tests are most useful for discovering integration problems between services. Maybe you were expecting to read a CSV while the input file has no commas. Or maybe you write the count as an integer but it should be a float.

Integration tests are less useful for typical scientific code, but they become very valuable when your software is more complex, talks to several databases, or communicates with different services.

A possible application of these tests is in big teams, or where a lot of data preparation is necessary. Following the SRP, you can write integration tests to make sure that your software correctly accepts all the possible inputs and correctly generates all the kinds of output.

Unit Tests

Unit tests are the most useful in a scientific environment. They are simple to write, quick to set up, and fast to execute, and at the same time they provide a great way to make sure that your software is correct.

The goal of a unit test is to make sure that a single “unit” of your software works. The underlying assumption is that if all the units of the software are correct, the whole software is correct. That is a big assumption to make, but unit tests are still extremely useful and powerful.

Let’s go back to our simple program that counts the occurrences of each word. This program will most likely have a function split_string_into_words that, given some text as a string, returns a list of the words in it.

A possible unit test would check this behavior with a test like this:

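import unittest

# split_string_into_words is the function under test, defined in the program itself.
class CountWordTest(unittest.TestCase):
    def test_tokenizer(self):
        words = split_string_into_words("aaa bbb ccc")
        self.assertEqual(words, ["aaa", "bbb", "ccc"])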

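Similarly, the program may have a function count_word_in_list that, given a list of words, returns a dictionary with the number of occurrences of each word, and we can test it in the same way.

class CountWordTest(unittest.TestCase):
    def test_count_word_in_list(self):
        # Each word must map to its number of occurrences.
        word_in_list = count_word_in_list(["aaa", "bbb", "ccc", "aaa", "aaa"])
        self.assertEqual(word_in_list, {"aaa": 3, "bbb": 1, "ccc": 1})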
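To try the two tests above, they need functions to run against. The following is a minimal, self-contained sketch; the bodies of split_string_into_words and count_word_in_list are assumptions made for illustration, not the actual implementation of the program.

```python
import unittest
from collections import Counter


def split_string_into_words(text):
    # Assumed implementation: split on any run of whitespace.
    return text.split()


def count_word_in_list(words):
    # Assumed implementation: map each word to its number of occurrences.
    return dict(Counter(words))


class CountWordTest(unittest.TestCase):
    def test_tokenizer(self):
        words = split_string_into_words("aaa bbb ccc")
        self.assertEqual(words, ["aaa", "bbb", "ccc"])

    def test_count_word_in_list(self):
        word_in_list = count_word_in_list(["aaa", "bbb", "ccc", "aaa", "aaa"])
        self.assertEqual(word_in_list, {"aaa": 3, "bbb": 1, "ccc": 1})
```

Saved in a file such as test_count_words.py (a hypothetical name), the tests can be run from the same directory with python -m unittest test_count_words.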

I believe unit tests are the sweet spot for scientific code. They are a great tool to debug your code and make sure that it stays bug-free, even in large collaborations.

Now that we know the two main classes of tests, let’s focus on unit tests, which are more useful in the scientific environment, and see how they can be used. We will start by exploring Test Driven Development (TDD). Then we will see how to use tests to easily debug our code and make sure it stays bug-free.

Test Driven Development (TDD)

When developing software using TDD, we start by writing the test for our function before there is even a real implementation of the function itself.

We run the test to make sure that the test we just wrote does not pass. After all, we haven’t implemented the function yet, so if the test succeeds there is something wrong.

Only when we are sure that the test fails do we start implementing the function, in the simplest possible way that makes the test pass. When the test finally passes, we keep writing other tests, adding assumptions about the behavior of our functions.

This process produces a large number of tests that cover all the assumptions in your code. This is very useful when you need to change your code. After each change, you can re-run all the tests to make sure that you didn’t break anything.
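The cycle described above can be sketched in a single script. The stub, the test data, and the helper run_tests are assumptions introduced for illustration; in practice the two steps would be two successive versions of the same file.

```python
import unittest


def count_word_in_list(words):
    # Step 1: only a stub exists, so the test below cannot pass yet.
    raise NotImplementedError


class CountWordTest(unittest.TestCase):
    def test_count_word_in_list(self):
        counts = count_word_in_list(["aaa", "bbb", "aaa"])
        self.assertEqual(counts, {"aaa": 2, "bbb": 1})


def run_tests():
    # Hypothetical helper: load and run the test case, returning the result object.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountWordTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


# First run: we check that the new test does NOT pass.
first_run = run_tests()


def count_word_in_list(words):
    # Step 2: the simplest implementation that makes the test pass.
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


# Second run: the same test now passes.
second_run = run_tests()
```

Checking that the first run really fails is what tells us the test exercises the function, rather than passing by accident.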

However, it does require a lot of work that is not always necessary.

TDD is extremely useful when developing libraries that other people will use. All the tests help make sure that the library is stable. Moreover, those tests can also serve as a form of documentation: a poor one, but still better than nothing.

TDD helps to keep your library free of known bugs; indeed, you should commit your code only when all the tests pass.

Debugging with tests

Another approach that I find more useful when developing scientific software is to use tests for debugging.

If you have written good functions and followed the SRP, this approach will come naturally and will be very easy to implement.

As long as you have not found any problem in your code, you can ignore tests and testing. However, as soon as you need to debug, instead of manually trying different inputs, you write a test to verify your assumption.

For example, suppose it turns out that our program to count words is buggy: with the input aaa bbb aaa,ccc it returns:

1,aaa
1,bbb
1,aaa,ccc

At this point it is worth writing a test: does the function split_string_into_words return the correct output when the input is aaa bbb aaa,ccc?

class CountWordTest(unittest.TestCase):
    def test_tokenizer_with_comma(self):
        words = split_string_into_words("aaa bbb aaa,ccc")
        self.assertEqual(words, ["aaa", "bbb", "aaa", "ccc"])

############ RUN TEST ##############

 FAIL: test_tokenizer_with_comma (__main__.CountWordTest)
 Traceback (most recent call last):
   File "", line 4, in test_tokenizer_with_comma
     self.assertEqual(words, ["aaa", "bbb", "aaa", "ccc"])
 AssertionError: Lists differ: ['aaa', 'bbb', 'aaa,ccc'] != ['aaa', 'bbb', 'aaa', 'ccc']
 First differing element 2:
 'aaa,ccc'
 'aaa'
 Second list contains 1 additional elements.
 First extra element 3:
 'ccc'
 - ['aaa', 'bbb', 'aaa,ccc']
 + ['aaa', 'bbb', 'aaa', 'ccc']
 ?                    + ++

Great, we have found our problem. We can change our function, re-run the test to make sure that everything works, and then try again to run the whole program.

def split_string_into_words(s):
    # Treat commas as separators, then split on any run of whitespace.
    return s.replace(",", " ").split()

############ RUN TEST ##############

 Ran 1 test in 0.000s

Now we have fixed our code and added a new test. The new test makes sure that, even if we change the implementation of split_string_into_words, the same bug won’t reappear.

If the code is more complex, tests provide a structured way to find a problem.

Start by writing a test for the big function and make sure it fails, then test each smaller function that it calls until you find the one that fails. At this point, either fix that function or, if it is still too complex, write more tests until you find the bug. Let's see an example where the function foo() is broken.
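To illustrate how the test protects us during later changes, here is a sketch in which split_string_into_words is rewritten with a regular expression and the same tests are run again. The regex-based body is a hypothetical alternative chosen for illustration, not the implementation used in this post.

```python
import re
import unittest


def split_string_into_words(s):
    # Hypothetical rewrite: split on any run of commas and/or whitespace,
    # dropping the empty strings that leading or trailing separators produce.
    return [word for word in re.split(r"[,\s]+", s) if word]


class CountWordTest(unittest.TestCase):
    def test_tokenizer(self):
        words = split_string_into_words("aaa bbb ccc")
        self.assertEqual(words, ["aaa", "bbb", "ccc"])

    def test_tokenizer_with_comma(self):
        # Regression test: this is the input that exposed the original bug.
        words = split_string_into_words("aaa bbb aaa,ccc")
        self.assertEqual(words, ["aaa", "bbb", "aaa", "ccc"])


# Re-run every test after the change to confirm nothing broke.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CountWordTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If a future rewrite forgot about commas, test_tokenizer_with_comma would fail immediately and point straight at the regression.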

def foo():
    a = make_a()
    b = get_b_from_a(a)
    c = generate_c(b)

At this point, you can test the function make_a. If the test for make_a passes, you can move on and test get_b_from_a, and this time the test fails. Great, now we can look into it and write the necessary tests.

def get_b_from_a(a):
    h = create_h(a)
    return buggy_find_b_in_h(h)

At this point, you can write a test for the create_h function and make sure it works as expected and, if the test passes, you can move on and test the suspicious buggy_find_b_in_h.

Debugging with tests is the sweet spot for scientific software. It allows you to write code very quickly while providing a structured approach to debugging and to preventing the same bugs from coming back.


This article builds on top of the other two: if you have written good functions and respected the Single Responsibility Principle, writing tests and debugging with them will come naturally, and it will help you a lot in finding bugs and making sure they don’t come back.

Other approaches to testing are possible. If you would like to hear more, simply comment on this post!