Tuesday 12th May 2015 5.49pm

Tue, May 12, 2015

As part of publicising my C++ Now 2015 talk next week, here is part 10 of 20 from its accompanying Handbook of Examples of Best Practice for C++ ¹¹⁄₁₄ (Boost) libraries:

10. DESIGN/QUALITY: Consider breaking up your testing into per-commit CI testing, 24 hour soak testing, and parameter fuzz testing

When a library is small, you can generally get away with running all tests per commit, and as that is easier that is usually what one does.

However as a library grows and matures, you should really start thinking about categorising your tests into quick ones suitable for per-commit testing, long ones suitable for 24 hour soak testing, and parameter fuzz testing whereby a fuzz tool will try executing your functions with input deliberately designed to exercise unusual code path combinations. The order of these categories generally reflects the maturity of a library, so if a library's API is still undergoing heavy refactoring the second and third categories aren't so cost effective. I haven't mentioned the distinction between unit testing and functional testing and integration testing here as I personally think that distinction not useful for libraries mostly developed in a person's free time (due to lack of resources, and the fact we all prefer to develop instead of test, one tends to fold unit and functional and integration testing into a single amorphous set of tests which don't strictly delineate as really we should, and instead of proper unit testing one tends to substitute automated parameter fuzz testing, which really isn't the same thing but it does end up covering similar enough ground to make do).

There are two main techniques to categorising tests, and each has substantial pros and cons.

The first technique is that you tag tests in your test suite with keyword tags, so "ci-quick", "ci-slow", "soak-test" and so on. The unit test framework then lets you select at execution time which set of tags you want. This sounds great, but there are two big drawbacks. The first is that each test framework has its own way of doing tags, and these are invariably not compatible so if you have a switchable Boost.Test/CATCH/Google Test generic test code setup then you'll have a problem with the tagging. One nasty but portable workaround I use is to include the tag into the test name and then using a regex test selector string on the command line, this is why I have categorised slashes in the test names exampled in the section above so I can select tests by category via their name. The second drawback is that you will find that tests often end up internally calling some generic implementation with different parameters, and you have to go spell out many sets of parameters in individual test cases when one's gut feeling is that those parameters really should be fuzz variables directly controlled by the test runner. Most test frameworks support passing variables into tests from the command line, but again this varies strongly across test frameworks in a way hard to write generic test code, so you end up hard coding various sets of variables one per test case.

The second technique is a hack, but a very effective one. One simply parameterises tests with environment variables, and then code calling the unit test program can configure special behaviour by setting environment variables before each test iteration. This technique is especially valuable for converting per-commit tests into soak tests because you simply configure an environment variable which means ITERATIONS to something much larger, and now the same per-commit tests are magically transformed into soak tests. Another major use case is to reduce iterations for when you are running under valgrind, or even just a very slow ARM dev board. The big drawback here is the self deception that just iterating per commit tests a lot more does not a proper soak test suite make, and one can fool oneself into believing your code is highly stable and reliable when it is really only highly stable and reliable at running per commit tests, which obviously it will always be because you run those exact same patterns per commit so those are always the use patterns which will behave the best. Boost.AFIO is 24 hour soak tested on its per-commit tests, and yet I have been more than once surprised at segfaults caused by someone simply doing operations in a different order than the tests did them :(

Regarding parameter fuzz testing, there are a number of tools available for C++, some better or more appropriate to your use case than others. The classic is of course http://ispras.linuxbase.org/index.php/API_Sanity_Autotest, though you'll need their ABI Compliance Checker working properly first which has become much easier for C++ 11 code since they recently added GCC 4.8 support (note that GCC 4.8 still has incomplete C++ 14 support). You should combine this with an executable built with, as a minimum, the address and undefined behaviour sanitisers. I haven't played with this tool yet with Boost.AFIO, though it is very high on my todo list as I have very little unit testing in AFIO (only functional and integration testing), and fuzz testing of my internal routines would be an excellent way of implementing comprehensive exception safety testing which I am also missing (and feel highly unmotivated to implement by hand).

http://cppnow2015.sched.org/event/37beb4ec955c082f70729e4f6d1a1a05#.VUuMqvkUUuU