Recently, some controversy (see, for example, here) erupted around a mistake made in the OpenSSL library used by the Debian project. The mistake was traced back to this change. The various comments hint at problems on several layers which led to this mistake, but I can’t help thinking that two basic practices would have gone a long way towards avoiding this problem.
Looking at the changes and the surrounding code, there is simply no hint, such as a comment, that tells you what is happening and why the line of code is important.
Now, I don’t like to comment the obvious; many style guidelines ask for far too much commenting when the code is quite obvious. But in this case, several good practices were not employed:
- Using an obvious self-documenting procedure name
- Adding a warning to crucial code lines or code ordering
- Commenting in detail (or providing a URL)
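To make the three practices concrete, here is a minimal sketch. It is purely illustrative and is not the OpenSSL code: the function name, the pool handling and the use of SHA-256 are all assumptions made up for this example.

```python
import hashlib


def mix_fresh_bytes_into_entropy_pool(pool: bytes, fresh_bytes: bytes) -> bytes:
    """Return a new pool state that depends on both the old pool and fresh_bytes.

    Hypothetical illustration only; this is not OpenSSL's algorithm.
    A real project would link to its design notes here (URL left out on purpose).
    """
    # WARNING: do not remove or reorder the next line, even if an analysis
    # tool complains about it. Without it, fresh_bytes never reaches the pool
    # and the result depends on the old pool alone.
    combined = pool + fresh_bytes
    return hashlib.sha256(combined).digest()
```

The name says what the procedure does, the warning sits right on the line that must not be touched, and the docstring is the place for the detailed explanation or a link to it.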
If you’re implementing a complex algorithm, you need documentation somewhere. The lessons of Literate Programming, as exemplified in Donald Knuth’s TeX program, seem to have fallen on deaf ears. But Knuth at least laid down a challenge: he would pay out money for each verified TeX bug. (It didn’t bankrupt him. Firstly, the error rate per kLOC was very, very low and, secondly, people treasured a cheque signed by him so much that they preferred to frame it rather than cash it!)
The cause of the mistake also touches on other areas, among them testing. The problem with testing algorithms like the OpenSSL one is that you sometimes need to test a lot of combinations. And I mean A Lot! In his recent talk at the Cambridge BCS meeting (see my review), Peyton Jones showed an example where the error only revealed itself after running several hundred different data inputs for the same test case! We usually don’t go anywhere near that length to test our code. But sometimes you need to, usually by generating random data to cover as many possibilities as you can and to avoid the bias of excluding certain cases (we humans are often good at rationalising away potential sources of error, à la “That can never happen”).
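As a rough sketch of what testing with random data can look like, the snippet below compares a deliberately flawed function against a trusted reference on many random inputs. Everything in it is an assumption for illustration: `is_leap_year_hypothetical` and `find_counterexample` are made-up names, and the example has nothing to do with the talk or with OpenSSL.

```python
import calendar
import random


def is_leap_year_hypothetical(year: int) -> bool:
    # Deliberately flawed for illustration: forgets the "divisible by 400" rule.
    return year % 4 == 0 and year % 100 != 0


def find_counterexample(candidate, reference, trials=5000, seed=None):
    """Return the first random year on which the two functions disagree, or None."""
    rng = random.Random(seed)  # a fixed seed makes a failing run reproducible
    for _ in range(trials):
        year = rng.randint(1, 9999)
        if candidate(year) != reference(year):
            return year
    return None


if __name__ == "__main__":
    # calendar.isleap from the standard library serves as the reference.
    print(find_counterexample(is_leap_year_hypothetical, calendar.isleap, seed=1))
```

A handful of hand-picked years such as 1996, 1999, 2004 and 1900 all pass. Only the rare multiples of 400 expose the flaw, which is exactly the kind of case we tend to rationalise away when choosing test data by hand.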