Jotting #17: Domains, Values and Null

2009-04-02

I always find NullPointerExceptions a real pain in the neck. And often they shouldn’t occur since the object in question shouldn’t be null. But languages like the {}-family (C, C++, C#, Java, …) make it difficult or even impossible to guarantee that null values cannot occur. The situation is very different in languages like Haskell, where a value can only be absent if its type says so (e.g., Maybe).

Let’s take an example of a particular class, viz. String. The String domain is the set of all possible strings including the empty string “”. And null! Since

String x = null;

is a valid statement. But in my experience, I can’t remember a single case where I really needed or wanted to distinguish between null and the empty string.

A similar example applies to List, Set or Map: the empty list, set or map is perfectly fine, and null is not needed.

In (nearly) all cases, I would prefer to know that a null object is not an option. Ever. It would make reasoning about possible cases so much easier, and a lot of defensive code could be removed. In my recent projects we always agreed never to return a null list (set, map) but to return an empty one instead.
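
A minimal sketch of that agreement, assuming a hypothetical OrderBook class (the class and method names are made up for illustration, not taken from any of those projects):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class; the names are illustrative only.
class OrderBook {
    private final Map<String, List<String>> ordersByCustomer = new HashMap<>();

    void addOrder(String customer, String order) {
        List<String> orders = ordersByCustomer.get(customer);
        if (orders == null) {
            orders = new ArrayList<>();
            ordersByCustomer.put(customer, orders);
        }
        orders.add(order);
    }

    // Never returns null: an unknown customer simply has no orders.
    List<String> ordersFor(String customer) {
        List<String> orders = ordersByCustomer.get(customer);
        if (orders == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(orders);
    }

    public static void main(String[] args) {
        OrderBook book = new OrderBook();
        book.addOrder("alice", "order-1");
        // Callers can iterate without a null check in either case.
        for (String customer : new String[] {"alice", "bob"}) {
            System.out.println(customer + ": " + book.ordersFor(customer).size() + " order(s)");
        }
    }
}
```

The null check still exists, but only once, inside the class; every caller is spared it.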

Value objects should have a defined domain, and that definition must include a decision on whether null is an acceptable member (usually it shouldn’t be).
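
As a rough sketch of what such a decision can look like in Java, here is a hypothetical CustomerName value object whose domain is "any string, including the empty one, but never null". The class name and the choice of IllegalArgumentException are my assumptions for the example:

```java
// Hypothetical value object: its domain is every string except null.
final class CustomerName {
    private final String value;

    CustomerName(String value) {
        // The domain decision is enforced once, at construction time.
        if (value == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        this.value = value;
    }

    String value() {
        return value;
    }

    boolean isEmpty() {
        return value.isEmpty();
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CustomerName && value.equals(((CustomerName) other).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}
```

Code that receives a CustomerName can then rely on value() being non-null without checking again.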

As an interesting side note, Tony Hoare has admitted that the introduction of null was a big mistake. Hopefully I will be able to listen to his talk later this year. Others like C. J. Date have long argued against null values in database tables, partly because they force three-valued logic upon you (unlike the better defined two-valued logic of true and false).


Jotting #15: Eclipse Tips and Moans

2009-02-02

Some Eclipse tips and moans that I’ve experienced on and off.

I like to start Eclipse with the option -showlocation; it helps to identify which workspace I am working in, especially when I need to work on two versions (trunk and branch).

Workspaces

Just upgraded to Eclipse 3.4 (Ganymede) and had to get rid of a wrongly created workspace. While it is easy to move between workspaces, many have commented that it is difficult to get rid of them.

In Eclipse 3.4, open the file <Eclipse_HOME>/configuration/.settings/org.eclipse.ui.ide.prefs and remove the unwanted workspace from the value of the key RECENT_WORKSPACES. Voila. Done. Thanks.

Working Sets

I sometimes like to group projects into working sets, especially in workspaces with many Eclipse projects; but in Eclipse these sets are somewhat second class objects:

  • many menu options don’t work on sets
  • can’t export/import sets

I would like to export sets because when I branch my code it would be nice to carry things over to the branch.

Bookmarks

Similarly, you cannot export/import bookmarks. What a shame/hassle/waste-of-time. My current workaround is to take a snapshot with FastStone Capture (version 4.8, great little helper app!) and keep the picture.


Jotting #14: Commenting – sometimes it’s crucial

2008-06-01

Recently, some controversy (see, for example, here) erupted around a mistake made in the OpenSSL library used by the Debian project. The mistake was traced back to this change. The various comments hint at problems on several layers which led to this mistake, but I can’t help thinking that two basic practices would have gone a long way to avoid this problem.

Comments

Looking at the changes and the surrounding code, there is just no hint, viz. comment, that tells you what is happening and why the line of code is important.

Now, I don’t like to comment the obvious; many style guidelines ask for far too much commenting when the code is quite obvious. But in this case, several good practices were not employed:

  • Using an obvious self-documenting procedure name
  • Adding a warning to crucial code lines or code ordering
  • Commenting in detail (or providing a URL)

If you’re implementing a complex algorithm, you need documentation somewhere. The lessons of Literate Programming, as exemplified in Donald Knuth’s TeX programme, seem to have fallen on deaf ears. But Knuth at least laid down a challenge: he would pay out money for each verified TeX bug. (It didn’t bankrupt him. Firstly, the error rate per kLOC was very, very low and, secondly, people treasured a cheque signed by him so much that they preferred to frame it rather than cash it!)

Testing

The cause of the mistake also touches other areas, among them testing. The problem with testing algorithms like the OpenSSL one is that you sometimes need to test a lot of combinations. And I mean A Lot! In his recent talk at the Cambridge BCS meeting (see my review), Simon Peyton Jones showed an example where the error only revealed itself after running several hundred different data inputs for the same test case! We usually don’t go anywhere near that length to test our code. But sometimes you need to, usually by generating random data: this covers as many possibilities as possible and avoids the bias of excluding certain cases (we humans are often good at rationalising away potential sources of error à la “That can never happen”).


Review: Simon Peyton Jones on Type-driven Testing in Haskell

2008-03-13

Simon Peyton Jones gave a talk at the Cambridge BCS-SPA group on testing with functional languages (esp. Haskell). (Someone’s already posted the video and slides; careful it’s large!).

Some important points:

  • Future of programming will be about “Control of (Side) Effects”
  • Programming languages will become more functional than imperative
  • Purity is good for understanding, verification, maintenance
  • Purity pays back in performance, parallelism, testing
  • Functional/value-oriented is easier to test than object-oriented stateful
  • Functional is good for generating tests (domain-specific language)

After a short intro into Haskell (10 min Haskell 101:)) SPJ moved on to testing in Haskell. In his demo, he tested a programme that packs 7-bit words into 8-bit bytes, so that eight ASCII characters take up only 7 bytes instead of 8. This sort of space saving is done in SMS, where bandwidth is precious.

One fundamental test tried to assert that unpack(pack(x))==x. After testing some hand-written cases, which succeeded, the test switched to randomly generated words and started to fail after a few hundred attempts. Due to its random nature, the number of cases needed varied from run to run, but typically it failed after fewer than 1000 cases. (It turned out that words of 8-byte length ending in a particular bit sequence were not correctly packed.)
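
To give a feel for this kind of round-trip test, here is a rough Java sketch (the talk itself used Haskell, and this is not SPJ's code). The class SeptetPacking, the bit layout and the decision to pass the word count to unpack explicitly are all my assumptions for the example:

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: packs 7-bit values into bytes and checks the round trip with random data.
class SeptetPacking {

    // Packs values in the range 0..127 into bytes, least significant bit first.
    static byte[] pack(int[] septets) {
        byte[] out = new byte[(septets.length * 7 + 7) / 8];
        for (int i = 0; i < septets.length; i++) {
            int bitPos = i * 7;
            int shifted = (septets[i] & 0x7F) << (bitPos % 8);
            out[bitPos / 8] |= (byte) shifted;
            if (bitPos % 8 > 1) {
                // The value straddles a byte boundary; store the overflow bits in the next byte.
                out[bitPos / 8 + 1] |= (byte) (shifted >> 8);
            }
        }
        return out;
    }

    // Inverse of pack; the caller states how many 7-bit values to read back.
    static int[] unpack(byte[] packed, int count) {
        int[] septets = new int[count];
        for (int i = 0; i < count; i++) {
            int bitPos = i * 7;
            int word = packed[bitPos / 8] & 0xFF;
            if (bitPos % 8 > 1) {
                word |= (packed[bitPos / 8 + 1] & 0xFF) << 8;
            }
            septets[i] = (word >> (bitPos % 8)) & 0x7F;
        }
        return septets;
    }

    // Runs the property unpack(pack(x)) == x on random inputs and counts violations.
    static int countRoundTripFailures(long seed, int trials) {
        Random random = new Random(seed);
        int failures = 0;
        for (int t = 0; t < trials; t++) {
            int[] input = new int[random.nextInt(33)]; // lengths 0..32
            for (int i = 0; i < input.length; i++) {
                input[i] = random.nextInt(128);
            }
            if (!Arrays.equals(input, unpack(pack(input), input.length))) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("round-trip failures: " + countRoundTripFailures(42L, 10_000));
    }
}
```

Passing the count to unpack sidesteps the question of whether seven trailing zero bits are padding or an eighth word; a variant that has to infer the length from the bytes alone is exactly where length-dependent bugs of this sort tend to hide.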

The beauty of the underlying Haskell testing framework was that it took very few lines of code to express a generic testing framework.

The talk also showed that sometimes testing with large random test data is necessary to find bugs; something we rarely do!?

Overall, I found the speaker very engaging and the talk enjoyable, even if I won’t claim to have understood or remembered everything.


Jotting #12: Find empty strings in Oracle table

2008-03-12

Had to find some strings (varchar2) in a table that were just blanks with optional end-of-line characters thrown in.

Luckily, regular expressions make that an easy task. Here is the query:

SELECT *
  FROM myTable x
 WHERE REGEXP_LIKE( x.myColumn, '(^[[:space:]]*$)' );

A short explanation:

  • ^...$ says that pattern applies to the string from start to finish, i.e., it’s not just a sub-string pattern,
  • [...]* says that pattern occurs 0 or more times (could also have been [...]+ in this case),
  • [:space:] defines a pattern of all white-space characters, including blank, \t, \r and \n.

Sometimes, regular expressions just make tasks like these very easy.
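
If you want to play with the pattern outside the database, a rough Java equivalent looks like this. It is only a sketch: the class name BlankStrings is made up, and Java's \p{Space} class is my stand-in for the POSIX [:space:] class used in the query:

```java
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the idea of the SQL pattern in Java.
class BlankStrings {
    // Only white-space characters (blank, \t, \r, \n, ...) from start to end.
    private static final Pattern BLANK = Pattern.compile("^\\p{Space}*$");

    static boolean isBlank(String s) {
        return BLANK.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"   ", " \t\r\n", " x "};
        for (String s : samples) {
            System.out.println(isBlank(s));
        }
    }
}
```

One difference to keep in mind: Oracle stores an empty varchar2 as NULL, so the query never returns such rows, whereas the Java version also accepts the empty string "".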


Jotting #10: Branching Models and all that

2007-11-21

Proper software configuration management (SCM) is often treated like an unloved child in software projects. I am talking not just about committing code into a repository, but about creating reproducible releases, merging code between code-lines and all those things that are sometimes boring but necessary to provide proper control over your team’s coding efforts.

Let me tell you what we did in our current project; I learned a lot during it, mainly since I had to manage the releases most of the time. In the end, the whole process is less scary than I thought; I’ve even become quite relaxed about it, and merging code is no longer scary (but a bit boring).

I must stress that you must adopt your own release process and fit it to your circumstances.

A few words on our software and its installation, as it has some bearing on our choice of branching model. The code is a Java webstart application, written using Swing, connecting to enterprise beans on a server. This implies that whenever a user starts the application he will be forced to work with the most recently installed version; i.e., there is only ever one release in production.

Here is our process to release (assuming we are currently at production version 2.2.1 and use the Linux convention for numbering):

  1. A set of features is defined for the next release.
  2. When the features are implemented a release branch is created and named (e.g., release-2.4).
  3. Part of the team, the release team, completes the new release code, incl. final configuration, acceptance testing, release notes, etc.
  4. The release team releases a candidate for acceptance testing on a test server (tagged 2.4.0rc1).
  5. Bugs in acceptance testing are fixed and a new candidate (tagged 2.4.0rc2) is released. This continues until the code is accepted.
  6. The code is released (tagged 2.4.0).
  7. Any bugs that later come up in the released code are fixed on the release branch, repeating steps 4-6, and the fixed version is released (tagged 2.4.1, 2.4.2, …).
  8. After each release (candidate), the code changes are merged back into the development line (merges are tagged appropriately).
  9. The development team starts to work on the next set of features on the development line, repeating steps 1-9 for the next release (2.6 or 3.0).

(If I have time I may add a picture of this process.)

What are the advantages that we obtained from this process?

  • We have a clear, easily understandable and reproducible release process.
  • There is no significant code freeze period when preparing a release.
  • The process allows the team to allocate time efficiently and in parallel.
  • The process is quite agile and flexible; there is minimal burden on developers as many can continue working as if unaware of the release process.
  • The code can be placed under continuous integration at all stages.
  • We can reproduce production releases at any time quickly.
  • Even during the testing process for a new release (e.g., 2.4.0rc2), we can release an emergency fix for the current production version (e.g., 2.2.1 to 2.2.2) without major upheaval.

It is also obvious that our installation allows us to choose this branching model since we never have more than three versions out there: the current production (e.g., 2.2.1), the current release candidate (e.g., 2.4.0rc2) and the development line (named 2.5).
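
To make the tag names a bit more concrete, here is a small, purely illustrative Java sketch. The ReleaseTag helper is hypothetical and not part of any SCM tool, and I am assuming that the "Linux convention" means the old kernel scheme where an even minor number marks a release line and an odd one the development line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for tags such as "2.2.1" or "2.4.0rc2".
final class ReleaseTag {
    private static final Pattern TAG =
            Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:rc([1-9]\\d*))?");

    final int major;
    final int minor;
    final int patch;
    final int candidate; // 0 for a final release, N for release candidate rcN

    private ReleaseTag(int major, int minor, int patch, int candidate) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
        this.candidate = candidate;
    }

    static ReleaseTag parse(String tag) {
        Matcher m = TAG.matcher(tag);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a release tag: " + tag);
        }
        int candidate = (m.group(4) == null) ? 0 : Integer.parseInt(m.group(4));
        return new ReleaseTag(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), candidate);
    }

    boolean isCandidate() {
        return candidate > 0;
    }

    // Assumed convention: even minor = release line, odd minor = development line.
    boolean isOnReleaseLine() {
        return minor % 2 == 0;
    }

    // Branch name in the style used above, e.g. "release-2.4".
    String branchName() {
        return "release-" + major + "." + minor;
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"2.2.1", "2.4.0rc2", "2.4.0"}) {
            ReleaseTag t = ReleaseTag.parse(tag);
            System.out.println(tag + " -> " + t.branchName() + ", candidate: " + t.isCandidate());
        }
    }
}
```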

If your circumstances are different (e.g., customers paid for different feature sets), you will have to come up with a different branching model that fits your needs.

Some recommendations:

  • Think early about the branching model and release process suitable for your project; at least no later than when the first feature set is complete
  • Learn and use some of the branching patterns (see references)
  • Merge early and often (before the deltas become too large and are hard to merge)

Don’t be afraid of branching and merging; once you understand the process, its limitations and benefits, everything becomes much easier.

References:

PS

Eric Raymond has started a page on version control systems. Worth keeping an eye on.


Jotting #4: Revealing intent …

2007-07-29

Java does not support Design-by-Contract (pre- and post-conditions, invariants) and asserts are really just a weak placebo.

Nevertheless, you can at least indicate your intentions to some degree with labels:

public void foo(Bar theX, ...) {
   pre_condition: { vetoArgument( null==theX, "Bar argument is null" ); }
   // some code
   post_condition: { ... }
}

where vetoArgument is some helper method to throw an IllegalArgumentException or the like.
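
For completeness, here is a small compilable sketch of the same idea. The Basket class is hypothetical, and vetoArgument is just one possible way to write the helper:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class showing labelled blocks used as pre- and post-condition markers.
class Basket {
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        pre_condition: { vetoArgument(null == item, "item argument is null"); }

        int sizeBefore = items.size();
        items.add(item);

        post_condition: {
            if (items.size() != sizeBefore + 1) {
                throw new IllegalStateException("item was not added");
            }
        }
    }

    int size() {
        return items.size();
    }

    // Throws if the condition describing an invalid argument holds.
    static void vetoArgument(boolean invalid, String message) {
        if (invalid) {
            throw new IllegalArgumentException(message);
        }
    }
}
```

The labels are ordinary Java statement labels: the compiler accepts them but attaches no meaning to them, so they document intent and nothing more.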

Labels are a somewhat under-used feature of Java. But it’s still not the full DBC feature I’d like to see. Annotations will not be able to substitute for proper built-in language support. How long do we still have to wait?