Jotting #17: Domains, Values and Null

2009-04-02

I always find NullPointerExceptions a real pain in the neck. And often they shouldn’t occur since the object in question shouldn’t be null. But languages like the {}-family (C, C++, C#, Java, …) make it difficult or even impossible to guarantee that null values cannot occur. The situation is very different in other languages like Haskell.

Let’s take an example of a particular class, viz. String. The String domain is the set of all possible strings including the empty string “”. And null! Since

String x = null;

is a valid statement. But in my experience, I can’t remember a case where I really needed or wanted to distinguish between null and the empty string.

A similar example applies to List, Set or Map: the empty list, set or map is perfectly fine, and null is not needed.

In (nearly) all cases, I would prefer to know that a null object is not an option. Ever. It would make reasoning about possible cases so much easier and a lot of safety code could be removed. In my recent projects we always agreed never to return a null list (set, map) but to use an empty one instead.
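A minimal sketch of that convention, assuming a hypothetical OrderBook class (the class and its method names are made up for illustration, and the lambda needs Java 8 or later):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class, used only to illustrate the "never return null" rule.
class OrderBook {
    private final Map<String, List<String>> ordersByCustomer = new HashMap<>();

    void addOrder(String customer, String order) {
        ordersByCustomer.computeIfAbsent(customer, k -> new ArrayList<>()).add(order);
    }

    // Never returns null: an unknown customer simply has no orders.
    List<String> ordersFor(String customer) {
        List<String> orders = ordersByCustomer.get(customer);
        return orders == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(orders);
    }

    public static void main(String[] args) {
        OrderBook book = new OrderBook();
        book.addOrder("alice", "A-1");
        // No null check is needed before looping or asking for the size.
        for (String order : book.ordersFor("bob")) {
            System.out.println(order);
        }
        System.out.println(book.ordersFor("alice").size());
    }
}
```

The caller can loop over the result or ask for its size without any safety code, which is exactly the code that could be removed.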

Value objects should have a defined domain, and defining it must include the decision whether null is an acceptable member (usually it shouldn’t be).
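As a rough illustration, here is a hypothetical value object whose domain is decided once, in the constructor. The class name CustomerName and the rule that blank names are also excluded are assumptions made for this sketch; another value object may well accept the empty string.

```java
// Hypothetical value object: its domain is "any non-blank name"; null is not a member.
final class CustomerName {
    private final String value;

    CustomerName(String value) {
        if (value == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        if (value.trim().isEmpty()) {
            throw new IllegalArgumentException("name must not be blank");
        }
        this.value = value;
    }

    // Guaranteed non-null, so callers need no further checks.
    String value() {
        return value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CustomerName && ((CustomerName) other).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }
}
```

Once an instance exists, every piece of code that receives it can rely on the domain without re-checking it.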

As an interesting side note, Tony Hoare has admitted that the introduction of null was a big mistake. Hopefully I will be able to listen to his talk later this year. Others like C. J. Date have long argued against null values in database tables, partly because it forces three-valued logic upon you (unlike the better defined two-valued logic of true or false).


Jotting #15: Eclipse Tips and Moans

2009-02-02

Some Eclipse tips and moans that I’ve experienced on and off.

I like to start Eclipse with the option -showlocation; it helps to identify which workspace I am working on, especially when I need to work on two versions (trunk and branch).

Workspaces

Just upgraded to Eclipse 3.4 (Ganymede) and had to get rid of a wrongly created workspace. While it is easy to move between workspaces, many have commented that it is difficult to get rid of workspaces.

In Eclipse 3.4, open the file <Eclipse_HOME>/configuration/.settings/org.eclipse.ui.ide.prefs and remove the unwanted workspace from the value of the key RECENT_WORKSPACES. Voila. Done. Thanks.

Working Sets

I sometimes like to group projects into working sets, especially in workspaces with many Eclipse projects; but in Eclipse these sets are somewhat second class objects:

  • many menu options don’t work on sets
  • can’t export/import sets

I would like to export sets because when I branch my code it would be nice to carry things over to the branch.

Bookmarks

Similarly, you cannot export/import bookmarks. What a shame/hassle/waste-of-time. My current workaround is to take a snapshot with Faststone Capture (version 4.8, great little helper app!) and keep the picture.


Jotting #14: Commenting – sometimes it’s crucial

2008-06-01

Recently, some controversy (see, for example, here) erupted around a mistake made in the OpenSSL library used by the Debian project. The mistake was traced back to this change. The various comments hint at problems on several layers which led to this mistake, but I can’t help thinking that two basic practices would have gone a long way to avoid this problem.

Comments

Looking at the changes and the surrounding code, there is just no hint, viz. comment, there that tells you what is happening and why the line of code is important.

Now, I don’t like to comment the obvious; many style guidelines ask for far too much commenting when the code is quite obvious. But in this case, several good practices were not employed:

  • Using an obvious self-documenting procedure name
  • Adding a warning to crucial code lines or code ordering
  • Commenting in detail (or providing a URL)

If you’re implementing a complex algorithm, you need documentation somewhere. The lessons of Literate Programming, as exemplified in Donald Knuth’s TeX programme, seem to have fallen on deaf ears. But Knuth at least put a challenge down that he would pay out money for each verified TeX bug. (It didn’t bankrupt him. Firstly, the error rate per kLOC was very, very low and, secondly, people treasured a cheque signed by him so much that they preferred to frame rather than cash it!)

Testing

The cause of the mistake also touches other areas, among them testing. The problem with testing algorithms like the OpenSSL one is that you sometimes need to test a lot of combinations. And I mean A Lot! In his recent talk at the Cambridge BCS meeting (see my review), Peyton Jones showed an example where the error only revealed itself after running several hundred different data inputs for the same test case! We usually don’t go anywhere near that length to test our code. But sometimes you need to, usually by generating random data in order to cover as many possibilities as possible and to avoid the bias of excluding certain cases (we humans are good at rationalising away potential sources of error, à la “That can never happen”).


Review: Simon Peyton Jones on Type-driven Testing in Haskell

2008-03-13

Simon Peyton Jones gave a talk at the Cambridge BCS-SPA group on testing with functional languages (esp. Haskell). (Someone’s already posted the video and slides; careful, it’s large!)

Some important points:

  • Future of programming will be about “Control of (Side) Effects”
  • Programming languages will become more functional than imperative
  • Purity is good for understanding, verification, maintenance
  • Purity pays back in performance, parallelism, testing
  • Functional/value-oriented is easier to test than object-oriented stateful
  • Functional is good for generating tests (domain-specific language)

After a short intro to Haskell (10 min Haskell 101 :)), SPJ moved on to testing in Haskell. In his demo, he tested a programme that packs 7-bit words into 8-bit bytes, so that eight ASCII characters take up only 7 bytes instead of 8. This sort of space saving is done in SMS, where bandwidth is precious.

One fundamental test tried to assert that unpack(pack(x))==x. After testing some hand-written cases, which succeeded, the test switched to randomly generated words and started to fail after a few hundred attempts. Due to its random nature, the number of cases needed varied from run to run, but typically it failed after fewer than 1000 cases. (It turned out that words of 8-byte length ending in a particular bit sequence were not correctly packed.)
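To make the round-trip property concrete, here is a rough Java sketch of the same idea. It is not SPJ’s Haskell code and not his testing framework; the class SeptetRoundTrip, the bit layout (least significant bit first) and the explicit count parameter of unpack are all assumptions of mine.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: pack 7-bit values into bytes and test unpack(pack(x)) == x.
class SeptetRoundTrip {
    // Packs 7-bit values (0..127) back to back, least significant bit first.
    static byte[] pack(int[] septets) {
        byte[] out = new byte[(septets.length * 7 + 7) / 8];
        for (int i = 0; i < septets.length; i++) {
            int bitPos = i * 7;
            int shifted = (septets[i] & 0x7F) << (bitPos % 8);
            out[bitPos / 8] |= (byte) shifted;
            if (bitPos % 8 > 1) {
                // The value does not fit into the current byte; the rest spills over.
                out[bitPos / 8 + 1] |= (byte) (shifted >>> 8);
            }
        }
        return out;
    }

    // The count is passed explicitly because 7 packed bytes could hold 7 or 8 values.
    static int[] unpack(byte[] packed, int count) {
        int[] septets = new int[count];
        for (int i = 0; i < count; i++) {
            int bitPos = i * 7;
            int low = (packed[bitPos / 8] & 0xFF) >>> (bitPos % 8);
            int high = bitPos % 8 > 1
                    ? (packed[bitPos / 8 + 1] & 0xFF) << (8 - bitPos % 8)
                    : 0;
            septets[i] = (low | high) & 0x7F;
        }
        return septets;
    }

    // Returns the index of the first failing random case, or `cases` if none failed.
    static int checkRoundTrip(long seed, int cases) {
        Random random = new Random(seed);
        for (int c = 0; c < cases; c++) {
            int[] input = new int[random.nextInt(20)];
            for (int i = 0; i < input.length; i++) {
                input[i] = random.nextInt(128);
            }
            if (!Arrays.equals(input, unpack(pack(input), input.length))) {
                return c;
            }
        }
        return cases;
    }

    public static void main(String[] args) {
        int passed = checkRoundTrip(42L, 10_000);
        System.out.println(passed == 10_000 ? "all cases passed" : "failed at case " + passed);
    }
}
```

The test itself is only the few lines in checkRoundTrip; the fixed seed makes a failing run reproducible, which matters once random data starts finding bugs.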

The beauty of the underlying Haskell testing framework was that it took very few lines of code to express a generic testing framework.

The talk also showed that sometimes testing with large random test data is necessary to find bugs; something we rarely do!?

Overall, I found the speaker very engaging and the talk enjoyable, even if I won’t claim to have understood or remembered everything.


Jotting #12: Find empty strings in Oracle table

2008-03-12

Had to find some strings (varchar2) in a table that were just blanks with optional end-of-line characters thrown in.

Luckily, regular expressions make that an easy task. Here is the query:

SELECT *
  FROM myTable x
 WHERE REGEXP_LIKE( x.myColumn, '(^[[:space:]]*$)' );

A short explanation:

  • ^...$ says that the pattern applies to the string from start to finish, i.e., it’s not just a sub-string pattern,
  • [...]* says that the pattern occurs 0 or more times (could also have been [...]+ in this case),
  • [:space:] defines a pattern of all white-space characters, including blank, \t, \r and \n.
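For comparison, the same three ingredients can be tried out in Java’s regex dialect. This is only a sketch: the class BlankCheck is hypothetical, and \p{Space} is Java’s name for the POSIX white-space class, not Oracle syntax.

```java
import java.util.regex.Pattern;

// Hypothetical helper that mirrors the Oracle pattern in Java's regex syntax.
class BlankCheck {
    // Anchors, a white-space class, repeated zero or more times.
    private static final Pattern BLANK = Pattern.compile("^\\p{Space}*$");
    // Without the anchors the pattern also matches an empty sub-string of any input.
    private static final Pattern UNANCHORED = Pattern.compile("\\p{Space}*");

    static boolean isBlank(String s) {
        return BLANK.matcher(s).find();
    }

    static boolean matchesSomewhere(String s) {
        return UNANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isBlank(" \r\n\t"));
        System.out.println(isBlank(" x \n"));
        System.out.println(matchesSomewhere(" x \n"));
    }
}
```

One difference to keep in mind: Java has a real empty string, so here * and + do behave differently, whereas Oracle treats an empty varchar2 as NULL, which is presumably why either quantifier would do in the query.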

Sometimes, regular expressions just make tasks like these very easy.


Jotting #10: Branching Models and all that

2007-11-21

Proper software configuration management (SCM) is often treated like an unloved child in software projects. I am talking not just about committing code into a repository, but about creating reproducible releases, merging code between code-lines and all those things that are sometimes boring but necessary to provide proper control over your team’s coding efforts.

Let me tell you what we did in our current project; I learned a lot during it, mainly since I had to manage the releases most of the time. In the end, the whole process is less scary than I thought; I’ve even become quite relaxed about it, and merging code is no longer scary (but a bit boring).

I must stress that you must adopt your own release process and fit it to your circumstances.

A few words on our software and its installation as it has some bearing on our choice of branching model. The code is a Java webstart application, written using Swing, connecting to enterprise beans on a server. This implies that whenever a user starts the application he will be forced to work with the most recently installed version; i.e., there is only ever one release in production.

Here is our process to release (assuming we are currently at production version 2.2.1 and use the Linux convention for numbering):

  1. A set of features is defined for the next release.
  2. When the features are implemented a release branch is created and named (e.g., release-2.4).
  3. Part of the team, the release team, completes the new release code, incl. final configuration, acceptance testing, release notes, etc.
  4. The release team releases a candidate for acceptance testing on a test server (tagged 2.4.0rc1).
  5. Bugs in acceptance testing are fixed and a new candidate (tagged 2.4.0rc2) is released. This continues until the code is accepted.
  6. The code is released (tagged 2.4.0).
  7. Any upcoming bugs of the released code will be fixed on the branch line, repeating steps 4-6, and releasing the fixed version (tagged 2.4.1, 2.4.2, …)
  8. After each release (candidate), the code changes are merged back into the development line (merges are tagged appropriately).
  9. The development team starts to work on the next set of features on the development line, repeating steps 1-9 for the next release (2.6 or 3.0).

(If I have time I may add a picture of this process.)

What are the advantages that we obtained from this process?

  • We have a clear, easily understandable and reproducible release process.
  • There is no significant code freeze period when preparing a release.
  • The process allows the team to allocate time efficiently and in parallel.
  • The process is quite agile and flexible; there is minimal burden on developers as many can continue working as if unaware of the release process.
  • The code can be placed under continuous integration at all stages.
  • We can reproduce production releases at any time quickly.
  • Even during the testing process for a new release (e.g., 2.4.0rc2), we can release an emergency fix for the current production version (e.g., 2.2.1 to 2.2.2) without major upheaval.

It is also obvious that our installation allows us to choose this branching model since we never have more than three versions out there: the current production (e.g., 2.2.1), the current release candidate (e.g., 2.4.0rc2) and the development line (named 2.5).

If your circumstances are different (e.g., customers paid for different feature sets), you will have to come up with a different branching model that fits your needs.

Some recommendations:

  • Think early about the branching model and release process suitable for your project; at least no later than when the first feature set is complete
  • Learn and use some of the branching patterns (see references)
  • Merge early and often (before the deltas become too large and are hard to merge)

Don’t be afraid of branching and merging; once you understand the process, its limitations and benefits, everything becomes much easier.

References:

PS

Eric Raymond has started a page on version control systems. Worth keeping an eye on.


Jotting #4: Revealing intent …

2007-07-29

Java does not support Design-by-Contract (pre- and post-conditions, invariants) and asserts are really just a weak placebo.

Nevertheless, you can at least indicate your intentions to some degree with labels:

public void foo(Bar theX, ...) {
   pre_condition: { vetoArgument( null==theX, "Bar argument is null" ); }
   // some code
   post_condition: { ... }
}

where vetoArgument is some helper method to throw an IllegalArgumentException or the like.
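A complete, compilable variant of the idea might look like the sketch below. The Account class and its deposit method are invented for illustration, and vetoArgument is just one possible shape of the helper mentioned above.

```java
// Hypothetical example class; only the labelled blocks matter here.
class Account {
    private int balance;

    // Throws if the "veto" condition holds.
    static void vetoArgument(boolean veto, String message) {
        if (veto) {
            throw new IllegalArgumentException(message);
        }
    }

    void deposit(Integer amount) {
        pre_condition: {
            vetoArgument(null == amount, "amount is null");
            vetoArgument(amount <= 0, "amount must be positive");
        }

        int before = balance;
        balance += amount;

        post_condition: {
            if (balance != before + amount) {
                throw new IllegalStateException("balance was not updated correctly");
            }
        }
    }

    int balance() {
        return balance;
    }
}
```

The labels have no effect on execution; they are plain Java statement labels on blocks and only serve to tell the reader which checks are the contract and which lines are the actual work.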

Labels are a somewhat under-used feature of Java. But it’s still not the full DBC feature I’d like to see. Annotations will not be able to substitute for proper built-in language support. How long do we still have to wait?


Jotting #3: To delta SQL or not?

2007-06-21

In this jotting, I describe how I completely changed my way of maintaining my database scripts. In my humble opinion, this approach is superior to anything I’ve seen so far. (OK, in my not-so-humble opinion …)

The usual Way

In previous projects I usually approached evolution of the required database in this quite common way:

  • Initial script to create version 1.0 database
  • Add patches to add/remove/correct its structure in later releases.

Of course, all scripts and patches were version-controlled. I even started to name the patches consistently by embedding the issue-tracking id in the file name for ease of reference (each patch required an issue, also very good practice). Others have been doing the same, sometimes in a more sophisticated form like Ruby-on-Rails’ Migration tool. Essentially, you’re building up your database structure by deltas.

The Doubt

However, in my latest project I have come to doubt the usefulness of this approach. Of course, it does work, but is it the best approach?

One day, my project manager asked me to install the current DB structure in a new instance (we decided to give developers their own DB instances). I didn’t like the fact that I had to run the initial script plus all those patches from about five releases. Possible, but also very tedious. And it would grow ever more tedious with each release. I also didn’t much fancy gathering all the patches and creating a new initial script. It’s a lot of work and error-prone; how easy to forget something …

No, I needed a better approach. I think it was an advantage here that I started out programming code and stayed relatively far away from DB management in the past. Of course, as programmers, when we install code we (usually) don’t patch it. We just uninstall the old version and replace it with the new one (in our case, it uses Java web-start). Could I do something similar with the database?

The new Approach

Here’s what I came up with and, so far, it has worked very well:

  • Write scripts to define all features needed for current release,
    • Organise scripts in some useful (hierarchical) structure, e.g., in folders for tables, views, packages, etc.
    • Each structural unit has a setup.sql and teardown.sql script
    • Some (bash) script to run the complete set of scripts
    • Scripts are idempotent
  • For next release, update/add/remove scripts as needed
    • Write minimal patches to prepare current version for next release
    • Run patches
    • Run full script

It is important to note that the full script only adds missing bits, never removes features. Removing, renaming or similar changes (often irreducible) are left to the patches only. (How I do this in detail will be left to a future jotting.)

The idempotency of the scripts is an important characteristic since it ensures that I can run the scripts several times and the end results are always the same. Not sure you added the latest additions? Just run the script (again).
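The interplay of the idempotent full script and the patches can be shown with a toy model. This is not my actual set of SQL and bash scripts; the class SchemaSetup and all object names are hypothetical, and the “database” is reduced to the set of object names it contains.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model: the "database" is just the set of object names it contains.
class SchemaSetup {
    private final Set<String> objects = new TreeSet<>();

    // Create-if-missing: a second call is a no-op, which makes the script idempotent.
    void ensure(String name) {
        objects.add(name);
    }

    // The "full script": only ever adds what is missing, never removes anything.
    void runFullScript() {
        ensure("table CUSTOMER");
        ensure("table ORDERS");
        ensure("view OPEN_ORDERS");
    }

    // The "patches": the only place where something is removed or renamed.
    void runPatches() {
        objects.remove("view LEGACY_ORDERS");
    }

    // Upgrade = minimal patches first, then the same full script as for a fresh install.
    void upgrade() {
        runPatches();
        runFullScript();
    }

    Set<String> objects() {
        return new TreeSet<>(objects);
    }
}
```

Running runFullScript on an empty instance and running upgrade on an old instance end in the same state, and running either of them again changes nothing.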

What are the advantages of this approach?

The biggest one for me is that I can create a DB instance from scratch at any time very easily: just run the full script. I don’t need to care about patches, don’t need to re-run the DB’s evolution to its current version each time. I don’t care about its history when I install a version for a developer.

It feels more like programming code. Of course, the patches are important and sometimes difficult but they are exactly the same as in the previous approach.

It scales better since I only install what is currently defined.

It’s more robust since I can ignore the interplay of all those patches.

I never have to gather all patches into a new baseline script. I am there already at each release; actually at every commit.

I can easily view each feature in its script; I don’t need to parse through all the patches to gather where each bit was added to arrive at the current version. This is a big gain in simplicity & transparency.

Drawbacks?

I have to carefully write my scripts and patches, a bit more carefully than usual but there is little overhead once you get used to it. Actually, it forces me to write more robust scripts.

Hmmhh, can’t think of any others.

One more; the initial set-up took some time but that work can be used again in other projects. It’s a long-term amortisation but well worth it, in my view.

Final Remarks

So far I have found little reason to have to roll back a DB instance to a previous version; if I need a full former DB instance including data I go to the backups; if it’s just a clean version, maybe including some test data, I can always check my scripts out of version control. So I can’t see the need for this feature as advertised in Rails Migration.

Ah, the answer to my initial question: No, don’t delta your DB scripts. At least, I won’t.


Jotting #2: Nulls and NullPointerExceptions

2007-06-02

Debugging NullPointerExceptions in Java can be a real pain since the stack trace provides you only with a line number and the error message null. Yep, just null. Very short and very helpful!

In some cases it’s obvious which object was null, but in others you have several objects, explicit or implicit, in the same statement, and now it is much less obvious. Even the line number is wrong since it refers to the line where the statement starts while the error occurs two lines later:

// an exaggerated code snippet:
Bar b = x.foo1( y.blah(), z.some(), ... )
         .foo2( ... )
         .foo3( w.crash() );

So what failed? Is it y or z that is null or is it the result of foo1 or foo2 or was it w? In these cases I wish that null were a proper object, a sub-class of all classes, that would throw an exception with a message like “foo1 invoked on null”.

Some languages do implement null as a special object of this kind: Eiffel, Ruby. It can’t be too hard to add this to Java. Till then, we’ll still need the debugger …
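Until then, a partial workaround is to check the suspects explicitly and give each check its own message. The sketch below assumes Java 7 or later, where java.util.Objects.requireNonNull accepts a message (it did not exist when this jotting was written); the class NullBlame and its method are hypothetical.

```java
import java.util.Objects;

// Hypothetical example: every null check names its culprit.
class NullBlame {
    static String shout(String text, String suffix) {
        // The message ends up in the NullPointerException instead of a bare "null".
        Objects.requireNonNull(text, "text is null");
        Objects.requireNonNull(suffix, "suffix is null");
        return text.toUpperCase().concat(suffix);
    }

    public static void main(String[] args) {
        try {
            shout("hello", null);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This only helps for values you can name, so a long chained call still has to be broken into separate statements before each intermediate result can be checked.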


Jotting #1: Style

2007-06-01

Style and taste are in the eye of the beholder, and many a flame war has been started over them (but not here … no comments allowed ;-) ), but still I have to put some thoughts down on this.

Coding style and formatting are quite important for readable code, but people still pay scant attention to the little details. E.g., how often do you see a bit of code like this:

if (a == b && c > d) { ... }

In my view that’s awful formatting: the elements are not properly grouped using white-space! I usually format such a code snippet like this:

if( a==b && c>d ) { ... }

Just by removing a few spaces the logical groupings have also moved visually together.