Jotting #3: To delta SQL or not?


In this jotting, I describe how I completely changed my way of maintaining my database scripts. In my humble opinion, this approach is superior to anything I’ve seen so far. (OK, in my not-so-humble opinion …)

The usual Way

In previous projects I usually approached evolution of the required database in this quite common way:

  • Initial script to create version 1.0 database
  • Add patches to add/remove/correct its structure in later releases.

Of course, all scripts and patches were version-controlled. I even started to name the patches consistently by embedding the issue-tracking id in the file name for ease of reference (each patch required an issue, also very good practice). Others have been doing the same, sometimes more sophisticated like Ruby-on-Rails’ Migration tool. Essentially, you’re building up your database structure by deltas.

The Doubt

However, in my latest project I have to come doubt the usefulness of this approach. Of course, it does work but is it the best approach?

One day, my project manager asked me to install the current DB structure in a new instance (we decided to give developers their DB instances). I didn’t like the fact that I had to run the initial script plus all those patches from about five releases. Possible, but also very tedious. And it would grow ever more tedious with each release. I also didn’t much fancy of gathering all patches and create a new initial script. It’s a lot of work and error-prone; how easy to forget something …

No, I needed a better approach. I think, here it was an advantage that I started progamming code initially & stayed relatively far away from DB management in the past. Of course, as programmers when we install code we (usually) don’t patch it. We just uninstall the old version and replace it with the new one (in our case, it’s uses Java web-start). Could I do something similar also with the database?

The new Approach

Here’s what I came up with and, so far, it has worked very well:

  • Write scripts to define all features needed for current release,
    • Organise scripts in some useful (hierarchical) structure, e.g., in folders for tables, views, packages, etc.
    • Each structural unit has a setup.sql and teardown.sql script
    • Some (bash) script to run the complete set of scripts
    • Scripts are idempotent
  • For next release, update/add/remove scripts as needed
    • Write minimal patches to prepare current version for next release
    • Run patches
    • Run full script

It is important to note that the full script only adds missing bits, never removes features. Removing, renaming or similar changes (often irreducible) are left to the patches only. (How I do this in detail will be left to a future jotting.)

The idempotency of the scripts is an important characteristic since it ensures that I can run the scripts several times and the end results are always the same. Not sure you added the latest additions? Just run the script (again).

What are the advantages of this approach?

The biggest one for me is that I can create a DB instance from scratch at any time very easily: just run the full script. I don’t need to care about patches, don’t need to re-run the DB’s evolution to its current version each time. I don’t care about its history when I install a version for a developer.

It feels more like programming code. Of course, the patches are important and sometimes difficult but they are exactly the same as in the previous approach.

It scales better since I only install what is currently defined.

It’s more robust since I can ignore the interplay of all those patches.

I never have to gather all patches into a new baseline script. I am there already at each release; actually at every commit.

I can easily view each feature in its script; I don’t need to parse through all the patches to gather where each bit was added to arrive the current version. This is a big gain in simplicity & transparency.


I have to carefully write my scripts and patches, a bit more carefully than usual but there is little overhead once you get used to it. Actually, it forces me to write more robust scripts.

Hmmhh, can’t think of any others.

One more; the initial set-up took some time but that work can be used again in other projects. It’s a long-term amortisation but well worth it, in my view.

Final Remarks

So far I have found little reason to have to roll back a DB instance to a previous version; if I need a full former DB instance including data I go to the backups; if it’s just a clean version, maybe including some test data, I can always check my scripts out of version control. So I can’t see the need for this feature as advertised in Rails Migration.

Ah, the answer to my initial question: No, don’t delta your DB scripts. At least, I won’t.



Photo: Dolphin jumping


Probably my most spectacular photo in my short “career” as an occasional photographer. A bit of a lucky shot.

Taken in the summer of 2006 in Vancouver’s aquarium. (Click on the photo to see it in full.)dolphin-jump-2

Jotting #2: Nulls and NullPointerExceptions


Debugging NullPointerExceptions in Java can be a real pain since the stack trace provides you only with a line number and the error message null. Yep, just null. Very short and very helpful!

In some cases it’s obvious which object was null, but in others you have several objects, explicit or implicit, in the same statement, and now it is much less obvious. Even the line number is wrong since it refers to the line where the statement starts while the error occurs two lines later:

// an exaggerated code snippet:
Bar b = x.foo1( y.blah(), z.some(), ... )
         .foo2( ... )
         .foo3( w.crash() );

So what failed? Is it y or z that is null or is it the result of foo1 or foo2 or was it w? In these cases I wish that null were a proper object, a sub-class of all classes, that would throw an exception with the message like “foo1 invoked on null“.

Some languages do implement null as a special object of this kind: Eiffel, Ruby. It can’t be too hard to add this to Java. Till then, we’ll still need the debugger …

Jotting #1: Style


Style and taste is in the eye of the beholder, and many a flame war has been started on it (but not here … no comments allowed 😉 ), but still I have to put some thoughts down on this.

Coding style and formatting is quite important to get readable code but still people pay scant attention to the little details. E.g., how often do you see a bit of code like this:

if (a == b && c > d) { ... }

In my view that’s awful formatting: the elements are not properly grouped using white-space! I usually format such code snippet like this:

if( a==b && c>d ) { ... }

Just by removing a few spaces the logical groupings have also moved visually together.