Jotting #17: Domains, Values and Null

2009-04-02

I always find NullPointerExceptions a real pain in the neck. And often they shouldn’t occur since the object in question shouldn’t be null. But languages like the {}-family (C, C++, C#, Java, …) make it difficult or even impossible to guarantee that null values cannot occur. The situation is very different in other languages like Haskell.

Let’s take an example of a particular class, viz. String. The String domain is the set of all possible strings including the empty string “”. And null! Since

String x = null;

is a valid statement. But in my experience, I can’t remember a case where I really needed or wanted to distinguish between null and the empty string.

A similar example applies to List, Set or Map: the empty list, set or map is perfectly fine, and null is not needed.

In (nearly) all cases, I would prefer to know that a null object is not an option. Ever. It would make reasoning about possible cases so much easier and a lot of safety code could be removed. In my recent projects we always agreed never to return a null list (set, map) but to use an empty one instead.
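A minimal sketch of that convention, written in Python (where None plays the role of null) rather than Java. The data and the function name orders_for are made up for illustration:

```python
from typing import Dict, List

# Hypothetical in-memory data, for illustration only.
_ORDERS: Dict[str, List[str]] = {"alice": ["book", "pen"]}


def orders_for(customer: str) -> List[str]:
    """Return the customer's orders; an unknown customer yields [] rather than None."""
    return list(_ORDERS.get(customer, []))


# Callers can iterate or count without any None check ("safety code").
total = sum(len(orders_for(name)) for name in ["alice", "bob"])
```

Because the empty list is an ordinary member of the domain, the caller treats "no orders" and "some orders" with the same code path.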

Value objects should have a defined domain, and that definition must include a decision on whether null is an acceptable member (usually it shouldn’t be).
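One way to make such a decision explicit is to check the domain when the value object is created. The sketch below is only an illustration in Python; the class Comment and its domain (every string, including the empty one, but never None) are assumptions, not something taken from a real project:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    """Hypothetical value object. Domain: all strings, including ""; None is excluded."""

    text: str

    def __post_init__(self) -> None:
        # Reject anything outside the domain at construction time,
        # so no later code has to ask "could this be None?".
        if not isinstance(self.text, str):
            raise TypeError("Comment text must be a string, got %r" % (self.text,))


empty = Comment("")  # fine: the empty string is a member of the domain
```

Calling Comment(None) raises a TypeError straight away, which is usually much closer to the real mistake than a failure somewhere downstream.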

As an interesting side note, Tony Hoare has admitted that the introduction of null was a big mistake. Hopefully I will be able to listen to his talk later this year. Others like C. J. Date have long argued against null values in database tables, partly because they force three-valued logic upon you (unlike the better defined two-valued logic of true or false).
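The three-valued logic is easy to see in practice. The following sketch uses Python's built-in sqlite3 module as a stand-in for any SQL database; the table person and its rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Ann", 30), ("Bob", None), ("Cy", 17)],  # Bob's age is NULL
)


def count(where: str) -> int:
    """Count the rows of the hypothetical person table matching a WHERE clause."""
    return conn.execute("SELECT COUNT(*) FROM person WHERE " + where).fetchone()[0]


adults = count("age >= 18")
minors = count("age < 18")
# In two-valued logic this condition is always true; with NULL it is not.
either = count("age >= 18 OR age < 18")
# Comparing NULL with NULL is neither true nor false, but unknown (NULL).
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
```

Three rows are stored, yet "adult or minor" only finds two of them: for Bob both comparisons evaluate to unknown, and so does their disjunction.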


Jotting #16: SSH for Windows+Cygwin

2009-02-02

I have struggled with this a bit but I think it is a common scenario. So for my (future and maybe your) benefit here is the current way I have been setting up SSH under Windows (XP) with cygwin.

Windows+Cygwin Preliminaries

  1. Place the following Windows script in the start-up folder, e.g., C:\Documents and Settings\All Users\Start Menu\Programs\Startup:

    @echo off
    C:
    chdir c:\cygwin\bin
    set HOME=/home/%USERNAME%
    bash --login c:\cygwin\.sshagentrc

  2. Edit the cygwin.bat file so that it looks like this:
    @echo off
    REM Script to start ssh-agent each time user logs into machine
    C:
    chdir C:\cygwin\bin
    
    set HOME=/home/%USERNAME%
    bash --login -i
  3. Place the following code in a file .sshagentrc in the c:\cygwin directory:
    #!/bin/bash
    # Creates an ssh-agent,
    # writes ssh agent info to the file '~/.ssh-agent-info-`hostname`'
    # and then prompts user for keys.
    # Then any shell can use the agent by sourcing the contents of ~/.ssh-agent-info-`hostname`:
    #  . ~/.ssh-agent-info-`hostname`
    HOME=/home/$USERNAME
    SSH_INFO_FILE=$HOME/.ssh-agent-info-`hostname`
    
    if test -e $HOME/.ssh/identity; then
       /usr/bin/echo "ssh_info:   $SSH_INFO_FILE"
       /usr/bin/ssh-agent > $SSH_INFO_FILE
       /usr/bin/chmod 600 $SSH_INFO_FILE
       . $SSH_INFO_FILE
       /usr/bin/ssh-add $HOME/.ssh/identity
    else
       /usr/bin/echo ""
       /usr/bin/echo "ERROR: No private key defined in $HOME/.ssh"
    fi

Individual User Setups

  1. Modify your .bashrc file to include the following lines:
    # Hook into SSH agent session
    SSH_INFO_FILE=~/.ssh-agent-info-`hostname`
    if test -e $SSH_INFO_FILE; then
       . $SSH_INFO_FILE > /dev/null
    fi
    export SVN_SSH=ssh

The SVN_SSH variable is useful when using Subversion with ssh (svn+ssh protocol); you can also set it as a Windows environment variable.

  1. Go to your HOME directory
  2. Run
    ssh-keygen [-t rsa]
  3. Enter the required passphrase (and remember it!)
  4. This should create a file named identity (or similar) in the ~/.ssh folder.
  5. Email the public key (the *.pub file) to your administrator so that it can be added to the required servers.

PuTTY

If you want to run PuTTY (Pageant) as well because you like apps like TortoiseSVN, then

  1. Import the OpenSSH key into PuTTYgen and create a PuTTY-compatible key (I store it with the OpenSSH keys in my cygwin home directory: c:\cygwin\home\<userid>\.ssh\identity.ppk)
  2. Place the pageant.exe shortcut into your startup directory
  3. Modify the shortcut property to “c:\PUTTY_HOME\pageant.exe” “HOME\.ssh\identity.ppk”

This starts Pageant when you log in and asks for your SSH key passphrase; after that you’re set for the day.

Results

The setup initialises ssh-agent when you log into Windows and adds your SSH key to the agent’s session. When you start a cygwin terminal, the .bashrc script ensures that the terminal shares the agent’s session.

Overall, you only have to type in your SSH key’s passphrase once and the rest is easy sailing 😉


Jotting #15: Eclipse Tips and Moans

2009-02-02

Some Eclipse tips and moans that I’ve experienced on and off.

I like to start Eclipse with the option -showlocation; it helps to identify which workspace I am working on, especially when I need to work on two versions (trunk and branch).

Workspaces

Just upgraded to Eclipse 3.4 (Ganymede) and had to get rid of a wrongly created workspace. While it is easy to move between workspaces, many have commented that it is difficult to get rid of workspaces.

In Eclipse 3.4, open the file <Eclipse_HOME>/configuration/.settings/org.eclipse.ui.ide.prefs and remove the unwanted workspace from the value of the RECENT_WORKSPACES key. Voila. Done. Thanks.
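If you prefer not to edit the file by hand, the same change can be sketched as a small Python function. This is an untested-against-Eclipse sketch: the function name forget_workspace is made up, and it assumes that the entries in RECENT_WORKSPACES are separated by a literal backslash-n sequence, so check your own prefs file first (and keep a backup) and pass a different separator if needed:

```python
def forget_workspace(prefs_text: str, workspace: str, sep: str = "\\n") -> str:
    """Return prefs_text with `workspace` removed from the RECENT_WORKSPACES entry.

    `sep` is the separator between workspace paths inside the value; the
    default (a backslash followed by the letter n) is an assumption.
    """
    key = "RECENT_WORKSPACES="
    lines = []
    for line in prefs_text.splitlines():
        if line.startswith(key):
            kept = [w for w in line[len(key):].split(sep) if w != workspace]
            line = key + sep.join(kept)
        lines.append(line)
    return "\n".join(lines) + "\n"


# Made-up sample content; a real prefs file contains further keys.
sample = "RECENT_WORKSPACES=/ws/trunk\\n/ws/wrong\\n/ws/branch\n"
cleaned = forget_workspace(sample, "/ws/wrong")
```

The function only works on text, so reading the prefs file and writing the result back is left to the caller.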

Working Sets

I sometimes like to group projects into working sets, especially in workspaces with many Eclipse projects; but in Eclipse these sets are somewhat second class objects:

  • many menu options don’t work on sets
  • can’t export/import sets

I would like to export sets because when I branch my code it would be nice to carry things over to the branch.

Bookmarks

Similarly, you cannot export/import bookmarks. What a shame/hassle/waste-of-time. My current workaround is to take a snapshot with FastStone Capture (version 4.8, great little helper app!) and keep the picture.


Jotting #14: Commenting – sometimes it’s crucial

2008-06-01

Recently, some controversy (see, for example, here) erupted around a mistake made in the OpenSSL library used by the Debian project. The mistake was traced back to this change. The various comments hint at problems on several layers which led to this mistake, but I can’t help thinking that two basic practices would have gone a long way to avoid this problem.

Comments

Looking at the changes and the surrounding code, there is just no hint (viz. a comment) that tells you what is happening and why the line of code is important.

Now, I don’t like to comment the obvious; many style guidelines ask for far too much commenting when the code is quite obvious. But in this case, several good practices were not employed:

  • Using an obvious, self-documenting procedure name
  • Adding a warning to crucial code lines or code ordering
  • Commenting in detail (or providing a URL)

If you’re implementing a complex algorithm, you need documentation somewhere. The lessons of Literate Programming, as exemplified in Donald Knuth’s TeX programme, seem to have fallen on deaf ears. But Knuth at least laid down a challenge: he would pay out money for each verified TeX bug. (It didn’t bankrupt him. Firstly, the error rate per kLOC was very, very low and, secondly, people treasured a cheque signed by him so much that they preferred to frame it rather than cash it!)

Testing

The cause of the mistake also touches on other areas, among them testing. The problem with testing algorithms like the OpenSSL one is that you sometimes need to test a lot of combinations. And I mean A Lot! In his recent talk at the Cambridge BCS meeting (see my review), Simon Peyton Jones showed an example where the error only revealed itself after running several hundred different data inputs for the same test-case! We usually don’t go anywhere near that length to test our code. But sometimes you need to do it, usually by generating random data that covers as many possibilities as possible and avoids the bias of excluding certain cases (we humans are often good at rationalising away potential sources of error, à la “That can never happen”).


Jotting #13: Management, Responsibility & Authority

2008-05-16

Leadership, responsibility, authority and teamwork are crucial ingredients in software projects. Some thoughts, triggered by recent blog entries, readings and accidental finds, may help to shed some light on these aspects.

(Disclaimer: Apologies if parts seem like ramblings but I haven’t been able yet to make more consistent connections between all concepts.)

Responsibility and Authority

When you are charged with managing a project, you are typically charged with achieving some objective with a budget (time, money). But are you always given the right parameters to achieve that objective? One important aspect is the right balance of responsibility and authority. If you feel you have a lot of the former but not enough of the latter, then please read on!

Johanna Rothman, author of [1], writes on Responsibility is Authority (see also here):

As a project manager, you have the responsibility to take authority, rather than wait for someone to give you the authority.
No manager ever has enough real authority to do what he or she wants to do. There’s always someone with a bigger title. Even if you’re a CEO, you report to a board. Even though titular authority is useful, it’s not enough. If the project is strategically important to the organization, act first (doing whatever the project needs) and ask forgiveness later. You’ll know whether the project is strategically important by how many people ask about the status and what levels of people ask. The more people ask at the higher levels, the more strategic the project is. If the project is not strategically important, don’t waste your time trying to accomplish it. In reality, if the project is important enough to the organization, you have the authority to do just about anything you need to do. (You need the self-esteem to do what you need to do.) But if the project is not important enough to the organization, you can never get enough authority to do what you need to do.

A good definition of authority and responsibility:

Authority: The legitimate power given to a person to use resources and exercise discipline to reach an objective

Responsibility: The duties, assignments and accountability for results associated with a function or position

(Definitions of related concepts can also be found there.) The author also quips that sometimes you just have to grab it [authority] and run with it, and that a person should be granted at least as much authority as she can handle, which seems to agree with Rothman.

Leadership

How should you use authority? Mainly to enable your team to perform well, i.e., it’s not a license to tightly control your team but rather a means to remove the obstacles that hinder it. In other words, don’t lead by command but rather by mission; be an enabler. This is at least my understanding of leadership as espoused by SCRUM and DSDM Atern.

Key principles of leading-by-task (or mission) were developed in the 19th century Prussian Army and state that the leader [2]:

  • informs what his intention is,
  • sets clear achievable objectives,
  • provides the required […] resources,
  • will only order details regarding execution if measures which serve the same objective have to be harmonized, […],
  • gives latitude to subordinate leaders in the execution of their task [mission].

This approach certainly takes a very dim view of micromanagement [2]:

It is, therefore, out of the question that a colonel or even a general appoint himself squad leader to direct traffic at a road intersection or to instruct a patrol leader about his mission.

Rupert Smith, a former general in Bosnia, demonstrates in [3] how a lack of clear intentions and goals, and a lack of understanding of the capabilities of his resources (viz. armed forces), on the part of his political leaders left him in difficult situations and often rendered him unable to use his resources to their full utility. True leadership is not easy.

The Team: Collective Wisdom

Current agile approaches place great importance on the project team, especially one that self-organises and is not managed by an old-style project manager. Important issues and results from social experiments relating to small teams are reported and discussed in [4], especially in Committees, Juries and Teams: The Columbia Disaster and How Small Groups Can Be Made to Work (Ch. 9). We tend to rely on a single person to make a decision rather than trust a (small) group of people to make a better and more consistent decision, sometimes with disastrous results. [4] concludes:

[…] that there is no point in making small groups part of the leadership structure if you do not give the group a method of aggregating the opinions of its members. If small groups are included in the decision-making process, then they should be allowed to make decisions. If an organisation sets up teams and then uses them for purely advisory purposes, it loses the true advantage that a team has: namely, collective wisdom.

(Some people even seem to be able to overturn the image of the genius inventor by using teamwork.)

Quotes

These are all from Wikiquote on leadership and too fitting not to mention:

Don’t tell people how to do things, tell them what to do and let them surprise you with their results. (George S. Patton)

If it’s a good idea, go ahead and do it. It is much easier to apologize than it is to get permission. (Admiral Grace Hopper)

A clear vision is usually assumed and rarely communicated. (Unknown)

References

  1. Johanna Rothman: Manage It! (ISBN 978-0-9787392-4-9, 2007)
  2. Werner Widder: Auftragstaktik und Innere Fuehrung (eng, PDF)
  3. Sir Rupert Smith: The Utility of Force: The Art of War in the Modern World (ISBN 0-713-99836-9, 2005)
  4. James Surowiecki: The Wisdom of Crowds (ISBN 0-349-11605-9, 2004)

Review: Simon Peyton Jones on Type-driven Testing in Haskell

2008-03-13

Simon Peyton Jones gave a talk at the Cambridge BCS-SPA group on testing with functional languages (esp. Haskell). (Someone’s already posted the video and slides; careful, it’s large!)

Some important points:

  • Future of programming will be about “Control of (Side) Effects”
  • Programming languages will become more functional than imperative
  • Purity is good for understanding, verification, maintenance
  • Purity pays back in performance, parallelism, testing
  • Functional/value-oriented is easier to test than object-oriented stateful
  • Functional is good for generating tests (domain-specific language)

After a short intro into Haskell (10 min Haskell 101 :)), SPJ moved on to testing in Haskell. In his demo, he tested a programme that packs 7-bit words into 8-bit bytes, so that eight ASCII characters take up only 7 bytes instead of 8. This sort of space saving is done in SMS, where bandwidth is precious.

One fundamental test tried to assert that unpack(pack(x))==x. After testing some hand-written cases, which succeeded, the test started to use randomly generated words and started to fail after a few hundred attempts. Due to its random nature, the number of cases needed varied from run to run, but typically it failed after fewer than 1000 cases. (It turned out that words of 8-byte length ending in a particular bit sequence were not correctly packed.)

The beauty of the underlying Haskell testing framework was that it took very few lines of code to express a generic testing framework.

The talk also showed that sometimes testing with large random test data is necessary to find bugs; something we rarely do!?
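The round-trip idea can be sketched in a few lines of Python. This is not the Haskell code from the talk: the names pack, unpack and check_round_trip are made up, the bit layout (least significant bits first) is an assumption, and unpack is told how many words to expect because 7 and 8 seven-bit words both occupy 7 bytes:

```python
import random


def pack(septets):
    """Pack a list of 7-bit values into bytes, least significant bits first."""
    acc = 0    # bit accumulator
    nbits = 0  # number of valid bits currently held in acc
    out = bytearray()
    for s in septets:
        if not 0 <= s < 128:
            raise ValueError("not a 7-bit value: %r" % (s,))
        acc |= s << nbits
        nbits += 7
        while nbits >= 8:  # emit every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:  # flush the remaining bits, zero-padded
        out.append(acc & 0xFF)
    return bytes(out)


def unpack(data, count):
    """Inverse of pack: read `count` 7-bit values back out of `data`."""
    acc = 0
    nbits = 0
    out = []
    for b in data:
        acc |= b << nbits
        nbits += 8
        while nbits >= 7 and len(out) < count:
            out.append(acc & 0x7F)
            acc >>= 7
            nbits -= 7
    return out


def check_round_trip(trials=1000, seed=0):
    """Try random inputs; return the first counterexample, or None if all pass."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randrange(128) for _ in range(rng.randrange(20))]
        if unpack(pack(xs), len(xs)) != xs:
            return xs
    return None
```

Calling check_round_trip() runs the property on a thousand random word lists of varying length; returning the failing input makes a bug like the one in the demo easy to reproduce.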

Overall, I found the speaker very engaging and the talk enjoyable, even if I won’t claim to have understood or remembered everything.


Jotting #12: Find empty strings in Oracle table

2008-03-12

Had to find some strings (varchar2) in a table that were just blanks with optional end-of-line characters thrown in.

Luckily, regular expressions make that an easy task. Here is the query:

SELECT *
  FROM myTable x
 WHERE REGEXP_LIKE( x.myColumn, '(^[[:space:]]*$)' );

A short explanation:

  • ^...$ says that pattern applies to the string from start to finish, i.e., it’s not just a sub-string pattern,
  • [...]* says that pattern occurs 0 or more times (could also have been […]+ in this case),
  • [:space:] defines a pattern of all white-space characters, including blank, \t, \r and \n.

Sometimes, regular expressions just make tasks like these very easy.
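The same pattern can be tried outside the database. The sketch below uses Python's re module, with \s standing in for [[:space:]] (an approximation, since the two character classes are not defined identically); the name is_blank and the sample rows are made up:

```python
import re

# \s* with fullmatch corresponds to ^[[:space:]]*$ : the whole string
# must consist of zero or more white-space characters.
BLANK = re.compile(r"\s*")


def is_blank(value: str) -> bool:
    """True if value is empty or contains only white-space characters."""
    return BLANK.fullmatch(value) is not None


rows = ["", "   ", " \t\r\n", "x", "  x  "]
blank_rows = [r for r in rows if is_blank(r)]
```

One difference to keep in mind: Oracle treats an empty VARCHAR2 as NULL, so the query above would not return such a row, whereas is_blank("") is True here.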