Monday, March 19, 2012

A TDD Example: The Mars Rover Problem

I learned TDD the hard way - while doing feature development on an existing, poorly tested enterprise application. I wanted to see TDD in action on a small greenfield project, so I decided to write a solution to the Mars Rover problem.

A Mars Rover stuck in a sand dune

The Mars Rover is a problem that I have been set by more than one prospective employer in the past. Previously I had solved it 'test-after', this time I wanted to solve it test-first.

As a result I came up with a completely different architecture. The main reason for this is that TDD encourages continuous refactoring, and high test coverage gives you confidence to improve system design as you go, without worrying about regressions.

In this post I want to briefly describe how I used TDD to grow the project, particularly as a reference and talking point for other newcomers to TDD. I'll describe how and why I used TDD in the Mars-Rover-problem context, and at the end you will be able to go and explore the codebase to see how it ended up. (Or of course you can just skip the preamble and go check out the project.)

I'm really interested in feedback, so if you have any thoughts as you read, please do leave comments or drop me an email.

The Mars Rover problem
A quick disclaimer: The Mars Rover problem is simple and there are many solutions floating around on the web (some have slightly different problem specs, click here to see the one I implemented). The first thing you'll notice when looking at my solution is that it is 100% overkill!

For a problem with such defined boundaries, and more importantly a problem you know will never change or grow, you could code up a very simple solution. You may not even have to use OOP at all.

However, I work every day on enterprise systems and I wanted my solution to be relevant to some of those challenges - to be extendable, testable, expressive and understandable. I wanted to explore a small, limited problem with a few basic stories, so that I could share the results and move on from it.

Importantly then, the Mars Rover problem doesn't require any particular presentation or persistence technologies. These are very important subjects, but I have descoped them and they can be the subject of other posts.

Let's get started already!
You can download the project from GitHub.

You can also just browse the source code via the GitHub website, but if you are a C# developer you might want to launch it in VS so you can run the tests and explore using ReSharper.

A good story
Before writing any code, I went through a short analysis phase (see the Analysis directory), and one of the artifacts of that phase was the generation of a short list of acceptance criteria:

  1. Interprets GridSizeCommand, initializes Grid and sets size
  2. Interprets RoverDeployCommand, deploys Rovers, Rovers report their position without moving
  3. Interprets RoverExploreCommand, Rovers move and turn before reporting

Each of the items on the list represents a small chunk of demonstrable progress, or in other words a story. These aren't necessarily stories which make immediate sense to a system user (what's the point of being able to deploy a rover without moving it?).

Nonetheless they are good stories, because they represent incremental progress toward a stated aim. If a story is too large it becomes a 'black hole' into which development time and resources disappear without any realistic measure of progress.

(My approach to creating acceptance criteria, and to much else in this post, was inspired by Steve Freeman and Nat Pryce's Growing Object-Oriented Software, Guided by Tests, as suggested by my colleague JonoW.)

I later added a fourth acceptance criterion:

  4. Given input string defined in problem statement, produces output string defined in problem statement

After all, that is the only acceptance criterion expressed in the problem spec, so it would be crazy to miss it out!

Criteria become acceptance tests
The acceptance criteria can then translate (almost) literally into the names of the acceptance tests:

Acceptance tests

(You'll notice that some of the names are different, for example at some point I changed the word Grid to become LandingSurface, which is more expressive of this particular problem domain.)

What this all means is that if you just want to get a quick overview of what a TDD application actually does, the first place to look is the acceptance tests. One of the major benefits of TDD is the fact that the tests themselves become documentation (of what the system actually does, as well as proof that it does it.)

This is taken to the next level with BDD, which provides a framework in which behavioural criteria, as expressed by non-technical stakeholders, become executable.

Starting work on a story
Before you start writing any actual code for your story, or even any unit tests, you write a failing acceptance test. When this test goes green, you have finished the story - so in other words, the acceptance test needs to capture the entire essence of the story. This will usually be in a format similar to: 'Given this certain input/context, whatever happens internally, I expect this certain output.'

To begin work on MarsRover, I started with the first acceptance criterion:

  1. Interprets GridSizeCommand, initializes Grid and sets size

Which I translated into this acceptance test:

[TestCase("5 5", 5, 5)]
[TestCase("2 3", 2, 3)]
public void Given_a_commandString_with_one_LandingSurfaceSizeCommand_creates_LandingSurface_and_sets_size(
         string landingSurfaceSizeCommandString,
         int expectedWidth, int expectedHeight)
{
    var expectedSize = new Size(expectedWidth, expectedHeight);
    var commandCenter = container.Resolve<ICommandCenter>();
    commandCenter.Execute(landingSurfaceSizeCommandString);
    var landingSurface = commandCenter.GetLandingSurface();
    var actualSize = landingSurface.GetSize();
    Assert.AreEqual(expectedSize, actualSize);
}

The name is long, but it reads as English, and you'll recognise it from the documentation earlier in this post.

A quick aside
You'll notice that I exposed a LandingSurface as a getter from ICommandCenter:

var landingSurface = commandCenter.GetLandingSurface();

This is a compromise - the spec doesn't say anything about reporting back the plateau's size, but I need some way to get access to it to ensure the size has been set! Debates could be had here, but I'll move on.

Moving on
At the moment, this test doesn't even compile, so I then 'stub out' the two interfaces ICommandCenter and ILandingSurface. If you use ReSharper, you can do this with a few shortcuts, and you can even place the new interfaces in the correct assemblies and directories, all without leaving the editor.
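
To make that concrete, here is roughly what those stubs could look like at this point, inferred purely from the calls made in the acceptance test above (the actual interfaces in the repository grew beyond this):

// Minimal stubs inferred from the acceptance test - just enough to compile.
public interface ICommandCenter
{
    void Execute(string commandString);
    ILandingSurface GetLandingSurface();
}

public interface ILandingSurface
{
    // Size could be System.Drawing.Size or a small custom struct.
    Size GetSize();
}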

Now the test compiles but it doesn't pass. As you eventually progress from story to story, your bank of green acceptance tests represents stories already completed. The single red acceptance test represents the story you are currently working on.

As you can see in the code example above, the acceptance test sees your codebase from the 'outside in', the same way your stakeholders see it. In writing an acceptance test, you highlight two important things - the trigger (what kicks the functionality into action - a button, a scheduled task etc.), and the expected response (a screen-based confirmation message, a sound, some kind of acknowledgement.)

You will need to keep these things in mind when you start writing your first unit test, as your first unit test will be written for the 'trigger' (i.e. commandCenter.Execute() above.)

Writing the first unit test
In the MarsRover spec, as an input we are given a complete, newline-delimited string to parse. The implication is that this system is receiving remote communications in a serial format, so in the context of our application the string is not expected to be literally typed in by a human.

For this reason, and because dealing with communications technologies is outside the scope of this project, I embedded the string as a constant in the application. So the entry point for this story is a command to execute a predefined string, i.e.

commandCenter.Execute(landingSurfaceSizeCommandString);
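
As a sketch, embedding the input might look something like the snippet below. The exact text is whatever the problem spec defines; the string shown here is the commonly circulated version of the kata input, so treat it as illustrative rather than a quote from my solution.

// Illustrative only - the commonly circulated kata input, embedded as a constant.
private const string InputString =
    "5 5\n" +
    "1 2 N\n" +
    "LMLMLMLMM\n" +
    "3 3 E\n" +
    "MMRMMRMRRM";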

The 'grid size' command string will contain two integers separated by a space, as in our TestCase attributes above:

[TestCase("5 5", 5, 5)]
[TestCase("2 3", 2, 3)]

I came up with the class name CommandCenter to handle the execution, and its responsibility is to talk to a CommandParser to turn the string into a list of Commands, and to hand the list to a CommandInvoker. The CommandInvoker's responsibility is to invoke each Command on domain objects such as LandingSurface and Rover.

(I avoided the name CommandController for example, because of the association with MVC.)
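
To make those responsibilities a little more concrete, here is a rough sketch of the collaborator interfaces as described above - the signatures are inferred from the prose (and the handover method on the invoker is a guess), so the shapes in the repository may differ:

// Hypothetical signatures inferred from the description above.
public interface ICommandParser
{
    // Turns the raw command string into a list of executable commands.
    IEnumerable<ICommand> Parse(string commandString);
}

public interface ICommandInvoker
{
    // How the list is handed over is a guess - the repository may differ.
    void SetCommands(IEnumerable<ICommand> commands);

    // Invokes each command against the domain objects (the receivers).
    void InvokeAll();
}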

The first thing the CommandCenter should do when asked to execute a string command is to figure out what kind of command it is dealing with. For this, I first wrote a test:

public class CommandCenterTests
{
    [TestFixture]
    public class CommandCenter_Execute
    {
        [Test]
        public void Given_valid_command_string_invokes_Parser()
        {
            const string expectedCommand = "2 5";
            var mockCommandParser = new Mock<ICommandParser>();

            var commandCenter = new CommandCenter(mockCommandParser.Object);
            commandCenter.Execute(expectedCommand);

            mockCommandParser.Verify(x => 
                       x.Parse(expectedCommand), Times.Once());
        }
    }
}

As you can see, focusing on the first unit of behaviour expected of this class has suggested a dependency - ICommandParser. At the moment this is just an interface, not a class, so it can be mocked and used for the purposes of this test without having to start worrying about its implementation or having two failing unit tests at once.

As an extension of this 'single test subject' concept, the only class to be directly instantiated (newed up) is the Subject Under Test (SUT):

var commandCenter = new CommandCenter(mockCommandParser.Object);

All interactions with other objects are verified only by mocking, so that a test only ever verifies the behaviour of the SUT, no more no less.

Testing too much
Those associated objects, like ICommandParser above, will have their own tests to verify their behaviour when they are implemented. As in the example above, it is enough to ensure that the CommandCenter invokes the services of its collaborator, ICommandParser, and that is all. Testing the actual command parsing functionality will take place in the CommandParserTests.
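
As a hedged illustration of what that might look like, a first test in CommandParserTests could be something like this - the construction of the parser and the concrete command type are assumptions on my part, following the naming conventions described later in this post:

public class CommandParserTests
{
    [TestFixture]
    public class CommandParser_Parse
    {
        [Test]
        public void Given_landing_surface_size_string_returns_LandingSurfaceSizeCommand()
        {
            // Sketch only - the real parser may take dependencies in its constructor.
            var commandParser = new CommandParser();

            var commands = commandParser.Parse("5 5");

            Assert.IsInstanceOf<ILandingSurfaceSizeCommand>(commands.Single());
        }
    }
}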

This is one of the first pitfalls for newcomers to TDD - testing too much. But once you get the hang of it you start thinking of classes in terms of their specific responsibilities. You also start to see objects from the outside-in, in terms of class responsibilities being defined by an API (hence the use of interfaces).

Once you have a green light on your first test, you can start work on the first dependency you have highlighted (test-first of course), and so on until your acceptance test goes green. If you have no dependencies left unimplemented and your acceptance test is still red, look back to your 'trigger', and add another test for the missing functionality.

This is what happened with the example above - it's not enough just to parse the string into commands, you also have to execute them. You could write a new test for that functionality, or you could extend the existing one (as I did), so long as you have a failing test before you start, and so long as you keep the test small and focused. One or two asserts are OK; more than that and you may be testing too much.
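
For illustration, the extended test might end up looking something like this - the two-argument constructor and the InvokeAll() handover are my assumptions, not code lifted from the repository:

[Test]
public void Given_valid_command_string_invokes_Parser_and_Invoker()
{
    const string expectedCommand = "2 5";
    var mockCommandParser = new Mock<ICommandParser>();
    var mockCommandInvoker = new Mock<ICommandInvoker>();

    var commandCenter = new CommandCenter(
        mockCommandParser.Object, mockCommandInvoker.Object);
    commandCenter.Execute(expectedCommand);

    mockCommandParser.Verify(x => x.Parse(expectedCommand), Times.Once());
    mockCommandInvoker.Verify(x => x.InvokeAll(), Times.Once());
}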

Arrange, Act, Assert
Note at this stage that the test is written in the Arrange Act Assert format. Below you can see exactly the same test but with a few comments to show you the different sections:

public class CommandCenterTests
{
    [TestFixture]
    public class CommandCenter_Execute
    {
        [Test]
        public void Given_valid_command_string_invokes_Parser()
        {
            //Arrange
            const string expectedCommand = "2 5";
            var mockCommandParser = new Mock<ICommandParser>();

            //Act
            var commandCenter = new CommandCenter(mockCommandParser.Object);
            commandCenter.Execute(expectedCommand);

            //Assert
            mockCommandParser.Verify(x => 
                       x.Parse(expectedCommand), Times.Once());
        }
    }
}

When you are beginning, it can be helpful to annotate your tests as above; once you've been doing it a while you get used to expecting a certain convention in the code, and no longer need the reminders.

Refactoring... is this the best possible design?
This is probably a good time to take a look at the TDD cycle. One of the fundamental aspects, and core benefits, of TDD is improving your design. After you have written your implementation, you have the opportunity to improve it, perhaps to make it more efficient, and as long as it still satisfies your test cases, you are good to go.

But more broadly than that, before you start writing a test, you have the opportunity to improve the existing system design in light of the functionality you are about to implement.

  • Will some responsibilities need to shift to ensure the resulting system makes sense? First, shift the tests, then shift their implementations.
  • Does another method need to be renamed, for the sake of differentiation, now that a class has to handle a slightly different load? Use ReSharper to do it automatically.
  • Is the code you're writing starting to look a bit like a popular design pattern? Consider whether implementing the pattern more closely will improve the code. (This is how I got to the Command pattern, which I explain later in this post.)

Semantic naming
Finally, take a look at the test's internal class structure and naming. This (like many things I am describing in this post!) is a matter of convention and style, but the inner class tells you what class and method are under test (the SUT). The test method name tells you the expected behaviour you are ensuring, i.e.

public class ClassNameTests
{
    [TestFixture]
    public class ClassName_MethodName
    {
        [Test]
        public void Behaviour_under_test()
        {
            //Arrange, Act, Assert
        }
    }
}

The reason I've used these conventions is that as well as being explicit, it reads as documentation, like before:

Rover Tests reading as class documentation

Only this time we have class-level documentation rather than story/acceptance-level. Each method of the Rover class in the example above is described in terms of its externally expected behaviour, along with proof that it is living up to those expectations.
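
For example, a skeleton for the Rover tests following this convention might look like the snippet below (the method and behaviour names are illustrative, not necessarily the ones in the repository):

public class RoverTests
{
    [TestFixture]
    public class Rover_Move
    {
        [Test]
        public void Given_facing_north_moves_one_grid_point_north() { /* ... */ }
    }

    [TestFixture]
    public class Rover_Turn
    {
        [Test]
        public void Given_facing_north_and_turning_right_faces_east() { /* ... */ }
    }
}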

Speed
I'll briefly describe the architecture of the system in a minute, but I want to quickly talk about speed.

If you are new to TDD, one of the things you might be thinking right now is, how long is all this going to take!? In this time we could have already implemented twice as many features.

This is a very good point. You could, and TDD is slower - at least at the start. The most convincing answer to this that I know of, is that it depends on the expected complexity and lifespan of your application. A small application which will be shelved in six months may not be a good candidate for TDD. But an enterprise application, which is expected to constantly change and grow as the business changes and grows, is probably a good candidate.

With TDD, the very start certainly takes longer, but once you get going features can be added at a consistent pace. This is in contrast to non-TDD projects, where as the system grows and becomes more complex, it also becomes more 'brittle' and harder to change. People are afraid to redesign to adapt to new challenges, afraid to upgrade external libraries or adopt newer techniques for fear of regressions, and deployments to production become more of a risk.

And of course the real clincher: the time it takes to implement new features on top of existing functionality grows exponentially.

With regard to the time it takes to implement a single feature with TDD, once I got used to the process I found it to be only marginally longer. This is because in effect, you are doing the same core tasks you were doing without TDD, only in a different order - and with a little bit of extra typing for the tests:

  • analysing a problem,
  • contextualising it within an existing codebase,
  • following through on implementation,
  • and testing

But as well as the extra typing, there is the fact that you are working incrementally, i.e. you are improving your design and refactoring as you go.

Now that does take extra time - but it takes the time upfront.

Without TDD you hit those problems later on, often close to your release date. You are more likely to have longer bug lists, unexpected regressions, and poorer system design at the end of it. This means the next developers who have to come and work on the codebase, and the next after that, will encounter the same problems again, and those problems will continue to grow as the complexity grows.

Of course, this is all quite philosophical! There's no hard evidence, but as we know, TDD has become a mainstream practice. As you can tell, I'm convinced of its usefulness in non-trivial applications!

Separating end-to-end from unit tests
Acceptance tests are naturally end-to-end. Their purpose is to test the system, as far as is possible, from the perspective of its actual usage. That means that the tests must use the same object graph, provided by the same DI container and an equivalent composition root (notice the container.Resolve<ICommandCenter>() in the acceptance test example above). This is so that the objects exercised by the tests all relate to each other in exactly the same way as they do in production.
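
As a sketch of what that looks like in an acceptance test fixture (assuming Autofac, which the project uses; the fixture name is illustrative, and in the real project the registrations would be shared with the production composition root rather than repeated here):

[TestFixture]
public class AcceptanceTests
{
    private IContainer container;

    [SetUp]
    public void SetUp()
    {
        // Build the same object graph the application itself uses.
        var builder = new ContainerBuilder();
        builder.RegisterType<CommandCenter>().As<ICommandCenter>();
        builder.RegisterType<CommandParser>().As<ICommandParser>();
        builder.RegisterType<CommandInvoker>().As<ICommandInvoker>();
        // ...plus registrations for the domain objects and factories...
        container = builder.Build();
    }
}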

Unit tests, on the other hand, must test just one behaviour of one component at a time, and so to distinguish the two types of test, I placed each in their own assembly:

Project layout showing AcceptanceTests assembly

As the system grows, there would remain only one Acceptance Test assembly (for each distinct presentation layer anyway), but a new Unit Test assembly would be added for each and every production code assembly, mirroring the directory structure of the production code.

Of course, there are other types of test - in particular Integration Tests, for testing your (hopefully thin) integration points with external resources. However there is no requirement for integration in this spec, and so it is descoped for another article.

Exploring the MarsRover project
You should have enough information to start exploring the solution now, so before I wrap up I'll just say a few things about architecture.

I have used the Command pattern, in which a command invoker invokes commands on command receivers. To make it easy for other developers to detect the use of the pattern, I have used appropriate naming conventions:

  • ICommand
  • ILandingSurfaceSizeCommand (implements ICommand)
  • IRoverDeployCommand (implements ICommand)
  • IRoverExploreCommand (implements ICommand)
  • ICommandInvoker (invokes each of the concrete commands at the appropriate moments via their shared interface, ICommand)

The command receivers are the domain objects being operated on, i.e.

  • IRover (implementation Rover), and
  • ILandingSurface (implementation Plateau)

When the CommandInvoker is instructed to InvokeAll(), it simply iterates through each ICommand, assigns the receivers to it, and invokes it. You can see it happen here.

(Note: The method dictionary is just a way to pass each ICommand to a tailored method for assigning the correct receivers.)
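
A rough sketch of that loop is below - the SetCommands and Execute method names are hypothetical, and the population of the method dictionary is omitted:

// Hypothetical sketch of the invoker described above.
public class CommandInvoker : ICommandInvoker
{
    private IEnumerable<ICommand> commands = new List<ICommand>();

    // Maps each concrete command type to a method that assigns the correct
    // receivers (Rover, Plateau) to that command before it is invoked.
    private readonly IDictionary<Type, Action<ICommand>> receiverAssignmentMethods
        = new Dictionary<Type, Action<ICommand>>();

    public void SetCommands(IEnumerable<ICommand> commands)
    {
        this.commands = commands;
    }

    public void InvokeAll()
    {
        foreach (var command in commands)
        {
            // Assign the right receivers for this command type...
            receiverAssignmentMethods[command.GetType()](command);
            // ...then invoke the command itself.
            command.Execute(); // hypothetical name for the ICommand method
        }
    }
}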

Resolving dependencies
The other thing you'll notice is how I handled dependency injection, and in particular dynamic instantiation. This is all stuff I've blogged about before, so please check out the relevant posts:

  • I used constructor injection to inject dependencies (described here),
  • and Autofac's delegate factories for dynamic instantiation (described here)

I hope you find this project useful, and if you have any comments or questions, please write!
