Tuesday, March 27, 2012

Introduction to Covariance and Contravariance in C#

Covariance and contravariance have had a tweak in C# 4.0. But before we can start using the benefits in our code, we need to know exactly what they are! The subject can be a little bit tricky when you first encounter it, so in this post I'm going to introduce it in a way that I hope is easy to follow.

Almost all of the information in this post comes from Eric Lippert's excellent series of posts, and I recommend you go and take a look at his blog right now (links to the whole series are at the bottom of this post.)

Tigers and a giraffe (credit: andrewmalone)

If you 'get it' straight away, great - stick with that, he's the expert. If you'd like a slightly gentler introduction, avoiding (I hope!) common 'gotchas' for learning the subject, read my post (this post) first.

What I have done is to explain the same concepts using the same examples, but in an order and with an emphasis which I think make it much easier to understand the basics. This is particularly good for you if you are a programmer coming at it from the cold, i.e. you haven't encountered covariance and contravariance before, or you have but you don't understand them yet.

When you're done here I'd suggest you go back and read Eric's posts in order - they should be much easier for you to read by then. Eric's posts will flesh out all the interesting details, and continue on to discuss more advanced topics.

Inheritance and assignability
We're not going to begin talking about covariance and contravariance straight away. First, we're going to make a distinction between inheritance and assignability.

As Eric points out, for any two types T and U, exactly one of the following statements is true:

  • T is bigger than U
  • T is smaller than U
  • T is equal to U
  • T is not related to U

Now, one of the things that seems to have caused some confusion on Eric's blog (see the comments) is usage of the phrase "is smaller than". It is used frequently, and is key, so I want to make its definition crystal clear now before we move on. Eric says:

"Suppose you have a variable, that is, a storage location. Storage locations in C# all have a type associated with them. At runtime you can store an object which is an instance of an equal or smaller type in that storage location."

In simple scenarios, this is something so familiar to programmers that it's barely worth mentioning. We all know that, looking at the list above, only in the middle two scenarios is T assignable to U. The smaller than relation:

class U{}
class T : U{}
U myInstance = new T();

This is the first thing that comes to mind, right? An inheritance hierarchy.

But Eric didn't mention inheritance hierarchies. Sure, an inheritance hierarchy is one way to make a T which is assignable to a U, but what about this one:

U[] myArray = new T[10];

... or, the same statement using classes from the animals hierarchy:

Animal[] animals = new Giraffe[10];

The type Giraffe inherits from Animal, but the type Giraffe[] doesn't inherit from Animal[]. The array types are assignable, but not linked by inheritance, and this tells us something about what 'is smaller than' means:

  • T is smaller than U

can be read as

  • T is assignable to U

You can visualise it this way:

T is smaller than U
T        <        U
        -->          //Direction of assignability

As we have seen, in some cases this direction of assignability may be because of an inheritance relationship, but in others it is simply because the language and its runtime (C# on the CLR, and similarly Java on the JVM) happen to support that particular assignment operation.

There is still an inheritance hierarchy involved, i.e. this wouldn't work:

Tiger[] tigers = new Giraffe[10]; //illegal

But the key thing is that there is a difference between inheritance and assignability: they are not the same thing.

I'll say it one more time (for good luck!): The phrase "is smaller than" refers to assignability, not inheritance. The direction of assignability always flows from the smaller type to the larger type. We'll come back to this in a moment.
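
If you'd like to see the difference between inheritance and assignability for yourself, here is a minimal sketch using reflection. The Tiger class and the IsAssignable helper are names I've made up for illustration; the only library calls are Type.IsAssignableFrom and Type.BaseType.

```csharp
using System;

class Animal { }
class Giraffe : Animal { }
class Tiger : Animal { }

static class AssignabilityDemo
{
    // True when a value of type 'from' can be stored in a location of type 'to'.
    public static bool IsAssignable(Type from, Type to)
    {
        return to.IsAssignableFrom(from);
    }

    static void Main()
    {
        // Giraffe[] is 'smaller than' Animal[]: it is assignable to it ...
        Console.WriteLine(IsAssignable(typeof(Giraffe[]), typeof(Animal[]))); // True

        // ... but it does not inherit from it. Its base class is System.Array.
        Console.WriteLine(typeof(Giraffe[]).BaseType == typeof(Array));       // True

        // Sibling element types are not assignable to each other.
        Console.WriteLine(IsAssignable(typeof(Giraffe[]), typeof(Tiger[])));  // False
    }
}
```

So the array types are related by assignability only, which is exactly the relationship that "is smaller than" describes.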

Covariance and Contravariance
Eric's second post discusses the array assignment operation (the one I used in the Animal/Giraffe example above), and the problems with it. It's definitely worth reading, but park it for now, because things really come alive in post number three.

Eric's example uses delegate methods, and I'll use a simplified version of it here, just to get us started.

It is clear why this is a legal operation:

static Giraffe MakeGiraffe()
{
    return new Giraffe();
}

//Inside some method:
Func<Animal> func = MakeGiraffe;
//             <-- Direction of assignment

Notice that in the assignment operation, Animal is on the left and Giraffe is on the right. That is, the declared type is based on Animal and the assigned type is based on Giraffe.

Now let's look at another example:

static void AcceptAnimal(Animal animal)
{
    //operate on animal
}

//Inside some method:
Action<Giraffe> action = AcceptAnimal;
//             <-- Direction of assignment

Notice that Giraffe is on the left and Animal is on the right. That is, the declared type is based on Giraffe and the assigned type is based on Animal.

The Func<out T> assignment operation supports covariance. The Action<in T> assignment operation supports contravariance.

What does that mean?
Have a quick look at this summary:
(remember to read < as 'is smaller than' and 'is assignable to')

//Direction of assignability -->
Giraffe                       <   Animal

Giraffe MakeGiraffe()         <   Func<Animal>    //covariance
AcceptAnimal(Animal animal)   <   Action<Giraffe> //contravar..

Now read Eric's definition of covariance and contravariance, from the first post in his series:
(the "operation" which manipulates types being the two assignment operations)

Consider an "operation" which manipulates types. If the results of the operation applied to any T and U always results in two types T' and U' with the same relationship as T and U, then the operation is said to be "covariant". If the operation reverses bigness and smallness on its results but keeps equality and unrelatedness the same then the operation is said to be "contravariant".

Hopefully it is starting to become clear. In line 4 of the summary above, the direction of assignability with respect to the original types was preserved, while in line 5 it was reversed!

Line 4 represents a covariant operation, and line 5 represents a contravariant operation.
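
To try both operations in one place, here is a small self-contained sketch. The Name property and the LastAccepted field are my own additions (not from Eric's posts), there only so the program has something observable to print.

```csharp
using System;

class Animal
{
    public virtual string Name { get { return "animal"; } }
}

class Giraffe : Animal
{
    public override string Name { get { return "giraffe"; } }
}

static class MethodGroupVarianceDemo
{
    public static string LastAccepted = "";

    public static Giraffe MakeGiraffe()
    {
        return new Giraffe();
    }

    public static void AcceptAnimal(Animal animal)
    {
        LastAccepted = animal.Name;
    }

    static void Main()
    {
        // Covariant: the method sends a Giraffe out, the caller sees an Animal.
        Func<Animal> func = MakeGiraffe;

        // Contravariant: the caller passes a Giraffe in, the method handles an Animal.
        Action<Giraffe> action = AcceptAnimal;

        Console.WriteLine(func().Name); // giraffe
        action(new Giraffe());
        Console.WriteLine(LastAccepted); // giraffe
    }
}
```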

The main heuristic
Let's put it back to C# code so that we can see it with the right-to-left assignability we are used to (now the smaller types are on the right):

Animal animal = new Giraffe(); //basic type assignment

Func<Animal> func = MakeGiraffe;       //covariant
Action<Giraffe> action = AcceptAnimal; //contravariant
// <-- Direction of assignability

Notice how in the covariant operation, Animal and Giraffe are on the same sides as in the basic type assignment operation. And notice how in the contravariant operation, they are on opposite sides - the operation "reverses bigness and smallness".

In both cases, the opposites are illegal. As Eric puts it in post number five:

"Stuff going 'in' may be contravariant,
stuff going 'out' may be covariant"

... but not vice-versa:

Func<Giraffe> func = MakeAnimal;       //contravariant (illegal)
Action<Animal> action = AcceptGiraffe; //covariant (illegal)
// <-- Direction of assignability

And by the way, if there's one heuristic you remember as a result of reading this post, it's probably best to make it the one above!

I'll repeat it later in this article.

Hang on, methods aren't types!
A quick aside - at this stage you might be asking why I'm referring to methods as though they were types. The straight answer is, I'm copying Eric. His caveat:

"A note to nitpickers out there: yes, I said earlier that variance was a property of operations on types, and here I have an operation on method groups, which are typeless expressions in C#. I’m writing a blog, not a dissertation; deal with it!"

Can't argue with that.

What's new in C# 4.0?
Well, 'new' is the wrong word since the stable release of C# 4.0 was two years ago! But all of the types of variance we've looked at so far in this post have been supported since C#2 or before.

We as developers didn't really have to think about those types of variance to use them, because they weren't exposed syntactically. In other words, we didn't have to write anything different to make them happen; it's just a matter of what is and what isn't supported by C# compilers and the CLR.

In post numbers four and six, Eric discusses types of variance which went on to become part of the specification for C# 4.0, and it's those types of variance that I'll discuss now.

Real delegate variance
The first one is easy, and it's discussed in post number four. It's simply about taking the operations which were already legal in terms of method groups and making the same operations legal in terms of typed expressions.

Take our covariant example from earlier:

static Giraffe MakeGiraffe()
{
    return new Giraffe();
}

//Inside some method:
Func<Animal> func = MakeGiraffe;
//             <-- Direction of assignment

Well, in C#3 this essentially equivalent operation was illegal, whereas in C#4 it is legal:

Func<Animal> func = new Func<Giraffe>(() => new Giraffe());
//             <-- Direction of assignment

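You can also write it more briefly as below, although strictly speaking this short form was already legal in C#3: the lambda is converted directly to Func<Animal>, so no conversion between delegate types takes place: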

Func<Animal> func = () => new Giraffe();
//             <-- Direction of assignment

You can now do with typed expressions what you could already do with method groups. Simple.
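
Here is a minimal sketch of the typed-expression case, assuming a C# 4 or later compiler. AsAnimalFactory is a hypothetical helper name of mine. The point it tries to show is that the conversion is a plain reference conversion: no new delegate is created, the same object is simply viewed through a 'larger' type.

```csharp
using System;

class Animal { }
class Giraffe : Animal { }

static class DelegateVarianceDemo
{
    public static Func<Animal> AsAnimalFactory(Func<Giraffe> giraffeFactory)
    {
        // Legal from C# 4 onwards, because Func<out TResult> is covariant.
        return giraffeFactory;
    }

    static void Main()
    {
        Func<Giraffe> makeGiraffe = () => new Giraffe();
        Func<Animal> makeAnimal = AsAnimalFactory(makeGiraffe);

        // Same delegate instance, different static type.
        Console.WriteLine(object.ReferenceEquals(makeGiraffe, makeAnimal)); // True

        // What comes out is still a Giraffe, typed as Animal.
        Console.WriteLine(makeAnimal() is Giraffe); // True
    }
}
```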

But here's where it makes sense to quickly explain something I breezed over earlier.

Covariance and Contravariance, at once
Take a look again at the heuristic:

"Stuff going 'in' may be contravariant,
stuff going 'out' may be covariant"

So what happens when you are dealing with a type which has both an 'in' and an 'out'?

The short answer is: it can be covariant, contravariant, both, or neither. But it's easier than that makes it sound!

Take a look at this example. It's a Func that accepts a Mammal and returns a Mammal:

Func<Mammal, Mammal> func;

Now here are some assignment operations:

  • This is a covariant operation:

Func<Mammal, Giraffe> toAssign = //somehow initialise;
Func<Mammal, Mammal> func = toAssign;

  • This is a contravariant operation:

Func<Animal, Mammal> toAssign = //somehow initialise;
Func<Mammal, Mammal> func = toAssign;

  • This is both:

Func<Animal, Giraffe> toAssign = //somehow initialise;
Func<Mammal, Mammal> func = toAssign;

... and, well, I'm sure I don't need to spell out the neither!
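
For the 'both' case, here is a runnable sketch that fills in the "somehow initialise" part. I'm assuming a hierarchy in which Giraffe derives from Mammal and Mammal derives from Animal, and Transform is a made-up method name.

```csharp
using System;

class Animal { }
class Mammal : Animal { }
class Giraffe : Mammal { }

static class BothVariancesDemo
{
    // Accepts any Animal (wider than Mammal) and returns a Giraffe (narrower than Mammal).
    public static Giraffe Transform(Animal input)
    {
        return new Giraffe();
    }

    static void Main()
    {
        Func<Animal, Giraffe> toAssign = Transform;

        // Contravariant on the input, covariant on the output, in one assignment.
        Func<Mammal, Mammal> func = toAssign;

        Mammal result = func(new Mammal());
        Console.WriteLine(result.GetType().Name); // Giraffe
    }
}
```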

Interface variance
The other new feature, as discussed in post number six, is the extension of variance to interfaces. There's not much to add here - it's just the same thing, but using interfaces. Eric gives a really nice example of the practical benefit here, and I'm going to repeat it almost verbatim.

Take a look at this code block. This is another example of something which is illegal in C#3, and legal in C#4:

void FeedAnimals(IEnumerable<Animal> animals) {
  foreach(Animal animal in animals)
    if (animal.Hungry)
      Feed(animal);
}
//...
IEnumerable<Giraffe> adultGiraffes = 
        from g in giraffes where g.Age > 5 select g;
FeedAnimals(adultGiraffes);

Just as earlier on, when we call FeedAnimals(IEnumerable<Animal> animals) we are assigning a 'smaller' type to a 'larger' type:

//Direction of assignability -->
Giraffe                   <   Animal

Giraffe MakeGiraffe()     <   Func<Animal>        //covariance
IEnumerable<Giraffe>      <   IEnumerable<Animal> //covariance

Of course, anywhere else that you reference that assigned-to variable (IEnumerable<Animal>), what comes out will be typed as Animal. All pretty uncontroversial.
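
Eric's snippet leaves out the supporting types, so here is a self-contained version you can compile with C# 4 or later. The Hungry and Age fields, and the idea of FeedAnimals returning a count, are assumptions of mine to make the result visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Animal
{
    public bool Hungry = true;
    public int Age;
}

class Giraffe : Animal { }

static class FeedingDemo
{
    // Feeds every hungry animal and returns how many were fed.
    public static int FeedAnimals(IEnumerable<Animal> animals)
    {
        int fed = 0;
        foreach (Animal animal in animals)
        {
            if (animal.Hungry)
            {
                animal.Hungry = false;
                fed++;
            }
        }
        return fed;
    }

    static void Main()
    {
        var giraffes = new List<Giraffe>
        {
            new Giraffe { Age = 3 },
            new Giraffe { Age = 7 },
            new Giraffe { Age = 9 }
        };

        IEnumerable<Giraffe> adultGiraffes =
            from g in giraffes where g.Age > 5 select g;

        // IEnumerable<Giraffe> is passed where IEnumerable<Animal> is expected.
        Console.WriteLine(FeedAnimals(adultGiraffes)); // 2
    }
}
```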

In and out
But finally, let's look at the in and out keywords, and how they fit in when designing your own interfaces (or using the upgraded C#4 ones.) Recall one more time the heuristic:

"Stuff going 'in' may be contravariant,
stuff going 'out' may be covariant"

In C# 4.0, IEnumerable<T> has become IEnumerable<out T>. The out marks the IEnumerable as supporting covariance on the T. This means that, as in the example above, you can assign based on something smaller than T.

But it also means that the interface cannot accept the type T as an input! It will only allow the interface to send T out, in whatever fashion you like - but it will never accept a T in. If you try it, the compiler won't allow it. Hence, the name: out.

Reading through this code block should make it clear why:

//the compiler won't allow this, but go with it to see why:
interface ICustom<out T>
{
    T GetFirst();     //ok
    void Insert(T t); //compiler complains
}
//..
ICustom<Giraffe> giraffes = //somehow init;
ICustom<Animal> animals = giraffes;
Animal animal = animals.GetFirst();  //ok
animals.Insert(new Tiger());   //problem - 
                               //backing store is Giraffe

Think of it this way - an IEnumerable<Animal> variable can have an IEnumerable<Giraffe> assigned to it and it will churn out Giraffes typed as Animals all day long. Because of how it's declared, users of the IEnumerable<Animal> variable expect to be dealing with Animals.

But a Tiger is also an animal. What would happen if there were a method on the interface that allowed a user to put an Animal in?

The user could put a Tiger in instead, and the backing store - IEnumerable<Giraffe> - wouldn't be able to cope.

The same in reverse
Now here's a similarly invalid code block, this time using the in keyword:

//the compiler won't allow this, but go with it to see why:
interface ICustom<in T>
{
    T GetFirst();     //compiler complains
    void Insert(T t); //ok
}
//..
ICustom<Animal> animals = //somehow init;
ICustom<Giraffe> giraffes = animals;
giraffes.Insert(new Giraffe()); //ok

Giraffe giraffe = giraffes.GetFirst();  //problem
                               //backing store is Animal

So when a type parameter is marked as out, it's out only. And when a type parameter is marked as in, it's in only too! A single type parameter can't be both in and out.
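
For contrast with the two invalid blocks, here is a sketch of what the compiler does accept: one interface that only sends T out, and another that only takes T in. IProducer, IConsumer, GiraffeHouse and AnimalShelter are all names I've invented for the example.

```csharp
using System;
using System.Collections.Generic;

class Animal { }
class Giraffe : Animal { }

interface IProducer<out T> { T GetFirst(); }        // T only goes out
interface IConsumer<in T> { void Insert(T item); }  // T only comes in

class GiraffeHouse : IProducer<Giraffe>
{
    public Giraffe GetFirst() { return new Giraffe(); }
}

class AnimalShelter : IConsumer<Animal>
{
    public readonly List<Animal> Residents = new List<Animal>();
    public void Insert(Animal item) { Residents.Add(item); }
}

static class InOutDemo
{
    static void Main()
    {
        // Covariance: a producer of Giraffes is used as a producer of Animals.
        IProducer<Animal> animals = new GiraffeHouse();
        Animal first = animals.GetFirst();

        // Contravariance: a consumer of Animals is used as a consumer of Giraffes.
        var shelter = new AnimalShelter();
        IConsumer<Giraffe> giraffes = shelter;
        giraffes.Insert(new Giraffe());

        Console.WriteLine(first is Giraffe);        // True
        Console.WriteLine(shelter.Residents.Count); // 1
    }
}
```

Splitting the reading and writing halves into two interfaces like this is one way to get variance on both, since a single type parameter can only carry one of the two keywords.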

How to read it
So when you read an out type in an interface, read it this way:

interface ICustom<out T>
{
    //You can initialise me using <=T
    //And I will use it as a backing store
    //But I will only send T's out-wards
    //Because T's coming in could be too wide
    //                    for my <=T backing store
}

And for an in type in an interface:

interface ICustom<in T>
{
    //You can initialise me using >=T
    //And I will use it as a backing store
    //But I will only accept T's in-wards
    //Because my >=T backing store is too wide
    //                   to produce T's to send out
}

If it helps, try reading those again - but this time, with the out T interfaces read T as Animal and <=T as Giraffe.

And with the in T interfaces read T as Giraffe and >=T as Animal.

Or, more concisely
Here's out again more concisely:

interface ICustom<out T>
{
    //covariant
    //assign a smaller T, I'll only send it out
}

And for in:

interface ICustom<in T>
{
    //contravariant
    //assign a larger T, I'll only take it in
}

I hope that helps!

The payoff
As Eric points out, the only way to make the above example of FeedAnimals work in C#3 is to use a "silly and expensive casting operation":

FeedAnimals(adultGiraffes.Cast<Animal>());
//or
FeedAnimals(from g in adultGiraffes select (Animal)g);

He goes on:

"This explicit typing should not be necessary. Unlike arrays (which are read-write) it is perfectly typesafe to treat a read-only list of giraffes as a list of animals"

And the example which Eric suggests hypothetically in that post, Matt Hidinger later demonstrates for us using C#4!

The full series
That's about as much as I want to write on the subject!

Below are links to the full series. Bear in mind that these explanatory posts were written prior to the release of C# 4.0. But they are still an excellent programmer's introduction, with much more info than I have covered in this post:

Monday, March 19, 2012

A TDD Example: The Mars Rover Problem

I learned TDD the hard way - while doing feature development on an existing, poorly tested enterprise application. I wanted to see TDD in action on a small greenfield project, so I decided to write a solution to the Mars Rover problem.

A Mars Rover stuck in a sand dune

The Mars Rover is a problem that I have been set by more than one prospective employer in the past. Previously I had solved it 'test-after'; this time I wanted to solve it test-first.

As a result I came up with a completely different architecture. The main reason for this is that TDD encourages continuous refactoring, and high test coverage gives you confidence to improve system design as you go, without worrying about regressions.

In this post I want to briefly describe how I used TDD to grow the project, particularly as a reference and talking point for other newcomers to TDD. I'll describe how and why I used TDD, in Mars-Rover-problem context, and at the end you will be able to go and explore the codebase to see how it ended up. (Or of course you can just skip the garb and go check out the project.)

I'm really interested in feedback, so if you have any thoughts as you read, please do leave comments or drop me an email.

The Mars Rover problem
A quick disclaimer: The Mars Rover problem is simple and there are many solutions floating around on the web (some have slightly different problem specs, click here to see the one I implemented). The first thing you'll notice when looking at my solution is that it is 100% overkill!

For a problem with such defined boundaries, and more importantly a problem you know will never change or grow, you could code up a very simple solution. You may not even have to use OOP at all.

However, I work every day on enterprise systems and I wanted my solution to be relevant to some of those challenges - to be extendable, testable, expressive and understandable. I wanted to explore a small, limited problem with a few basic stories, so that I could share the results and move on from it.

Importantly then, the Mars Rover problem doesn't require any particular presentation or persistence technologies. These are very important subjects, but I have descoped them and they can be the subject of other posts.

Let's get started already!
You can download the project here:

You can also just browse the source code via the github website, but if you are a C# developer you might want to launch it in VS so you can run the tests and explore using ReSharper.

A good story
Before writing any code, I went through a short analysis phase (see the Analysis directory), and one of the artifacts of that phase was the generation of a short list of acceptance criteria:

  1. Interprets GridSizeCommand, initializes Grid and sets size
  2. Interprets RoverDeployCommand, deploys Rovers, Rovers report their position without moving
  3. Interprets RoverExploreCommand, Rovers move and turn before reporting

Each of the items on the list represents a small chunk of demonstrable progress, or in other words a story. These aren't necessarily stories which make immediate sense to a system user (what's the point of being able to deploy a rover without moving it?)

Nonetheless they are good stories, because they represent incremental progress toward a stated aim. If a story is too large it becomes a 'black hole' into which development time and resources are spent without any realistic measure of progress.

(My approach to creating acceptance criteria, and to much else in this post, was inspired by Steve Freeman and Nat Pryce's Growing Object-Oriented Software, Guided by Tests, as suggested by my colleague JonoW.)

I later added a fourth acceptance criterion:

  4. Given input string defined in problem statement, produces output string defined in problem statement

After all, that is the only acceptance criterion expressed in the problem spec, so it would be crazy to miss it out!

Criteria become acceptance tests
The acceptance criteria can then translate (almost) literally into the names of the acceptance tests:

Acceptance tests

(You'll notice that some of the names are different, for example at some point I changed the word Grid to become LandingSurface, which is more expressive of this particular problem domain.)

What this all means is that if you just want to get a quick overview of what a TDD application actually does, the first place to look is the acceptance tests. One of the major benefits of TDD is the fact that the tests themselves become documentation (of what the system actually does, as well as proof that it does it.)

This is taken to the next level with BDD, which provides a framework in which behavioural criteria, as expressed by non-technical stakeholders, become executable.

Starting work on a story
Before you start writing any actual code for your story, or even any unit tests, you write a failing acceptance test. When this test goes green, you have finished the story - so in other words, the acceptance test needs to capture the entire essence of the story. This will usually be in a format similar to: 'Given this certain input/context, whatever happens internally, I expect this certain output.'

To begin work on MarsRover, I started with the first acceptance criterion:

  1. Interprets GridSizeCommand, initializes Grid and sets size

Which I translated into this acceptance test:

[TestCase("5 5", 5, 5)]
[TestCase("2 3", 2, 3)]
public void Given_a_commandString_with_one_LandingSurfaceSizeCommand_creates_LandingSurface_and_sets_size(
         string landingSurfaceSizeCommandString,
         int expectedWidth, int expectedHeight)
{
    var expectedSize = new Size(expectedWidth, expectedHeight);
    var commandCenter = container.Resolve<ICommandCenter>();
    commandCenter.Execute(landingSurfaceSizeCommandString);
    var landingSurface = commandCenter.GetLandingSurface();
    var actualSize = landingSurface.GetSize();
    Assert.AreEqual(expectedSize, actualSize);
}

The name is long, but it reads as English, and you'll recognise it from the documentation earlier in this post.

A quick aside
You'll notice that I exposed a LandingSurface as a getter from ICommandCenter:

var landingSurface = commandCenter.GetLandingSurface();

This is a compromise - the spec doesn't say anything about reporting back the plateau's size, but I need some way to get access to it to ensure the size has been set! Debates could be had here, but I'll move on.

Moving on
At the moment, this test doesn't even compile, so I then 'stub out' the two interfaces ICommandCenter and ILandingSurface. If you use ReSharper, you can do this with a few shortcuts, and you can even save your interfaces to the correct assemblies / directories, all without coming out of ReSharper.

Now the test compiles but it doesn't pass. As you eventually progress from story to story, your bank of green acceptance tests represents stories already completed. The single red acceptance test represents the story you are currently working on.

As you can see in the code example above, the acceptance test sees your codebase from the 'outside in', the same way your stakeholders see it. In writing an acceptance test, you highlight two important things - the trigger (what kicks the functionality into action - a button, a scheduled task etc.), and the expected response (a screen-based confirmation message, a sound, some kind of acknowledgement.)

You will need to keep these things in mind when you start writing your first unit test, as your first unit test will be written for the 'trigger' (i.e. commandCenter.Execute() above.)

Writing the first unit test
In the MarsRover spec, as an input we are given a complete, newline-delimited string to parse. The implication is that this system is receiving remote communications in a serial format, so from the context of our application it is not expected that the string will be literally typed in by a human.

For this reason, and because dealing with communications technologies is outside the scope of this project, I embedded the string as a constant in the application. So the entry point for this story is a command to execute a predefined string, i.e.

commandCenter.Execute(landingSurfaceSizeCommandString);

The 'grid size' command string will contain two integers separated by a space, as in our TestCase attributes above:

[TestCase("5 5", 5, 5)]
[TestCase("2 3", 2, 3)]
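
To make the format concrete, here is a minimal sketch of turning such a string into a size. It is not the parser from the MarsRover project; GridSize and SizeCommandParsing are hypothetical names, and I've used my own struct so the snippet doesn't depend on any particular Size type.

```csharp
using System;

struct GridSize
{
    public readonly int Width;
    public readonly int Height;

    public GridSize(int width, int height)
    {
        Width = width;
        Height = height;
    }
}

static class SizeCommandParsing
{
    // Parses "W H": two integers separated by a single space.
    public static GridSize Parse(string command)
    {
        string[] parts = command.Split(' ');
        if (parts.Length != 2)
        {
            throw new FormatException("Expected two integers separated by a space.");
        }
        return new GridSize(int.Parse(parts[0]), int.Parse(parts[1]));
    }

    static void Main()
    {
        GridSize size = Parse("2 3");
        Console.WriteLine(size.Width + "x" + size.Height); // 2x3
    }
}
```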

I came up with the class name CommandCenter to handle the execution, and its responsibility is to talk to a CommandParser to turn the string into a list of Commands, and to hand the list to a CommandInvoker. The CommandInvoker's responsibility is to invoke each Command on domain objects such as LandingSurface and Rover.

(I avoided the name CommandController for example, because of the association with MVC.)
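
The collaboration between those three classes can be sketched roughly as follows. This is a simplified illustration rather than the code in the repository: the interface members, SetSizeCommand, SizeOnlyParser and SurfaceInvoker are all assumptions of mine, and the parser only understands the size command.

```csharp
using System;
using System.Collections.Generic;

class LandingSurface { public int Width; public int Height; }

interface ICommand { void Execute(LandingSurface surface); }
interface ICommandParser { IList<ICommand> Parse(string commandString); }
interface ICommandInvoker { void InvokeAll(IList<ICommand> commands); }

class SetSizeCommand : ICommand
{
    private readonly int width, height;
    public SetSizeCommand(int width, int height) { this.width = width; this.height = height; }
    public void Execute(LandingSurface surface) { surface.Width = width; surface.Height = height; }
}

class SizeOnlyParser : ICommandParser
{
    // Understands only the "W H" size command, for illustration.
    public IList<ICommand> Parse(string commandString)
    {
        string[] parts = commandString.Split(' ');
        return new List<ICommand> { new SetSizeCommand(int.Parse(parts[0]), int.Parse(parts[1])) };
    }
}

class SurfaceInvoker : ICommandInvoker
{
    private readonly LandingSurface surface;
    public SurfaceInvoker(LandingSurface surface) { this.surface = surface; }
    public void InvokeAll(IList<ICommand> commands)
    {
        foreach (ICommand command in commands) command.Execute(surface);
    }
}

class CommandCenter
{
    private readonly ICommandParser parser;
    private readonly ICommandInvoker invoker;

    public CommandCenter(ICommandParser parser, ICommandInvoker invoker)
    {
        this.parser = parser;
        this.invoker = invoker;
    }

    // Parse the string into commands, then hand them to the invoker.
    public void Execute(string commandString)
    {
        invoker.InvokeAll(parser.Parse(commandString));
    }
}

static class CommandFlowDemo
{
    static void Main()
    {
        var surface = new LandingSurface();
        var center = new CommandCenter(new SizeOnlyParser(), new SurfaceInvoker(surface));
        center.Execute("5 5");
        Console.WriteLine(surface.Width + " " + surface.Height); // 5 5
    }
}
```

The CommandCenter itself knows nothing about sizes or rovers; it only coordinates its two collaborators, which is what keeps each class's tests small.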

The first thing the CommandCenter should do when asked to execute a string command, is to figure out what kind of command it is dealing with. For this, first I wrote a test:

public class CommandCenterTests
{
    [TestFixture]
    public class CommandCenter_Execute
    {
        [Test]
        public void Given_valid_command_string_invokes_Parser()
        {
            const string expectedCommand = "2 5";
            var mockCommandParser = new Mock<ICommandParser>();

            var commandCenter = new CommandCenter();
            commandCenter.Execute(expectedCommand);

            mockCommandParser.Verify(x => 
                       x.Parse(expectedCommand), Times.Once());
        }
    }
}

As you can see, focusing on the first unit of behaviour expected of this class has suggested a dependency - ICommandParser. At the moment this is just an interface, not a class, so it can be mocked and used for the purposes of this test without having to start worrying about its implementation or having two failing unit tests at once.

As an extension of this 'single test subject' concept, the only class to be directly instantiated (newed up) is the Subject Under Test (SUT):

var commandCenter = new CommandCenter();

All interactions with other objects are verified only by mocking, so that a test only ever verifies the behaviour of the SUT, no more no less.
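
For the verification to see any calls, the SUT has to be handed the test double somehow. The listing above doesn't show that step, so the sketch below assumes constructor injection, and swaps the Moq mock for a hand-rolled spy (SpyCommandParser, a name I've made up) so that it compiles without any test libraries. The test method returns a bool instead of asserting, for the same reason.

```csharp
using System;

interface ICommandParser { void Parse(string commandString); }

// A hand-written stand-in for the mock: it records how it was called.
class SpyCommandParser : ICommandParser
{
    public int CallCount;
    public string LastCommand;

    public void Parse(string commandString)
    {
        CallCount++;
        LastCommand = commandString;
    }
}

class CommandCenter
{
    private readonly ICommandParser parser;

    // Assumption: the collaborator is supplied through the constructor.
    public CommandCenter(ICommandParser parser) { this.parser = parser; }

    public void Execute(string commandString) { parser.Parse(commandString); }
}

static class CommandCenterSpyTest
{
    public static bool Given_valid_command_string_invokes_Parser()
    {
        // Arrange
        const string expectedCommand = "2 5";
        var spy = new SpyCommandParser();
        var commandCenter = new CommandCenter(spy);

        // Act
        commandCenter.Execute(expectedCommand);

        // Assert: the parser was invoked exactly once, with the same string.
        return spy.CallCount == 1 && spy.LastCommand == expectedCommand;
    }

    static void Main()
    {
        Console.WriteLine(Given_valid_command_string_invokes_Parser()); // True
    }
}
```

With Moq, the equivalent step would be to pass the mock's Object property to the SUT's constructor, again assuming the constructor takes an ICommandParser.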

Testing too much
The responsibilities of those associated objects, like ICommandParser above, will have their own tests to verify their behaviour when they are implemented. As in the example above, it is enough to ensure that the CommandCenter invokes the services of its collaborator, ICommandParser, and that is all. Testing the actual command parsing functionality will take place in the CommandParserTests.

This is one of the first pitfalls for newcomers to TDD - testing too much. But once you get the hang of it you start thinking of classes in terms of their specific responsibilities. You also start to see objects from the outside-in, in terms of class responsibilities being defined by an API (hence the use of interfaces).

Once you have a green light on your first test, you can start work on the first dependency you have highlighted (test-first of course), and so on until your acceptance test goes green. If you have no dependencies left unimplemented and your acceptance test is still red, look back to your 'trigger', and add another test for the missing functionality.

This is what happened with the example above - it's not enough just to parse the string into commands, you also have to execute them. You could write a new test for that functionality, or you could extend the existing one (as I did), so long as you have a failing test before you start, and so long as you keep the test small and focused. One or two Asserts is OK; more than that and you may be testing too much.

Arrange, Act, Assert
Note at this stage that the test is written in the Arrange Act Assert format. Below you can see exactly the same test but with a few comments to show you the different sections:

public class CommandCenterTests
{
    [TestFixture]
    public class CommandCenter_Execute
    {
        [Test]
        public void Given_valid_command_string_invokes_Parser()
        {
            //Arrange
            const string expectedCommand = "2 5";
            var mockCommandParser = new Mock<ICommandParser>();

            //Act
            var commandCenter = new CommandCenter();
            commandCenter.Execute(expectedCommand);

            //Assert
            mockCommandParser.Verify(x => 
                       x.Parse(expectedCommand), Times.Once());
        }
    }
}

When you are beginning it can be helpful to annotate it as above, however once you've been doing it a while you get used to expecting a certain convention in the code, and no longer need the reminders.

Refactoring... is this the best possible design?
This is probably a good time to take a look at the TDD cycle. One of the fundamental aims, and core benefits, of TDD is improving your design. After you have written your implementation, you have the opportunity to improve it, perhaps to make it more efficient, and as long as it still satisfies your test cases, you are good to go.

But more broadly than that, before you start writing a test, you have the opportunity to improve the existing system design in light of the functionality you are about to implement.

  • Will some responsibilities need to shift to ensure the resulting system makes sense? First, shift the tests, then shift their implementations.
  • Does another method need to be renamed, for the sake of differentiation, now that a class has to handle a slightly different load? Use ReSharper to do it automatically.
  • Is the code you're writing starting to look a bit like a popular design pattern? Consider whether implementing the pattern more closely will improve the code. (This is how I got to the Command pattern, which I explain later in this post.)

Semantic naming
Finally, take a look at the test's internal class structure and naming. This (like many things I am describing in this post!) is a matter of convention and style, but the inner class tells you what class and method are under test (the SUT). The test method name tells you the expected behaviour you are ensuring, i.e.

public class ClassNameTests
{
    [TestFixture]
    public class ClassName_MethodName
    {
        [Test]
        public void Behaviour_under_test()
        {
            //Arrange, Act, Assert
        }
    }
}

The reason I've used these conventions is that as well as being explicit, it reads as documentation, like before:

Rover Tests reading as class documentation

Only this time we have class-level documentation rather than story/acceptance-level. Each method of the Rover class in the example above is described in terms of its externally expected behaviour, along with proof that it is living up to those expectations.

Speed
I'll briefly describe the architecture of the system in a minute, but I want to quickly talk about speed.

If you are new to TDD, one of the things you might be thinking right now is, how long is all this going to take!? In this time we could have already implemented twice as many features.

This is a very good point. You could, and TDD is slower - at least at the start. The most convincing answer to this that I know of, is that it depends on the expected complexity and lifespan of your application. A small application which will be shelved in six months may not be a good candidate for TDD. But an enterprise application, which is expected to constantly change and grow as the business changes and grows, is probably a good candidate.

With TDD, the very start certainly takes longer, but once you get going features can be added at a consistent pace. This is in contrast to non-TDD projects, where as the system grows and becomes more complex, it also becomes more 'brittle' and harder to change. People are afraid to redesign to adapt to new challenges, people are afraid to update to the latest versions or techniques of external libraries for fear of regressions, and deployments to production become more of a risk.

And of course the real clincher: the time it takes to implement new features on existing functionality grows exponentially.

With regard to the time it takes to implement a single feature with TDD, once I got used to the process I found it to be only marginally longer. This is because in effect, you are doing the same core tasks you were doing without TDD, only in a different order - and with a little bit of extra typing for the tests:

  • analysing a problem,
  • contextualising it within an existing codebase,
  • following through on implementation,
  • and testing

But as well as the extra typing, there is the fact that you are working incrementally, i.e. you are improving your design and refactoring as you go.

Now that does take extra time - but it takes the time upfront.

Without TDD you hit those problems later on, often close to your release date. You are more likely to have longer bug lists, unexpected regressions, and poorer system design at the end of it. This means the next developers who have to come and work on the codebase, and the next after that, will encounter the same problems again, and those problems will continue to grow as the complexity grows.

Of course, this is all quite philosophical! There's no hard evidence, but as we know, TDD has become a mainstream practice. As you can tell, I'm convinced of its usefulness in non-trivial applications!

Separating end-to-end from unit tests
Acceptance tests are naturally end-to-end. Their purpose is to test the system, as far as is possible, from the perspective of its actual usage. That means that the tests must use the same object graph, provided by the same DI container and an equivalent composition root (notice the container.Resolve<ICommandCenter> in the acceptance test example above). This is so that the objects exercised by the tests all relate to each other in exactly the same way as they do in production.

Unit tests, on the other hand, must test just one behaviour of one component at a time, and so to distinguish the two types of test, I placed each in their own assembly:

Project layout showing AcceptanceTests assembly

As the system grows, there would remain only one Acceptance Test assembly (for each distinct presentation layer anyway), but a new Unit Test assembly would be added for each and every production code assembly, mirroring the directory structure of the production code.

Of course, there are other types of test - in particular Integration Tests, for testing your (hopefully thin) integration points with external resources. However, there is no requirement for integration in this spec, and so it is descoped for another article.

Exploring the MarsRover project
You should have enough information to start exploring the solution now, so before I wrap up I'll just say a few things about architecture.

I have used the Command pattern, in which a command invoker invokes commands on command receivers. To make it easy for other developers to detect the use of the pattern, I have used appropriate naming conventions:

  • ICommand
  • ILandingSurfaceSizeCommand (implements ICommand)
  • IRoverDeployCommand (implements ICommand)
  • IRoverExploreCommand (implements ICommand)
  • ICommandInvoker (invokes each of the concrete commands at the appropriate moments via their shared interface, ICommand)

The command receivers are the domain objects being operated on, i.e.

  • IRover (implementation Rover), and
  • ILandingSurface (implementation Plateau)

When the CommandInvoker is instructed to InvokeAll(), it simply iterates through each ICommand, assigns the receivers to it, and invokes it. You can see it happen here.

(Note: The method dictionary is just a way to pass each ICommand to a tailored method for assigning the correct receivers.)
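
To give a feel for the shape of this before you open the solution, here's a simplified sketch. It is not the project's actual code: the members on the receivers, the SetReceiver methods and the single fixed rover are all assumptions I've made to keep it short.

```csharp
using System;
using System.Collections.Generic;

// Receivers (the members shown here are hypothetical).
public interface ILandingSurface { void SetSize(int width, int height); }
public interface IRover { void Explore(string movements); }

public interface ICommand
{
    void Execute();
}

public class LandingSurfaceSizeCommand : ICommand
{
    private readonly int width;
    private readonly int height;
    private ILandingSurface landingSurface;

    public LandingSurfaceSizeCommand(int aWidth, int aHeight)
    {
        width = aWidth;
        height = aHeight;
    }

    public void SetReceiver(ILandingSurface aLandingSurface) { landingSurface = aLandingSurface; }
    public void Execute() { landingSurface.SetSize(width, height); }
}

public class RoverExploreCommand : ICommand
{
    private readonly string movements;
    private IRover rover;

    public RoverExploreCommand(string someMovements) { movements = someMovements; }

    public void SetReceiver(IRover aRover) { rover = aRover; }
    public void Execute() { rover.Explore(movements); }
}

public class CommandInvoker
{
    private readonly IList<ICommand> commands;
    private readonly IDictionary<Type, Action<ICommand>> setReceivers;

    public CommandInvoker(IList<ICommand> someCommands,
        ILandingSurface aLandingSurface, IRover aRover)
    {
        commands = someCommands;

        // The "method dictionary": maps each command type to the
        // method that assigns the correct receiver to it.
        setReceivers = new Dictionary<Type, Action<ICommand>>
        {
            { typeof(LandingSurfaceSizeCommand),
                c => ((LandingSurfaceSizeCommand)c).SetReceiver(aLandingSurface) },
            { typeof(RoverExploreCommand),
                c => ((RoverExploreCommand)c).SetReceiver(aRover) }
        };
    }

    public void InvokeAll()
    {
        foreach (var command in commands)
        {
            setReceivers[command.GetType()](command); // assign receiver
            command.Execute();                        // invoke
        }
    }
}
```

The point to notice is that InvokeAll() only ever talks to ICommand. All the type-specific knowledge is confined to the dictionary.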

Resolving dependencies
The other thing you'll notice is how I handled dependency injection, and in particular dynamic instantiation. This is all stuff I've blogged about before, so please check out the relevant posts:

  • I used constructor injection to inject dependencies (described here),
  • and Autofac's delegate factories for dynamic instantiation (described here)

I hope you find this project useful, and if you have any comments or questions, please write!

Wednesday, March 14, 2012

Using Autofac's Auto-Factories to Avoid Service Locator

An autofac-style factory delegate
(credit: pasukaru76)

I recently switched a project's DI solution from StructureMap to Autofac.

The reason was that the only way I could get StructureMap to handle dynamic instantiation was to use a variant of Service Locator, which couples your code to your container, making the affected code impossible to unit test and reducing code coverage.

In this article I'll explain the problem with StructureMap, and how Autofac solved it elegantly.

I'll explain the problem using a Something class, i.e. any class you might want to dynamically instantiate somewhere in your code. Here it is in all its glory:

public class Something : ISomething
{
    public Something(int anArgument)
    {
        //an implementation
    }
}

How it works in StructureMap
One of the biggest problems with using StructureMap (as mentioned in this post and this post) is that your entry point to the container is via the static class ObjectFactory, which always resolves to a singleton container.

Commonly you'll see a Bootstrapper class like this one below, where all the dependencies are registered (explained here):

public static void Bootstrap()
{
    ObjectFactory.Initialize(x =>
    {
        x.For<ISomething>().Use<Something>();
        x.For<IAnotherThing>().Use<AnotherThing>();
    });
}

The main benefit of the static reference is that when you need to dynamically instantiate an instance later in your code, you can use the static class again to refer to the same container (rather than 'newing up' an instance of the container, which wouldn't have all of your existing bootstrapped mappings):

public class SomethingClient
{
    public void SomeDynamicScenario(int anArgument)
    {
        var something = ObjectFactory
            .With("anArgument").EqualTo(anArgument)
            .GetInstance<ISomething>();
        //operate on new object
    }
}

The downside of this benefit is that it's a Service Locator. The method can't be unit tested, because when the test runner gets to the ObjectFactory, the container doesn't know what an ISomething is. You could tell it by running the Bootstrapper in your test, but then of course you aren't unit testing any more.

In other words, your code is coupled to the StructureMap container.

The standard approach
The standard approach to making this testable is to extract your instantiation logic into an ISomethingFactory. That way, you can inject the factory and mock it in your tests.

public class SomethingClient
{
    private readonly ISomethingFactory somethingFactory;

    public SomethingClient(ISomethingFactory aSomethingFactory)
    {
        somethingFactory = aSomethingFactory;
    }

    public void SomeDynamicScenario(int anArgument)
    {
        var something = somethingFactory.Create(anArgument);
        //operate on new object
    }
}

public class SomethingFactory : ISomethingFactory
{
    public ISomething Create(int anArgument)
    {
        return ObjectFactory
            .With("anArgument").EqualTo(anArgument)
            .GetInstance<ISomething>();
    }
}
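
To illustrate what this buys you for SomethingClient, here's a minimal sketch of a test using a hand-rolled fake. FakeSomething, FakeSomethingFactory and the Main-based check are hypothetical names of my own; a mocking library and your usual test framework would do the same job.

```csharp
using System;

public interface ISomething { }

public interface ISomethingFactory
{
    ISomething Create(int anArgument);
}

// Same shape as the SomethingClient above.
public class SomethingClient
{
    private readonly ISomethingFactory somethingFactory;

    public SomethingClient(ISomethingFactory aSomethingFactory)
    {
        somethingFactory = aSomethingFactory;
    }

    public void SomeDynamicScenario(int anArgument)
    {
        var something = somethingFactory.Create(anArgument);
        //operate on new object
    }
}

// Hypothetical test doubles, no container involved.
public class FakeSomething : ISomething { }

public class FakeSomethingFactory : ISomethingFactory
{
    public int? LastArgument { get; private set; }

    public ISomething Create(int anArgument)
    {
        LastArgument = anArgument; // record the call for the assertion
        return new FakeSomething();
    }
}

public static class SomethingClientCheck
{
    public static void Main()
    {
        var factory = new FakeSomethingFactory();
        var client = new SomethingClient(factory);

        client.SomeDynamicScenario(42);

        // The client should have passed its argument through to the factory.
        if (factory.LastArgument != 42)
            throw new Exception("Expected the factory to receive 42");
    }
}
```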

But then of course, all you are doing is moving the problem somewhere else. How would you unit test SomethingFactory? You can't.

The problem is that wherever you have 'peppered' container references around your code (i.e. ObjectFactory references), you won't be able to test and your implementation will be coupled.

At that stage you could just say, well, that's the boundary of our unit tests - we will adopt a policy where we just don't unit test factories. And there's merit in that, because you are defining a unit test boundary and everything on one side of it will be unit tested. You could still test the factory as a module, or not test it at all. But then again...

Step in Autofac
The good thing about Autofac is, not only does it provide a solution which gives you 100% code coverage, it's also actually much easier to implement.

You can just declare your factory as a delegate (which returns an ISomething), include it as a constructor argument and that's it! Autofac will work out from your ISomething mappings how to build and inject a factory for you, saving you the bother of having to even define one in the first place.

Here's the SomethingClient example again:

public class SomethingClient
{
    private readonly Func<int, ISomething> somethingFactory;

    public SomethingClient(
                  Func<int, ISomething> aSomethingFactory)
    {
        somethingFactory = aSomethingFactory;
    }

    public void SomeDynamicScenario(int anArgument)
    {
        var something = somethingFactory(anArgument);
        //operate on new object
    }
}

//No need for a factory class

When the container instantiates SomethingClient it discovers the Func delegate. The delegate has a return type that Autofac is registered to handle, and a parameter list Autofac recognises as ISomething constructor parameters, and so Autofac injects a delegate that acts as a factory for us.

The Func delegate in this example takes one int argument, but of course you can plug in any useful combination of input parameters.
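
A side effect worth showing is how little a unit test now needs. Because the factory is just a Func, a test can pass in a lambda directly. This is only a sketch: StubSomething and the Main-based check are hypothetical, and in practice you'd use your normal test framework.

```csharp
using System;

public interface ISomething { }

// Same shape as the Func-based SomethingClient above.
public class SomethingClient
{
    private readonly Func<int, ISomething> somethingFactory;

    public SomethingClient(Func<int, ISomething> aSomethingFactory)
    {
        somethingFactory = aSomethingFactory;
    }

    public void SomeDynamicScenario(int anArgument)
    {
        var something = somethingFactory(anArgument);
        //operate on new object
    }
}

// Hypothetical stub, for illustration only.
public class StubSomething : ISomething { }

public static class FuncFactoryCheck
{
    public static void Main()
    {
        var receivedArgument = 0;

        // The lambda stands in for the factory Autofac injects in production.
        var client = new SomethingClient(anArgument =>
        {
            receivedArgument = anArgument;
            return new StubSomething();
        });

        client.SomeDynamicScenario(7);

        if (receivedArgument != 7)
            throw new Exception("Expected the factory to receive 7");
    }
}
```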

Mixed constructor argument parameters
Now, this all works fine when we have a simple parameter list with just a single int, but what happens when we update the Something constructor to accept a mixture of dependencies and value-type arguments?

public class Something : ISomething
{
    public Something(int anArgument,
                      Point aPoint, IDependency aDependency)
    {
        //an implementation
    }
}

In this example let's assume that the Point above is a type you have defined in your codebase to represent a value type, and therefore you want to specify this value dynamically on construction rather than by registering it with the container:

public struct Point
{
    public int X;
    public int Y;
    public Point(int anX, int aY)
    {
        X = anX;
        Y = aY;
    }
}

On the other hand, of course, IDependency is a dependency and you want that injected for you by the container. How do we declare our factory?

We only include the value types and we omit the dependency:

public class SomethingClient
{
    private readonly 
              Func<int, Point, ISomething> somethingFactory;

    public SomethingClient(
              Func<int, Point, ISomething> aSomethingFactory)
    {
        somethingFactory = aSomethingFactory;
    }

    public void SomeDynamicScenario(
                   int anArgument, Point aPoint)
    {
        var something = somethingFactory(anArgument, aPoint);
        //operate on new object
    }
}

//No need for a factory class

No IDependency mentioned anywhere! Autofac is smart enough to allow you to add dependencies to the underlying class constructor's signature, without changing the factory, and it will still inject the dependencies for you!

Which is exactly what you want - as your solution grows and you find yourself adding dependencies, your factories do not change. They are created for you and extra dependencies are included on instantiation transparently.
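
For completeness, here's a minimal sketch of the wiring that makes this work, assuming the Autofac package is referenced. The Dependency class and the explicit registrations are my own additions for illustration; assembly scanning would be an alternative way to register the types.

```csharp
using System;
using Autofac;

public interface IDependency { }
public class Dependency : IDependency { } // hypothetical implementation

public struct Point
{
    public int X;
    public int Y;
    public Point(int anX, int aY) { X = anX; Y = aY; }
}

public interface ISomething { }

public class Something : ISomething
{
    public Something(int anArgument, Point aPoint, IDependency aDependency)
    {
        //an implementation
    }
}

public class SomethingClient
{
    private readonly Func<int, Point, ISomething> somethingFactory;

    public SomethingClient(Func<int, Point, ISomething> aSomethingFactory)
    {
        somethingFactory = aSomethingFactory;
    }

    public void SomeDynamicScenario(int anArgument, Point aPoint)
    {
        var something = somethingFactory(anArgument, aPoint);
        //operate on new object
    }
}

public static class Program
{
    public static void Main()
    {
        var builder = new ContainerBuilder();
        builder.RegisterType<Dependency>().As<IDependency>();
        builder.RegisterType<Something>().As<ISomething>();
        builder.RegisterType<SomethingClient>();

        using (var container = builder.Build())
        {
            // Autofac supplies the Func<int, Point, ISomething> itself.
            var client = container.Resolve<SomethingClient>();

            // int and Point come from the caller; IDependency comes
            // from the container when the factory is invoked.
            client.SomeDynamicScenario(1, new Point(2, 3));
        }
    }
}
```

One thing to be aware of: as far as I know, these auto-generated Func factories match arguments to constructor parameters by type, so a constructor taking two ints (say) needs Autofac's named delegate factories instead.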

This type of joined-up thinking makes Autofac a very nice container to work with. Lots of things happen for you, right out of the box, based on convention - you just have to know which and when.

The composition root
As this post eloquently explains, there should be only one place in an application where your container is referenced - the composition root. In this location you should do everything necessary to resolve the entire object graph - i.e. Register, Resolve, Release.

By using a static reference to a singleton container, StructureMap encourages these rules to be broken. Autofac, on the other hand, provides a container that implements IDisposable, which means you can use a using block to enforce the RRR pattern:

public class Program
{
    public static void Main()
    {
        var containerBuilder = registerAssemblyTypes();

        using (var container = containerBuilder.Build())
        {
            //single entry point
            var entryPoint = 
                    container.Resolve<ICodebaseEntryPoint>();
            //start using entryPoint
        }
    }

    private static ContainerBuilder registerAssemblyTypes()
    {
        var programAssembly = Assembly.GetExecutingAssembly();
        var builder = new ContainerBuilder();

        //perform auto-wiring
        builder.RegisterAssemblyTypes(programAssembly)
            .AsImplementedInterfaces();

        return builder;
    }
}

Component discovery
While we're here, take a look at the call to ContainerBuilder.RegisterAssemblyTypes(Assembly). This little call uses reflection to look at your concrete classes and your interfaces, and work out which go with which. It uses a convention over configuration approach, and you can include clauses and overrides to filter and specify particular instantiations.
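
As a rough sketch of what such a clause and override might look like (the "Command" naming filter and the SpecialSomething class are hypothetical, chosen only for illustration):

```csharp
using System.Reflection;
using Autofac;

public interface ISomething { }
public class SpecialSomething : ISomething { } // hypothetical override target

public static class Registration
{
    public static ContainerBuilder RegisterTypes()
    {
        var builder = new ContainerBuilder();

        // Filter clause: only scan in types whose names end in "Command".
        builder.RegisterAssemblyTypes(Assembly.GetExecutingAssembly())
            .Where(t => t.Name.EndsWith("Command"))
            .AsImplementedInterfaces();

        // Override: an explicit registration made after the scan.
        // By default Autofac resolves the most recent registration
        // for a service.
        builder.RegisterType<SpecialSomething>().As<ISomething>();

        return builder;
    }
}
```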

StructureMap also provides a similar component discovery model.

In fact, I'm sure StructureMap can do most of the things Autofac can do one way or another (see this article on avoiding static references), and I've used StructureMap successfully many times. It's just that Autofac does things nicely by default.

Broadening the discussion about relationships
One final point - Nicholas Blumhardt, the creator of Autofac, has included a very nice chart on his blog describing different object relationship types (of which dynamic instantiation via factories is one):

Relationship | Adapter Type | Meaning
A needs a B | None | Dependency
A needs a B at some point in the future | Lazy<B> | Delayed instantiation
A needs a B until some point in the future | Owned<B> | Controlled lifetime
A needs to create instances of B | Func<B> | Dynamic instantiation
A provides parameters of types X and Y to B | Func<X,Y,B> | Parameterisation
A needs all the kinds of B | IEnumerable<B> | Enumeration
A needs to know X about B before using it | Meta<T> and Meta<B,X> | Metadata interrogation

It's nice to consider contextually where the problem we've just dealt with sits. For more discussion on these types, check out Nicholas' introductory article.