Tuesday, March 27, 2012

Introduction to Covariance and Contravariance in C#

Covariance and contravariance have had a tweak in C# 4.0. But before we can start using the benefits in our code, we need to know exactly what they are! The subject can be a little bit tricky when you first encounter it, so in this post I'm going to introduce it in a way that I hope is easy to follow.

Almost all of the information in this post comes from Eric Lippert's excellent series of posts, and I recommend you go and take a look at his blog right now (links to the whole series are at the bottom of this post.)

Tigers and a giraffe (credit: andrewmalone)

If you 'get it' straight away, great - stick with that, he's the expert. If you'd like a slightly gentler introduction, avoiding (I hope!) common 'gotchas' for learning the subject, read my post (this post) first.

What I have done is to explain the same concepts using the same examples, but in an order and with an emphasis which I think make it much easier to understand the basics. This is particularly good for you if you are a programmer coming at it from the cold, i.e. you haven't encountered covariance and contravariance before, or you have but you don't understand them yet.

When you're done here I'd suggest you go back and read Eric's posts in order - they should be much easier for you to read by then. Eric's posts will flesh out all the interesting details, and continue on to discuss more advanced topics.

Inheritance and assignability
We're not going to begin talking about covariance and contravariance straight away. First, we're going to make a distinction between inheritance and assignability.

As Eric points out, for any two types T and U, exactly one of the following statements is true:

  • T is bigger than U
  • T is smaller than U
  • T is equal to U
  • T is not related to U

Now, one of the things that seems to have caused some confusion on Eric's blog (see the comments) is usage of the phrase "is smaller than". It is used frequently, and is key, so I want to make its definition crystal clear now before we move on. Eric says:

"Suppose you have a variable, that is, a storage location. Storage locations in C# all have a type associated with them. At runtime you can store an object which is an instance of an equal or smaller type in that storage location."

In simple scenarios, this is something so familiar to programmers that it's barely worth mentioning. We all know that, looking at the list above, only in the middle two scenarios is T assignable to U. The smaller than relation:

class U{}
class T : U{}
U myInstance = new T();

This is the first thing that comes to mind, right? An inheritance hierarchy.

But Eric didn't mention inheritance hierarchies. Sure, an inheritance hierarchy is one way to make a T which is assignable to a U, but what about this one:

U[] myArray = new T[10];

... or, the same statement using classes from the animals hierarchy:

Animal[] animals = new Giraffe[10];

The type Giraffe inherits from Animal, but the type Giraffe[] doesn't inherit from Animal[]. The array types are assignable, but not linked by inheritance, and this tells us something about what 'is smaller than' means:

  • T is smaller than U

can be read as

  • T is assignable to U

You can visualise it this way:

T is smaller than U
T        <        U
        -->          //Direction of assignability

As we have seen, in some cases this direction of assignability may be because of an inheritance relationship, but in others it is simply because the CLR and C# happen to support that particular assignment operation (Java supports the same array assignment).

There is still an inheritance hierarchy involved, i.e. this wouldn't work:

Tiger[] tigers = new Giraffe[10]; //illegal

But the key thing is that there is a difference between inheritance and assignability: they are not the same thing.
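
If you'd like to see that difference for yourself, here is a minimal sketch using reflection. The Animal and Giraffe classes are empty stand-ins, and the helper names Inherits and Assignable are my own, made up for illustration:

```csharp
using System;

class Animal { }
class Giraffe : Animal { }

static class AssignabilityDemo
{
    // True when 'from' is a direct or indirect subclass of 'to' (inheritance).
    public static bool Inherits(Type from, Type to)
    {
        return from.IsSubclassOf(to);
    }

    // True when a value of type 'from' can be stored in a location of type 'to'.
    public static bool Assignable(Type from, Type to)
    {
        return to.IsAssignableFrom(from);
    }

    static void Main()
    {
        // Giraffe is linked to Animal by inheritance.
        Console.WriteLine(Inherits(typeof(Giraffe), typeof(Animal)));       // True
        // Giraffe[] is not linked to Animal[] by inheritance...
        Console.WriteLine(Inherits(typeof(Giraffe[]), typeof(Animal[])));   // False
        // ...but it is still assignable to Animal[].
        Console.WriteLine(Assignable(typeof(Giraffe[]), typeof(Animal[]))); // True
        // Assignability only flows one way: smaller to larger.
        Console.WriteLine(Assignable(typeof(Animal[]), typeof(Giraffe[]))); // False
    }
}
```

The second and third lines of output are the interesting pair: no inheritance link between the array types, yet the assignment is allowed.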

I'll say it one more time (for good luck!): The phrase "is smaller than" refers to assignability, not inheritance. The direction of assignability always flows from the smaller type to the larger type. We'll come back to this in a moment.

Covariance and Contravariance
Eric's second post discusses the array assignment operation (the one I used in the Animal/Giraffe example above), and the problems with it. It's definitely worth reading, but park it for now, because things really come alive in post number three.

Eric's example uses delegate methods, and I'll use a simplified version of it here, just to get us started.

It is clear why this is a legal operation:

static Giraffe MakeGiraffe()
{
    return new Giraffe();
}

//Inside some method:
Func<Animal> func = MakeGiraffe;
//             <-- Direction of assignment

Notice that in the assignment operation, Animal is on the left and Giraffe is on the right. That is, the declared type is based on Animal and the assigned type is based on Giraffe.

Now let's look at another example:

static void AcceptAnimal(Animal animal)
{
    //operate on animal
}

//Inside some method:
Action<Giraffe> action = AcceptAnimal;
//             <-- Direction of assignment

Notice that Giraffe is on the left and Animal is on the right. That is, the declared type is based on Giraffe and the assigned type is based on Animal.

The Func<out T> assignment operation supports covariance. The Action<in T> assignment operation supports contravariance.
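
To try both assignments in one place, here is a small self-contained sketch. The empty Animal and Giraffe classes and the LastAccepted field are assumptions of mine, added only so the result of each call can be observed:

```csharp
using System;

class Animal { }
class Giraffe : Animal { }

static class MethodGroupVariance
{
    // Remembers the last argument passed to AcceptAnimal (illustration only).
    public static Animal LastAccepted;

    public static Giraffe MakeGiraffe()
    {
        return new Giraffe();
    }

    public static void AcceptAnimal(Animal animal)
    {
        LastAccepted = animal;
    }

    static void Main()
    {
        // Covariant: a method that returns a Giraffe satisfies "returns an Animal".
        Func<Animal> func = MakeGiraffe;
        Console.WriteLine(func().GetType().Name);        // Giraffe

        // Contravariant: a method that accepts any Animal satisfies "accepts a Giraffe".
        Action<Giraffe> action = AcceptAnimal;
        action(new Giraffe());
        Console.WriteLine(LastAccepted.GetType().Name);  // Giraffe
    }
}
```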

What does that mean?
Have a quick look at this summary:
(remember to read < as 'is smaller than' and 'is assignable to')

//Direction of assignability -->
Giraffe                       <   Animal

Giraffe MakeGiraffe()         <   Func<Animal>    //covariance
AcceptAnimal(Animal animal)   <   Action<Giraffe> //contravariance

Now read Eric's definition of covariance and contravariance, from the first post in his series:
(the "operations" which manipulate types being the two assignment operations)

Consider an "operation" which manipulates types. If the results of the operation applied to any T and U always results in two types T' and U' with the same relationship as T and U, then the operation is said to be "covariant". If the operation reverses bigness and smallness on its results but keeps equality and unrelatedness the same then the operation is said to be "contravariant".

Hopefully it should start to become clear. In the Func<Animal> line of the summary above, the direction of assignability with respect to the original types was preserved, while in the Action<Giraffe> line it was reversed!

The Func<Animal> line represents a covariant operation, and the Action<Giraffe> line represents a contravariant operation.

The main heuristic
Let's put it back to C# code so that we can see it with the right-to-left assignability we are used to (now the smaller types are on the right):

Animal animal = new Giraffe(); //basic type assignment

Func<Animal> func = MakeGiraffe;       //covariant
Action<Giraffe> action = AcceptAnimal; //contravariant
// <-- Direction of assignability

Notice how in the covariant operation, Animal and Giraffe are on the same sides as in the basic type assignment operation. And notice how in the contravariant operation, they are on opposite sides - the operation "reverses bigness and smallness".

In both cases, the opposites are illegal. As Eric puts it in post number five:

"Stuff going 'in' may be contravariant,
stuff going 'out' may be covariant"

... but not vice-versa:

Func<Giraffe> func = MakeAnimal;       //contravariant (illegal)
Action<Animal> action = AcceptGiraffe; //covariant (illegal)
// <-- Direction of assignability

And by the way, if there's one heuristic you remember as a result of reading this post, it's probably best to make it the one above!

I'll repeat it later in this article.

Hang on, methods aren't types!
A quick aside - at this stage you might be asking why I'm referring to methods as though they were types. The straight answer is, I'm copying Eric. His caveat:

"A note to nitpickers out there: yes, I said earlier that variance was a property of operations on types, and here I have an operation on method groups, which are typeless expressions in C#. I’m writing a blog, not a dissertation; deal with it!"

Can't argue with that.

What's new in C# 4.0?
Well, 'new' is the wrong word since the stable release of C# 4.0 was two years ago! But all of the types of variance we've looked at so far in this post have been supported since C#2 or before.

We as developers didn't really have to think about those types of variance to use them, because they weren't exposed syntactically. In other words, we didn't have to write anything different to make them happen; it's just a matter of what is and what isn't supported by the C# compiler and the CLR.

In post numbers four and six, Eric discusses types of variance which went on to become part of the specification for C# 4.0, and it's those types of variance that I'll discuss now.

Real delegate variance
The first one is easy, and it's discussed in post number four. It's simply about taking the operations which were already legal in terms of method groups and making the same operations legal in terms of typed expressions.

Take our covariant example from earlier:

static Giraffe MakeGiraffe()
{
    return new Giraffe();
}

//Inside some method:
Func<Animal> func = MakeGiraffe;
//             <-- Direction of assignment

Well, in C#3 this essentially equivalent operation was illegal, whereas in C#4 it is legal:

Func<Animal> func = new Func<Giraffe>(() => new Giraffe());
//             <-- Direction of assignment

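With lambda syntax it can be shortened to the line below. Strictly speaking this shorter form was already legal in C#3, because the lambda is converted directly to Func<Animal> and no Func<Giraffe> is ever created: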

Func<Animal> func = () => new Giraffe();
//             <-- Direction of assignment

You can now do with typed expressions what you could already do with method groups. Simple.
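
Here is a rough sketch of the same idea with delegate values rather than method groups, for both directions. It needs a C# 4 or later compiler, and the helper names Widen and Narrow are hypothetical, chosen just for this example:

```csharp
using System;

class Animal { }
class Giraffe : Animal { }

static class DelegateVariance
{
    // Legal from C# 4: the result type of Func is covariant.
    public static Func<Animal> Widen(Func<Giraffe> source)
    {
        return source;
    }

    // Legal from C# 4: the parameter type of Action is contravariant.
    public static Action<Giraffe> Narrow(Action<Animal> source)
    {
        return source;
    }

    static void Main()
    {
        Func<Giraffe> makeGiraffe = () => new Giraffe();
        Func<Animal> makeAnimal = Widen(makeGiraffe);
        Console.WriteLine(makeAnimal().GetType().Name);                     // Giraffe
        // No wrapper is created: both variables refer to the same delegate.
        Console.WriteLine(object.ReferenceEquals(makeGiraffe, makeAnimal)); // True

        int calls = 0;
        Action<Animal> acceptAnimal = animal => calls++;
        Action<Giraffe> acceptGiraffe = Narrow(acceptAnimal);
        acceptGiraffe(new Giraffe());
        Console.WriteLine(calls);                                           // 1
    }
}
```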

But here's where it makes sense to quickly explain something I breezed over earlier.

Covariance and Contravariance, at once
Take a look again at the heuristic:

"Stuff going 'in' may be contravariant,
stuff going 'out' may be covariant"

So what happens when you are dealing with a type which has both an 'in' and an 'out'?

The short answer is: it can be covariant, contravariant, both, or neither. But it's easier than that makes it sound!

Take a look at this example. It's a Func that accepts a Mammal and returns a Mammal:

Func<Mammal, Mammal> func;

Now here are some assignment operations:

  • This is a covariant operation:

Func<Mammal, Giraffe> toAssign = //somehow initialise;
Func<Mammal, Mammal> func = toAssign;

  • This is a contravariant operation:

Func<Animal, Mammal> toAssign = //somehow initialise;
Func<Mammal, Mammal> func = toAssign;

  • This is both:

Func<Animal, Giraffe> toAssign = //somehow initialise;
Func<Mammal, Mammal> func = toAssign;

... and, well, I'm sure I don't need to spell out the neither!
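
For a runnable version of the 'both' case, here is a minimal sketch. It assumes a three-level hierarchy (Giraffe inherits from Mammal, which inherits from Animal), and the method AnimalToGiraffe is a made-up stand-in for 'somehow initialise':

```csharp
using System;

class Animal { }
class Mammal : Animal { }
class Giraffe : Mammal { }

static class BothAtOnce
{
    // Accepts something wider than Mammal and returns something narrower than Mammal.
    public static Giraffe AnimalToGiraffe(Animal input)
    {
        return new Giraffe();
    }

    static void Main()
    {
        Func<Animal, Giraffe> toAssign = AnimalToGiraffe;

        // Contravariant in the parameter (Animal -> Mammal) and
        // covariant in the result (Giraffe -> Mammal), in one assignment.
        Func<Mammal, Mammal> func = toAssign;

        Mammal result = func(new Mammal());
        Console.WriteLine(result.GetType().Name); // Giraffe

        // The reverse direction does not compile:
        // Func<Giraffe, Animal> wrongWay = null;
        // Func<Mammal, Mammal> bad = wrongWay;   // compile-time error
    }
}
```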

Interface variance
The other new feature, as discussed in post number six, is the extension of variance to interfaces. There's not much to add here - it's just the same thing, but using interfaces. Eric gives a really nice example of the practical benefit here, and I'm going to repeat it almost verbatim.

Take a look at this code block. This is another example of something which is illegal in C#3, and legal in C#4:

void FeedAnimals(IEnumerable<Animal> animals) {
  foreach(Animal animal in animals)
    if (animal.Hungry)
      Feed(animal);
}
//...
IEnumerable<Giraffe> adultGiraffes = 
        from g in giraffes where g.Age > 5 select g;
FeedAnimals(adultGiraffes);

Just as earlier on, when we call FeedAnimals(IEnumerable<Animal> animals) we are assigning a 'smaller' type to a 'larger' type:

//Direction of assignability -->
Giraffe                   <   Animal

Giraffe MakeGiraffe()     <   Func<Animal>        //covariance
IEnumerable<Giraffe>      <   IEnumerable<Animal> //covariance

Of course, anywhere else that you reference that assigned-to variable (IEnumerable<Animal>), what comes out will be typed as Animal. All pretty uncontroversial.
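
If you want to run the FeedAnimals example, here is one way it could be fleshed out. The Hungry and Age properties, the sample data, and the idea that feeding simply clears Hungry and counts the animal are all assumptions of mine, not part of Eric's example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Animal
{
    public bool Hungry { get; set; }
    public int Age { get; set; }
}
class Giraffe : Animal { }

static class Zoo
{
    // Feeds every hungry animal and returns how many were fed.
    public static int FeedAnimals(IEnumerable<Animal> animals)
    {
        int fed = 0;
        foreach (Animal animal in animals)
        {
            if (animal.Hungry)
            {
                animal.Hungry = false; // stand-in for Feed(animal)
                fed++;
            }
        }
        return fed;
    }

    static void Main()
    {
        var giraffes = new List<Giraffe>
        {
            new Giraffe { Age = 7, Hungry = true },
            new Giraffe { Age = 2, Hungry = true },
            new Giraffe { Age = 9, Hungry = false },
        };

        IEnumerable<Giraffe> adultGiraffes =
            from g in giraffes where g.Age > 5 select g;

        // Legal from C# 4: IEnumerable<Giraffe> converts to IEnumerable<Animal>.
        Console.WriteLine(FeedAnimals(adultGiraffes)); // 1
    }
}
```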

In and out
But finally, let's look at the in and out keywords, and how they fit in when designing your own interfaces (or using the upgraded C#4 ones.) Recall one more time the heuristic:

"Stuff going 'in' may be contravariant,
stuff going 'out' may be covariant"

In C# 4.0, IEnumerable<T> has become IEnumerable<out T>. The out marks the IEnumerable as supporting covariance on the T. This means that, as in the example above, you can assign based on something smaller than T.

But it also means that the interface cannot accept the type T as an input! It will only allow the interface to send T out, in whatever fashion you like - but it will never accept a T in. If you try it, the compiler won't allow it. Hence, the name: out.

Reading through this code block should make it clear why:

//the compiler won't allow this, but go with it to see why:
interface ICustom<out T>
{
    T GetFirst();     //ok
    void Insert(T t); //compiler complains
}
//..
ICustom<Giraffe> giraffes = //somehow init;
ICustom<Animal> animals = giraffes;
Animal animal = animals.GetFirst();  //ok
animals.Insert(new Tiger());   //problem - 
                               //backing store is Giraffe

Think of it this way - an IEnumerable<Animal> variable can have an IEnumerable<Giraffe> assigned to it and it will churn out Giraffes typed as Animals all day long. Because of how it's declared, users of the IEnumerable<Animal> variable expect to be dealing with Animals.

But a Tiger is also an animal. What would happen if there were a method on the interface that allowed a user to put an Animal in?

The user could put a Tiger in instead, and the backing store - IEnumerable<Giraffe> - wouldn't be able to cope.
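
For contrast, here is a sketch of a version the compiler does accept: the covariant interface only sends T out, and the Insert method lives on the concrete class instead. The names IReadOnlyStore and Store are hypothetical, invented for this example:

```csharp
using System;
using System.Collections.Generic;

class Animal { }
class Giraffe : Animal { }

// T appears only in output positions, so 'out' is accepted.
interface IReadOnlyStore<out T>
{
    T GetFirst();
}

class Store<T> : IReadOnlyStore<T>
{
    private readonly List<T> items = new List<T>();

    // Insert is on the class, not on the covariant interface.
    public void Insert(T item) { items.Add(item); }

    public T GetFirst() { return items[0]; }
}

static class OutDemo
{
    static void Main()
    {
        var giraffes = new Store<Giraffe>();
        giraffes.Insert(new Giraffe());

        // Covariant conversion: the backing store is still a Store<Giraffe>.
        IReadOnlyStore<Animal> animals = giraffes;
        Animal first = animals.GetFirst();
        Console.WriteLine(first.GetType().Name); // Giraffe

        // animals.Insert(new Animal()); // would not compile: no Insert on the interface
    }
}
```

Because nothing can be inserted through IReadOnlyStore<Animal>, there is no way to sneak a Tiger into the Giraffe backing store.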

The same in reverse
Now here's a similarly invalid code block, this time using the in keyword:

//the compiler won't allow this, but go with it to see why:
interface ICustom<in T>
{
    T GetFirst();     //compiler complains
    void Insert(T t); //ok
}
//..
ICustom<Animal> animals = //somehow init;
ICustom<Giraffe> giraffes = animals;
giraffes.Insert(new Giraffe()); //ok

Giraffe giraffe = giraffes.GetFirst();  //problem
                               //backing store is Animal

So when a type parameter is marked as out, it's out only. And when it's marked as in, it's in only too! A single type parameter can't be both in and out.
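
And here is the matching legal sketch for in: an interface that only takes T in. The names ISink and AnimalSink are hypothetical, and the public Items list is there only so the effect can be checked:

```csharp
using System;
using System.Collections.Generic;

class Animal { }
class Giraffe : Animal { }

// T appears only in input positions, so 'in' is accepted.
interface ISink<in T>
{
    void Insert(T item);
}

class AnimalSink : ISink<Animal>
{
    public readonly List<Animal> Items = new List<Animal>();

    public void Insert(Animal item) { Items.Add(item); }
}

static class InDemo
{
    static void Main()
    {
        var animals = new AnimalSink();

        // Contravariant conversion: something that accepts any Animal
        // can certainly accept a Giraffe.
        ISink<Giraffe> giraffes = animals;
        giraffes.Insert(new Giraffe());

        Console.WriteLine(animals.Items.Count);             // 1
        Console.WriteLine(animals.Items[0].GetType().Name); // Giraffe
    }
}
```

An interface can still have one parameter of each kind, in the same way that Func has an 'in' parameter for its argument and an 'out' parameter for its result.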

How to read it
So when you read an out type in an interface, read it this way:

interface ICustom<out T>
{
    //You can initialise me using <=T
    //And I will use it as a backing store
    //But I will only send T's out-wards
    //Because T's coming in could be too wide
    //                    for my <=T backing store
}

And for an in type in an interface:

interface ICustom<in T>
{
    //You can initialise me using >=T
    //And I will use it as a backing store
    //But I will only accept T's in-wards
    //Because my >=T backing store is too wide
    //                   to produce T's to send out
}

If it helps, try reading those again - but this time, with the out T interfaces read T as Animal and <=T as Giraffe.

And with the in T interfaces read T as Giraffe and >=T as Animal.

Or, more concisely
Here's out again more concisely:

interface ICustom<out T>
{
    //covariant
    //assign a smaller T, I'll only send it out
}

And for in:

interface ICustom<in T>
{
    //contravariant
    //assign a larger T, I'll only take it in
}

I hope that helps!

The payoff
As Eric points out, the only way to make the above example of FeedAnimals work in C#3 is to use a "silly and expensive casting operation":

FeedAnimals(adultGiraffes.Cast<Animal>());
//or
FeedAnimals(from g in adultGiraffes select (Animal)g);

He goes on:

"This explicit typing should not be necessary. Unlike arrays (which are read-write) it is perfectly typesafe to treat a read-only list of giraffes as a list of animals"

And the example which Eric suggests hypothetically in that post, Matt Hidinger later demonstrates for us using C#4!
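
Eric's remark in the quote above that arrays are read-write can be made concrete with a short sketch. The helper name TryStoreTiger is my own; the point is that the array assignment compiles, and the type check is deferred to run time:

```csharp
using System;

class Animal { }
class Giraffe : Animal { }
class Tiger : Animal { }

static class ArrayHole
{
    // Tries to write a Tiger into the first slot; reports whether the CLR allowed it.
    public static bool TryStoreTiger(Animal[] animals)
    {
        try
        {
            animals[0] = new Tiger();
            return true;
        }
        catch (ArrayTypeMismatchException)
        {
            // The array was really a Giraffe[] (or another non-Tiger array).
            return false;
        }
    }

    static void Main()
    {
        Console.WriteLine(TryStoreTiger(new Animal[1]));  // True
        // Compiles, because Giraffe[] is assignable to Animal[], but fails at run time.
        Console.WriteLine(TryStoreTiger(new Giraffe[1])); // False
    }
}
```

A read-only IEnumerable<Animal> has no equivalent of the write in TryStoreTiger, which is why treating giraffes as animals there is safe.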

The full series
That's about as much as I want to write on the subject!

Below are links to the full series. Bear in mind that these explanatory posts were written prior to the release of C# 4.0. But they are still an excellent programmer's introduction, with much more info than I have covered in this post:
