Jahya Code .NET

Monday, October 3, 2011

Selected Items with HtmlHelpers

I came across an odd 'feature' of MVC Html Helpers today: Even if you explicitly provide a selected value when you call an option-based Helper, there's no guarantee that selected value will be used.

Suppose you are creating a view - an edit page for a Book. You declare something like this in your view. Note that we are passing in a new Author object as our currently selected item - so it shouldn't match anything and no items should be selected:

<% var selectList = 
    new SelectList(Model.Authors, "Id", "Name", new Author()); %>

<%=Html.DropDownList("book.Author.Id", selectList, 
    "Please select")%>

The fourth argument to your SelectList constructor is a new Author object. As such when you view your edit page you would expect the drop-down to default to "Please select".

In reality, though, the drop-down defaults to the author already associated with the book in your model. This may be the behaviour you normally want from an edit page, but when your business rules say the field should not be pre-populated, you can't honour them without a mild hack.

What's happening is that the Html Helper searches your View Model object (via ViewData) for a value matching its first argument: "book.Author.Id"

When it finds one, the Helper favours that Id value over the selected value you supplied. Not exactly putting you in the driving seat, is it?

One workaround? It's not the nicest-looking, but set the View Model value to what you want just before calling the Helper:

<% Model.Book.Author = new Author(); %>
<% var selectList = 
    new SelectList(Model.Authors, "Id", "Name", new Author()); %>

<%=Html.DropDownList("book.Author.Id", selectList, 
    "Please select")%>

It may not be pretty, but it works.

Friday, July 22, 2011

Lambda Expressions in C#

In a previous article I introduced delegates and anonymous delegates. In this article I intend to explain the fundamentals of Lambda expressions by building on our knowledge of delegates. It's useful to get a proper handle on delegates before delving into Lambdas, because Lambdas are just anonymous delegates in different clothing.

A Quick Example
At the end of my post on delegates, I introduced a common scenario where you might want to use an anonymous delegate to find an item from a list:

class AdHocSearch
{
    static void Main()
    {
        List<Book> bookDb = new List<Book>();
        bookDb.Add(new Book(){ title = "Romancing the Stone" });
        bookDb.Add(new Book(){ title = "Wuthering Heights" });
        
        Book foundBook = bookDb.Find(delegate(Book candidate)
        {
            return candidate.title.Equals(
                            "Wuthering Heights");
        });

        Console.WriteLine(foundBook.title);
    }
}

However the same code can easily be refactored to use a Lambda expression (if you use ReSharper just hit Alt+Enter and it will refactor for you):

class AdHocSearch
{
    static void Main()
    {
        List<Book> bookDb = new List<Book>();
        bookDb.Add(new Book(){ title = "Romancing the Stone" });
        bookDb.Add(new Book(){ title = "Wuthering Heights" });
        
        Book foundBook = bookDb.Find(
            candidate => candidate.title.Equals(
                                "Wuthering Heights"));

        Console.WriteLine(foundBook.title);
    }
}

The argument passed to Find, candidate => candidate.title.Equals("Wuthering Heights"), is the Lambda expression, and it does the same job as the delegate in the first code-block. However, it does it in a slightly different way: it is more concise, and its declaration is less explicit.

Let's start from the top - the Lambda operator.

The Lambda Operator
You can tell you are looking at a lambda expression when you see the lambda operator (=>). This operator can be read as "goes to":

x => x * x

So this expression (above) is read as "x goes to x times x".

How do we Get from Delegate to Lambda?
Let's explain the above expression by showing how it would look in familiar delegate form. Then we can alter it step by step until it becomes a lambda expression again. Here's the fully written-out delegate equivalent of the above statement, written in a way that should be familiar from my previous post on delegates:

public delegate int Square(int i);

void Main()
{
    Square square = delegate(int i)
    {
        return i*i;
    };
}

In the code-block above, the first thing to notice is that we have a named delegate type: its name is Square. This is the first line in the code-block, the declaration that essentially says (in English) "There is a delegate type named Square, it has a single int input parameter, and an int return type."

Inside the body of the Main method, we then create an instance of Square (named lower-case square) and assign an anonymous delegate to it.

Step 1: Anonymise the Delegate Type
Now we are going to anonymise the delegate type as well as the delegate. Anonymising means declaring the same behaviour, but without specifying a name. So in English: "There is a delegate type (no longer named Square), it has a single int input parameter, and an int return type".

Since we are no longer giving it a name, we must declare it at the same time as assigning a value to it, and therefore now the whole thing is enclosed within one statement. Remember, we are still naming the variable (lower-case square), but we are no longer naming the type (was init-capped Square):

public delegate int Square(int i);

void Main()
{
    Square square = delegate(int i)
    {
        return i*i;
    };
}

//becomes

void Main()
{
    Func<int, int> square = delegate(int i)
    {
        return i*i;
    };
}

To do this we used a type named Func<T, TResult> (see the definition here), which was introduced in .NET Framework 3.5, alongside C# 3.0. Func is a generic delegate type: it lets us refer to a method without declaring a delegate type of our own, and its generic type parameters describe the argument and return types respectively: <int, int>.

You'll notice that there are several variations on Func, each with a different number of generic type parameters. As of .NET 4 you can encapsulate a method with up to 16 arguments, and there is a similar family of delegate types, Action, for methods with no return type. Any method you assign to an instance of one of these types must have matching input parameters and return type, and you can see in the code-block above that this is the case.
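To make the two families a little more concrete, here is a minimal sketch (the class and variable names are just for illustration). It sticks with the anonymous-delegate syntax used so far: Func<int, int, int> takes two ints and returns an int, because the last type parameter is always the return type, while Action<string> takes a string and returns nothing.

```csharp
using System;

class FuncFamilyDemo
{
    // Two int inputs, int result: the LAST generic parameter is the return type
    public static readonly Func<int, int, int> Add = delegate(int a, int b)
    {
        return a + b;
    };

    // One string input, no return value: Action rather than Func
    public static readonly Action<string> Greet = delegate(string name)
    {
        Console.WriteLine("Hello " + name);
    };

    static void Main()
    {
        Console.WriteLine(Add(2, 3)); // prints 5
        Greet("World");               // prints "Hello World"
    }
}
```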

Step 2: Switch to Lambda Syntax
Now let's convert the delegate on the right side of that assignment operation into a lambda expression:

Func<int, int> square = delegate(int i)
{
    return i*i;
};

//becomes

Func<int, int> square = i => i * i;

And there we have the same statement we started with. It uses the same variables to do the same thing and return the same value - but with different syntax.

The first thing you notice is how concise it is, even though it does exactly the same job. On the left side of the operator, we always have the input parameters.

left-of-the-operator - input parameters,
right-of-the-operator - expression or statement block

Remember, the operator is read as "goes to". So a lambda is always read as "input (goes to) expression", or "input (goes to) statement block".

Inferred Typing
With the delegate version we had to specify the type of the input parameter, when we said: delegate(int i). With lambdas, however, the compiler infers the type of i from the context. The compiler sees that we are assigning the expression to a strongly-typed Func delegate with generic type parameters <int, int>. So it knows, and doesn't need to be told again, that the input parameter i in the expression is an int.

Don't make the mistake of thinking this is loose typing. This is inferred typing, which is still very much strongly-typed: the type is fixed by the compiler at compile time and cannot change. It is part of the implicit-typing functionality introduced in C# 3.0, alongside implicitly typed local variables declared with var.

In fact, one of the main benefits of lambdas is that they are a simple way to make many previously un-type-safe operations type-safe. Take a quick look at Mauricio Scheffer's blog here, where he has used lambdas to replace string literals. You see this type of usage a lot in .NET frameworks and packages nowadays.

Take the Find call in the first code-block of this post: without lambdas or delegates, an API like that would typically have to accept string literals naming the properties to match on. You can also see, for example from my post on StructureMap IoC, that statement lambdas are now the preferred initialisation approach (though the previous approach was also type-safe).

Specifying Input Parameters
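If you want to state the parameter types explicitly instead of relying on the compiler's inference, put them in brackets on the left side of the lambda operator (brackets are also required when a lambda takes more than one parameter, e.g. (a, b) => a + b):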

Func<int, int> square = (int i) => i * i;

Or, if you have no parameters to pass in, just use empty brackets:

Func<int> noParams = () => 5 * 5;

I have to admit I'm not sure yet where or when this would be useful.
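One place a parameterless lambda can earn its keep is deferred work: the lambda describes how to produce a value without producing it yet. The sketch below assumes .NET 4, where Lazy<T> accepts a Func<T> factory; the DeferredDemo class and its call counter are hypothetical, there only to show when the lambda actually runs.

```csharp
using System;

class DeferredDemo
{
    // Counts how many times the factory lambda has really executed
    public static int FactoryCalls = 0;

    public static Lazy<int> CreateLazyAnswer()
    {
        // () => ... is a Func<int>: no input, int output
        return new Lazy<int>(() =>
        {
            FactoryCalls++;
            return 6 * 7;
        });
    }

    static void Main()
    {
        Lazy<int> answer = CreateLazyAnswer();
        Console.WriteLine(FactoryCalls);  // prints 0: nothing has run yet
        Console.WriteLine(answer.Value);  // prints 42: the lambda runs now
        Console.WriteLine(answer.Value);  // prints 42: cached, the lambda is not re-run
        Console.WriteLine(FactoryCalls);  // prints 1
    }
}
```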

Expression Lambdas vs. Statement Lambdas
So far we've only been using Expression Lambdas, which means that on the right-side of the lambda operator there is one, single expression. This is generally useful when you are trying to apply a function, such as Square above, or trying to apply a predicate, such as the List.Find example at the top of this article.

However you can construct multi-line, procedural lambdas if you wish, by using curly braces:

Action<string> speakToWorld = n =>
{
    string s = n + " " + "World";
    Console.WriteLine(s);
};
speakToWorld("Hello");

Action is of course the cousin of Func, but never returns a value (see the reference page here). The generic Action<string> used above takes a single string argument.
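Statement lambdas aren't limited to Action. Here is a small sketch (the Shout name is made up for illustration) of the same kind of multi-line body assigned to a Func. Because this delegate type has a return value, the block needs an explicit return:

```csharp
using System;

class StatementLambdaDemo
{
    // Curly braces, several statements, and an explicit return
    public static readonly Func<string, string> Shout = n =>
    {
        string s = n + " " + "World";
        return s.ToUpper() + "!";
    };

    static void Main()
    {
        Console.WriteLine(Shout("Hello")); // prints "HELLO WORLD!"
    }
}
```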

Expression Trees
One final thing to note here is the broader context within which both lambdas and delegates exist. We have moved into an arena where functionality now has the same status as values. What I mean is that we are used to assigning and manipulating values (and objects encapsulating values) within variables. Now it is just as easy to build and manipulate functionality. Expression trees are an example of that.

Earlier we used Func and Action to hold references to our methods. These allow truly anonymous references to executable code. But a lambda can also be represented as an Expression object (for example an Expression<Func<int, int>>): a data structure that can be assembled and re-assembled in its own right, and then compiled into a delegate.

Have a quick look at the MSDN on Expression Trees:

"Expression trees represent code in a tree-like data structure, where each node is an expression, for example, a method call or a binary operation such as x < y."
"You can compile and run code represented by expression trees. This enables dynamic modification of executable code, the execution of LINQ queries in various databases, and the creation of dynamic queries."

A quick example from the-code-project:

using System;
using System.Linq.Expressions;

ParameterExpression parameter1 = 
    Expression.Parameter(typeof(int), "x");

BinaryExpression multiply = 
    Expression.Multiply(parameter1, parameter1);

Expression<Func<int, int>> square = 
    Expression.Lambda<Func<int, int>>(multiply, parameter1);

Func<int, int> lambda = square.Compile();
Console.WriteLine(lambda(5));

In the code-block above, we have done the same thing our simple lambda expression from earlier does - square a number. It is much more verbose, yes, but look at the types on the last two lines before Console.WriteLine. We created a dynamic Expression object, which can be altered and assembled ad-hoc however you like at runtime - and then Compile()'d into executable code, and run - again, all at runtime.

In other words, you can have your code assemble more code as it runs. Or in other words again, you can have your code treat itself in the same, flexible way it used to only be able to treat data and values.
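You don't always have to build the tree node by node, either. As a minimal sketch (class and method names are illustrative): when a lambda is assigned to an Expression<Func<...>> rather than to a Func, the compiler produces an expression tree instead of executable code. You can then inspect it, reuse its pieces to assemble a new expression at runtime, and Compile() either one.

```csharp
using System;
using System.Linq.Expressions;

class ExpressionFromLambdaDemo
{
    // Because the target type is Expression<...>, the compiler builds a tree
    // of data describing x * x, rather than compiled code.
    public static readonly Expression<Func<int, int>> Square = x => x * x;

    // Reuse the tree's own parameter to assemble a different body at runtime
    public static Expression<Func<int, int>> BuildDouble()
    {
        ParameterExpression x = Square.Parameters[0];
        return Expression.Lambda<Func<int, int>>(Expression.Add(x, x), x);
    }

    static void Main()
    {
        Console.WriteLine(Square.Body.NodeType);        // prints Multiply
        Console.WriteLine(Square.Compile()(5));         // prints 25
        Console.WriteLine(BuildDouble().Compile()(5));  // prints 10
    }
}
```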

Aside from generating dynamic queries, what does this allow us to do? As this stack-overflow page points out, we can hand the power of code customisation over to users. Ever made a 'form builder', allowing users to customise post-driven dynamic webpages? I can't imagine how we did that in the past, but building such tools has just got a whole lot easier, and more semantically correct. Or what about user-customised workflows? Ditto.

Thursday, July 21, 2011

Introduction to Delegates in C#

What is a delegate? Most articles dive straight into the technical stuff, but I think it's easier to learn if you start by looking at the metaphor. I'll move on to a practical example straight afterwards.

So first of all, what makes the word 'delegate' such an apt keyword?

The Metaphor
According to Wikipedia, a delegate is a person who is similar to a representative, but "delegates differ from representatives because they receive and carry out instructions from the group that sends them, and, unlike representatives, are not expected to act independently."

Imagine a government department is employing two private companies to carry out litter collection across two districts. The government are responsible for preparing the contracts, and intend to use a standard template contract for both private companies. But the two private companies have different operating practices, and each want to tailor their respective contracts to suit their own particular needs.

They each send a delegate to meet the government and make amendments to their respective contracts. Each delegate is briefed by their respective companies before they leave, so that they know exactly what their instructions are. Each delegate arrives intending to do the same thing - amend a contract. But their instructions and boundaries on how they can do that are different - and are defined clearly by the respective senders.

I'll end the metaphor here, because I don't want to lock into one example. But having that sense of what a delegate is should make the rest of this article easier to read.

Declaring a Delegate Type
First, let's declare a publicly accessible delegate type:

//Declare a delegate type
public delegate void BookProcessingDelegate(Book b);

It is important at this stage to understand that what we have defined is not a delegate but a delegate type. Each instance we later instantiate of this type will be an actual delegate.

Delegates are usually declared not as class members, but as peers of classes or structs, i.e. at the same level. This is because declared delegate types are types in their own right. They can, of course, also be declared nested inside a class (just as a class can be nested inside another class).

In other words, a delegate type is a reference type: delegate instances are stored on the heap and accessed via a pointer (see my article on types, the stack and the heap).

Instantiating and Assigning a Delegate Method
Having declared our delegate type, there are three things we need to do to make use of it:

Define a method (write some instructions)
Instantiate a delegate and assign the method to it (brief the delegate of our instructions)
Send the delegate off to do business on our behalf

I'm going to use a variation of the 'classic' delegate example, and note that I've commented numerically the implementation of the bullet points above:

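using System;
using System.Collections.Generic;

//Declare a delegate type
public delegate void BookProcessingDelegate(Book b);

public class Book
{
    public string title { get; set; }
}

public class BookDB
{
    //A couple of sample books, so there is something to process
    private IList<Book> books = new List<Book>
    {
        new Book { title = "Romancing the Stone" },
        new Book { title = "Wuthering Heights" }
    };

    public void ProcessBooks(BookProcessingDelegate del)
    {
        foreach(Book b in books)
        {
            // Call the delegate method
            del(b);
        }
    }
}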

class Program
{
    //1. Define a method
    private static void PrintBookTitle(Book b)
    {
        Console.WriteLine(b.title);
    }

    static void Main()
    {
        BookDB bookDb = new BookDB();

        //2. Instantiate a delegate, and assign the method
        BookProcessingDelegate printTitles = PrintBookTitle;

        //3. Send the delegate off to do work on our behalf
        bookDb.ProcessBooks(printTitles);    
    }
}

When we assign the method to the delegate (item 2 above), we are actually assigning to the delegate a pointer to the method. Then, when the BookDB.ProcessBooks method uses (invokes) the delegate, it follows the pointer and actually invokes the referenced method.

Logically, it is exactly the same as if the body of the referenced method had been declared inside of BookDB. But of course it wasn't, and that's the key to the usefulness of delegates. I'll discuss this in more detail shortly.

But first of all let's explore two more ideas - anonymous delegates, and multicast delegates.

Anonymous Delegates
In the above example, we assigned a named method which was a private class member (namely PrintBookTitle). If that method is only going to be used for this one specific purpose, it can be much more convenient to declare an anonymous method.

What follows is exactly the same code block as above, but now the PrintBookTitle method has been anonymised, causing steps 1 and 2 to become one statement:

class Program
{
    static void Main()
    {
        BookDB bookDb = new BookDB();

        //1. Define a method AND
        //2. Instantiate a delegate, and assign a method
        BookProcessingDelegate printTitles = delegate(Book b) {
            Console.WriteLine(b.title);
        };

        //3. Send the delegate off to do work on our behalf
        bookDb.ProcessBooks(printTitles);    
    }
}

Anonymous delegates are more concise, and they are the foundation of lambda expressions, which I discuss in another article.

Multicast Delegates
It's another odd word, but multicast is based on another metaphor, borrowed from networking, where one message is sent to many recipients at once. It's simple enough anyway: you can assign more than one method to a delegate, and when invoked, the delegate responds by executing each method in order:

class Program
{
    private static void PrintLower(Book b) {
        Console.WriteLine(b.title.ToLower());
    }

    private static void PrintUpper(Book b) {
        Console.WriteLine(b.title.ToUpper());
    }

    static void Main()
    {
        //Instantiate a delegate, and assign TWO methods
        BookProcessingDelegate printTitles = PrintUpper;
        printTitles += PrintLower;

        //Invoking it runs PrintUpper, then PrintLower
        printTitles(new Book() { title = "Wuthering Heights" });

        //You can also use the -= operator to remove methods
        //from the 'invocation list'
        printTitles -= PrintLower;
    }
}

You can add and remove methods to your heart's content. The resulting list of methods which must be called on invocation of the delegate is referred to as the delegate's invocation list.

If the delegate has a return value, only the value returned by the last method in the invocation list will be returned. Similarly, if the delegate has an out or ref parameter, the parameter will end up being equal to the value assigned in the last method in the invocation list (see also my article on parameter passing in C#).
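To see the return-value rule in action, here is a minimal sketch using a Func<int> with three lambdas in its invocation list (the names are illustrative). Invoking the delegate directly gives back only the last result, while GetInvocationList() lets you call each method yourself if you need every result:

```csharp
using System;

class MulticastReturnDemo
{
    // Builds a multicast Func<int> whose invocation list holds three lambdas
    public static Func<int> Build()
    {
        Func<int> numbers = () => 1;
        numbers += () => 2;
        numbers += () => 3;
        return numbers;
    }

    static void Main()
    {
        Func<int> numbers = Build();

        // All three lambdas run, but only the last return value comes back
        Console.WriteLine(numbers());  // prints 3

        // To collect every result, walk the invocation list yourself
        foreach (Func<int> f in numbers.GetInvocationList())
        {
            Console.WriteLine(f());    // prints 1, then 2, then 3
        }
    }
}
```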

What Are Delegates Useful For?
As we have seen, delegates give us the ability to assign functionality at runtime. This adds to our toolbox of assignment operations - we all know how easy it is to assign values, or references to objects. Now we can assign functionality too.

Look back to the first, main code-block of this article. It should be clear that any class, anywhere in your application, that chooses to call BookDB.ProcessBooks can do so and provide its own specific, tailored functionality. The Program class happens to have sent one particular type of functionality (printing to the console), but any delegate can point to any method implementation.

Without delegates, the creators of the BookDB class would have to have thought up in advance every possible thing that callers might want to do with the Book list, and provide lots of methods to cover all those bases. In doing so, the BookDB class would be taking on responsibilities that are not within its problem domain.

The other way to achieve the same functionality without delegates would be for the BookDB to expose its internal list as a property, or via an access method. The Program class could then iterate over the collection itself. But this approach would mean that the Program class was shouldering a responsibility that is not within its domain: iterating over a collection.

Delegates therefore provide an elegant solution which helps achieve the Separation of Concerns principle within your application architecture.

A Common Usage - Ad-Hoc Searching
There are lots of places in the .NET Framework where you can use delegates, and their newer equivalent, lambda expressions. We'll take a look at one example, by altering our BookDB code a little. Let's say you want to find a book with a particular title, from a list of Book items:

class AdHocSearch
{
    static void Main()
    {
        List<Book> bookDb = new List<Book>();
        bookDb.Add(new Book() { title="Romancing the Stone" });
        bookDb.Add(new Book() { title="Wuthering Heights" });

        Book foundBook = bookDb.Find(delegate(Book candidate)
        {
            return candidate.title.Equals("Wuthering Heights");
        });

        Console.WriteLine(foundBook.title);
    }
}

If you couldn't use a delegate, you would have to manually iterate over the list. But fortunately for us, the List<T>.Find() method accepts a delegate (a Predicate<T>) that decides whether a given item is a match. This leaves the job of iterating the collection, with its associated efficiency concerns, in the correct problem domain: the domain of the List.
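One thing to watch for: if nothing matches, Find returns default(T), which is null for a class like Book, so foundBook.title above would throw a NullReferenceException. The sketch below re-declares the Book class so it compiles on its own (the FindDemo name is illustrative). It checks for that case, and also shows FindAll, which takes the same kind of delegate but returns every match:

```csharp
using System;
using System.Collections.Generic;

public class Book
{
    public string title { get; set; }
}

class FindDemo
{
    public static List<Book> CreateBooks()
    {
        List<Book> bookDb = new List<Book>();
        bookDb.Add(new Book() { title = "Romancing the Stone" });
        bookDb.Add(new Book() { title = "Wuthering Heights" });
        return bookDb;
    }

    static void Main()
    {
        List<Book> bookDb = CreateBooks();

        // No match: Find returns null, so check before using the result
        Book missing = bookDb.Find(delegate(Book candidate)
        {
            return candidate.title.Equals("Moby Dick");
        });
        Console.WriteLine(missing == null);  // prints True

        // FindAll takes the same Predicate<Book> but returns every match
        List<Book> withH = bookDb.FindAll(delegate(Book candidate)
        {
            return candidate.title.Contains("H");
        });
        Console.WriteLine(withH.Count);      // prints 1
    }
}
```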

Quick Runthrough
For a nice, quick, concise primer on the subjects I've discussed above, check out this YouTube video.

I don't think it stands up as a learning resource in its own right, but as a primer it's quite well organised.

Further Reading
This is quite a nice delegates tutorial on MSDN.

Wednesday, July 20, 2011

Pass by Value and Pass by Reference in C#

This article is a quick explanation of C#'s handling of pass-by-val and pass-by-ref. If you haven't already done so, it's worth getting to grips with the heap and the stack. That article will also explain the relationship between value types, reference types and pointers, which will be really useful.

Value-Types
By default, value types are 'passed-by-value' into C# methods. This means that the called method receives a copy, so inside its body the value can be changed or re-assigned without affecting the caller's variable.

static void Main(string[] args)
{
    int i = 1;
    Console.WriteLine(i); // prints 1
    MyMethod(i);
    Console.WriteLine(i); // prints 1
}

static void MyMethod(int i)
{
    i = 2;
}

If you want to override this behaviour, you can use the ref keyword or out keyword:

static void Main(string[] args)
{
    int i = 1;
    Console.WriteLine(i); // prints 1
    MyMethod(ref i);
    Console.WriteLine(i); // prints 2
}

static void MyMethod(ref int i)
{
    i = 2;
}

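You can use either the ref or the out keyword. The difference is in who must initialise the variable: with ref, the caller must assign it before it is passed; with out, the called method must assign it before returning.

The compiler will issue an error if you try to pass an unassigned variable with the ref keyword, but it will happily let you pass one using out. That's safe, because the compiler also checks the other side: the target method cannot read an out parameter before assigning it, and every path through the method must assign it before returning. So a forgotten assignment is caught at compile time, not at runtime.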
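As a minimal sketch of the out rules, here is a hypothetical TryHalve method, modelled loosely on the int.TryParse pattern. The caller passes an unassigned variable, and the compiler forces every path through TryHalve to assign result before returning:

```csharp
using System;

class OutParamDemo
{
    // Hypothetical helper: removing either assignment to 'result'
    // below would be a compile-time error, not a runtime one.
    public static bool TryHalve(int value, out int result)
    {
        if (value % 2 != 0)
        {
            result = 0;
            return false;
        }
        result = value / 2;
        return true;
    }

    static void Main()
    {
        int half;  // unassigned is fine here, because the argument is 'out'
        bool ok = TryHalve(10, out half);
        Console.WriteLine(ok + " " + half);  // prints "True 5"
    }
}
```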

Reference-Types
Reference-type arguments are also passed by value by default, but the value that gets copied is the reference (the pointer), not the object itself. So the called method can alter the object the caller sees, but if it re-assigns its parameter to a different object, the caller's variable is unaffected. You can use the out and ref keywords with an object too, but you only need them when the method must re-assign the caller's variable itself:

static void Main(string[] args)
{
    MyClass myObj = new MyClass();
    MyMethod(ref myObj);            //only needed if MyMethod re-assigns myObj
}

If you want the called method to work on its own separate copy of the object, you need to start thinking about cloning, which is a whole new subject in itself. If you simply want to prevent the object from being altered, you might want to consider making the class immutable, but in any case that's outside the scope of this article.
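Here is a small sketch of the difference (MyClass is given a Value field purely for illustration, and the method names are hypothetical). Without ref, the method can change the object but not which object the caller's variable points at; with ref, it can do both:

```csharp
using System;

class MyClass
{
    public int Value;
}

class RefTypeDemo
{
    // Receives a COPY of the reference
    public static void ReassignByValue(MyClass obj)
    {
        obj.Value = 2;        // the caller sees this: same object
        obj = new MyClass();  // the caller does not see this
        obj.Value = 3;
    }

    // Receives the caller's variable itself
    public static void ReassignByRef(ref MyClass obj)
    {
        obj = new MyClass();  // the caller's variable now points here
        obj.Value = 3;
    }

    static void Main()
    {
        MyClass myObj = new MyClass { Value = 1 };

        ReassignByValue(myObj);
        Console.WriteLine(myObj.Value);  // prints 2

        ReassignByRef(ref myObj);
        Console.WriteLine(myObj.Value);  // prints 3
    }
}
```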

More examples

Monday, July 18, 2011

Boxing and Unboxing in C#

C#, like all .NET languages, uses a unified type system (called the Common Type System). The idea is that all types that can ever be declared within the framework ultimately derive from System.Object. This diagram gives you some idea how that works at the top-level.

However, if primitive datatypes such as int and bool were always stored as objects, it would be a huge performance hit. So in reality, at any given moment these primitive types can be represented in one of two ways:

As a value type - stored on the stack or the heap
As a reference type - stored on the heap

(Check out my article on value types, reference types, the stack and the heap.)

Boxing
Boxing is the name given to the process of converting a value-type representation to a reference-type representation. The process is implicit - just use an assignment operator and the compiler will infer from the types involved that boxing is necessary:

public static void BoxUsingObjectType()
{
    const int i = 0;
    object intObject = i; //Boxing occurs here
}

In the above example the integer i is boxed into a generic object. This is the type of usage that you will generally see - usually the point of boxing is so that you can treat the value-type in a generic way (alongside reference-type objects). More on that later.

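It's also useful to note that short, int and long are simply C# aliases for System.Int16, System.Int32 and System.Int64. Assigning between an alias and its System type is therefore an ordinary value copy, and no boxing occurs:

public static void AssignToSystemTypes()
{
    const short s = 0;    //equivalent to System.Int16
    System.Int16 shortValue = s;   //same type: no boxing

    const int i = 0;      //equivalent to System.Int32
    System.Int32 intValue = i;     //same type: no boxing

    const long l = 0;     //equivalent to System.Int64
    System.Int64 longValue = l;    //same type: no boxing
}

Boxing only happens when a value is converted to a reference type: object, System.ValueType, or an interface the value type implements (such as IComparable).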

For the same reason, the example below is not boxing at all, but an implicit numeric conversion between value types. Widening a smaller type into a larger one (int to long) is allowed implicitly, but narrowing a larger type into a smaller one (long to int) is not, because data could be lost:

public static void ConvertBetweenSizedTypes()
{
    const int intVal = 0;   //equivalent to System.Int32
    const long longVal = 0; //equivalent to System.Int64

    /* This is fine: implicit widening conversion */
    System.Int64 longValue = intVal;

    /* This causes a compiler error: narrowing needs an explicit cast */
    System.Int32 intValue = longVal;
}

The second assignment above causes a compiler error; writing (int)longVal instead makes the narrowing conversion explicit.

Unboxing
The process of unboxing is explicit, and uses the same syntax as casting. When the compiler encounters the syntax below, with a value-type on the left and an object on the right, it infers that unboxing is necessary. Note that the source really has to be a reference type: System.Int32 is just another name for int, so casting it to int involves no unboxing at all:

public static void UnBox(object intObject)
{
    int i = (int) intObject; //Unboxing occurs here
}
//whereas
public static void NoUnBox(System.Int32 intValue)
{
    int i = (int) intValue;  //No unboxing: System.Int32 is int
}
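Two consequences are worth seeing in code. In this minimal sketch (the names are illustrative), a boxed value turns out to be a copy, so changing the original variable afterwards doesn't affect it. Also, unboxing must name the exact type that was boxed, so a boxed int cannot be unboxed straight to long, even though an int converts to a long happily:

```csharp
using System;

class BoxingDemo
{
    // Tries to unbox 'o' directly as a long; returns null if the boxed
    // value is not exactly a long (e.g. a boxed int).
    public static long? TryUnboxAsLong(object o)
    {
        try
        {
            return (long)o;
        }
        catch (InvalidCastException)
        {
            return null;
        }
    }

    static void Main()
    {
        int i = 10;
        object boxed = i;                 // boxing copies the value onto the heap
        i = 20;                           // changing the original...
        Console.WriteLine((int)boxed);    // ...leaves the boxed copy alone: prints 10

        Console.WriteLine(TryUnboxAsLong(boxed) == null);  // prints True
        Console.WriteLine((long)(int)boxed);               // unbox first, then widen: prints 10
    }
}
```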

Why Would you Use Boxing?
You generally wouldn't, and one good reason to know about it is to help avoid it. Like casting, it's a relatively expensive operation, because each box allocates a new object on the heap. Here's a common scenario where you might inadvertently box:

int i = 10;
string s = string.Format("The value of i is {0}", i);

But the most common usage of boxing is in collections. In the .NET 1.1 days you might have used an ArrayList something like this:

ArrayList list = new ArrayList();
list.Add(10);                 // Boxing
int i = (int) list[0];        // Unboxing

Because ArrayList stores every item as an object, each int is boxed on the way in and unboxed on the way out, which is expensive.

Now in the era of generics this is less of a problem, because when you use a primitive type as a generic parameter, such as the <int> below, it is not boxed, and it is typesafe. This is a very good reason to use generic collections.

List<int> list = new List<int>();
list.Add(10);                 // No Boxing
int i = list[0];              // No Unboxing

Generically-Typed Collections
The problem comes when you want to use a mixed list. First, how it looks in .NET 1.1:

ArrayList list = new ArrayList();
list.Add(10);                       // Boxing
list.Add(new SomeClass());          // Implicit upcast to object
int y = (int) list[0];              // Unboxing
SomeClass o = (SomeClass) list[1];  // Downcasting

Even with a generic list, we have to use object, or some common ancestor type, to ensure that our list will accept all values:

List<object> list = new List<object>();
list.Add(10);                       // Boxing
list.Add(new SomeClass());          // Implicit upcast to object
int y = (int) list[0];              // Unboxing
SomeClass o = (SomeClass) list[1];  // Downcasting

This is very similar to using ArrayList. But although there are no efficiency gains, with generics we explicitly control the level (here, object) at which the casting operations occur.

There are some discussions here and here on stack-overflow about other possible use cases involving boxing and unboxing.

Further Reading
This page on the Microsoft website gives you some more ideas, including some diagrams demonstrating what's happening on the stack and the heap at runtime.

Also, this article on the-code-project has some nice examples and images, and a fuller explanation of value and reference types.

Sunday, July 17, 2011

The Heap and the Stack

There are two locations in memory where the .NET Framework stores items during runtime - the Heap and the Stack. Understanding the difference between the two is one of those nuanced things that will pay dividends in all kinds of programming scenarios.

This article is a concise run-through of the most important concepts. For a broader explanation, check out this article on c-sharp corner or this one on code-project (with nice diagrams).

What are the Heap and the Stack?

The Stack is more or less responsible for what's executing in our code.

There is a stack for every currently executing thread at runtime. Within each thread, each time a method is called, the method is added to the relevant stack for the thread. When the method is finished executing, it is removed from the stack. Therefore the stack is sequential, building upwards in one straight line, and at any given moment the CLR is only really interested in the items at the top of the stack.

Bear in mind that although it's easier to visualise it in terms of 'methods' being added to the stack, in reality the method itself is not added to the stack; the resources associated with the method are. This means arguments, local variables, and the return value.

The stack is self-cleaning - when the method at the top of the stack finishes executing, the items associated with it have their resources freed.

The Heap is more or less responsible for keeping track of our objects. Anything on the heap can be accessed at any time - it's like a heap of laundry, anything can go anyplace.

Items placed in the Heap can be referenced by pointers to the relevant memory location where they are stored. Unlike the Stack, the Heap has to worry about garbage collection.

What can go on the Heap and the Stack?
First we'll talk about the different resources that can be placed on the heap and the stack, and then we'll talk about which go where.

Value Types
Reference Types, and
Pointers

For a nice tree diagram go here, or check out the lists:

1. Value Types
These are the types that derive from System.ValueType:

bool
byte
char
decimal
double
enum
float
int
long
sbyte
short
struct
uint
ulong
ushort

2. Reference Types
System.Object, and anything which derives from it. Think in terms of:

class
interface
delegate
object
string

3. Pointers
A Pointer is a chunk of space in memory that points to another space in memory: its value is either a memory address or null. All Reference Types are accessed through pointers. We don't explicitly use Pointers, they are managed by the CLR, but they exist in memory as items in their own right.
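The practical effect of that pointer is easiest to see by comparing a struct (a value type) with a class (a reference type). In this minimal sketch, the two hypothetical Point types are identical apart from that keyword. Assigning one variable to another copies the whole value for the struct, but only the pointer for the class:

```csharp
using System;

// Hypothetical types for illustration: identical except struct vs class
struct PointStruct { public int X; }
class PointClass { public int X; }

class CopyDemo
{
    // Assigning a struct copies the whole value
    public static int StructCopy()
    {
        PointStruct s1 = new PointStruct { X = 1 };
        PointStruct s2 = s1;
        s2.X = 99;
        return s1.X;  // still 1
    }

    // Assigning a class instance copies only the pointer
    public static int ClassCopy()
    {
        PointClass c1 = new PointClass { X = 1 };
        PointClass c2 = c1;
        c2.X = 99;
        return c1.X;  // 99: c1 and c2 point at the same heap object
    }

    static void Main()
    {
        Console.WriteLine(StructCopy());  // prints 1
        Console.WriteLine(ClassCopy());   // prints 99
    }
}
```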

Which Go Where?

Reference Types always go on the Heap
Value Types and Pointers go wherever they are declared

Look at the example below:

public class MyClass
{
    /* This variable is placed on the HEAP
       inline with the containing reference-type,
       i.e. the class, when it is instantiated */
    public int MyClassMember;

    /* The argument, the local variable and the
       return value are placed on the STACK when
       the method is called, and removed when
       execution completes */
    public int MyMethod(int myArg)
    {
        int myLocal = 0;  /* must be assigned before it is read */
        return myArg + myLocal;
    }
}

Of course, MyClass is a Reference Type, so an instance of it is placed on the Heap. The member variable MyClassMember is declared inline with a reference type, and therefore it is stored inline with that reference type on the Heap.

The argument myArg, the local variable myLocal and the return value are incidental to the object: they are not class members. They are not inline with a reference type and therefore they are stored on the Stack.

Reference Types and Pointers
When a Reference Type such as an object is instantiated, the actual contents are stored on the Heap. Under the hood, the CLR also creates a Pointer, the contents of which are a reference to the object's memory location on the heap.

In this way, a reference type can be reached easily, and the same object can be referenced from more than one variable. But where is that pointer stored?

The same rules apply as with Value Types. It depends on where the variable holding the pointer is declared:

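public class MyClass
{
    /* This pointer is stored on the HEAP,
       inline with the MyClass instance */
    MyOtherClass myMember = new MyOtherClass();

    public void MyMethod()
    {
        /* This pointer is stored on the STACK */
        MyOtherClass myLocal = new MyOtherClass();
    }
}

/* A separate class: a MyClass field initialised with
   new MyClass() would recurse until the stack overflows */
public class MyOtherClass { }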

As noted above and discussed in the c-sharp-corner article, the same object can be referenced by more than one Pointer. It's important to understand that assigning one reference-type variable to another in .NET copies the pointer value (the memory address of the object). It does not copy the object itself.

Take a look at this example:

public int ReturnValue()
{
    int x = new int();
    x = 3;
    int y = new int();
    y = x;      
    y = 4;          
    return x;
}

//This returns 3

This is simple enough: y = x copies the value 3 into y, so setting y to 4 afterwards leaves x unchanged.

But what happens when we wrap the value types inside a reference type? The key is what happens when you use the assignment operation on a reference type.

public class MyInt
{
    public int Val;
}

public int ReturnValue2()
{
    MyInt x = new MyInt();
    x.Val = 3;
    MyInt y = new MyInt();
    y = x;  /* y now points to the 'x' memory address */
    y.Val = 4;              
    return x.Val;
}

//This returns 4

As you can see, the assignment copies the Pointer value (the memory address of the assigned object), not the value of the object or the object's members. As a consequence, the new MyInt() that was created and initially assigned to y is now orphaned: nothing references it any more, so it becomes eligible for garbage collection.
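For contrast, here is a sketch that declares the same shape as a struct. MyIntStruct and AssignmentDemo are hypothetical names, and MyInt is redeclared so the block stands alone. Because a struct is a value type, assignment copies its contents, so it behaves like the plain int example; object.ReferenceEquals confirms that two MyInt variables can hold pointers to one and the same object:

```csharp
// Same shape as MyInt, but a value type (hypothetical name).
public struct MyIntStruct
{
    public int Val;
}

public class MyInt
{
    public int Val;
}

public static class AssignmentDemo
{
    public static int StructCopy()
    {
        MyIntStruct x = new MyIntStruct();
        x.Val = 3;
        MyIntStruct y = x;   // copies the whole struct: y gets its own Val
        y.Val = 4;           // changes y's copy only
        return x.Val;        // still 3
    }

    public static bool ClassShares()
    {
        MyInt x = new MyInt();
        MyInt y = x;         // copies the pointer only
        return object.ReferenceEquals(x, y);   // one object, two pointers
    }
}
```

StructCopy() returns 3, mirroring ReturnValue(), while ClassShares() returns true, which is why ReturnValue2() sees the change made through y.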

Dynamic vs. Static Memory
So the Stack and the Heap have different structures, behaviours and reasons for being. One is sequential and tied to the currently executing method; the other is a loose web of objects referencing one another, messy (it requires GC) and random-access.

But why not just use one memory store - why separate them at all? The answer is to separate static memory from dynamic memory.

The Stack is static: once a chunk of memory is allocated for a variable, its size cannot and will not change. Each unit is small and of fixed size.
The Heap is dynamic - reference types encapsulate value types and other reference types. Each unit is larger and of variable size.
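As a rough illustration (SizeDemo and GrowingLength are hypothetical names), sizeof gives the fixed size of the built-in value types as compile-time constants, whereas a StringBuilder on the Heap grows as content is appended at runtime. The builder's Length counts characters rather than bytes, so treat it only as a proxy for the object's changing footprint:

```csharp
using System.Text;

public static class SizeDemo
{
    // sizeof on the built-in value types is a compile-time constant:
    // every int takes the same space, whatever value it holds.
    public const int IntSize = sizeof(int);         // 4 bytes
    public const int DoubleSize = sizeof(double);   // 8 bytes
    public const int DecimalSize = sizeof(decimal); // 16 bytes

    // A reference type's contents are decided at runtime:
    // the builder holds more characters with every Append.
    public static int GrowingLength(int repeats)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < repeats; i++)
        {
            builder.Append("heap");
        }
        return builder.Length;
    }
}
```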

These differences mean that the way space is allocated and consumed is very different. It's outside the scope of this article, but you can do more reading by looking up dynamic memory allocation and static memory allocation.

Friday, July 15, 2011

The 'Finally' in Try / Catch

When exactly is the finally block called in try/catch statements? If you know, you can correctly predict what gets printed here:

using System;

class Program
{
    static void Main(string[] args)
    {
        AssignToInt(null);
        AssignToInt(new object());
        AssignToInt(1);
    }

    public static bool AssignToInt(object o)
    {
        try
        {
            int i = (int)o;
            Console.WriteLine("{0} assigned OK", i);
            return true;
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("NullReferenceException");
            return false;
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("InvalidCastException");
            return false;
        }
        finally
        {
            Console.WriteLine("Finally...");
        }
    }
}

The finally block always runs once execution leaves the try/catch statement, regardless of whether an exception was caught or not. Even when a return statement is reached inside a try or catch block, the CLR evaluates the return value, runs the finally block, and only then hands control back to the caller.
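A minimal sketch of that ordering, using a hypothetical FinallyOrder.ReturnFromTry helper that logs each step. The return value is evaluated when the return statement is reached, the finally block then runs, and a change made inside finally does not alter the value already captured:

```csharp
using System.Collections.Generic;

public static class FinallyOrder
{
    // Hypothetical helper that logs each step so the order is visible.
    public static int ReturnFromTry(List<string> log)
    {
        int result = 1;
        try
        {
            log.Add("try");
            return result;        // the value 1 is captured here...
        }
        finally
        {
            log.Add("finally");   // ...then finally runs before the caller resumes
            result = 2;           // too late to change the captured return value
        }
    }
}
```

ReturnFromTry returns 1, and the log reads "try" then "finally".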

The classic reason to use it is to clean up and close any resources you have left open. Let's say you open a FileStream in your try block: it will need to be closed whether or not an exception occurs. The FileStream.Close() method call should be made inside the finally block.
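A sketch of that pattern follows. FileCleanup and WriteOneByte are hypothetical names, and the path is whatever the caller supplies. The stream is declared outside the try so the finally block can see it, and it is null-checked in case the FileStream constructor itself threw:

```csharp
using System.IO;

public static class FileCleanup
{
    // Writes one byte to the given path and always closes the stream.
    public static void WriteOneByte(string path, byte value)
    {
        FileStream stream = null;
        try
        {
            stream = new FileStream(path, FileMode.Create);
            stream.WriteByte(value);   // may throw, e.g. an IOException
        }
        finally
        {
            // Runs whether or not an exception occurred above.
            // stream is still null if the constructor itself threw.
            if (stream != null)
            {
                stream.Close();
            }
        }
    }
}
```

The C# using statement compiles down to this same try/finally shape, calling Dispose (which for a FileStream also closes it) in the finally block.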

Therefore the output you get from running the code is:

NullReferenceException
Finally...
InvalidCastException
Finally...
1 assigned OK
Finally...