Sunday, July 17, 2011

The Heap and the Stack

There are two locations in memory where the .NET Framework stores items at runtime: the Heap and the Stack. Understanding the difference between the two is one of those nuanced things that pays dividends in all kinds of programming scenarios.

This article is a concise run-through of the most important concepts. For a broader explanation, check out this article on C# Corner or this one on CodeProject (with nice diagrams).

What are the Heap and the Stack?

The stack

The Stack is more or less responsible for keeping track of what's executing in our code.

There is a separate stack for every thread executing at runtime. Each time a method is called on a thread, a block of memory for that call (a stack frame) is pushed onto that thread's stack. When the method finishes executing, the frame is popped off again. The stack is therefore sequential and last-in, first-out: it builds upwards in one straight line, and at any given moment the CLR is only really interested in the frame at the top.

Bear in mind that although it's easier to visualise this in terms of 'methods' being added to the stack, in reality the method itself is not added to the stack. What is added are the resources associated with the call: its arguments, its local variables and its return value.

The stack is self-cleaning. When the method at the top of the stack finishes executing, its frame is popped and the memory it used is freed immediately, with no garbage collection involved.
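To make the push/pop order concrete, here is a small sketch. The names `StackFrameDemo` and `Countdown` are hypothetical and not from any library. It uses recursion so that several frames for the same method sit on the stack at once. Each frame has its own copy of `n`, and the log shows frames being removed in the reverse order they were added.

```csharp
using System.Collections.Generic;

public static class StackFrameDemo
{
    // Each call to Countdown gets its own stack frame holding its own copy of 'n'.
    // Entries and exits are recorded in 'log' so the push/pop order is visible.
    public static void Countdown(int n, List<string> log)
    {
        log.Add("enter " + n);
        if (n > 0)
        {
            // Pushes a new frame on top of this one.
            Countdown(n - 1, log);
        }
        // The nested frame is gone by now, but this frame's 'n' is untouched.
        log.Add("exit " + n);
    }
}
```

Calling `Countdown(2, log)` records `enter 2, enter 1, enter 0, exit 0, exit 1, exit 2`: last in, first out.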

The heap

The Heap is more or less responsible for keeping track of our objects. Anything on the heap can be accessed at any time. It's like a heap of laundry, where anything can go anywhere.

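Items placed on the Heap are referenced by pointers to the memory location where they are stored. Unlike the Stack, the Heap has to worry about garbage collection. Nothing on the Heap is freed automatically when a method returns, so the garbage collector periodically reclaims objects that are no longer referenced by any pointer.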
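The sketch below illustrates garbage collection using `WeakReference`, which tracks an object without keeping it alive. The class name `HeapDemo` and its methods are hypothetical. Collection timing is up to the runtime. An object with no remaining strong pointers is typically reclaimed by a forced collection in a Release build, but this is not guaranteed. An object that is still referenced always survives.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class HeapDemo
{
    // Allocates an object on the heap and hands back only a weak reference,
    // so once this method returns no strong pointer to the object remains.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference CreateUnreferencedObject()
    {
        return new WeakReference(new object());
    }

    // Forces a full collection and reports whether the weakly tracked object survived.
    public static bool SurvivesCollection(WeakReference weak)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return weak.IsAlive;
    }
}
```

Usage: keep a strong reference (`var strong = new object();`), wrap it in a `WeakReference`, and `SurvivesCollection` returns `true` as long as `strong` is still in use. Passing the result of `CreateUnreferencedObject()` usually returns `false`.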

What can go on the Heap and the Stack?
First we'll talk about the different resources that can be placed on the heap and the stack, and then we'll talk about which go where.

  1. Value Types
  2. Reference Types, and
  3. Pointers

For a nice tree diagram go here, or check out the lists:

1. Value Types
These are the types that derive from System.ValueType. Enums derive from System.Enum, which in turn derives from System.ValueType.

  • bool
  • byte
  • char
  • decimal
  • double
  • enum
  • float
  • int
  • long
  • sbyte
  • short
  • struct
  • uint
  • ulong
  • ushort
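A quick way to confirm the inheritance chain is reflection via `Type.BaseType`. The sketch below walks up the chain for `int`, an enum and a struct. `Colour`, `Point2D` and `ValueTypeDemo` are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;

public enum Colour { Red, Green }                  // hypothetical enum

public struct Point2D { public int X; public int Y; } // hypothetical struct

public static class ValueTypeDemo
{
    // Lists a type and all of its base types, e.g. "Int32 -> ValueType -> Object".
    public static string BaseChain(Type type)
    {
        var names = new List<string>();
        for (Type t = type; t != null; t = t.BaseType)
        {
            names.Add(t.Name);
        }
        return string.Join(" -> ", names);
    }
}
```

`BaseChain(typeof(int))` gives `Int32 -> ValueType -> Object`. `BaseChain(typeof(Colour))` gives `Colour -> Enum -> ValueType -> Object`, showing the extra System.Enum step for enums.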

2. Reference Types
System.Object, and anything that derives from it other than the value types (value types also derive from System.Object, via System.ValueType). Think in terms of:

  • class
  • interface
  • delegate
  • object
  • string
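Each kind in this list can be checked with `Type.IsValueType`, which is `false` for every reference type. The sketch below is an assumption-labelled illustration: `Customer`, `IShape`, `Transform` and `ReferenceTypeDemo` are hypothetical names covering a class, an interface and a delegate.

```csharp
using System;

public class Customer { }                  // hypothetical class
public interface IShape { }                // hypothetical interface
public delegate int Transform(int value);  // hypothetical delegate

public static class ReferenceTypeDemo
{
    // True when the type is a reference type, i.e. not derived from System.ValueType.
    public static bool IsReferenceType(Type type)
    {
        return !type.IsValueType;
    }
}
```

`IsReferenceType` returns `true` for `typeof(Customer)`, `typeof(IShape)`, `typeof(Transform)`, `typeof(object)` and `typeof(string)`, and `false` for `typeof(int)`.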

3. Pointers
A Pointer is a chunk of space in memory that points to another space in memory. Its value is either a memory address or null. All Reference Types are accessed through pointers. We don't manipulate these pointers explicitly (outside of unsafe code). They are managed by the CLR, and C# calls them references, but they exist in memory as items in their own right.
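The "memory address or null" point can be sketched as follows. `Box` and `NullReferenceDemo` are hypothetical names. A variable of a reference type either points at an object on the heap or holds null. Following a null pointer (for example `box.Value` when `box` is null) throws a `NullReferenceException`, so the `?.` operator is used here to follow the pointer only when it is set.

```csharp
public class Box   // hypothetical class for illustration
{
    public int Value;
}

public static class NullReferenceDemo
{
    // 'box' holds either the address of a Box on the heap or null.
    // The ?. operator only follows the pointer when it is not null,
    // returning null (as int?) otherwise.
    public static int? ReadValue(Box box)
    {
        return box?.Value;
    }
}
```

`ReadValue(null)` returns `null`, while `ReadValue(new Box { Value = 7 })` follows the pointer and returns `7`.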

Which Go Where?

  • Reference Types always go on the Heap
  • Value Types and Pointers go wherever they are declared

Look at the example below:

public class MyClass
{
    /* This variable is placed on the HEAP
       inline with the containing reference-type,
       i.e. the class, when it is instantiated */
    public int MyClassMember;

    /* These 3 variables are placed on the STACK
       when the method is called, and removed
       when execution completes */
    public int MyMethod(int myArg)
    {
        int myLocal = 10;
        return myArg + myLocal;
    }
}

Of course, MyClass is a Reference Type, so each instance of it is placed on the Heap. The member variable MyClassMember is declared inline with a reference type, and therefore it is stored inline with that reference type on the Heap.

The argument myArg, the local variable myLocal and the return value are incidental to the object because they are not class members. They are not inline with a reference type, and therefore they are stored on the Stack.
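One observable consequence of this split is that a field keeps its value between calls, while a local starts fresh in every new stack frame. In the sketch below, `CallCounter` is a hypothetical class in the spirit of MyClass.

```csharp
public class CallCounter   // hypothetical class for illustration
{
    // A field: stored inline in the CallCounter object on the heap,
    // so its value persists from one call to the next.
    public int FieldCount;

    // Increments both a field and a local, and returns the local's value.
    public int Increment()
    {
        // A local: stored in this call's stack frame and re-created on every call.
        int localCount = 0;
        localCount++;
        FieldCount++;
        return localCount;
    }
}
```

Calling `Increment()` three times on the same object returns `1` each time, while `FieldCount` ends up at `3`.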

Reference Types and Pointers
When a Reference Type such as an object is instantiated, the actual contents are stored on the Heap. Under the hood, the CLR also creates a Pointer, the contents of which are a reference to the object's memory location on the heap.

In this way a reference type can be addressed cheaply, and the same object can be addressed from more than one variable. But where is that pointer stored?

The rules are the same as for Value Types: it depends on where the variable is declared.

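/* A separate class is used for the members: initialising a MyClass
   field with new MyClass() would construct MyClass objects endlessly */
public class MyOtherClass
{
}

public class MyClass
{
    /* This pointer is stored on the HEAP, inline with
       the MyClass instance that contains it */
    MyOtherClass myMember = new MyOtherClass();

    public void MyMethod()
    {
        /* This pointer is stored on the STACK; the object
           it points to is still on the HEAP */
        MyOtherClass myLocal = new MyOtherClass();
    }
}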
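Because the pointer and the object live in different places, copying a stack pointer into a field lets the object outlive the method that created it. This is a minimal sketch with hypothetical `Widget` and `Holder` classes. The local pointer vanishes with its stack frame, but the heap object survives because a heap-resident pointer (the field) still refers to it.

```csharp
public class Widget   // hypothetical class for illustration
{
    public string Name = "";
}

public class Holder   // hypothetical class for illustration
{
    // This pointer is stored on the heap, inside the Holder object.
    public Widget Kept;

    public void CreateAndKeep(string name)
    {
        // 'local' is a pointer in this method's stack frame; the Widget is on the heap.
        Widget local = new Widget { Name = name };

        // Copy the pointer value into the field: both pointers now refer to one Widget.
        Kept = local;
    }
    // When CreateAndKeep returns, 'local' is gone, but the Widget is still
    // reachable through Kept, so the garbage collector leaves it alone.
}
```

After `holder.CreateAndKeep("gear")`, `holder.Kept.Name` is `"gear"`, even though the method that created the Widget has finished.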

As noted above and discussed in the C# Corner article, the same object can be referenced by more than one Pointer. It's important to understand that assigning one reference-type variable to another in .NET copies the pointer value, i.e. the memory address of the object. It does not copy the object's contents.

Take a look at this example:

public int ReturnValue()
{
    int x = new int();
    x = 3;
    int y = new int();
    y = x;      /* y gets its own copy of the value 3 */
    y = 4;      /* changes y only; x is still 3 */
    return x;
}

//This returns 3

This is simple enough.

But what happens when we wrap the value types inside a reference type? The key is what happens when you use the assignment operation on a reference type.

public class MyInt
{
    public int Val;
}

public int ReturnValue2()
{
    MyInt x = new MyInt();
    x.Val = 3;
    MyInt y = new MyInt();
    y = x;  /* y now holds x's address: both point to one object */
    y.Val = 4;  /* changes the object that x also points to */
    return x.Val;
}

//This returns 4

As you can see, the assignment copies the Pointer value (the memory address of the assigned object), not the object itself or its members. As a consequence, the new MyInt() that was created and initially assigned to y is now orphaned. Nothing references it any more, so it becomes eligible for garbage collection.
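For contrast, here is a sketch of the same steps with a struct instead of a class. `MyIntStruct`, `CopySemanticsDemo` and `ReturnValue3` are hypothetical names modelled on the MyInt example above. Because a struct is a value type, assignment copies the whole value, including `Val`, so changing `y` leaves `x` alone.

```csharp
public struct MyIntStruct   // hypothetical struct version of MyInt
{
    public int Val;
}

public static class CopySemanticsDemo
{
    public static int ReturnValue3()
    {
        MyIntStruct x = new MyIntStruct();
        x.Val = 3;
        MyIntStruct y = x;   // copies the whole struct: y gets its own Val
        y.Val = 4;           // changes only y's copy
        return x.Val;        // still 3
    }
}
```

Unlike ReturnValue2, this returns 3. Switching a type between `class` and `struct` therefore changes what assignment means, not just where the data is stored.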

Dynamic vs. Static Memory
So the Stack and the Heap have different structures, behaviours and reasons for being. One is sequential and tied to the current method; the other is unordered, messy (it requires garbage collection) and random-access.

But why not just use one memory store - why separate them at all? The answer is to separate static memory from dynamic memory.

  • The Stack is static - once a chunk of memory is allocated for a variable, its size cannot and will not change. Each unit is small and of fixed size.
  • The Heap is dynamic - reference types encapsulate value types and other reference types. Each unit is larger and of variable size.

These differences mean that the way space is allocated and consumed is very different. It's outside the scope of this article, but you can do more reading by looking up dynamic memory allocation and static memory allocation.
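The difference can be illustrated in modern C# by comparing a fixed-size stack buffer with a growable heap collection. This sketch assumes C# 7.2 or later and a runtime that provides `Span<T>` (e.g. .NET Core 2.1+). `AllocationDemo` and its methods are hypothetical names. The `stackalloc` buffer has a length fixed at allocation and is released when the method returns. The `List<int>` lives on the heap and resizes as items are added.

```csharp
using System;
using System.Collections.Generic;

public static class AllocationDemo
{
    // A fixed-size buffer in the current stack frame: its length is set once
    // and the memory is released automatically when the method returns.
    public static int SumFixed()
    {
        Span<int> buffer = stackalloc int[4];
        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = i + 1;
        }

        int sum = 0;
        foreach (int value in buffer)
        {
            sum += value;
        }
        return sum;
    }

    // A heap-allocated list that grows as needed; the garbage collector
    // reclaims it once nothing references it.
    public static List<int> BuildGrowing(int count)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++)
        {
            list.Add(i);   // the list reallocates its internal array when it runs out of room
        }
        return list;
    }
}
```

`SumFixed()` returns 10 (1 + 2 + 3 + 4). `BuildGrowing(100)` returns a list of 100 items that outlives the call, something a stack buffer cannot do.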
