clr - How does boxing a value type work internally in C#?

Question

asked Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

I know what boxing/unboxing is, but I'm not quite sure how it is implemented internally. Let's say we have the following code:

struct Point {
   public Int32 x, y;
}
...
ArrayList a = new ArrayList();
Point p;                           // Allocate a Point (not in the heap).
for (Int32 i = 0; i < 10; i++) {
   p.x = p.y = i;                  // Initialize the members in the value type.
   a.Add(p);                       // Box the value type and add the reference to the ArrayList.
}

and below is a general description of what happens:

The Point value type must be converted into a true heap-managed object, and a reference to this object must be obtained. At run time, the fields currently residing in the Point value type instance p are copied into the newly allocated Point object. The address of the boxed Point object (now a reference type) is returned and is then passed to the Add method. The Point object will remain in the heap until it is garbage collected.

So my assumption is that when the compiler compiles the code and detects that boxing is needed, it generates IL code (shown in C# form for readability) like this:

ArrayList a = new ArrayList();
Point p; 
for (Int32 i = 0; i < 10; i++) {
   p.x = p.y = i;
   Wrapper w = new Wrapper(p);  // Wrapper is a dynamically generated class whose instance takes a Point struct and copies its fields internally. I know this is not exact syntax, but you get the idea.
   a.Add(w);                      
}

Q1 - Is my assumption correct?

Q2 - If my assumption is correct, it means p still exists on the stack and we can reuse p as a struct instance. Keeping p on the stack looks redundant and costs a little performance, but since the stack is short-lived and unwinds quickly, is the impact so tiny that there is no need to consider it?

question from:https://stackoverflow.com/questions/65912313/how-boxing-a-value-type-work-internally-in-c

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:11:07+0000

Aren't boxing and unboxing a special case of direct casting?

(object)instance and (MyType)instance do not need wrapping in the way you describe.

We call that boxing and unboxing because once a value is boxed to the object type, we can unbox it without a compile-time type conversion error; any type mismatch is raised at run time instead (as an InvalidCastException).
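To make that run-time check concrete, here is a minimal sketch. The class and method names are made up for illustration and are not from the question. It relies only on the standard rule that an unboxing cast must name the exact value type stored in the box, so casting a boxed int straight to long compiles but fails when executed:

```csharp
using System;

static class UnboxMismatchDemo
{
    // Returns true when unboxing a boxed int as long fails at run time.
    public static bool UnboxIntAsLongThrows()
    {
        object boxed = 42;           // box: the object holds an Int32
        try
        {
            _ = (long)boxed;         // compiles, but the box holds an int, not a long
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    // Unboxes to the exact type first, then converts the plain value.
    public static long UnboxThenWiden()
    {
        object boxed = 42;
        return (int)boxed;           // unbox to int, implicit widening to long afterwards
    }

    static void Main()
    {
        Console.WriteLine(UnboxIntAsLongThrows());
        Console.WriteLine(UnboxThenWiden());
    }
}
```

The compiler accepts both casts because the static type of boxed is object; only the run-time type stored in the box decides whether the unboxing succeeds.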

All of this (.NET, C#, OOP, casting, boxing, unboxing...) is just high-level language sugar over machine code, which lets humans do more complex and more powerful things better, more simply, and in less time.

As for the code provided, the struct instance is created on the stack and lives there until the end of the method, when it is lost.

But its value is duplicated each time it is added to the ArrayList, so those copies are allocated on the heap.
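A small sketch of this, reusing the loop from the question (the class name BoxedCopiesDemo and the Fill helper are hypothetical, added only for illustration). It shows that every Add stores its own boxed copy, even though the loop keeps overwriting the single local p:

```csharp
using System;
using System.Collections;

struct Point
{
    public int x, y;
}

static class BoxedCopiesDemo
{
    // Fills a list the same way the loop in the question does.
    public static ArrayList Fill()
    {
        ArrayList a = new ArrayList();
        Point p;                      // one local, reused on every iteration
        for (int i = 0; i < 10; i++)
        {
            p.x = p.y = i;
            a.Add(p);                 // boxes a copy of p's current value
        }
        return a;
    }

    static void Main()
    {
        ArrayList a = Fill();
        Point first = (Point)a[0];
        Point last = (Point)a[9];
        Console.WriteLine(first.x);                            // 0
        Console.WriteLine(last.x);                             // 9
        Console.WriteLine(object.ReferenceEquals(a[0], a[1])); // False: separate heap objects
    }
}
```

If the list held references to p itself, every element would show the last value written; because each element is a separate boxed copy, the values 0 through 9 are all preserved.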

How does the heap and stack work for instances and members of struct in C#?

Why methods return just one kind of parameter in normal conditions?

So the IL code is not as complicated as you imagine:

// arrayList.Add(point);
IL_001e: ldloc.0
IL_001f: ldloc.1
IL_0020: box ConsoleApp.Program/Point
IL_0025: callvirt instance int32 [mscorlib]System.Collections.ArrayList::Add(object)

From OpCodes.Box Field:

Converts a value type to an object reference (type O).

The stack transitional behavior, in sequential order, is:

1. A value type is pushed onto the stack.

2. The value type is popped from the stack; the box operation is performed.

3. An object reference to the resulting "boxed" value type is pushed onto the stack.

A value type has two separate representations within the Common Language Infrastructure (CLI):

A 'raw' form used when a value type is embedded within another object or on the stack.

A 'boxed' form, where the data in the value type is wrapped (boxed) into an object so it can exist as an independent entity.

The box instruction converts the 'raw' (unboxed) value type into an object reference (type O). This is accomplished by creating a new object and copying the data from the value type into the newly allocated object. valTypeToken is a metadata token indicating the type of the value type on the stack.
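The "new object plus copy" behavior described in that quote can be observed from C#. The following is only an illustrative sketch with made-up names (BoxAllocatesDemo and its methods are not part of any library): boxing the same value twice gives two distinct objects with equal contents, and changing the original afterwards does not affect the box.

```csharp
using System;

static class BoxAllocatesDemo
{
    // Boxing the same value twice produces two separate objects with equal contents.
    public static bool TwoBoxesAreDistinctButEqual()
    {
        int value = 10;
        object first = value;     // box
        object second = value;    // box again: a second object
        return !object.ReferenceEquals(first, second) && first.Equals(second);
    }

    // The box holds a copy, so later writes to the local are not visible through it.
    public static int BoxedValueAfterChangingOriginal()
    {
        int value = 10;
        object boxed = value;     // the box receives a copy of 10
        value = 20;               // changes only the local variable
        return (int)boxed;        // the box still holds 10
    }

    static void Main()
    {
        Console.WriteLine(TwoBoxesAreDistinctButEqual());
        Console.WriteLine(BoxedValueAfterChangingOriginal());
    }
}
```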

At the machine-code level, which depends on the hardware architecture (an Intel-type microprocessor, for example), it is nothing more than using a memory pointer once all the specified processing has been performed.

Here is another sample:

int valueInt = 10;
double valueDouble = (double)valueInt;
object instance = (object)valueInt;
int value = (int)instance;

The generated IL code is:

.method private hidebysig static 
    void Test () cil managed 
{
    .maxstack 1
    .locals init (
        [0] int32 valueInt,
        [1] float64 valueDouble,
        [2] object 'instance',
        [3] int32 'value'
    )

    // int num = 10;
    IL_0001: ldc.i4.s 10
    IL_0003: stloc.0

    // double num2 = num;
    IL_0004: ldloc.0
    IL_0005: conv.r8
    IL_0006: stloc.1

    // object obj = num;
    IL_0007: ldloc.0
    IL_0008: box [mscorlib]System.Int32
    IL_000d: stloc.2

    // int num3 = (int)obj;
    IL_000e: ldloc.2
    IL_000f: unbox.any [mscorlib]System.Int32
    IL_0014: stloc.3

    IL_0015: ret
}

From OpCodes.Unbox Field:

Converts the boxed representation of a value type to its unboxed form.

The stack transitional behavior, in sequential order, is:

1. An object reference is pushed onto the stack.

2. The object reference is popped from the stack and unboxed to a value type pointer.

3. The value type pointer is pushed onto the stack.

A value type has two separate representations within the Common Language Infrastructure (CLI):

A 'raw' form used when a value type is embedded within another object.

A 'boxed' form, where the data in the value type is wrapped (boxed) into an object so it can exist as an independent entity.

The unbox instruction converts the object reference (type O), the boxed representation of a value type, to a value type pointer (a managed pointer, type &), its unboxed form. The supplied value type (valType) is a metadata token indicating the type of value type contained within the boxed object.

Unlike Box, which is required to make a copy of a value type for use in the object, unbox is not required to copy the value type from the object. Typically it simply computes the address of the value type that is already present inside of the boxed object.
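Note that a C# cast such as (Point)obj compiles to unbox.any, as in the IL listing above, and for a boxed value type unbox.any behaves like unbox followed by ldobj: the address inside the box is computed and the value is then copied out. A minimal sketch of the visible consequence, using hypothetical names (UnboxCopyDemo and its method are for illustration only):

```csharp
using System;

struct Point
{
    public int x, y;
}

static class UnboxCopyDemo
{
    // Edits the unboxed copy and reports what the box still contains.
    public static int BoxedXAfterEditingCopy()
    {
        Point p;
        p.x = p.y = 1;
        object boxed = p;             // box: heap object holding a copy of p
        Point copy = (Point)boxed;    // unbox.any: value copied out of the box
        copy.x = 99;                  // changes the local copy only
        return ((Point)boxed).x;      // the boxed Point is unchanged
    }

    static void Main()
    {
        Console.WriteLine(BoxedXAfterEditingCopy());   // 1
    }
}
```

So although the raw unbox instruction does not have to copy, code written as an ordinary C# cast always ends up working on its own copy of the value.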

Last words

Since classes and structures are in fact the same thing, only managed differently (both come down to references and "hidden references", that is, memory pointers that are hidden so that we can forget about managing, accessing, and using them and delegate this to the CLR), boxing and unboxing of value types and non-value types is essentially the same at this low level of operation, so the IL code does not differentiate between the two.

A look at the internals of 'boxing' in the CLR

I'm not sure if it might be relevant to flag as a duplicate, but here's a related question:

How does boxing and unboxing work at the lowest level
