Is the CLR storing the instance of an object passed by a thread on a
call to Monitor.Enter(instance), so that when another thread tries to
enter a lock, the CLR checks the instance provided by the new thread
and, if it matches the first thread's instance, adds the new thread to
a first-come, first-served queue, and so on?
Disregarding instruction reordering and other magic performed by the jitter.
Firstly, let's address the significant part of the question:
Why Do Locks Require Instances In C#?
The answer is not that satisfying, yet it boils down to… Well, it had to be done some way!
You could imagine that the C# spec and the CLR might have used a magic string or a number to keep track of thread synchronisation, yet the designers chose to use reference types. Reference types already have a header used for other CLR activity, so instead of giving you magic numbers or strings kept in a separate table, they gave the reference type's header a dual use: it also keeps track of thread synchronisation. End of story, basically.
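To make the "instance" part concrete, here is a minimal sketch written as a C# top-level program (C# 9 or later assumed). Whatever reference you hand to lock or Monitor.Enter is the identity of the lock. The names gate, counter, IncrementWithLock and IncrementWithMonitor are made up for illustration, and the Monitor version shows roughly, not exactly, what the compiler generates for a lock statement.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Any reference-type instance can act as the lock; a private object is the usual choice.
object gate = new object();
int counter = 0;

// The lock statement: the compiler wraps the body in Monitor.Enter/Monitor.Exit.
void IncrementWithLock()
{
    lock (gate)
    {
        counter++;
    }
}

// Roughly what the compiler emits for `lock (gate) { ... }` (C# 4 and later).
void IncrementWithMonitor()
{
    bool lockTaken = false;
    try
    {
        Monitor.Enter(gate, ref lockTaken);
        counter++;
    }
    finally
    {
        if (lockTaken) Monitor.Exit(gate);
    }
}

// Half the tasks use each form; both lock the same instance, so no update is lost.
var tasks = new Task[4];
for (int i = 0; i < tasks.Length; i++)
{
    bool useMonitor = i % 2 == 0;
    tasks[i] = Task.Run(() =>
    {
        for (int n = 0; n < 100_000; n++)
        {
            if (useMonitor) IncrementWithMonitor(); else IncrementWithLock();
        }
    });
}
Task.WaitAll(tasks);
Console.WriteLine(counter); // 400000
```

Both forms synchronise on the same object, which is the only thing that ties them together; the CLR has no other notion of "which lock" is meant.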
The longer story
Monitor lock objects need to be reference types. Value types do not have headers like reference types do, in part because they don't need finalization or pinning. Moreover, value types can be boxed, which basically means they get wrapped in an object. When you pass a value type to Monitor.Enter, it gets boxed; when you pass the same value again (for example to Monitor.Exit), it gets boxed into a different object, which defeats all of the CLR's internal lock plumbing.
This is the primary reason why value types can't be used for locking (the C# compiler rejects lock on a value type outright).
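A quick way to see the boxing problem is a small top-level C# sketch (the variable names are illustrative). Each box is a separate object, so Monitor.Exit is handed an object the thread never locked and throws SynchronizationLockException.

```csharp
using System;
using System.Threading;

int value = 42;

// Every conversion to object allocates a new box, so these are two distinct objects.
object firstBox = value;
object secondBox = value;
bool sameBox = object.ReferenceEquals(firstBox, secondBox);
Console.WriteLine($"Same box: {sameBox}"); // Same box: False

// `lock (value)` is a compile-time error (CS0185), but Monitor.Enter(object) silently
// boxes the int. Enter and Exit therefore operate on two different boxes.
bool exitThrew = false;
Monitor.Enter(value);     // locks a temporary box that nothing else can reach
try
{
    Monitor.Exit(value);  // boxes again: this new object was never locked
}
catch (SynchronizationLockException)
{
    exitThrew = true;
}
Console.WriteLine($"Exit threw: {exitThrew}"); // Exit threw: True
```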
Let’s move forward
Both value types and reference types have an internal memory layout. However, reference types also carry a 32-bit header that helps the CLR perform certain housekeeping tasks on an object (as discussed above). This header is what we will be talking about.
There is a fair bit going on in the header, but it's far from rocket science. As far as locking is concerned, only two concepts matter here: the Lock State information in the header, and whether the header has had to be inflated to the Sync Block Table.
Object header
The most significant byte of a typical object header is laid out as shown below.
|31            0|
----------------|
|7|6|5|4|3|2| --|
| | | | | |
| | | | | +- BIT_SBLK_IS_HASHCODE : set if the rest of the word is a hash code (or sync block index)
| | | | +--- BIT_SBLK_IS_HASH_OR_SYNCBLKINDEX : set if hashcode or sync block index is set
| | | +----- BIT_SBLK_SPIN_LOCK : lock the header for exclusive mutation on spin
| | +------- BIT_SBLK_GC_RESERVE : set if the object is pinned
| +--------- BIT_SBLK_FINALIZER_RUN : set if finalized already
+----------- BIT_SBLK_AGILE_IN_PROGRESS : set if locking on AppDomain agile classes
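Reading the diagram as bit masks over the 32-bit header word, here is a small sketch of how the flag bits could be decoded. The constant names come from the diagram and their values follow from the bit positions shown (bit 7 of the top byte is bit 31 of the word, bit 2 is bit 26). DescribeFlags and the sample header value are hypothetical; safe C# gives you no way to read a real object header like this.

```csharp
using System;
using System.Collections.Generic;

// Masks read off the diagram: bit 7 of the most significant byte is bit 31 of the
// 32-bit header word, and bit 2 is bit 26.
const uint BIT_SBLK_AGILE_IN_PROGRESS       = 0x80000000; // bit 31
const uint BIT_SBLK_FINALIZER_RUN           = 0x40000000; // bit 30
const uint BIT_SBLK_GC_RESERVE              = 0x20000000; // bit 29
const uint BIT_SBLK_SPIN_LOCK               = 0x10000000; // bit 28
const uint BIT_SBLK_IS_HASH_OR_SYNCBLKINDEX = 0x08000000; // bit 27
const uint BIT_SBLK_IS_HASHCODE             = 0x04000000; // bit 26

// Hypothetical helper: names of the flag bits set in a header word, highest bit first.
List<string> DescribeFlags(uint header)
{
    var names = new List<string>();
    if ((header & BIT_SBLK_AGILE_IN_PROGRESS) != 0) names.Add("AGILE_IN_PROGRESS");
    if ((header & BIT_SBLK_FINALIZER_RUN) != 0) names.Add("FINALIZER_RUN");
    if ((header & BIT_SBLK_GC_RESERVE) != 0) names.Add("GC_RESERVE");
    if ((header & BIT_SBLK_SPIN_LOCK) != 0) names.Add("SPIN_LOCK");
    if ((header & BIT_SBLK_IS_HASH_OR_SYNCBLKINDEX) != 0) names.Add("IS_HASH_OR_SYNCBLKINDEX");
    if ((header & BIT_SBLK_IS_HASHCODE) != 0) names.Add("IS_HASHCODE");
    return names;
}

// A made-up header: the finalizer has run and the low 26 bits hold sync block index 21.
uint sample = BIT_SBLK_FINALIZER_RUN | BIT_SBLK_IS_HASH_OR_SYNCBLKINDEX | 21u;
Console.WriteLine(string.Join(", ", DescribeFlags(sample)));
// FINALIZER_RUN, IS_HASH_OR_SYNCBLKINDEX
```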
The header holds certain easily accessible information for the CLR: primarily tiny bits of data for the GC, whether a hash code has been generated, and the Lock State of the object. However, because an object header has only limited space (32 bits), the header may need to be inflated to the Sync Block Table. This typically happens in the following situations:
- A hash code has been generated and a Thin Lock has been acquired.
- A Fat Lock has been acquired
- A condition variable is involved (via Wait, Pulse etc.)
The header is just not big enough.
Lock State
When you take a lock on an object, the CLR first looks at the header to determine whether it needs to find any locking information in the Sync Block Table; it does this simply by checking which bits are set. If there is no Thin Lock, it creates one (if applicable). If there is a Thin Lock held by another thread, it spins and waits for it. If the header has been inflated, it looks in the Sync Block Table for the locking information (to be continued...).
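The decision described above can be sketched as a pure function over a header word. This is a hypothetical illustration, not the CLR's actual code path: the masks follow the thin lock and sync block index diagrams further down, and EnterPath merely names the path a hybrid lock would take for a given header and calling thread.

```csharp
using System;

// Masks taken from the header diagrams in this answer.
const uint IS_HASH_OR_SYNCBLKINDEX = 0x08000000; // bit 27
const uint IS_HASHCODE             = 0x04000000; // bit 26
const uint THREAD_ID_MASK          = 0x000003FF; // thin lock thread id, bits 0-9
const uint SYNC_BLOCK_INDEX_MASK   = 0x03FFFFFF; // sync block index, bits 0-25

// Hypothetical decision function: which path a hybrid lock would take when the thread
// `currentThreadId` calls Enter on an object with this header. Not the CLR's code.
string EnterPath(uint header, uint currentThreadId)
{
    if ((header & IS_HASH_OR_SYNCBLKINDEX) != 0)
    {
        // Low bits hold a hash code: no room for a thin lock, so inflate first.
        if ((header & IS_HASHCODE) != 0)
            return "inflate: move the hash code into a new sync block";
        return $"sync block: look up entry {header & SYNC_BLOCK_INDEX_MASK}";
    }
    uint owner = header & THREAD_ID_MASK;
    if (owner == 0) return "thin lock: CAS our thread id into the header";
    if (owner == currentThreadId) return "thin lock: already ours, bump the recursion level";
    return "thin lock: held by another thread, spin";
}

Console.WriteLine(EnterPath(0u, 5u));                            // thin lock: CAS our thread id into the header
Console.WriteLine(EnterPath(7u, 5u));                            // thin lock: held by another thread, spin
Console.WriteLine(EnterPath(IS_HASH_OR_SYNCBLKINDEX | 12u, 5u)); // sync block: look up entry 12
```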
Locking comes in two flavours: critical regions and condition variables.
- Critical regions are the result of Enter, Exit, lock, etc.
- Condition variables are the result of Wait, Pulse, etc. This is another story, as it's not related to the question.
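To show the two flavours side by side, here is a small producer/consumer sketch (all names are hypothetical). The lock blocks are critical regions, while Monitor.Wait and Monitor.Pulse use the same object as a condition variable. Wait releases the lock while waiting and reacquires it before returning, which is why the consumer re-checks the queue in a loop.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

object gate = new object();
var queue = new Queue<int>();
var received = new List<int>();

// Condition variable flavour: the consumer waits until the queue is non-empty.
var consumer = new Thread(() =>
{
    for (int i = 0; i < 5; i++)
    {
        lock (gate)
        {
            while (queue.Count == 0)
                Monitor.Wait(gate);       // releases gate, sleeps, reacquires gate
            received.Add(queue.Dequeue());
        }
    }
});
consumer.Start();

// Critical region flavour: the producer enqueues under the lock, then pulses a waiter.
for (int i = 1; i <= 5; i++)
{
    lock (gate)
    {
        queue.Enqueue(i);
        Monitor.Pulse(gate);
    }
}
consumer.Join();
Console.WriteLine(string.Join(",", received)); // 1,2,3,4,5
```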
For critical regions, there are two primary ways the CLR can lock: the Thin Lock and the Fat Lock. The CLR uses both of these in a hybrid lock model, which basically means it tries the Thin Lock first and then falls back to the Fat Lock.
Thin Lock
Object Thin Lock Header
|31       |26                |15                    |9         0|
----------------------------------------------------------------
|7|6|5|4|3| App Domain Index | Lock Recursion Level | Thread id |
| | | | |
| | | | |
| | | | +--- BIT_SBLK_IS_HASH_OR_SYNCBLKINDEX = 0 can store a thin lock
A Thin Lock basically consists of an App Domain Index, a Recursion Level and a Managed Thread Id. If the Thread Id is zero, a locking thread sets it atomically; if it is nonzero, a simple spin wait rereads the lock state several times trying to acquire the lock. If the lock is still not available after a period of time, the lock needs to be promoted: the Thin Lock is inflated to the Sync Block Table (if that hasn't already been done) and a true lock, based on a kernel event (like an auto-reset event), is registered with the operating system.
A Thin Lock is exactly what it sounds like: a lighter-weight, fast mechanism, but it achieves this at the cost of spinning the core. Spinning is fast when the lock is released quickly but wasteful when contention lasts, so the CLR falls back to a slower, less resource-intensive kernel lock for longer contention. In short, this hybrid approach typically gives better overall results for day-to-day usage.
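The thin lock behaviour described above can be approximated in managed code with a compare-and-swap on a single int. This is a minimal sketch under stated assumptions, not the CLR's implementation: the int[] header stands in for the object header, the masks follow the diagram (thread id in bits 0-9, recursion level in bits 10-15), and it assumes managed thread ids fit in 10 bits. It only spins; there is no promotion to a kernel event.

```csharp
using System;
using System.Threading;

// Hypothetical thin lock word laid out like the diagram: bits 0-9 hold the owning
// thread id, bits 10-15 the recursion level. An array element lets Interlocked and
// Volatile take a `ref` to it.
const int THREAD_ID_MASK = 0x3FF;
const int RECLEVEL_MASK  = 0xFC00;
const int RECLEVEL_INC   = 0x400;
int[] header = new int[1];

// Assumption: managed thread ids are small enough to fit in 10 bits (fine for a demo).
int MyId()
{
    int id = Environment.CurrentManagedThreadId;
    if (id > THREAD_ID_MASK) throw new InvalidOperationException("thread id does not fit in 10 bits");
    return id;
}

void ThinEnter()
{
    int me = MyId();
    var spinner = new SpinWait();
    while (true)
    {
        int h = Volatile.Read(ref header[0]);
        int owner = h & THREAD_ID_MASK;
        if (owner == 0)
        {
            // Unowned: atomically install our thread id (fails if the word changed).
            if (Interlocked.CompareExchange(ref header[0], h | me, h) == h) return;
        }
        else if (owner == me)
        {
            // Already ours: bump the recursion level.
            if ((h & RECLEVEL_MASK) == RECLEVEL_MASK) throw new InvalidOperationException("recursion level overflow");
            Interlocked.Add(ref header[0], RECLEVEL_INC);
            return;
        }
        // Owned by another thread, or we lost the race: spin, then reread the word.
        spinner.SpinOnce(sleep1Threshold: -1);
    }
}

void ThinExit()
{
    int h = Volatile.Read(ref header[0]);
    if ((h & THREAD_ID_MASK) != MyId()) throw new SynchronizationLockException();
    if ((h & RECLEVEL_MASK) != 0)
        Interlocked.Add(ref header[0], -RECLEVEL_INC);                      // leave one level
    else
        Volatile.Write(ref header[0], h & ~(THREAD_ID_MASK | RECLEVEL_MASK)); // release
}

int counter = 0;
var threads = new Thread[4];
for (int t = 0; t < threads.Length; t++)
{
    threads[t] = new Thread(() =>
    {
        for (int n = 0; n < 25_000; n++)
        {
            ThinEnter();
            ThinEnter(); // recursive acquisition by the same thread
            counter++;
            ThinExit();
            ThinExit();
        }
    });
    threads[t].Start();
}
foreach (var th in threads) th.Join();
Console.WriteLine(counter);   // 100000
Console.WriteLine(header[0]); // 0 (unowned)
```

Only the owning thread ever writes the word while it is owned, because every other thread's CAS expects a thread id of zero; that is what makes the plain release write safe in this sketch.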
Fat Lock
In cases where contention persists or when a condition variable is involved (via Wait, Pulse, etc.), additional information needs to be stored in the Sync Block, such as a handle to the kernel object or a list of events associated with the lock. A Fat Lock is exactly what it sounds like: a heavier lock. It's slower, yet less resource-intensive, because it doesn't keep the CPU spinning needlessly, so it's better suited to locks that are held for longer.
Sync Block Table
Object Sync Block Index header
|31         |25               0|
--------------------------------
|7|6|5|4|3|2| Sync Block Index |
| | | | | |
| | | | | +- BIT_SBLK_IS_HASHCODE = 0 sync block index
| | | | +--- BIT_SBLK_IS_HASH_OR_SYNCBLKINDEX = 1 hash code or sync block index
The CLR keeps a pre-initialized, cached Sync Block Table on the heap, whose entries are recycled and reused. An entry may contain a hash code (migrated from the header) and various types of locking information, and it is referenced by the Sync Block Index in the object's header once promotion (inflation) has occurred.
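Inflation can be pictured as moving whatever the header held into a table entry and replacing it with that entry's index. The sketch below is hypothetical throughout: a List of tuples stands in for the CLR's native table, it grows on demand rather than being pre-initialized, and a Stack of free slots models recycling. The bit layout follows the sync block index and thin lock diagrams above.

```csharp
using System;
using System.Collections.Generic;

const uint IS_HASH_OR_SYNCBLKINDEX = 0x08000000; // bit 27
const uint IS_HASHCODE             = 0x04000000; // bit 26
const uint LOW_26_BITS             = 0x03FFFFFF; // hash code or sync block index
const uint THREAD_ID_MASK          = 0x000003FF; // thin lock: thread id, bits 0-9
const int  RECLEVEL_SHIFT          = 10;         // thin lock: recursion level, bits 10-15
const uint RECLEVEL_MASK           = 0x3F;

// Hypothetical table: slot 0 is reserved so that index 0 never names a real entry.
var table = new List<(uint HashCode, uint OwnerThreadId, uint Recursion)> { default };
var free = new Stack<int>(); // slots returned for reuse

// Move whatever the header holds (a hash code or a thin lock) into a table entry
// and rewrite the header so its low 26 bits hold the entry's index.
uint Inflate(uint header)
{
    bool hashOrIndex = (header & IS_HASH_OR_SYNCBLKINDEX) != 0;
    bool isHash = (header & IS_HASHCODE) != 0;
    if (hashOrIndex && !isHash) return header; // already points at an entry

    var entry = hashOrIndex
        ? (HashCode: header & LOW_26_BITS, OwnerThreadId: 0u, Recursion: 0u)
        : (HashCode: 0u, OwnerThreadId: header & THREAD_ID_MASK,
           Recursion: (header >> RECLEVEL_SHIFT) & RECLEVEL_MASK);

    int index;
    if (free.Count > 0) { index = free.Pop(); table[index] = entry; }
    else { index = table.Count; table.Add(entry); }

    // Keep the high flag bits (GC, finalizer, ...), clear the rest, store the index.
    uint highFlags = header & ~(IS_HASHCODE | LOW_26_BITS);
    return highFlags | IS_HASH_OR_SYNCBLKINDEX | (uint)index;
}

// Return an entry to the free list so a later inflation can reuse it.
void Recycle(int index)
{
    table[index] = default;
    free.Push(index);
}

// Thread 7 holds a thin lock at recursion level 2, then the header is inflated.
uint inflated = Inflate(7u | (2u << RECLEVEL_SHIFT));
int idx = (int)(inflated & LOW_26_BITS);
var firstEntry = table[idx];
Console.WriteLine($"entry {idx}: owner {firstEntry.OwnerThreadId}, recursion {firstEntry.Recursion}");
// entry 1: owner 7, recursion 2

// A header holding a hash code: the hash code migrates into the new entry.
uint inflated2 = Inflate(IS_HASH_OR_SYNCBLKINDEX | IS_HASHCODE | 12345u);
int idx2 = (int)(inflated2 & LOW_26_BITS);
Console.WriteLine($"entry {idx2}: hash {table[idx2].HashCode}"); // entry 2: hash 12345

// Recycling entry 1 lets the next inflation reuse that slot.
Recycle(idx);
int reused = (int)(Inflate(3u) & LOW_26_BITS);
Console.WriteLine($"reused entry {reused}"); // reused entry 1
```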
Putting it all together
When Monitor.Enter is called, the CLR registers the acquisition either by storing the current thread Id (among other things) in the object header (as discussed) or by promoting it to the Sync Block Table. If the lock is already held, the CLR briefly spins, waiting for the lock to become uncontended by checking either the header or the Sync Block Table.
If spinning cannot acquire the lock after a certain number of spins, the CLR may eventually need to register an auto-reset event with the operating system and store its handle in the Sync Block Table. At that point, a waiting thread simply waits on that handle.
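The spin-then-block sequence can be sketched in managed code as a small hybrid lock. This is an illustration, not how the CLR implements Monitor: the waiters counter, the AutoResetEvent standing in for the kernel event, and the spin budget of 20 are all assumptions. A thread first spins trying to flip waiters from 0 to 1; if that fails, it registers as a waiter and blocks on the event until a releasing thread calls Set.

```csharp
using System;
using System.Threading;

// Hypothetical hybrid lock state: `waiters` counts the threads that hold or want the
// lock, and the AutoResetEvent stands in for the kernel event the CLR would register.
int waiters = 0;
var kernelEvent = new AutoResetEvent(false);
const int SpinLimit = 20; // arbitrary spin budget for this sketch

void HybridEnter()
{
    // Phase 1: spin briefly, trying to take the lock while it is free (0 -> 1).
    var spinner = new SpinWait();
    for (int i = 0; i < SpinLimit; i++)
    {
        if (Interlocked.CompareExchange(ref waiters, 1, 0) == 0) return;
        spinner.SpinOnce(sleep1Threshold: -1);
    }
    // Phase 2: register as a waiter. If anyone else holds the lock, block in the
    // kernel until a releasing thread signals the event.
    if (Interlocked.Increment(ref waiters) > 1) kernelEvent.WaitOne();
}

void HybridExit()
{
    // If at least one other thread registered as a waiter, wake exactly one of them.
    if (Interlocked.Decrement(ref waiters) > 0) kernelEvent.Set();
}

int counter = 0;
var threads = new Thread[4];
for (int t = 0; t < threads.Length; t++)
{
    threads[t] = new Thread(() =>
    {
        for (int n = 0; n < 25_000; n++)
        {
            HybridEnter();
            counter++;
            HybridExit();
        }
    });
    threads[t].Start();
}
foreach (var th in threads) th.Join();
kernelEvent.Dispose();
Console.WriteLine(counter); // 100000
```

As with the real thing, nothing here is FIFO: the event releases whichever blocked thread the OS picks, and in some interleavings a thread that arrives just after a Set can consume that signal ahead of an earlier waiter.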
then the CLR will add the new thread to a first-come, first-served queue and so on?
No, there is no queue as such, and consequently this can all lead to unfair behaviour. A thread can steal the lock between the signal and the wakeup of a waiting thread; however, the CLR does try to keep this orderly and to prevent a lock convoy.
There is obviously a lot more that has been glossed over here: the types of locks (critical regions and condition variables), the CLR memory model, how the callbacks work, and so forth. But this should give you a starting point for answering your initial question.
Disclaimer: a lot of this information is subject to change, as it is a CLR implementation detail.