vruntime is per-thread; it is a member nested within the task_struct.
Essentially, vruntime is a measure of the "runtime" of the thread - the amount of time it has spent on the processor. The whole point of CFS is to be fair to all; hence, the algo kind of boils down to a simple thing: (among the tasks on a given runqueue) the task with the lowest vruntime is the task that most deserves to run, hence select it as 'next'. (The actual implementation is done using an rbtree for efficiency).
Taking into account various factors - like priority, nice value, cgroups, etc - the calculation of vruntime is not as straight-forward as a simple increment. I'd suggest reading the relevant section in "Professional Linux Kernel Architecture", Mauerer, Wrox Press - it's explained in great detail.
Pl see below a quick attempt at summarizing some of this.
Other resource:
Documentation/scheduler/sched-design-CFS.txt
Quick summary - vruntime calculation:
(based on the book)
Most of the work is done in kernel/sched_fair.c:__update_curr()
Called on timer tick
Updates the physical and virtual time 'current' has just spent on the processor
For tasks that run at default priority, i.e., nice value 0, the physical and virtual time spent is identical
Not so for tasks at other priority (nice) levels; thus the calculation of vruntime is affected by the priority of current using a load weight factor
delta_exec = (unsigned long)(now – curr->exec_start);
// ...
delta_exec_weighted = calc_delta_fair(delta_exec, curr);
curr->vruntime += delta_exec_weighted;
Neglecting some rounding and overflow checking, what calc_delta_fair does is to
compute the value given by the following formula:
delta_exec_weighed = delta_exec * (NICE_0_LOAD / curr->load.weight)
The thing is, more important tasks (those with a lower nice value) will have larger
weights; thus, by the above equations, the vruntime accounted to them will be smaller
(thus having them enqueued more to the left on the rbtree!).
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…