Are you open to a different way of thinking about performance tuning?
It does not depend on I/O-bound versus CPU-bound distinctions, hotspots, or timers.
First, think about just one thread. The execution of a thread is much like a tree. There is a main function (the trunk). There are points when subroutines are called (branches). There are terminal instructions (leaves) and blocking calls like I/O (fruit). The total time the program takes is the sum of all the leaves and all the fruit.
What you want to do is prune the tree, making it as light as possible, without killing it.
What many people do is weigh (time) the whole thing, and then weigh parts of it, and so on, and hope to find hotspots (leafy branches) that maybe they could trim.
Another way is to: 1) Select some leaves or fruit at random. 2) From each leaf or fruit, paint a line along the branch it is on, all the way back to the trunk. 3) Take note of branches that have more than one line painted on them. 4) Ask "Do I need this branch?" If you can prune it, do so. You will eliminate the entire weight of the branch, and you will have done it without weighing it. Then start over.
That's the idea behind random-pausing.
There are certain kinds of problems it will not find, but most of them it will find, quickly, including any that timing threads can find.
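To make the painting procedure concrete, here is a minimal sketch in Python. It is only a simulation: the call tree, the function names in it, and the time weights are all made up for illustration. In a real program you would get each sample by pausing the program at a random moment and reading off the call stack, rather than by drawing from a table.

```python
import random
from collections import Counter

# Hypothetical call tree: each entry is (path from trunk to leaf, time spent in that leaf).
# All names and weights here are invented for illustration.
LEAVES = [
    (("main", "load_config", "read_file"), 5),
    (("main", "process", "parse", "alloc_string"), 40),
    (("main", "process", "parse", "compare"), 10),
    (("main", "process", "log_debug", "format"), 30),
    (("main", "process", "compute"), 15),
]

def take_samples(leaves, n, rng):
    """Step 1: pick n leaves at random, weighted by the time each one takes."""
    paths = [path for path, _ in leaves]
    weights = [weight for _, weight in leaves]
    return rng.choices(paths, weights=weights, k=n)

def paint(samples):
    """Steps 2 and 3: paint each sample's path back to the trunk, counting lines per branch."""
    counts = Counter()
    for path in samples:
        # A branch is identified by its full path prefix starting at the trunk.
        for depth in range(1, len(path) + 1):
            counts[path[:depth]] += 1
    return counts

def suspects(counts):
    """Branches with more than one line painted on them, most-painted first."""
    return [(branch, hits) for branch, hits in counts.most_common() if hits > 1]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    samples = take_samples(LEAVES, 10, rng)
    for branch, hits in suspects(paint(samples)):
        print(f"{hits:2d}/10  {' > '.join(branch)}")
```

Step 4 stays manual: for each branch printed, ask whether you need it. The fraction of samples that land on a branch is a rough estimate of the share of total time you would save by pruning it, which is why a handful of samples is enough to expose the heavy branches.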
Fight with dragons too long and you become a dragon yourself; gaze into the abyss too long and the abyss gazes back…