To answer another part of the question - the kernel is mapped into every processes address space partially for efficiency/performance reasons (there are others too, I'm sure).
On most modern hardware, it is quicker to change the security level (thus allowing access to the pages that are otherwise protected, as mentioned in Alexey's answer) in order to perform system calls and other kernel provided functions than it is to change the security level and the entire virtual memory map, along with all the associated TLB cache flushes and everything else involved in a full context switch.
Since system calls can be fairly frequent events, the design that has evolved in Linux and many other places to try and minimize the overhead of utilizing kernel services, and mapping the kernel code and (at least some of the) data into each process is part of that.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…