1 - is impossible. You don't know exactly how long a message will take to arrive on a client and no measurement you take will necessarily be applicable to the next message you send. The best you can do is an approximation, but you always need to assume that people will see EITHER slightly different things OR the same things at slightly different times. I'd recommend just sending the current state to everybody and using interpolation/extrapolation to smooth out the gameplay, so that everybody sees the game a few milliseconds in the past, with the delay varying both between players and over time. In general this is rarely a big problem. If you really want to buffer up some past states on the server you can interpolate between them and send different old data to different people in an attempt to sync what they see, but combined with the client side simulation and the jitter in transmission times you'll still see some differences across machines.
2 - the typical way is to run the simulation on the server, and send regular (small) state updates to clients. Clients typically run their own simulations and have a way to blend between their own predicted/interpolated state and the authoritative state that the server is sending them. All decisions other than user input should be made server-side. Ultimately the way you blend these is just a tradeoff between a smooth appearance and an accurate state so it's a cosmetic decision you'll have to make.
3 - your client should typically translate a keypress to a logical action. Your server doesn't care about keys. Send that logical action to the server and it can broadcast it to other clients if they need it. Generally though you wouldn't need to do anything here - any relevant change caused by the action will typically just change the game state, and thus will get sent out in the normal broadcast of that state.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…