At the end of the first training epoch the weights will of course have changed. A likely reason why you see performance drop in the early epochs, before it possibly improves later, is that many optimization methods keep internal state that adapts over time: for example, a step size that decreases as training converges, or momentum and running averages of past gradients that build up or change as training goes on. At the end of training, that internal state typically keeps the optimizer from stepping far away from where the model sits, because the model is assumed to be close to optimal, so it only fine-tunes. When you restart training from scratch, that state is reset, and the method will typically allow much bigger steps early on to speed up convergence, since the assumption is that the model is far from optimal. In your case you start close to optimal and let the algorithm take a large step, which will likely move it to a much worse point.
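Here is a toy sketch of that effect. It is not your setup: it assumes Adam with an exponentially decaying learning rate, a 1-D quadratic loss, and Gaussian noise added to the gradient as a stand-in for minibatch noise. The names `AdamWithDecay` and `noisy_grad` and all hyperparameter values are hypothetical. It trains once, then compares the first step taken from the trained weights with the saved optimizer state versus a brand-new one.

```python
import copy
import math
import random


def noisy_grad(x, rng):
    # Hypothetical toy gradient: f(x) = x**2 / 2 (minimum at x = 0), plus
    # Gaussian noise standing in for minibatch noise (an assumption).
    return x + rng.gauss(0.0, 0.1)


class AdamWithDecay:
    """Minimal scalar Adam with an exponentially decaying learning rate.

    m, v and t are the optimizer's internal state; the "weights" are just x.
    """

    def __init__(self, lr0=0.1, gamma=0.995, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr0, self.gamma = lr0, gamma
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = 0.0  # running mean of gradients (momentum)
        self.v = 0.0  # running mean of squared gradients
        self.t = 0    # step counter: drives bias correction and the LR decay

    def step(self, x, g):
        self.t += 1
        lr = self.lr0 * self.gamma ** (self.t - 1)  # shrinks as training goes on
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g * g
        m_hat = self.m / (1 - self.beta1 ** self.t)  # bias-corrected momentum
        v_hat = self.v / (1 - self.beta2 ** self.t)  # bias-corrected 2nd moment
        return x - lr * m_hat / (math.sqrt(v_hat) + self.eps)


rng = random.Random(0)

# First run: train from a poor starting point.
opt = AdamWithDecay()
x_trained = 1.0
for _ in range(1000):
    x_trained = opt.step(x_trained, noisy_grad(x_trained, rng))

# Option A: continue with the saved optimizer state (copied so opt is untouched).
x_resumed = copy.deepcopy(opt).step(x_trained, noisy_grad(x_trained, rng))

# Option B: same weights, brand-new optimizer state, as when restarting a run.
x_restarted = AdamWithDecay().step(x_trained, noisy_grad(x_trained, rng))

resumed_step = abs(x_resumed - x_trained)
restarted_step = abs(x_restarted - x_trained)
print(f"x after first run:       {x_trained:+.4f}")
print(f"first step, saved state: {resumed_step:.2e}")
print(f"first step, fresh state: {restarted_step:.2e}")
```

With a fresh state, Adam's first bias-corrected update is roughly `g / |g|`, so the step is about `lr0` in size no matter how small the gradient is near the optimum, while the resumed optimizer's step is capped by its decayed learning rate. In PyTorch, for example, this kind of state lives in `optimizer.state_dict()` (and a scheduler's `state_dict()`), which can be saved with the model and restored via `load_state_dict(...)` when resuming.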
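If you don't want this to happen, you'll need to dig into the internals of your optimization method, for example by saving its state together with the weights and restoring it when you resume, or by starting the new run with a smaller step size. Is it a good idea to do so? As usual in ML, there is no one-size-fits-all answer; it depends on many factors, so try it and see for your specific case.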