I'm using TensorFlow to build a deep learning model. And new to TensorFlow.
Due to some reason, my model has limited batch size, then this limited batch-size will make the model has a high variance.
So, I want to use some trick to make the batch size larger. My idea is to store the gradients of each mini-batch, for example 64 mini-batches, and then sum the gradients together, use the mean gradients of this 64 mini batches of training data to update the model's parameters.
This means that for the first 63 mini-batches, do not update the parameters, and after the 64 mini batch, update the model's parameters only once.
But as TensorFlow is graph based, do anyone know how to implement this wanted feature?
Thanks very much.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…