To achieve determinism when training on CPU, the following should be sufficient:
1. SET ALL SEEDS
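```python
import os
import random
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API
SEED = 123
os.environ['PYTHONHASHSEED'] = str(SEED)
random.seed(SEED)
np.random.seed(SEED)
tf.set_random_seed(SEED)  # TensorFlow 2.x equivalent: tf.random.set_seed(SEED)
```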
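The seeding step can be wrapped in one helper so that it also runs when NumPy or TensorFlow is absent and works across TensorFlow 1.x and 2.x. This is a minimal sketch; `set_all_seeds` is a hypothetical name, not a TensorFlow API. One caveat worth knowing: setting `PYTHONHASHSEED` from inside the script only affects Python processes started afterwards. The already-running interpreter fixed its string-hash seed at startup, so to cover it the variable has to be set before launch (for example `PYTHONHASHSEED=123 python train.py`, where `train.py` stands in for your script).

```python
import os
import random


def set_all_seeds(seed):
    """Hypothetical helper: seed every RNG a training script is likely to touch."""
    # Only affects Python processes started after this point; the running
    # interpreter already fixed its own hash seed at startup.
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed: nothing to seed
    try:
        import tensorflow as tf
    except ImportError:
        return  # TensorFlow not installed: nothing to seed
    # TensorFlow 2.x exposes tf.random.set_seed; 1.x uses tf.set_random_seed.
    set_seed = getattr(getattr(tf, 'random', None), 'set_seed', None)
    if set_seed is None:
        set_seed = tf.set_random_seed
    set_seed(seed)


set_all_seeds(123)
first = [random.random() for _ in range(3)]
set_all_seeds(123)
second = [random.random() for _ in range(3)]
print(first == second)  # True: re-seeding replays the same sequence
```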
2. LIMIT CPU THREADS TO ONE
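```python
session_config = tf.ConfigProto()  # TensorFlow 1.x API
session_config.intra_op_parallelism_threads = 1
session_config.inter_op_parallelism_threads = 1
sess = tf.Session(config=session_config)
```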
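Why one thread helps: an op that splits a reduction across several threads may combine the partial results in whatever order the threads finish, and floating-point addition is not associative, so a different grouping can change the low bits of the result. The plain-Python illustration below (not TensorFlow code) shows two groupings of the same three numbers, as two different thread schedules might produce.

```python
# Same three numbers, two different groupings of the additions.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)

print(left_first, right_first)    # 0.6000000000000001 0.6
print(left_first == right_first)  # False
```

With a single thread the grouping is fixed, so the same inputs always produce bit-identical outputs.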
3. DATASET WORKERS
If you are using tf.data.Dataset, then make sure the number of workers is limited to one (for example, num_parallel_calls=1 in Dataset.map).
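Why one worker helps: when several workers run a map function that draws from shared random state, which element receives which random draw depends on thread scheduling. The sketch below is a plain-Python analogy built on `concurrent.futures`, not tf.data itself; `augment_dataset` is a hypothetical stand-in for a `Dataset.map` step with a random augmentation.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def augment_dataset(items, seed, num_workers):
    """Hypothetical stand-in for a map step: every element draws noise
    from ONE shared, stateful RNG, as a random augmentation op would."""
    rng = random.Random(seed)

    def augment(x):
        # Which element gets which draw depends on the order in which
        # worker threads reach this line.
        return x + rng.random()

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map keeps the output order, but not the order of the RNG draws.
        return list(pool.map(augment, items))


run_a = augment_dataset(range(1000), seed=123, num_workers=1)
run_b = augment_dataset(range(1000), seed=123, num_workers=1)
print(run_a == run_b)  # True: one worker consumes the RNG in a fixed order
```

With `num_workers` greater than one, two runs may or may not match, depending on how the threads happen to be scheduled.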
4. HOROVOD
If you are training with more than two GPUs using Horovod, disable Horovod's tensor fusion by setting the following environment variable:
```python
import os
os.environ['HOROVOD_FUSION_THRESHOLD'] = '0'  # set before hvd.init()
```
To check more clearly for determinism between runs, I recommend the method I have documented here.
I also recommend using this approach to confirm that the initial weights (before step one of training) are exactly the same between runs.
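One simple way to compare weights between runs (not necessarily the method referred to above) is to print a short hash of every weight buffer and compare the printed values across runs. This is a minimal sketch; `weights_fingerprint` is a hypothetical helper. It only needs objects with a `.tobytes()` method, so it works on the NumPy arrays returned by a Keras model's `get_weights()` as well as on stdlib `array.array` objects, which the example uses as stand-ins.

```python
import hashlib
from array import array


def weights_fingerprint(weights):
    """Hypothetical helper: short SHA-256 digest over a list of weight buffers.

    Any bit-level change to any weight changes the digest. Shapes and dtypes
    are not hashed, so compare fingerprints of the same model only."""
    digest = hashlib.sha256()
    for w in weights:
        digest.update(w.tobytes())
    return digest.hexdigest()[:16]


# Stand-ins for the initial weights captured in three separate runs.
run_1 = [array('d', [0.1, -0.2]), array('d', [0.3])]
run_2 = [array('d', [0.1, -0.2]), array('d', [0.3])]
run_3 = [array('d', [0.1, -0.2]), array('d', [0.3 + 1e-12])]  # tiny drift

print(weights_fingerprint(run_1) == weights_fingerprint(run_2))  # True
print(weights_fingerprint(run_1) == weights_fingerprint(run_3))  # False
```

Calling it once right after the model is built, and again after each training step, shows exactly where two runs start to diverge.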
For the latest information on determinism in TensorFlow (with a focus on determinism when using GPUs), please take a look at the tensorflow-determinism project, which NVIDIA is kindly paying me to drive.