tensorflow - Debugging nans in the backward pass

Question

asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I'm trying to debug a somewhat complicated, non-canonical NN architecture. The forward pass works and gives the expected results, but when I try to optimize with Adam or any other standard optimizer, I get NaNs everywhere, even after a single iteration with a very small learning rate. I'd like to localize them: is there a way to catch the first occurrence of a NaN and find out which op produced it? I tried tf.add_check_numerics_ops(), but it doesn't appear to do anything, or perhaps I'm using it incorrectly.

1 Answer

深蓝 · Answer 1 · 2021-10-17T00:34:48+0000

Debugging NaNs can be tricky, especially in a large network. tf.add_check_numerics_ops() adds ops to the graph that assert that each floating-point tensor in the graph contains no NaN or Inf values, but it does not run these checks by default. Instead, it returns an op that you can run periodically, or on every step, as follows:

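import tensorflow as tf  # TensorFlow 1.x API (tf.compat.v1 in TensorFlow 2.x)

train_op = ...  # your training op
# Build the check op after the rest of the graph so it covers every floating-point tensor.
check_op = tf.add_check_numerics_ops()

sess = tf.Session()
sess.run([train_op, check_op])  # Runs one training step and checks for NaN/Inf values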
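To show what "catching the first occurrence" means, here is a toy sketch in plain Python. It is not TensorFlow code and not how TensorFlow implements its checks. The function names and the "forward/..." and "backward/..." labels are made up for illustration. It traces y = sqrt(x) and its gradient in evaluation order and reports the first value that is NaN or Inf, which is one way a forward pass can look fine while the backward pass blows up.

```python
import math


def first_bad_value(named_values):
    """Return the name of the first NaN/Inf entry, or None if all are finite."""
    for name, value in named_values:
        if not math.isfinite(value):
            return name
    return None


def sqrt_forward_backward(x, upstream_grad):
    """Toy trace of y = sqrt(x) and its gradient, listed in evaluation order."""
    y = math.sqrt(x)
    # Mimic IEEE float division, where 0.5 / 0.0 is inf
    # (plain Python would raise ZeroDivisionError instead).
    local_grad = 0.5 / y if y != 0.0 else math.inf
    # Chain rule: 0.0 * inf evaluates to nan.
    grad_x = upstream_grad * local_grad
    return [
        ("forward/sqrt", y),
        ("backward/sqrt_local_grad", local_grad),
        ("backward/grad_x", grad_x),
    ]


# At x = 0 the forward value is finite, but the gradient is not.
print(first_bad_value(sqrt_forward_backward(0.0, 0.0)))
# At x = 4 every value is finite.
print(first_bad_value(sqrt_forward_backward(4.0, 1.0)))
```

Running the real check op on every step plays the same role: the error should point at the first tensor that fails the check, rather than at the NaNs that spread from it afterwards.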
