I'm running TensorFlow and something in my model is producing a NaN.
I'd like to find out what it is, but I don't know how.
The main issue is that in a "normal" procedural program I would simply put a print statement right before the operation executes.
With TensorFlow I can't do that, because I first declare (define) the graph, so adding print statements to the graph definition doesn't help.
Are there any rules, advice, or heuristics for tracking down what might be causing the NaN?
In this case I know more precisely which line to look at, because I have the following:
Delta_tilde = 2.0*tf.matmul(x,W) - tf.add(WW, XX) # note: this quantity should always be positive because it's a pair-wise Euclidean distance
Z = tf.sqrt(Delta_tilde)
Z = Transform(Z) # potentially some transform; for debugging it currently just returns Z (the identity)
Z = tf.pow(Z, 2.0)
A = tf.exp(Z)
When this line is present, my summary writers report NaN.
Why is this?
Is there a way to at least inspect what value Z has after taking the square root?
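In graph mode, any intermediate tensor can be fetched, not just the loss or the summaries: list it in `Session.run` next to whatever you already run and inspect the NumPy array you get back. The sketch below assumes the TF1-style API via `tensorflow.compat.v1`; `delta_tilde` is a hypothetical placeholder standing in for `Delta_tilde`, and the fed values are made up so that one entry turns into NaN.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()  # graph mode, as in the question

# Hypothetical stand-in for Delta_tilde; in the real model it is computed from x and W.
delta_tilde = tf.placeholder(tf.float32, shape=[None, None], name="delta_tilde")
Z = tf.sqrt(delta_tilde, name="Z")
A = tf.exp(tf.pow(Z, 2.0), name="A")

feed = {delta_tilde: np.array([[4.0, -0.5], [0.0, 1.0]], dtype=np.float32)}

with tf.Session() as sess:
    # Fetch the intermediate tensors together with the final one. In a training
    # loop you would put the train op in this same list.
    delta_val, z_val, a_val = sess.run([delta_tilde, Z, A], feed_dict=feed)

nan_mask = np.isnan(z_val)
bad = np.argwhere(nan_mask)  # (row, col) positions where Z is NaN
print("NaN in Z at:", bad.tolist())
print("Delta_tilde at those positions:", delta_val[nan_mask])
```

If the NaN positions in `Z` line up with negative entries of `Delta_tilde`, the problem lies in how the distances are computed rather than in the square root itself.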
For the specific example I posted, I tried tf.Print(0,Z), but it printed nothing.
As in:
Delta_tilde = 2.0*tf.matmul(x,W) - tf.add(WW, XX) # note: this quantity should always be positive because it's a pair-wise Euclidean distance
Z = tf.sqrt(Delta_tilde)
tf.Print(0,[Z]) # <-------- TF PRINT STATEMENT
Z = Transform(Z) # potentially some transform; for debugging it currently just returns Z (the identity)
Z = tf.pow(Z, 2.0)
A = tf.exp(Z)
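A likely reason nothing was printed: in graph mode, `tf.Print(0,[Z])` only adds an op to the graph, and `Session.run` executes only the ops that the fetched tensors depend on. Nothing depends on the output of that print op, so it never runs. The sketch below shows the same behavior with a counter in place of a print (TF1-style API via `tensorflow.compat.v1`; `counter` and `increment` are illustrative names, not from the question): the increment op exists in the graph but runs only when it is fetched explicitly.

```python
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

counter = tf.Variable(0, name="counter")
increment = tf.assign_add(counter, 1)  # built into the graph, but nothing uses its output

y = tf.constant(2.0) * 3.0  # the "real" computation, independent of `increment`

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    y_val = sess.run(y)               # runs only the ops that y depends on
    count_before = sess.run(counter)  # still 0: increment was never executed
    sess.run(increment)               # runs only because it is fetched here
    count_after = sess.run(counter)

print(y_val, count_before, count_after)
```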
I actually don't understand what tf.Print is supposed to do.
Why does it need two arguments?
If I want to print one tensor, why would I need to pass two?
That seems bizarre to me.
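The two arguments play different roles. In the TF1 API, `tf.Print(input_, data, message=None, first_n=None, summarize=None, name=None)` is an identity op: it returns `input_` unchanged and, as a side effect, prints the list of tensors in `data` whenever the returned tensor is evaluated. The print only happens if that returned tensor is actually used downstream, so the usual pattern is to reassign it, e.g. `Z = tf.Print(Z, [Z])`. The output is written to standard error by the C++ runtime, so in a Jupyter notebook it typically appears in the terminal running the kernel rather than in the cell output; in TF 2.x the replacement is `tf.print`. A minimal sketch, assuming `tensorflow.compat.v1` and a hypothetical placeholder for `Delta_tilde`:

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Hypothetical stand-in for Delta_tilde.
delta_tilde = tf.placeholder(tf.float32, shape=[None], name="delta_tilde")
Z = tf.sqrt(delta_tilde)

# First argument: the tensor passed through unchanged (like tf.identity).
# Second argument: the list of tensors to print when the result is evaluated.
# Reassigning Z puts the print op on the path to A, so it actually runs.
Z = tf.Print(Z, [Z, tf.reduce_min(delta_tilde)],
             message="Z, min(Delta_tilde): ", summarize=10)

Z = tf.pow(Z, 2.0)
A = tf.exp(Z)

with tf.Session() as sess:
    # The printed line goes to stderr, not to Python's stdout.
    a_val = sess.run(A, feed_dict={delta_tilde: np.array([1.0, -1.0], dtype=np.float32)})

print(a_val)
```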
I was looking at the function tf.add_check_numerics_ops(), but it doesn't say how to use it (and the docs don't seem very helpful).
Does anyone know how to use it?
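In the TF1 API, `tf.add_check_numerics_ops()` walks the graph as it exists when you call it and attaches a `check_numerics` op to every floating-point tensor, returning one op that groups all the checks. Call it after the full model is defined and run the returned op together with your training op; when a tensor contains NaN or inf, `Session.run` raises `InvalidArgumentError`, and the message names the op that produced it. Its documentation notes that it does not support graphs containing `tf.cond` or `tf.while_loop`; for a single tensor, `tf.check_numerics(tensor, message)` performs the same check. A minimal sketch, assuming `tensorflow.compat.v1` and a hypothetical placeholder for `Delta_tilde`:

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Hypothetical stand-in for Delta_tilde.
delta_tilde = tf.placeholder(tf.float32, shape=[None], name="delta_tilde")
Z = tf.sqrt(delta_tilde, name="Z")
A = tf.exp(tf.pow(Z, 2.0), name="A")

# Call this only after the graph is complete: it instruments the ops that exist now.
check_op = tf.add_check_numerics_ops()

with tf.Session() as sess:
    # Clean input: every check passes silently.
    clean_a, _ = sess.run([A, check_op],
                          feed_dict={delta_tilde: np.array([1.0, 4.0], dtype=np.float32)})
    try:
        sess.run([A, check_op],
                 feed_dict={delta_tilde: np.array([1.0, -1.0], dtype=np.float32)})
        error_message = None
    except tf.errors.InvalidArgumentError as err:
        error_message = err.message  # names the first op whose output had NaN/Inf

print(error_message)
```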
Since some comments suggested the data might be bad: I am using standard MNIST.
However, I am computing a quantity that is positive (pair-wise Euclidean distance) and then taking its square root.
Thus, I don't see how the data specifically would be an issue.
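Two things in the snippet can produce NaN even with clean MNIST data. First, ‖x − w‖² = ‖x‖² + ‖w‖² − 2 x·w, so if `XX` and `WW` hold squared norms (an assumption, since the snippet doesn't show how they are built), `2.0*tf.matmul(x,W) - tf.add(WW, XX)` is the negative of the squared distance, and its square root is NaN wherever the distance is nonzero. Second, even with the correct sign, rounding can leave entries that should be exactly 0 slightly negative, and the gradient of `tf.sqrt` at 0 is infinite, which becomes NaN during backpropagation. The sketch below uses toy inputs with `tensorflow.compat.v1`; `eps` is a hypothetical value. It checks the sign, clamps at 0, and compares gradients with and without a small epsilon under the root.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_v2_behavior()

# Toy stand-ins: rows of x and columns of W are points (x: [n, d], W: [d, m]),
# matching the shapes implied by tf.matmul(x, W).
x = tf.placeholder(tf.float32, shape=[None, 2], name="x")
W = tf.placeholder(tf.float32, shape=[2, None], name="W")

XX = tf.reduce_sum(tf.square(x), axis=1, keepdims=True)  # [n, 1]: ||x_i||^2
WW = tf.reduce_sum(tf.square(W), axis=0, keepdims=True)  # [1, m]: ||w_j||^2
xW = tf.matmul(x, W)                                     # [n, m]: x_i . w_j

as_written = 2.0 * xW - tf.add(WW, XX)          # sign as in the question
sq_dist = tf.maximum(XX + WW - 2.0 * xW, 0.0)   # correct sign, clamped at 0

eps = 1e-8  # hypothetical; keeps the sqrt argument away from exactly 0
grad_plain = tf.gradients(tf.sqrt(sq_dist), x)[0]
grad_safe = tf.gradients(tf.sqrt(sq_dist + eps), x)[0]

feed = {
    x: np.array([[0.0, 0.0], [1.0, 2.0]], dtype=np.float32),
    W: np.array([[0.0, 3.0], [0.0, 4.0]], dtype=np.float32),  # columns: (0, 0) and (3, 4)
}

with tf.Session() as sess:
    written_val, sq_val, g_plain, g_safe = sess.run(
        [as_written, sq_dist, grad_plain, grad_safe], feed_dict=feed)

print("squared distances:\n", sq_val)
print("as written:\n", written_val)
print("NaN in plain gradient:", np.isnan(g_plain).any())
print("safe gradient finite:", np.isfinite(g_safe).all())
```

Because the code squares `Z` right after the square root, the clamped squared distance could also be used directly, skipping `tf.sqrt` altogether. Separately, `tf.exp` overflows float32 to inf once its argument exceeds about 88.7, so `A = tf.exp(Z)` applied to large squared distances is another possible source of inf, which later arithmetic can turn into NaN.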
Asked by Pinocchio, translated from Stack Overflow.