Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - How does one debug NaN values in TensorFlow?

I was running TensorFlow and something in my graph is yielding a NaN.

I'd like to know what it is, but I do not know how to find out.

The main issue is that in a "normal" procedural program I would just write a print statement right before the operation is executed.

With TensorFlow I cannot do that, because I first declare (or define) the graph, so adding print statements to the graph definition does not help.

Are there any rules, advice, or heuristics for tracking down what might be causing the NaN?


In this case I know more precisely which line to look at, because I have the following:

Delta_tilde = 2.0*tf.matmul(x,W) - tf.add(WW, XX) # note: this quantity should always be positive because it is a pairwise Euclidean distance
Z = tf.sqrt(Delta_tilde)
Z = Transform(Z) # potentially some transform; currently it returns Z unchanged (the identity) for debugging
Z = tf.pow(Z, 2.0)
A = tf.exp(Z)

When this line is present, my summary writers report NaN.

Why is this?

Is there a way to at least explore what value Z has after the square root has been taken?


For the specific example I posted, I tried tf.Print(0,Z), but without success: it printed nothing.

As in:

Delta_tilde = 2.0*tf.matmul(x,W) - tf.add(WW, XX) # note: this quantity should always be positive because it is a pairwise Euclidean distance
Z = tf.sqrt(Delta_tilde)
tf.Print(0,[Z]) # <-------- TF PRINT STATEMENT
Z = Transform(Z) # potentially some transform; currently it returns Z unchanged (the identity) for debugging
Z = tf.pow(Z, 2.0)
A = tf.exp(Z)

I actually don't understand what tf.Print is supposed to do.

Why does it need two arguments?

If I want to print one tensor, why would I need to pass two?

It seems bizarre to me.


I was looking at the function tf.add_check_numerics_ops(), but its documentation doesn't say how to use it (and the docs in general don't seem very helpful).

Does anyone know how to use this?


Since some comments suggested that the data might be bad: I am using standard MNIST.

However, I am computing a quantity that is positive (pairwise Euclidean distance) and then taking its square root.

Thus, I don't see how the data specifically would be an issue.

  Asked by Pinocchio (translated from Stack Overflow)


1 Answer

There are several reasons why you can get a NaN result. Often it is caused by a learning rate that is too high, but plenty of other causes are possible, for example corrupt data in your input queue or taking the log of 0.
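To make the learning-rate case concrete, here is a minimal sketch in plain Python rather than TensorFlow. The function gradient_descent and the toy loss f(w) = w**2 are made up for illustration; the only point is that a step size that is too large makes the weight grow until it overflows to infinity, after which inf - inf produces NaN.

```python
import math

def gradient_descent(lr, steps, w=1.0):
    """Minimise f(w) = w**2 with plain gradient descent and return the final w."""
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - lr * grad   # each step multiplies w by (1 - 2*lr)
    return w

stable = gradient_descent(lr=0.1, steps=2000)    # |1 - 2*lr| < 1: shrinks towards 0
diverged = gradient_descent(lr=1.5, steps=2000)  # |1 - 2*lr| > 1: grows until overflow

print(math.isfinite(stable), math.isnan(diverged))
```

If your loss turns into NaN only after a number of training steps, lowering the learning rate is a cheap first thing to try.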

In any case, the debugging you describe cannot be done with a plain Python print, since that only prints information about the tensor object in the graph and not any actual values.

However, if you add tf.Print as an op while building the graph, the actual values are printed when the graph is executed (and watching these values IS a good exercise for debugging and understanding the behavior of your net).

You are not using the print op entirely correctly, though.

Because it is an op, you need to pass it a tensor and keep the result tensor it returns, then use that result later in the executing graph. The first argument is the tensor that is passed through unchanged; the second is the list of tensors whose values get printed.

Otherwise the op is never executed and nothing is printed.
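To see why a dangling print op stays silent, here is a toy model of deferred evaluation in plain Python. It is not TensorFlow: Node, constant and print_op are hypothetical names invented for this sketch, and the only assumption is that, as in a graph-mode TensorFlow program, a node runs only when a requested output depends on it.

```python
import math

class Node:
    """A deferred computation: nothing runs until .run() is called."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def run(self):
        # Evaluate only the nodes this node depends on, then apply fn.
        return self.fn(*(n.run() for n in self.inputs))

def constant(value):
    return Node(lambda: value)

def print_op(node, log, message):
    """Identity node with a logging side effect, loosely analogous to tf.Print."""
    def fn(value):
        log.append(f"{message} {value}")
        return value
    return Node(fn, node)

log = []
z = Node(math.sqrt, constant(16.0))

# Dangling: the returned node is discarded, so no output depends on it.
print_op(z, log, "dangling:")
result_without = Node(lambda v: v ** 2, z).run()
log_after_dangling = list(log)   # still empty

# Wired in: the downstream node consumes the print node's result.
z_printed = print_op(z, log, "my Z-values:")
result_with = Node(lambda v: v ** 2, z_printed).run()

print(log_after_dangling, log)
```

Both runs compute the same value, but only the second one logs anything, because only there does the output depend on the print node.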

Try this:

Z = tf.sqrt(Delta_tilde)
Z = tf.Print(Z,[Z], message="my Z-values:") # <-------- TF PRINT STATEMENT; reassigning Z keeps the op in the graph
Z = Transform(Z) # potentially some transform; currently it returns Z unchanged (the identity) for debugging
Z = tf.pow(Z, 2.0)
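Once the values are visible, one thing worth checking is the sign of Delta_tilde. The sketch below is plain Python, not TensorFlow, and it rests on an assumption the question does not confirm: that WW and XX hold the squared norms of W and x. Under that assumption, 2*x.W - (WW + XX) is the negative of the squared Euclidean distance, so it is never positive, and tf.sqrt returns NaN for negative inputs. The helper names dot, posted_expression and squared_distance are made up for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def posted_expression(x, w):
    """2*x.w - (w.w + x.x), mirroring the shape of Delta_tilde in the question."""
    return 2.0 * dot(x, w) - (dot(w, w) + dot(x, x))

def squared_distance(x, w):
    """Squared Euclidean distance ||x - w||**2, computed directly."""
    return sum((a - b) ** 2 for a, b in zip(x, w))

x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]

delta_tilde = posted_expression(x, w)
dist_sq = squared_distance(x, w)
print(delta_tilde, dist_sq)  # -10.25 10.25
```

If the printed Z values confirm this, flipping the sign of the expression before taking the square root would be the thing to try.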
