To apply softmax and use a cross entropy loss, you have to keep the final output of your network intact, of shape batch_size x 256 x 256 x 33. Therefore you cannot use mean averaging or argmax, because that would destroy the output probabilities of your network.
You have to loop over all the batch_size x 256 x 256 pixels and apply a cross entropy loss to the prediction for each pixel. This is easy with the built-in function tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels).
Some warnings from the doc before applying the code below:
- WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.
- logits must have the shape [batch_size, num_classes] and dtype float32 or float64.
- labels must have the shape [batch_size] and the dtype int64.
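To make the first warning concrete, here is a minimal sketch (reusing the logits and labels names from the code further down) of what to pass and what not to pass:
# correct: pass the raw, unscaled logits straight to the op
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)
# wrong: do not softmax first, the op already applies softmax internally
# probs = tf.nn.softmax(logits)
# loss = tf.nn.sparse_softmax_cross_entropy_with_logits(probs, labels)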
The trick is to use batch_size * 256 * 256 as the batch size required by the function. We will reshape logits and labels to this format.
Here is the code I use:
inputs = tf.placeholder(tf.float32, [batch_size, 256, 256, 3]) # input images
logits = inference(inputs) # your outputs of shape [batch_size, 256, 256, 33] (no final softmax !!)
labels = tf.placeholder(tf.int64, [batch_size, 256, 256]) # your labels of shape [batch_size, 256, 256] and type int64
reshaped_logits = tf.reshape(logits, [-1, 33]) # shape [batch_size*256*256, 33]
reshaped_labels = tf.reshape(labels, [-1]) # shape [batch_size*256*256]
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(reshaped_logits, reshaped_labels) # per-pixel losses, shape [batch_size*256*256]
loss = tf.reduce_mean(loss) # reduce to a scalar before passing it to the optimizer
You can then apply your optimizer on that loss.
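For completeness, a minimal sketch of that last step; the choice of Adam and the learning rate are mine, any optimizer works:
train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss)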
Update: v0.10
The documentation of tf.nn.sparse_softmax_cross_entropy_with_logits shows that it now accepts logits of any shape [d_0, ..., d_{r-1}, num_classes] with labels of shape [d_0, ..., d_{r-1}], so there is no need to reshape the tensors (thanks @chillinger):
inputs = tf.placeholder(tf.float32, [batch_size, 256, 256, 3]) # input images
logits = inference(inputs) # your outputs of shape [batch_size, 256, 256, 33] (no final softmax !!)
labels = tf.placeholder(tf.int64, [batch_size, 256, 256]) # your labels of shape [batch_size, 256, 256] and type int64
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels) # per-pixel losses of shape [batch_size, 256, 256]
loss = tf.reduce_mean(loss) # reduce to a scalar before passing it to the optimizer
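If it helps, here is a hedged sketch of a full training step with the updated code, assuming dummy NumPy arrays for the images and the per-pixel labels (the array names and the Adam optimizer are my own choices, not part of the original answer):
import numpy as np

train_op = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(loss) # same training op as above

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables()) # tf.global_variables_initializer() in newer versions
    images = np.random.rand(batch_size, 256, 256, 3).astype(np.float32) # dummy input batch
    gt = np.random.randint(0, 33, size=(batch_size, 256, 256)) # dummy int64 labels in [0, 33)
    _, loss_value = sess.run([train_op, loss], feed_dict={inputs: images, labels: gt})
    print(loss_value)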