They're the same. If you check the implementation, you will find that it applies log_softmax to the input and then calls nll_loss on the result:

return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
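To see why the composition is the same thing as cross entropy, here is a minimal pure-Python sketch (no PyTorch, just the math): applying log-softmax and then taking the negative log-probability of the target class gives exactly the direct cross-entropy formula. The function names here are illustrative stand-ins for the library calls, not the library's actual implementations.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def nll_loss(log_probs, target):
    # Negative log-likelihood: just the negated log-probability of the target.
    return -log_probs[target]

def cross_entropy(logits, target):
    # Direct formula: -logits[target] + log(sum(exp(logits))), stabilized.
    m = max(logits)
    return -logits[target] + m + math.log(sum(math.exp(x - m) for x in logits))

logits = [2.0, 1.0, 0.1]
target = 0
composed = nll_loss(log_softmax(logits), target)
direct = cross_entropy(logits, target)
assert abs(composed - direct) < 1e-12
```

The only practical difference, then, is whether you want your model to output raw logits (use the combined loss) or log-probabilities (apply log-softmax yourself and use NLL); fusing the two is also slightly more numerically stable.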
Disclaimer: I specifically replied to "Now, what approach should anyone use, and why?" without knowledge of your use case.