For example, suppose I use RNNs with inputs of shape (B, T*, F), where T* is the maximum length of the time dimension within each batch (shorter sequences are padded to it). So T1 for the first batch and T2 for the second batch may be different.
I think that if I only normalize over the feature (last) dimension, I can set the layer to LN = nn.LayerNorm(F). But what if I want to normalize over both the time and feature dimensions? How do I initialize the LN layer in that case, given that T* changes from batch to batch?
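To make the two cases concrete, here is a minimal sketch (the sizes B, T, F are hypothetical). Normalizing only the feature dimension works with a fixed `nn.LayerNorm(F)` regardless of T; for normalizing over both time and feature dimensions with a varying T*, one option is the functional form `F.layer_norm`, which takes the normalized shape at call time (at the cost of having no learnable affine parameters):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

B, T, Fdim = 4, 7, 16            # hypothetical batch, time, feature sizes
x = torch.randn(B, T, Fdim)

# Case 1: normalize over the feature (last) dimension only.
# nn.LayerNorm(Fdim) works for any T, since T is not part of normalized_shape.
ln = nn.LayerNorm(Fdim)
y = ln(x)                        # same shape as x

# Case 2: normalize over both time and feature dimensions.
# Because T* varies per batch, pass the runtime shape to the functional form;
# this applies no learnable scale/shift (weight and bias default to None).
z = F.layer_norm(x, x.shape[1:])  # normalized_shape = (T, Fdim)
```

Note that in case 2 each sample is normalized over its padded length, so padding positions contribute to the statistics; masking them out would require extra work.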
Also, do I really need to normalize over the time dimension? Would it be useful or harmful?
Thanks in advance.
question from:
https://stackoverflow.com/questions/65846420/how-to-apply-layernorm-pytorch-to-both-time-and-feature-dimension-when-the-len