Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

machine learning - How to calculate the number of parameters of an LSTM network?

Is there a way to calculate the total number of parameters in an LSTM network?

I have found an example, but I'm unsure how correct it is, or whether I have understood it correctly.

For example, consider the following:

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
# 16 time steps, each a 4096-dimensional input vector; 256 LSTM units.
# (Keras 1.x spelled this as input_dim=4096, input_length=16.)
model.add(LSTM(256, input_shape=(16, 4096)))
model.summary()

Output

____________________________________________________________________________________________________
Layer (type)                       Output Shape        Param #     Connected to                     
====================================================================================================
lstm_1 (LSTM)                      (None, 256)         4457472     lstm_input_1[0][0]               
====================================================================================================
Total params: 4457472
____________________________________________________________________________________________________

As I understand it, n is the input vector length and m is the number of time steps, and in this example the number of hidden layers is taken to be 1.

Hence, according to the formula in the post, 4(nm + n^2), with m = 16, n = 4096 and num_of_units = 256 in my example:

4*((4096*16)+(4096*4096))*256 = 17246978048

Why is there such a difference? Did I misunderstand the example, or is the formula wrong?

Whoever fights dragons too long becomes a dragon; whoever gazes into the abyss too long, the abyss gazes back…

1 Answer

No. In Keras, the number of parameters of an LSTM layer equals:

params = 4 * ((size_of_input + 1) * size_of_output + size_of_output^2)

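The additional 1 comes from the bias terms. Writing the formula as 4(nm + m^2), n is the size of the input increased by one for the bias term, and m is the size of the LSTM layer's output (its number of units). The number of time steps does not enter the count at all, because the same weights are reused at every time step.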
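A minimal sketch of the formula above as a Python function, assuming a standard Keras-style `LSTM` layer with biases enabled (the default `use_bias=True`); `lstm_param_count` is a hypothetical helper name, not a Keras API.

```python
def lstm_param_count(input_size: int, units: int) -> int:
    """Parameters of one Keras-style LSTM layer with biases.

    Each of the 4 gates (input, forget, cell candidate, output) has:
      - input_size * units weights applied to the input,
      - units * units recurrent weights applied to the previous hidden state,
      - units bias terms (the "+ 1" in the formula).
    """
    return 4 * ((input_size + 1) * units + units ** 2)


# The model from the question: 4096 input features, 256 units.
# The 16 time steps are not an argument: the weights are shared across steps.
print(lstm_param_count(4096, 256))  # 4457472
```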

So, finally:

4 * (4097 * 256 + 256^2) = 4457472
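To see where 4457472 comes from, the count can be split across the three weight arrays a Keras `LSTM` layer holds: `kernel` of shape (input_dim, 4 * units), `recurrent_kernel` of shape (units, 4 * units) and `bias` of shape (4 * units,). The sketch below only multiplies out those assumed shapes (with the default `use_bias=True`); it does not call Keras itself.

```python
from math import prod

# Assumed shapes of the weights in a Keras LSTM layer (use_bias=True),
# for the question's model: 4096 input features, 256 units.
input_dim, units = 4096, 256
weight_shapes = {
    "kernel": (input_dim, 4 * units),        # input -> 4 gates
    "recurrent_kernel": (units, 4 * units),  # previous hidden state -> 4 gates
    "bias": (4 * units,),                    # one bias per unit per gate
}

# Number of scalars in each array is the product of its shape.
counts = {name: prod(shape) for name, shape in weight_shapes.items()}
for name, n in counts.items():
    print(f"{name:17s} {n:>8d}")
print("total", sum(counts.values()))
```

Run as is, it should report 4194304 for `kernel`, 262144 for `recurrent_kernel` and 1024 for `bias`, which add up to the 4457472 shown by `model.summary()`.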

...