machine learning - What is `weight_decay` meta parameter in Caffe?

Question


asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

Looking at an example `solver.prototxt` posted in the BVLC/caffe Git repository, I see a training meta parameter:

weight_decay: 0.04

What does this meta parameter mean? And what value should I assign to it?

1 Answer

深蓝 · Answer 1 · 2021-10-17T00:07:00+0000

The weight_decay meta parameter governs the regularization term of the neural net.

During training, a regularization term is added to the network's loss, and the backprop gradient is computed from this combined loss. The weight_decay value determines how dominant the regularization term is in that gradient computation.
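To make the role of the term concrete, here is a minimal NumPy sketch of a single plain SGD step with an L2 penalty. It is not Caffe's actual solver code: the function name `sgd_step`, the learning rate, and the weight values are made up for illustration, and momentum is left out.

```python
import numpy as np

def sgd_step(w, data_grad, lr, weight_decay):
    """One plain SGD step with an L2 penalty (illustrative, not Caffe's code)."""
    # The penalty 0.5 * weight_decay * ||w||^2 adds weight_decay * w to the gradient.
    total_grad = data_grad + weight_decay * w
    return w - lr * total_grad

w = np.array([0.5, -0.2])
data_grad = np.zeros(2)  # pretend the data loss is flat at this point
w_new = sgd_step(w, data_grad, lr=0.1, weight_decay=0.04)
print(w_new)  # each weight shrinks toward zero by a factor of (1 - lr * weight_decay)
```

With a zero data gradient, the only thing acting on the weights is the penalty, so a larger weight_decay pulls them toward zero faster.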

As a rule of thumb, the more training examples you have, the weaker this term should be. The more parameters you have (i.e., deeper net, larger filters, larger InnerProduct layers, etc.), the stronger this term should be.

Caffe also allows you to choose between L2 regularization (default) and L1 regularization, by setting

regularization_type: "L1"

However, since in most cases weights are small numbers (i.e., -1<w<1), the L2 penalty, which grows with w^2, is significantly smaller than the L1 penalty, which grows with |w|. Thus, if you choose to use regularization_type: "L1", you might need to tune weight_decay to a significantly smaller value.
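The difference can be seen by comparing the gradient each penalty contributes for a few small weights. This is a rough sketch under the usual definitions (L2 adds weight_decay * w, L1 adds weight_decay * sign(w)); the helper name `reg_gradient` and the sample weights are hypothetical.

```python
import numpy as np

def reg_gradient(w, weight_decay, regularization_type="L2"):
    """Gradient contributed by the regularization term (illustrative sketch)."""
    if regularization_type == "L2":
        return weight_decay * w           # proportional to the weight itself
    if regularization_type == "L1":
        return weight_decay * np.sign(w)  # constant magnitude, whatever the weight
    raise ValueError("unknown regularization_type: " + regularization_type)

w = np.array([0.05, -0.01, 0.2])  # typical small weights, all with |w| < 1
g_l2 = reg_gradient(w, 0.04, "L2")
g_l1 = reg_gradient(w, 0.04, "L1")
print("L2:", g_l2)
print("L1:", g_l1)
```

For the same weight_decay, the L1 contribution is larger in magnitude for every weight with |w| < 1, which is why a smaller value is usually needed when switching to L1.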

While the learning rate may (and usually does) change during training, the regularization weight is fixed throughout.
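As a small illustration of this contrast, the sketch below mimics a "step" learning rate schedule, where the rate is multiplied by gamma every stepsize iterations, next to a constant weight_decay. The function name `step_lr` and all numeric values are assumptions chosen for the example, not settings taken from any particular solver file.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under a step schedule: base_lr * gamma^floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

weight_decay = 0.04  # stays the same for the whole run

for iteration in (0, 1000, 2000):
    lr = step_lr(base_lr=0.01, gamma=0.1, stepsize=1000, iteration=iteration)
    print(iteration, lr, weight_decay)
```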
