
machine learning - Implementing BandRNN with PyTorch and TensorFlow

I am trying to figure out how to train my weight matrix so that I end up with a BandRNN.

BandRNN is a diagonal RNN model with a different number of connections per neuron. [Figure: example connection pattern, where C is the number of connections per neuron.]

I found that gradients can be turned off for whole parameter tensors in a loop, which prevents them from being trained:

for p in model.input.parameters():
    p.requires_grad = False

But I can't find a way to apply this selectively so that my matrix becomes a BandRNN.

Hopefully, someone will be able to help me with this issue.

Question from: https://stackoverflow.com/questions/65883374/implementing-bandrnn-with-pytorch-and-tensorflow


1 Answer


As far as I know, you can only toggle requires_grad on an entire tensor, not on individual elements of it. Instead, what you can do is zero out the values outside the band.

First, create a mask for the band. You could use torch.ones with torch.diagflat:

>>> torch.diagflat(torch.ones(5), offset=1)
tensor([[0., 1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 0., 1.],
        [0., 0., 0., 0., 0., 0.]])

By sizing torch.ones as N - abs(offset), you can generate offset diagonal matrices that all share the same N x N shape.

>>> N = 5; i = -1
>>> torch.diagflat(torch.ones(N-abs(i)), offset=i)
tensor([[0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.]])

>>> N = 5; i = 0
>>> torch.diagflat(torch.ones(N-abs(i)), offset=i)
tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])

>>> N = 5; i = 1
>>> torch.diagflat(torch.ones(N-abs(i)), offset=i)
tensor([[0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.],
        [0., 0., 0., 0., 0.]])

You get the point: summing these matrices element-wise gives us the band mask:

>>> N = 5; b = 3
>>> mask = sum(torch.diagflat(torch.ones(N-abs(i)), i) for i in range(-b//2,b//2+1))

>>> mask
tensor([[1., 1., 0., 0., 0.],
        [1., 1., 1., 0., 0.],
        [1., 1., 1., 1., 0.],
        [0., 1., 1., 1., 1.],
        [0., 0., 1., 1., 1.]])
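
For a larger N, the Python-level sum of diagonals can be replaced by a single broadcasted comparison. Below is a minimal sketch that builds the same mask (band_mask is a hypothetical helper name; note that Python's floor division makes the band asymmetric for odd b, e.g. for b = 3 the kept offsets are -2 through 1, which is exactly what the mask above shows):

import torch

def band_mask(N: int, b: int) -> torch.Tensor:
    # Keep entries whose diagonal offset (col - row) lies in
    # range(-b // 2, b // 2 + 1), matching the sum of diagonals above.
    idx = torch.arange(N)
    offset = idx.unsqueeze(0) - idx.unsqueeze(1)  # offset[r, c] = c - r
    return ((offset >= -b // 2) & (offset <= b // 2)).float()

>>> band_mask(5, 3)  # reproduces the mask printed above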

Then you can zero out the values outside the band on your nn.Linear:

>>> m = nn.Linear(N, N)
>>> m.weight.data *= mask

>>> m.weight
Parameter containing:
tensor([[-0.3321, -0.3377, -0.0000, -0.0000, -0.0000],
        [-0.4197,  0.1729,  0.2101,  0.0000,  0.0000],
        [ 0.3467,  0.2857, -0.3919, -0.0659,  0.0000],
        [ 0.0000, -0.4060,  0.0908,  0.0729, -0.1318],
        [ 0.0000, -0.0000, -0.4449, -0.0029, -0.1498]], requires_grad=True)

Note that you might need to re-apply this masking on each training step, since the parameters outside the band can otherwise be updated to non-zero values during training. Of course, you can build the mask once and keep it in memory.
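
For example, one way to do that is to re-apply the mask right after each optimizer step (a sketch; optimizer stands for whichever torch.optim optimizer you are using):

optimizer.step()
# Zero out anything the update wrote outside the band.
with torch.no_grad():
    m.weight.mul_(mask)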

It would be more convenient to wrap everything into a custom nn.Module.
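
Here is a minimal sketch of such a module, using the same masking logic as above (BandLinear is a hypothetical name, not an existing PyTorch layer):

import torch
import torch.nn as nn
import torch.nn.functional as F

class BandLinear(nn.Module):
    def __init__(self, N: int, b: int):
        super().__init__()
        self.linear = nn.Linear(N, N)
        # Build the mask once; registering it as a buffer keeps it
        # non-trainable and moves it with the module across devices.
        mask = sum(torch.diagflat(torch.ones(N - abs(i)), i)
                   for i in range(-b // 2, b // 2 + 1))
        self.register_buffer("mask", mask)

    def forward(self, x):
        # Masking the weight inside forward zeroes both the out-of-band
        # weights and their gradients, so no separate re-masking is needed.
        return F.linear(x, self.linear.weight * self.mask, self.linear.bias)

>>> layer = BandLinear(N=5, b=3)
>>> layer(torch.randn(2, 5)).shape
torch.Size([2, 5])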

