I am trying to apply the Random Projections method to a very sparse dataset. I have found papers and tutorials about the Johnson-Lindenstrauss method, but every one of them is full of equations that convey no meaningful explanation to me. For example, this document on Johnson-Lindenstrauss.
Unfortunately, from that document I can get no idea of the implementation steps of the algorithm. It's a long shot, but is there anyone who can give me the plain-English version or very simple pseudocode of the algorithm? Or where can I start digging into these equations? Any suggestions?
For example, here is what I understand of the algorithm from reading this paper concerning Johnson-Lindenstrauss:
- Assume we have an A×B matrix, where A is the number of samples and B is the number of dimensions, e.g. 100×5000. I want to reduce the dimensionality to 500, which will produce a 100×500 matrix.
As far as I understand: first, I need to construct a 5000×500 matrix and fill the entries randomly with +1 and -1 (each with 50% probability), then multiply the data matrix by it.
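The step described above can be sketched in NumPy. This is only a minimal illustration of the ±1 (sign-matrix) variant with the example sizes from the question; the 1/sqrt(k) scaling is the usual normalization so that pairwise distances are preserved in expectation, and the random data matrix `A` is just a stand-in for real data:

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, k = 100, 5000, 500           # samples, original dims, target dims
A = rng.standard_normal((m, n))    # stand-in for the real data matrix

# Projection matrix R: entries +1 or -1, each with 50% probability,
# scaled by 1/sqrt(k) so distances are preserved in expectation.
R = rng.choice([-1.0, 1.0], size=(n, k)) / np.sqrt(k)

E = A @ R                          # reduced data, shape (100, 500)
print(E.shape)
```

Note that R must be n×k (here 5000×500), not 100×500, for the product A·R to be defined.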
Edit:
Okay, I think I'm starting to get it. So we have a matrix A which is m×n. We want to reduce it to E, which is m×k.
What we need to do is construct a matrix R of dimension n×k and fill it with 0, -1, or +1, with probabilities 2/3, 1/6, and 1/6 respectively.
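Constructing such an R is a one-liner with NumPy's weighted sampling. A sketch, assuming the sparse variant described above (the sqrt(3/k) factor is the standard scaling for this {0, ±1} scheme, not something stated in the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5000, 500

# Entries: 0 with probability 2/3, +1 and -1 with probability 1/6 each,
# scaled by sqrt(3/k) -- the usual normalization for this sparse scheme.
R = rng.choice([0.0, 1.0, -1.0], size=(n, k), p=[2/3, 1/6, 1/6]) * np.sqrt(3.0 / k)
```

On average two thirds of R's entries are exactly zero, which is what the summation trick below exploits.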
After constructing this R, we simply do the matrix multiplication A×R to find our reduced matrix E. But we don't need to do a full matrix multiplication: if an element of R is 0, we don't need to do any calculation and can simply skip it. If it is +1, we just add the corresponding column, and if it is -1, we subtract it. So we use only summation rather than multiplication to find E. And that is what makes this method very fast.
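The skip/add/subtract idea can be sketched as follows. This is only an illustration of the summation trick on unscaled {0, ±1} entries (apply the sqrt(3/k) factor afterwards if needed); `project_by_summation` is a hypothetical helper name, and the final assertion checks that it agrees with the full matrix product:

```python
import numpy as np

def project_by_summation(A, R):
    """Compute E = A @ R using only additions and subtractions.

    For each column j of R: add the columns of A where R[i, j] == +1,
    subtract those where R[i, j] == -1, and skip the zeros entirely.
    Assumes R contains only the unscaled values 0, +1, -1.
    """
    m = A.shape[0]
    k = R.shape[1]
    E = np.zeros((m, k))
    for j in range(k):
        plus = R[:, j] == 1            # columns of A to add
        minus = R[:, j] == -1          # columns of A to subtract
        E[:, j] = A[:, plus].sum(axis=1) - A[:, minus].sum(axis=1)
    return E

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 50))
R = rng.choice([0, 1, -1], size=(50, 8), p=[2/3, 1/6, 1/6])

# The summation-only version matches the ordinary matrix product.
assert np.allclose(project_by_summation(A, R), A @ R)
```

In practice a sparse-matrix multiply (e.g. storing R in a SciPy sparse format) achieves the same savings without the explicit Python loop.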
It turned out to be a very neat algorithm, although I feel too stupid for not getting the idea sooner.