I have an n*n distance matrix M, where M_ij is the distance between object_i and object_j. So, as expected, it takes the following form:
/ 0     M_01  M_02  ...  M_0n \
| M_10  0     M_12  ...  M_1n |
| M_20  M_21  0     ...  M_2n |
| ...                         |
\ M_n0  M_n1  M_n2  ...  0    /
Now I wish to cluster these n objects with hierarchical clustering. SciPy provides an implementation of this: scipy.cluster.hierarchy.linkage(y, method='single', metric='euclidean').
Its documentation says:
y must be a {n choose 2} sized vector where n is the number of
original observations paired in the distance matrix.
y : ndarray
A condensed or redundant distance matrix. A condensed
distance matrix is a flat array containing the upper triangular of the
distance matrix. This is the form that pdist returns. Alternatively, a
collection of m observation vectors in n dimensions may be passed as
an m by n array.
I am confused by this description of y. Can I directly feed my M in as the input y?
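To make the question concrete, here is a minimal sketch of what the quoted documentation seems to describe, assuming M is symmetric with a zero diagonal. The 4x4 matrix is made-up example data. scipy.spatial.distance.squareform converts a square distance matrix into the condensed vector of length n*(n-1)/2, which can then be passed as y. As far as I can tell, a 2-D array passed directly is interpreted as observation vectors rather than as distances.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical 4x4 distance matrix (made-up values): symmetric, zero diagonal.
M = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
])

# Condensed form: the upper triangle flattened row by row,
# giving n*(n-1)/2 = 6 entries for n = 4.
y = squareform(M)

# Cluster on the precomputed distances; the metric argument is not
# needed because y already holds distances.
Z = linkage(y, method='single')

print(y)
print(Z)
```

Each row of Z records one merge: the two cluster indices, the distance at which they were joined, and the size of the new cluster.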
Update
@hongbo-zhu-cn has raised this issue on GitHub. This is exactly what I am concerned about. However, as a newcomer to GitHub, I don't know how it works, so I have no idea how this issue is being handled.