Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.2k views
in Technique[技术] by (71.8m points)

pyspark - Spark cosine distance between rows using Dataframe

I have to compute a cosine distance between each rows but I have no idea how to do it using Spark API Dataframes elegantly. The idea is to compute similarities for each rows(items) and take top 10 similarities by comparing their similarities between rows. --> This is need for Item-Item Recommender System.

All that I've read about it is referred to computing similarity over columns Apache Spark Python Cosine Similarity over DataFrames May someone say is it possible to compute a cosine distance elegantly between rows using PySpark Dataframe's API or RDD's or I have to do it manually?

That's just some code to show what I intend to do

def cosineSimilarity(vec1, vec2):
    return vec1.dot(vec2) / (LA.norm(vec1) * LA.norm(vec2))


#p.s model is ALS
Pred_Factors = model.itemFactors.cache() #Pred_Factors = DataFrame[id: int, features: array<float>]

sims = []

for _id,_feature in Pred_Factors.toLocalIterator():
    for id, feature in Pred_Factors.toLocalIterator():

        itemFactor = _feature

        sims = sims.append(_id, cosineSimilarity(asarray(feature),itemFactor))

sims = sc.parallelize(l)
sortedSims = sims.takeOrdered(10, key=lambda x: -x[1])

Thanks in Advance for all the help

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You can use mllib.feature.IndexedRowMatrix's columnSimilarities function. It uses cosine metrics as distance function. It computes similarities between columns so, you have to take transpose before applying this function.

pred_ = IndexedRowMatrix(Pred_Factors.rdd.map(lambda x: IndexedRow(x[0],x[1]))).toBlockMatrix().transpose().toIndexedRowMatrix()
pred_sims = pred.columnSimilarities()

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...