I have a dataset containing 70k rows, and I want to apply jaccard_score to compute the similarity of every row with every other row. The dataset looks like this:
[image of the dataset]
Here is my code:
from sklearn.metrics import jaccard_score

mail_list = []
for index in range(df.shape[0]):
    sub_list = []
    # compare this row with every later row
    for row in range(index + 1, df.shape[0]):
        sub_list.append(round(jaccard_score(df.iloc[index, :], df.iloc[row, :], average='macro'), 1))
    # the last row has no later rows, so sub_list can be empty
    if sub_list:
        mail_list.append(max(sub_list))
This code works, but it takes far too long on 70k rows. How can I modify it to run faster?
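For comparison, below is a minimal sketch of one common way to vectorize this, assuming the columns are binary 0/1 indicators and that plain binary Jaccard similarity (rather than average='macro') is acceptable; the results will therefore differ slightly from the loop above. The names chunk and max_sim are introduced here for illustration, and unlike the original loop this version compares each row against all other rows, not only later ones.

import numpy as np
from sklearn.metrics import pairwise_distances

# Assumption: df holds binary 0/1 indicator columns
X = df.to_numpy(dtype=bool)
n = X.shape[0]

chunk = 500               # rows per block; tune to available memory
max_sim = np.empty(n)

for start in range(0, n, chunk):
    stop = min(start + chunk, n)
    # pairwise_distances returns Jaccard *distance* for boolean data,
    # so similarity = 1 - distance
    sim = 1.0 - pairwise_distances(X[start:stop], X, metric='jaccard', n_jobs=-1)
    # mask each row's self-similarity so it doesn't win the max
    sim[np.arange(stop - start), np.arange(start, stop)] = -np.inf
    max_sim[start:stop] = sim.max(axis=1)

The chunking matters because the full 70,000 x 70,000 similarity matrix would need roughly 39 GB as floats; computing it block by block keeps memory bounded while still replacing the per-pair Python calls with vectorized distance computations.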
question from:
https://stackoverflow.com/questions/65920717/loop-large-data-in-efficient-way