Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python 3.x - loop large data in efficient way

I have a dataset containing 70k rows, and I want to apply jaccard_score to find, for every row, its similarity to the other rows. The dataset looks like this:

[image: sample rows of the dataset]

here is my code:

from sklearn.metrics import jaccard_score

mail_list = []
for index in range(df.shape[0]):
    sub_list = []
    for row in range(index + 1, df.shape[0]):
        sub_list.append(round(jaccard_score(df.iloc[index, :], df.iloc[row, :], average='macro'), 1))
    if sub_list:  # the last row has no later rows to compare against
        mail_list.append(max(sub_list))

This code works, but it takes too much time. How can I modify it to run faster?

question from:https://stackoverflow.com/questions/65920717/loop-large-data-in-efficient-way
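If the rows are 0/1 indicator vectors, one common speed-up is to drop the Python-level double loop and compute all pairwise Jaccard distances with scikit-learn's pairwise_distances_chunked, which works in memory-bounded blocks (a full 70k × 70k float matrix would need tens of gigabytes). This is a sketch, not a verified answer: X stands in for df.to_numpy().astype(bool), and binary Jaccard matches jaccard_score(average='macro') only when the rows really are 0/1.

```python
import numpy as np
from sklearn.metrics import pairwise_distances_chunked

rng = np.random.default_rng(0)
# Stand-in for df.to_numpy().astype(bool) -- 300 binary rows, 12 columns.
X = rng.integers(0, 2, size=(300, 12)).astype(bool)

best = np.empty(len(X))  # max similarity of each row to any *other* row
start = 0
for chunk in pairwise_distances_chunked(X, metric="jaccard", working_memory=64):
    sim = 1.0 - chunk                              # distance -> similarity
    rows = np.arange(start, start + len(chunk))    # global row indices of this chunk
    sim[np.arange(len(chunk)), rows] = -np.inf     # mask self-similarity
    best[rows] = sim.max(axis=1)
    start += len(chunk)
```

Unlike the original loop, this takes each row's maximum over all other rows rather than only later ones, which also sidesteps the empty max() on the last row.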


1 Answer

Waiting for answers


...