Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - What is the fastest hash algorithm for only two files?

My client sends millions of files, and my program needs to tell itself something like:

"Hey there,

you have already done your job for some of these files, and they have not changed since.

Do not repeat the job for them; only redo it for the changed files."

My code looks like this:

# Each dict maps a file's key to the hash value of that file's content.
newList = dict(getListofFilesFromMyClient())
oldList = dict(getListofFilesFromHistory())

for keyValue, hashValue in newList.items():
    if keyValue not in oldList:
        # If this file is a new friend,
        # calculate the hash value of this file, record it,
        # and do the heavy job.
        pass
    elif hashValue == oldList[keyValue]:
        # If this file is an old friend and has not changed,
        # do not repeat the heavy job.
        pass
    else:
        # If this file is an old friend but has changed,
        # repeat the heavy job, re-calculate the hash value, and record it.
        pass

# Files that appear only in oldList are never visited: not my business!

Identical hash values from different files are not my concern, because the hash collision probability between two files is less than 0.1%, right?
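To put a rough number on that assumption, here is a minimal sketch of the usual birthday-bound approximation. It assumes the hash output is uniformly distributed, which is an idealization; the function name `collision_probability` is only illustrative. With `num_files=2` it gives the chance that one old/new pair of different contents shares a digest, which is the comparison the loop above performs.

```python
import math


def collision_probability(num_files: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_files distinct inputs
    share the same hash_bits-bit digest (assumes uniform hash output)."""
    pairs = num_files * (num_files - 1) / 2
    # 1 - exp(-pairs / 2**bits); expm1 keeps precision when the result is tiny.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


if __name__ == "__main__":
    # One pair of files, for a few common digest sizes.
    for bits in (32, 128, 256):
        print(bits, collision_probability(2, bits))
```

Under this approximation, the digest size matters far more than the choice of algorithm: a 32-bit checksum becomes risky once there are many files, while a 128-bit or larger digest stays far below 0.1% even for millions of files.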

My only concern is the throughput of calculating a hash value for a file of a few megabytes.
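For context, this is roughly how a per-file hash could be computed and timed with Python's standard `hashlib`. It is a sketch under assumptions: the helper names `file_digest` and `time_algorithms`, the 1 MiB chunk size, and the list of candidate algorithms are illustrative choices, not part of my actual program.

```python
import hashlib
import time


def file_digest(path: str, algorithm: str = "blake2b", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of the file at path, reading it in chunks
    so that a large file never has to fit in memory at once."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel calls f.read() until it returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def time_algorithms(data: bytes, algorithms=("sha1", "sha256", "blake2b")) -> dict:
    """Return the seconds taken to hash data once with each algorithm."""
    timings = {}
    for name in algorithms:
        start = time.perf_counter()
        hashlib.new(name, data).hexdigest()
        timings[name] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    sample = bytes(4 * 1024 * 1024)  # 4 MiB of zero bytes as stand-in content
    for name, seconds in time_algorithms(sample).items():
        print(f"{name}: {seconds:.4f} s")
```

The timings depend on the machine and the Python build, so they are best measured on the target hardware rather than assumed.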

Which algorithm is the most suitable in this situation?

Any advice would be appreciated.

question from:https://stackoverflow.com/questions/65517666/what-is-the-fastest-hash-algorithm-for-only-two-files


