I have a very large database of JPEG images, about 2 million. I would like to do a fuzzy search for duplicates among those images. Two images count as duplicates when many of their pixels (around half) have identical values and the rest are off by about +/- 3 in their R/G/B values. The images look identical to the naked eye; it's the kind of difference you'd get from re-compressing a JPEG.
I already have a foolproof way to detect whether two images are identical: I sum the per-pixel brightness deltas over the whole image and compare the total to a threshold. This method has proven 100% accurate, but comparing 1 photo against 2 million is incredibly slow (hours per photo).
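For reference, the exact check is roughly like this (a minimal sketch; Pillow/NumPy, the grayscale conversion, and the threshold value are placeholders, not my real code):

```python
import numpy as np
from PIL import Image

# Rough sketch of the exact check: sum of per-pixel brightness
# differences compared against a threshold.
def images_identical(path_a, path_b, threshold=50_000):
    a = np.asarray(Image.open(path_a).convert("L"), dtype=np.int32)
    b = np.asarray(Image.open(path_b).convert("L"), dtype=np.int32)
    if a.shape != b.shape:
        return False  # different dimensions: not duplicates
    return int(np.abs(a - b).sum()) <= threshold
```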
I would like to fingerprint the images so that I could just compare the fingerprints in a hash table. Even if I could only reliably whittle the candidates down to about 100 images, comparing 1 against 100 would put me in great shape. What would be a good algorithm for this?
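To make the goal concrete, the naive kind of fingerprint I have in mind looks something like the sketch below (the 16x16 thumbnail, 16-level quantization, and `all_image_paths` are arbitrary placeholders), though I suspect plain quantization splits near-duplicates that land on opposite sides of a bucket boundary:

```python
from collections import defaultdict

import numpy as np
from PIL import Image

# Hypothetical fingerprint: shrink to a tiny grayscale thumbnail and quantize
# each pixel coarsely so that +/-3 noise usually maps to the same value.
def fingerprint(path, size=(16, 16), levels=16):
    img = Image.open(path).convert("L").resize(size, Image.BILINEAR)
    quantized = (np.asarray(img) // (256 // levels)).astype(np.uint8)
    return quantized.tobytes()  # hashable key for a dict

# Bucket all images by fingerprint, then run the exact check only
# within each bucket instead of against all 2 million images.
buckets = defaultdict(list)
for path in all_image_paths:  # all_image_paths assumed to exist
    buckets[fingerprint(path)].append(path)
```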