For a small project, I need to compare one image with another to determine whether they are approximately the same. The images are smallish, varying from 25 to 100px across. They are meant to show the same picture data but are subtly different, so a simple pixel equality check won't work. Consider these two possible scenarios:
- A security (CCTV) camera in a museum looking at an exhibit: we want to quickly see if two different video frames show the same scene, but slight differences in lighting and camera focus mean they won't be identical.
- A picture of a vector computer GUI icon rendered at 64x64 compared to the same icon rendered at 48x48 (but both images would be scaled down to 32x32 so the histograms have the same total pixel count).
I've decided to represent each image using three 1D histograms, one for each RGB channel. It's safe for me to use colour alone and to ignore texture and edge histograms. (An alternative approach uses a single 3D histogram for each image, but I'm avoiding that as it adds extra complexity.) I will therefore need to compare the histograms to see how similar they are, and if the similarity measure passes some threshold value then I can say with confidence that the respective images are visually the same. I would compare each image's corresponding channel histograms: image 1's red histogram with image 2's red histogram, then the two green histograms, then the two blue histograms. Comparing image 1's red histogram with image 2's blue histogram would just be silly.
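To make the representation concrete, here is a minimal sketch of building the three channel histograms and comparing like with like. It assumes the pixels are available as interleaved 8-bit RGB bytes (R, G, B, R, G, B, ...), which may not match how the images are actually loaded. The names `BuildChannelHistograms` and `AreSimilar`, the pluggable `distance` function and the rule that every channel must stay within the threshold are all illustrative choices, not settled design.

```csharp
using System;

// Builds three histograms (index 0 = red, 1 = green, 2 = blue) from
// interleaved 8-bit RGB bytes. The byte layout is an assumption.
int[][] BuildChannelHistograms(byte[] rgb, int binCount)
{
    if (rgb.Length % 3 != 0)
        throw new ArgumentException("Expected 3 bytes per pixel.");
    if (binCount < 1 || binCount > 256)
        throw new ArgumentOutOfRangeException(nameof(binCount));

    int[][] result = { new int[binCount], new int[binCount], new int[binCount] };
    for (int i = 0; i < rgb.Length; i++)
    {
        int channel = i % 3;
        int bin = rgb[i] * binCount / 256; // maps 0..255 onto 0..binCount-1
        result[channel][bin]++;
    }
    return result;
}

// Compares red with red, green with green and blue with blue, and requires
// every channel to stay within the threshold (a hypothetical rule).
bool AreSimilar(int[][] a, int[][] b, Func<int[], int[], int> distance, int threshold)
{
    for (int channel = 0; channel < 3; channel++)
    {
        if (distance(a[channel], b[channel]) > threshold)
            return false;
    }
    return true;
}

// Two pixels: (0, 128, 255) and (51, 128, 255), summarised with 5 bins.
byte[] pixels = { 0, 128, 255, 51, 128, 255 };
int[][] channels = BuildChannelHistograms(pixels, 5);
Console.WriteLine(string.Join(", ", channels[0])); // red: 2, 0, 0, 0, 0
```

The open part is which `distance` function to pass in and what threshold to use, which is what the rest of this question is about.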
Let's say I have these three histograms, which represent a summary of the red RGB channel for three images (using 5 bins for 7-pixel images for simplicity):
    H1            H2            H3
      X           X                     X
      X   X       X       X             X
    X X   X X     X X   X X     X X X X X
    0 1 2 3 4     0 1 2 3 4     0 1 2 3 4

    H1 = [ 1, 3, 0, 2, 1 ]
    H2 = [ 3, 1, 0, 1, 2 ]
    H3 = [ 1, 1, 1, 1, 3 ]
Image 1 (H1) is my reference image, and I want to see if Image 2 (H2) and/or Image 3 (H3) is similar to Image 1. Note that in this example, Image 2 is similar to Image 1, but Image 3 is not.
When I did a cursory search for "histogram difference" algorithms (at least those I could understand), I found that a popular approach is to just sum the differences between each bin. However, this approach often fails because it weights all bin differences the same.
To demonstrate the problem with this approach, here it is in C#:
    using System;

    Int32[] image1RedHistogram = new Int32[] { 1, 3, 0, 2, 1 };
    Int32[] image2RedHistogram = new Int32[] { 3, 1, 0, 1, 2 };
    Int32[] image3RedHistogram = new Int32[] { 1, 1, 1, 1, 3 };

    // Sums the absolute per-bin differences between two histograms.
    Int32 GetDifference(Int32[] x, Int32[] y) {
        Int32 sumOfDifferences = 0;
        for( int i = 0; i < x.Length; i++ ) {
            sumOfDifferences += Math.Abs( x[i] - y[i] );
        }
        return sumOfDifferences;
    }
The results are:

    GetDifference( image1RedHistogram, image2RedHistogram ) == 6
    GetDifference( image1RedHistogram, image3RedHistogram ) == 6

This is not what I want: both comparisons score 6, even though Image 2 is similar to Image 1 and Image 3 is not.
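A smaller, made-up example isolates the blind spot. The histogram values below are my own illustrative numbers, not taken from real images: every pixel sits in a single bin, and that bin is shifted by either one position or four. The bin-wise sum scores both shifts identically, because it never looks at how far the counts moved.

```csharp
using System;

// Same bin-wise sum of absolute differences as above.
int GetDifference(int[] x, int[] y)
{
    int sumOfDifferences = 0;
    for (int i = 0; i < x.Length; i++)
    {
        sumOfDifferences += Math.Abs(x[i] - y[i]);
    }
    return sumOfDifferences;
}

int[] reference = { 7, 0, 0, 0, 0 };
int[] nearShift = { 0, 7, 0, 0, 0 }; // every pixel moved one bin along
int[] farShift  = { 0, 0, 0, 0, 7 }; // every pixel moved four bins along

Console.WriteLine(GetDifference(reference, nearShift)); // 14
Console.WriteLine(GetDifference(reference, farShift));  // 14
```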
Is there a way to determine the difference between two histograms that takes into account the shape of the distribution?
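One candidate that seems to fit this description is the earth mover's distance (EMD), which measures how much "mass" has to be moved, and how far, to turn one histogram into the other. For 1D histograms with the same bins and the same total count (which holds here, since the images are scaled to the same size), it reduces to summing the absolute differences of the running totals. The sketch below rests on those assumptions; the name `GetEarthMoversDistance` is my own, and I have not verified that this measure suits real image data.

```csharp
using System;

// 1D earth mover's distance, assuming equal bin layout and equal total count:
// the sum of absolute differences between the running (cumulative) totals.
int GetEarthMoversDistance(int[] x, int[] y)
{
    if (x.Length != y.Length)
        throw new ArgumentException("Histograms must have the same number of bins.");

    int carried = 0;       // surplus (or deficit) still to be moved to later bins
    int totalDistance = 0;
    for (int i = 0; i < x.Length; i++)
    {
        carried += x[i] - y[i];
        totalDistance += Math.Abs(carried);
    }
    return totalDistance;
}

int[] h1 = { 1, 3, 0, 2, 1 };
int[] h2 = { 3, 1, 0, 1, 2 };
int[] h3 = { 1, 1, 1, 1, 3 };

Console.WriteLine(GetEarthMoversDistance(h1, h2)); // 3
Console.WriteLine(GetEarthMoversDistance(h1, h3)); // 5
```

With the example histograms, H2 now scores closer to H1 than H3 does, but where to put the threshold is still a judgment call.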