You are getting different results because your colour segmentation algorithm uses k-means clustering. I'm going to assume you aren't familiar with it, as someone who knows how it works would recognise the cause immediately. In fact, getting different results each time you run this code is a natural consequence of how k-means clustering works, and I'll explain why.
Given some data, you want to group it into k groups. You initially choose k random points in your data, and these are given the labels 1,2,...,k. These are what we call the centroids. Then, you determine how close the rest of the data are to each of these points, and you assign each data point to the group (1,2,...,k) of whichever centroid it is closest to. After that, you update the centroids, where a centroid is defined as the representative point of its group: for each of the k groups, you compute the average of all of the points in that group. These averages become the new centroids for the next iteration. In the next iteration, you again determine how close each point in your data is to each of the centroids and reassign the groups. You keep repeating this until the centroids don't move anymore, or they move very little.
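To make those steps concrete, here is a minimal sketch of the loop written with base MATLAB only. It is not what kmeans does internally (the built-in function is more sophisticated); the toy data X, the tolerance of 1e-6 and the cap of 100 iterations are my own assumptions for illustration.

```matlab
rng(123);                                   %// Fix the seed so the toy run is repeatable
X = [randn(50,2); randn(50,2) + 5];         %// Toy data (assumed): two blobs in 2-D
k = 2;                                      %// Number of groups

%// Step 1: choose k random data points as the initial centroids
idx = randperm(size(X,1));
centroids = X(idx(1:k), :);

for iter = 1:100                            %// Iteration cap (assumed)
    %// Step 2: squared distance from every point to every centroid
    D = zeros(size(X,1), k);
    for j = 1:k
        D(:,j) = sum(bsxfun(@minus, X, centroids(j,:)).^2, 2);
    end

    %// Step 3: assign each point to the group of its closest centroid
    [~, labels] = min(D, [], 2);

    %// Step 4: the new centroid of each group is the average of its points
    newCentroids = centroids;
    for j = 1:k
        if any(labels == j)                 %// Keep the old centroid if a group is empty
            newCentroids(j,:) = mean(X(labels == j, :), 1);
        end
    end

    %// Step 5: stop when the centroids move very little
    shift = max(abs(newCentroids(:) - centroids(:)));
    centroids = newCentroids;
    if shift < 1e-6
        break;
    end
end
```

Notice that the only random part is Step 1. Everything after it follows mechanically from the initial centroids, so a different starting choice can settle on a different grouping.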
How this applies to the above code is that you are taking the image and you want to represent it using only k possible colours. Each of these possible colours is thus a centroid. For each colour pixel in your image, you decide which of the k possible colours best represents it, which is the cluster the pixel belongs to. Once you have found that cluster, you replace the pixel's colour with the cluster's centroid. This is a colour segmentation because you are dividing the image's pixels among only k possible colours. This, in a more general sense, is what is called unsupervised segmentation.
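As a rough illustration of the replacement step, the sketch below looks up each pixel's centroid and puts the result back into image shape. The names cluster_idx and cluster_center mirror the outputs of your kmeans call, but the values, and the nrows and ncols variables, are made up for a tiny 2 x 3 example.

```matlab
%// Made-up stand-ins for the outputs of kmeans on a 2 x 3 image
nrows = 2;
ncols = 3;
cluster_idx    = [1; 2; 2; 3; 1; 3];        %// One label per pixel, in column-major order
cluster_center = [10 20; 30 40; 50 60];     %// k x p: one row per cluster colour

%// Replace every pixel by the centroid of the cluster it belongs to
quantized = cluster_center(cluster_idx, :); %// (nrows*ncols) x p

%// Restore the image layout: one nrows x ncols plane per colour channel
quantized = reshape(quantized, nrows, ncols, []);
```

After this, quantized contains at most k distinct colours, one per cluster.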
Now, back to k-means. How you choose the initial centroids is the reason why you are getting different results. You are calling k-means in the default way, which lets the algorithm pick its initial points automatically and at random. Because of this, you are not guaranteed to get the same initial points each time you call the algorithm. If you want to reproduce the same segmentation no matter how many times you call kmeans, you will need to specify the initial points yourself. As such, you would need to modify the k-means call so that it looks like this:
[cluster_idx, cluster_center] = kmeans(ab,nColors,'distance','sqEuclidean', ...
'Replicates', 3, 'start', seeds);
Note that the call is the same, but we have added two additional parameters to the k-means call. The flag start means that you are specifying the initial points, and seeds is a k x p array where k is how many groups you want. In this case, this is the same as nColors, which is 3. p is the dimension of your data. Because of the way you are transforming and reshaping your data, this is going to be 2. As such, for a single run you would be specifying a 3 x 2 matrix. However, you also have the Replicates flag there. This means that the k-means algorithm will run the number of times you specify, each time from a different set of initial points, and it will output the segmentation that has the least amount of error (the lowest total sum of distances from each point to its cluster's centroid). Because kmeans is repeated as many times as this flag specifies, seeds can no longer be k x p; it has to be k x p x n, where n is the number of times you want to run the segmentation. This is now a 3D matrix, where each 2D slice holds the initial points for one run of the algorithm. Keep this in mind for later.
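If you would rather pick the starting points by hand, one way to build such a 3D matrix is to stack one k x p matrix per run with cat. The numbers below are arbitrary placeholders, not values taken from your image, and run1, run2 and run3 are names I made up.

```matlab
%// Arbitrary placeholder centres: one 3 x 2 (k x p) matrix per run
run1 = [110 120; 130 140; 150 160];
run2 = [115 125; 135 145; 155 165];
run3 = [105 118; 128 150; 160 170];

%// Stack the three matrices along the third dimension
seeds = cat(3, run1, run2, run3);

sz = size(seeds);   %// 3 x 2 x 3, i.e. k x p x n
```

Here seeds(:,:,2) is exactly run2, so each slice is the set of initial points for one run.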
How you choose these points is up to you. However, if you would rather choose them randomly than pick them by hand, yet still reproduce the same results every time you run the code, you should seed the random number generator with a known number, like 123. That way, when you generate random points, it will always generate the same sequence of points, and the results are thus reproducible. Therefore, I would add this to your code before calling kmeans:
rng(123); %// Set seed for reproducibility
numReplicates = 3;
ind = randperm(size(ab,1), numReplicates*nColors); %// Randomly choose nColors points from the data for each of the numReplicates runs
%// Make a 3D matrix where each slice holds the initial centres for one run
seeds = permute(reshape(ab(ind,:).', [2 nColors numReplicates]), [2 1 3]);
%// Now call kmeans
[cluster_idx, cluster_center] = kmeans(ab,nColors,'distance','sqEuclidean', ...
'Replicates', numReplicates, 'start', seeds);
Bear in mind that you specified the Replicates flag, so the algorithm is repeated a certain number of times, which here is 3. Therefore, we need to specify initial points for each run of the algorithm. Because we are going to have 3 clusters of points, and we are going to run this algorithm 3 times, we need 9 initial points (or nColors * numReplicates) in total. Each set of initial points has to be a slice in a 3D array, which is why you see that complicated statement just before the kmeans call.
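I made the number of replicates a variable so that you can change it to your heart's content and it'll still work. The complicated statement with permute and reshape allows us to create this 3D matrix of points very easily. It transposes the chosen points so that each one is a column, splits those columns into numReplicates groups of nColors, and then swaps the first two dimensions so that each slice is nColors x 2.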
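If that statement still looks opaque, it may help to trace it on a tiny example. Here pts is a made-up stand-in for ab(ind,:), with 6 points so that there are 2 runs of 3 centres each.

```matlab
%// Made-up stand-in for ab(ind,:): 6 chosen points, one per row, p = 2 columns
pts = [1 2; 3 4; 5 6; 7 8; 9 10; 11 12];
nColors = 3;
numReplicates = 2;

%// Same statement as in the answer, applied to pts
seeds = permute(reshape(pts.', [2 nColors numReplicates]), [2 1 3]);

sz     = size(seeds);     %// 3 x 2 x 2, i.e. nColors x p x numReplicates
slice1 = seeds(:,:,1);    %// [1 2; 3 4; 5 6], rows 1 to 3 of pts
slice2 = seeds(:,:,2);    %// [7 8; 9 10; 11 12], rows 4 to 6 of pts
```

So the first nColors chosen points seed the first run, the next nColors seed the second run, and so on.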
Bear in mind that randperm only accepts a second parameter in more recent versions of MATLAB. If the above call to randperm doesn't work, do this instead:
rng(123); %// Set seed for reproducibility
numReplicates = 3;
ind = randperm(size(ab,1)); %// Randomly permute the indices of all data points
ind = ind(1:numReplicates*nColors); %// Keep nColors points for each of the numReplicates runs
%// Make a 3D matrix where each slice holds the initial centres for one run
seeds = permute(reshape(ab(ind,:).', [2 nColors numReplicates]), [2 1 3]);
%// Now call kmeans
[cluster_idx, cluster_center] = kmeans(ab,nColors,'distance','sqEuclidean', ...
'Replicates', numReplicates, 'start', seeds);
Now with the above code, you should be able to generate the same colour segmentation results every time.
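If you want to convince yourself, a quick check along these lines should do. It needs the Statistics Toolbox for kmeans, and the ab below is synthetic stand-in data that I made up, not your image. I am also assuming that kmeans makes no further random choices once the starting points are fixed.

```matlab
rng(123); %// Set seed for reproducibility
ab = [randn(100,2); randn(100,2) + 5; randn(100,2) - 5]; %// Synthetic stand-in for your ab data (assumed)
nColors = 3;
numReplicates = 3;

%// Build the starting points exactly as before
ind = randperm(size(ab,1));
ind = ind(1:numReplicates*nColors);
seeds = permute(reshape(ab(ind,:).', [2 nColors numReplicates]), [2 1 3]);

%// Same data, same starting points: call kmeans twice
idx1 = kmeans(ab, nColors, 'distance', 'sqEuclidean', ...
    'Replicates', numReplicates, 'start', seeds);
idx2 = kmeans(ab, nColors, 'distance', 'sqEuclidean', ...
    'Replicates', numReplicates, 'start', seeds);

%// Compare the cluster labels from the two calls
sameLabels = isequal(idx1, idx2);
```

If sameLabels comes back false, something other than the starting points is introducing randomness, and that would be the next thing to look at.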
Good luck!