I am using a std::vector<std::vector<T> > to store a 2D matrix/array, and I would like to determine the unique rows of this matrix. I am looking for any advice or pointers on how to go about this.
I have tried two methods.
Method 1 (slightly convoluted): I keep a 0/1 flag for each row indicating whether that row is a duplicate, work through the matrix, and store the index of each unique row in a deque. I want the result as a vector<vector<T> >, so from this deque of indices I reserve space in the return value and then copy the corresponding rows of the matrix into it.
Method 2 is easier to read and in many cases faster than Method 1. I keep a deque of the unique rows found so far, loop through the rows, and compare each row against every entry in this deque.
I am comparing both of these methods against MATLAB, and these C++ routines are orders of magnitude slower. Does anyone have any clever ideas on how I might speed this operation up? I want to run it on matrices that may have millions of rows.
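One commonly suggested alternative (not either of the methods in this question) is to sort, so that equal rows become adjacent, and then keep one row from each run of equal rows. That needs O(N log N) row comparisons instead of comparing each row against every unique row found so far. The sketch below assumes C++11 and that T supports operator< (std::vector compares lexicographically); `uniqueRowsSorted` is a hypothetical name. Unlike Methods 1 and 2, it returns the rows in sorted order, not in order of first appearance.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch (hypothetical name): unique rows by sorting row indices.
// Result is in lexicographic row order, not first-appearance order.
template <typename T>
std::vector<std::vector<T> > uniqueRowsSorted(const std::vector<std::vector<T> > &A) {
    // Sort indices instead of the rows themselves, so rows are not
    // copied around during the sort.
    std::vector<std::size_t> idx(A.size());
    for (std::size_t i = 0; i < idx.size(); ++i)
        idx[i] = i;
    // stable_sort keeps equal rows in ascending index order, so the first
    // index in each run of equal rows is that row's first occurrence.
    std::stable_sort(idx.begin(), idx.end(),
                     [&A](std::size_t a, std::size_t b) { return A[a] < A[b]; });
    // Equal rows are now adjacent; std::unique keeps the first of each run.
    idx.erase(std::unique(idx.begin(), idx.end(),
                          [&A](std::size_t a, std::size_t b) { return A[a] == A[b]; }),
              idx.end());
    std::vector<std::vector<T> > ret;
    ret.reserve(idx.size());
    for (std::size_t i : idx)
        ret.push_back(A[i]);
    return ret;
}
```

Because the surviving index in each run is the first occurrence of that row, sorting `idx` numerically just before the copy loop would give the same first-appearance order as Methods 1 and 2, at the cost of one more sort over the unique indices.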
During the loop I store the unique rows in a deque to avoid the cost of resizing a vector, and then copy the deque into the vector<vector<T> > for the results. I've benchmarked this step closely, and it is nowhere near the bottleneck: it accounts for less than 0.5% of the runtime on, for example, a matrix with 100,000 rows.
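The deque-to-vector copy can also be written as a single range assignment. Given the benchmark figure above, this is a readability change rather than a speed-up. The sketch assumes C++11 (for std::make_move_iterator) and that the deque of rows, as in Method 2, is not needed afterwards, so each row's storage is moved instead of copied; `dequeToVector` is a hypothetical helper name.

```cpp
#include <deque>
#include <iterator>
#include <vector>

// Sketch (hypothetical helper): move every row collected in the deque into
// the result vector with one range assign. Deque iterators are random
// access, so the vector can typically size itself once up front.
template <typename T>
void dequeToVector(std::deque<std::vector<T> > &ulist,
                   std::vector<std::vector<T> > &ret) {
    ret.assign(std::make_move_iterator(ulist.begin()),
               std::make_move_iterator(ulist.end()));
    ulist.clear();  // rows were moved out; leave the deque empty
}
```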
Thanks,
Bob
Here is the code. If anyone is interested in a more complete example showing the usage, drop me a comment and I can put something together.
Method 1:

    #include <cstddef>
    #include <deque>
    #include <vector>

    template <typename T>
    void uniqueRows(const std::vector<std::vector<T> > &A,
                    std::vector<std::vector<T> > &ret) {
        // Go through a vector<vector<T> > and find the unique rows.
        // ind[i] is 1 once row i has been classified (row 0, or a
        // duplicate of an earlier row), 0 otherwise.
        // cur : row currently being compared against every later row
        // num : number of rows classified so far; once every row in the
        //       matrix has been classified, terminate.
        const std::size_t N = A.size();
        if (N == 0)                      // guard: ind[0] below needs at least one row
            return;
        std::size_t num = 1, cur = 0, it = 1;
        std::vector<unsigned char> ind(N, 0);
        std::deque<std::size_t> ulist;   // indices of the unique rows
        ind[cur] = 1;
        ulist.push_back(0);              // row 0 is always unique
        while (num < N) {
            if (it >= N) {
                ++cur;                   // find the next non-duplicate row
                while (ind[cur])
                    ++cur;
                ulist.push_back(cur);
                ++num;
                it = cur + 1;            // search for its duplicates from the next row
                if (it >= N && num == N)
                    break;
            }
            if (!ind[it] && A[cur] == A[it]) {
                ind[it] = 1;             // mark as duplicate
                ++num;
            }
            ++it;
        } // ~while num
        // copy the unique rows into the result
        ret.reserve(ulist.size());
        for (std::deque<std::size_t>::const_iterator iter = ulist.begin();
             iter != ulist.end(); ++iter) {
            ret.push_back(A[*iter]);
        }
    }

Here is the code for method 2:

    #include <cstddef>
    #include <deque>
    #include <vector>

    // Linear search: is b equal to any row already stored in A?
    template <typename T>
    inline bool isInList(const std::deque<std::vector<T> > &A,
                         const std::vector<T> &b) {
        typename std::deque<std::vector<T> >::const_iterator it;
        const typename std::deque<std::vector<T> >::const_iterator end = A.end();
        for (it = A.begin(); it != end; ++it) {
            if (*it == b)
                return true;
        }
        return false;
    }

    template <typename T>
    void uniqueRows1(const std::vector<std::vector<T> > &A,
                     std::vector<std::vector<T> > &ret) {
        if (A.empty())                   // guard: *A.begin() below needs at least one row
            return;
        std::deque<std::vector<T> > ulist;   // unique rows found so far
        typename std::vector<std::vector<T> >::const_iterator it = A.begin();
        const typename std::vector<std::vector<T> >::const_iterator end = A.end();
        ulist.push_back(*it);            // the first row is always unique
        for (++it; it != end; ++it) {
            if (!isInList(ulist, *it))   // compare against every unique row so far
                ulist.push_back(*it);
        }
        ret.reserve(ulist.size());
        for (std::size_t i = 0; i != ulist.size(); ++i)
            ret.push_back(ulist[i]);
    }

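Both methods above compare each row against every unique row found so far, which is on the order of N × U row comparisons for N rows and U unique rows; with millions of rows and many unique rows, that term likely dominates the runtime. A hash set brings this down to expected linear time while still keeping the first occurrence of each row in input order. The sketch below is one way to do that, not a drop-in from any library: it assumes C++11, that std::hash<T> is available (it is for the built-in arithmetic types), and it stores row indices rather than copies of rows in the set. `uniqueRowsHashed` and the hash-mixing formula are illustrative choices.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Sketch (hypothetical name): order-preserving unique rows via a hash set
// of row indices. Expected O(N * M) work for N rows of length M.
template <typename T>
std::vector<std::vector<T> > uniqueRowsHashed(const std::vector<std::vector<T> > &A) {
    // Hash a row by folding its element hashes together; the mixing
    // formula is a common ad hoc choice, assumed adequate here.
    auto rowHash = [&A](std::size_t i) -> std::size_t {
        std::size_t h = A[i].size();
        for (const T &x : A[i])
            h ^= std::hash<T>()(x) + 0x9e3779b9 + (h << 6) + (h >> 2);
        return h;
    };
    // Two indices are "equal" when the rows they refer to are equal,
    // so hash collisions cost time but never correctness.
    auto rowEq = [&A](std::size_t i, std::size_t j) { return A[i] == A[j]; };

    std::unordered_set<std::size_t, decltype(rowHash), decltype(rowEq)>
        seen(A.size(), rowHash, rowEq);
    std::vector<std::vector<T> > ret;
    for (std::size_t i = 0; i < A.size(); ++i) {
        // insert() fails if an equal row was already seen, so only the
        // first occurrence of each row is kept, in input order.
        if (seen.insert(i).second)
            ret.push_back(A[i]);
    }
    return ret;
}
```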