Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
771 views
in Technique[技术] by (71.8m points)

c++11 - performing vector intersection in C++

I have a vector of vector of unsigned. I need to find the intersection of all these vector of unsigned's for doing so I wrote the following code:

int func()
{
   vector<vector<unsigned> > t;
   vector<unsigned> intersectedValues;
   bool firstIntersection=true;
   for(int i=0;i<(t).size();i++)
   {
       if(firstIntersection)
       {
           intersectedValues=t[0];
           firstIntersection=false;
       }else{
           vector<unsigned> tempIntersectedSubjects;                                                              
           set_intersection(t[i].begin(),
                  t[i].end(), intersectedValues.begin(),
                  intersectedValues.end(),
                  std::inserter(tempIntersectedSubjects, tempIntersectedSubjects.begin()));
           intersectedValues=tempIntersectedSubjects;
       }         
       if(intersectedValues.size()==0)
           break;
   }               
}

Each individual vector has 9000 elements and there are many such vectors in "t". When I profiled my code I found that set_intersection takes the maximum amount of time and hence makes the code slow when there are many invocations of func(). Can someone please suggest as to how can I make the code more efficient.

I am using: gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)

EDIT: Individual vectors in vector "t" are sorted.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

I don't have a framework to profile the operations but I'd certainly change the code to reuse the readily allocated vector. In addition, I'd hoist the initial intersection out of the loop. Also, std::back_inserter() should make sure that elements are added in the correct location rather than in the beginning:

int func()
{
    vector<vector<unsigned> > t = some_initialization();
    if (t.empty()) {
        return;
    }
    vector<unsigned> intersectedValues(t[0]);
    vector<unsigned> tempIntersectedSubjects;
    for (std::vector<std::vector<unsigned>>::size_type i(1u);
         i < t.size() && !intersectedValues.empty(); ++i) {
        std::set_intersection(t[i].begin(), t[i].end(),
                              intersectedValues.begin(), intersectedValues.end(),
                             std::back_inserter(tempIntersectedSubjects);
        std::swap(intersectedValues, tempIntersectedSubjects);
        tempIntersectedSubjects.clear();
    }
}               

I think this code has a fair chance to be faster. It may also be reasonable to intersect the sets different: instead of keeping one set and intersecting with that you could create a new intersection for pairs of adjacent sets and then intersect the first sets with their respect adjacent ones:

std::vector<std::vector<unsigned>> intersections(
    std::vector<std::vector<unsigned>> const& t) {
    std::vector<std::vector<unsigned>> r;
    std::vector<std::vector<unsignned>>::size_type i(0);
    for (; i + 1 < t.size(); i += 2) {
        r.push_back(intersect(t[i], t[i + 1]));
    }
    if (i < t.size()) {
        r.push_back(t[i]);
    }
    return r;
}

std::vector<unsigned> func(std::vector<std::vector<unsigned>> const& t) {
    if (t.empty()) { /* deal with t being empty... */ }
    std::vector<std::vector<unsigned>> r(intersections(t))
    return r.size() == 1? r[0]: func(r);
}

Of course, you wouldn't really implement it like this: you'd use Stepanov's binary counter to keep the intermediate sets. This approach assumes that the result is most likely non-empty. If the expectation is that the result will be empty that may not be an improvement.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...