c++ - Optimizing very often used anagram function

Question

Welcome To Ask or Share your Answers For Others

c++ - Optimizing very often used anagram function

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

c++ - Optimizing very often used anagram function

I have written a function that determines whether two words are anagrams. Word A is an anagram of word B if you can build word B out of A just by rearranging the letters, e.g.:

lead is anagram of deal

This is my function:

bool is_anagram(std::string const & s1, std::string const & s2)
{
    auto check = [](std::string const & x)
    {
        std::map<char, unsigned> counter;
        for(auto const & c : x)
        {
            auto it = counter.find(c);
            if(it == counter.end())
                counter[c] = 1;
            else
                ++counter[c];
        }
        return counter;
    };

    return check(s1) == check(s2);
}

This works fine, but as the number of words increases (and this function is used several million times within my application), it very soon became a major bottleneck of my application.

Does anyone have an idea of how to speed this function up?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T17:47:39+0000

The map creation and your call to std::map::find within the iteration, is quite expensive.

In this case, you can use the fact that std::string behaves in many regards like a std::vector<char>, meaning that you can sort it using std::sort:

bool is_anagram(std::string s1, std::string s2)
{
    std::sort(s1.begin(), s1.end());
    std::sort(s2.begin(), s2.end());
    return s1 == s2;
}

Instead of the two maps that you are creating, I am taking a copy of the strings (passing them by value instead of const reference) and sorting them, so

sort("lead") => "adel"
sort("deal") => "adel"

This change should already speed your algorithm up by quite a bit. One more thing that may greatly affect the performance if you tend to compare arbitrary words:

bool is_anagram(std::string s1, std::string s2)
{
    if(s1.length() != s2.length())
        return false;
    /* as above */
}

If the length of the two strings differs, it obviously cannot be an anagram. std::string::length() is a very fast operation (it needs to store the string size anyway), so we save us the hustle of O(N*log(N)) from sorting the two strings.

Categories

c++ - Optimizing very often used anagram function

c++ - Optimizing very often used anagram function

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags