Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c++ - Boost regex not working as expected in my code

I just started using Boost::regex today and am quite a novice in regular expressions too. I have been using "The Regulator" and Expresso to test my regex and am satisfied with what I see there, but the same regex in Boost does not do what I want it to do. Any pointers to help me find a solution would be most welcome. As a side question, are there any tools that would help me test my regex against boost.regex?

using namespace boost;
using namespace std;

vector<string> tokenizer::to_vector_int(const string s)
{
    regex re("\\d*");
    vector<string> vs;
    cmatch matches;
    if( regex_match(s.c_str(), matches, re) ) {
        MessageBox(NULL, L"Hmmm", L"", MB_OK); // it never gets here
        for( unsigned int i = 1 ; i < matches.size() ; ++i ) {
            string match(matches[i].first, matches[i].second);
            vs.push_back(match);
        }
    }
    return vs;
}

void _uttokenizer::test_to_vector_int() 
{
    vector<string> __vi = tokenizer::to_vector_int("0<br/>1");
    for( int i = 0 ; i < __vi.size() ; ++i ) INFO(__vi[i]);
    CPPUNIT_ASSERT_EQUAL(2, (int)__vi.size());//always fails
}

Update (thanks to Dav for helping me clarify my question): I was hoping to get a vector with 2 strings in it, "0" and "1". Instead, regex_match() always returns false, so the vector is always empty.

Thanks '1800 INFORMATION' for your suggestions. The to_vector_int() method now looks like this (I took the code you gave and modified it to make it compile), but it goes into a never-ending loop and finds "0", "", "", "" and so on. It never finds the "1".

vector<string> tokenizer::to_vector_int(const string s)
{
    regex re("(\\d*)");
    vector<string> vs;

    cmatch matches;

    char * loc = const_cast<char *>(s.c_str());
    while( regex_search(loc, matches, re) ) {
        vs.push_back(string(matches[0].first, matches[0].second));
        loc = const_cast<char *>(matches.suffix().str().c_str());
    }

    return vs;
}

In all honesty, I still don't think I have understood the basics of searching for a pattern and getting the matches. Are there any tutorials with examples that explain this?


1 Answer

The basic problem is that you are using regex_match when you should be using regex_search:

The algorithms regex_search and regex_match make use of match_results to report what matched; the difference between these algorithms is that regex_match will only find matches that consume all of the input text, where as regex_search will search for a match anywhere within the text being matched.

(Quoted from the Boost documentation.) Change the call to regex_search and the match will succeed.
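
The difference can be seen in a minimal sketch. It assumes std::regex from C++11 as a stand-in, since its regex_match and regex_search were modeled on Boost's and behave the same way for this input; with Boost, the header would be <boost/regex.hpp> and the namespace boost. The helper names are hypothetical and not part of the original code.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if the WHOLE input is a run of digits,
// because regex_match must consume all of the input text.
bool whole_input_is_digits(const std::string& s)
{
    static const std::regex re("\\d*");
    return std::regex_match(s, re);
}

// Hypothetical helper: true if one or more digits occur ANYWHERE in the
// input, because regex_search looks for a match at any position.
bool contains_digits(const std::string& s)
{
    static const std::regex re("\\d+");
    return std::regex_search(s, re);
}
```

For the input "0<br/>1" from the question, whole_input_is_digits returns false (the "<br/>" part is not digits), while contains_digits returns true.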

Also, it looks like you are not capturing the matches. Try changing the regex to this:

regex re("(\\d*)");

Or, maybe you need to be calling regex_search repeatedly:

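const char *where = s.c_str();
// Needs \d+ in the regex: an empty match would never advance `where`.
while (regex_search(where, matches, re))
{
  vs.push_back(matches[1].str()); // the single capture group
  where = matches.suffix().first; // resume just after this match
}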

The loop is needed because the regex has only one capture group, so each call to regex_search reports only one number.
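
As an alternative to managing the pointer by hand, the repeated search can be sketched with a regex iterator. This is an illustration rather than the original poster's code: it assumes std::regex from C++11 (Boost offers an equivalent boost::sregex_iterator), and digit_runs is a hypothetical stand-in for tokenizer::to_vector_int.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for tokenizer::to_vector_int:
// collects every run of digits found in the input.
std::vector<std::string> digit_runs(const std::string& s)
{
    static const std::regex re("\\d+"); // + rather than *, so a match is never empty
    std::vector<std::string> out;
    // The iterator performs regex_search repeatedly, resuming after each match.
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

Called with "0<br/>1", this returns a vector holding "0" and "1", which is what the unit test in the question expects.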

Alternatively, change your regex, if you know the basic structure of the data:

regex re("(\\d+).*?(\\d+)");

This would match two numbers within the search string.
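
A minimal sketch of reading both captures from a single search follows. It assumes std::regex from C++11 in place of Boost (the sub-match indexing is the same in both), and first_two_numbers is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: extracts two numbers with one search.
// Returns false if the pattern does not match at all.
bool first_two_numbers(const std::string& s,
                       std::pair<std::string, std::string>& out)
{
    static const std::regex re("(\\d+).*?(\\d+)");
    std::smatch m;
    if (!std::regex_search(s, m, re)) {
        return false;
    }
    out.first = m[1].str();  // m[0] is the whole match; groups start at 1
    out.second = m[2].str();
    return true;
}
```

This only fits input known to contain exactly two numbers; for an arbitrary count, repeated searching is the more general approach.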

Note that the regular expression \d* matches zero or more digits, which includes the empty string "", since that is exactly zero digits. I would change the expression to \d+, which matches one or more.
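
The empty match is easy to observe directly, and it explains the endless run of "" results reported in the question: once the search starts at a non-digit, \d* succeeds immediately with a zero-length match and the position never advances. The sketch below assumes std::regex from C++11 as a stand-in for Boost, and first_match_length is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: length of the first match of `pattern` in `s`,
// or -1 if the pattern does not match anywhere.
long first_match_length(const std::string& s, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_search(s, m, re)) {
        return -1;
    }
    return static_cast<long>(m.length(0));
}
```

Searching "<br/>1" with "\\d*" gives a length of 0 (an empty match at the very start), while "\\d+" skips ahead to the "1" and gives a length of 1.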

