Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
308 views
in Technique[技术] by (71.8m points)

c++ - How to tokenize (words) classifying punctuation as space

Based on this question which was closed rather quickly:
Trying to create a program to read a users input then break the array into seperate words are my pointers all valid?

Rather than closing I think some extra work could have gone into helping the OP to clarify the question.

The Question:

I want to tokenize user input and store the tokens into an array of words.
I want to use punctuation (.,-) as delimiter and thus removed it from the token stream.

In C I would use strtok() to break an array into tokens and then manually build an array.
Like this:

The main Function:

char **findwords(char *str);

int main()
{
    int     test;
    char    words[100]; //an array of chars to hold the string given by the user
    char    **word;  //pointer to a list of words
    int     index = 0; //index of the current word we are printing
    char    c;

    cout << "die monster !";
    //a loop to place the charecters that the user put in into the array  

    do
    {
        c = getchar();
        words[index] = c;
    }
    while (words[index] != '
');

    word = findwords(words);

    while (word[index] != 0) //loop through the list of words until the end of the list
    {
        printf("%s
", word[index]); // while the words are going through the list print them out
        index ++; //move on to the next word
    }

    //free it from the list since it was dynamically allocated
    free(word);
    cin >> test;

    return 0;
}

The line tokenizer:

char **findwords(char *str)
{
    int     size = 20; //original size of the list 
    char    *newword; //pointer to the new word from strok
    int     index = 0; //our current location in words
    char    **words = (char **)malloc(sizeof(char *) * (size +1)); //this is the actual list of words

    /* Get the initial word, and pass in the original string we want strtok() *
     *   to work on. Here, we are seperating words based on spaces, commas,   *
     *   periods, and dashes. IE, if they are found, a new word is created.   */

    newword = strtok(str, " ,.-");

    while (newword != 0) //create a loop that goes through the string until it gets to the end
    {
        if (index == size)
        {
            //if the string is larger than the array increase the maximum size of the array
            size += 10;
            //resize the array
            char **words = (char **)malloc(sizeof(char *) * (size +1));
        }
        //asign words to its proper value
        words[index] = newword;
        //get the next word in the string
        newword = strtok(0, " ,.-");
        //increment the index to get to the next word
        ++index;
    }
    words[index] = 0;

    return words;
}

Any comments on the above code would be appreciated.
But, additionally, what is the best technique for achieving this goal in C++?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Have a look at boost tokenizer for something that's much better in a C++ context than strtok().


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...