Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
610 views
in Technique[技术] by (71.8m points)

regex - How to remove everything but certain words in string variable (Stata)?

I have a string variable response, which contains text as well as categories that have already been coded (categories like "CatPlease", "CatThanks", "ExcuseMe", "Apology", "Mit", etc.). I would like to erase everything in response except for these previously coded categories.

For example, I would like response to change from:

"I Mit understand CatPlease read it again CatThanks"

to:

"Mit CatPlease CatThanks"

This seems like a simple problem, but I can't get my regex code to work perfectly. The code below attempts to store the categories in a variable cat_only. It only works if the category appears at the beginning of response. The local macro, cats, contains all of the words I would like to preserve in response:

local cats = "(CatPlease|CatThanks|ExcuseMe|Apology|Mit|IThink|DK|Confused|Offers|CatYG)?"

gen cat_only = strltrim(strtrim(ustrregexs(1)+" "+ustrregexs(2)+" "+ustrregexs(3))) if ustrregexm(response, "`cats'.+?`cats'.+?`cats'")

If I add characters to the beginning of the search pattern in ustrregexm, however, nothing will be stored in cat_only:

gen cat_only = strltrim(strtrim(ustrregexs(1)+" "+ustrregexs(2)+" "+ustrregexs(3))) if ustrregexm(response, ".+?`cats'.+?`cats'.+?`cats'")

Is there a way to fix my code to make it work, or should I approach the problem differently?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
* Example generated by -dataex-. To install: ssc install dataex
clear
input str50 response
"I Mit understand CatPlease read it again CatThanks"
end

local regex "(?!CatPlease|CatThanks|ExcuseMe|Apology|Mit|IThink|DK|Confused|Offers|CatYG)[^s]+"
gen wanted = strtrim(stritrim(ustrregexra(response, "`regex'", "")))
list
. list

     +-------------------------------------------------------------------------------+
     |                                           response                     wanted |
     |-------------------------------------------------------------------------------|
  1. | I Mit understand CatPlease read it again CatThanks    Mit CatPlease CatThanks |
     +-------------------------------------------------------------------------------+

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...