Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


regex - Extract last 4-digit number from a series in R using stringr

I would like to flatten lists extracted from HTML tables. A minimal working example is presented below. The example depends on the stringr package in R. The first example exhibits the desired behavior.

library(stringr)
years <- c("2005-", "2003-")
unlist(str_extract_all(years,"[[:digit:]]{4}"))

[1] "2005" "2003"

The example below produces an undesirable result when I try to match the last 4-digit number in a series of other numbers.

years1 <- c("2005-", "2003-", "1984-1992, 1996-")
unlist(str_extract_all(years1,"[[:digit:]]{4}$"))

character(0)

As I understand the documentation, I should put $ at the end of the pattern to anchor the match to the end of the string. From the second example, I would like to extract the numbers "2005", "2003", and "1996".
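For reference, the $ version returns nothing because $ only matches at the very end of the string, and every element of years1 ends in "-", so four digits are never immediately followed by the end. A minimal sketch that stays with stringr, assuming a lookahead is acceptable (stringr's ICU regex engine supports it): require that only non-digits remain between the four digits and the end of the string. Using str_extract instead of str_extract_all also returns one value per element, so no unlist() is needed.

```r
library(stringr)

years1 <- c("2005-", "2003-", "1984-1992, 1996-")

# Match four digits, but only if nothing except non-digits (\D*) follows
# them up to the end of the string ($). The lookahead (?=...) checks this
# without including the trailing "-" in the result.
last_years <- str_extract(years1, "\\d{4}(?=\\D*$)")
print(last_years)
```

Elements with no qualifying four-digit run come back as NA rather than being dropped, so the result stays aligned with the input.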


He who fights dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Answer


You can do this quite easily with base R's sub:

sub('.*(\\d{4}).*', '\\1', years1)

## [1] "2005" "2003" "1996"

The pattern matches .* (zero or more of any character), followed by \d{4} (four consecutive digits, captured by enclosing them in parentheses), followed by .* again (zero or more of any character).

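sub replaces the first match of the pattern with its second argument. Here, \1 refers to the first captured group (the four digits), and because the pattern spans the whole string, the entire string is replaced by those four digits. In an R string literal each backslash must be doubled, so the code writes these as '\\d' and '\\1'.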
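One caveat: when the pattern does not match at all, sub returns that element unchanged rather than dropping it. A minimal sketch of a guard, using a hypothetical input vector years2 with an entry ("n.d.") that contains no four-digit run: check for a match with grepl first and emit NA otherwise.

```r
# Hypothetical input: "n.d." has no four-digit number in it.
years2 <- c("2005-", "n.d.", "1984-1992, 1996-")

pattern <- ".*(\\d{4}).*"

# sub() alone would pass "n.d." through as-is; grepl() flags which
# elements actually match so the others can become NA instead.
last_year <- ifelse(grepl(pattern, years2),
                    sub(pattern, "\\1", years2),
                    NA_character_)
print(last_year)
```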

Because .* is greedy, the leading .* consumes as much of the string as it can, including any earlier runs of four digits, while still leaving enough for \d{4} to match. As a result, only the last sequence of four consecutive digits is captured.
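To see the effect of greediness, here is a small comparison (a sketch, not part of the original answer): the same pattern with a lazy .*? stops as early as possible, so it captures the first four-digit run instead of the last. perl = TRUE is used for the lazy version so that Perl-style (PCRE) lazy semantics apply.

```r
years1 <- c("2005-", "2003-", "1984-1992, 1996-")

# Greedy: the leading .* swallows earlier four-digit runs,
# so the capture lands on the LAST one.
greedy <- sub(".*(\\d{4}).*", "\\1", years1)

# Lazy: .*? matches as little as possible,
# so the capture lands on the FIRST four-digit run.
lazy <- sub(".*?(\\d{4}).*", "\\1", years1, perl = TRUE)

print(greedy)
print(lazy)
```

The two results differ only for "1984-1992, 1996-", the one element containing more than one four-digit number.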

