Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Using an If function, how do I change string of text in dataframe to another string?

Using the following dataset:

 structure(list(...1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12), V1 = c("overstress", "flicker", "lotteri", "life", 
"charg", "capac", "health", "drain", "degrad", "protector", "bright", 
"use", "overstress", "flicker", "lotteri", "life", "charg", "capac", 
"health", "drain", "degrad", "protector", "bright", "use", "overstress", 
"flicker", "lotteri", "life", "charg", "capac", "health", "drain", 
"degrad", "protector", "bright", "use"), term = c("corr1", "corr1", 
"corr1", "corr1", "corr1", "corr1", "corr1", "corr1", "corr1", 
"corr1", "corr1", "corr1", "corr2", "corr2", "corr2", "corr2", 
"corr2", "corr2", "corr2", "corr2", "corr2", "corr2", "corr2", 
"corr2", "corr3", "corr3", "corr3", "corr3", "corr3", "corr3", 
"corr3", "corr3", "corr3", "corr3", "corr3", "corr3"), correlation = c(0.5, 
0.43, 0.42, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.53, 
0.29, 0.25, 0.25, 0.23, 0.2, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 0.45, 0.16, 0.15)), row.names = c(NA, -36L), class = c("tbl_df", 
"tbl", "data.frame"))

I want to replace the values corr1, corr2 and corr3 in the term column with toil1, toil2 and toil3, respectively. I tried the following code, but it only produces the warning shown after it:

three_terms_corrs_gathered$term <- if
(three_terms_corrs_gathered$term  == "corr1"){toil1} else if
(three_terms_corrs_gathered$term  == "corr2"){toil2} else
{toil3}

Warning message:

In if (three_terms_corrs_gathered$term == "corr1") { : the condition has length > 1 and only the first element will be used

So only the first condition is applied. What am I doing wrong?

Question from: https://stackoverflow.com/questions/65906378/using-an-if-function-how-do-i-change-string-of-text-in-dataframe-to-another-str

1 Answer

Three options:

  1. "Merge" mentality. This works very well when you have many disparate matches, because the code stays short and the lookup table is easy to read and maintain. The example here only has two replacements (corr3 is deliberately left unmatched), but the code doesn't change whether corrs_df has 2 rows or 200, and entries in corrs_df that match nothing are silently discarded, doing no harm.

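    library(dplyr)
    # dat is the data frame from the question (three_terms_corrs_gathered)
    # lookup table: one row per replacement
    corrs_df <- data.frame(term = c("corr1", "corr2"), newterm = c("toil1", "toil2"))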
    dat %>%
      left_join(corrs_df, by = "term") %>%
      slice(c(1:3, 28:30))
    # # A tibble: 6 x 5
    #    ...1 V1         term  correlation newterm
    #   <dbl> <chr>      <chr>       <dbl> <chr>  
    # 1     1 overstress corr1        0.5  toil1  
    # 2     2 flicker    corr1        0.43 toil1  
    # 3     3 lotteri    corr1        0.42 toil1  
    # 4     4 life       corr3       NA    <NA>   
    # 5     5 charg      corr3       NA    <NA>   
    # 6     6 capac      corr3       NA    <NA>   
    
    dat %>%
      left_join(corrs_df, by = "term") %>%
      mutate(term = coalesce(newterm, term)) %>%
      slice(c(1:3, 28:30))
    # # A tibble: 6 x 5
    #    ...1 V1         term  correlation newterm
    #   <dbl> <chr>      <chr>       <dbl> <chr>  
    # 1     1 overstress toil1        0.5  toil1  
    # 2     2 flicker    toil1        0.43 toil1  
    # 3     3 lotteri    toil1        0.42 toil1  
    # 4     4 life       corr3       NA    <NA>   
    # 5     5 charg      corr3       NA    <NA>   
    # 6     6 capac      corr3       NA    <NA>   
    

    (You can drop the helper column afterwards with %>% select(-newterm).) The coalesce function effectively says "give me the first non-NA value from these variables". An NA in newterm occurs when the corresponding term value is not present in corrs_df, which we take to mean "make no change".

  2. dplyr::case_when. (If you're into it, then data.table::fcase does effectively the same thing.)

    dat %>%
      mutate(
        term = case_when(
          term == "corr1" ~ "toil1",
          term == "corr2" ~ "toil2",
          TRUE ~ term)
      ) %>%
      slice(c(1:3, 28:30))
    # # A tibble: 6 x 4
    #    ...1 V1         term  correlation
    #   <dbl> <chr>      <chr>       <dbl>
    # 1     1 overstress toil1        0.5 
    # 2     2 flicker    toil1        0.43
    # 3     3 lotteri    toil1        0.42
    # 4     4 life       corr3       NA   
    # 5     5 charg      corr3       NA   
    # 6     6 capac      corr3       NA   
    
  3. Nested ifelse. Actually, since you're using dplyr, it is much better to use if_else for many reasons (e.g., it checks that both branches have compatible types, which makes the output type predictable).

    dat %>%
      mutate(
        term = if_else(term == "corr1", "toil1",
                       if_else(term == "corr2", "toil2", term))
      ) %>%
      slice(c(1:3, 28:30))
    # # A tibble: 6 x 4
    #    ...1 V1         term  correlation
    #   <dbl> <chr>      <chr>       <dbl>
    # 1     1 overstress toil1        0.5 
    # 2     2 flicker    toil1        0.43
    # 3     3 lotteri    toil1        0.42
    # 4     4 life       corr3       NA   
    # 5     5 charg      corr3       NA   
    # 6     6 capac      corr3       NA   
    

    This works fine for 1 or 2 nestings, but in my opinion it gets messy and hard to follow beyond that. In my experience, code that is harder to follow is harder to maintain, and it becomes easy to put a particular option or value in the wrong place. Maintainability and readability are very important in my opinion.
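
To see why the original if failed and how the vectorized options behave, here is a minimal sketch. It assumes dplyr is installed. The vector term is a small hypothetical stand-in for three_terms_corrs_gathered$term, and the named vector lookup is an illustrative base-R variant of the merge idea from option 1, not something taken from the answer above. Unlike the answer's examples, it also maps corr3 to toil3, as the question asks.

```r
library(dplyr)

# Hypothetical stand-in for three_terms_corrs_gathered$term
term <- c("corr1", "corr2", "corr3", "corr1")

# if () expects a single TRUE/FALSE, but the comparison returns
# one logical value per element of the vector
n_conditions <- length(term == "corr1")

# case_when() evaluates every condition element-wise
out_case <- case_when(
  term == "corr1" ~ "toil1",
  term == "corr2" ~ "toil2",
  term == "corr3" ~ "toil3",
  TRUE ~ term
)

# Base-R variant of the "merge" idea: a named vector as lookup table
# (lookup is a hypothetical name introduced for this sketch)
lookup <- c(corr1 = "toil1", corr2 = "toil2", corr3 = "toil3")
out_lookup <- unname(lookup[term])

# Values missing from the lookup come back as NA;
# coalesce() then falls back to the original value
term_extra <- c(term, "other")
out_safe <- coalesce(unname(lookup[term_extra]), term_extra)
```

The lookup variant keeps the mapping in one place, like corrs_df in option 1, while case_when keeps the conditions visible inline; which one reads better is largely a matter of how many replacements there are.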

