Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - How to separate values in a column and convert to numeric values?

I have a dataset in which values have been collapsed, so each row holds several comma-separated values in a single column.

For example:

Gene   Score1                      
Gene1  NA, NA, NA, 0.03, -0.3 
Gene2  NA, 0.2, 0.1   

I want to unpack these values, take the maximum absolute value per row for the Score1 column, and record in a new column whether that maximum was originally negative.

So the expected output for the example is:

Gene   Score1    Negatives1
Gene1   0.3          1
Gene2   0.2          0
#Score1 is now the maximum absolute value; Negatives1 tracks whether it was originally negative

My code is:

dat2 <- dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>%
  group_by(Gene) %>%
  #Create negative column to track max absolute values that were negative
  summarise(Negatives1 = +(min(Score1 == -max(abs(Score1))),
            Score1 = max(abs(Score1), na.rm = TRUE))

However, for some reason the above code gives me this error:

Error: Problem with `summarise()` input `Negatives1`.
x non-numeric argument to mathematical function
i Input `Negatives1` is `+(min(Score1) == -max(abs(Score1)))`.
i The error occurred in group 1: Gene = "Gene1".
Run `rlang::last_error()` to see where the error occurred.

I thought that convert = TRUE would make the values numeric, but the error suggests the code is still getting non-numeric values after separate_rows() runs?

Example input data:

structure(list(Gene = c("Gene1", "Gene2"), Score1 = c("NA, NA, NA, 0.03, -0.3", 
"NA, 0.2, 0.1")), row.names = c(NA, -2L), class = c("data.table", 
"data.frame"))
Question from: https://stackoverflow.com/questions/65939732/how-to-separate-values-in-a-column-and-convert-to-numeric-values


1 Answer


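If we look at the separate_rows() output, I think the issue becomes clear: the separated column isn't numeric, it is still character (<chr>). Splitting on "," leaves a leading space on every piece after the first, so the column contains strings like " NA" and " 0.03", and convert = TRUE didn't turn them into numbers. We can force the conversion with as.numeric() (and ignore any warnings, since we want strings like " NA" to become NA).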

There are also some issues in the summarise() call: min() and both max() calls need na.rm = TRUE, and the parentheses are mismatched (the closing paren of min() is in the wrong place).

library(dplyr)  # provides %>%, mutate(), group_by() and summarise()

dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE)
# # A tibble: 8 x 2
#   Gene  Score1 
#   <chr> <chr>  
# 1 Gene1  NA    
# 2 Gene1 " NA"  
# 3 Gene1 " NA"  
# 4 Gene1 " 0.03"
# 5 Gene1 " -0.3"
# 6 Gene2  NA    
# 7 Gene2 " 0.2" 
# 8 Gene2 " 0.1" 

dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>% 
  #Force the character column to numeric; strings like " NA" become NA
  mutate(Score1 = as.numeric(Score1)) %>% 
  group_by(Gene) %>%
  #Create negative column to track max absolute values that were negative
  summarise(
    Negatives1 = +(min(Score1, na.rm = TRUE) == -max(abs(Score1), na.rm = TRUE)),
    Score1 = max(abs(Score1), na.rm = TRUE)
  )
# `summarise()` ungrouping output (override with `.groups` argument)
# # A tibble: 2 x 3
#   Gene  Negatives1 Score1
#   <chr>      <int>  <dbl>
# 1 Gene1          1    0.3
# 2 Gene2          0    0.2
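
For comparison, here is a minimal base R sketch of the same idea that trims the whitespace before converting. It is not the approach used above, only an illustration: the helper name summarise_scores is hypothetical, the input is rebuilt as a plain data.frame rather than a data.table, and it assumes every row has at least one non-missing value.

```r
# Example input from the question, rebuilt as a plain data.frame.
dat <- data.frame(
  Gene = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summarise one collapsed string.
# Assumes the string contains at least one non-missing number.
summarise_scores <- function(x) {
  # Split on commas, trim surrounding whitespace, then coerce to numeric.
  vals <- as.numeric(trimws(strsplit(x, ",", fixed = TRUE)[[1]]))
  vals <- vals[!is.na(vals)]          # drop missing values
  top <- vals[which.max(abs(vals))]   # value with the largest magnitude
  c(Negatives1 = as.integer(top < 0), Score1 = abs(top))
}

# One row of results per input row, bound back to the Gene column.
res <- t(sapply(dat$Score1, summarise_scores, USE.NAMES = FALSE))
out <- data.frame(Gene = dat$Gene, res)
print(out)
```

One difference to be aware of: if a row contains both x and -x as its largest magnitude, which.max() takes whichever comes first, whereas the min() == -max() test above flags the row as negative.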

