I have a dataset in which the values are collapsed, so each row holds multiple comma-separated values in a single column.
For example:
Gene Score1
Gene1 NA, NA, NA, 0.03, -0.3
Gene2 NA, 0.2, 0.1
I am trying to unpack this and then select the maximum absolute value per row for the Score1
column, while also recording in a new column whether that maximum absolute value was originally negative.
So the expected output for the example is:
Gene Score1 Negatives1
Gene1 0.3 1
Gene2 0.2 0
# Score1 is now the maximum absolute value, and Negatives1 tracks whether it was originally negative
I coded this with:
dat2 <- dat %>%
  tidyr::separate_rows(Score1, sep = ",", convert = TRUE) %>%
  group_by(Gene) %>%
  # Create negative column to track max absolute values that were negative
  summarise(Negatives1 = +(min(Score1) == -max(abs(Score1))),
            Score1 = max(abs(Score1), na.rm = TRUE))
However, for some reason the above code gives me this error:
Error: Problem with `summarise()` input `Negatives1`.
x non-numeric argument to mathematical function
i Input `Negatives1` is `+(min(Score1) == -max(abs(Score1)))`.
i The error occurred in group 1: Gene = "Gene1".
Run `rlang::last_error()` to see where the error occurred.
I thought that using convert = TRUE would make the values numeric, but the error suggests the code is still getting non-numeric values after I run separate_rows(). Why is that?
Example input data:
structure(list(Gene = c("Gene1", "Gene2"), Score1 = c("NA, NA, NA, 0.03, -0.3",
"NA, 0.2, 0.1")), row.names = c(NA, -2L), class = c("data.table",
"data.frame"))
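A minimal sketch of one thing that might work, assuming the conversion fails because sep = "," leaves a leading space on each piece (" NA", " 0.03"), so convert = TRUE may not recognise the column as numeric. It splits on a comma plus any following whitespace, converts explicitly with as.numeric(), and passes na.rm = TRUE to every min()/max() call. The input is rebuilt here as a plain data.frame only to keep the snippet self-contained.

```r
library(dplyr)
library(tidyr)

# Same example data as above, rebuilt as a plain data.frame (assumption:
# the data.table class is not needed for this step)
dat <- data.frame(
  Gene = c("Gene1", "Gene2"),
  Score1 = c("NA, NA, NA, 0.03, -0.3", "NA, 0.2, 0.1")
)

dat2 <- dat %>%
  # Split on a comma followed by optional whitespace so no piece keeps a leading space
  separate_rows(Score1, sep = ",\\s*") %>%
  # Convert explicitly instead of relying on convert = TRUE; the string "NA" becomes NA
  mutate(Score1 = as.numeric(Score1)) %>%
  group_by(Gene) %>%
  summarise(
    # 1 if the most negative value is also the largest in absolute value, else 0
    Negatives1 = +(min(Score1, na.rm = TRUE) == -max(abs(Score1), na.rm = TRUE)),
    # Computed last so that Negatives1 above still sees the signed values
    Score1 = max(abs(Score1), na.rm = TRUE)
  )
```

For the example data this should give Score1 values of 0.3 and 0.2 with Negatives1 values of 1 and 0, which matches the expected output above.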
question from:
https://stackoverflow.com/questions/65939732/how-to-separate-values-in-a-column-and-convert-to-numeric-values