This is a slightly tricky dataset where the columns are laid out like this.
ID C.Date T.Date C(Area) T(Area) Level(closet)_1 Venti_1 Level(closet)_2 Venti_2
733 2013.06.18 2013.06.18 65.2 42.1 C6 0 C3 1
537 2015.10.01 2015.15.01 34.5 27.2 C3 0 T11 0
909 2016-01-14 2016-01-14 15.1 25.9 T4 1 T2 1
Rule
Step1 : Consider columns: ID, C.Date, C(Area), Level(closet)_1, Venti_1, Level(closet)_2, Venti_2
Rearrange the data like this.
ID Index Date Ref.Level Area Level(closet) Venti
733 1 2013.06.18 C 65.2 C6 0
733 2 2013.06.18 C 65.2 C3 1
Step2 : Consider columns: ID, T.Date, T(Area), Level(closet)_1, Venti_1, Level(closet)_2, Venti_2
Rearrange the data like this.
ID Index Date Ref.Level Area Level(closet) Venti
733 3 2013.06.18 T 42.1 NA NA
Notice both Step1 and Step2 references values in columns Level(closet)_1, Venti_1, Level(closet)_2, Venti_2
. The difference is in Step2, when there are values for T.Date
and T(Area)
the expectation is that either one of the Level(closet)
value will start with T*
, in the 1st ID 733 there were NONE. So the transformed dataset 3rd row has values NA for columns Level(closet), Venti
. The 2nd ID 537 again has both T.Date
and T(Area)
values, again based on the Step2 we look for Level(closet)
column values that start with T*
in this case Level(closet)_2
contains value T11
so for the wide-to-long transformed data for ID 523 will be
Step1 : Consider columns: ID, C.Date, C(Area), Level(closet)_1, Venti_1, Level(closet)_2, Venti_2
Rearrange the data like this.
ID Index Date Ref.Level Area Level(closet) Venti
537 1 2015.10.01 C 34.5 C3 0
Step2 : Consider columns: ID, T.Date, T(Area), Level(closet)_1, Venti_1, Level(closet)_2, Venti_2
Rearrange the data like this.
ID Index Date Ref.Level Area Level(closet) Venti
537 2 2015.15.01 T 27.2 T11 0
The final expected dataset would look like this below
ID Index Date Ref.Level Area Level(closet) Venti
733 1 2013.06.18 C 65.2 C6 0
733 2 2013.06.18 C 65.2 C3 1
733 3 2013.06.18 T 42.1 NA NA
537 1 2015.10.01 C 34.5 C3 0
537 2 2015.15.01 T 27.2 T11 0
909 1 2016-01-14 C 15.1 NA NA
909 2 2016-01-14 T 25.9 T4 1
909 3 2016-01-14 T 25.9 T2 1
Sorry this is a bit complicated. On the surface level this looks like taking few rows in the wide format and reshaping this to a long format but there is a nested ifelse to see if there are any values starting with T*
in the Level(closet)
columns. I am completely blank how to structure this in a long format like this. Any help or suggestions is much apricated. Thanks.
library(tidyverse)
df <- tibble::tribble(~`ID`, ~`C.Date`, ~`T.Date`, ~`C(Area)`, ~`T(Area)`, ~`Level(closet)_1`, ~`Venti_1`, ~`Level(closet)_2`, ~`Venti_2`,
"733", "2013.06.18", "2013.06.18", "65.2", "42.1", "C6", "0", "C3", "1",
"537", "2015.10.01", "2015.15.01", "34.5", "27.2", "C3", "0", "T11", "0",
"909", "2016-01-14", "2016-01-14", "15.1", "25.9", "T4", "1", "T2", "1"
)
question from:
https://stackoverflow.com/questions/65840183/r-conditional-wide-to-long-with-column-name-pattern