I have a data frame with MRN, dates, and a test value.
I need to select all the first rows per MRN that have three consecutive values above 0.5.
This is an example version of the data:
MRN Collected_Date ANC
1 001 2015-01-02 0.345
2 001 2015-01-03 0.532
3 001 2015-01-04 0.843
4 001 2015-01-05 0.932
5 002 2015-03-03 0.012
6 002 2015-03-05 0.022
7 002 2015-03-06 0.543
8 002 2015-03-07 0.563
9 003 2015-08-02 0.343
10 003 2015-08-03 0.500
11 003 2015-08-04 0.734
12 003 2015-08-05 0.455
13 004 2014-01-02 0.001
14 004 2014-01-03 0.500
15 004 2014-01-04 0.562
16 004 2014-01-05 0.503
Example code:
df <- data.frame(MRN = c('001','001','001','001',
'002','002','002','002',
'003','003','003','003',
'004','004','004','004'),
Collected_Date = as.Date(c('01-02-2015','01-03-2015','01-04-2015','01-05-2015',
'03-03-2015','03-05-2015','03-06-2015','03-07-2015',
'08-02-2015','08-03-2015','08-04-2015','08-05-2015',
'01-02-2014','01-03-2014','01-04-2014','01-05-2014'),
format = '%m-%d-%Y'),
ANC = as.numeric(c('0.345','0.532','0.843','0.932',
'0.012','0.022','0.543','0.563',
'0.343','0.500','0.734','0.455',
'0.001','0.500','0.562','0.503')))
Currently, I am using a very awkward approach using the lag function to calculate the date difference, then filter for all values >= 0.5, and then sum up the values, which helps to select the date of the THIRD value. I then substract two days to get the date of the first value:
df %>% group_by(MRN) %>%
mutate(., days_diff = abs(Collected_Date[1] - Collected_Date)) %>%
filter(ANC >= 0.5) %>%
mutate(days = days_diff + lag((days_diff))) %>%
filter(days == 5) %>%
mutate(Collected_Date = Collected_Date - 2) %>%
select(MRN, Collected_Date)
Output:
Source: local data frame [2 x 2]
Groups: MRN
MRN Collected_Date
1 001 2015-01-03
2 004 2014-01-03
There must be a way simpler / more elegant way. Also, it does not give accurate results if there are gaps between the test dates.
My desired output for this example is:
MRN Collected_Date ANC
1 001 2015-01-03 0.532
2 004 2014-01-03 0.500
So if at least three consecutive test values are >= 0.5, the date of the FIRST value should be returned.
If there are not at least three consecutive values >= 0.5, NA should be returned.
Any help is greatly appreciated!
Thank you very much!
See Question&Answers more detail:
os