How might I categorize each row in an R dataframe (>10 000 rows) based on date range definitions in a separate, much smaller R dataframe (62 rows)?
My large dataframe, ConcFlow, looks similar to this when called via head(ConcFlow) :
STATION Sampling.Year DATE Flow
1 2016-2017 13/03/2017 177.45
1 2016-2017 12/01/2017 96.798
1 2016-2017 11/01/2017 99.902
2 2016-2017 4/03/2017 109.74
2 2016-2017 5/03/2017 100.55
3 2016-2017 19/05/2017 2302.5
1 2017-2018 13/03/2018 177.45
1 2017-2018 12/01/2018 96.798
1 2017-2018 11/01/2018 99.902
2 2017-2018 4/03/2018 109.74
2 2017-2018 5/03/2018 100.55
3 2017-2018 19/05/2018 2302.5
The smaller dataset, First.Flush, contains dates for the start and end of the Australian wet season, like this:
STATION Sampling.Year Start End Season
1 2011-2012 1/01/2012 1/07/2012 Wet Season
1 2013-2014 1/01/2014 2/07/2014 Wet Season
1 2014-2015 1/01/2015 2/07/2015 Wet Season
1 2015-2016 23/12/2015 22/06/2016 Wet Season
1 2016-2017 12/12/2016 12/06/2017 Wet Season
2 2011-2012 18/10/2011 17/04/2012 Wet Season
2 2012-2013 24/12/2012 24/06/2013 Wet Season
2 2013-2014 1/01/2014 2/07/2014 Wet Season
2 2014-2015 1/01/2015 2/07/2015 Wet Season
2 2015-2016 23/12/2015 22/06/2016 Wet Season
3 2011-2012 18/10/2011 17/04/2012 Wet Season
I need to add a 'season' column to my ConcFlow dataframe where the value would be determined based on whether the ConcFlow$DATE falls in the ranges defined in First.Flush. If the DATE is within First.Flush$Start and First.Flush$End (inclusive) it needs to be defined as Wet Season. If not, it should be defined as Dry Season.
It also needs to repeat for each ConcFLow$STATION and ConcFLow$Sampling.Year
The best I've been able to produce is a for loop, but I don't know how to make it repeat for each STATION and Sampling.Year. I'm a novice in R and loops don't make too much sense to me yet. I don't know if this is the best approach - any help would be much appreciated thank you!
for (i in seq_len(nrow(First.Flush$STATION))) {
ConcFlow$Season <- ifelse(is.na(ConcFlow$Season) &
ConcFlow$DATE >= First.Flush$Start[i] &
ConcFlow$DATE < First.Flush$End[i],
First.Flush$Season[i], ConcFlow$Season)
}
This is a similar question but I don't know how to apply it to multiple factor levels within my df (ConcFlow$Station and ConFlow$Sampling.Year).
categorize based on date ranges in R
Thank you
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…