I am relatively inexperienced with R. I was looking for a weather-normalization technique and came across the package "rmweather".
I have run the model and am now not sure how to evaluate it. I test for overfitting with the rmw_predict_the_test_set() function, which runs the model on the test dataset.
I would also like to do a cross-validation. Do I need to compare the model statistics (rmw_model_statistics) with the test results? When I try that, I get the error that my model is not a "ranger" model.
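One possible cause of the "ranger" error (a guess, not a confirmed diagnosis): the rmw_* helpers such as rmw_model_statistics() expect the ranger model itself, while rmw_do_all() returns a list that merely contains it. With the RF_pm2.5_model object created below, the check would look like this:

library(rmweather)

class(RF_pm2.5_model)        # "list"   -- the complete rmw_do_all() output
class(RF_pm2.5_model$model)  # "ranger" -- what the rmw_* helpers expect

rmw_model_statistics(RF_pm2.5_model$model)  # pass the $model element, not the whole list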
More Details:
I would like to study the effects of the COVID-19 containment measures on air quality. I have meteorological data for four years. The dataset data_prepared is split into 80 % training and 20 % test data. I have trained the model with data from 2017-2019 and would like to perform a weather normalization for 2020.
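For context, here is a minimal sketch of how data_prepared is typically produced; rmw_prepare_data() performs the 80/20 split itself by adding a set column with "training"/"testing" labels. The data frame name data_raw and the pollutant column name "pm2.5" are assumptions for illustration:

library(rmweather)

data_prepared <- rmw_prepare_data(
  data_raw,         # assumed: raw data frame with a POSIXct `date` column
  value = "pm2.5",  # assumed name of the pollutant column to be modelled
  fraction = 0.8    # 80 % training, 20 % testing (the default)
)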
First I used the function:
RF_pm2.5_model <- "rmw_do_all(
data_prepared,
variables = c("date_unix", "day_julian", "weekday", "hour", "temp", "RH", "wd", "ws", "pressure","u.","L","MLH"),
variables_sample=c("hour", "temp", "RH", "wd", "ws", "pressure","u.","L","MLH"),
n_trees = 500,
n_samples = 500,
verbose = TRUE)"
I would like to check the model performance with cross-validation. I found this option to check whether the model has suffered from overfitting:
library(openair)

testing_model <- rmw_predict_the_test_set(
  model = RF_pm2.5_model$model,
  df = RF_pm2.5_model$observations
)

model_performance <- modStats(
  testing_model,
  mod = "value_predict",  # modelled values returned by rmw_predict_the_test_set()
  obs = "value",          # observed values
  statistic = c("n", "FAC2", "MB", "MGE", "NMB", "NMGE", "RMSE", "COE", "IOA", "r"),
  type = "default",
  rank.name = NULL
)
But I am not sure how to compare it to the performance on the training dataset.
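One way to get comparable numbers for the training set (a sketch, assuming the openair::modStats() call above): predict on the training rows with rmw_predict(), compute the same statistics, and put the two tables side by side. A clear drop in r or IOA (or a jump in RMSE) from training to testing would point to overfitting:

library(dplyr)
library(openair)
library(rmweather)

# Keep only the rows that rmw_prepare_data() labelled as training data
training_set <- RF_pm2.5_model$observations %>%
  filter(set == "training")

# rmw_predict() returns a vector of predictions for the supplied data frame
training_set$value_predict <- rmw_predict(
  RF_pm2.5_model$model,
  df = training_set
)

performance_train <- modStats(training_set, mod = "value_predict", obs = "value")
performance_test  <- modStats(testing_model, mod = "value_predict", obs = "value")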
question from:
https://stackoverflow.com/questions/66059328/r-how-to-validate-the-model-performance-of-the-rmweather-package