I have to write a function that takes a directory of data files and a threshold for complete cases, and calculates the correlation between sulfate and nitrate (two columns) for each file where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no files meet the threshold requirement, the function should return a numeric vector of length 0. A prototype of this function follows.
My code looks like this:
corr <- function(directory, threshold = 0) {
  a <- list.files("specdata")
  for (i in a) {
    data <- read.csv(paste(directory, "/", i, sep = ""))
    x <- complete.cases(data)
    j <- sum(as.numeric(x))
    sulfate <- data[, 2]
    nitrate <- data[, 3]
    b <- cor(sulfate, nitrate)
  }
  if (j > threshold)
    return(b)
  else
    numeric()
}
There's no error message.
If I type:
z<-corr("specdata")
head(z)
[1] NA
I don't know what the problem is. I don't know whether the NA values in the columns have something to do with it. I think something is missing in my code. I suspect that read.csv ends up giving me a single data frame when I need one data frame per file, but I don't see why the function returns NA in this case (when there's no threshold).
However, if I introduce a bigger threshold (1000):
z<-corr("specdata",1000)
head(z)
numeric(0)
The expected output I need is:
cr <- corr("specdata", 150)
head(cr)
[1] -0.01895754 -0.14051254 -0.04389737 -0.06815956 -0.12350667 -0.07588814
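For reference, here is a hedged sketch of one way the function could be restructured, not a definitive solution. In the code above, b and j are overwritten on every pass through the loop and the threshold is checked only once after the loop, so at most the last file's correlation is returned. cor() also returns NA by default when either input contains NA, which would explain the [1] NA. The sketch keeps the positional columns 2 and 3 from the code above, and assumes the directory contains only the monitor .csv files:

```r
# Sketch only: same column positions (2 = sulfate, 3 = nitrate) as the code above.
corr <- function(directory, threshold = 0) {
  # Use the directory argument rather than a hard-coded "specdata".
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  result <- numeric(0)  # returned unchanged if no file meets the threshold

  for (f in files) {
    data <- read.csv(f)
    complete <- data[complete.cases(data), ]  # keep rows with no NA in any column

    # Check the threshold per file, inside the loop.
    if (nrow(complete) > threshold) {
      # Append this file's correlation instead of overwriting the previous one.
      result <- c(result, cor(complete[, 2], complete[, 3]))
    }
  }
  result
}
```

With this structure, corr("specdata", 150) would return one value per qualifying monitor, and a threshold that no file meets would give numeric(0).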