I'm working writing some R extensions on C (C functions to be called from R).
My code needs to compute a statistic using 2 different datasets at the same time, and I need to perform this with all possible pair combinations. Then, I need all these statistics (very large arrays) to continue the calculation on the C side. Those files are very large, typically ~40GB, and that's my problem.
To do this on C called by R, first I need to load all the datasets in R to pass them then to the C function call. But, ideally, it is possible to maintain only 2 of those files on memory at the same time, following the sequence if I were able to access the datasets from C or Fortran directly:
open file1 - open file2 - compute cov(1,2)
close file2
hold file1 - open file3 - compute cov(1,3)
... // same approach
This is fine on R because I can load/unload files, but when calling C or Fortran I haven't any mechanism to load/unload files. So, my question is, can I read .Rdata files from Fortran or C directly, being able to open/close them? Any other approaches to the problem?
As far as I've read, the answer is no. So, I'm considering to move from Rdata to HDF5.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…