I came up with the following solution, which will work for file sizes less than 2^32 - 1 bytes.
The R object needs to be serialized and saved to a file, as done by the following code.
saveObj <- function(object, file.name) {
  # Serialize the object and write the binary stream to a file
  outfile <- file(file.name, "wb")
  serialize(object, outfile)
  close(outfile)
}
Then we read the binary data in chunks, keeping track of how much is read and updating the progress bar accordingly.
loadObj <- function(file.name) {
  library(foreach)
  library(iterators)  # provides icount(); not attached by library(foreach) alone
  filesize <- file.info(file.name)$size
  chunksize <- ceiling(filesize / 100)
  pb <- txtProgressBar(min = 0, max = 100, style = 3)
  infile <- file(file.name, "rb")
  # Read the file in 100 chunks, advancing the progress bar after each;
  # .combine = c concatenates the raw chunks back into a single raw vector
  data <- foreach(it = icount(100), .combine = c) %do% {
    setTxtProgressBar(pb, it)
    readBin(infile, "raw", chunksize)
  }
  close(infile)
  close(pb)
  return(unserialize(data))
}
The code can be run as follows:
> a <- 1:100000000
> saveObj(a, "temp.RData")
> b <- loadObj("temp.RData")
|======================================================================| 100%
> all.equal(b, a)
[1] TRUE
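For what it's worth, the same idea can be written in base R without the foreach/iterators dependency. This is just a sketch, and loadObj2 is an illustrative name, not part of the solution above:
loadObj2 <- function(file.name) {
  filesize <- file.info(file.name)$size
  chunksize <- ceiling(filesize / 100)
  pb <- txtProgressBar(min = 0, max = 100, style = 3)
  infile <- file(file.name, "rb")
  # Collect the 100 raw chunks in a list, then concatenate once at the end
  chunks <- vector("list", 100)
  for (it in 1:100) {
    chunks[[it]] <- readBin(infile, "raw", chunksize)
    setTxtProgressBar(pb, it)
  }
  close(infile)
  close(pb)
  unserialize(do.call(c, chunks))
}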
If we benchmark the progress bar method against reading the file in a single chunk, the progress bar version is slightly slower, but not enough to worry about.
> infile <- file("temp.RData", "rb")
> system.time(unserialize(readBin(infile, "raw", file.info("temp.RData")$size)))
   user  system elapsed
  2.710   0.340   3.062
> close(infile)
> system.time(b <- loadObj("temp.RData"))
|======================================================================| 100%
   user  system elapsed
  3.750   0.400   4.154
So while the above method works, I feel it is almost useless because of the file size restriction: progress bars only really matter for large files that take a long time to read in.
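One small mitigation is to fail fast rather than erroring partway through the read. A sketch of such a guard, assuming the size limit mentioned above (loadObjSafe is an illustrative name):
loadObjSafe <- function(file.name, max.size = 2^32 - 1) {
  # Refuse files at or beyond the assumed limit before starting the chunked read
  if (file.info(file.name)$size >= max.size)
    stop("file too large for this chunked raw-vector approach")
  loadObj(file.name)
}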
It would be great if someone could come up with something better than this solution!