Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
339 views
in Technique[技术] by (71.8m points)

Recommended package for very large dataset processing and machine learning in R


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Have a look at the "Large memory and out-of-memory data" subsection of the high performance computing task view on CRAN. bigmemory and ff are two popular packages. For bigmemory (and the related biganalytics, and bigtabulate), the bigmemory website has a few very good presentations, vignettes, and overviews from Jay Emerson. For ff, I recommend reading Adler Oehlschl?gel and colleagues' excellent slide presentations on the ff website.

Also, consider storing data in a database and reading in smaller batches for analysis. There are likely any number of approaches to consider. To get started, consdier looking through some of the examples in the biglm package, as well as this presentation from Thomas Lumley.

And do investigate the other packages on the high-performance computing task view and mentioned in the other answers. The packages I mention above are simply the ones I've happened to have more experience with.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...