I am going to be undertaking some logfile analyses in R (unless I can't do it in R), and I understand that my data needs to fit in RAM (unless I use some kind of fix like an interface to a keyval store, maybe?). So I am wondering how to tell ahead of time how much room my data is going to take up in RAM, and whether I will have enough. I know how much RAM I have (not a huge amount - 3GB under XP), and I know how many rows and cols my logfile will end up as and what data types the col entries ought to be (which presumably I need to check as it reads).
How do I put this together into a go/nogo decision for undertaking the analysis in R? (Presumably R needs to be able to have some RAM to do operations, as well as holding the data!) My immediate required output is a bunch of simple summary stats, frequencies, contingencies, etc, and so I could probably write some kind of parser/tabulator that will give me the output I need short term, but I also want to play around with lots of different approaches to this data as a next step, so am looking at feasibility of using R.
I have seen lots of useful advice about large datasets in R here, which I have read and will reread, but for now I would like to understand better how to figure out whether I should (a) go there at all, (b) go there but expect to have to do some extra stuff to make it manageable, or (c) run away before it's too late and do something in some other language/environment (suggestions welcome...!). thanks!
question from:
https://stackoverflow.com/questions/12767432/how-can-i-tell-when-my-dataset-in-r-is-going-to-be-too-large 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…