Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
176 views
in Technique[技术] by (71.8m points)

read.table - How to read a subset of large dataset in R?

I have a dataset with about 2 million rows, so without reading the whole dataset I want to read a subset of dataset . My dataset contains a date column in it so I just want to read dataset between a date range without reading whole dataset as it will be time consuming and memory waste. so how to accomplish it can anyone guide me on this ?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Use skip= parameter in read.table

read.table("file.txt",skip= ,nrows= )

Both the skip= and nrows= take in row indicator numbers so just add them after the=.

The nrows= defines how deep you range when you are importing the file.

I suggest reading https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html if you haven't done so already.

Also, please see one of my questions:

R - Reading lines from a .txt-file after a specific line

It, somewhat, touches the same subject.

The other possible way might be to use grep() in skip=

read.table(...,skip=grep("2005-12-31", readLines("File.txt")),nrows=365)

What this line does is it skips until it finds the line depicted in grep() and reads the lines after that. The nrow= will stop the reading after it has read 365 lines (this way you have read one year of dates provided one line equals one date).

This seems kinda complicated, but it's the only way I know how to solve this.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...