
How to read a huge CSV file into R by row condition?

I have a huge CSV file with about 15 million rows, around 3 GB in size.

I would like to read this file into R piece by piece, each time keeping only the rows that meet a certain condition.

e.g. one of the columns is called product type, so I only need to read one type of product into R, process it, and output the result; after that I move on to another type of product...

So far I have read about different approaches, such as loading the big file into a database, reading column by column with colbycol, or reading a chunk of rows at a time with ff...

Is there any pure R solution that can solve my problem?
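To make the idea concrete, here is roughly the kind of loop I have in mind in base R (a minimal sketch; the file name, the product_type column, and the chunk size are all placeholders):

# Sketch of chunked reading in base R; file/column names and the
# chunk size are made up for illustration.
con <- file("Your_Big_CSV_File.csv", open = "r")
col_names <- unlist(read.csv(con, nrows = 1, header = FALSE,
                             stringsAsFactors = FALSE))
chunk_size <- 100000
pieces <- list()
repeat {
  chunk <- tryCatch(
    read.csv(con, nrows = chunk_size, header = FALSE,
             col.names = col_names, stringsAsFactors = FALSE),
    error = function(e) NULL)  # read.csv errors once the input is exhausted
  if (is.null(chunk) || nrow(chunk) == 0) break
  # keep only the rows for one product type
  pieces[[length(pieces) + 1]] <- chunk[chunk$product_type == "A", ]
}
close(con)
one_type <- do.call(rbind, pieces)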


1 Answer


You can use the RSQLite package:

library(DBI)
library(RSQLite)

# Create/connect to a database file on disk
con <- dbConnect(RSQLite::SQLite(), "sample_db.sqlite")

# Import the CSV file into the SQLite database.
# Warning: this is going to take some time and disk space,
#   as your complete CSV file is transferred into an SQLite database.
dbWriteTable(con, name = "sample_table", value = "Your_Big_CSV_File.csv",
    row.names = FALSE, header = TRUE, sep = ",")

# Query your data as you like
yourData <- dbGetQuery(con, "SELECT * FROM sample_table LIMIT 10")

dbDisconnect(con)
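For the per-type workflow in the question, you can pull one product type at a time while the connection is still open (i.e. before the dbDisconnect above). A minimal sketch, assuming a column named product_type:

types <- dbGetQuery(con, "SELECT DISTINCT product_type FROM sample_table")
for (t in types$product_type) {
  # Parameterised query: only the rows for one product type are
  # loaded into memory at a time.
  piece <- dbGetQuery(con,
    "SELECT * FROM sample_table WHERE product_type = ?",
    params = list(t))
  # ... process `piece` and write out its result before the next type ...
}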

Next time you want to access your data, you can leave out the dbWriteTable step, as the SQLite table is stored on disk.
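Reopening the same database file later looks like this:

library(DBI)
con <- dbConnect(RSQLite::SQLite(), "sample_db.sqlite")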

Note: writing the CSV data to the SQLite file does not load all of the data into memory first, so in the end the memory you use is limited to the amount of data your query returns.

