Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

How can I read multiple (Excel) files into R?



1 Answer

With list.files you can create a vector of all the filenames in your working directory. Next, you can use lapply to loop over that vector and read each file with the read_excel function from the readxl package:

library(readxl)
# 'pattern' is a regular expression, not a shell glob: match names ending in .xlsx
file.list <- list.files(pattern = "\\.xlsx$")
df.list <- lapply(file.list, read_excel)

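This method can of course also be used with other file reading functions like read.csv or read.table. Just replace read_excel with the appropriate file reading function and make sure you use the correct pattern in list.files. Note that pattern is a regular expression, not a glob, so a file extension is best matched with something like "\\.csv$".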
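To make the list.files plus lapply pattern concrete, here is a minimal self-contained sketch that uses read.csv from base R instead of read_excel. The temporary directory and the file names (first.csv, second.csv, first.csv.bak) are made up for illustration only; the anchored pattern is there so that the .bak file is skipped.

```r
# Hypothetical demo data: two CSV files and one backup file in a temp directory
tmp <- tempfile("read_demo_")
dir.create(tmp)
write.csv(data.frame(x = 1:2, y = c("a", "b")), file.path(tmp, "first.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4, y = c("c", "d")), file.path(tmp, "second.csv"), row.names = FALSE)
writeLines("not a data file", file.path(tmp, "first.csv.bak"))

# "\\.csv$" matches only names that end in .csv, so first.csv.bak is ignored.
# full.names = TRUE is needed here because tmp is not the working directory.
file.list <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Read every matching file; the result is a list with one data frame per file
df.list <- lapply(file.list, read.csv)
length(df.list)
```

When the files live in the working directory, as in the answer, the path argument and full.names = TRUE can be dropped.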

If you also want to include the files in subdirectories, use:

file.list <- list.files(pattern = "\\.xlsx$", recursive = TRUE)
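As a rough illustration of what recursive = TRUE changes, the sketch below builds a small temporary directory tree and lists it both ways. The directory layout and file names (top.csv, sub/nested.csv) are assumptions for the demo, and CSV files are used so that only base R is needed.

```r
# Hypothetical directory tree: one file at the top level, one in a subdirectory
root <- tempfile("recursive_demo_")
dir.create(file.path(root, "sub"), recursive = TRUE)
write.csv(data.frame(x = 1), file.path(root, "top.csv"), row.names = FALSE)
write.csv(data.frame(x = 2), file.path(root, "sub", "nested.csv"), row.names = FALSE)

# Without recursive = TRUE only the top-level file is found
flat <- list.files(root, pattern = "\\.csv$")

# With recursive = TRUE the file in 'sub' is found as well;
# full.names = TRUE returns paths that can be read from any working directory
deep <- list.files(root, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)

df.list <- lapply(deep, read.csv)
```

If the files are listed relative to the working directory, as in the answer, the relative paths returned by list.files can be passed to the reading function directly.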

Other packages that can read Excel files: openxlsx and xlsx.


Supposing the columns are the same for each file, you can bind them together in one dataframe with bind_rows from dplyr:

library(dplyr)
df <- bind_rows(df.list, .id = "id")

or with rbindlist from data.table:

library(data.table)
df <- rbindlist(df.list, idcol = "id")

Both have the option to add an id column for identifying the separate datasets.
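For readers without dplyr or data.table installed, the idea of a numeric id column can be sketched in base R as well. This is not what bind_rows or rbindlist do internally, just a plain alternative under the same assumption that all data frames have identical columns; the two small data frames are made up for the demo.

```r
# Hypothetical input: a list of data frames with identical columns
df.list <- list(data.frame(x = 1:2), data.frame(x = 3:5))

# One id per row: the position of the data frame in the list,
# repeated once for each of its rows
ids <- rep(seq_along(df.list), vapply(df.list, nrow, integer(1)))

# Stack the data frames and put the id column in front
df <- data.frame(id = ids, do.call(rbind, df.list))
```

For many or large files, bind_rows and rbindlist are usually the more convenient choice, since they also handle the id column for you.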


Update: If you don't want a numeric identifier, just use sapply with simplify = FALSE to read the files in file.list:

df.list <- sapply(file.list, read.csv, simplify=FALSE)

When using bind_rows from dplyr or rbindlist from data.table, the id column now contains the filenames.
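The reason this works is that sapply, when given a character vector, uses its elements as names for the result, and simplify = FALSE keeps that result as a list. A minimal sketch, assuming two made-up files a.csv and b.csv in a temporary directory:

```r
# Hypothetical demo files in a temp directory
tmp <- tempfile("named_demo_")
dir.create(tmp)
write.csv(data.frame(x = 1:2), file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(tmp, "b.csv"), row.names = FALSE)

file.list <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# sapply names each element after the file path it was read from
df.list <- sapply(file.list, read.csv, simplify = FALSE)

# Optional: shorten the names to the bare file names
names(df.list) <- basename(names(df.list))
names(df.list)
```

These names are what bind_rows and rbindlist put into the id column.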

Yet another approach is to use the purrr package:

library(purrr)
file.list <- list.files(pattern = "\\.csv$")
file.list <- setNames(file.list, file.list) # only needed when you need an id-column with the file-names

df <- map_df(file.list, read.csv, .id = "id")

Other approaches to getting a named list: If you don't want just a numeric identifier, then you can assign the filenames to the dataframes in the list before you bind them together. There are several ways to do this:

# with the 'attr' function from base R
attr(df.list, "names") <- file.list
# with the 'names' function from base R
names(df.list) <- file.list
# with the 'setattr' function from the 'data.table' package
setattr(df.list, "names", file.list)

Now you can bind the list of dataframes together in one dataframe with rbindlist from data.table or bind_rows from dplyr. The id column will now contain the filenames instead of a numeric identifier.

