I have a large dataset that chokes split() in R. I am able to use dplyr's group_by (which is the preferred way anyway), but I am unable to persist the resulting grouped_df as a list of data frames, a format required by my subsequent processing steps (I need to coerce to SpatialDataFrames and similar).
Consider a sample dataset:
df = as.data.frame(cbind(c("a","a","b","b","c"),c(1,2,3,4,5), c(2,3,4,2,2)))
listDf = split(df,df$V1)
returns
$a
V1 V2 V3
1 a 1 2
2 a 2 3
$b
V1 V2 V3
3 b 3 4
4 b 4 2
$c
V1 V2 V3
5 c 5 2
I would like to emulate this with group_by (something like group_by(df, V1)), but this returns a single grouped_df. I know that do should be able to help me, but I am unsure about its usage (also see link for a discussion).
Note that split names each list element after the level of the factor that was used to establish the group. This is desired behavior (bonus kudos for a way to extract these names from the list of data frames).
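For illustration, here is a minimal sketch of one way the split() output might be emulated. It assumes dplyr 0.8.0 or later, where group_split() and group_keys() are available; the variable names grouped and listDf are only placeholders. Since group_split() returns an unnamed list, the names are attached by hand from the group keys.

```r
library(dplyr)

# Sample data from the question
df = as.data.frame(cbind(c("a","a","b","b","c"), c(1,2,3,4,5), c(2,3,4,2,2)))

# Group once, then reuse the grouped object for both the pieces and their keys
grouped <- group_by(df, V1)

# group_split() returns one tibble per group, in the same order as group_keys()
listDf <- group_split(grouped)

# Attach the group labels as list names (assumption: a single grouping column, V1)
listDf <- setNames(listDf, as.character(group_keys(grouped)$V1))

# The group names can then be read back from the list itself
names(listDf)
```

With names attached this way, listDf[["a"]] should hold the rows of group a, much like the result of split(df, df$V1). Whether this copes better than split() with a very large dataset is not something the sketch establishes.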