From the comments, "C2" seems to be "character" column with %
as suffix. Before, creating a group, remove the %
using sub
, convert to "numeric" (as.numeric
). The variable "group" is created (transform(df,...)
) by using the function cut
with breaks
(group buckets/intervals) and labels
(for the desired group labels) arguments. Once the group variable is created, the sum
of the "C1" by "group" and the "count" of elements within "group" can be done using aggregate
from "base R"
df1 <- transform(df, group=cut(as.numeric(sub('[%]', '', C2)),
breaks=c(-Inf,0.005, 0.010, 0.014, Inf),
labels=c('<0.005', 0.005, 0.01, 0.014)))
res <- do.call(data.frame,aggregate(C1~group, df1,
FUN=function(x) c(Count=length(x), Sum=sum(x))))
dNew <- data.frame(group=levels(df1$group))
merge(res, dNew, all=TRUE)
# group C1.Count C1.Sum
#1 <0.005 2 3491509.6
#2 0.005 NA NA
#3 0.01 2 302997.1
#4 0.014 8 364609.5
or you can use data.table
. setDT
converts the data.frame
to data.table
. Specify the "grouping" variable with by=
and summarize/create the two variables "Count" and "Sum" within the list(
. .N
gives the count of elements within each "group".
library(data.table)
setDT(df1)[, list(Count=.N, Sum=sum(C1)), by=group][]
Or using dplyr
. The %>%
connect the LHS with RHS arguments and chains them together. Use group_by
to specify the "group" variable, and then use summarise_each
or summarise
to get summary count and sum
of the concerned column. summarise_each
would be useful if there are more than one column.
library(dplyr)
df1 %>%
group_by(group) %>%
summarise_each(funs(n(), Sum=sum(.)), C1)
Update
Using the new dataset df
df1 <- transform(df, group=cut(C2, breaks=c(-Inf,0.005, 0.010, 0.014, Inf),
labels=c('<0.005', 0.005, 0.01, 0.014)))
res <- do.call(data.frame,aggregate(cbind(C1,C3)~group, df1,
FUN=function(x) c(Count=length(x), Sum=sum(x))))
res
# group C1.Count C1.Sum C3.Count C3.Sum
#1 <0.005 2 3491509.6 2 91233
#2 0.01 2 302997.1 2 88843
#3 0.014 8 364609.5 8 268809
and you can do the merge
as detailed above.
The dplyr
approach would be the same except specifying the additional variable
df1%>%
group_by(group) %>%
summarise_each(funs(n(), Sum=sum(.)), C1, C3)
#Source: local data frame [3 x 5]
# group C1_n C3_n C1_Sum C3_Sum
#1 <0.005 2 2 3491509.6 91233
#2 0.01 2 2 302997.1 88843
#3 0.014 8 8 364609.5 268809
data
df <-structure(list(C1 = c(49488.01172, 268221.1563, 34775.96094,
13046.98047, 2121699.75, 71155.09375, 1369809.875, 750, 44943.82813,
85585.04688, 31090.10938, 68550.40625), C2 = c("0.0512%", "0.0128%",
"0.0128%", "0.07241%", "0.00453%", "0.0181%", "0.00453%", "0.2048%",
"0.0362%", "0.0362%", "0.0362%", "0.0181%")), .Names = c("C1",
"C2"), row.names = c(NA, -12L), class = "data.frame")