Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

dataframe - R: Differences by group and adding

I would like to know a simpler way to do this operation.
Imagine I have a data.frame like this one:

set.seed(1)
ID <- rep(1:3,each=4)
XX <- round(runif(12),3)
TT <- rep(1:4, 3)
ZZ <- ave(XX*TT,ID, FUN = cumsum)
DF <- data.frame(ID, TT, XX, ZZ)

ID  TT   XX    ZZ
1    1   0.266 0.266
1    2   0.372 1.010
1    3   0.573 2.729
1    4   0.908 6.361
2    1   0.202 0.202
2    2   0.898 1.998
2    3   0.945 4.833
2    4   0.661 7.477
3    1   0.629 0.629
3    2   0.062 0.753
3    3   0.206 1.371
3    4   0.177 2.079

I would like to get, for each column, the increments (differences between two consecutive elements) within each ID group, keeping the first element of each group unchanged (as if it were preceded by a zero).

ID    TT      XX    ZZ
 1    1    0.266 0.266
 1    2    0.106 0.744
 1    3    0.201 1.719
 1    4    0.335 3.632
 2    1    0.202 0.202
 2    2    0.696 1.796
 2    3    0.047 2.835
 2    4   -0.284 2.644
 3    1    0.629 0.629
 3    2   -0.567 0.124
 3    3    0.144 0.618
 3    4   -0.029 0.708

I've tried with

ave(DF[3:4],DF$ID,FUN=function(x) diff(c(0,x)))

but it doesn't work, it produces the error:

 Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] : 
  non-numeric argument to binary operator 

Isn't there an easy way to do it?
I've found that I can get the proper output with:

ave(DF[3:4],DF$ID,FUN=function(x) 
  sapply(x, FUN=function(y) diff(c(0,y))))

but it gets quite long and complex for such a simple operation. I've found that I can also do it with data.table, but I would prefer to do it in base R.

library(data.table)
setDT(DF)  # converts DF to a data.table by reference
DF[, lapply(.SD, FUN = function(x) diff(c(0, x))), keyby = ID]

I also don't know how to insert a new row (filled with zeros) at the beginning of each group, or wherever some condition holds.

ID   XX    ZZ
1     0     0
1 0.266 0.266
1 0.372 1.010
1 0.573 2.729
1 0.908 6.361
2     0     0
2 0.202 0.202
2 0.898 1.998
2 0.945 4.833
2 0.661 7.477
3     0     0
3 0.629 0.629
3 0.062 0.753
3 0.206 1.371
3 0.177 2.079

I've tried with:

ave(DF[3:4],DF$ID,FUN=function(x) sapply(x, FUN=function(y) (c(0,y))))   

warning:

data length [10] is not a sub-multiple or multiple of the number of
rows [4]

I guess the general way to do it would be to work with the row indices.

P.S.: I've updated the post.

To keep things simple I had removed the TT column, but I later noticed that it is important.

My solution assumes that the table is ordered by TT, but sometimes it's not like that. What I really want is:

XX1
XX2-XX1
XX3-XX2
XX4-XX3

where the subscripts come not from the position in the table but from TT. I don't know whether it is more efficient to first sort the rows by TT or to build the expression with paste().


1 Answer


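I think you will need to use lapply() across the relevant columns, because ave() is meant to work on one vector at a time. When you pass a data frame as its first argument, FUN receives a data frame for each group, so c(0, x) builds a list and diff() fails with the "non-numeric argument" error. Try this: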

df[-1] <- lapply(
    df[-1], 
    function(x) ave(x, df$ID, FUN = function(x) c(x[1], diff(x)))
)

which gives the updated df

   ID     XX    ZZ
1   1  0.266 0.266
2   1  0.106 0.744
3   1  0.201 1.719
4   1  0.335 3.632
5   2  0.202 0.202
6   2  0.696 1.796
7   2  0.047 2.835
8   2 -0.284 2.644
9   3  0.629 0.629
10  3 -0.567 0.124
11  3  0.144 0.618
12  3 -0.029 0.708
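
The question's follow-up asks what to do when the rows are not already ordered by TT. One possible approach, sketched below, is to sort by ID and TT first and then apply the same lapply()/ave() pattern. The names shuffled, sorted, value_cols and increments are only illustrative, and the sketch assumes TT defines the order within each ID.

```r
# Rebuild the question's data, including the TT column.
set.seed(1)
ID <- rep(1:3, each = 4)
XX <- round(runif(12), 3)
TT <- rep(1:4, 3)
ZZ <- ave(XX * TT, ID, FUN = cumsum)
DF <- data.frame(ID, TT, XX, ZZ)

# Shuffle the rows to mimic a table that is not ordered by TT.
shuffled <- DF[sample(nrow(DF)), ]

# Sort by ID, then TT, so consecutive rows are consecutive steps.
sorted <- shuffled[order(shuffled$ID, shuffled$TT), ]

# Difference every column except the grouping and ordering columns.
value_cols <- setdiff(names(sorted), c("ID", "TT"))

increments <- sorted
increments[value_cols] <- lapply(
    sorted[value_cols],
    function(col) ave(col, sorted$ID, FUN = function(v) c(v[1], diff(v)))
)
rownames(increments) <- NULL
```

Sorting once up front keeps the per-group function as simple as in the sorted case; whether it is faster than other approaches on large data has not been measured here.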

Data:

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L), XX = c(0.266, 0.372, 0.573, 0.908, 0.202, 0.898, 0.945, 
0.661, 0.629, 0.062, 0.206, 0.177), ZZ = c(0.266, 1.01, 2.729, 
6.361, 0.202, 1.998, 4.833, 7.477, 0.629, 0.753, 1.371, 2.079
)), .Names = c("ID", "XX", "ZZ"), class = "data.frame", row.names = c(NA, 
-12L))
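
For the second part of the question (inserting a row of zeros at the start of each group), ave() is not a good fit because its result must have the same length as its input, which is where the "not a sub-multiple or multiple" warning comes from. A minimal base R sketch, assuming the grouping column is called ID, is to split the data frame, prepend a row to each piece and bind the pieces back together. The helper name prepend_zero_row and the small data frame are hypothetical.

```r
# Small illustrative data frame with the same columns as the answer's df.
df <- data.frame(
    ID = rep(1:2, each = 2),
    XX = c(0.266, 0.372, 0.202, 0.898),
    ZZ = c(0.266, 1.010, 0.202, 1.998)
)

# Hypothetical helper: add a row of zeros on top of one group's rows.
prepend_zero_row <- function(group) {
    # Copy the first row so the ID and the column types are preserved.
    zero_row <- group[1, , drop = FALSE]
    # Overwrite every non-ID column with zero.
    zero_row[setdiff(names(group), "ID")] <- 0
    rbind(zero_row, group)
}

# Apply the helper to each ID group and stack the results.
padded <- do.call(rbind, lapply(split(df, df$ID), prepend_zero_row))
rownames(padded) <- NULL
```

With the zero rows in place, plain diff() within each group would give the increments directly, without the c(x[1], diff(x)) step.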
