Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Merge plm fitted values to dataset

I'm working with a fixed effects regression model using plm.

The model looks like this:

FE.model <- plm(fml, data = data.reg2,
                index = c('Site.ID', 'date.hour'), # cross-section ID and time-series ID
                model = 'within',                  # fixed effects ("within") estimator
                effect = 'individual')
summary(FE.model)

"fml" is a formula I defined previously. I have many independent variables, so this made it more efficient.

What I want to do is get my fitted values (my yhats) and join them to my base dataset, data.reg2.

I was able to get the fitted values using this code:

 Fe.model.fitted <- FE.model$model[[1]] - FE.model$residuals

However, this gives me only a single vector of fitted values, with no key I can use to join it to my base dataset.

Alternatively, I've tried something like this:

 Fe.model.fitted <- cbind(data.reg2, resid=resid(FE.model), fitted=fitted(FE.model))

However, I get this error with that:

 Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class "pseries" to a data.frame

Are there any other ways to get my fitted values in my base dataset? Or can someone explain the error I'm getting and maybe a way to fix it?

I should note that I don't want to compute the yhats manually from my betas. I have far too many independent variables for that, and my formula (fml) may change, so that approach would not be efficient.

Many thanks!!


1 Answer


Merging plm fitted values back into the original dataset requires some intermediate steps -- plm drops any rows with missing data, and as far as I can tell, a plm object does not contain the index info. The order of the data is not preserved -- see what Giovanni Millo, one of plm's authors, commented in this thread:

"...the input order is not always preserved: observations are always reordered by (individual, time) internally, so that the output you get is ordered accordingly..."
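To see why this reordering rules out a purely positional join such as cbind(), here is a minimal sketch in base R. The data frame and the stand-in "model output" are made up for illustration; no plm call is involved, and the sort only mimics the (individual, time) reordering described in the quote.

```r
# Hypothetical data stored in an arbitrary order
dat <- data.frame(
  id   = c("b", "a", "b", "a"),
  time = c(2L, 1L, 1L, 2L),
  y    = c(40, 10, 30, 20)
)

# Mimic the internal reordering by (individual, time)
sorted <- dat[order(dat$id, dat$time), ]

# Stand-in for model output: one value per row, in the sorted order
out_sorted <- sorted$y

# Position-based attachment pairs values with the wrong rows
wrong <- cbind(dat, fit = out_sorted)

# Key-based attachment keeps each value with its own (id, time)
keyed <- data.frame(sorted[, c("id", "time")], fit = out_sorted)
right <- merge(dat, keyed, by = c("id", "time"))
```

In `wrong`, the fit column no longer lines up with y, while in `right` every value sits next to the row it came from.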

The steps in short:

  1. Get fitted values from the estimated plm object. It is a single vector but the entries are named. The names correspond to the position in the index.
  2. Get the index, using the index() function. It can return both individual and time indices. Note the index may contain more rows than the fitted values, in case rows were removed for missing data. (It is also possible to generate an index directly from the original data, but I did not see a promise that the original order of the data is preserved in what plm returns.)
  3. Merge into the original data, looking up the id and time values from the index.
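The three steps can be sketched without plm, using base R only. Everything here is a toy stand-in: the `fit` vector plays the role of the named fitted values, and its names are assumed to be row positions in the index, as described in step 1.

```r
# Toy panel with one incomplete row (hypothetical data)
dat <- data.frame(
  id   = c("a", "a", "b", "b"),
  time = c(1L, 2L, 1L, 2L),
  x    = c(0.5, NA, 1.5, 2.0)
)

# Step 1: stand-in for the fitted values; names are positions in the index.
# Position 2 is absent because that row was dropped for missing data.
fit <- c("1" = 10.1, "3" = 11.3, "4" = 12.0)

# Step 2: the index holds (id, time) for every row, including the dropped one
idx <- dat[, c("id", "time")]

# Step 3: look up id and time by position, then merge keeping all original rows
pos    <- as.integer(names(fit))
fit_df <- data.frame(idx[pos, ], fit = unname(fit))
out    <- merge(dat, fit_df, by = c("id", "time"), all.x = TRUE)
```

The row that was dropped for missing data stays in `out` and simply gets NA in the fit column.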

Sample code is provided below. It is fairly long, but I have tried to comment it. The code is not optimized; my intention was to list the steps explicitly. Also, I am using data.tables rather than data.frames.

library(data.table); library(plm)

### Generate dummy data. This way we know the "true" coefficients
set.seed(100)
n <- 500 # Run with more data if you want to get closer to the "true" coefficients
DT <- data.table(CJ(id = c("a","b","c","d","e"), time = c(1:(n / 5))))
DT[, x1 := rnorm(n)]
DT[, x2 := rnorm(n)]
DT[, y  := x1 + 2 * x2 + rnorm(n) / 10]

setkey(DT, id, time)
# Make it an unbalanced panel & put in some NAs
DT <- DT[!(id == "a" & time == 4)]
DT[.("a", 3), x2 := as.numeric(NA)]
DT[.("d", 2), x2 := as.numeric(NA)]

str(DT)

### Run the model -- both individual and time effects; "within" model
PLM <- plm(formula = y ~ x1 + x2, data = DT, index = c("id", "time"), # index = (individual, time) columns
           model = "within", effect = "twoways", na.action = "na.omit")
summary(PLM)

### Merge the fitted values back into the data.table DT
# Note that PLM$model$y is shorter than the data, i.e. the row(s) with NA have been dropped
cat("\nRows omitted (due to NA): ", nrow(DT) - length(PLM$model$y))

# Since the objects returned by plm() do not contain the index, need to generate it from the data
# The object returned by plm(), i.e. PLM$model$y, has names that point to the place in the index
# Note: The index can also be done as INDEX <- DT[, j = .(id, time)], but use the longer way with index() in case plm does not preserve the order
INDEX <- data.table(index(x = pdata.frame(x = DT, index = c("id", "time")), which = NULL)) # which = NULL extracts both the individual and time indexes
INDEX[, id := as.character(id)]
INDEX[, time := as.integer(time)] # it is returned as a factor, convert back to integer to match the variable type in DT

# Generate the fitted values as the difference between the y values and the residuals
if (all(names(PLM$residuals) == names(PLM$model$y))) { # this should not be needed, but just in case...
    FIT <- data.table(
        index   = as.integer(names(PLM$model$y)), # this index corresponds to the position in the INDEX, from where we get the "id" and "time" below
        fit.plm = as.numeric(PLM$model$y) - as.numeric(PLM$residuals)
    )
}

FIT[, id   := INDEX[index]$id]
FIT[, time := INDEX[index]$time]
# Now FIT has both the id and time variables, can match it back into the original dataset (i.e. we have the missing data accounted for)
DT <- merge(x = DT, y = FIT[, j = .(id, time, fit.plm)], by = c("id", "time"), all = TRUE) # Need all = TRUE, or some data from DT will be dropped!
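On the cbind() error in the question: as the message suggests, cbind() on a data frame tries to turn each argument into a data frame, and that fails for the "pseries" class returned by resid(). Stripping the class with as.numeric() avoids the error. The sketch below assumes the Grunfeld data shipped with plm, which is balanced, has no missing values and is already sorted by firm and year, so a positional join is safe there. With dropped rows or unsorted data, use the index-based merge above instead.

```r
library(plm)

# Example data shipped with plm: 10 firms x 20 years, no NAs
data("Grunfeld", package = "plm")

fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")

# Fitted values as response minus residuals, coerced to plain numeric vectors
y    <- as.numeric(fe$model[[1]])
yhat <- y - as.numeric(residuals(fe))

# Guard: a positional join only makes sense if no rows were dropped
stopifnot(length(yhat) == nrow(Grunfeld))

out <- cbind(Grunfeld, fitted = yhat)
```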
