You should heed the advice in the comments.
I addressed a version of the OP's question on CV. If the structure of the data is the same, then you're only observing one cross-sectional unit over time. In your setting, you're observing a single country over many years. If your data were a true panel dataset, you would be observing more than one country over at least two years. For example, I will simulate a small panel data frame.
library(dplyr)
library(plm)
set.seed(12345)
panel <- tibble(
country = c(rep("Spain", 5), rep("France", 5), rep("Croatia", 5)),
year = rep(2016:2020, 3), # each country is observed over 5 years
x = rnorm(15), # sample 15 random deviates (5 per country)
y = sample(c(10000:100000), size = 15) # sample incomes (range: 10,000 - 100,000)
) %>%
mutate(
France = ifelse(country == "France", 1, 0),
Croatia = ifelse(country == "Croatia", 1, 0),
# 2016 is the omitted reference year (Spain is the omitted reference country)
y_2017 = ifelse(year == 2017, 1, 0),
y_2018 = ifelse(year == 2018, 1, 0),
y_2019 = ifelse(year == 2019, 1, 0),
y_2020 = ifelse(year == 2020, 1, 0)
)
Inside the mutate() function, I appended dummies for all countries and all years, excluding one country and one year. In your other question, you estimate time fixed effects. Software invariably drops one year to avoid collinearity. You don't need to append the dummies, but they are helpful for explication purposes. Here is a classic panel data frame:
# Panel - varies across two dimensions (country + time)
# 3 countries observed over 5 years for a total of 15 country-year observations
# A tibble: 15 x 10
country year x y France Croatia y_2017 y_2018 y_2019 y_2020
<chr> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Spain 2016 0.586 81371 0 0 0 0 0 0
2 Spain 2017 0.709 10538 0 0 1 0 0 0
3 Spain 2018 -0.109 26893 0 0 0 1 0 0
4 Spain 2019 -0.453 71363 0 0 0 0 1 0
5 Spain 2020 0.606 43308 0 0 0 0 0 1
6 France 2016 -1.82 42544 1 0 0 0 0 0
7 France 2017 0.630 88187 1 0 1 0 0 0
8 France 2018 -0.276 91368 1 0 0 1 0 0
9 France 2019 -0.284 65563 1 0 0 0 1 0
10 France 2020 -0.919 22061 1 0 0 0 0 1
11 Croatia 2016 -0.116 80390 0 1 0 0 0 0
12 Croatia 2017 1.82 48623 0 1 1 0 0 0
13 Croatia 2018 0.371 93444 0 1 0 1 0 0
14 Croatia 2019 0.520 79582 0 1 0 0 1 0
15 Croatia 2020 -0.751 33367 0 1 0 0 0 1
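The hand-built dummies above are only there for illustration. As a minimal sketch (base R only, using a freshly simulated toy data frame whose values are my assumption and do not match the table above), wrapping the grouping variables in factor() lets R build the design matrix and drop one reference level per factor on its own:

```r
# Toy panel: 3 countries x 5 years (simulated values, not the ones printed above)
set.seed(1)
toy <- data.frame(
  country = rep(c("Spain", "France", "Croatia"), each = 5),
  year    = rep(2016:2020, times = 3),
  x       = rnorm(15),
  y       = rnorm(15)
)

# factor() tells R to expand each variable into dummies;
# one level per factor is dropped as the reference category
X <- model.matrix(y ~ x + factor(country) + factor(year), data = toy)

colnames(X)  # intercept, x, 2 country dummies, 4 year dummies
ncol(X)      # 1 + 1 + 2 + 4 columns
```

Note that R drops the first factor level by default (here Croatia and 2016, since levels are sorted), whereas the hand-built version above omits Spain. The choice of reference category does not change the fit, only the interpretation of the dummy coefficients.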
As @DaveArmstrong correctly noted, you should specify the panel indexes. First, we specify a panel data frame, then we estimate the model.
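# pdata.frame() reads index as c(individual, time); listing "year" first makes
# year the individual dimension, which is why the summary reports n = 5, T = 3
pdata <- pdata.frame(panel, index = c("year", "country"))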
random <- plm(y ~ x, model = "random", data = pdata)
This fits a one-way random effects model. The call to summary() will produce the following (abridged) output:
Call:
plm(formula = y ~ x, data = pdata, model = "random")
Balanced Panel: n = 5, T = 3, N = 15
Effects:
var std.dev share
idiosyncratic 685439601 26181 0.819
individual 151803385 12321 0.181
theta: 0.2249
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-49380 -17266 6221 17759 32442
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
(Intercept) 58308.0 8653.7 6.7380 1.606e-11 ***
x 7777.0 8808.9 0.8829 0.3773
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
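The "Effects" block in this output can be checked by hand. The sketch below assumes the usual balanced-panel expression for the random effects transformation parameter, theta = 1 - sqrt(sigma_e^2 / (sigma_e^2 + T * sigma_u^2)), and simply plugs in the numbers printed above; the variable names are mine.

```r
# Variance components copied from the printed summary
sigma2_e <- 685439601   # idiosyncratic variance
sigma2_u <- 151803385   # individual variance
T_len    <- 3           # "T = 3" as reported in the Balanced Panel line

# Share of total variance attributed to each component
share_e <- sigma2_e / (sigma2_e + sigma2_u)
share_u <- sigma2_u / (sigma2_e + sigma2_u)
round(c(idiosyncratic = share_e, individual = share_u), 3)  # 0.819 and 0.181

# Quasi-demeaning parameter for a balanced panel
theta <- 1 - sqrt(sigma2_e / (sigma2_e + T_len * sigma2_u))
round(theta, 4)  # 0.2249, matching the printed theta
```

A theta near 0 pushes the estimator toward pooled OLS, while a theta near 1 pushes it toward the within (fixed effects) estimator.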
But your data does not have this structure, hence the warning message. In fact, your data is similar to carving out one country from this panel. For example, suppose we winnowed down the data frame to Croatian observations only. The following code takes a subset of the previous data frame:
croatia_only <- panel %>%
filter(country == "Croatia") # grab only the observations from Croatia
Here, longitudinal variation only exists for one country. In other words, by restricting attention to Croatia, we cannot exploit the variation across countries; we only have variation in one dimension! The resulting data frame looks like the following:
# Time Series - varies across one dimension (time)
# A tibble: 5 x 10
country year x y France Croatia y_2017 y_2018 y_2019 y_2020
<chr> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Croatia 2016 -0.116 80390 0 1 0 0 0 0
2 Croatia 2017 1.82 48623 0 1 1 0 0 0
3 Croatia 2018 0.371 93444 0 1 0 1 0 0
4 Croatia 2019 0.520 79582 0 1 0 0 1 0
5 Croatia 2020 -0.751 33367 0 1 0 0 0 1
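Before fitting anything, it can help to count how many cross-sectional units and time periods a data frame actually contains. Here is a minimal sketch in base R; panel_dims() is a hypothetical helper of my own (it is not part of plm), and the toy data frame only mimics the country/year layout used in this answer.

```r
# Hypothetical helper: count distinct units and distinct time periods
panel_dims <- function(df, unit = "country", time = "year") {
  c(units   = length(unique(df[[unit]])),
    periods = length(unique(df[[time]])))
}

# Toy layout: 3 countries observed over 5 years
toy <- data.frame(
  country = rep(c("Spain", "France", "Croatia"), each = 5),
  year    = rep(2016:2020, times = 3)
)
one_country <- toy[toy$country == "Croatia", ]

panel_dims(toy)          # 3 units, 5 periods: a panel
panel_dims(one_country)  # 1 unit, 5 periods: a single time series
```

If the number of units comes back as 1, there is no cross-sectional dimension left for a panel estimator to exploit.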
Now I will re-estimate a random effects model with one country:
pdata <- pdata.frame(croatia_only, index = c("year", "country"))
random_croatia <- plm(y ~ x, model = "random", data = pdata)
This should reproduce your error message (i.e., an empty model). Note that you only have variation within one country! As you correctly noted, a "between" model is estimable, but not for the reasons you might presume. A between-effects model averages over all years within a country and then runs ordinary least squares on the averaged data. In your setting, averaging over your time series yields a single country mean, and since you only observe one country, you are left with one observation. Such a model is inestimable. However, you can pool all of the yearly observations for your one country and run a linear model instead. That is what you're doing. To test this with one country, try comparing the "between" model with the "pooling" model. They should produce identical estimates of the coefficient on x.
# Run this using the croatia_only data frame
summary(plm(y ~ x, model = "between", data = pdata))
summary(plm(y ~ x, model = "pooling", data = pdata))
It should be painfully obvious now, but model = "pooling" is equivalent to running lm().
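To make the averaging argument concrete, here is a by-hand sketch in base R that does not call plm at all. The one-country data frame is simulated (its values are my assumption), and the "between" step is imitated with aggregate() followed by lm().

```r
# One country observed over 5 years (simulated values)
set.seed(1)
one_country <- data.frame(
  country = "Croatia",
  year    = 2016:2020,
  x       = rnorm(5),
  y       = rnorm(5)
)

# "Between" step by hand: average x and y within each country
country_means <- aggregate(cbind(x, y) ~ country, data = one_country, FUN = mean)
nrow(country_means)  # a single country leaves a single averaged row

# OLS on one observation cannot identify a slope: the coefficient on x is NA
between_by_hand <- lm(y ~ x, data = country_means)
coef(between_by_hand)

# Pooling the 5 yearly rows is just an ordinary linear model
pooled <- lm(y ~ x, data = one_country)
df.residual(pooled)  # 5 observations minus 2 coefficients
```

With only one country, the averaged data cannot support a regression, while the pooled yearly rows can.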
If you want me to tie this into your previous post, try estimating a linear model with separate dummies for all years as covariates. You will quickly discover that you have no residual degrees of freedom, which is exactly the problem outlined in your other post.
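A minimal sketch of that exercise, again in base R with a simulated one-country series (the values are my assumption, not your data):

```r
# One country, 5 yearly observations (simulated values)
set.seed(1)
one_country <- data.frame(
  year = 2016:2020,
  x    = rnorm(5),
  y    = rnorm(5)
)

# Year dummies via factor(): intercept + x + 4 year dummies = 6 columns, 5 rows
fit <- lm(y ~ x + factor(year), data = one_country)

df.residual(fit)        # no residual degrees of freedom remain
sum(is.na(coef(fit)))   # one coefficient cannot be estimated at all
```

The year dummies alone soak up every observation, so nothing is left to estimate the effect of x or any standard errors.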
In sum, I would look for data from other countries. Once you do that, you can use plm() for all it's worth.