optimization - Regression with equality and inequality constrained coefficients in R

Question

Welcome To Ask or Share your Answers For Others

optimization - Regression with equality and inequality constrained coefficients in R

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

optimization - Regression with equality and inequality constrained coefficients in R

I am trying to obtain estimated constrained coefficients using RSS. The beta coefficients are constrained between [0,1] and sum to 1. Additionally, my third parameter is constrained between (-1,1). Utilizing the below I can obtain a nice solution using simulated variables but when implementing the methodology on my real data set I keep arriving at a non-unique solution. In turn, I'm wondering if there is a more numerically stable way to obtain my estimated parameters.

set.seed(234)
k = 2
a = diff(c(0, sort(runif(k-1)), 1))
n = 1e4
x = matrix(rnorm(k*n), nc = k)
a2 = -0.5
y = a2 * (x %*% a) + rnorm(n)
f = function(u){sum((y - u[3] * (x %*% u[1:2]))^2)}
g = function(v){

v1 = v[1]
v2 = v[2]
u = vector(mode = "double", length = 3)

# ensure in (0,1)
v1 = 1 / (1 + exp(-v1))

# ensure add up to 1
u[1:2] = c(v1, 1 - sum(v1))

# ensure between [-1,1]
u[3] = (v2^2 - 1) / (v2^2 + 1)
u
}

res = optim(rnorm(2), function(v) f(g(v)), hessian = TRUE, method = "BFGS")
eigen(res$hessian)$values
res$convergence
rbind(Est = res$par, SE = sqrt(diag(solve(res$hessian))))
rbind(g(res$par),c(a,a2))

Hats off to http://zoonek.free.fr/blosxom/R/2012-06-01_Optimization.html

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:31:52+0000

Since there has been no direct answer to your question so far, I'd like to show a way how to implement a parameter-constrained model in Stan/RStan. You should give this a try using your real data.

Doing Bayesian inference has the advantage of giving you posterior probabilities for your (constrained) model parameters. Point estimates including confidence intervals can then be easily calculated.

First off, we load the library and set RStan to store the compiled model and use multiple cores (if available).
```
library(rstan);
rstan_options(auto_write = TRUE);
options(mc.cores = parallel::detectCores());
```

We now define our Stan model. In this case, it's very simple, and we can make use of RStan's simplex data type for vectors of non-negative values that sum to one.

model <- "
data {
    int<lower=1> n;   // number of observations
    int<lower=0> k;   // number of parameters
    matrix[n, k] X;   // data
    vector[n] y;      // response
}

parameters {
    real a2;          // a2 is a free scaling parameter
    simplex[k] a;     // a is constrained?to sum to 1
    real sigma;       // residuals
}

model {
    // Likelihood
    y ~ normal(a2 * (X * a), sigma);
}"

Stan supports various constrained data types; I'd recommend taking a lot at the Stan manual for more complex examples.

Using the sample data from your original question, we can run our model:

# Sample data
set.seed(234);
k = 2;
a = diff(c(0, sort(runif(k-1)), 1));
n = 1e4;
x = matrix(rnorm(k * n), nc = k);
a2 = -0.5;
y = a2 * (x %*% a) + rnorm(n);

# Fit stan model
fit <- stan(
    model_code = model,
    data = list(
        n = n,
        k = k,
        X = x,
        y = as.numeric(y)),
    iter = 4000,
    chains = 4);

Running the model will only take a few seconds (after the parser has internally translated and compiled the model in C++), and the full results (posterior distributions for all parameters conditional on the data) are stored in fit.

We can inspect the contents of fit using summary:

# Extract parameter estimates
pars <- summary(fit)$summary;
pars;
#               mean      se_mean          sd          2.5%           25%
#a2       -0.4915289 1.970327e-04 0.014363398    -0.5194985    -0.5011471
#a[1]      0.7640606 2.273282e-04 0.016348488     0.7327691     0.7527457
#a[2]      0.2359394 2.273282e-04 0.016348488     0.2040952     0.2248482
#sigma     1.0048695 8.746869e-05 0.007048116     0.9909698     1.0001889
#lp__  -5048.4273105 1.881305e-02 1.204892294 -5051.4871931 -5048.9800451
#                50%           75%         97.5%    n_eff      Rhat
#a2       -0.4916061    -0.4819086    -0.4625947 5314.196 1.0000947
#a[1]      0.7638723     0.7751518     0.7959048 5171.881 0.9997468
#a[2]      0.2361277     0.2472543     0.2672309 5171.881 0.9997468
#sigma     1.0048994     1.0095420     1.0187554 6492.930 0.9998086
#lp__  -5048.1238783 -5047.5409682 -5047.0355381 4101.832 1.0012841

You can see that a[1]+a[2]=1.

Plotting parameter estimates including confidence intervals is also easy:

plot(fit);

Categories

optimization - Regression with equality and inequality constrained coefficients in R

optimization - Regression with equality and inequality constrained coefficients in R

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags