optimization - In R, how do I find the optimal variable to maximize or minimize correlation between several datasets

Question


asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I can do this easily in Excel, but my dataset has gotten too large. In Excel, I would use Solver.

Column A, B, C, D = random numbers
Column E = random number (which I want to maximize the correlation to)
Column F = A*x + B*y + C*z + D*j, where x, y, z, j are the coefficients found by Solver

In a separate cell, I would have correl(E,F).

In Solver, I would set the objective correl(E,F) to max by changing the variables x, y, z, j, subject to these constraints:

1.  x, y, z, j have to be between 0 and 1
2.  x + y + z + j = 1

How can I do this in R? Thanks for the help.


1 Answer

深蓝 · Answer 1 · 2021-10-23T17:57:42+0000

Since most optimization routines work best with no constraints, you can transform (reparametrize) the problem of finding four numbers, x, y, z, j, constrained to be between 0 and 1 and to sum up to 1, into the problem of finding three real numbers q1, q2, q3 (with no constraints). For instance, if we have a function s that maps the real line R to the interval (0,1), the following does the trick:

  x = s(q1)
  y = (1-x) * s(q2)
  z = (1-x-y) * s(q3)
  j = 1-x-y-z

It is probably easier to understand in two dimensions: in this case, the set of points (x,y,z) with coordinates between 0 and 1 and summing up to 1 is a triangle and s(q1),s(q2) form a coordinate system for points in that triangle.
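As a quick check of the reparametrization, here is a minimal sketch that maps three arbitrary real numbers to four weights and confirms that they lie between 0 and 1 and sum to 1. It assumes the logistic function (`plogis` in base R) as the map s; the name `to_weights` and the random inputs are only illustrative.

```r
# s: any smooth map from the real line onto (0, 1).
# Assumption: the logistic function, available in base R as plogis().
s <- plogis

# Map three unconstrained reals q to four weights x, y, z, j
to_weights <- function(q) {
  x <- s(q[1])
  y <- (1 - x) * s(q[2])
  z <- (1 - x - y) * s(q[3])
  j <- 1 - x - y - z
  c(x = x, y = y, z = z, j = j)
}

to_weights(c(0, 0, 0))  # x = 0.5, y = 0.25, z = 0.125, j = 0.125

# Arbitrary inputs still give valid weights
set.seed(1)
w <- to_weights(rnorm(3, sd = 5))
print(w)
stopifnot(all(w >= 0), all(w <= 1), abs(sum(w) - 1) < 1e-12)
```

Each weight takes a share of whatever the previous weights left over, which is why the total can never exceed 1.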

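# Sample data (seed fixed so the results are reproducible)
set.seed(1)
A <- rnorm(100)
B <- rnorm(100)
C <- rnorm(100)
D <- rnorm(100)
E <- rnorm(100)

# Objective: correlation between the weighted sum and E
f <- function(p) cor(p[1]*A + p[2]*B + p[3]*C + p[4]*D, E)

# Unconstrained optimization (ignores the 0-1 and sum-to-1 constraints)
optim(
  c(1,1,1,1)/4, # Starting values
  f,            # Function to maximize
  control=list(fnscale=-1) # Maximize (default is to minimize)
)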
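One way to sanity-check the unconstrained result: correlation does not change when the weighted sum is multiplied by a positive constant, so the unconstrained optimum is only defined up to scale, and the fitted values of an ordinary least-squares regression of E on A, B, C, D have the largest possible correlation with E among all linear combinations. The sketch below compares the two on its own simulated data; the seed and the names `w_ols` and `r_ols` are illustrative, not part of the original answer.

```r
# Simulated data, same shape as the sample data in this answer
set.seed(42)
n <- 100
A <- rnorm(n); B <- rnorm(n); C <- rnorm(n); D <- rnorm(n); E <- rnorm(n)

f <- function(p) cor(p[1]*A + p[2]*B + p[3]*C + p[4]*D, E)

# Reference value: least-squares coefficients (intercept dropped,
# since adding a constant does not change the correlation)
fit   <- lm(E ~ A + B + C + D)
w_ols <- unname(coef(fit)[-1])
r_ols <- f(w_ols)

# Numerical unconstrained optimization, as in the answer
res <- optim(c(1,1,1,1)/4, f, control = list(fnscale = -1))

# The optim() value should not exceed the least-squares value
print(c(ols = r_ols, optim = res$value))
```

If `optim` lands noticeably below the least-squares value, it has stopped early, and a different starting point or method may help.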

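# Transform the parameters: three unconstrained reals -> four weights
# between 0 and 1 that sum to 1
sigmoid <- function(x) plogis(x)  # Logistic function; avoids overflow of exp(x) for large x
convert <- function(p) {
  x <- sigmoid(p[1])
  y <- (1-x) * sigmoid(p[2])
  z <- (1-x-y) * sigmoid(p[3])
  j <- 1-x-y-z
  c(x,y,z,j)
}

# Optimization over the three unconstrained parameters
g <- function(p) f(convert(p))
p <- optim(c(0,0,0), g, control=list(fnscale=-1))
convert(p$par)  # Optimal weights x, y, z, j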
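`optim` uses the Nelder-Mead method by default, which can stop at a local optimum, and the best weights often sit on the boundary (some weights equal to 0), where the unconstrained parameters drift towards plus or minus infinity. A cautious approach is to restart from several starting points and keep the best run. The following is only a sketch on simulated data; the number of restarts, the seed, and the names `starts`, `fits` and `best` are assumptions for illustration.

```r
# Simulated data, same shape as the sample data in this answer
set.seed(42)
n <- 100
A <- rnorm(n); B <- rnorm(n); C <- rnorm(n); D <- rnorm(n); E <- rnorm(n)

f <- function(p) cor(p[1]*A + p[2]*B + p[3]*C + p[4]*D, E)

# Three unconstrained reals -> four weights between 0 and 1 summing to 1
convert <- function(q) {
  x <- plogis(q[1])
  y <- (1 - x) * plogis(q[2])
  z <- (1 - x - y) * plogis(q[3])
  c(x, y, z, 1 - x - y - z)
}
g <- function(q) f(convert(q))

# Restart from several random starting points and keep the best run
starts <- replicate(5, rnorm(3), simplify = FALSE)
fits   <- lapply(starts, function(q0) optim(q0, g, control = list(fnscale = -1)))
values <- sapply(fits, function(r) r$value)
best   <- fits[[which.max(values)]]

w <- convert(best$par)
print(round(w, 4))   # weights x, y, z, j
print(best$value)    # correlation reached with those weights
stopifnot(all(w >= 0), all(w <= 1), abs(sum(w) - 1) < 1e-8)
```

If the restarts disagree noticeably in `values`, that is a sign the single-run result should not be trusted on its own.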
