Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

optimization - In R, how do I find the optimal variables to maximize or minimize the correlation between several datasets

I am able to do this easily in Excel, but my dataset has gotten too large. In Excel, I would use Solver.

Column A, B, C, D = random numbers
Column E = random numbers (the column I want to maximize the correlation with)
Column F = A*x + B*y + C*z + D*j, where x, y, z, j are coefficients found by Solver

In a separate cell, I would have correl(E,F).

In Solver, I would set the objective correl(E,F) to max by changing the variables x, y, z, j, subject to these constraints:

1.  x, y, z, j have to be between 0 and 1
2.  x + y + z + j = 1

How can I do this in R? Thanks for the help.


1 Answer


Since most optimization routines work best without constraints (optim()'s default Nelder-Mead method, for instance, does not handle them at all), you can transform (reparametrize) the problem of finding four numbers x, y, z, j, constrained to lie between 0 and 1 and to sum to 1, into the problem of finding three real numbers q1, q2, q3 with no constraints. For instance, if s is a function that maps the real line R onto the interval (0,1), such as the logistic (sigmoid) function, the following does the trick:

  x = s(q1)
  y = (1-x) * s(q2)
  z = (1-x-y) * s(q3)
  j = 1-x-y-z

It is probably easier to picture with three numbers instead of four: the set of points (x, y, z) with coordinates between 0 and 1 and summing to 1 is a triangle, and s(q1), s(q2) form a coordinate system for points in that triangle.
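To see that the mapping always lands in the allowed region, you can push arbitrary real numbers through it and check the constraints. This is a minimal sketch that assumes s is base R's plogis() (the logistic function); the helper name to_weights is made up for illustration.

```r
# Map three unconstrained reals to four weights in [0, 1] that sum to 1.
# Assumption: s() is the logistic function plogis(); any increasing map
# from the real line onto (0, 1) would work the same way.
to_weights <- function(q) {
  x <- plogis(q[1])
  y <- (1 - x) * plogis(q[2])
  z <- (1 - x - y) * plogis(q[3])
  j <- 1 - x - y - z
  c(x = x, y = y, z = z, j = j)
}

set.seed(42)
q_samples <- matrix(rnorm(300), ncol = 3)         # 100 random points in R^3
w_samples <- t(apply(q_samples, 1, to_weights))   # 100 x 4 matrix of weights

# Every row satisfies the constraints: entries in [0, 1], rows sum to 1
all(w_samples >= 0 & w_samples <= 1)
all(abs(rowSums(w_samples) - 1) < 1e-12)
```

For finite q the weights stay strictly inside (0, 1) mathematically, so an optimum on the boundary (some weight exactly 0) is only approached as the corresponding q grows large.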

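# Sample data (set.seed() makes the example reproducible)
set.seed(1)
A <- rnorm(100)
B <- rnorm(100)
C <- rnorm(100)
D <- rnorm(100)
E <- rnorm(100)
# Correlation between the weighted sum of A, B, C, D and the target E
f <- function(p) cor(p[1]*A + p[2]*B + p[3]*C + p[4]*D, E)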

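# Unconstrained optimization (cor() is scale-invariant, so the weights found
# here are only determined up to a positive multiplicative factor)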
optim(
  c(1,1,1,1)/4, # Starting values
  f,            # Function to maximize
  control=list(fnscale=-1) # Maximize (default is to minimize)
)

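# Transform the parameters
# plogis() is the logistic function exp(x) / (1 + exp(x)); unlike that literal
# formula, it does not return NaN when exp(x) overflows for large x
sigmoid <- plogis
# convert() maps three unconstrained reals to four weights in [0, 1] summing to 1
convert <- function(p) {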
  q1 <- sigmoid(p[1])
  q2 <- (1-q1) * sigmoid(p[2])
  q3 <- (1-q1-q2) * sigmoid(p[3])
  q4 <- 1-q1-q2-q3 
  c(q1,q2,q3,q4)
}

# Optimization: convert() only uses p[1], p[2], p[3], so start from three zeros
g <- function(p) f(convert(p))
p <- optim(c(0,0,0), g, control=list(fnscale=-1))
convert(p$par)
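The question title also asks about minimizing the correlation. Since optim() minimizes by default, the same g() handles that case once fnscale = -1 is dropped. The self-contained sketch below runs both directions on simulated data (the target E is deliberately built from A and B, an assumption made only so the example has some structure; plogis() plays the role of s). It also checks the constrained maximum against the unconstrained one: for any linear combination of A, B, C, D, the largest attainable correlation with E is the multiple correlation coefficient, the square root of R² from lm(E ~ A + B + C + D), so the constrained result should never exceed it.

```r
# Simulated data, for illustration only
set.seed(1)
n <- 100
A <- rnorm(n); B <- rnorm(n); C <- rnorm(n); D <- rnorm(n)
E <- A + 0.5 * B + rnorm(n)   # hypothetical target that depends on A and B

# Correlation between the weighted combination and E
f <- function(w) cor(w[1] * A + w[2] * B + w[3] * C + w[4] * D, E)

# Three unconstrained reals -> four weights in [0, 1] summing to 1
convert <- function(p) {
  q1 <- plogis(p[1])
  q2 <- (1 - q1) * plogis(p[2])
  q3 <- (1 - q1 - q2) * plogis(p[3])
  c(q1, q2, q3, 1 - q1 - q2 - q3)
}
g <- function(p) f(convert(p))

# Maximize: fnscale = -1 makes optim() maximize instead of minimize
best_max <- optim(c(0, 0, 0), g, control = list(fnscale = -1))
w_max <- convert(best_max$par)

# Minimize: optim() minimizes by default, so fnscale is simply dropped
best_min <- optim(c(0, 0, 0), g)
w_min <- convert(best_min$par)

# Upper bound: best correlation over unconstrained linear combinations
r_unconstrained <- sqrt(summary(lm(E ~ A + B + C + D))$r.squared)

cat("max weights:", round(w_max, 3), " cor:", f(w_max), "\n")
cat("min weights:", round(w_min, 3), " cor:", f(w_min), "\n")
cat("unconstrained maximum:", r_unconstrained, "\n")
```

Because Nelder-Mead is a local method, trying a few different starting values is a cheap way to gain confidence that the reported weights are not just a local optimum.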
