That problem is called a "skyline query" by database administrators (they may have other algorithms) and an "efficient frontier" by economists.
Plotting the data can make it clear what we are looking for.
n <- 40
d <- data.frame(
x = rnorm(n),
y = rnorm(n)
)
# We want the "extreme" points in the following plot
par(mar=c(1,1,1,1))
plot(d, axes=FALSE, xlab="", ylab="")
for(i in 1:n) {
polygon( c(-10,d$x[i],d$x[i],-10), c(-10,-10,d$y[i],d$y[i]),
col=rgb(.9,.9,.9,.2))
}
The algorithm is as follows: sort the points along the first coordinate,
keep each observation unless it is worse than the last retained one.
d <- d[ order(d$x, decreasing=TRUE), ]
result <- d[1,]
for(i in seq_len(nrow(d))[-1] ) {
if( d$y[i] > result$y[nrow(result)] ) {
result <- rbind(result, d[i,]) # inefficient
}
}
points(result, cex=3, pch=15)
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…