1. The scale function
R's base package ships with a built-in data standardization function, scale(), documented as follows:
Usage
scale(x, center = TRUE, scale = TRUE)
Arguments
x: a numeric matrix (or matrix-like object).
center: either a logical value or a numeric vector of length equal to the number of columns of x.
scale: either a logical value or a numeric vector of length equal to the number of columns of x.
Details
The value of center determines how column centering is performed. If center is a numeric vector with length equal to the number of columns of x, then each column of x has the corresponding value from center subtracted from it. If center is TRUE then centering is done by subtracting the column means (omitting NAs) of x from their corresponding columns, and if center is FALSE, no centering is done.
The value of scale determines how column scaling is performed (after centering). If scale is a numeric vector with length equal to the number of columns of x, then each column of x is divided by the corresponding value from scale. If scale is TRUE then scaling is done by dividing the (centered) columns of x by their standard deviations if center is TRUE, and the root mean square otherwise. If scale is FALSE, no scaling is done.
The root-mean-square for a (possibly centered) column is defined as sqrt(sum(x^2)/(n-1)), where x is a vector of the non-missing values and n is the number of non-missing values. In the case center = TRUE, this is the same as the standard deviation, but in general it is not. (To scale by the standard deviations without centering, use scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE)).)
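The Details paragraph draws a distinction between the standard deviation and the root mean square. The sketch below makes that difference concrete on a small made-up matrix `m`, which is hypothetical data. With `center = FALSE, scale = TRUE`, each column is divided by sqrt(sum(x^2)/(n-1)) and not by sd(). The last call uses the recipe from the help page to divide by the standard deviations without centering.

```r
# Hypothetical example data: two columns, four rows
m <- cbind(a = c(1, 2, 3, 4), b = c(2, 4, 6, 10))
n <- nrow(m)

# No centering, default scaling: each column is divided by its root mean square
rms_scaled <- scale(m, center = FALSE, scale = TRUE)
rms <- apply(m, 2, function(col) sqrt(sum(col^2) / (n - 1)))
print(attr(rms_scaled, "scaled:scale"))  # RMS values, not standard deviations
print(rms)

# Recipe from the help page: divide by the standard deviations without centering
sd_scaled <- scale(m, center = FALSE, scale = apply(m, 2, sd, na.rm = TRUE))
print(attr(sd_scaled, "scaled:scale"))
print(apply(m, 2, sd))
```

Because this data is not centered, the two sets of divisors differ. Column `a` is divided by sqrt(30/3) = sqrt(10) in the first call, but by its standard deviation sqrt(5/3) in the second.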
Value
For scale.default, the centered, scaled matrix. The numeric centering and scalings used (if any) are returned as attributes "scaled:center" and "scaled:scale".
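The two attributes are what make it possible to undo or reuse a scaling later. The snippet below reads them back and checks them against colMeans() and sd(). It uses a made-up data frame `df`, which has the same shape as the usage example at the end of this post.

```r
# Hypothetical data frame
df <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6), z = c(3, 6, 9))
s <- scale(df)  # defaults: center = TRUE, scale = TRUE

ctr <- attr(s, "scaled:center")  # column means that were subtracted
scl <- attr(s, "scaled:scale")   # column standard deviations that were divided by
print(ctr)
print(scl)

# The attributes equal the column means and sample standard deviations
print(all.equal(ctr, colMeans(df)))
print(all.equal(scl, sapply(df, sd)))

# Removing the attributes leaves a plain numeric matrix
plain <- s
attr(plain, "scaled:center") <- NULL
attr(plain, "scaled:scale") <- NULL
print(plain)
```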
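By default, scale() performs z-score standardization: it first subtracts the column mean and then divides by the column standard deviation.
Z-score standardization (zero-mean normalization), also called standard-deviation standardization, rescales the data based on the mean and standard deviation of the original data. After the transformation, each column has mean 0 and standard deviation 1. This does not make the data normally distributed, because the transformation is linear: it shifts and rescales the distribution without changing its shape. The transformation is:
$$x^* = \frac{x - \mu}{\sigma}$$
where μ is the mean of all sample values and σ is their standard deviation. In scale(), σ is the sample standard deviation, computed with n - 1 in the denominator, the same as sd().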
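The formula can be checked directly against scale(). The sketch below applies x* = (x − μ)/σ by hand to a made-up vector `x` and compares the result with scale(x). It also confirms that the result has mean 0 and standard deviation 1.

```r
# Hypothetical sample values for a single variable
x <- c(4, 8, 15, 16, 23, 42)

mu    <- mean(x)
sigma <- sd(x)                 # sample standard deviation (denominator n - 1)
z_manual <- (x - mu) / sigma   # x* = (x - mu) / sigma

z_scale <- as.vector(scale(x)) # scale() treats a vector as a one-column matrix

print(z_manual)
print(all.equal(z_manual, z_scale))
print(c(mean = mean(z_manual), sd = sd(z_manual)))  # mean is 0 up to rounding, sd is 1
```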
2. The unscale function
The unscale() function in the DMwR package restores the original data from the object returned by scale().
Usage
unscale(vals, norm.data, col.ids)
Arguments
vals: a numeric matrix with the values to un-scale.
norm.data: a numeric, scaled matrix. This should be an object to which the function scale() was applied.
col.ids: the columns of the vals matrix that are to be un-scaled (defaults to all of them).
Value
An object with the same dimension as the parameter vals.
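Unscaling just reverses the z-score formula: multiply by the stored scale, then add back the stored center. The code below is a base-R sketch of that idea for readers who would rather not depend on DMwR. `unscale_base` is a hypothetical helper, not DMwR's implementation, and it does not reproduce the `col.ids` argument. It assumes that `vals` has the same columns, in the same order, as `norm.data`.

```r
# Hypothetical helper: reverses scale() using the attributes of a scaled matrix.
# Assumes vals has the same columns (same order) as norm.data.
unscale_base <- function(vals, norm.data) {
  vals <- as.matrix(vals)
  ctr <- attr(norm.data, "scaled:center")
  scl <- attr(norm.data, "scaled:scale")
  # Undo the division first, then the subtraction (reverse order of scale())
  if (!is.null(scl)) vals <- sweep(vals, 2, scl, "*")
  if (!is.null(ctr)) vals <- sweep(vals, 2, ctr, "+")
  # Drop any scaling attributes carried over from the input
  attr(vals, "scaled:center") <- NULL
  attr(vals, "scaled:scale") <- NULL
  vals
}

df <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6), z = c(3, 6, 9))
s <- scale(df)
restored <- unscale_base(s, s)
print(restored)
```

The order of the two sweep() calls matters. scale() subtracts the center first and then divides by the scale, so the inverse has to multiply first and then add.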
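3. Usage example
The DMwR package must be loaded (library(DMwR)) before unscale() can be called.
> df<-data.frame(x=c(1,2,3),y=c(2,4,6),z=c(3,6,9))
> df
  x y z
1 1 2 3
2 2 4 6
3 3 6 9
> scaledData<-scale(df)
> scaledData
      x  y  z
[1,] -1 -1 -1
[2,]  0  0  0
[3,]  1  1  1
attr(,"scaled:center")
x y z
2 4 6
attr(,"scaled:scale")
x y z
1 2 3
> unscale(scaledData,scaledData)
     x y z
[1,] 1 2 3
[2,] 2 4 6
[3,] 3 6 9
> ndf<-data.frame(x=c(1,2),y=c(2,4),z=c(3,6))
> ndf
  x y z
1 1 2 3
2 2 4 6
> scale(ndf,center=attr(scaledData, "scaled:center"),scale=attr(scaledData, "scaled:scale"))
      x  y  z
[1,] -1 -1 -1
[2,]  0  0  0
attr(,"scaled:center")
x y z
2 4 6
attr(,"scaled:scale")
x y z
1 2 3
The last call standardizes the new data ndf using the center and scale computed from df, rather than ndf's own mean and standard deviation. New observations therefore land on the same scale as the original data.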
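The last step of the example generalizes into a small reusable pattern: compute the parameters once on reference ("training") data, then apply them to any new data. The sketch below wraps that in `scale_like`, a hypothetical helper that is not part of base R or DMwR. When an attribute is absent, the helper passes FALSE so that step is skipped. The round trip back to the original values uses base sweep() instead of DMwR. The data frames `train` and `test_df` are made up.

```r
# Hypothetical helper: standardize new data with the center/scale
# stored on an earlier scale() result.
scale_like <- function(newdata, ref) {
  ctr <- attr(ref, "scaled:center")
  scl <- attr(ref, "scaled:scale")
  # scale() skips a step when given FALSE, e.g. if ref was not centered
  scale(newdata,
        center = if (is.null(ctr)) FALSE else ctr,
        scale  = if (is.null(scl)) FALSE else scl)
}

train <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6), z = c(3, 6, 9))
ref   <- scale(train)

test_df    <- data.frame(x = c(1, 2, 4), y = c(2, 4, 10), z = c(3, 6, 0))
new_scaled <- scale_like(test_df, ref)
print(new_scaled)

# Reverse with the same parameters: multiply by scale, then add center
back <- sweep(sweep(new_scaled, 2, attr(ref, "scaled:scale"), "*"),
              2, attr(ref, "scaled:center"), "+")
print(all.equal(unname(back), unname(as.matrix(test_df)), check.attributes = FALSE))
```

Values outside the training range, such as x = 4 here, simply map to z-scores beyond ±1 (2 for x). This is the expected behavior when the training mean and standard deviation are reused.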