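1 Analysis of relationships between variables
1.1 Relationships between variables
Functional relationship (deterministic): described by a mathematical model
Correlation (non-deterministic): studied with statistics
1.2 Case study ①: simple linear regression with one predictor
1. Read the data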
x = c(171, 175, 159, 155, 152, 158, 154, 164, 168, 166, 159, 164)
y = c(57, 64, 41, 38, 35, 44, 41, 51, 57, 49, 47, 46)
2. Visualize the data
# Scatter plot to inspect the relationship between x and y
plot(x, y)
3. Summary statistics for the two variables:
Population linear correlation coefficient:
$$\rho = \frac{Cov(x,y)}{\sqrt{var(x)var(y)}}=\frac{\sigma_{xy}}{\sqrt{\sigma_{x}^{2}\sigma_{y}^{2}}}$$
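Sample linear correlation coefficient:
Moving to the sample: replace the population moments (covariance and standard deviations) with the corresponding sample moments.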
$$\left\{\begin{matrix}
l_{xx}=\sum{(x-\bar{x})^{2}}=\sum{x^{2}}-\frac{(\sum{x})^{2}}{n} \\
l_{yy}=\sum{(y-\bar{y})^{2}}=\sum{y^{2}}-\frac{(\sum{y})^{2}}{n} \\
l_{xy}=\sum{(x-\bar{x})(y-\bar{y})}=\sum{xy}-\frac{(\sum{x})(\sum{y})}{n}
\end{matrix}\right.$$
$$r=\frac{S_{xy}}{\sqrt{S_{x}^{2}\cdot S_{y}^{2}}}=\frac{l_{xy}}{\sqrt{l_{xx} \cdot l_{yy}}}=\frac{\sum{(x-\bar x)(y-\bar y)}}{\sqrt{\sum{(x-\bar x)^2}\sum{(y-\bar y)^2}}}$$
4. Build a function for the sums of products of deviations from the mean
$$\left\{\begin{matrix}
l_{xx}=556.9 \\
l_{yy}=813 \\
l_{xy}=645.5
\end{matrix}\right.$$
$$r=\frac{l_{xy}}{\sqrt{l_{xx}l_{yy}}}=\frac{645.5}{\sqrt{556.9\times 813}}=0.9593$$
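Step 4 refers to a function for these sums, but no code is shown. The following is a minimal sketch of one way to write it; the helper name `lxy` and the variable names are assumptions for illustration, not taken from the original. It evaluates both the deviation form and the shortcut form of $l_{xy}$ from step 3.

```r
# Data from step 1
x <- c(171, 175, 159, 155, 152, 158, 154, 164, 168, 166, 159, 164)
y <- c(57, 64, 41, 38, 35, 44, 41, 51, 57, 49, 47, 46)

# Hypothetical helper: sum of products of deviations from the mean
lxy <- function(u, v) sum((u - mean(u)) * (v - mean(v)))

l_xx <- lxy(x, x)  # sum of squared deviations of x
l_yy <- lxy(y, y)  # sum of squared deviations of y
l_xy <- lxy(x, y)  # sum of cross products of deviations

# Shortcut form: sum(xy) - sum(x) * sum(y) / n
l_xy_short <- sum(x * y) - sum(x) * sum(y) / length(x)

# Sample correlation coefficient from the three sums
r <- l_xy / sqrt(l_xx * l_yy)

print(round(c(l_xx = l_xx, l_yy = l_yy, l_xy = l_xy, r = r), 4))
```

Passing the same vector twice gives $l_{xx}$ or $l_{yy}$, so one helper covers all three quantities.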
5. The correlation function in R
***cor(x, y=NULL, method=c("pearson", "kendall", "spearman"))***
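x: a numeric vector, matrix, or data frame
y: NULL, or a numeric vector, matrix, or data frame
method: the correlation method; the default is pearson
Compute the Pearson correlation coefficient: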
cor(x, y)
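As a small illustration of the `method` argument and of the matrix/data-frame input described above, the sketch below calls `cor` in a few ways on the same data. The variable names are assumptions for illustration.

```r
# Data from step 1
x <- c(171, 175, 159, 155, 152, 158, 154, 164, 168, 166, 159, 164)
y <- c(57, 64, 41, 38, 35, 44, 41, 51, 57, 49, 47, 46)

# Default method is Pearson
r_pearson <- cor(x, y)

# Rank-based alternatives selected through the method argument
r_spearman <- cor(x, y, method = "spearman")
r_kendall  <- cor(x, y, method = "kendall")

# With a data frame and no y, cor returns the full correlation matrix
r_matrix <- cor(data.frame(x, y))

print(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall))
print(r_matrix)
```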
6. Set up the hypothesis test
$$H_0:\rho=0,\quad H_1:\rho \neq 0,\quad \alpha=0.05$$
Idea behind the test: under $H_0$, the standardized statistic follows a t distribution with $n-2$ degrees of freedom.
$$t_r=\frac{r-\rho}{S_r}\sim t(n-2)$$
$$t_r=\frac{r - 0}{\sqrt{\frac{1-r^2}{n-2}}}=\frac{0.9593 \sqrt{12 - 2}}{\sqrt{1-0.9593^2}}=10.74$$
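n = length(x)
r = cor(x, y)  # sample correlation coefficient from step 5
tr = r/sqrt((1-r^2)/(n-2)); tr  # t statistic with n - 2 degrees of freedom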
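The t statistic can be turned into a two-sided p-value by hand, which makes the link to the `cor.test` output explicit. This is a sketch using the base R functions `pt` and `qt`; the names `p_value` and `t_crit` are assumptions for illustration.

```r
# Data from step 1
x <- c(171, 175, 159, 155, 152, 158, 154, 164, 168, 166, 159, 164)
y <- c(57, 64, 41, 38, 35, 44, 41, 51, 57, 49, 47, 46)

n  <- length(x)
r  <- cor(x, y)
tr <- r / sqrt((1 - r^2) / (n - 2))

# Two-sided p-value: P(|T| >= |tr|) for T ~ t(n - 2)
p_value <- 2 * pt(-abs(tr), df = n - 2)

# Critical value for a two-sided test at alpha = 0.05
alpha  <- 0.05
t_crit <- qt(1 - alpha / 2, df = n - 2)

print(c(t = tr, p = p_value, t_crit = t_crit))
# H0 is rejected when |tr| exceeds the critical value
print(abs(tr) > t_crit)
```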
7. Compute the t value and p-value, and draw a conclusion
cor.test(x,y)
Pearson's product-moment correlation
data: x and y
t = 10.743, df = 10, p-value = 8.21e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.8574875 0.9888163
sample estimates:
cor
0.9593031
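**Analysis:**
**p-value = 8.21e-07 < 0.05**
**The 95% interval estimate for ρ is [0.8574875, 0.9888163]**
Reject $H_0$: the linear correlation between x and y is significant at the 0.05 level.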
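The interval reported by `cor.test` can be reproduced by hand. According to the R documentation, it is an asymptotic interval based on Fisher's z transformation; the sketch below follows that approach, with variable names that are assumptions for illustration.

```r
# Data from step 1
x <- c(171, 175, 159, 155, 152, 158, 154, 164, 168, 166, 159, 164)
y <- c(57, 64, 41, 38, 35, 44, 41, 51, 57, 49, 47, 46)

n <- length(x)
r <- cor(x, y)

# Fisher's z transformation: z = atanh(r), approximately normal
# with standard error 1 / sqrt(n - 3)
z    <- atanh(r)
se_z <- 1 / sqrt(n - 3)

# 95% interval on the z scale, then transform back with tanh
z_ci <- z + c(-1, 1) * qnorm(0.975) * se_z
r_ci <- tanh(z_ci)

print(r_ci)
# Interval as computed by cor.test, for comparison
print(cor.test(x, y)$conf.int)
```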
8. Parameter estimation for the simple linear regression model
Model for the fitted line: $\hat y=a+bx$
$$b=\frac{l_{xy}}{l_{xx}}=\frac{\sum_{i = 1}^{n}{(x_i - \bar x)(y_i - \bar y)}}{\sum_{i = 1}^{n}{(x_i - \bar x)^2}}$$
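The slope formula can be checked numerically against R's built-in least-squares fit. This is a sketch only: using `lm` for comparison is an assumption about how one might verify the result, not a step shown in the original, and the variable names are illustrative.

```r
# Data from step 1
x <- c(171, 175, 159, 155, 152, 158, 154, 164, 168, 166, 159, 164)
y <- c(57, 64, 41, 38, 35, 44, 41, 51, 57, 49, 47, 46)

# Slope from the formula b = l_xy / l_xx
l_xy <- sum((x - mean(x)) * (y - mean(y)))
l_xx <- sum((x - mean(x))^2)
b    <- l_xy / l_xx

# Slope from the least-squares fit of y on x
fit  <- lm(y ~ x)
b_lm <- unname(coef(fit)["x"])

print(c(b_formula = b, b_lm = b_lm))
```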