# Read the data in DTdata.csv into the variable play_decision; the file has a header row and uses a comma as the separator
> play_decision <- read.table("DTdata.csv",header = TRUE,sep = ",")
# View the data
> play_decision
   Play  Outlook Temperature Humidity  Wind
1   yes    rainy        cool   normal FALSE
2    no    rainy        cool   normal  TRUE
3   yes overcast         hot     high FALSE
4    no    sunny        mild     high FALSE
5   yes    rainy        cool   normal FALSE
6   yes    sunny        cool   normal FALSE
7   yes    rainy        cool   normal FALSE
8   yes    sunny         hot   normal FALSE
9   yes overcast        mild     high  TRUE
10   no    sunny        mild     high  TRUE
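DTdata.csv itself is not included with this post. As a minimal sketch, assuming the ten printed rows are the entire file, the data frame can be rebuilt by hand and written to a CSV so that the same read.table() call works. The temporary file path and the name reloaded are illustrative choices, not part of the original session.

```r
# Rebuild the ten rows from the printed output (assumption: this is the whole data set)
play_decision <- data.frame(
  Play        = c("yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no"),
  Outlook     = c("rainy", "rainy", "overcast", "sunny", "rainy",
                  "sunny", "rainy", "sunny", "overcast", "sunny"),
  Temperature = c("cool", "cool", "hot", "mild", "cool",
                  "cool", "cool", "hot", "mild", "mild"),
  Humidity    = c("normal", "normal", "high", "high", "normal",
                  "normal", "normal", "normal", "high", "high"),
  Wind        = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = TRUE
)

# Write the rows to a CSV in a temporary directory (hypothetical location)
csv_path <- file.path(tempdir(), "DTdata.csv")
write.csv(play_decision, csv_path, row.names = FALSE)

# Read it back with the same arguments used in the post
reloaded <- read.table(csv_path, header = TRUE, sep = ",")
print(reloaded)
```

Copying the file to the working directory as DTdata.csv lets the rest of the session run unchanged.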
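# Load the packages that provide rpart() and rpart.plot()
> library(rpart)
> library(rpart.plot)
# Build a decision tree model with rpart() that predicts Play from the four attributes. method = "class" builds a classification tree, data names the data frame holding the attributes, and control governs tree growth: minsplit = 1 requires a node to have at least one observation before a split is attempted. parms = list(split = "information") selects information gain as the splitting criterion.
> fit <- rpart(Play ~ Outlook + Temperature + Humidity + Wind, method = "class", data = play_decision,
+              control = rpart.control(minsplit = 1), parms = list(split = "information"))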
# Inspect a summary of the decision tree model produced by rpart
> summary(fit)
Call:
rpart(formula = Play ~ Outlook + Temperature + Humidity + Wind,
data = play_decision, method = "class", parms = list(split = "information"),
control = rpart.control(minsplit = 1))
n= 10
         CP nsplit rel error   xerror      xstd
1 0.3333333      0         1 1.000000 0.4830459
2 0.0100000      3         0 1.666667 0.5270463
Variable importance
       Wind     Outlook Temperature
         51          29          20
Node number 1: 10 observations, complexity param=0.3333333
predicted class=yes expected loss=0.3 P(node) =1
class counts: 3 7
probabilities: 0.300 0.700
left son=2 (3 obs) right son=3 (7 obs)
Primary splits:
Temperature splits as RRL, improve=1.3282860, (0 missing)
Wind < 0.5 to the right, improve=1.3282860, (0 missing)
Outlook splits as RLL, improve=0.8161371, (0 missing)
Humidity splits as LR, improve=0.6326870, (0 missing)
Surrogate splits:
Wind < 0.5 to the right, agree=0.8, adj=0.333, (0 split)
Node number 2: 3 observations, complexity param=0.3333333
predicted class=no expected loss=0.3333333 P(node) =0.3
class counts: 2 1
probabilities: 0.667 0.333
left son=4 (2 obs) right son=5 (1 obs)
Primary splits:
Outlook splits as R-L, improve=1.9095430, (0 missing)
Wind < 0.5 to the left, improve=0.5232481, (0 missing)
Node number 3: 7 observations, complexity param=0.3333333
predicted class=yes expected loss=0.1428571 P(node) =0.7
class counts: 1 6
probabilities: 0.143 0.857
left son=6 (1 obs) right son=7 (6 obs)
Primary splits:
Wind < 0.5 to the right, improve=2.8708140, (0 missing)
Outlook splits as RLR, improve=0.6214736, (0 missing)
Temperature splits as LR-, improve=0.3688021, (0 missing)
Humidity splits as RL, improve=0.1674470, (0 missing)
Node number 4: 2 observations
predicted class=no expected loss=0 P(node) =0.2
class counts: 2 0
probabilities: 1.000 0.000
Node number 5: 1 observations
predicted class=yes expected loss=0 P(node) =0.1
class counts: 0 1
probabilities: 0.000 1.000
Node number 6: 1 observations
predicted class=no expected loss=0 P(node) =0.1
class counts: 1 0
probabilities: 1.000 0.000
Node number 7: 6 observations
predicted class=yes expected loss=0 P(node) =0.6
class counts: 0 6
probabilities: 0.000 1.000
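The improve values in the summary can be checked by hand. The sketch below is not rpart's internal code. It assumes that, with split = "information", improve equals the number of observations in the node multiplied by the drop in entropy (natural logarithm) produced by the split. That reading is inferred from the numbers agreeing with the summary. The helper names entropy and scaled_gain are made up for this illustration.

```r
# Class labels and the two split variables, copied from the data shown earlier
play        <- c("yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
temperature <- c("cool", "cool", "hot", "mild", "cool", "cool", "cool", "hot", "mild", "mild")
wind        <- c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)

# Shannon entropy of a label vector, in nats
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                      # drop empty classes so log(0) never occurs
  -sum(p * log(p))
}

# n * (parent entropy - weighted child entropy) for a binary split
scaled_gain <- function(labels, goes_left) {
  n <- length(labels)
  child_entropy <- sum(goes_left) / n * entropy(labels[goes_left]) +
                   sum(!goes_left) / n * entropy(labels[!goes_left])
  n * (entropy(labels) - child_entropy)
}

# Node 1: Temperature sends "mild" left and "cool"/"hot" right (splits as RRL)
root_temperature <- scaled_gain(play, temperature == "mild")

# Node 3: the seven non-mild rows, split on Wind (TRUE goes left)
in_node3   <- temperature != "mild"
node3_wind <- scaled_gain(play[in_node3], wind[in_node3])

# Compare with improve=1.3282860 (node 1) and improve=2.8708140 (node 3) in the summary
print(round(c(root_temperature = root_temperature, node3_wind = node3_wind), 6))
```

Under the same assumption, the Wind split at node 1 gives an identical value because it also separates the rows into a 2 no / 1 yes group and a 1 no / 6 yes group, which is why Temperature and Wind tie in the primary splits list.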
# Visualize the generated decision tree
> rpart.plot(fit , type = 4 , extra = 1)
# Create a new data frame
> newdata <- data.frame(Outlook="rainy",Temperature="mild",Humidity="high",Wind=FALSE)
# View the new data frame
> newdata
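# Use predict() to predict whether the conditions in newdata lead to Play; the type argument sets the kind of prediction returned ("prob" for class probabilities, "class" for the predicted label)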
> predict(fit,newdata=newdata,type="prob")
  no yes
1  1   0
# or
> predict(fit,newdata=newdata,type="class")
1
no
Levels: no yes
# Both prediction types give the same result: no
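Every leaf in the summary reports expected loss=0, so the tree should reproduce all ten training labels. The sketch below checks this with a confusion table. It assumes the rpart package is installed and that the ten printed rows are the full training set. The names fitted_class, confusion and training_accuracy are illustrative.

```r
library(rpart)  # assumed to be installed

# Training data rebuilt from the printed rows
play_decision <- data.frame(
  Play        = c("yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no"),
  Outlook     = c("rainy", "rainy", "overcast", "sunny", "rainy",
                  "sunny", "rainy", "sunny", "overcast", "sunny"),
  Temperature = c("cool", "cool", "hot", "mild", "cool",
                  "cool", "cool", "hot", "mild", "mild"),
  Humidity    = c("normal", "normal", "high", "high", "normal",
                  "normal", "normal", "normal", "high", "high"),
  Wind        = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = TRUE
)

# Same call as in the post
fit <- rpart(Play ~ Outlook + Temperature + Humidity + Wind, method = "class",
             data = play_decision, control = rpart.control(minsplit = 1),
             parms = list(split = "information"))

# Predict the training rows and cross-tabulate against the true labels
fitted_class <- predict(fit, newdata = play_decision, type = "class")
confusion    <- table(predicted = fitted_class, actual = play_decision$Play)
print(confusion)

# Share of training rows classified correctly
training_accuracy <- sum(diag(confusion)) / sum(confusion)
print(training_accuracy)
```

A perfect score on the training rows says little about new data: in the CP table above, xerror rises from 1.000000 to 1.666667 once the three splits are added, which suggests that a tree grown with minsplit = 1 on ten rows is overfitting.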