R - Decision Trees

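# Read DTdata.csv into the variable play_decision; the file has a header row and uses "," as the separator.
# stringsAsFactors = TRUE keeps the text columns as factors in R 4.0 and later, where the default became FALSE.
> play_decision <- read.table("DTdata.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)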

# View the data
> play_decision

   Play  Outlook Temperature Humidity  Wind
1   yes    rainy        cool   normal FALSE
2    no    rainy        cool   normal  TRUE
3   yes overcast         hot     high FALSE
4    no    sunny        mild     high FALSE
5   yes    rainy        cool   normal FALSE
6   yes    sunny        cool   normal FALSE
7   yes    rainy        cool   normal FALSE
8   yes    sunny         hot   normal FALSE
9   yes overcast        mild     high  TRUE
10   no    sunny        mild     high  TRUE
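DTdata.csv itself is not included here, so the following is a minimal sketch for recreating it from the ten rows printed above. The temporary file location and the names csv_path and reloaded are assumptions made for illustration; the column values are copied from the listing.

```r
# Rebuild the ten observations shown above as a data frame.
play_decision <- data.frame(
  Play        = c("yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no"),
  Outlook     = c("rainy", "rainy", "overcast", "sunny", "rainy",
                  "sunny", "rainy", "sunny", "overcast", "sunny"),
  Temperature = c("cool", "cool", "hot", "mild", "cool",
                  "cool", "cool", "hot", "mild", "mild"),
  Humidity    = c("normal", "normal", "high", "high", "normal",
                  "normal", "normal", "normal", "high", "high"),
  Wind        = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = TRUE
)

# Write the data to a CSV in a temporary directory (hypothetical location),
# then read it back the same way the transcript does.
csv_path <- file.path(tempdir(), "DTdata.csv")
write.csv(play_decision, csv_path, row.names = FALSE)
reloaded <- read.table(csv_path, header = TRUE, sep = ",", stringsAsFactors = TRUE)

# Class balance of the target: how many "no" and "yes" rows there are.
table(reloaded$Play)
```

Reading the file back with stringsAsFactors = TRUE gives factor columns for the text attributes and a logical column for Wind, which is the shape the later rpart call expects.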

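# Load the rpart package, then build the decision tree model with rpart(), predicting Play from the
# four attributes. method = "class" builds a classification tree, data names the data frame that holds
# the attributes, and control governs tree growth: minsplit = 1 means a node needs only one observation
# before a split is attempted. parms = list(split = "information") selects information gain (entropy)
# as the splitting criterion instead of the default Gini index.
> library(rpart)
> fit <- rpart(Play ~ Outlook + Temperature + Humidity + Wind, method = "class", data = play_decision,
+     control = rpart.control(minsplit = 1), parms = list(split = "information"))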
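To see why minsplit matters on a data set this small, the sketch below fits the same formula twice: once with rpart's default control settings (minsplit defaults to 20) and once with minsplit = 1. The data frame is rebuilt inline from the listing above so the block runs on its own; default_fit and small_fit are names chosen for illustration.

```r
library(rpart)

# The ten observations from the listing above.
play_decision <- data.frame(
  Play        = c("yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no"),
  Outlook     = c("rainy", "rainy", "overcast", "sunny", "rainy",
                  "sunny", "rainy", "sunny", "overcast", "sunny"),
  Temperature = c("cool", "cool", "hot", "mild", "cool",
                  "cool", "cool", "hot", "mild", "mild"),
  Humidity    = c("normal", "normal", "high", "high", "normal",
                  "normal", "normal", "normal", "high", "high"),
  Wind        = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = TRUE
)

# Default controls: with only 10 rows the root has fewer than 20 observations,
# so no split is attempted and the tree is a single node.
default_fit <- rpart(Play ~ Outlook + Temperature + Humidity + Wind,
                     method = "class", data = play_decision)

# minsplit = 1 lets rpart keep splitting these small nodes.
small_fit <- rpart(Play ~ Outlook + Temperature + Humidity + Wind,
                   method = "class", data = play_decision,
                   control = rpart.control(minsplit = 1),
                   parms = list(split = "information"))

# Each row of $frame is one node of the tree.
nrow(default_fit$frame)
nrow(small_fit$frame)
```

A tree grown this far fits the training rows perfectly, which is useful for a walkthrough but is likely to overfit on real data.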

# View a summary of the decision tree model produced by rpart
> summary(fit)

Call:
rpart(formula = Play ~ Outlook + Temperature + Humidity + Wind,
data = play_decision, method = "class", parms = list(split = "information"),
control = rpart.control(minsplit = 1))
n= 10

         CP nsplit rel error   xerror      xstd
1 0.3333333      0         1 1.000000 0.4830459
2 0.0100000      3         0 1.666667 0.5270463

Variable importance
       Wind     Outlook Temperature
         51          29          20

Node number 1: 10 observations, complexity param=0.3333333
predicted class=yes expected loss=0.3 P(node) =1
class counts: 3 7
probabilities: 0.300 0.700
left son=2 (3 obs) right son=3 (7 obs)
Primary splits:
Temperature splits as RRL, improve=1.3282860, (0 missing)
Wind < 0.5 to the right, improve=1.3282860, (0 missing)
Outlook splits as RLL, improve=0.8161371, (0 missing)
Humidity splits as LR, improve=0.6326870, (0 missing)
Surrogate splits:
Wind < 0.5 to the right, agree=0.8, adj=0.333, (0 split)

Node number 2: 3 observations, complexity param=0.3333333
predicted class=no expected loss=0.3333333 P(node) =0.3
class counts: 2 1
probabilities: 0.667 0.333
left son=4 (2 obs) right son=5 (1 obs)
Primary splits:
Outlook splits as R-L, improve=1.9095430, (0 missing)
Wind < 0.5 to the left, improve=0.5232481, (0 missing)

Node number 3: 7 observations, complexity param=0.3333333
predicted class=yes expected loss=0.1428571 P(node) =0.7
class counts: 1 6
probabilities: 0.143 0.857
left son=6 (1 obs) right son=7 (6 obs)
Primary splits:
Wind < 0.5 to the right, improve=2.8708140, (0 missing)
Outlook splits as RLR, improve=0.6214736, (0 missing)
Temperature splits as LR-, improve=0.3688021, (0 missing)
Humidity splits as RL, improve=0.1674470, (0 missing)

Node number 4: 2 observations
predicted class=no expected loss=0 P(node) =0.2
class counts: 2 0
probabilities: 1.000 0.000

Node number 5: 1 observations
predicted class=yes expected loss=0 P(node) =0.1
class counts: 0 1
probabilities: 0.000 1.000

Node number 6: 1 observations
predicted class=no expected loss=0 P(node) =0.1
class counts: 1 0
probabilities: 1.000 0.000

Node number 7: 6 observations
predicted class=yes expected loss=0 P(node) =0.6
class counts: 0 6
probabilities: 0.000 1.000
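The improve values in the summary can be checked by hand. The sketch below assumes that, with split = "information", improve is the size-weighted drop in entropy (natural logarithm) from a node to its two children; that assumption reproduces the figure reported for the Temperature split at node 1. The helper name entropy is hypothetical, and the class counts are copied from nodes 1, 2 and 3 of the summary.

```r
# Entropy (natural log) of a vector of class counts; empty classes contribute 0.
entropy <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Class counts (no, yes) taken from the summary output.
root  <- c(no = 3, yes = 7)  # node 1: all 10 observations
left  <- c(no = 2, yes = 1)  # node 2: Temperature is "mild"
right <- c(no = 1, yes = 6)  # node 3: Temperature is "cool" or "hot"

# Size-weighted entropy of the parent minus that of the two children.
improve <- sum(root) * entropy(root) -
  sum(left) * entropy(left) -
  sum(right) * entropy(right)

# Compare with improve=1.3282860 reported for the Temperature split at node 1.
round(improve, 5)
```

The Wind split at node 1 divides the rows into groups with the same class counts (2 no and 1 yes against 1 no and 6 yes), which is why both splits report the same improve value there.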

# Visualize the generated decision tree (rpart.plot() comes from the rpart.plot package)
> library(rpart.plot)
> rpart.plot(fit, type = 4, extra = 1)


# Create a new data frame
> newdata <- data.frame(Outlook="rainy",Temperature="mild",Humidity="high",Wind=FALSE)

# View the newly created data frame
> newdata

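# Use predict() to predict whether the case in newdata will Play; the type argument sets the kind of prediction returned ("prob" for class probabilities, "class" for the predicted label)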
> predict(fit,newdata=newdata,type="prob")

  no yes
1  1   0


> predict(fit,newdata=newdata,type="class")

 1
no
Levels: no yes

# Both prediction types give the same answer: no
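The two prediction types are related: for this tree, the label returned by type = "class" is the column with the highest value in the type = "prob" matrix. The sketch below checks that on the same newdata; the data frame is rebuilt inline so the block runs on its own, and top_label is a name chosen for illustration.

```r
library(rpart)

# The ten observations from the listing above.
play_decision <- data.frame(
  Play        = c("yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no"),
  Outlook     = c("rainy", "rainy", "overcast", "sunny", "rainy",
                  "sunny", "rainy", "sunny", "overcast", "sunny"),
  Temperature = c("cool", "cool", "hot", "mild", "cool",
                  "cool", "cool", "hot", "mild", "mild"),
  Humidity    = c("normal", "normal", "high", "high", "normal",
                  "normal", "normal", "normal", "high", "high"),
  Wind        = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = TRUE
)

fit <- rpart(Play ~ Outlook + Temperature + Humidity + Wind,
             method = "class", data = play_decision,
             control = rpart.control(minsplit = 1),
             parms = list(split = "information"))

newdata <- data.frame(Outlook = "rainy", Temperature = "mild",
                      Humidity = "high", Wind = FALSE)

prob <- predict(fit, newdata = newdata, type = "prob")   # one column per class
cls  <- predict(fit, newdata = newdata, type = "class")  # factor of predicted labels

# Name of the highest-probability column for each row of prob.
top_label <- colnames(prob)[apply(prob, 1, which.max)]

# TRUE when the two prediction types agree.
top_label == as.character(cls)
```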
