Predicting the Stock Market with Machine Learning Algorithms in R


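Introduction to quantmod

quantmod is a powerful R package for financial analysis. It covers data retrieval, cleaning, modeling, and more.

1. Fetching data: getSymbols

  The default data source is Yahoo.

  Shanghai Stock Exchange tickers take the suffix ss, e.g. getSymbols("600030.ss"); Shenzhen Stock Exchange tickers take the suffix sz, e.g. getSymbols("000002.sz").

2. Symbol lookup (giving a ticker an alias and a data source): setSymbolLookup

3. Dividends: getDividends

4. Adjusting OHLC prices for splits and dividends: adjustOHLC

5. Stock splits: getSplits

6. Option chains: getOptionChain

7. Financial statements: getFinancials / getFin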
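The suffix convention from item 1 can be wrapped in a small helper. This is only a sketch: yahoo_ticker is a hypothetical name introduced here for illustration, not part of quantmod, and the exchange codes "SH" and "SZ" are labels chosen for this example.

```r
# Hypothetical helper (not a quantmod function): build a Yahoo-style ticker
# from a numeric stock code and an exchange label.
yahoo_ticker <- function(code, exchange = c("SH", "SZ")) {
  exchange <- match.arg(exchange)
  # ss = Shanghai Stock Exchange, sz = Shenzhen Stock Exchange
  suffix <- c(SH = "ss", SZ = "sz")[[exchange]]
  paste0(code, ".", suffix)
}

tickers <- c(yahoo_ticker("600030", "SH"), yahoo_ticker("000002", "SZ"))
print(tickers)

# With quantmod loaded and network access available, the result could be
# passed on, e.g. getSymbols(yahoo_ticker("000002", "SZ")).
```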

> library(quantmod)
> setSymbolLookup(WANKE=list(name="000002.sz", src="yahoo"))
> getSymbols("WANKE")
[1] "WANKE"
Warning message:
000002.sz contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them. 
> head(WANKE)
           000002.SZ.Open 000002.SZ.High 000002.SZ.Low 000002.SZ.Close
2008-03-17         14.221         14.221        14.221           13.65
2008-03-18             NA             NA            NA              NA
2008-03-19             NA             NA            NA              NA
2008-03-20             NA             NA            NA              NA
2008-03-21             NA             NA            NA              NA
2008-03-24             NA             NA            NA              NA
           000002.SZ.Volume 000002.SZ.Adjusted
2008-03-17        123340858           13.10156
2008-03-18               NA                 NA
2008-03-19               NA                 NA
2008-03-20               NA                 NA
2008-03-21               NA                 NA
2008-03-24               NA                 NA
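The warning above suggests removing or replacing the missing values. The sketch below shows the simplest option, dropping incomplete rows, on a small made-up data frame in base R; the prices are invented for illustration and are not Vanke data. na.omit() is a generic function, so the same call also applies to the object returned by getSymbols.

```r
# Made-up OHLC-like data with one row of missing values (illustrative only).
ohlc <- data.frame(
  date  = as.Date(c("2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07")),
  open  = c(10.0, NA, 10.4, 10.3),
  close = c(10.2, NA, 10.1, 10.6)
)

# Count the rows that contain at least one NA.
n_missing <- sum(!complete.cases(ohlc))

# Drop those rows; the remaining rows keep their original order.
clean <- na.omit(ohlc)
print(clean)
```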

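Machine Learning: Classification

First, simplify the problem and predict only whether the stock goes up or down. This turns it into a classification problem in which each historical trading day is labeled either UP or DOWN. As a further simplification, assume that the direction depends only on the historical data.

We use a Naive Bayes classifier as the learning method. Its defining ("naive") assumption is that, given the class C, all attributes A1, ..., An are conditionally independent of one another, so that P(A1, ..., An | C) = P(A1 | C) × ... × P(An | C).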
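To make the independence assumption concrete, here is a minimal from-scratch sketch in base R. The toy data, the column names (day, trend) and the function nb_posterior are all invented for illustration; it is not how e1071 implements the classifier internally, only the same idea in a few lines.

```r
# Toy data, invented for illustration: two categorical attributes and a class.
toy <- data.frame(
  day   = c("Mon", "Mon", "Tues", "Tues", "Mon", "Tues", "Mon", "Tues"),
  trend = c("rising", "falling", "rising", "rising",
            "falling", "falling", "rising", "falling"),
  class = c("DOWN", "DOWN", "UP", "UP", "DOWN", "UP", "UP", "DOWN"),
  stringsAsFactors = TRUE
)

# Posterior class probabilities for one observation. Each class is scored as
# prior * P(day | class) * P(trend | class); multiplying the per-attribute
# probabilities is exactly the conditional-independence assumption.
nb_posterior <- function(data, day, trend) {
  classes <- levels(data$class)
  score <- sapply(classes, function(k) {
    rows <- data[data$class == k, ]
    prior <- nrow(rows) / nrow(data)
    prior * mean(rows$day == day) * mean(rows$trend == trend)
  })
  score / sum(score)  # normalize so the probabilities sum to 1
}

post <- nb_posterior(toy, day = "Tues", trend = "rising")
print(post)
```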

The implementation in R is as follows:

> library(lubridate)   # date handling
> library(e1071)       # provides naiveBayes()
> library(quantmod)
> setSymbolLookup(WANKE=list(name="000002.sz", src="yahoo"))
> getSymbols("WANKE")
[1] "WANKE"


> tail(WANKE)
           000002.SZ.Open 000002.SZ.High 000002.SZ.Low 000002.SZ.Close
2017-07-31          23.52          23.58         23.10           23.37
2017-08-01          23.35          23.55         23.20           23.42
2017-08-02          23.45          24.12         23.43           23.58
2017-08-03          23.58          23.58         22.79           23.11
2017-08-04          23.00          23.06         22.71           22.84
2017-08-07          22.82          23.05         22.68           22.71
           000002.SZ.Volume 000002.SZ.Adjusted
2017-07-31         30942482              23.37
2017-08-01         20952262              23.42
2017-08-02         35391017              23.58
2017-08-03         45518939              23.11
2017-08-04         29612306              22.84
2017-08-07         23409149              22.71
> 

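> startDate <- as.Date("2010-01-01")   # note: startDate and endDate are never used below
> endDate <- as.Date("2017-01-01")
> DayofWeek <- wday(WANKE, label=TRUE)   # day of the week of each trading day
> PriceChange <- Cl(WANKE) - Op(WANKE)   # close minus open
> Class <- ifelse(PriceChange > 0, "UP", "DOWN")   # a positive change is labeled UP
> DataSet <- data.frame(DayofWeek, Class)

> MyModel <- naiveBayes(DataSet[,1], DataSet[,2])   # predictor: weekday, target: UP/DOWN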
> MyModel

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = DataSet[, 1], y = DataSet[, 2])

A-priori probabilities:
DataSet[, 2]
     DOWN        UP 
0.5148148 0.4851852 

Conditional probabilities:
            x
DataSet[, 2]       Sun       Mon      Tues       Wed     Thurs       Fri
        DOWN 0.0000000 0.2374101 0.1510791 0.2158273 0.1870504 0.2086331
        UP   0.0000000 0.1603053 0.2442748 0.1908397 0.2137405 0.1908397
            x
DataSet[, 2]       Sat
        DOWN 0.0000000
        UP   0.0000000

> 
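The a-priori probabilities are the overall shares of DOWN and UP days in the whole dataset: 0.5148148 for DOWN and 0.4851852 for UP.

The conditional probabilities are read row by row: each row gives, for the days of that class, the share that fell on each weekday. For example, 23.7% of DOWN days were Mondays against 16.0% of UP days, while 24.4% of UP days were Tuesdays against 15.1% of DOWN days. These values are P(weekday | class), not the probability of a rise or fall on a given weekday; the latter is obtained by combining them with the a-priori probabilities through Bayes' rule. Sun and Sat are 0 because the market is closed on weekends.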
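Applying Bayes' rule to the a-priori and conditional probabilities printed by MyModel gives the probability of each class for a given weekday. The sketch below copies the printed numbers by hand (weekend columns omitted). The results are in-sample frequencies for this one dataset and should not be read as forecasts.

```r
# Values copied from the MyModel output.
prior <- c(DOWN = 0.5148148, UP = 0.4851852)
p_day_given_class <- rbind(
  DOWN = c(Mon = 0.2374101, Tues = 0.1510791, Wed = 0.2158273,
           Thurs = 0.1870504, Fri = 0.2086331),
  UP   = c(Mon = 0.1603053, Tues = 0.2442748, Wed = 0.1908397,
           Thurs = 0.2137405, Fri = 0.1908397)
)

# Joint probability P(class, day) = P(class) * P(day | class).
# The length-2 prior recycles down each column, so row i is scaled by prior[i].
joint <- p_day_given_class * prior

# Bayes' rule: divide each column by its total to get P(class | day).
posterior <- sweep(joint, 2, colSums(joint), "/")
print(round(posterior, 3))
```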

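Improving the Model

Exponential moving average (EMA)

A second predictor is added to the weekday: the difference between the 5-day and 10-day EMAs of the opening price.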
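For readers unfamiliar with the indicator, here is a minimal base-R sketch of an EMA. It assumes a common convention: a smoothing factor of 2 / (n + 1) and a first value seeded with the simple mean of the first n observations. The function name ema is introduced here for illustration, and whether it reproduces the EMA() function used in this article digit for digit is an assumption, not something checked here.

```r
# Illustrative EMA; the seeding and smoothing conventions are assumptions.
ema <- function(x, n) {
  stopifnot(length(x) >= n)
  alpha <- 2 / (n + 1)              # weight given to the newest observation
  out <- rep(NA_real_, length(x))   # the first n - 1 values are undefined
  out[n] <- mean(x[1:n])            # seed with a simple average
  if (length(x) > n) {
    for (i in (n + 1):length(x)) {
      out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
    }
  }
  out
}

prices <- c(1, 2, 3, 4, 5)          # made-up series
smoothed <- ema(prices, 3)
print(smoothed)
```

A crossover feature like the one built in the transcript is then the difference between a short and a long EMA, for example ema(x, 5) - ema(x, 10).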

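> W <- na.omit(WANKE)                    # drop rows with missing values
> DayofWeek <- wday(W, label=TRUE)
> PriceChange <- Cl(W) - Op(W)
> Class <- ifelse(PriceChange > 0, "UP", "DOWN")
> EMA5 <- EMA(Op(W), n = 5)              # 5-day EMA of the opening price
> EMA10 <- EMA(Op(W), n = 10)            # 10-day EMA of the opening price
> EMACross <- EMA5 - EMA10               # fast EMA minus slow EMA
> EMACross <- round(EMACross, 2)
> DataSet2 <- data.frame(DayofWeek, EMACross, Class)
> DataSet2 <- DataSet2[-c(1:10),]        # drop the first 10 rows, which cover the EMA warm-up period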
> head(DataSet2)
           DayofWeek   EMA X000002.SZ.Close
2016-07-14     Thurs  0.11             DOWN
2016-07-15       Fri  0.04             DOWN
2016-07-18       Mon  0.00             DOWN
2016-07-19      Tues -0.10             DOWN
2016-07-20       Wed -0.23             DOWN
2016-07-21     Thurs -0.28             DOWN
> tail(DataSet2)
           DayofWeek   EMA X000002.SZ.Close
2017-07-31       Mon -0.34             DOWN
2017-08-01      Tues -0.31               UP
2017-08-02       Wed -0.26               UP
2017-08-03     Thurs -0.19             DOWN
2017-08-04       Fri -0.24             DOWN
2017-08-07       Mon -0.27             DOWN

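> length(DayofWeek)
[1] 270
> # DataSet2 has 260 rows (270 minus the 10 dropped above): train on the first 200, test on the rest
> TrainingSet <- DataSet2[1:200,]
> TestSet <- DataSet2[201:nrow(DataSet2),]
> EMACrossModel <- naiveBayes(TrainingSet[,1:2], TrainingSet[,3])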
> EMACrossModel

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = TrainingSet[, 1:2], y = TrainingSet[, 
    3])

A-priori probabilities:
TrainingSet[, 3]
DOWN   UP 
 0.5  0.5 

Conditional probabilities:
                DayofWeek
TrainingSet[, 3]  Sun  Mon Tues  Wed Thurs  Fri  Sat
            DOWN 0.00 0.22 0.13 0.24  0.18 0.23 0.00
            UP   0.00 0.16 0.27 0.17  0.23 0.17 0.00

                EMA
TrainingSet[, 3]    [,1]      [,2]
            DOWN  0.0333 0.4119553
            UP   -0.0177 0.4191522
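For the numeric EMA predictor, naiveBayes fits a normal distribution per class, and the two columns of the EMA table are its mean and standard deviation. The sketch below recombines the printed values by hand for one hypothetical observation, a Tuesday with an EMA difference of -0.31. It illustrates how the pieces multiply together and is not the package's internal code; posterior_for is a name introduced here.

```r
# Values copied from the EMACrossModel output.
prior    <- c(DOWN = 0.5, UP = 0.5)
p_tues   <- c(DOWN = 0.13, UP = 0.27)            # P(Tues | class)
ema_mean <- c(DOWN = 0.0333, UP = -0.0177)       # column [,1]
ema_sd   <- c(DOWN = 0.4119553, UP = 0.4191522)  # column [,2]

# Score each class as prior * P(day | class) * normal density of the EMA
# value, then normalize so the two scores sum to 1.
posterior_for <- function(p_day, ema_value) {
  score <- prior * p_day * dnorm(ema_value, mean = ema_mean, sd = ema_sd)
  score / sum(score)
}

post <- posterior_for(p_tues, -0.31)
print(round(post, 3))
```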

> table(predict(EMACrossModel,TestSet),TestSet[,3],dnn=list('predicted','actual'))
         actual
predicted DOWN UP
     DOWN   16 21
     UP     13 10
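The confusion table can be reduced to a single accuracy figure. The sketch below re-enters the four counts by hand and compares the result with a trivial baseline that always predicts the more frequent actual class. On these 60 test days the model is right 26 times (about 43%), below the baseline of 31 out of 60.

```r
# Counts copied from the confusion table (rows = predicted, columns = actual).
# matrix() fills column by column: the first two values are the actual-DOWN column.
confusion <- matrix(c(16, 13, 21, 10), nrow = 2,
                    dimnames = list(predicted = c("DOWN", "UP"),
                                    actual    = c("DOWN", "UP")))

# Accuracy: correct predictions (the diagonal) over all test days.
accuracy <- sum(diag(confusion)) / sum(confusion)

# Baseline: always predict the most frequent actual class.
baseline <- max(colSums(confusion)) / sum(confusion)

print(c(accuracy = accuracy, baseline = baseline))
```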

References

quantmod

http://www.quantmod.com/

https://github.com/dengyishuo/Notes/tree/master/quantmod

Naive Bayes classifier

http://blog.csdn.net/sulliy/article/details/6629201

Introduction to Use Machine Learning by R

https://www.inovancetech.com/blogML2.html

 

