Some machine learning / artificial intelligence / natural language processing algorithms implemented in PHP. Note that PHP as it stands today is the wrong tool for most machine learning jobs: this library is a pedagogical introduction to these techniques more than it is a recommendation to use them for day-to-day development.
Copyright (C) 2011-2015 Giuseppe Burtini [email protected] and contributors as appropriate.
Instructions
In general, you'll want to grab just the pieces you need from this repository for your project -- many of the individual methods are standalone (or depend only on the accessory directory). Browse the lib/ directory and decide which techniques you are interested in.
Available Algorithms
Unsupervised
DBScan (dbscan.php) - Density Based Clustering [1][2] - a clustering/unsupervised classification algorithm based on the idea of "density reachability." Its advantage over the other clustering methods here is that the number of clusters does not need to be specified a priori. The parameters are $e (epsilon), the size of the neighborhood to visit (a noise threshold), and $minimumPoints, the minimum number of points required to form a cluster.
K Means (kmeans.php) - the standard clustering algorithm, which breaks data into k "most different" groups. The technique is simply to reassign each point to its nearest centroid and reposition each centroid to the average of its points, repeating until the centroids no longer move (a minimal sketch of this loop appears after this list).
K Nearest Neighbors (knn.php) - similar to K Means, except "flipped on its head" - a clustering algorithm which builds the best clusters that are of size k (rather than building k clusters). (Wikipedia)
Markov Chain (markovchain.php) - an n-order Markov Chain implementation - takes in a list of values for training and computes transition probabilities directly from the observed sequence (a first-order sketch appears below).
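To make the K Means centroid-update loop above concrete, here is a minimal, self-contained PHP sketch on 2-D points. The function and variable names are invented for illustration and do not reflect the actual interface of kmeans.php:

```php
<?php
// Illustrative k-means on 2-D points (not the kmeans.php API).
function euclidean(array $a, array $b): float {
    return sqrt(pow($a[0] - $b[0], 2) + pow($a[1] - $b[1], 2));
}

function kMeansSketch(array $points, int $k, int $maxIterations = 100): array {
    // Seed the centroids with the first $k points; a real implementation
    // would pick them at random or use k-means++.
    $centroids = array_slice($points, 0, $k);

    for ($iteration = 0; $iteration < $maxIterations; $iteration++) {
        // Assignment step: attach every point to its nearest centroid.
        $clusters = array_fill(0, $k, []);
        foreach ($points as $point) {
            $best = 0;
            foreach ($centroids as $i => $centroid) {
                if (euclidean($point, $centroid) < euclidean($point, $centroids[$best])) {
                    $best = $i;
                }
            }
            $clusters[$best][] = $point;
        }

        // Update step: move each centroid to the average of its points.
        $moved = false;
        foreach ($clusters as $i => $members) {
            if (empty($members)) {
                continue;
            }
            $mean = [
                array_sum(array_column($members, 0)) / count($members),
                array_sum(array_column($members, 1)) / count($members),
            ];
            if ($mean != $centroids[$i]) {
                $centroids[$i] = $mean;
                $moved = true;
            }
        }

        if (!$moved) {
            break;   // centroids stopped moving: converged
        }
    }

    return ['centroids' => $centroids, 'clusters' => $clusters];
}

$result = kMeansSketch([[1, 1], [1.5, 2], [0.5, 1.2], [8, 8], [9, 7.5], [8.5, 9]], 2);
print_r($result['centroids']);
```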
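In the same spirit, the Markov Chain entry reduces to counting observed transitions and normalizing them into probabilities. A first-order sketch, again with invented names rather than the markovchain.php interface:

```php
<?php
// Train a first-order Markov chain purely from an observed sequence.
function trainMarkovChain(array $sequence): array {
    // Count how often each value is followed by each other value.
    $counts = [];
    for ($i = 0; $i < count($sequence) - 1; $i++) {
        $from = $sequence[$i];
        $to   = $sequence[$i + 1];
        $counts[$from][$to] = ($counts[$from][$to] ?? 0) + 1;
    }

    // Normalize the counts into conditional probabilities P(next | current).
    $probabilities = [];
    foreach ($counts as $from => $transitions) {
        $total = array_sum($transitions);
        foreach ($transitions as $to => $count) {
            $probabilities[$from][$to] = $count / $total;
        }
    }
    return $probabilities;
}

$chain = trainMarkovChain(['sun', 'sun', 'rain', 'sun', 'rain', 'rain', 'sun']);
// $chain['sun']['rain'] is the observed probability that rain follows sun.
print_r($chain);
```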
Parametric
Anomaly Detection (anomaly_detection.php) - assume a normal distribution, train on (n-dimensional) data, and then test whether a given record is an "outlier" (less likely than a given threshold under the fitted distribution). Assumes semi-stationarity, so training can happen online alongside testing if you wish (see the sketch after this list).
Naive Bayes (naivebayes.php) - a probabilistic classifier that applies Bayes' theorem under the "naive" assumption that features are conditionally independent given the class.
Regression (regression.php) - includes optimization implementations for gradient descent ("take a step in the right direction"), stochastic gradient descent, and the normal equations, as well as a logistic regression implementation (a gradient descent sketch appears after this list).
Simulated Annealing (sann.php) - an implementation of simulated annealing, a probabilistic metaheuristic for finding global optima with no assumptions about smoothness or size of the search space. Written by Graeme Douglas (2014).
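To illustrate the anomaly detection approach above: fit an independent normal distribution to each dimension of the training data, then flag any record whose density under the fitted model falls below a chosen threshold. The function names and threshold below are made up for illustration and are not the anomaly_detection.php interface:

```php
<?php
// Fit a per-dimension Gaussian model to the training records.
function fitGaussians(array $records): array {
    $dimensions = count($records[0]);
    $model = [];
    for ($d = 0; $d < $dimensions; $d++) {
        $column = array_column($records, $d);
        $mean = array_sum($column) / count($column);
        $variance = 0.0;
        foreach ($column as $value) {
            $variance += pow($value - $mean, 2);
        }
        $variance /= count($column);
        $model[$d] = ['mean' => $mean, 'variance' => max($variance, 1e-9)];
    }
    return $model;
}

// Density of a record under the fitted model (features treated independently).
function gaussianDensity(array $model, array $record): float {
    $density = 1.0;
    foreach ($model as $d => $params) {
        $density *= exp(-pow($record[$d] - $params['mean'], 2) / (2 * $params['variance']))
                    / sqrt(2 * M_PI * $params['variance']);
    }
    return $density;
}

$training = [[5.1, 0.20], [4.9, 0.30], [5.0, 0.25], [5.2, 0.22], [4.8, 0.28]];
$model = fitGaussians($training);

// Flag a record as an outlier if its density is below a chosen threshold.
var_dump(gaussianDensity($model, [9.0, 1.5]) < 1e-4);   // true: far from the training data
var_dump(gaussianDensity($model, [5.0, 0.24]) < 1e-4);  // false: looks normal
```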
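The "take a step in the right direction" idea behind the gradient descent optimizer in the regression code can likewise be sketched for one-feature linear regression. This is an illustration of the technique only, not the regression.php API:

```php
<?php
// Batch gradient descent for y = intercept + slope * x, minimizing mean squared error.
function gradientDescent(array $xs, array $ys, float $learningRate = 0.01, int $steps = 5000): array {
    $n = count($xs);
    $intercept = 0.0;
    $slope = 0.0;

    for ($step = 0; $step < $steps; $step++) {
        $gradIntercept = 0.0;
        $gradSlope = 0.0;
        for ($i = 0; $i < $n; $i++) {
            $error = ($intercept + $slope * $xs[$i]) - $ys[$i];
            $gradIntercept += $error;
            $gradSlope     += $error * $xs[$i];
        }
        // Take a step against the gradient ("in the right direction").
        $intercept -= $learningRate * $gradIntercept / $n;
        $slope     -= $learningRate * $gradSlope / $n;
    }

    return ['intercept' => $intercept, 'slope' => $slope];
}

// Data roughly on y = 2x + 1; expect a slope near 2 and an intercept near 1.
print_r(gradientDescent([1, 2, 3, 4, 5], [3.1, 4.9, 7.2, 9.0, 10.8]));
```

Stochastic gradient descent differs only in updating the parameters after each individual example rather than after a full pass over the data.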
Changepoint Detection
Mann-Whitney (mann_whitney.php) - provides the test of Pettitt (1979) based on the Mann and Whitney "U" test (or rank-sum test, Wikipedia). The function provided accepts a list and a threshold (critical point) and returns a list of locations where changepoints were detected according to the test.
Page-Hinkley (page_hinkley.php) - provides the test used in Mouss et al. (2004) and Hartland et al. (2007), among others, based on Page's (1953) idea of a cumulative sum (Wikipedia). The test is parameterized by $alpha, the minimum amplitude of a change, and $lambda, a parameter proportional to the false positive rate. An extension to Page-Hinkley as provided in Ikonomovska (2012) is also included, which automatically calibrates $alpha to the standard deviation of the data (a sketch of the basic test appears below).
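A minimal sketch of the Page-Hinkley idea follows: keep a cumulative sum of deviations from the running mean and signal a change when that sum rises too far above its own minimum. The parameter names mirror the description above, but this is not the exact interface of page_hinkley.php:

```php
<?php
// Page-Hinkley style detector for upward shifts in the mean of a stream.
// $alpha is the minimum change amplitude to tolerate; $lambda is the alarm threshold.
function pageHinkleySketch(array $data, float $alpha, float $lambda): array {
    $changepoints = [];
    $count = 0;          // samples seen since the last detection
    $runningMean = 0.0;
    $cumulative = 0.0;   // m_t: cumulative deviation from the running mean
    $minimum = 0.0;      // M_t: smallest value of m_t seen so far

    foreach ($data as $t => $x) {
        $count++;
        $runningMean += ($x - $runningMean) / $count;
        $cumulative  += $x - $runningMean - $alpha;
        $minimum      = min($minimum, $cumulative);

        // Alarm when the cumulative sum rises far enough above its minimum.
        if ($cumulative - $minimum > $lambda) {
            $changepoints[] = $t;
            // Restart the statistics to look for further changes.
            $count = 0;
            $runningMean = 0.0;
            $cumulative = 0.0;
            $minimum = 0.0;
        }
    }
    return $changepoints;
}

// The mean jumps from ~0.1 to 5.0 at index 50; the jump is flagged shortly after.
$stream = array_merge(array_fill(0, 50, 0.1), array_fill(0, 50, 5.0));
print_r(pageHinkleySketch($stream, 0.5, 10.0));
```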
Bandits
Epsilon Greedy Bandit (EpsilonBandit.php) - explores (purely at random) an epsilon fraction of the time and exploits (picks the arm with the highest estimated reward) the rest of the time (a sketch follows these entries).
UCB1 Bandit (UCB1Bandit.php) - implements the UCB1 (upper confidence bound) algorithm described by Auer et al. (2002) [3]. Provides an average and padding function implementation that can easily be extended to other UCB variants (also sketched below).
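The epsilon greedy policy can be sketched in a few lines: flip a biased coin, then either explore an arm at random or exploit the arm with the best observed average reward. Names here are illustrative and not the EpsilonBandit.php interface:

```php
<?php
// Pick an arm index given per-arm reward totals and pull counts.
function chooseArmEpsilonGreedy(array $rewardSums, array $pullCounts, float $epsilon): int {
    $arms = count($pullCounts);
    if (mt_rand() / mt_getrandmax() < $epsilon) {
        return mt_rand(0, $arms - 1);   // explore: purely random arm
    }

    // Exploit: pick the arm with the highest observed mean reward.
    $best = 0;
    for ($i = 1; $i < $arms; $i++) {
        $meanBest = $pullCounts[$best] > 0 ? $rewardSums[$best] / $pullCounts[$best] : 0.0;
        $meanI    = $pullCounts[$i]    > 0 ? $rewardSums[$i]    / $pullCounts[$i]    : 0.0;
        if ($meanI > $meanBest) {
            $best = $i;
        }
    }
    return $best;
}

// Simulate 1000 rounds against two arms with true success rates 0.3 and 0.7.
$trueRates = [0.3, 0.7];
$rewardSums = [0.0, 0.0];
$pullCounts = [0, 0];
for ($round = 0; $round < 1000; $round++) {
    $arm = chooseArmEpsilonGreedy($rewardSums, $pullCounts, 0.1);
    $rewardSums[$arm] += (mt_rand() / mt_getrandmax() < $trueRates[$arm]) ? 1 : 0;
    $pullCounts[$arm]++;
}
print_r($pullCounts);   // the better arm (index 1) should receive most of the pulls
```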
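UCB1 replaces the coin flip with a deterministic index: the observed average plus a "padding" term that shrinks as an arm is pulled more often. Swapping in a different padding function gives other UCB variants, which is the extension point mentioned above. Again, the names below are illustrative rather than the UCB1Bandit.php API:

```php
<?php
// UCB1 index for one arm: exploitation term plus exploration ("padding") term.
function ucb1Index(float $rewardSum, int $pulls, int $totalPulls): float {
    $average = $rewardSum / $pulls;                  // observed mean reward
    $padding = sqrt(2 * log($totalPulls) / $pulls);  // shrinks as the arm is pulled more
    return $average + $padding;
}

// Pick the arm with the highest index, playing every arm once before anything else.
function chooseArmUcb1(array $rewardSums, array $pullCounts): int {
    $totalPulls = array_sum($pullCounts);
    $best = 0;
    $bestIndex = -INF;
    foreach ($pullCounts as $arm => $pulls) {
        if ($pulls === 0) {
            return $arm;   // unplayed arms take priority
        }
        $index = ucb1Index($rewardSums[$arm], $pulls, $totalPulls);
        if ($index > $bestIndex) {
            $bestIndex = $index;
            $best = $arm;
        }
    }
    return $best;
}
```

This chooser can be dropped into the same simulation loop as the epsilon greedy sketch above.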
References
[1] Arlia, Domenica, and Massimo Coppola. "Experiments in Parallel Clustering with DBSCAN." Euro-Par 2001: Parallel Processing, 7th International Euro-Par Conference, Manchester, UK, August 28–31, 2001, Proceedings. Springer Berlin, 2001.
[2] Kriegel, Hans-Peter, Peer Kröger, Jörg Sander, and Arthur Zimek. "Density-based Clustering." WIREs Data Mining and Knowledge Discovery 1.3 (2011): 231-240. doi:10.1002/widm.30.
[3] Auer, Peter, Nicolò Cesa-Bianchi, and Paul Fischer. "Finite-time Analysis of the Multiarmed Bandit Problem." Machine Learning 47.2-3 (2002): 235-256.
To-Do
Update to PHP 5.x, change to use namespaces instead of messy function names.
Lots of missing documentation - most public-facing methods are currently undocumented.
Build TF-IDF class / simple vector search space class.
Build MonteCarlo class (with callbacks)
OOize all appropriate algorithms (use "train" and "test" when possible).
Complete tests for algorithms that do not have them.
Consider adding a pathfinding/graph search algorithm set.
Ensemble and boosting methods such as random forests, CART, and BART.
Neural networks and HMMs.
NLP work, specifically a class for using WordNet and the Stanford Core NLP library; eventually, NLP work should probably be forked as its own project.
Notes for Use
For effective use, much of this library will have to be customized. This is a largely academic project: in many cases I've opted for clearer code over faster code, and in other cases I've excluded features that would be useful in "real world" applications (such as training, saving the trained model to a file, and then running the actual "estimates" later in a separate process).
If you would like to deploy this in a real world application, I would be happy to discuss work on any machine learning problems you do have: contact me at [email protected].
Most of the code in this library is designed so that any single piece can be used on its own and the rest of the library can simply be thrown away. In parametric/ and unsupervised/, each "type" of learning is implemented in a file of its own (though the regression code gets a whole directory!) so that it is useful without loading the rest of the library.
There are many known properties and optimization techniques that could improve the performance of these algorithms but have not been implemented here. This is very much a "first run" at implementing these algorithms in PHP and should be viewed as a starting point for learning algorithms in PHP, not necessarily a deployable library.
In many cases, the right answer will be to implement the learning algorithms in a faster language and use PHP only to evaluate their probabilities / compute results from the existing estimates.
Citing This
If you wish to cite this work in your own work, you can cite it as:
If you would like your work that uses the Learning Library for PHP (in a machine learning, artificial intelligence, bandits, or other context) to be listed here, please contact me with details and I will add it. Even if you don't want your work listed, I'd love to hear about how you're using the library or what you have learned.
Buchmann et al. Personal Information Dashboard: Putting the Individual Back in Control. Knowledge and Privacy Analytics Engine. Digital Enlightenment Yearbook (2013), pp. 139-164, IOS Press, September 2013. ISBN 978-1-61499-294-3 (print) | 978-1-61499-295-0 (online).
Various private projects.
License
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
If you need to work with the Learning Library in an environment that is not conducive to the GPL, please contact me at [email protected] and we can discuss alternative licensing terms.