
off99555/machine-learning-curriculum: Learn to make machines learn so that you d ...


Open-source project name:

off99555/machine-learning-curriculum

Open-source project URL:

https://github.com/off99555/machine-learning-curriculum

Open-source programming language:


Open-source project introduction:

Machine Learning Curriculum

Machine Learning is a branch of Artificial Intelligence dedicated to making machines learn from observational data without being explicitly programmed.

Machine learning and AI are not the same. Machine learning is an instrument in the AI symphony — a component of AI. So what is Machine Learning — or ML — exactly? It’s the ability for an algorithm to learn from prior data in order to produce a behavior. ML is teaching machines to make decisions in situations they have never seen.

This curriculum is made to guide you through learning machine learning, recommend tools, and help you embrace the ML lifestyle by suggesting media to follow. I update it regularly to keep it fresh and to remove outdated content and deprecated tools.

Machine Learning in General

Study this section to understand fundamental concepts and develop intuitions before going any deeper.

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
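To make the definition above concrete, here is a minimal sketch in plain Python. The task T (predicting y = 2x + 1), the performance measure P (mean squared error on a fixed grid), and the experience E (the number of training examples seen) are all assumptions chosen for illustration; they do not come from the curriculum itself.

```python
import random

def train(num_examples, lr=0.1, seed=0):
    """Experience E: see `num_examples` (x, y) pairs, one at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(num_examples):
        x = rng.uniform(-1.0, 1.0)
        y = 2.0 * x + 1.0            # task T: predict y from x
        err = y - (w * x + b)
        w += lr * err * x            # stochastic gradient step on squared error
        b += lr * err
    return w, b

def mean_squared_error(w, b):
    """Performance measure P, evaluated on a fixed grid of test points."""
    xs = [i / 10.0 for i in range(-10, 11)]
    return sum((2.0 * x + 1.0 - (w * x + b)) ** 2 for x in xs) / len(xs)

p_little_experience = mean_squared_error(*train(5))
p_more_experience = mean_squared_error(*train(500))
print(p_little_experience, p_more_experience)
```

The error after 500 examples is far lower than after 5, which is what "performance at T, as measured by P, improves with experience E" means in practice.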

Books

Reinforcement Learning

The goal of reinforcement learning is to build a machine that senses its environment and follows a policy, choosing the best action at any given state, so as to maximize its expected long-term scalar reward.
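As a rough illustration of states, actions, and long-term reward, the sketch below uses tabular Q-learning, which is only one of many reinforcement learning algorithms and is not named in the text above. The five-state chain environment, the reward of 1 at the right end, and the constants are all hypothetical.

```python
import random

N_STATES = 5          # states 0..4; reaching state 4 ends the episode
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor for future reward
ALPHA = 0.5           # learning rate

def step(state, action):
    """Hypothetical environment: a chain with reward 1 at the right end."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

rng = random.Random(0)
q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]

for _ in range(500):                         # episodes
    state, done = 0, False
    while not done:
        action = rng.choice(ACTIONS)         # explore with random actions
        next_state, reward, done = step(state, action)
        target = reward if done else reward + GAMMA * max(q[next_state])
        q[state][action] += ALPHA * (target - q[state][action])
        state = next_state

# Greedy policy: in each non-terminal state, pick the action with the highest value.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

Because Q-learning is off-policy, the agent can behave randomly while learning and still recover a greedy policy that heads toward the reward.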

Deep Learning

Deep learning is a branch of machine learning where deep artificial neural networks (DNN) — algorithms inspired by the way neurons work in the brain — find patterns in raw data by combining multiple layers of artificial neurons. As the layers increase, so does the neural network’s ability to learn increasingly abstract concepts.

The simplest kind of DNN is a Multilayer Perceptron (MLP).
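A minimal sketch of an MLP forward pass in plain Python follows. The weights are hand-picked rather than trained, and the network (two inputs, two ReLU hidden units, one output) is an assumption chosen because it computes XOR, a function that a single layer of this kind cannot represent.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hand-picked (not learned) parameters for illustration.
W1 = [[1.0, 1.0], [1.0, 1.0]]
B1 = [0.0, -1.0]
W2 = [[1.0, -2.0]]
B2 = [0.0]

def mlp_forward(x):
    hidden = relu(dense(x, W1, B1))      # hidden layer
    return dense(hidden, W2, B2)[0]      # output layer

outputs = {(a, b): mlp_forward([float(a), float(b)])
           for a in (0, 1) for b in (0, 1)}
print(outputs)
```

In a real MLP the weights would be learned from data, usually with a framework such as those listed under Tools.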

Convolutional Neural Networks

DNNs that handle grid-like data such as sound waveforms, images, and videos better than ordinary DNNs. They are based on the assumption that nearby input units are more related than distant ones. They also exploit translation invariance: given an image, for example, it is useful to detect the same kind of edge anywhere in the image. They are sometimes called convnets or CNNs.
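The idea of detecting the same kind of edge everywhere can be sketched with a one-dimensional convolution in plain Python. The signal and the two-tap edge kernel are made up for illustration; as in most deep learning libraries, the operation is technically a cross-correlation.

```python
def conv1d(signal, kernel):
    """Slide one shared kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

edge_kernel = [-1.0, 1.0]                      # responds to rising and falling edges

signal = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
shifted = [0, 0] + signal[:-2]                 # same pattern, moved two steps right

response = conv1d(signal, edge_kernel)
shifted_response = conv1d(shifted, edge_kernel)
print(response)
print(shifted_response)
```

The same two weights find the edges in both signals, and the response simply moves along with the pattern. Sharing weights across positions is what lets a convnet use far fewer parameters than a fully connected network on the same input.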

Recurrent Neural Networks

DNNs that maintain state across inputs. They can also process sequences that vary in length. They are sometimes called RNNs.
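A minimal sketch of the idea, assuming a single scalar hidden state and hand-picked (untrained) weights `w_x`, `w_h`, and `b`: the same cell is applied at every step, so the loop works for a sequence of any length.

```python
import math

def rnn_final_state(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit recurrent cell over a sequence and return its last state."""
    h = 0.0                                   # initial state
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)  # new state depends on input and old state
    return h

short_state = rnn_final_state([1.0, 0.0])
long_state = rnn_final_state([0.2, -0.4, 1.0, 0.0, 0.3, 0.9, -1.0])
reordered_state = rnn_final_state([0.0, 1.0])
print(short_state, long_state, reordered_state)
```

The first and third calls see the same values in a different order and end in different states, which shows that the state carries information about the sequence and not just about the set of inputs.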

Best Practices

Unsupervised Domain Adaptation

Unsupervised Domain Adaptation is a type of Transfer Learning that takes a model trained on a source dataset and makes it perform well on a target dataset that has no labels. It is one of the techniques that is practically useful in the real world when labeling the target dataset is expensive. One example is to train a model on labeled synthetic data and then use it on unlabeled real data.

Tools

Libraries and frameworks that are useful for practical machine learning

Frameworks

Machine learning building blocks

  • scikit-learn (Python) general machine learning library, high level abstraction, geared towards beginners
  • TensorFlow (Python); Awesome TensorFlow; computation graph framework built by Google, has nice visualization board, probably the most popular framework nowadays for doing Deep Learning
  • Keras: Deep Learning library for Theano and TensorFlow (Python)
  • PyTorch (Python) PyTorch is a deep learning framework that puts Python first.
  • Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity.
  • Chainer (Python) A flexible framework of neural networks for deep learning
  • DeepLearning4j (Java) Model import and deployment framework for retraining models (PyTorch, TensorFlow, Keras) and deploying them in JVM microservice environments, mobile devices, IoT, and Apache Spark
  • Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning. There is a specific focus on reinforcement learning, with several contextual bandit algorithms implemented; its online nature lends itself well to the problem.
  • H2O is an in-memory platform for distributed, scalable machine learning.
  • spektral Graph Neural Networks with Keras and TensorFlow 2.

No coding

  • Lobe a drag-and-drop tool for machine learning
  • Ludwig Ludwig is a toolbox that allows users to train and test deep learning models without the need to write code. It is built on top of TensorFlow.

Gradient Boosting

Models that are used heavily in competitions because of their outstanding generalization performance.

Time Series Inference

Time series data require a dedicated feature extraction process before they can be used in most machine learning models, because most models expect data in a tabular format. Alternatively, you can use model architectures that target time series directly, e.g. LSTM, TCN, etc.
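One common way to turn a series into a table is a sliding window of lagged values. The sketch below is a plain-Python illustration; the helper name `make_lag_table` and the sample numbers are hypothetical.

```python
def make_lag_table(series, window):
    """Turn a 1-D series into (features, target) rows using the previous
    `window` values as features and the next value as the target."""
    rows = []
    for i in range(len(series) - window):
        features = series[i:i + window]
        target = series[i + window]
        rows.append((features, target))
    return rows

table = make_lag_table([10, 12, 15, 11, 18], window=2)
for features, target in table:
    print(features, "->", target)
```

Each row can now be fed to an ordinary tabular model. Dedicated libraries typically extract richer features (trends, seasonality, rolling statistics) in the same spirit.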

Life Cycle

Libraries that help you develop/debug/deploy the model in production (MLOps). There is more to ML than training the model.

  • https://github.com/allegroai/clearml Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, ML-Ops and Data-Management
  • https://github.com/quantumblacklabs/kedro A Python framework for creating reproducible, maintainable and modular data science code.
  • https://github.com/determined-ai/determined Determined is an open-source deep learning training platform that makes building models fast and easy. I use it mainly for tuning hyperparameters.
  • https://github.com/iterative/cml Continuous Machine Learning (CML) is an open-source library for implementing continuous integration & delivery (CI/CD) in machine learning projects. Use it to automate parts of your development workflow, including model training and evaluation, comparing ML experiments across your project history, and monitoring changing datasets.
  • https://github.com/creme-ml/creme Python library for online machine learning. All the tools in the library can be updated with a single observation at a time, and can therefore be used to learn from streaming data.
  • https://github.com/aimhubio/aim A super-easy way to record, search and compare 1000s of ML training runs
  • https://github.com/Netflix/metaflow Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix.
  • MLflow MLflow (currently in beta) is an open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment. It currently offers three components: MLflow Tracking, MLflow Projects, MLflow Models.
  • FloydHub a Heroku for Deep Learning (You focus on the model, they'll deploy)
  • comet.ml Comet enables data scientists and teams to track, compare, explain and optimize experiments and models across the model's entire lifecycle. From training to production
  • https://neptune.ai/ Manage all your model building metadata in a single place
  • https://wandb.ai/site Build better models faster with experiment tracking, dataset versioning, and model management
  • https://github.com/fastai/nbdev Create delightful python projects using Jupyter Notebooks
  • https://rapids.ai/ data science on GPUs
  • https://github.com/datarevenue-berlin/OpenMLOps
  • https://github.com/jacopotagliabue/you-dont-need-a-bigger-boat Not really a tool, but a guide on how to compose many tools together for a real-world business at reasonable scale.
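The online machine learning entry in the list above describes tools that are updated with a single observation at a time. As a rough illustration of that style, here is a plain-Python running mean and variance (Welford's algorithm). The class name `RunningStats` is hypothetical and this is not the API of any library listed above.

```python
class RunningStats:
    """Mean and population variance updated one observation at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0          # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        return self

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0

stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:   # pretend these arrive as a stream
    stats.update(value)
print(stats.mean, stats.variance)
```

Nothing is stored except three numbers, so the same pattern works on streams that never fit in memory.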

Data Storage

  • https://github.com/activeloopai/Hub Fastest dataset optimization and management for machine and deep learning. Stream data real-time & version-control it.
  • https://github.com/determined-ai/yogadl Better approach to data loading for Deep Learning. API-transparent caching to disk, GCS, or S3.
  • https://github.com/google/ml_collections ML Collections is a library of Python Collections designed for ML use cases. It contains ConfigDict, a "dict-like" data structures with dot access to nested elements. It is supposed to be used as a main way of expressing configurations of experiments and models.
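To show what a dict-like structure with dot access to nested elements looks like, here is a small plain-Python sketch. `DotDict` is a hypothetical class written for this example; it is not ml_collections' ConfigDict and omits the features of the real library.

```python
class DotDict(dict):
    """Hypothetical dict subclass with attribute-style access (illustration only)."""

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested plain dicts so that dot access keeps working one level down.
        if isinstance(value, dict) and not isinstance(value, DotDict):
            value = DotDict(value)
            self[name] = value
        return value

    def __setattr__(self, name, value):
        self[name] = value

config = DotDict({"model": {"hidden_units": 64}, "optimizer": {"lr": 0.001}})
config.optimizer.lr = 0.01      # override a nested experiment setting
config.seed = 0                 # add a new top-level setting
print(config)
```

Keeping every experiment setting in one such object makes it easy to log, compare, and override configurations between runs.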

Data Wrangling

Data cleaning and data augmentation

Data Orchestration

Data Visualization

Hyperparameter Tuning

Before you begin, please read this blog post to understand the motivation of searching in general: https://www.determined.ai/blog/stop-doing-iterative-model-development

Open your eyes to search-driven development. It will change you. The main benefit is that there will be no setbacks. Only progress and improvement are allowed. Imagine working and progressing every day, instead of regressing because your new solution doesn't work. This guaranteed progress is what search-driven development will do for you. Apply it to everything in optimization, not just machine learning.

My top opinionated preferences are determined, ray tune, and optuna because of parallelization (distributed tuning on many machines), flexibility (can optimize arbitrary objectives and allow dataset parameters to be tuned), library of SOTA tuning algorithms (e.g. HyperBand, BOHB, TPE, PBT, ASHA, etc.), result visualization/analysis tools, and extensive documentation/tutorials.
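The "no setbacks" property of search can be sketched with plain random search, which is much simpler than the algorithms named above (HyperBand, TPE, and so on) and is not the API of determined, ray tune, or optuna. The `objective` function and the parameter ranges are hypothetical stand-ins for training and validating a real model.

```python
import random

def objective(params):
    # Hypothetical validation loss; stands in for training and evaluating a model.
    return (params["lr"] - 0.1) ** 2 + (params["dropout"] - 0.3) ** 2

def random_search(num_trials, seed=0):
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    history = []                                  # best loss seen after each trial
    for _ in range(num_trials):
        params = {"lr": rng.uniform(0.0, 1.0),
                  "dropout": rng.uniform(0.0, 0.9)}
        loss = objective(params)
        if loss < best_loss:                      # keep a trial only if it improves
            best_params, best_loss = params, loss
        history.append(best_loss)
    return best_params, best_loss, history

best_params, best_loss, history = random_search(50)
print(best_params, best_loss)
```

Because a worse trial is simply discarded, `history` never goes up. The tuning libraries add smarter sampling, early stopping, and parallel execution on top of this basic loop.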

AutoML

Make machines learn without the tedious task of feature engineering, model selection, and hyperparameter tuning that you have to do yourself. Let the machines perform machine learning for you!

Personally, if I have a tabular dataset, I try FLAML and mljar first, especially when I want to get something working fast. If you want to try gradient boosting frameworks such as XGBoost, LightGBM, or CatBoost but don't know which one works best, I suggest trying AutoML first, because internally it will try those gradient boosting frameworks.

Model Architectures

Architectures that are state-of-the-art in their fields.

Interesting Techniques & Applications

Nice Blogs & Vlogs to Follow

Impactful People

  • Geoffrey Hinton has been called the godfather of deep learning. With his students he introduced or popularized two influential techniques (ReLU and Dropout) that mitigate the vanishing gradient and generalization problems of deep neural networks. He also taught a Neural Networks course at Coursera.
  • Yann LeCun invented CNNs (convolutional neural networks), the kind of network that is really popular among computer vision developers today
  • Yoshua Bengio, another leading professor in deep learning; you can watch his TEDx talk here (2017)
  • Andrew Ng was among the early researchers to show that GPUs make deep learning faster. He taught 2 famous online courses at Coursera, Machine Learning and the Deep Learning specialization.
  • Juergen Schmidhuber co-invented LSTM (a particular type of RNN) with Sepp Hochreiter
  • Jeff Dean, a Google Brain engineer; watch his TEDx Talk
  • Ian Goodfellow invented GANs (Generative Adversarial Networks) and is an OpenAI engineer
  • David Silver, the guy behind the AlphaGo and Atari reinforcement learning game agents at DeepMind
  • Demis Hassabis, CEO of DeepMind, has given a lot of talks about DeepMind's AlphaGo and reinforcement learning achievements
  • Andrej Karpathy teaches convnet classes, wrote ConvNetJS, and produces a lot of content for the DL community; he also writes a blog (see the Nice Blogs & Vlogs to Follow section)
  • Pedro Domingos wrote the book The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World; watch his TEDx talk here

Cutting-Edge Research Publishers

Steal the most recent techniques introduced by smart computer scientists (could be you).

Practitioner Community
