Machine Learning is a branch of Artificial Intelligence dedicated to making
machines learn from observational data without being explicitly programmed.
Machine learning and AI are not the same. Machine learning is an instrument in
the AI symphony — a component of AI. So what is Machine Learning — or ML —
exactly? It’s the ability for an algorithm to learn from prior data in order
to produce a behavior. ML is teaching machines to make decisions in situations
they have never seen.
This curriculum guides you through learning machine learning, recommends tools, and helps you embrace the ML lifestyle by suggesting media to follow.
I update it regularly to keep it fresh and to remove outdated content and deprecated tools.
Machine Learning in General
Study this section to understand fundamental concepts and develop intuitions before going any deeper.
A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P if its performance at tasks in
T, as measured by P, improves with experience E.
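To make the definition concrete, here is a minimal sketch in plain Python. The task T is predicting y from x, the performance measure P is mean squared error on held-out points, and the experience E is the number of training examples seen. The target function, learning rate, and helper names are all assumptions chosen for illustration; they do not come from any of the courses listed here.

```python
# Task T: predict y from x.
# Performance P: mean squared error on held-out points.
# Experience E: training examples seen one at a time (stochastic gradient descent).

def true_fn(x):
    return 2.0 * x + 1.0  # hypothetical target that the learner does not know


def mse(w, b, xs):
    """Performance measure P: mean squared error of the model w*x + b."""
    return sum((w * x + b - true_fn(x)) ** 2 for x in xs) / len(xs)


def train(n_examples, lr=0.1):
    """Return (w, b) after seeing `n_examples` training examples."""
    w, b = 0.0, 0.0
    for i in range(n_examples):
        x = (i % 10) / 10.0            # training inputs cycle over 0.0 .. 0.9
        error = (w * x + b) - true_fn(x)
        w -= lr * error * x            # gradient step for the squared error
        b -= lr * error
    return w, b


test_xs = [0.05, 0.25, 0.45, 0.65, 0.85]
error_by_experience = {n: mse(*train(n), test_xs) for n in (0, 100, 2000)}

for n, err in error_by_experience.items():
    print(f"examples seen: {n:5d}  test MSE: {err:.6f}")
```

The printed error should shrink as the number of examples grows, which is exactly what "performance at T, as measured by P, improves with experience E" means.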
Elements of AI A bunch of easy courses teaching AI and machine learning
Artificial Intelligence, Revealed a quick introduction by Yann LeCun, mostly about Machine Learning ideas, Deep Learning, and convolutional neural network
Andrew Ng's Specialization on Coursera recommended for people who want to know the details of ML algorithms under the hood, understand enough maths to be dangerous and do coding assignments in Python
Landing A Data Job: The Course is an opinionated and practical guideline for those who want to become data scientists quickly. For example, they suggest that knowing how a decision tree works is already good enough; you don't need to know how all the models work, which is true!
Building a machine that senses the environment and then chooses the best policy
(action) to do at any given state to maximize its expected long-term scalar
reward is the goal of reinforcement learning.
OpenAI Spinning Up This is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning (deep RL).
Basic Reinforcement Learning An introduction series to Reinforcement Learning (RL) with comprehensive step-by-step tutorials.
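The idea of choosing the best action in each state to maximize long-term reward can be sketched with tabular Q-learning on a toy problem. The five-state corridor, the reward of 1.0 at the goal, and every constant below are assumptions made up for this example; real environments and the deep RL methods covered in the resources above are much richer.

```python
import random

N_STATES = 5            # states 0..4 on a line; state 4 is the goal (terminal)
ACTIONS = (-1, +1)      # move left or move right
GAMMA = 0.9             # discount factor for long-term reward
ALPHA = 0.5             # learning rate


def step(state, action):
    """Deterministic environment: reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(1000):                     # safety cap on episode length
            a = rng.randrange(2)                  # explore with random actions
            next_state, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
# Greedy policy: in each non-terminal state pick the action with the highest value.
policy = [
    ACTIONS[max(range(2), key=lambda a: q_table[s][a])]
    for s in range(N_STATES - 1)
]
print(policy)
```

Because Q-learning is off-policy, the agent can explore with purely random actions and still learn values whose greedy policy heads toward the goal from every state.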
Deep learning is a branch of machine learning where deep artificial neural
networks (DNN) — algorithms inspired by the way neurons work in the brain — find
patterns in raw data by combining multiple layers of artificial neurons. As the
layers increase, so does the neural network’s ability to learn increasingly
abstract concepts.
Deep learning - Udacity recommended for visual learners who know some ML. This course provides high-level ideas of deep learning with dense, intuitive details packed into a short amount of time. You will use TensorFlow in the course.
Deep Learning Book recommended for math
nerds who want to understand the theoretical side, the book is crafted by our
deep learning wizards (Goodfellow, Bengio and Courville)
http://neuralnetworksanddeeplearning.com/index.html a hands-on online book for deep learning maths intuition. I can say that after you finish it, you will be able to explain deep learning in fine detail.
https://www.kadenze.com/courses/creative-applications-of-deep-learning-with-tensorflow-i You will implement a lot of things inside TensorFlow such as Autoencoders, Convolutional neural net, Feedforward neural nets, Generative models (Generative Adversarial Networks, Recurrent networks), visualizing the network, etc. You will have lots of assignments to finish. The course director (Parag) is also approachable and active.
The Neural Network Zoo a bunch of neural network models that you should know about (I know about half of them so don't worry that you don't know many because most of them are not popular or useful in the present)
Advancing AI theory with a first-principles understanding of deep neural networks We have used deep learning successfully for quite a long time, but we don't know exactly why it works or how to really improve it from the ground up. This blog post links to the paper that attempts a first-principles understanding of neural networks so that the deep learning field can advance further.
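As a small illustration of why combining layers matters, the sketch below computes XOR with one hidden layer of two ReLU neurons. A single linear threshold unit cannot represent XOR, but two stacked layers can. The weights are hand-picked for this example rather than learned, and the function names are hypothetical.

```python
def relu(z):
    return max(0.0, z)


def two_layer_xor(x1, x2):
    """Tiny network with hand-picked (not learned) weights that computes XOR."""
    # Hidden layer: two neurons that each see both inputs.
    h1 = relu(x1 + x2)         # grows with the number of active inputs
    h2 = relu(x1 + x2 - 1.0)   # fires only when both inputs are active
    # Output layer: combines the two hidden features.
    return h1 - 2.0 * h2


for a in (0, 1):
    for b in (0, 1):
        print(a, b, two_layer_xor(a, b))
```

The hidden neurons build intermediate features ("at least one input on", "both inputs on") and the output layer combines them, which is the layer-by-layer abstraction described above in miniature.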
Convolutional Neural Networks
DNNs that work better than ordinary DNNs on grid data such as sound waveforms,
images, and videos. They are based on the assumption that nearby input units
are more related than distant units. They also utilize translation
invariance. For example, given an image, it might be useful to detect the same
kind of edges everywhere on the image.
They are sometimes called convnets or CNNs.
Deep Learning for Computer Vision (Andrej Karpathy, OpenAI) this
is my favorite video on convolutional nets. Andrej explains convnets in
detail, answering all the curious questions that one might have. For example,
most articles only talk about convolution on grayscale images, but he describes
convolution on images with color channels as well. He also talks about the
concerns and the assumptions that convnets make. This is a great lecture!
Capsule Networks (CapsNets) – Tutorial CapsNets are a hot new architecture for neural networks, invented by Geoffrey Hinton, one of the godfathers of deep learning.
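The point about detecting the same kind of edge everywhere on an image can be sketched in a few lines of plain Python. One small kernel is slid over the whole image, so an edge produces the same response wherever it appears, just at a different position. The image sizes, the kernel, and the helper names are assumptions for this example; real convnets learn their kernels and use optimized libraries.

```python
def conv2d(image, kernel):
    """'Valid' cross-correlation of a 2-D list `image` with a 2-D list `kernel`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def make_image(edge_col, height=4, width=8):
    """Dark (0) left of `edge_col`, bright (1) from `edge_col` onward."""
    return [[1 if c >= edge_col else 0 for c in range(width)] for _ in range(height)]


EDGE_KERNEL = [[-1, 1]]  # responds where brightness jumps from left to right

resp_a = conv2d(make_image(edge_col=3), EDGE_KERNEL)
resp_b = conv2d(make_image(edge_col=5), EDGE_KERNEL)

print(resp_a[0])
print(resp_b[0])
```

The same two weights are reused at every position, so moving the edge two columns to the right moves the detector's response two columns to the right without any retraining.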
Recurrent Neural Networks
DNNs that have states. They also understand sequences that vary in length.
They are sometimes called RNNs.
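A minimal sketch of what "having state" means: the same small set of weights is applied at every time step, and a hidden state carries information forward, so sequences of any length can be processed. The scalar weights below are arbitrary assumptions rather than trained values, and real RNNs use vectors and matrices instead of single numbers.

```python
import math


def rnn_final_state(sequence, w_in=0.5, w_rec=0.8, bias=0.0):
    """Run a one-unit recurrent cell over `sequence` and return its last state."""
    h = 0.0                                   # initial hidden state
    for x in sequence:
        # The new state depends on the current input AND the previous state.
        h = math.tanh(w_in * x + w_rec * h + bias)
    return h


short_seq = [1.0, 0.0]
long_seq = [0.0, 1.0, 1.0, 0.0, 1.0]

print(rnn_final_state(short_seq))
print(rnn_final_state(long_seq))
print(rnn_final_state([0.0, 1.0]))  # same values as short_seq, different order
```

The same function handles both sequence lengths, and reordering the inputs changes the final state, which is how the network becomes sensitive to sequence order.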
Unsupervised Domain Adaptation is a type of Transfer Learning that takes a model trained on a source dataset and makes it perform well on a target dataset that has no labels. It's one of the techniques that is practically useful in the real world when the cost of labeling the target dataset is high. One example is training a model on labeled synthetic data and then using it on unlabeled real data.
https://github.com/bbdamodaran/deepJDOT one of the easy-to-use Keras implementations of an unsupervised domain adaptation technique; I tried it before and it worked great
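To show the general idea on a toy scale, the sketch below aligns a labeled source domain and an unlabeled target domain by standardizing each one separately before applying a classifier trained on the source. This is a deliberately simple stand-in and not the DeepJDOT method; the one-dimensional data and all function names are assumptions made up for the example.

```python
from statistics import mean, pstdev


def standardize(xs):
    """Shift and scale one domain to zero mean and unit variance."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def fit_threshold(xs, labels):
    """Toy 1-D classifier: threshold halfway between the two class means."""
    mean0 = mean(x for x, y in zip(xs, labels) if y == 0)
    mean1 = mean(x for x, y in zip(xs, labels) if y == 1)
    return (mean0 + mean1) / 2


def accuracy(xs, labels, threshold):
    preds = [1 if x > threshold else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


source_x = [-1, 0, 1, 3, 4, 5]              # labeled source data
source_y = [0, 0, 0, 1, 1, 1]
target_x = [2 * x + 10 for x in source_x]   # same structure, shifted and scaled
target_y = source_y                          # used only to evaluate, never to train

# Without adaptation: the source threshold is applied to raw target values.
acc_raw = accuracy(target_x, target_y, fit_threshold(source_x, source_y))

# With adaptation: each domain is standardized using only its own inputs.
aligned_threshold = fit_threshold(standardize(source_x), source_y)
acc_aligned = accuracy(standardize(target_x), target_y, aligned_threshold)

print(acc_raw, acc_aligned)
```

No target labels are used for training or alignment; they appear only in the final accuracy check, which is the defining constraint of the unsupervised setting.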
Tools
Libraries and frameworks that are useful for practical machine learning
Frameworks
Machine learning building blocks
scikit-learn (Python) general machine learning library, high level abstraction, geared towards beginners
TensorFlow (Python); Awesome TensorFlow; computation graph framework built by Google, has nice visualization board, probably the most popular framework nowadays for doing Deep Learning
PyTorch (Python) PyTorch is a deep learning framework that puts Python first.
Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity.
Chainer (Python) A flexible framework of neural networks for deep learning
DeepLearning4j (Java) Model import and deployment framework for retraining models (PyTorch, TensorFlow, Keras) and deploying them in JVM microservice environments, mobile devices, IoT, and Apache Spark
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning. There is a specific focus on reinforcement learning with several contextual bandit algorithms implemented and the online nature lending to the problem well.
H2O is an in-memory platform for distributed, scalable machine learning.
spektral Graph Neural Networks with Keras and Tensorflow 2.
https://github.com/catboost/catboost A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
https://github.com/tensorflow/decision-forests TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.
Time series data require a dedicated feature extraction process before they can be used in most machine learning models, because most models require data to be in a tabular format.
Alternatively, you can use special model architectures that target time series, e.g. LSTM, TCN, etc.
https://github.com/facebook/prophet Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
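One common way to turn a time series into the tabular format mentioned above is a sliding window of lagged values. The sketch below is a minimal version of that idea in plain Python; the function name and the toy series are assumptions for illustration, and it is unrelated to how Prophet works internally.

```python
def make_lag_table(series, n_lags):
    """Turn a 1-D series into (features, target) rows using the previous n_lags values."""
    rows = []
    for t in range(n_lags, len(series)):
        features = series[t - n_lags:t]   # the n_lags values just before time t
        target = series[t]                # the value to predict at time t
        rows.append((features, target))
    return rows


daily_sales = [10, 12, 13, 15, 18, 21]
table = make_lag_table(daily_sales, n_lags=3)

for features, target in table:
    print(features, "->", target)
```

Each row is now an ordinary tabular example, so any regression model that expects fixed-length feature vectors can be trained on it.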
https://github.com/determined-ai/determined Determined is an open-source deep learning training platform that makes building models fast and easy. I use it mainly for tuning hyperparameters.
https://github.com/iterative/cml Continuous Machine Learning (CML) is an open-source library for implementing continuous integration & delivery (CI/CD) in machine learning projects. Use it to automate parts of your development workflow, including model training and evaluation, comparing ML experiments across your project history, and monitoring changing datasets.
https://github.com/creme-ml/creme Python library for online machine learning. All the tools in the library can be updated with a single observation at a time, and can therefore be used to learn from streaming data.
https://github.com/Netflix/metaflow Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix.
MLflow MLflow (currently in beta) is an open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment. It currently offers three components: MLflow Tracking, MLflow Projects, MLflow Models.
FloydHub a Heroku for Deep Learning (You focus on the model, they'll deploy)
comet.ml Comet enables data scientists and teams to track, compare, explain and optimize experiments and models across the model's entire lifecycle. From training to production
https://neptune.ai/ Manage all your model building metadata in a single place
https://wandb.ai/site Build better models faster with experiment tracking, dataset versioning, and model management
https://github.com/activeloopai/Hub Fastest dataset optimization and management for machine and deep learning. Stream data real-time & version-control it.
https://github.com/google/ml_collections ML Collections is a library of Python Collections designed for ML use cases. It contains ConfigDict, a "dict-like" data structures with dot access to nested elements. It is supposed to be used as a main way of expressing configurations of experiments and models.
https://github.com/ploomber/ploomber Ploomber is the fastest way to build data pipelines ⚡️. Use your favorite editor (Jupyter, VSCode, PyCharm) to develop interactively and deploy ☁️ without code changes (Kubernetes, Airflow, AWS Batch, and SLURM).
https://github.com/streamlit/streamlit Streamlit turns data scripts into shareable web apps in minutes. All in Python. All for free. No front‑end experience required.
https://github.com/lux-org/lux By simply printing out a dataframe in a Jupyter notebook, Lux recommends a set of visualizations highlighting interesting trends and patterns in the dataset.
Open your eyes to search-driven development. It will change you. The main benefit is that there will be no setbacks. Only progress and improvement are allowed. Imagine working and progressing every day, instead of regressing because your new solution doesn't work. This guaranteed progress is what search-driven development will do for you. Apply it to everything in optimization, not just machine learning.
My top opinionated preferences are Determined, Ray Tune, and Optuna because of parallelization (distributed tuning on many machines), flexibility (can optimize arbitrary objectives and allow dataset parameters to be tuned), a library of SOTA tuning algorithms (e.g. HyperBand, BOHB, TPE, PBT, ASHA, etc.), result visualization/analysis tools, and extensive documentation/tutorials.
https://docs.ray.io/en/master/tune/index.html Ray Tune is a Python library for experiment execution and hyperparameter tuning at any scale. If you are looking for distributed tuning, Ray Tune is probably the most serious framework out there.
https://github.com/optuna/optuna an automatic hyperparameter optimization software framework (framework agnostic, define-by-run)
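The "only progress is allowed" idea can be sketched with a plain random search that keeps a candidate only when it beats the best score so far. This is a stdlib-only stand-in and does not use the APIs of Determined, Ray Tune, or Optuna, which provide much smarter search algorithms. The objective function and the parameter ranges are hypothetical.

```python
import random


def objective(params):
    """Hypothetical validation loss; lowest at lr=0.1 and depth=6."""
    return (params["lr"] - 0.1) ** 2 + 0.01 * (params["depth"] - 6) ** 2


def random_search(n_trials=50, seed=0):
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    history = []                               # best score after each trial
    for _ in range(n_trials):
        params = {"lr": rng.uniform(0.001, 1.0), "depth": rng.randint(2, 12)}
        score = objective(params)
        if score < best_score:                 # accept a candidate only if it improves
            best_params, best_score = params, score
        history.append(best_score)
    return best_params, best_score, history


best_params, best_score, history = random_search()
print(best_params, best_score)
```

Because a worse candidate is never accepted, the best-so-far curve in `history` can only stay flat or go down, which is the no-setbacks property described above.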
Make machines learn without the tedious tasks of feature engineering, model selection, and hyperparameter tuning
that you would otherwise have to do yourself. Let the machines perform machine learning for you!
Personally, if I have a tabular dataset I would try FLAML and mljar first, especially if I want to get something working fast.
If you want to try gradient boosting frameworks such as XGBoost, LightGBM, CatBoost, etc. but you don't know which one works best,
I suggest you try AutoML first because internally it will try the gradient boosting frameworks mentioned previously.
https://github.com/mljar/mljar-supervised an Automated Machine Learning Python package that works with tabular data. I like that it generates a visualization report (in the Explain mode) and extra features for you, e.g. golden features and K-means features.
Geoffrey Hinton has been called
the godfather of deep learning
for introducing 2 revolutionary techniques (ReLU and Dropout) with his students.
These techniques address the vanishing gradient and generalization problems of
deep neural networks. He also taught
a Neural Networks course at
Coursera.
Yann LeCun invented CNNs
(Convolutional neural networks), the kind of network that is really popular
among computer vision developers today
Andrew Ng was among the first to show that GPUs make deep learning faster.
He taught 2 famous online courses at Coursera: Machine Learning and the Deep Learning Specialization.
Ian Goodfellow invented
GANs (Generative Adversarial Networks) and has worked at OpenAI
David Silver is
the guy behind AlphaGo and the Atari reinforcement learning game agents at DeepMind
Demis Hassabis CEO of
DeepMind, has given a lot of talks about DeepMind's AlphaGo and Reinforcement Learning
achievements
Andrej Karpathy teaches convnet
classes, wrote ConvNetJS, and produces a lot of content for the DL community. He
also writes a blog (see Nice Blogs & Vlogs to Follow section)