Open-source project name: csarron/awesome-emdl
Open-source project URL: https://github.com/csarron/awesome-emdl
EMDL
Embedded and mobile deep learning research notes.
Papers
Survey
EfficientDNNs [Repo]
Awesome ML Model Compression [Repo]
TinyML Papers and Projects [Repo]
TinyML Platforms Benchmarking [arXiv '21]
TinyML: A Systematic Review and Synthesis of Existing Research [ICAIIC '21]
TinyML Meets IoT: A Comprehensive Survey [Internet of Things '21]
A review on TinyML: State-of-the-art and prospects [Journal of King Saud Univ. '21]
TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers [IEEE '21]
Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better [arXiv '21]
Benchmarking TinyML Systems: Challenges and Direction [arXiv '20]
Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey [IEEE '20]
The Deep Learning Compiler: A Comprehensive Survey [arXiv '20]
Recent Advances in Efficient Computation of Deep Convolutional Neural Networks [arXiv '18]
A Survey of Model Compression and Acceleration for Deep Neural Networks [arXiv '17]
Model
EtinyNet: Extremely Tiny Network for TinyML [AAAI '21]
MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning [NeurIPS '21, MIT]
SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems [MLSys '20, IBM]
Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets [NeurIPS '20, Huawei]
MCUNet: Tiny Deep Learning on IoT Devices [NeurIPS '20, MIT]
GhostNet: More Features from Cheap Operations [CVPR '20, Huawei]
MicroNet for Efficient Language Modeling [NeurIPS '19, MIT]
Searching for MobileNetV3 [ICCV '19, Google]
MobileNetV2: Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation [CVPR '18, Google]
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware [arXiv '18, MIT]
DeepRebirth: Accelerating Deep Neural Network Execution on Mobile Devices [AAAI'18, Samsung]
NasNet: Learning Transferable Architectures for Scalable Image Recognition [arXiv '17, Google]
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices [arXiv '17, Megvii]
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [arXiv '17, Google]
CondenseNet: An Efficient DenseNet using Learned Group Convolutions [arXiv '17]
System
BSC: Block-based Stochastic Computing to Enable Accurate and Efficient TinyML [ASP-DAC '22]
CFU Playground: Full-Stack Open-Source Framework for Tiny Machine Learning (tinyML) Acceleration on FPGAs [arXiv '22, Google]
UDC: Unified DNAS for Compressible TinyML Models [arXiv '22, Arm]
AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator [arXiv '21, Arm]
TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning [NeurIPS '20, MIT]
Once for All: Train One Network and Specialize it for Efficient Deployment [ICLR '20, MIT]
DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications [MobiSys '17]
DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware [MobiSys '17]
MobiRNN: Efficient Recurrent Neural Network Execution on Mobile GPU [EMDL '17]
fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs [NIPS '17]
DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile devices [WearSys '16]
DeepX: A Software Accelerator for Low-Power Deep Learning Inference on Mobile Devices [IPSN '16]
EIE: Efficient Inference Engine on Compressed Deep Neural Network [ISCA '16]
MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints [MobiSys '16]
DXTK: Enabling Resource-efficient Deep Learning on Mobile and Embedded Devices with the DeepX Toolkit [MobiCASE '16]
Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables [SenSys ’16]
An Early Resource Characterization of Deep Learning on Wearables, Smartphones and Internet-of-Things Devices [IoT-App ’15]
CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android [MM '16]
Quantization
Quantizing deep convolutional networks for efficient inference: A whitepaper [arXiv '18]
LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks [ECCV'18]
Training and Inference with Integers in Deep Neural Networks [ICLR'18]
The ZipML Framework for Training Models with End-to-End Low Precision: The Cans, the Cannots, and a Little Bit of Deep Learning [ICML'17]
Loss-aware Binarization of Deep Networks [ICLR'17]
Towards the Limit of Network Quantization [ICLR'17]
Deep Learning with Low Precision by Half-wave Gaussian Quantization [CVPR'17]
ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks [arXiv'17]
Quantized Convolutional Neural Networks for Mobile Devices [CVPR '16]
Fixed-Point Performance Analysis of Recurrent Neural Networks [ICASSP'16]
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations [arXiv'16]
Compressing Deep Convolutional Networks using Vector Quantization [arXiv'14]
Pruning
Awesome-Pruning [Repo]
Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration [CVPR'19]
To prune, or not to prune: exploring the efficacy of pruning for model compression [ICLR'18]
Pruning Filters for Efficient ConvNets [ICLR'17]
Pruning Convolutional Neural Networks for Resource Efficient Inference [ICLR'17]
Soft Weight-Sharing for Neural Network Compression [ICLR'17]
Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning [CVPR'17]
ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression [ICCV'17]
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [ICLR'16]
Dynamic Network Surgery for Efficient DNNs [NIPS'16]
Learning both Weights and Connections for Efficient Neural Networks [NIPS'15]
Approximation
High performance ultra-low-precision convolutions on mobile devices [NIPS'17]
Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications [ICLR'16]
Efficient and Accurate Approximations of Nonlinear Convolutional Networks [CVPR'15]
Accelerating Very Deep Convolutional Networks for Classification and Detection (extended version of the paper above)
Convolutional neural networks with low-rank regularization [arXiv'15]
Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation [NIPS'14]
Characterization
A First Look at Deep Learning Apps on Smartphones [WWW'19]
Machine Learning at Facebook: Understanding Inference at the Edge [HPCA'19]
NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications [ECCV'18]
Latency and Throughput Characterization of Convolutional Neural Networks for Mobile Computer Vision [MMSys’18]
Libraries
Inference Framework
Alibaba - MNN - is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba.
Apple - CoreML - integrates machine learning models into your app. BERT and GPT-2 on iPhone
Arm - ComputeLibrary - is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies. Intro
Arm - Arm NN - is the most performant machine learning (ML) inference engine for Android and Linux, accelerating ML on Arm Cortex-A CPUs and Arm Mali GPUs.
Baidu - Paddle Lite - is a multi-platform, high-performance deep learning inference engine.
DeepLearningKit - is an open source deep learning framework for Apple's iOS, OS X and tvOS.
Edge Impulse - Interactive platform to generate models that can run on microcontrollers. They are also quite active on social networks, sharing recent news on EdgeAI/TinyML.
Google - TensorFlow Lite - is an open source deep learning framework for on-device inference.
Intel - OpenVINO - Comprehensive toolkit to optimize your processes for faster inference.
JDAI Computer Vision - dabnn - is an accelerated binary neural networks inference framework for mobile platform.
Meta - PyTorch Mobile - is a new framework for helping mobile developers and machine learning engineers embed PyTorch ML models on-device.
Microsoft - DeepSpeed - is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Microsoft - ELL - allows you to design and deploy intelligent machine-learned models onto resource constrained platforms and small single-board computers, like Raspberry Pi, Arduino, and micro:bit.
Microsoft - ONNX Runtime - is a cross-platform, high-performance ML inferencing and training accelerator.
Nvidia - TensorRT - is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
OAID - Tengine - is a lite, high-performance, modular inference engine for embedded devices.
Qualcomm - Neural Processing SDK for AI - Libraries that help developers run NN models on Snapdragon mobile platforms, taking advantage of the CPU, GPU and/or DSP.
Tencent - ncnn - is a high-performance neural network inference framework optimized for the mobile platform.
uTensor - AI inference library based on mbed (an RTOS for ARM chipsets) and TensorFlow.
XiaoMi - Mace - is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
xmartlabs - Bender - Easily craft fast Neural Networks on iOS! Use TensorFlow models. Metal under the hood.
Optimization Tools
Neural Network Distiller - Python package for neural network compression research.
PocketFlow - An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.
Research Demos
RSTensorFlow - GPU Accelerated TensorFlow for Commodity Android Devices.
Web
mil-tokyo/webdnn - Fastest DNN Execution Framework on Web Browser.
General
Caffe2 AICamera
TensorFlow Android Camera Demo
TensorFlow iOS Example
TensorFlow OpenMV Camera Module
Edge / Tiny MLOps
Tiny-MLOps: a framework for orchestrating ML applications at the far edge of IoT systems [EAIS '22]
MLOps for TinyML: Challenges & Directions in Operationalizing TinyML at Scale [TinyML Talks '22]
TinyMLOps: Operational Challenges for Widespread Edge AI Adoption [arXiv '22]
A TinyMLaaS Ecosystem for Machine Learning in IoT: Overview and Research Challenges [VLSI-DAT '21]
SOLIS: The MLOps journey from data acquisition to actionable insights [arXiv '21]
Edge MLOps: An Automation Framework for AIoT Applications [IC2E '21]
SensiX++: Bringing MLOPs and Multi-tenant Model Serving to Sensory Edge Devices [arXiv '21, Nokia]
Vulkan
Vulkan API Examples and Demos
Neural Machine Translation on Android
OpenCL
DeepMon
RenderScript
Mobile_ConvNet: RenderScript CNN for Android
Tutorials
General
Squeezing Deep Learning Into Mobile Phones
Deep Learning – Tutorial and Recent Trends
Tutorial on Hardware Architectures for Deep Neural Networks
Efficient Convolutional Neural Network Inference on Mobile GPUs
NEON
NEON™ Programmer’s Guide
OpenCL
ARM® Mali™ GPU OpenCL Developer Guide, pdf
Optimal Compute on ARM Mali™ GPUs
GPU Compute for Mobile Devices
Compute for Mobile Devices Performance focused
Hands On OpenCL
Adreno OpenCL Programming Guide
Better OpenCL Performance on Qualcomm Adreno GPU
Courses
UW Deep learning systems
Berkeley Machine Learning Systems
Tools
GPU
Bifrost GPU architecture and ARM Mali-G71 GPU
Midgard GPU Architecture, ARM Mali-T880 GPU
Mobile GPU market share
Driver
[Adreno] csarron/qcom_vendor_binaries: Common Proprietary Qualcomm Binaries
[Mali] Fevax/vendor_samsung_hero2ltexx: Blobs from s7 Edge G935F
Related Repos