This repo has been deprecated. Please see Detectron, which includes an implementation of Mask R-CNN.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
By Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun at Microsoft Research
Introduction
Faster R-CNN is an object detection framework based on deep convolutional networks. It consists of a Region Proposal Network (RPN) and an Object Detection Network. The two networks are trained to share convolutional layers, which keeps testing fast.
Faster R-CNN is released under the MIT License (refer to the LICENSE file for details).
Citing Faster R-CNN
If you find Faster R-CNN useful in your research, please consider citing:
@article{ren15fasterrcnn,
Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
Title = {{Faster R-CNN}: Towards Real-Time Object Detection with Region Proposal Networks},
Journal = {arXiv preprint arXiv:1506.01497},
Year = {2015}
}
Main Results
| model | training data | test data | mAP | time/img |
| --- | --- | --- | --- | --- |
| Faster R-CNN, VGG-16 | VOC 2007 trainval | VOC 2007 test | 69.9% | 198ms |
| Faster R-CNN, VGG-16 | VOC 2007 trainval + 2012 trainval | VOC 2007 test | 73.2% | 198ms |
| Faster R-CNN, VGG-16 | VOC 2012 trainval | VOC 2012 test | 67.0% | 198ms |
| Faster R-CNN, VGG-16 | VOC 2007 trainval&test + 2012 trainval | VOC 2012 test | 70.4% | 198ms |
Note: The mAP results are subject to random variation. Five independent runs with ZF net gave mAPs of 59.9 (as in the paper), 60.4, 59.5, 60.1, and 59.5, with a mean of 59.88 and a standard deviation of 0.39.
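The mean and standard deviation quoted in the note can be checked with a few lines of Python. This is only a sanity check on the five numbers given above; the use of the sample standard deviation (n - 1 in the denominator) is an assumption, chosen because it is the variant that reproduces the reported 0.39.

```python
import statistics

# mAP values (in %) for the five independent ZF-net runs quoted in the note.
zf_map_runs = [59.9, 60.4, 59.5, 60.1, 59.5]

mean_map = statistics.mean(zf_map_runs)
# Assumption: the reported figure is the sample standard deviation (n - 1).
std_map = statistics.stdev(zf_map_runs)

print(f"mean = {mean_map:.2f}, std = {std_map:.2f}")  # mean = 59.88, std = 0.39
```

The population standard deviation (`statistics.pstdev`) of the same values is about 0.35, so it does not match the reported figure.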
Requirements: software
Caffe build for Faster R-CNN (included in this repository, see external/caffe)
If you are using Windows, you may download a compiled mex file by running fetch_data/fetch_caffe_mex_windows_vs2013_cuda65.m
If you are using Linux or you want to compile for Windows, please follow the instructions on our Caffe branch.
MATLAB
Requirements: hardware
GPU: Titan, Titan Black, Titan X, K20, K40, K80.
Region Proposal Network (RPN)
2GB GPU memory for ZF net
5GB GPU memory for VGG-16 net
Object Detection Network (Fast R-CNN)
3GB GPU memory for ZF net
8GB GPU memory for VGG-16 net
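As a rough illustration, the memory figures above can be put into a lookup table to check whether a given card is large enough. The helper name `fits_on_gpu` is hypothetical and not part of this repository, and the sketch assumes the RPN and Fast R-CNN stages run one at a time, so the larger of the two figures is what matters.

```python
# GPU memory requirements in GB, copied from the list above.
REQUIRED_GPU_MEMORY_GB = {
    ("RPN", "ZF"): 2,
    ("RPN", "VGG-16"): 5,
    ("Fast R-CNN", "ZF"): 3,
    ("Fast R-CNN", "VGG-16"): 8,
}


def fits_on_gpu(net: str, gpu_memory_gb: float) -> bool:
    """Hypothetical helper: True if the GPU can hold the larger of the two stages.

    Assumption: RPN and Fast R-CNN are not resident on the GPU at the same time.
    """
    needed = max(
        mem for (_stage, name), mem in REQUIRED_GPU_MEMORY_GB.items() if name == net
    )
    return gpu_memory_gb >= needed


print(fits_on_gpu("ZF", 6))      # True: ZF needs at most 3 GB
print(fits_on_gpu("VGG-16", 6))  # False: Fast R-CNN with VGG-16 needs 8 GB
```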
Preparation for Testing:
Run fetch_data/fetch_caffe_mex_windows_vs2013_cuda65.m to download a compiled Caffe mex (for Windows only).
Run faster_rcnn_build.m
Run startup.m
Testing Demo:
Run fetch_data/fetch_faster_rcnn_final_model.m to download our trained models.
Run experiments/script_faster_rcnn_demo.m to test a single demo image.
The demo prints timing information for each image. On a K40 @ 875 MHz with an Intel Xeon CPU E5-2650 v2 @ 2.60GHz, the demo images take the following times with VGG-16:
001763.jpg (500x375): time 0.201s (resize+conv+proposal: 0.150s, nms+regionwise: 0.052s)
004545.jpg (500x375): time 0.201s (resize+conv+proposal: 0.151s, nms+regionwise: 0.050s)
000542.jpg (500x375): time 0.192s (resize+conv+proposal: 0.151s, nms+regionwise: 0.041s)
000456.jpg (500x375): time 0.202s (resize+conv+proposal: 0.152s, nms+regionwise: 0.050s)
001150.jpg (500x375): time 0.194s (resize+conv+proposal: 0.151s, nms+regionwise: 0.043s)
mean time: 0.198s
and with ZF net:
001763.jpg (500x375): time 0.061s (resize+conv+proposal: 0.032s, nms+regionwise: 0.029s)
004545.jpg (500x375): time 0.063s (resize+conv+proposal: 0.034s, nms+regionwise: 0.029s)
000542.jpg (500x375): time 0.052s (resize+conv+proposal: 0.034s, nms+regionwise: 0.018s)
000456.jpg (500x375): time 0.062s (resize+conv+proposal: 0.034s, nms+regionwise: 0.028s)
001150.jpg (500x375): time 0.058s (resize+conv+proposal: 0.034s, nms+regionwise: 0.023s)
mean time: 0.059s
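If you want to compare your own timings against these, the per-image lines can be parsed and averaged with a short script. The sketch below is not part of the repository; the regular expression is inferred from the sample lines above and may need adjusting if the demo's output format differs.

```python
import re

# Demo output copied from the timings above.
vgg16_log = """\
001763.jpg (500x375): time 0.201s (resize+conv+proposal: 0.150s, nms+regionwise: 0.052s)
004545.jpg (500x375): time 0.201s (resize+conv+proposal: 0.151s, nms+regionwise: 0.050s)
000542.jpg (500x375): time 0.192s (resize+conv+proposal: 0.151s, nms+regionwise: 0.041s)
000456.jpg (500x375): time 0.202s (resize+conv+proposal: 0.152s, nms+regionwise: 0.050s)
001150.jpg (500x375): time 0.194s (resize+conv+proposal: 0.151s, nms+regionwise: 0.043s)
"""
zf_log = """\
001763.jpg (500x375): time 0.061s (resize+conv+proposal: 0.032s, nms+regionwise: 0.029s)
004545.jpg (500x375): time 0.063s (resize+conv+proposal: 0.034s, nms+regionwise: 0.029s)
000542.jpg (500x375): time 0.052s (resize+conv+proposal: 0.034s, nms+regionwise: 0.018s)
000456.jpg (500x375): time 0.062s (resize+conv+proposal: 0.034s, nms+regionwise: 0.028s)
001150.jpg (500x375): time 0.058s (resize+conv+proposal: 0.034s, nms+regionwise: 0.023s)
"""

# Matches "<image> (<width>x<height>): time <seconds>s" at the start of a line.
LINE_RE = re.compile(r"^(?P<image>\S+) \(\d+x\d+\): time (?P<total>[\d.]+)s")


def mean_time(log: str) -> float:
    """Average the total per-image time over all lines that match LINE_RE."""
    totals = [float(m.group("total")) for m in map(LINE_RE.match, log.splitlines()) if m]
    return sum(totals) / len(totals)


print(f"VGG-16 mean time: {mean_time(vgg16_log):.3f}s")  # 0.198s
print(f"ZF mean time: {mean_time(zf_log):.3f}s")         # 0.059s
```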
The visual results might be different from those in the paper due to numerical variations.
Running time on other GPUs
| GPU / mean time | VGG-16 | ZF |
| --- | --- | --- |
| K40 | 198ms | 59ms |
| Titan Black | 174ms | 56ms |
| Titan X | 151ms | 59ms |
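To relate these per-image times to a frame rate, divide 1000 by the time in milliseconds. The snippet below is a small arithmetic sketch using the mean times listed above; it measures nothing itself.

```python
# Mean per-image test time in milliseconds, copied from the figures above.
mean_time_ms = {
    "K40": {"VGG-16": 198, "ZF": 59},
    "Titan Black": {"VGG-16": 174, "ZF": 56},
    "Titan X": {"VGG-16": 151, "ZF": 59},
}


def frames_per_second(ms_per_image: float) -> float:
    """Convert a per-image time in milliseconds to images per second."""
    return 1000.0 / ms_per_image


for gpu, times in mean_time_ms.items():
    rates = ", ".join(
        f"{net}: {frames_per_second(ms):.1f} fps" for net, ms in times.items()
    )
    print(f"{gpu} -> {rates}")
```

On a K40, for example, 198ms per image with VGG-16 corresponds to roughly 5 images per second, and 59ms with ZF to roughly 17.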
Preparation for Training:
Run fetch_data/fetch_model_ZF.m to download an ImageNet-pre-trained ZF net.
Run fetch_data/fetch_model_VGG16.m to download an ImageNet-pre-trained VGG-16 net.
Download VOC 2007 and 2012 data to ./datasets
Training:
Run experiments/script_faster_rcnn_VOC2007_ZF.m to train a model with ZF net. It runs four steps as follows:
Train RPN with conv layers tuned; compute RPN results on the train/test sets.
Train Fast R-CNN with conv layers tuned using step-1 RPN proposals; evaluate detection mAP.
Train RPN with conv layers fixed; compute RPN results on the train/test sets.
Train Fast R-CNN with conv layers fixed using step-3 RPN proposals; evaluate detection mAP.
Note: the full training run takes ~12 hours on a K40.
Run experiments/script_faster_rcnn_VOC2007_VGG16.m to train a model with VGG-16 net.
Note: the full training run takes ~2 days on a K40.
Check other scripts in ./experiments for more settings.
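The four-step schedule described above can be summarized as a small data structure. This is an illustrative Python sketch only, not the repository's MATLAB code; the `Stage` class and its field names are hypothetical and exist just to make the dependencies between steps explicit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """Hypothetical description of one step of the four-step training."""
    step: int
    network: str                        # "RPN" or "Fast R-CNN"
    conv_layers: str                    # "tuned" or "fixed"
    proposals_from_step: Optional[int]  # RPN step supplying proposals (Fast R-CNN only)


FOUR_STEP_SCHEDULE = [
    Stage(1, "RPN", "tuned", None),
    Stage(2, "Fast R-CNN", "tuned", 1),
    Stage(3, "RPN", "fixed", None),
    Stage(4, "Fast R-CNN", "fixed", 3),
]


def describe(stage: Stage) -> str:
    """Render one stage as a sentence mirroring the step list above."""
    text = f"Step {stage.step}: train {stage.network} with conv layers {stage.conv_layers}"
    if stage.proposals_from_step is not None:
        text += f" using step-{stage.proposals_from_step} RPN proposals"
    return text


for stage in FOUR_STEP_SCHEDULE:
    print(describe(stage))
```

Each Fast R-CNN step consumes the proposals of the RPN step immediately before it, and the convolutional layers are tuned in steps 1 and 2 and kept fixed in steps 3 and 4.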
Resources
Note: This documentation may contain links to third party websites, which are provided for your convenience only. Such third party websites are not under Microsoft’s control. Microsoft does not endorse or make any representation, guarantee or assurance regarding any third party website, content, service or product. Third party websites may be subject to the third party’s terms, conditions, and privacy statements.