
gpushare-scheduler-extender: GPU sharing solution on native Kubernetes: it is ba ...


Open-source project name:

gpushare-scheduler-extender

Project URL:

https://gitee.com/AliyunContainerService/gpushare-scheduler-extender

Project description:

GPU Sharing Scheduler Extender in Kubernetes


Overview

More and more data scientists run their Nvidia GPU-based inference tasks on Kubernetes. Some of these tasks can run on the same Nvidia GPU device to increase GPU utilization, so an important challenge is how to share GPUs between pods. The community is also very interested in this topic.

Now there is a GPU sharing solution on native Kubernetes: it is based on the scheduler extender and device plugin mechanisms, so you can easily reuse this solution in your own Kubernetes cluster.

Prerequisites

  • Kubernetes 1.11+
  • golang 1.10+
  • NVIDIA drivers ~= 361.93
  • Nvidia-docker version > 2.0 (see how to install and its prerequisites)
  • Docker configured with Nvidia as the default runtime.

Design

For more details about the design of this project, please read this Design document.

Setup

You can follow this Installation Guide. If you are using Alibaba Cloud Kubernetes, please follow this doc to install with Helm Charts.

User Guide

You can check this User Guide.

Developing

Scheduler Extender

git clone https://github.com/AliyunContainerService/gpushare-scheduler-extender.git && cd gpushare-scheduler-extender
docker build -t cheyang/gpushare-scheduler-extender .

Device Plugin

git clone https://github.com/AliyunContainerService/gpushare-device-plugin.git && cd gpushare-device-plugin
docker build -t cheyang/gpushare-device-plugin .

Kubectl Extension

  • golang > 1.10
mkdir -p $GOPATH/src/github.com/AliyunContainerService
cd $GOPATH/src/github.com/AliyunContainerService
git clone https://github.com/AliyunContainerService/gpushare-device-plugin.git
cd gpushare-device-plugin
go build -o $GOPATH/bin/kubectl-inspect-gpushare-v2 cmd/inspect/*.go

Demo

- Demo 1: Deploy multiple GPU-sharing pods and schedule them onto the same GPU device using a bin-packing strategy

- Demo 2: Avoid GPU memory requests that fit at the node level, but not at the GPU device level
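
The idea behind both demos can be illustrated with a minimal Go sketch. It is not taken from this project's source: the function names `pickDevice` and `fitsNodeLevel`, the integer memory units, and the tie-breaking rule are all assumptions made for illustration. It shows why a check that sums free GPU memory across a node can accept a request that no single device can hold, and how a bin-packing choice might prefer the fullest device that still fits.

```go
package main

import "fmt"

// fitsNodeLevel is the naive check: it sums the free memory of every GPU
// on the node and compares the total with the request.
// (Hypothetical helper; units are arbitrary, e.g. GiB.)
func fitsNodeLevel(free []int, request int) bool {
	total := 0
	for _, f := range free {
		total += f
	}
	return total >= request
}

// pickDevice returns the index of the GPU that should host the request,
// or -1 if no single GPU has enough free memory.
// Bin-packing here means: among the GPUs that fit, choose the one with
// the least free memory, so that emptier GPUs stay available.
// On a tie, the lower index wins (an assumption of this sketch).
func pickDevice(free []int, request int) int {
	best := -1
	for i, f := range free {
		if f < request {
			continue // this device cannot hold the request on its own
		}
		if best == -1 || f < free[best] {
			best = i
		}
	}
	return best
}

func main() {
	// Two GPUs with 4 units free each: 8 units free at the node level.
	free := []int{4, 4}
	fmt.Println(fitsNodeLevel(free, 6)) // true: the node total is enough
	fmt.Println(pickDevice(free, 6))    // -1: no single device fits

	// Bin-packing: a request for 2 units goes to the fuller GPU.
	free = []int{8, 3}
	fmt.Println(pickDevice(free, 2)) // 1
}
```

In this sketch the device-level check is what rejects the 6-unit request, even though the node-level sum would have accepted it.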

Related Project

Roadmap

  • Integrate Nvidia MPS as an option for isolation
  • Automated deployment for Kubernetes clusters deployed by kubeadm
  • Scheduler Extender High Availability
  • Generic Solution for GPU, RDMA and other devices

Acknowledgments

  • The GPU sharing solution is based on Nvidia Docker2, and their GPU sharing design is our reference. The Nvidia community is very supportive, and we are very grateful.
