Presented By O’Reilly and Intel AI
Put AI to work
April 10-11, 2018: Training
April 11-13, 2018: Tutorials & Conference
Beijing, CN

Elastic scheduling on Apache Spark for deep learning applications in heterogeneous GPU/CPU environments

This will be presented in Chinese

Yonggang Hu (IBM), Junfeng Liu (IBM), Feng Kuan (IBM Canada)
11:15–11:55 Friday, April 13, 2018
Implementing AI
Location: Function Room 5A+B
Secondary topics: Deep Learning

Prerequisite Knowledge

Familiarity with distributed TensorFlow, Caffe, and Spark

What you'll learn

Learn how to build and manage an industrial-grade deep learning cluster

Description

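Deep learning is a key technique for building AI from massive datasets. Integrating Apache Spark with deep learning frameworks such as Caffe and MXNet allows their training phase to be parallelized at scale. Compared with other cluster environments, Apache Spark greatly simplifies data movement, so it offers better resource sharing and makes deep learning computation over distributed datasets simpler and more efficient. Such systems nevertheless run into a number of problems when deployed in the enterprise.

We will share our approach to deep learning with Apache Spark, particularly deep learning on GPUs, along with some real-world cognitive computing cases. The focus is on how to better manage and efficiently use GPU and CPU resources when Apache Spark applications run in production.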

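We will share:

  • A distributed CPU and GPU resource scheduling and load-balancing system for Caffe/TensorFlow applications. The system allocates resources specifically according to the GPU/CPU hardware topology to achieve higher cluster execution efficiency.
  • How to dynamically optimize resource adjustments based on resource topology to improve overall cluster utilization: a scheduler, integrated with the deep learning engine, that adjusts automatically according to GPU utilization and parallelism mode.
  • How to guarantee quality of service (QoS) for deep learning: how to allocate GPU resources sensibly, at the task-scheduling level of the deep learning framework, to deep learning jobs with a predefined compute topology, so as to uphold the QoS determined jointly by job priority and cluster load.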
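
As a rough illustration of the ideas in the list above, the following is a minimal Python sketch of topology-aware, priority-ordered GPU allocation on a single node. It is not the speakers' system. The `Job` and `Node` classes, the socket-to-GPU map, and the rule that a higher `priority` number is served first are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Job:
    name: str
    gpus_needed: int
    priority: int  # assumption: a larger number means a more important job


@dataclass
class Node:
    # Hypothetical topology model: CPU socket id -> free GPU ids attached to it
    free_gpus: Dict[int, List[int]] = field(default_factory=dict)

    def allocate(self, count: int) -> Optional[List[int]]:
        """Pick `count` free GPUs, preferring GPUs that share one socket."""
        # Best fit: the smallest socket group that can hold the whole request,
        # so larger groups stay intact for larger jobs.
        fitting = [s for s, g in self.free_gpus.items() if len(g) >= count]
        if fitting:
            socket = min(fitting, key=lambda s: len(self.free_gpus[s]))
            picked = self.free_gpus[socket][:count]
            self.free_gpus[socket] = self.free_gpus[socket][count:]
            return picked

        # No single socket fits: span sockets only if enough GPUs exist overall.
        if sum(len(g) for g in self.free_gpus.values()) < count:
            return None
        picked: List[int] = []
        for socket in sorted(self.free_gpus, key=lambda s: -len(self.free_gpus[s])):
            take = min(count - len(picked), len(self.free_gpus[socket]))
            picked += self.free_gpus[socket][:take]
            self.free_gpus[socket] = self.free_gpus[socket][take:]
            if len(picked) == count:
                break
        return picked


def schedule(jobs: List[Job], node: Node) -> Dict[str, Optional[List[int]]]:
    """Serve jobs in descending priority; a job that cannot fit gets None."""
    placements: Dict[str, Optional[List[int]]] = {}
    for job in sorted(jobs, key=lambda j: -j.priority):
        placements[job.name] = node.allocate(job.gpus_needed)
    return placements


if __name__ == "__main__":
    node = Node({0: [0, 1, 2, 3], 1: [4, 5]})
    jobs = [Job("low", 2, 1), Job("high", 4, 10), Job("mid", 1, 5)]
    print(schedule(jobs, node))
```

In this sketch the highest-priority job receives a full socket of GPUs, while a lower-priority job that no longer fits is left unplaced. A production scheduler would also need queuing, preemption, and multi-node placement, none of which are modeled here.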

Yonggang Hu

IBM

Yonggang Hu is a distinguished engineer and chief architect of platform computing at IBM. He has been working on distributed computing, grid, cloud, and big data for the past 20 years. Previously, Yonggang was vice president and application architect at JPMorgan Chase focusing on computational analytics and application infrastructure. Yonggang holds an MS in computer science from Peking University and an MBA from Cornell University.

Junfeng Liu

IBM

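Junfeng Liu is a software architect at IBM Platform Computing. He focuses on the design and implementation of big data platforms and has successfully delivered technical solutions to several major customers.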

Feng Kuan

IBM Canada

Feng Kuan is an architect at IBM Canada focusing on Spark and AI development.
