Analytics Zoo: Distributed TensorFlow in production on Apache Spark

This session will be presented in Chinese.

Yang Wang (Intel)
11:15-11:55, Friday, June 21, 2019
Implementing AI
Location: Auditorium

Prerequisite Knowledge

  • A basic understanding of deep learning and Spark (useful but not required)

What you'll learn

  • Learn how TensorFlow programs, for both training and inference, can run on Apache Spark in an easy-to-use and scalable way, combining deep learning with big data
  • Understand how Analytics Zoo, a unified analytics and AI platform for distributed TensorFlow, Keras, and BigDL on Apache Spark, is designed for production environments, enabling easy deployment, high performance, and efficient model serving for deep learning applications

Description

Building a model is fun and exciting; putting it into production is a different story. While TensorFlow focuses on building a model, a complete DL/ML system always needs a robust infrastructure platform for data ingestion, feature extraction, and pipeline management, which makes Apache Spark a perfect candidate. In recent releases, TensorFlow has been enhanced for distributed learning and Hadoop Distributed File System (HDFS) access. Several community projects are also wiring TensorFlow onto Apache Spark clusters. While these approaches are a step in the right direction, they usually require extra complicated deployment steps or error-prone interprocess communication.

Yang Wang offers an overview of Analytics Zoo, a unified analytics and AI platform for distributed TensorFlow, Keras, and BigDL on Apache Spark. This new framework makes it easy to experiment with algorithm designs and supports training and inference on Spark clusters with ease of use and near-linear scalability. Compared with other frameworks, Analytics Zoo is designed to serve in production environments. It aims to:

  • Run on vanilla Spark clusters with minimal or even zero deployment effort
  • Offer high performance through intraprocess communication and optimized parameter synchronization
  • Provide a rich choice of inference patterns, including low-latency local plain old Java object (POJO), high-throughput batching, and streaming
  • Supply a variety of reference use cases and preprocessing utilities
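
The three inference patterns named above (low-latency local calls, high-throughput batching, and streaming) can be illustrated with a minimal, library-free Python sketch. It does not use the Analytics Zoo API: `toy_model`, `predict_one`, `predict_batched`, and `predict_stream` are hypothetical names chosen only for this illustration, and the "model" is a simple arithmetic stand-in.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

# A "model" here is any function mapping a batch of inputs to a batch of outputs.
Model = Callable[[List[float]], List[float]]


def toy_model(batch: List[float]) -> List[float]:
    """Hypothetical stand-in for a trained model: y = 2x + 1 for each input."""
    return [2.0 * x + 1.0 for x in batch]


def predict_one(model: Model, x: float) -> float:
    """Low-latency local pattern: score a single record in-process."""
    return model([x])[0]


def predict_batched(model: Model, records: List[float], batch_size: int) -> List[float]:
    """High-throughput pattern: score a finite dataset in fixed-size chunks."""
    out: List[float] = []
    for start in range(0, len(records), batch_size):
        out.extend(model(records[start:start + batch_size]))
    return out


def predict_stream(model: Model, source: Iterable[float], batch_size: int) -> Iterator[float]:
    """Streaming pattern: pull micro-batches from a possibly unbounded source."""
    it = iter(source)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield from model(chunk)


records = [0.0, 1.0, 2.0, 3.0, 4.0]
print([predict_one(toy_model, x) for x in records])
print(predict_batched(toy_model, records, batch_size=2))
print(list(predict_stream(toy_model, iter(records), batch_size=2)))
```

All three calls score the same records with the same model; they differ only in how records reach the model, which is the trade-off between latency and throughput that the patterns address.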

Join Yang to learn Analytics Zoo’s technical details as he walks you through examples outlining the key capabilities. Along the way, you’ll discover how to transform, with a few extra lines, an existing TensorFlow algorithm into a Spark application and integrate it with the big data world.
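
As a rough illustration of why only a few extra lines may be needed, the sketch below moves a single-machine training loop onto partitioned data using synchronous gradient averaging. It is a pure-Python stand-in, not Analytics Zoo, Spark, or TensorFlow code: the linear model, the round-robin partitioning, and the names `gradient`, `train_local`, and `train_partitioned` are all assumptions made for this example. On a real cluster the per-partition gradients would be computed in parallel on workers.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (feature x, label y)


def gradient(w: float, b: float, samples: List[Sample]) -> Tuple[float, float]:
    """Mean-squared-error gradient for y ~ w*x + b; reused unchanged in both loops."""
    n = len(samples)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in samples) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in samples) / n
    return gw, gb


def train_local(data: List[Sample], lr: float = 0.1, steps: int = 1000) -> Tuple[float, float]:
    """Single-machine gradient descent over the whole dataset."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw, gb = gradient(w, b, data)
        w, b = w - lr * gw, b - lr * gb
    return w, b


def train_partitioned(data: List[Sample], num_partitions: int = 4,
                      lr: float = 0.1, steps: int = 1000) -> Tuple[float, float]:
    """Same loop, but gradients are computed per partition and then averaged."""
    # Extra step 1: split the data round-robin, dropping any empty partitions.
    parts = [p for p in (data[i::num_partitions] for i in range(num_partitions)) if p]
    total = len(data)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Each partition computes its gradient with the shared parameters.
        grads = [gradient(w, b, p) for p in parts]
        # Extra step 2: synchronize by averaging, weighted by partition size.
        gw = sum(g[0] * len(p) for g, p in zip(grads, parts)) / total
        gb = sum(g[1] * len(p) for g, p in zip(grads, parts)) / total
        w, b = w - lr * gw, b - lr * gb
    return w, b


# Synthetic data generated from y = 3x + 1.
data = [(i / 10.0, 3.0 * (i / 10.0) + 1.0) for i in range(20)]
print(train_local(data))
print(train_partitioned(data))
```

Because the size-weighted average of the partition gradients equals the full-batch gradient, both loops follow the same optimization path up to floating-point rounding; the model code itself does not change, only the data splitting and the synchronization step are added.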


Yang Wang


Yang Wang is a machine learning engineer on the data analytics team at Intel, focusing on deep learning infrastructure, algorithms, and applications. He’s one of the core contributors to Analytics Zoo and BigDL.
