Analytics Zoo: Distributed TensorFlow in production on Apache Spark

This will be presented in English.

Yang Wang (Intel)
11:15–11:55 Friday, June 21, 2019

Prerequisite Knowledge

  • A basic understanding of deep learning and Spark (preferred)

What you'll learn

  • Learn how TensorFlow programs, for both training and inference, can run on Apache Spark with ease of use and scalability, combining deep learning with big data
  • Understand how Analytics Zoo, a unified analytics and AI platform for distributed TensorFlow, Keras, and BigDL on Apache Spark, is designed for production environments, enabling easy deployment, high performance, and efficient model serving for deep learning applications

Description

Building a model is fun and exciting; putting it into production is always a different story. While TensorFlow focuses on building the model, a complete DL/ML system also needs a robust infrastructure platform for data ingestion, feature extraction, and pipeline management, for which Apache Spark is a natural candidate. In recent releases, TensorFlow has been enhanced for distributed learning and Hadoop distributed file system (HDFS) access, and several community projects are wiring TensorFlow onto Apache Spark clusters. While these approaches are a step in the right direction, they usually require complicated extra deployment steps or error-prone interprocess communication.

Yang Wang introduces Analytics Zoo, a unified analytics and AI platform for distributed TensorFlow, Keras, and BigDL on Apache Spark. The framework enables easy experimentation with algorithm designs and supports training and inference on Spark clusters with ease of use and near-linear scalability. Compared with other frameworks, Analytics Zoo is designed to serve in production environments, offering:

  • Minimal (or even zero) deployment effort on vanilla Spark clusters
  • High performance through intraprocess communication and optimized parameter synchronization
  • A rich choice of inference patterns, including low-latency local plain old Java objects (POJOs), high-throughput batching, and streaming
  • A variety of reference use cases and preprocessing utilities

Learn the technical details of Analytics Zoo as Yang walks you through multiple examples outlining its key capabilities. Discover how an existing TensorFlow algorithm, with a few extra lines of code, can be transformed into a Spark application and integrated with the big data world.
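To make the "few extra lines" claim concrete, the sketch below shows the general shape of distributed training with Analytics Zoo's TFPark-style API: an RDD produced by Spark ETL is wrapped as a TFDataset, the existing TensorFlow model code builds its graph on the dataset's tensors, and a TFOptimizer drives training on the cluster. This is an illustrative sketch, not code from the talk; `train_rdd` and `my_existing_model` are hypothetical placeholders, and exact names may differ across Analytics Zoo versions.

```python
# Hedged sketch: training an existing TensorFlow model on a Spark cluster
# with Analytics Zoo. `train_rdd` and `my_existing_model` are placeholders.
import tensorflow as tf
from zoo import init_nncontext
from zoo.tfpark import TFDataset, TFOptimizer
from bigdl.optim.optimizer import Adam, MaxEpoch

sc = init_nncontext()  # get a SparkContext prepared for Analytics Zoo

# Wrap an RDD of (features, label) NumPy pairs produced by Spark ETL.
dataset = TFDataset.from_rdd(train_rdd,
                             features=(tf.float32, [28, 28, 1]),
                             labels=(tf.int32, []),
                             batch_size=128)

# Existing TensorFlow model code plugs in largely unchanged: build the
# graph on the dataset's input tensors instead of tf.placeholder.
images, labels = dataset.tensors
logits = my_existing_model(images)  # user's unchanged model function
loss = tf.reduce_mean(
    tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits))

# The extra lines: run distributed training across the Spark cluster.
optimizer = TFOptimizer.from_loss(loss, Adam(1e-3))
optimizer.optimize(end_trigger=MaxEpoch(5))
```

Because the training loop is driven by Spark, the same cluster that performs data ingestion and feature extraction also performs training, which is what removes the interprocess handoff the abstract warns about.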



Yang Wang


Yang Wang is a machine learning engineer on the Intel data analytics team, focusing on deep learning infrastructure, algorithms, and applications. He’s one of the core contributors of Analytics Zoo and BigDL.

