
June 18-21, 2019
Beijing, CN

Analytics Zoo: Distributed TensorFlow in Production on Apache Spark

This will be presented in English.

Yang Wang (Intel)
11:15-11:55 Friday, June 21, 2019

Prerequisite Knowledge

No hard requirements; some basic familiarity with deep learning and Spark is preferred.

What You'll Learn

1. TensorFlow programs, for both training and inference, can run on Apache Spark with ease of use and scalability, combining deep learning with big data.
2. Analytics Zoo, a unified analytics + AI platform for distributed TensorFlow, Keras and BigDL on Apache Spark, is designed for production environments. It enables easy deployment, high performance and efficient model serving for deep learning applications.
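The listing mentions efficient model serving (and, further down, high-throughput batching) without showing what batching means in practice. As a rough illustration only, here is the general micro-batching idea in plain Python: incoming records are grouped into fixed-size batches so the model is invoked once per batch instead of once per record. `batched`, `predict_in_batches` and `ToyModel` are hypothetical names introduced for this sketch; it is not the Analytics Zoo serving API.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream of records into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def predict_in_batches(
    records: Iterable[T],
    predict_batch: Callable[[Sequence[T]], List[R]],
    batch_size: int = 32,
) -> Iterator[R]:
    """Call the model once per batch and yield one prediction per record."""
    for batch in batched(records, batch_size):
        outputs = predict_batch(batch)
        if len(outputs) != len(batch):
            raise ValueError("model must return one output per input")
        yield from outputs


class ToyModel:
    """Hypothetical stand-in for a real model; counts calls to expose batching."""

    def __init__(self) -> None:
        self.calls = 0

    def predict(self, xs: Sequence[float]) -> List[float]:
        self.calls += 1
        return [2.0 * x + 1.0 for x in xs]


if __name__ == "__main__":
    model = ToyModel()
    preds = list(predict_in_batches(range(10), model.predict, batch_size=4))
    print(preds)        # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 19.0]
    print(model.calls)  # 3 (batches of 4, 4 and 2)
```

In a PySpark job, a function like this could be applied to each partition with `rdd.mapPartitions(lambda it: predict_in_batches(it, model.predict))`, assuming a model object is available on every executor. The batch size trades per-record latency for throughput.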

Description

Building a model is fun and exciting; putting it into production is always a different story. While TensorFlow focuses on building models, a complete DL/ML system also needs a robust infrastructure platform for data ingestion, feature extraction and pipeline management, and Apache Spark is a natural candidate for that role. Recent TensorFlow releases have improved support for distributed training and HDFS access, and several community projects are wiring TensorFlow onto Apache Spark clusters. While these approaches are a step in the right direction, they usually require complicated deployment steps or error-prone inter-process communication.

This session will introduce Analytics Zoo, a unified analytics + AI platform for distributed TensorFlow, Keras and BigDL on Apache Spark. This new framework enables easy experimentation with algorithm designs and supports training and inference on Spark clusters with ease of use and near-linear scalability. Compared with other frameworks, Analytics Zoo is designed to serve in production environments:

1. minimal or even zero deployment effort on a vanilla Spark cluster;
2. high performance through intra-process communication and optimized parameter synchronization;
3. a wide choice of inference patterns, including low-latency local POJO, high-throughput batching and streaming;
4. a variety of reference use cases and preprocessing utilities.
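The listing does not say how Analytics Zoo performs its optimized parameter synchronization, so the following is only an assumed illustration of one widely used sharded scheme for synchronous data-parallel training, simulated in a single Python process. Each of N workers owns one slice of the parameters: it aggregates that slice of every worker's gradient (reduce-scatter), updates its slice of the weights, and the updated slices are then shared with all workers (all-gather). `split_bounds` and `sharded_sync_step` are hypothetical names; a real implementation would move these slices over the network.

```python
from typing import List, Tuple

Vector = List[float]


def split_bounds(length: int, num_shards: int) -> List[Tuple[int, int]]:
    """Split indices [0, length) into num_shards contiguous (start, end) slices."""
    base, extra = divmod(length, num_shards)
    bounds: List[Tuple[int, int]] = []
    start = 0
    for i in range(num_shards):
        # The first `extra` shards take one additional element.
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def sharded_sync_step(weights: Vector, worker_grads: List[Vector], lr: float) -> Vector:
    """Simulate one synchronous SGD step with reduce-scatter + all-gather.

    Hypothetical illustration: worker k owns slice k of the parameters. It sums
    slice k of every worker's gradient, averages it, and updates slice k of the
    weights. Concatenating the updated slices plays the role of the all-gather.
    """
    n = len(worker_grads)
    dim = len(weights)
    if n == 0 or any(len(g) != dim for g in worker_grads):
        raise ValueError("need at least one gradient, each matching the weight size")
    updated_slices: List[Vector] = []
    for lo, hi in split_bounds(dim, n):
        # Reduce-scatter: the owner of [lo, hi) aggregates only these indices.
        avg = [sum(g[j] for g in worker_grads) / n for j in range(lo, hi)]
        # Local SGD update of the owned slice with the averaged gradient.
        updated_slices.append([w - lr * a for w, a in zip(weights[lo:hi], avg)])
    # All-gather: every worker ends up with every updated slice.
    return [w for piece in updated_slices for w in piece]


if __name__ == "__main__":
    weights = [1.0] * 5
    grads = [[1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 2.0, 1.0, 0.0, -1.0]]
    print(sharded_sync_step(weights, grads, lr=0.5))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

The result matches a plain averaged-gradient update; the point of sharding is that each worker aggregates only about D / N of the D parameters, so the work is spread across workers rather than concentrated on a single parameter server. In a networked version of this scheme, each worker sends and receives roughly 2 * D * (N - 1) / N values per step.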

The speaker will introduce the technical details of Analytics Zoo and walk through multiple examples that illustrate these key capabilities. Learn how an existing TensorFlow algorithm can be turned into a Spark application with a few extra lines of code and integrated with the big data world.


Yang Wang


Yang Wang is a machine learning engineer on Intel's Data Analytics team, focusing on deep learning infrastructure, algorithms and applications. He is one of the core contributors to Analytics Zoo and BigDL.
