Building a model is fun and exciting; putting it into production is a different story. While TensorFlow focuses on building a model, a complete DL/ML system always needs a robust infrastructure platform for data ingestion, feature extraction, and pipeline management, which makes Apache Spark a perfect candidate. In recent releases, TensorFlow has been enhanced for distributed learning and Hadoop Distributed File System (HDFS) access. Several community projects are also wiring TensorFlow onto Apache Spark clusters. While these approaches are a step in the right direction, they usually require extra, complicated deployment steps or error-prone interprocess communication.
Yang Wang offers an overview of Analytics Zoo, a unified analytics and AI platform for distributed TensorFlow, Keras, and BigDL on Apache Spark. This new framework enables easy experimentation with algorithm designs and supports training and inference on Spark clusters with ease of use and near-linear scalability. Compared with other frameworks, Analytics Zoo is designed to serve in production environments with minimal or even zero deployment effort on vanilla Spark clusters; offer high performance through intraprocess communication and optimized parameter synchronization; provide a rich choice of inference patterns, including low-latency local plain old Java object (POJO), high-throughput batching, and streaming; and supply a variety of reference use cases and preprocessing utilities.
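To make "distributed training with parameter synchronization" concrete, here is a minimal sketch of synchronous data-parallel gradient descent. It is a generic illustration only, not Analytics Zoo's or BigDL's implementation, and it has no Spark dependency: each list in `partitions` stands in for a Spark partition, and the helper names `local_gradient` and `train` are hypothetical. Every partition computes gradients against the same weights, the gradients are aggregated in one synchronization step, and a single update is applied.

```python
# Hypothetical sketch: synchronous data-parallel training of y = w*x + b.
# Each inner list plays the role of one data partition (e.g. a Spark partition).
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def local_gradient(shard: List[Sample], w: float, b: float) -> Tuple[float, float, int]:
    """Summed squared-error gradients over one shard, plus the shard size."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    return gw, gb, len(shard)


def train(partitions: List[List[Sample]], lr: float = 0.05, steps: int = 2000) -> Tuple[float, float]:
    w, b = 0.0, 0.0
    for _ in range(steps):
        # "Map" step: every partition computes gradients against the same weights.
        results = [local_gradient(p, w, b) for p in partitions]
        # Synchronization step: pool the gradients so all workers apply one identical update.
        total_n = sum(n for _, _, n in results)
        gw = sum(g for g, _, _ in results) / total_n
        gb = sum(g for _, g, _ in results) / total_n
        w -= lr * gw
        b -= lr * gb
    return w, b


if __name__ == "__main__":
    data = [(x / 2, 3 * (x / 2) + 1) for x in range(6)]  # points on y = 3x + 1
    parts = [data[0:2], data[2:4], data[4:6]]             # three "partitions"
    print(train(parts))                                    # approaches (3.0, 1.0)
```

Because the pooled gradient equals the full-batch gradient, splitting the data across partitions does not change the result; it only spreads the work. Real systems differ mainly in how efficiently they perform the synchronization step across machines.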
Join Yang to learn the technical details of Analytics Zoo as he walks you through examples outlining its key capabilities. Along the way, you'll discover how to turn an existing TensorFlow algorithm into a Spark application with a few extra lines of code and integrate it with the big data world.
Yang Wang is a machine learning engineer on the data analytics team at Intel, focusing on deep learning infrastructure, algorithms, and applications. He's one of the core contributors to Analytics Zoo and BigDL.
©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com