June 18-21, 2019
Beijing, CN

AI pipelines on container platforms

This will be presented in English.

14:50–15:30 Friday, June 21, 2019

Prerequisite knowledge

  • A basic understanding of Docker, Kubernetes, Python, the Jupyter Notebook, deep learning, TensorFlow, model training, model serving, and model deployment

What you'll learn

  • Discover how to build a pipeline platform that automates as many steps of the AI workflow as possible
  • Explore several open source projects related to AI pipelines, in particular Kubeflow Pipelines, which is used in the demo

Description

Simplifying lifecycle management for machine learning (ML) and deep learning (DL) is critical to the success of AI applications. From data ingestion and preprocessing, through model training and evaluation, to model deployment and serving, each stage of the ML/DL workflow relies on significant effort from an AI platform that data engineers must manage and monitor. Kubernetes automates the deployment, scaling, and management of containerized applications. Building AI workflows as pipelines and deploying those pipelines on Kubernetes provides the same benefits and further improves reproducibility and collaboration across the AI workflow.
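To make the idea of "workflow as pipeline" concrete, here is a minimal sketch in plain Python. It is not the Kubeflow Pipelines SDK and not the speakers' demo. The stage names, the toy data, and the `run_pipeline` helper are all hypothetical, chosen only to show how the stages named above can be chained so that a single call reproduces the whole workflow. On a container platform, each stage would typically run as its own container step rather than as an in-process function.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Context = Dict[str, Any]
Stage = Tuple[str, Callable[[Context], Context]]


def run_pipeline(stages: List[Stage], context: Optional[Context] = None) -> Context:
    """Run each stage in order, passing the accumulated context along."""
    ctx: Context = dict(context or {})
    completed: List[str] = []
    for name, step in stages:
        ctx = step(ctx)
        completed.append(name)
    ctx["completed"] = completed
    return ctx


def ingest(ctx: Context) -> Context:
    # Hypothetical toy data: (feature, label) pairs where label = 2 * feature.
    return {**ctx, "raw": [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]}


def preprocess(ctx: Context) -> Context:
    # Hold out the last sample for evaluation.
    raw = ctx["raw"]
    return {**ctx, "train": raw[:3], "test": raw[3:]}


def train(ctx: Context) -> Context:
    # Least-squares slope for a line through the origin.
    num = sum(x * y for x, y in ctx["train"])
    den = sum(x * x for x, _ in ctx["train"])
    return {**ctx, "model": {"slope": num / den}}


def evaluate(ctx: Context) -> Context:
    # Mean absolute error on the held-out data.
    slope = ctx["model"]["slope"]
    errors = [abs(slope * x - y) for x, y in ctx["test"]]
    return {**ctx, "mae": sum(errors) / len(errors)}


def deploy(ctx: Context) -> Context:
    # Stand-in for deployment: only mark the model as deployed if it is accurate.
    return {**ctx, "deployed": ctx["mae"] < 0.1}


PIPELINE: List[Stage] = [
    ("ingest", ingest),
    ("preprocess", preprocess),
    ("train", train),
    ("evaluate", evaluate),
    ("deploy", deploy),
]

if __name__ == "__main__":
    result = run_pipeline(PIPELINE)
    print(result["completed"], result["model"], result["deployed"])
```

Because the stage order lives in one declarative list, rerunning the workflow or swapping a single stage does not require touching the others, which is the kind of reproducibility the description refers to.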

Weiqiang Zhuang and Huaxin Gao showcase a couple of pipeline examples built with Kubeflow Pipelines, using the IBM Watson Machine Learning service for data processing, model training, and serving. Once you understand how AI pipelines can help, Weiqiang and Huaxin compare several open source projects that provide pipeline capabilities, including Argo, Airflow, MLflow, and Pachyderm, and explore the criteria for a good AI pipeline platform. You’ll also get a sneak peek at the project they’re working on right now.




Weiqiang Zhuang is a senior software engineer in IBM’s Open Source Data and AI Group focusing on building a cloud native pipeline solution for AI workflows. He was the tech lead of the BigR machine learning project built on top of Hadoop and has contributed to Apache Spark, MLflow, Kubeflow, Apache SystemML, and R4ML. He was also one of the core engineers for DB2’s process model component.


Huaxin Gao


Huaxin Gao is a software engineer in IBM’s Open Source Data and AI Group focusing on Apache Spark machine learning and deep learning. She’s an active code contributor to the Apache Spark project.