AI pipelines on container platform

This session will be presented in Chinese

14:50–15:30 Friday, June 21, 2019

Prerequisite Knowledge

  • A basic understanding of Docker, Kubernetes, Python, Jupyter Notebook, deep learning, TensorFlow, model training, model serving, and model deployment

What you'll learn

  • Discover how to build a pipeline platform that automates as many steps of the AI workflow as possible
  • Learn about several open source projects related to AI pipelines, in particular Kubeflow Pipelines, which is used for the demo

Description

Simplifying lifecycle management for machine learning (ML) and deep learning (DL) is critical to the success of AI applications. From data ingestion and preprocessing, through model training and evaluation, to model deployment and serving, each stage in the ML/DL workflow requires significant effort on an AI platform that data engineers manage and monitor. Kubernetes automates the deployment, scaling, and management of containerized applications. Building AI workflows as pipelines and deploying those pipelines on Kubernetes provides the same benefits and further improves reproducibility and collaboration across the AI workflow.

Weiqiang Zhuang and Huaxin Gao showcase a couple of pipeline examples built with Kubeflow Pipelines, using the IBM Watson Machine Learning service for data processing, model training, and serving. Once you understand how AI pipelines can help, Weiqiang and Huaxin compare several open source projects that provide pipeline capabilities, including Argo, Airflow, MLflow, and Pachyderm. They explore criteria for a good AI pipeline platform, and you’ll get a sneak peek at the project they’re working on.




Weiqiang Zhuang is a senior software engineer in IBM’s Open Source Data and AI Group, with a focus on building a cloud native pipeline solution for AI workflows. He was also the tech lead of the BigR machine learning project built on top of Hadoop. He has contributed code to Apache Spark, MLflow, Kubeflow, Apache SystemML, and R4ML. He was also one of the core engineers for DB2’s process model component.


Huaxin Gao


Huaxin Gao is a software engineer in IBM’s Open Source Data and AI Group, with a focus on Apache Spark machine learning and deep learning. She’s an active code contributor to the Apache Spark project.
