Simplifying lifecycle management for machine learning (ML) and deep learning (DL) is critical to the success of AI applications. From data ingestion and preprocessing to model training and evaluation to model deployment and serving, each stage in the ML/DL workflow places significant demands on the AI platform and on the data engineers who manage and monitor it. Kubernetes automates the deployment, scaling, and management of containerized applications. Building AI workflows as pipelines and deploying those pipelines on Kubernetes brings the same benefits to AI and further improves the reproducibility of, and collaboration on, AI workflows.
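To make the idea of a workflow expressed as a pipeline concrete, here is a minimal sketch in plain Python. It is not Kubeflow Pipelines code and not the speakers' implementation: the stage names are taken from the workflow described above, while `PIPELINE`, `run_pipeline`, and `make_step` are hypothetical names introduced only for illustration. Each stage declares the stages it depends on, and a runner executes them in dependency order, which is roughly the scheduling job a pipeline platform on Kubernetes performs with containers.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline definition: each stage maps to the set of
# stages that must finish before it can start.
PIPELINE = {
    "ingest": set(),
    "preprocess": {"ingest"},
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
    "serve": {"deploy"},
}


def run_pipeline(dag, steps):
    """Run every stage once, after all of its dependencies have finished."""
    results = {}
    for stage in TopologicalSorter(dag).static_order():
        # Hand each stage the outputs of the stages it depends on.
        upstream = {dep: results[dep] for dep in dag[stage]}
        results[stage] = steps[stage](upstream)
    return results


def make_step(name):
    # Placeholder step (an assumption for this sketch): it only records
    # what it consumed. On a real platform a step would run in a container.
    def step(upstream):
        return f"{name}({', '.join(sorted(upstream.values()))})"
    return step


STEPS = {name: make_step(name) for name in PIPELINE}
results = run_pipeline(PIPELINE, STEPS)
print(results["serve"])
```

Because the dependencies are declared rather than hard-coded into a script, the same definition can be rerun, shared, or extended with a new stage, which is where the reproducibility and collaboration benefits come from.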
Weiqiang Zhuang and Huaxin Gao showcase a couple of pipeline examples built with Kubeflow Pipelines, using the IBM Watson Machine Learning service for data processing, model training, and serving. Once you understand how AI pipelines can help, Weiqiang and Huaxin compare several open source projects that provide pipeline capabilities, including Argo, Airflow, MLflow, and Pachyderm, and explore the criteria for a good AI pipeline platform. You’ll also get a sneak peek at the project they’re working on right now.
Weiqiang Zhuang is a senior software engineer in IBM’s Open Source Data and AI Group focusing on building a cloud native pipeline solution for AI workflows. He was the tech lead of the BigR machine learning project built on top of Hadoop and has contributed to Apache Spark, MLflow, Kubeflow, Apache SystemML, and R4ML. He was also one of the core engineers for DB2’s process model component.
Huaxin Gao is a software engineer in IBM’s Open Source Data and AI Group focusing on Apache Spark machine learning and deep learning. She’s an active code contributor to the Apache Spark project.
©2019, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.