Presented By O’Reilly and Intel AI
Put AI to work
April 10-11, 2018: Training
April 11-13, 2018: Tutorials & Conference
Beijing, CN

Extending Spark NLP: Training your own deep-learned natural language understanding models

This will be presented in Chinese

David Talby (Pacific AI)
14:50-15:30 Thursday, April 12, 2018
Implementing AI, Models and Methods
Location: Grand Hall B
Level: Intermediate
Secondary topics: Natural Language and Speech Technologies
Average rating: 5.00 (1 rating)

Prerequisite Knowledge

Basic knowledge of Spark, Python, and deep learning

What you'll learn

Learn to train custom, domain-specific deep learning models for common NLP tasks on top of the Spark NLP library

Description

Natural language is highly varied and nuanced. An SMS message, an academic paper, a patent filing, and an online news article all use different grammar, jargon, and implied semantics. As a result, most real systems must train domain-specific models for the type of text they process and the type of inferences they must make. Different models are also required for understanding text in different human languages. Most state-of-the-art algorithms in natural language processing are based on deep learning: word embeddings, bidirectional LSTMs, and hybrid neural network architectures that work together to achieve high-accuracy results.

David Talby explains how to train custom word embeddings, named entity recognition, and question-answering models on the NLP library for Apache Spark, which provides distributed implementations of these tasks as a native extension of Spark ML, taking advantage of Spark’s runtime performance optimization at scale.
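
As a rough illustration of what "a native extension of Spark ML" implies, the sketch below mimics the estimator/transformer pipeline pattern in plain Python: an estimator is fitted on labeled, domain-specific text and yields a model that annotates new text. Every class here (Tokenizer, LookupNerEstimator, LookupNerModel, Pipeline) and the tiny medical data set are hypothetical stand-ins written for this example. They are not Spark NLP or Spark ML classes, and the lookup-based tagger is a deliberately simple substitute for the deep-learned models the talk covers.

```python
from collections import Counter, defaultdict


class Tokenizer:
    """Transformer stage: splits raw text into lowercase tokens."""

    def transform(self, rows):
        return [dict(row, tokens=row["text"].lower().split()) for row in rows]


class LookupNerModel:
    """Fitted stage: tags each token with the label learned for it."""

    def __init__(self, token_to_label):
        self.token_to_label = token_to_label

    def transform(self, rows):
        return [
            dict(row, ner=[self.token_to_label.get(t, "O") for t in row["tokens"]])
            for row in rows
        ]


class LookupNerEstimator:
    """Estimator stage: learns the most frequent label seen for each token."""

    def fit(self, rows):
        counts = defaultdict(Counter)
        for row in rows:
            for token, label in zip(row["tokens"], row["labels"]):
                counts[token][label] += 1
        return LookupNerModel(
            {tok: c.most_common(1)[0][0] for tok, c in counts.items()}
        )


class Pipeline:
    """Runs stages in order; fitting replaces each estimator with its model."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)
            rows = stage.transform(rows)
            fitted.append(stage)
        return Pipeline(fitted)

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


# Hypothetical domain-specific training data (one label per token).
train = [
    {"text": "Aspirin reduces fever", "labels": ["DRUG", "O", "SYMPTOM"]},
    {"text": "Ibuprofen treats fever", "labels": ["DRUG", "O", "SYMPTOM"]},
]

model = Pipeline([Tokenizer(), LookupNerEstimator()]).fit(train)
result = model.transform([{"text": "Aspirin treats headache"}])
print(result[0]["ner"])  # prints ['DRUG', 'O', 'O']
```

Here "headache" is tagged "O" because it never appeared in the training data, which is the kind of gap that domain-specific training data and learned word embeddings are meant to close.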

This talk is intended to be an immediate follow-up to Introducing Spark NLP. David uses sample PySpark notebooks, which will be made publicly available after the talk.


David Talby

Pacific AI

David Talby is the chief technology officer at Pacific AI, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life science, and related fields. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, Agile, distributed teams. Previously, he was with Microsoft’s Bing Group, where he led business operations for Bing Shopping in the US and Europe. Earlier, he worked at Amazon both in Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a PhD in computer science and master’s degrees in both computer science and business administration.