Presented By O’Reilly and Intel AI
Put AI to work
April 10-11, 2018: Training
April 11-13, 2018: Tutorials & Conference
Beijing, CN

Introducing Spark NLP: State-of-the-art natural language processing at scale

This will be presented in English.

David Talby (Pacific AI)
14:00-14:40 Thursday, April 12, 2018
Secondary topics: Natural Language and Speech Technologies

Prerequisite Knowledge

  • Basic familiarity with Spark, Python, and machine learning

What you'll learn

  • Learn what the Spark NLP library is, how it is designed and why, and which use cases it enables

Description

Natural language processing is a key component in many data science systems that must understand or reason about text. Common use cases include question answering, paraphrasing or summarization, sentiment analysis, natural language BI, language modeling, and disambiguation. Building such systems usually requires combining three types of software libraries: NLP annotation frameworks, machine learning frameworks, and deep learning frameworks.

David Talby offers an overview of the NLP library for Apache Spark, which natively extends Spark ML’s pipeline APIs, enabling zero-copy, distributed, combined NLP and ML pipelines that leverage all of Spark’s built-in optimizations. The library implements core NLP algorithms, including lemmatization, part-of-speech tagging, dependency parsing, named-entity recognition, spell checking, and sentiment detection. David then demonstrates how to combine these algorithms into commonly used pipelines, working in PySpark notebooks that will be made publicly available after the talk.
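
To make the pipeline idea concrete, here is a minimal sketch in plain Python of the general pattern the description refers to: each stage reads one column and adds another, so NLP steps and downstream steps can be chained in a single pipeline. It is not the Spark NLP or Spark ML API. The names Stage and Pipeline, the column names, and the toy sentiment lexicon are all hypothetical, and nothing here is distributed or zero-copy.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Stage:
    """Hypothetical pipeline stage: reads input_col, writes output_col."""
    input_col: str
    output_col: str
    fn: Callable[[object], object]

    def transform(self, row: Row) -> Row:
        # Return a new row with one extra column; earlier columns are kept.
        return {**row, self.output_col: self.fn(row[self.input_col])}


class Pipeline:
    """Hypothetical pipeline: applies every stage, in order, to every row."""

    def __init__(self, stages: List[Stage]):
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        out = []
        for row in rows:
            for stage in self.stages:
                row = stage.transform(row)
            out.append(row)
        return out


# Toy lexicon, assumed purely for illustration.
POSITIVE = {"great", "good", "fast"}
NEGATIVE = {"slow", "bad", "broken"}


def tokenize(text: str) -> List[str]:
    # Lowercase the text and keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def sentiment(tokens: List[str]) -> str:
    # Count positive hits minus negative hits, then map the sign to a label.
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


pipeline = Pipeline([
    Stage("text", "tokens", tokenize),         # annotation step
    Stage("tokens", "sentiment", sentiment),   # downstream step on its output
])

rows = pipeline.transform([
    {"text": "Spark is fast and great"},
    {"text": "The build is broken"},
])
```

Because each stage only adds a column, the output of one annotator is directly available as input to the next stage, which is the property that lets annotation and modeling steps share one pipeline.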


David Talby

Pacific AI

David Talby is the chief technology officer at Pacific AI, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life science, and related fields. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, Agile, distributed teams. Previously, he was with Microsoft’s Bing Group, where he led business operations for Bing Shopping in the US and Europe, and worked at Amazon both in Seattle and the UK, where he built and ran distributed teams that helped scale Amazon’s financial systems. David holds a PhD in computer science and master’s degrees in both computer science and business administration.