Presented By O’Reilly and Intel AI
Put AI to work
April 10-11, 2018: Training
April 11-13, 2018: Tutorials & Conference
Beijing, CN

Practical considerations when shifting to using deep learning for your text data

This will be presented in English.

Emmanuel Ameisen (Stripe), Yan Kou (Insight Data Science)
13:10–13:50 Thursday, April 12, 2018
Secondary topics: Natural Language and Speech Technologies

What you'll learn

Learn how to quickly prototype ways to understand and leverage your text data

Description

Most companies collect and leverage text data for some part of their business operations. Some, such as Yelp and Twitter, have text data at the core of their platform, while most others use it behind the scenes to triage and respond to support requests and customer feedback. Top companies have achieved incredible performance by switching to deep learning methods for text analysis. Companies making this shift, though, typically encounter a set of challenges, including determining which models to spend their time and money on, how to validate and explain model performance, and how model complexity affects ease of deployment. Examples of such business challenges include:

  • How do you automatically make the distinction between different categories of sentences?
  • How can you find sentences in a dataset that are most similar to a given one?
  • How can you extract a rich and concise representation that can then be used for a range of other tasks?
  • Most importantly, how do you quickly find out whether these tasks are feasible on your dataset at all?

Drawing on research gathered from conversations with 75+ teams from Google, Facebook, Amazon, Twitter, Salesforce, Airbnb, Capital One, Bloomberg, and others, Emmanuel Ameisen and Yan Kou share a guide for moving your company from traditional machine learning approaches, such as logistic regression on bag-of-words features, to more expressive deep learning models, such as convolutional neural networks and recurrent neural networks. These new techniques allow companies to improve on many of the core algorithmic tasks that underlie a majority of key business operations, such as clustering (e.g., to identify topics in articles) and classification (e.g., to automatically forward support requests to the appropriate person). You’ll learn the trade-offs of different models in terms of power, complexity, and interpretability and understand how to choose the ones most appropriate for your projects.
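
As a rough illustration of the traditional end of that spectrum, the sketch below uses bag-of-words counts and cosine similarity to find the sentence in a small dataset that is most similar to a given one. It is only a minimal baseline under stated assumptions, not the speakers' method: the function names (bag_of_words, cosine_similarity, most_similar) and the sample support requests are hypothetical and invented for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercase the sentence and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, sentences):
    """Return the sentence whose bag-of-words vector is closest to the query."""
    query_vec = bag_of_words(query)
    return max(
        sentences,
        key=lambda s: cosine_similarity(query_vec, bag_of_words(s)),
    )


# Hypothetical support requests, invented for illustration only.
requests = [
    "I was charged twice for my order",
    "The app crashes when I open settings",
    "How do I reset my password",
]
print(most_similar("I forgot my password, how can I reset it", requests))
```

A baseline like this takes minutes to build and gives a quick read on whether a task is feasible on a dataset at all. Because it only counts shared words, it cannot tell that differently worded sentences mean the same thing, which is one motivation for the more expressive models named above.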

Emmanuel Ameisen


Emmanuel Ameisen, a machine learning engineer at Stripe, implemented and deployed predictive analytics and machine learning solutions for Local Motion and Zipcar. Recently, he led Insight Data Science’s AI program, directing more than a hundred machine learning projects. Emmanuel holds graduate degrees in artificial intelligence, computer engineering, and management from three of France’s top schools.

Yan Kou

Insight Data Science

Yan Kou is the director of product at Insight Data Science, which during her tenure launched the first-in-market professional education program on data science in healthcare in the US. Over the past two years, Yan has directed 80+ data science projects on topics including consumer genomics, electronic medical records, natural language processing, deep learning, medical images, and wearables. Yan’s team is an official partner of Y Combinator and has partnered with many leading healthcare organizations, including Massachusetts General Hospital, Optum, the Broad Institute, Flatiron Health, and Biogen. Yan has a background in human genomics and five years of experience in data science and machine learning. Her research on complex human diseases such as cancer and autism has resulted in more than 2,000 citations. Yan was named to Forbes’s 30 Under 30 list in 2013.