Undesired bias in machine learning has become a worrying topic due to the numerous high-profile incidents covered by the media. It is a challenging topic: one could even argue that the concept of societal bias is itself biased, since it depends on an individual’s (or group’s) perspective. In this talk we avoid reinventing the wheel; instead, we use traditional methods to simplify the issue so it can be tackled from a practical perspective.
We will cover the high-level definitions of bias in machine learning to remove ambiguity, and we will demystify the topic through a hands-on example. Our objective will be to automate the loan approval process for a company using machine learning. This will allow us to work through the challenge step by step, using key tools and techniques from the latest research to assess and mitigate undesired bias in our machine learning models.
We will begin by providing a high-level definition of undesired bias as two constituent parts: “a-priori societal bias” and “a-posteriori statistical bias”. We will provide tangible examples of how undesired bias is introduced in each part, and this initial section will introduce relevant research findings on the topic. Spoiler alert: we will take a pragmatic approach, showing how any non-trivial system will always have an inherent bias, so the objective is not to remove bias, but to make sure 1) you can get as close as possible to your objectives, and 2) your objectives are as close as possible to the “ideal solution”.
We introduce a pragmatic process to assess bias in machine learning models through three key steps: 1) data analysis, 2) inference result analysis, and 3) production metrics analysis. For each of these three steps we will walk through a real-life example, in which we are tasked with automating a loan approval process. We will show how bias may negatively affect our results, as well as how we can use various techniques to ensure we perform a reasonable analysis. Our objective is not to show how to completely remove bias from a machine learning model, but to show which tools and techniques are available, as well as the key touch-points and metrics that ensure the right domain experts are involved.
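As a rough illustration of the first step (data analysis), the sketch below checks class balance and per-group approval rates on a tiny loan dataset. The records, the group labels "A" and "B", and the helper names are all invented for illustration; they are assumptions, not material from the talk.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (group, approved) pairs, invented for illustration.
records = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 1),
    ("B", 0), ("B", 0), ("B", 1), ("B", 0),
]


def class_balance(rows):
    """Return the fraction of rows carrying each label."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def approval_rate_by_group(rows):
    """Return the fraction of approved rows within each group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, label in rows:
        totals[group] += 1
        approved[group] += label  # label is 1 for approved, 0 otherwise
    return {group: approved[group] / totals[group] for group in totals}


balance = class_balance(records)
rates = approval_rate_by_group(records)
print("class balance:", balance)
print("approval rate by group:", rates)
```

In this toy data the labels are balanced overall, yet the approval rate differs sharply between the two groups. A gap like this in historical data is the kind of finding that would be raised with a domain expert rather than resolved by the data scientist alone.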
We will cover fundamental topics in data science such as feature importance analysis, class imbalance assessment, model evaluation metrics, partial dependence, and feature correlation. More importantly, we will cover how these fundamentals can interact at different touch-points with the right domain experts to ensure undesired bias is identified and documented. All of this will be covered with a hands-on example in a practical Jupyter notebook.
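One way model evaluation metrics can surface undesired bias is by computing them per group rather than only in aggregate. The following is a minimal sketch under stated assumptions: the prediction triples, the group labels, and the `metrics_by_group` helper are hypothetical and are not taken from the talk's notebook.

```python
from collections import defaultdict

# Hypothetical (group, true_label, predicted_label) triples, invented for illustration.
predictions = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 1, 1), ("B", 0, 0), ("B", 0, 0),
]


def metrics_by_group(rows):
    """Compute precision and recall separately for each group."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, y_true, y_pred in rows:
        if y_true == 1 and y_pred == 1:
            counts[group]["tp"] += 1
        elif y_true == 0 and y_pred == 1:
            counts[group]["fp"] += 1
        elif y_true == 1 and y_pred == 0:
            counts[group]["fn"] += 1
        else:
            counts[group]["tn"] += 1

    result = {}
    for group, c in counts.items():
        predicted_pos = c["tp"] + c["fp"]
        actual_pos = c["tp"] + c["fn"]
        result[group] = {
            # None marks a metric that is undefined for this group.
            "precision": c["tp"] / predicted_pos if predicted_pos else None,
            "recall": c["tp"] / actual_pos if actual_pos else None,
        }
    return result


group_metrics = metrics_by_group(predictions)
for group, m in sorted(group_metrics.items()):
    print(group, m)
```

Here the hypothetical model misses half of the creditworthy applicants in group "B" while missing none in group "A". Whether such a gap is acceptable is a judgment for domain experts, which is why documenting it matters more than the number itself.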
Alejandro Saucedo is the Chief Scientist at The Institute for Ethical AI & Machine Learning. With over 10 years of software development experience, Alejandro has held technical leadership positions across hyper-growth scale-ups and tech giants including Eigen Technologies, Bloomberg LP and Hack Partners. Alejandro has a strong track record of building multiple departments of machine learning engineers from scratch and leading the delivery of numerous large-scale machine learning systems across the financial, insurance, legal, transport, manufacturing and construction sectors (in Europe, the US and Latin America).
©2019, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.