Fraud detection is one of the top priorities for companies like China Telecom. Beyond risks such as identity theft, account theft, money laundering, and merchant fraud, another type of fraud has become a challenge in recent years. Fraudsters use various techniques to register a large number of user accounts on the platform and manipulate those accounts so that the company believes they belong to authentic users. Each year, around dates such as November 11, June 18, and May 25, companies like Bestpay, Alipay, and JD launch big promotions, releasing a large volume of coupons online to boost user activity and attract new users. The fraudsters then use their fake accounts to get hold of a huge volume of coupons and work with some merchants to cash those coupons through fake transactions. Those fake users are not the target users, and the coupons were never meant to be within their reach.
The company has a long history of fighting this kind of fraud. By analyzing and mining the data behind it, the company has shut down a large number of fraudsters’ accounts, and suspicious accounts are put on watch lists every day. The patterns found include high similarity in a certain section of the phone number digits across a group of users, the number of different IP addresses used by the same device ID, and the distribution of login times. But as fraudulent behavior becomes more deceptive and complex, companies have to adapt quickly to new patterns of fraud to protect their assets.
China Telecom has over 200 million registered users and dozens of PB of data, comprising transaction logs, device information, account information, behavior logs, and more. This results in a highly sparse feature space of up to tens of billions of features, which makes traditional statistical analysis and most machine learning models hard to apply.
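One of the hand-found patterns mentioned above, the number of different IP addresses used by the same device ID, can be illustrated with a minimal sketch. The event format, the device and IP values, and the threshold of 3 are all assumptions made for illustration; the abstract does not say how the company computes or thresholds this signal.

```python
from collections import defaultdict

def distinct_ips_per_device(login_events):
    """Map each device ID to the set of distinct IP addresses it logged in from."""
    ips_by_device = defaultdict(set)
    for device_id, ip_address in login_events:
        ips_by_device[device_id].add(ip_address)
    return ips_by_device

def flag_devices(login_events, max_distinct_ips=3):
    """Return, sorted, the device IDs seen on more than max_distinct_ips addresses."""
    ips_by_device = distinct_ips_per_device(login_events)
    return sorted(
        device for device, ips in ips_by_device.items()
        if len(ips) > max_distinct_ips
    )

# Hypothetical login log of (device_id, ip_address) pairs, not real data.
events = [
    ("dev-A", "10.0.0.1"), ("dev-A", "10.0.0.2"), ("dev-A", "10.0.0.3"),
    ("dev-A", "10.0.0.4"), ("dev-B", "10.0.1.1"), ("dev-B", "10.0.1.1"),
]
suspicious = flag_devices(events, max_distinct_ips=3)
```

Rules of this kind are easy to read but have to be rediscovered for each new fraud pattern, which is the limitation that motivates the learned representation described next.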
The company found that an adversarial autoencoder (AAE) can be used for nonlinear representation learning on high-dimensional data.
Weisheng Xie dives deep into how the company trained an AAE model (three hidden layers for the encoder/decoder and discriminator) on the unlabeled data, extracted the latent vectors from the encoder, and evaluated the representation with t-SNE. The latent vectors were then fed to a Gaussian mixture model (GMM) to produce a number of clusters. The data within each cluster shows a strong latent connection. This method serves as an effective and efficient step: the AAE helps capture the intrinsic features of the complex data (i.e., a good representation) to model the risk factors, and the clustering narrows a huge volume of intricate data down to a limited number of groups.
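The clustering step can be sketched in isolation. The example below is not the company's implementation: it skips the AAE entirely and uses a handful of hypothetical 2-D points as stand-ins for encoder latent vectors, then fits a small diagonal-covariance GMM with expectation-maximization in plain Python. The covariance type, the number of components, and the initial means are all assumptions; in practice a library implementation such as scikit-learn's `GaussianMixture` would be the more likely choice.

```python
import math

def responsibilities(points, means, variances, weights):
    """E-step: posterior probability of each component for each point."""
    result = []
    for x in points:
        log_p = []
        for mean, var, w in zip(means, variances, weights):
            lp = math.log(w)
            for xd, md, vd in zip(x, mean, var):
                lp -= 0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
            log_p.append(lp)
        top = max(log_p)  # subtract the max before exponentiating, for stability
        unnorm = [math.exp(lp - top) for lp in log_p]
        total = sum(unnorm)
        result.append([u / total for u in unnorm])
    return result

def fit_diagonal_gmm(points, init_means, n_iter=50, min_var=1e-6):
    """Fit a diagonal-covariance GMM with EM; return component means and hard labels."""
    k, dim, n = len(init_means), len(points[0]), len(points)
    means = [list(m) for m in init_means]
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = responsibilities(points, means, variances, weights)
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)  # guard against empty components
            weights[j] = nj / n
            for d in range(dim):
                mu = sum(r[j] * x[d] for r, x in zip(resp, points)) / nj
                var = sum(r[j] * (x[d] - mu) ** 2 for r, x in zip(resp, points)) / nj
                means[j][d] = mu
                variances[j][d] = max(var, min_var)  # floor to avoid collapse
    resp = responsibilities(points, means, variances, weights)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, labels

# Hypothetical 2-D "latent vectors" standing in for AAE encoder outputs.
latent = [
    (0.1, -0.2), (-0.3, 0.2), (0.2, 0.1), (0.0, -0.1),
    (5.1, 4.9), (4.8, 5.2), (5.2, 5.1), (4.9, 4.8),
]
means, labels = fit_diagonal_gmm(latent, init_means=[latent[0], latent[4]])
```

Each account ends up with a cluster label, so later analysis can work on a limited number of groups rather than on individual high-dimensional records.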
In a quick test on data gathered from one of the promotions in 2018, 37 of those groups were examined. The company found 609 accounts that were already on its blacklist and another 3,100 accounts on its watch list and, more valuably, uncovered a previously unseen fraud pattern. One problem with AAE (and with most other deep learning networks) is that the results remain unexplainable, which means the company cannot use AAE directly to make decisions. The test nonetheless shows that AAE produces a good representation of high-dimensional data, which can be exploited to obtain a good clustering, analyze the data further, and discover previously unknown fraud patterns. The test results also show that it helped cut the false alarm rate by 6%.
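One way to examine clusters against existing lists, as in the test above, is to count how many members of each cluster are already blacklisted or watched and to review the clusters with the highest share first. The sketch below is an assumption about how such a cross-check could look; the account IDs, cluster labels, and ranking rule are hypothetical and are not taken from the talk.

```python
def rank_clusters_by_known_risk(cluster_of, blacklist, watch_list):
    """Count known-risk members per cluster and rank clusters by their share."""
    summary = {}
    for account, cluster in cluster_of.items():
        stats = summary.setdefault(
            cluster, {"size": 0, "blacklisted": 0, "watched": 0}
        )
        stats["size"] += 1
        stats["blacklisted"] += account in blacklist
        stats["watched"] += account in watch_list

    def known_risk_share(item):
        stats = item[1]
        return (stats["blacklisted"] + stats["watched"]) / stats["size"]

    # Clusters with the highest share of already-known risky accounts come first.
    return sorted(summary.items(), key=known_risk_share, reverse=True)

# Hypothetical accounts, cluster labels, and lists, not real data.
cluster_of = {"u1": 0, "u2": 0, "u3": 0, "u4": 1, "u5": 1, "u6": 2}
blacklist = {"u1", "u2"}
watch_list = {"u3", "u4"}
ranked = rank_clusters_by_known_risk(cluster_of, blacklist, watch_list)
```

In a cluster dominated by known-risk accounts, the members that are not yet on any list are natural candidates for manual review, since the clustering suggests they behave like the flagged ones.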
Vincent Xie (谢巍盛) is the chief scientist and director of China Telecom BestPay Co., Ltd. He built the company’s artificial intelligence group and leads the team in research related to big data and AI. Previously, he worked at Intel, leading an engineering team working on open source technologies related to machine learning and big data.
©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com