AI for Personal Default Prediction: Promises and Challenges Ahead

With Laurence Daures

Information technology has been shaping consumer credit risk for decades. The Fair Isaac Corporation (FICO) introduced FICO scores in 1989, marking a milestone in moving the credit evaluation process from humans towards algorithms. The FICO score is a credit score that takes into account five areas to determine creditworthiness: payment history, current level of indebtedness, types of credit used, length of credit history, and new credit accounts. Today, it is used in more than 90% of the credit decisions made in the U.S.

With the increasing digitalisation of society in the last decade, there has been an explosion both in the collection of personal data and in the sophistication of the algorithms and computing capacity to process all this information. This clearly holds the potential to fundamentally impact the process of evaluating individuals’ creditworthiness. In this piece we summarize some of the lessons that can be gleaned from the recent academic literature arising from the application of powerful artificial intelligence (AI) techniques to consumer credit risk.

How can AI help?

The general finding is that newer AI tools indeed live up to their promise of improving the technology of credit screening. Traditional credit scoring models, such as the one behind the FICO score, are based on hard information from individuals’ financial accounts. A relatively small number of predictive variables is then typically combined using rating scorecards or linear models such as logit regressions. Newer AI approaches go beyond such tools in at least two important aspects.

First, modern machine learning algorithms, such as tree-based methods or neural networks, allow for flexible non-linear relationships between predictors and individuals’ credit risk, while tackling in-sample overfitting through sophisticated regularization methods. The overall message from the literature is that machine learning techniques tend to outperform traditional linear models, especially within higher-risk groups. For instance, Walther et al. (2020), using a dataset of millions of US mortgages, showed that tree-based methods significantly outperform logit techniques. Albanesi and Vamossy (2019) compared several machine learning algorithms on data from the Experian credit bureau and found that their ensemble technique combining neural networks and gradient-boosted trees improves upon traditional models, with the improvement especially pronounced for consumers with low credit scores.
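The intuition can be illustrated with a small sketch (not the specification used in the papers above): on synthetic data where default risk depends on a non-linear interaction between predictors, a gradient-boosted tree model picks up the pattern that a logit regression on the raw variables misses.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 4))

# Hypothetical default process: risk is driven by an interaction
# (x0 * x1) and a threshold effect (x2 > 1) -- both non-linear.
score = 2.0 * X[:, 0] * X[:, 1] + (X[:, 2] > 1).astype(float) - 1.0
p_default = 1.0 / (1.0 + np.exp(-score))
y = rng.binomial(1, p_default)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Traditional linear benchmark: logit on the raw predictors.
logit_auc = roc_auc_score(
    y_te, LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
)
# Tree-based alternative: gradient boosting learns the interaction itself.
gbm_auc = roc_auc_score(
    y_te,
    GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1],
)
print(f"logit AUC: {logit_auc:.2f}, gradient boosting AUC: {gbm_auc:.2f}")
```

On data generated this way, the boosted trees achieve a markedly higher out-of-sample AUC than the logit, for the simple reason that no linear combination of the raw predictors captures the interaction term.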

Second, digitalisation and AI algorithms allow the use of new types of data. Using a dataset covering around 270,000 purchases at a German e-commerce company, Berg et al. (2020) analysed the predictive power of digital footprints (the information left behind by individuals while visiting a website, such as the device, the email address, the shopping hour, etc.). They found that the accuracy of a credit risk model built on digital footprint variables is comparable and complementary to that of traditional credit risk scores. An important implication of this result is that digital footprints can help in screening borrowers with little credit history. Another source of information that AI algorithms can exploit is unstructured user data in the form of text or images (for instance, information from social networks such as LinkedIn, Twitter or Facebook). Using data from Prosper, a crowdfunding platform, Netzer et al. (2019) find that supplementing financial and demographic information with the textual information submitted by prospective borrowers substantially improves default prediction. Using a similar dataset, Iyer et al. (2016) found that the market interest rate, which mirrors the information available to lenders on Prosper, is a more precise predictor of defaults than traditional credit scores.
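A minimal sketch of how such footprint variables could enter a scoring model (the column names below are hypothetical, loosely inspired by the variables in Berg et al., and the toy data is invented): categorical traces like device type or email domain are one-hot encoded and fed to an otherwise standard classifier.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy digital-footprint data -- entirely hypothetical values.
df = pd.DataFrame({
    "device":       ["mobile", "desktop", "mobile", "tablet", "desktop", "mobile"],
    "email_domain": ["gmail", "corporate", "hotmail", "gmail", "corporate", "hotmail"],
    "night_order":  [1, 0, 1, 0, 0, 1],  # order placed between midnight and 6am
    "defaulted":    [1, 0, 1, 0, 0, 1],
})

X = df[["device", "email_domain", "night_order"]]
y = df["defaulted"]

# One-hot encode the categorical footprint traces, then score with a logit.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    LogisticRegression(),
)
model.fit(X, y)
proba = model.predict_proba(X)[:, 1]  # predicted default probabilities
```

The point of the sketch is purely structural: footprint variables require no credit-bureau file at all, which is why they can extend screening to borrowers with thin credit histories.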

What are the implications of these new techniques on consumer welfare?

On a positive note, improvements in screening technology that are particularly pronounced among riskier groups and people with scant credit history (Berg et al., 2020) can decrease asymmetric information problems between borrowers and lenders and lead to increased access to credit. This feature can be especially useful in emerging countries with limited reach of the formal banking sector, and hence a lack of traditional credit information for most consumers.

A further potential benefit of turning credit decisions over to algorithms is that the human biases of loan officers – such as racism and in-group/out-group bias – can be short-circuited, leading to less discrimination. To examine this issue empirically, one first needs a precise definition of discrimination. Bartlett et al. (2019) suggested using the interpretation of US courts, whereby any differential treatment of minority groups not related to a ‘legitimate business necessity’ is deemed discriminatory. In the credit context, ‘legitimate business necessity’ essentially means variables that help in predicting default risk. Hence, to measure discrimination against minority groups, one would need to compare consumers from minority groups with peers from majority groups who have the same credit risk. Given that the credit risk of individuals is observed only imperfectly, this leads to an omitted variable problem. Bartlett et al. (2019) dealt with this issue using an identification strategy afforded by the pricing of mortgage credit risk by the government-sponsored entities (GSEs) Fannie Mae and Freddie Mac in the US. In particular, these GSEs use a predetermined grid that prices credit risk across loan-to-value and credit-score buckets. Given that the credit risk of conforming mortgages is insured by these GSEs, any access or price differences for borrowers within the same bucket are unrelated to creditworthiness, fail to qualify as ‘legitimate business necessities’, and therefore constitute discrimination. Using this empirical strategy, Bartlett et al. (2019) found that FinTech algorithms also discriminate, but 40% less than face-to-face lenders in pricing mortgages. What’s more, FinTechs do not discriminate in loan approval, while face-to-face lenders do.
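The logic of this within-bucket comparison can be sketched as follows (on simulated data; this is an illustration of the idea, not the authors’ actual estimation): group loans by the GSE pricing cell and compute the average rate gap between minority and majority borrowers inside each cell.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000

# Simulated loan-level data; bucket labels are illustrative.
loans = pd.DataFrame({
    "ltv_bucket":  rng.choice(["<=80", "80-90", ">90"], n),
    "fico_bucket": rng.choice(["620-679", "680-739", "740+"], n),
    "minority":    rng.integers(0, 2, n),          # 1 = minority borrower
    "rate":        rng.normal(4.0, 0.3, n),        # contract rate, in percent
})

# Within each GSE pricing cell (LTV bucket x credit-score bucket),
# the insured credit risk is the same for all borrowers, so any
# remaining rate gap between groups cannot reflect creditworthiness.
gaps = (
    loans
    .groupby(["ltv_bucket", "fico_bucket", "minority"])["rate"]
    .mean()
    .unstack("minority")
)
gaps["gap_bp"] = (gaps[1] - gaps[0]) * 100  # minority-majority gap, basis points
print(gaps)
```

In the simulated data above, rates are drawn independently of group membership, so the estimated gaps hover around zero; in Bartlett et al.’s actual data, the analogous within-bucket gaps are positive, which is precisely their measure of discrimination.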

New challenges of AI algorithms

First, more precise screening algorithms tend to lead to more inequality in the cost of credit among consumers. This increase in dispersion is particularly pronounced among minorities and riskier borrowers (see Walther et al., 2020). Shaping policies that improve the terms of credit for disadvantaged households in the presence of such improved screening technology is a crucial topic for the regulatory debate going forward.

Second, as Bartlett et al. (2019) pointed out, AI algorithms may also improve the screening of consumers along non-‘legitimate-business-necessity’ dimensions. In particular, if a lender uses such algorithms to maximize profits unrelated to screening for credit risk, and such profit-maximizing screening has a differential impact on protected minority groups, the company risks coming under the purview of anti-discrimination legislation even if there is no personal bias against minorities in the algorithm. Further, the black-box nature of most AI algorithms increases the risk of such scenarios, as their functioning may not be clear to the humans operating them. Hence, a key challenge is the development of non-discriminatory AI algorithms. The future of AI algorithms in credit decisions is bright, but their human operators must take care to understand how they work and to reduce the risk of inequitable decisions, so as to provide fair, accurate evaluations to all.

Further Readings

Albanesi, S., & Vamossy, D. F. (2019). Predicting consumer default: A deep learning approach (No. w26165). National Bureau of Economic Research.

Bartlett, R., Morse, A., Stanton, R., & Wallace, N. (2019). Consumer-lending discrimination in the FinTech era (No. w25943). National Bureau of Economic Research.

Berg, T., Burg, V., Gombović, A., & Puri, M. (2020). On the rise of fintechs: Credit scoring using digital footprints. The Review of Financial Studies, 33(7), 2845-2897.

Iyer, R., Khwaja, A. I., Luttmer, E. F., & Shue, K. (2016). Screening peers softly: Inferring the quality of small borrowers. Management Science, 62(6), 1554-1577.

Netzer, O., Lemaire, A., & Herzenstein, M. (2019). When words sweat: Identifying signals for loan default in the text of loan applications. Journal of Marketing Research, 56(6), 960-980.

Walther, A., Ramadorai, T., & Goldsmith-Pinkham, P. (2020). Predictably unequal? The effect of machine learning on credit markets. The Journal of Finance, forthcoming.