Building your HR analytics strategy: how to succeed and what to watch out for

The discussion surrounding the digital transformation in management has moved from big data to machine learning to artificial intelligence at an astounding speed. Yet the gap between promise and reality remains wide: 41% of CEOs report that they are not at all prepared to use new data analytics tools, and only 4% say they are “to a large extent” prepared (1). In their recent article, ESSEC management professor Valery Yakubovich and his colleagues Peter Cappelli and Prasanna Tambe from the Wharton School at the University of Pennsylvania identify four challenges in using artificial intelligence techniques in human resources management and propose practical responses to these challenges.   

All That Glitters Is Not Gold

AI’s increasing sophistication in predictive analytics makes it very interesting for human resources. It could be applied to a variety of HR practices, like recruitment and selection, training, and employee retention. But since HR involves dealing with people, the questions are important and nuanced and don’t have cut-and-dried answers. Further complicating matters is the fact that HR datasets tend to be much smaller than those in other domains, such as market research, and data science techniques perform poorly when predicting relatively rare outcomes. Firing someone for poor performance is one example of an outcome that happens relatively rarely in companies, but one that has important implications for individuals and society.

The problems that AI faces in its HR applications tend to fall into four main groups: the complexity of HR problems, small datasets, ethical and legal considerations, and employee reactions. We explain how these apply at each stage of the AI life cycle, from data generation, to machine learning, to decision-making, and include questions to ask yourself when designing an AI strategy for HR management. 

The life cycle of an AI project 

1. Generating Data

Gathering the data for your AI algorithm can be complicated. Take the seemingly straightforward question “What constitutes a good employee?”, which becomes less straightforward when you dig a little deeper. Job requirements can be broad and hard to specify for an algorithm. There is also the question of bias: an AI algorithm might be able to identify relationships between employee attributes and job performance, but if, for example, a company has historically hired and promoted white men, the algorithm could predict that white men will be the highest performers and inadvertently discriminate against other candidates, even if those candidates are highly qualified. Measuring performance can also present challenges: Who assesses performance? What is it based on? We work in an interconnected ecosystem, so performance is also affected by factors like our colleagues, job resources, and company culture. Ideally an algorithm would include multiple indicators of performance, but creating an aggregate variable to represent performance is difficult. Therefore, do not seek perfect measures, as they do not exist; instead, choose reasonable ones and stick with them.

There is also some selection bias in assessing employees, as often only those who were hired are included in the dataset. Most companies do not keep records of all of the data that they accumulate. To build a larger dataset, aggregate information from multiple sources and over time, including from candidates who are screened out.

Before launching a new digital HR project, determine the necessary and available data that can be extracted and transferred into a usable format at a reasonable cost. Sharing data across units must become a priority in the short term; to evaluate employees’ performance, you must incorporate the company’s business and financial data. Invest in data standardization and platform integration across your company in the long run.

Do you have enough data to build an algorithm? Small datasets are often sufficient for identifying causal relationships, which managers need to understand in order to act on insights. Therefore, the less data you have, the more theory you will need (drawing from management literature, expert knowledge, and managerial experience). Do not neglect randomized experiments as a way to test causal assumptions.
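To make the idea of a randomized experiment concrete, here is a minimal Python sketch. It is only an illustration: the function names, the eight employees, the training program, and the outcome scores are all hypothetical and do not come from the original research. The point is simply that assigning people to groups at random, rather than by managerial choice, lets you read the difference in average outcomes as an estimate of the program's effect.

```python
import random
from statistics import mean


def assign_groups(employee_ids, seed=0):
    """Randomly split employees into a treatment group and a control group."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(employee_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(outcomes, treatment, control):
    """Difference in mean outcome between treatment and control."""
    treated_mean = mean(outcomes[e] for e in treatment)
    control_mean = mean(outcomes[e] for e in control)
    return treated_mean - control_mean


# Hypothetical example: 8 employees, half of them offered a new training program.
treatment, control = assign_groups(range(8))

# Made-up outcome scores, constructed so that the program adds 0.5 points.
outcomes = {e: 3.0 + (0.5 if e in treatment else 0.0) for e in range(8)}

effect = estimated_effect(outcomes, treatment, control)
```

In a real project the outcomes would be measured after the program, and with groups this small the estimate would be very noisy; the sketch only shows the mechanics.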

If other companies are making their data available for machine learning, make sure that your context is similar enough that an algorithm built on data from elsewhere will be effective for your own organization. You can also use social media as an alternative source of data: some employers use it for hiring, others to identify problems such as harassment. HR stakeholders must also take privacy considerations into account and determine under what conditions employee data can be used.

2. Using Machine Learning 

Consider the example of using machine learning in the hiring process: we might look at which applicant characteristics have been linked to stronger performance in the past and use this to inform our hiring decisions. Using a machine learning algorithm might end up working better than conventional strategies, but it poses a problem of self-selection: the ability of the model to “keep learning” and adapt to new information disappears when the flow of new hires is constrained by the predictions of the current algorithm. To address this problem, it could be useful to periodically retrain the algorithm using data on the performance of candidates who do not fit its criteria. Another possible issue is that using algorithms in selection could reduce the range of the variables of interest, potentially masking true relationships. For example, if hiring managers base their decisions on university grades, they might then have a hard time finding a link between grades and performance, for the simple reason that there is little variability in employees’ grades and so the relationship is not as clear.
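The grades example can be illustrated with a small simulation. The sketch below rests on invented assumptions: 5,000 simulated applicants, standardized grades, and a performance score built as 0.5 times the grade plus random noise. None of these figures come from the article. It compares the grade-performance correlation among all applicants with the correlation among only the top 10% by grades, who stand in for the people actually hired.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the simulation is reproducible

# Assumption: grades are standardized and performance depends on them
# with a true correlation of about 0.5.
applicants = []
for _ in range(5000):
    grade = rng.gauss(0, 1)
    performance = 0.5 * grade + rng.gauss(0, math.sqrt(0.75))
    applicants.append((grade, performance))

# "Hire" only the top 10% of applicants by grade.
hired = sorted(applicants, key=lambda a: a[0], reverse=True)[:500]

r_all = pearson(*zip(*applicants))  # correlation across every applicant
r_hired = pearson(*zip(*hired))     # correlation among those hired only

print(f"all applicants: r = {r_all:.2f}")
print(f"hired only:     r = {r_hired:.2f}")
```

Under these assumptions the correlation among the hired group comes out noticeably weaker than among all applicants, even though the underlying relationship is the same for everyone. This is the masking effect described above.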

There are also potential ethical issues with the use of algorithms in HR decisions. For example, if we consider the difference between majority populations and minority populations, algorithms that maximize predictive success for the population as a whole may be less accurate in predicting success for the minority population. Generating separate algorithms for each group might lead to better outcomes, but also to conflicts with legal norms of disparate treatment. Thus, the effective implementation of machine-learning algorithms requires a review of labor laws.
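One simple way to check for this kind of gap is to compute accuracy separately for each group instead of relying on a single overall figure. The Python sketch below is a hedged illustration: the function name, the group labels, and the twelve records are invented for the example and are not real data.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Share of correct predictions per group.

    records: iterable of (group, predicted, actual) tuples.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        hits[group] += int(predicted == actual)
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical predictions (1 = predicted/actual high performer, 0 = not).
records = (
    [("majority", 1, 1)] * 4
    + [("majority", 0, 0)] * 3
    + [("majority", 1, 0)] * 1
    + [("minority", 1, 1)] * 1
    + [("minority", 0, 0)] * 1
    + [("minority", 0, 1)] * 2
)

per_group = accuracy_by_group(records)
overall = sum(int(p == a) for _, p, a in records) / len(records)
```

In this made-up data the overall accuracy is 75%, which hides the fact that the model is right 87.5% of the time for the majority group but only 50% of the time for the minority group.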

3. Decision-Making

When choosing between two candidates who are both qualified for the position, the hiring manager has to make a tough decision. Suppose an algorithm determines that one candidate is an 80% match for the position and the other one is a 90% match. Is a 10-point difference large or small, given the very likely measurement errors and biases? To mitigate some of these issues, we could introduce random variation, which has been an unrecognized but important mechanism in management. Contrary to popular belief, research shows that employees perceive random processes as fair in determining complex and thus uncertain outcomes. Therefore, if both candidates are strong, it makes more sense to make a random choice. In other words, randomization should be an AI-management tool.
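As a rough sketch of what randomization could look like in practice, the Python function below treats candidates whose match scores fall within a chosen margin of the best score as equally strong and picks among them at random. The function name, the candidate labels, and the margin values are assumptions made for illustration; how wide the margin should be depends on how much measurement error you believe the scores contain.

```python
import random


def choose_candidate(scores, margin, rng=None):
    """Pick at random among candidates within `margin` of the top score.

    scores: dict mapping candidate name to match score (0.0 to 1.0).
    margin: score gap below which candidates are treated as equivalent.
    rng: optional random.Random instance, useful for reproducible draws.
    """
    rng = rng or random.Random()
    best = max(scores.values())
    # Sort the shortlist so that a seeded draw is reproducible.
    shortlist = sorted(name for name, s in scores.items() if best - s <= margin)
    return rng.choice(shortlist)


# The two candidates from the example above, with hypothetical labels.
scores = {"Candidate A": 0.80, "Candidate B": 0.90}

# With a wide margin, both count as strong and either may be chosen.
pick_wide = choose_candidate(scores, margin=0.15, rng=random.Random(1))

# With a narrow margin, only the top scorer remains on the shortlist.
pick_narrow = choose_candidate(scores, margin=0.05)
```

With a narrow margin the function behaves like an ordinary ranking; the random draw only matters when the scores are too close to distinguish reliably.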

Employee buy-in is also a key part of the equation, as employees will be affected by changes in the decision-making process. How will they react to decisions made by an algorithm instead of a supervisor? Even if employees are not always committed to the organization, they might be committed to their manager. Consider the following example. If your supervisor assigns you to work on the weekend, you might do it without complaining if you think your supervisor is generally fair. When the work schedule is generated by a program, you might react differently, as you don’t have a preexisting relationship with the algorithm. That being said, some decisions are easier to accept from an algorithm, especially when those decisions have negative consequences for us, such as increased prices, as the decision feels less personal.

So where do we go from here?

These are a few questions you should ask yourself before using AI technologies in HR management. In sum, remember:

1.   Causal explanations are essential for analytics and decision-making in HR because they help ensure fairness, can be understood by stakeholders, and are ethically and morally defensible.

2.   Companies have to accept HR algorithms’ relatively low predictive power.

3.   Randomization can help with establishing causality and partially compensate for algorithms’ low predictive power.

4.   Formalizing algorithm development processes and involving stakeholders in the process will help employees form a consensus about the use of algorithms and accept their outcomes. 

Further reading:


Tambe, P., Cappelli, P., & Yakubovich, V. (2019). Artificial intelligence in human resources management: Challenges and a path forward. California Management Review, 61(4), 15-42.


Originally published on October 29th, 2018; updated in December 2020

