In the age of AI, do we still need researchers?

With ESSEC Knowledge Editor-in-chief

Do we still need researchers given the rise and reach of artificial intelligence? We know that machine learning can identify complex relationships in massive datasets that are not necessarily visible to the “naked” (human) eye. We also know that artificial intelligence will eventually be able to take over many human functions. Will research be one of them? Vivianna Fang He of ESSEC Business School and her colleagues Yash Raj Shrestha (ETH Zurich), Phanish Puranam (INSEAD), and Georg von Krogh (ETH Zurich) dive into this question in their recent research.

In a word: no, but there’s more to it than that. 

Until now, machine learning techniques have been widely used for coding data and making predictions, but not yet for the core task of a researcher: building theory. Why is this? It could be due to a scholarly distaste for so-called “predictions without explanations”. In fact, this is exactly where the opportunity lies, Prof. He and colleagues suggest. Machine learning could indeed do a better job than researchers in finding robust and complex patterns in the data.

Traditionally, organizational scientists propose a theory and/or model and then test it, usually using a relatively small dataset. With larger datasets, there’s more of a chance that the results will apply to a wider population rather than just the one used in the study, and thus be replicable, that is, hold true in situations other than the one at hand. Researchers can also study more variables when working with larger datasets, which is invaluable for constructing a more complete picture of the situation being studied. When building a theory from data using traditional statistical tools, researchers run the risk of overfitting: finding a pattern that is specific to the current sample. Machine learning algorithms have procedures that help avoid overfitting, meaning that the patterns they identify are more likely to be reproduced in other samples. This is a highly valuable property, as it could, for example, help address psychology’s current replication crisis by facilitating the development of robust theories and therefore replicable results.
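To make the idea of overfitting concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data are simulated, and the two "models" (a straight line and a deliberately overfit "memorizer" that copies the nearest training point) are not taken from the authors' work. It only shows why checking a pattern on data that was not used to find it is informative.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_memorizer(xs, ys):
    """Overfit on purpose: predict the outcome of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared prediction error of a model on a sample."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Simulated data (an assumption): the true relationship is y = 2x plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

# Keep half of the observations aside; they play the role of "another sample".
train_x, test_x = xs[:100], xs[100:]
train_y, test_y = ys[:100], ys[100:]

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

for name, model in [("line", line), ("memorizer", memorizer)]:
    print(name,
          "in-sample MSE:", round(mse(model, train_x, train_y), 3),
          "out-of-sample MSE:", round(mse(model, test_x, test_y), 3))
```

The memorizer reproduces the sample it was built on perfectly, yet that perfect fit does not carry over to the held-out half. Comparing in-sample and out-of-sample error in this way is one of the simplest safeguards against mistaking a sample-specific pattern for a general one.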

Another advantage of integrating machine learning techniques is that it can help manage researcher bias by making research processes and decisions transparent. Researchers are only human, after all, so it is possible that they’ll experience confirmation bias and look for results that support their predictions: in other words, that they’ll see what they want to see. When using machine learning algorithms, researchers can specify the level of complexity in the patterns detected and document these decisions. Together these procedures allow for a thoughtful approach to balancing predictive accuracy and interpretability. Higher predictive accuracy can mean that the patterns are too complex to understand, while higher interpretability can mean that the pattern is simpler and may leave out some influential factors. Being able to control this tradeoff is essential for interpreting the patterns in a way that makes sense to people and not just to machines. It also means that researchers can explain their rationale in a transparent way.

However, the machines can’t act alone: algorithms lack the intuition and common sense that humans have. While they can put the pieces of the puzzle together, it is up to us humans to explain why the pieces go together. Many critical parts of the theory building process will still be up to the researchers, such as defining what factors are of interest, selecting or developing ways to measure those factors, and explaining the relationships driving the observed patterns. The future of theorizing will require a synergy between algorithms and humans. 

Prof. He and her colleagues propose a four-stage procedure to explore this opportunity. The first stage is splitting the sample into two: one sample to use for machine learning-supported pattern detection, and a second sample to use for testing the hypotheses. In stage 2, the researchers program the algorithms, which then identify interpretable and reliable patterns. In stage 3, the researchers ask themselves if the patterns make sense, and come up with explanations for them. This stage is where human expertise and judgement are essential, as ML algorithms don’t have the capacity to do this. In stage 4, researchers test the hypotheses (that is, the theory) in the second sample to see if the pattern holds.
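The four stages can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: the data are simulated, the variable names (`size`, `age`, `dispute`) are hypothetical, the "algorithm" is a simple search over single-threshold rules, and the final check uses holdout accuracy as a stand-in for a formal hypothesis test.

```python
import random

rng = random.Random(42)

def make_row():
    """One simulated community with two hypothetical measures."""
    size = rng.uniform(0, 100)   # assumed variable that drives the outcome
    age = rng.uniform(0, 10)     # assumed variable unrelated to the outcome
    # Outcome follows size > 60, with roughly 10% of cases flipped as noise.
    dispute = (size > 60) != (rng.random() < 0.1)
    return {"size": size, "age": age, "dispute": dispute}

rows = [make_row() for _ in range(400)]

# Stage 1: split the sample into an exploration half and a holdout half.
rng.shuffle(rows)
explore, holdout = rows[:200], rows[200:]

# Stage 2: let an algorithm search for a simple, interpretable pattern.
# A rule is (variable, cutoff) and predicts a dispute when variable > cutoff.
def accuracy(rule, sample):
    var, cut = rule
    return sum((r[var] > cut) == r["dispute"] for r in sample) / len(sample)

candidates = [(var, r[var]) for var in ("size", "age") for r in explore]
best_rule = max(candidates, key=lambda rule: accuracy(rule, explore))

# Stage 3: a human decides whether the pattern makes sense and explains it.
# The code can only state the pattern; the "why" has to come from the researcher.
hypothesis = (f"Communities with {best_rule[0]} above {best_rule[1]:.1f} "
              "are more likely to experience disputes.")

# Stage 4: check whether the pattern holds in the untouched second sample.
holdout_accuracy = accuracy(best_rule, holdout)

print(hypothesis)
print("accuracy in exploration sample:", round(accuracy(best_rule, explore), 3))
print("accuracy in holdout sample:", round(holdout_accuracy, 3))
```

The key design choice is that the holdout half is never touched during stages 2 and 3, so the stage 4 result is an honest test rather than a restatement of what the algorithm already found.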

The authors have applied this method to study governance disputes in online communities (He, Puranam, Shrestha, & von Krogh, 2020), whereas other organizational scholars have used it to identify optimal revenue models for a wide range of App Store products (Tidhar & Eisenhardt, 2020) and to gauge whether or not an idea will take off (Dahlander, Fenger, Beretta, Kazami, & Frederiksen, 2020). Similar approaches are also being tried in the natural sciences. For example, Udrescu and Tegmark (2020), two physicists at MIT, used 100 equations to generate data and then fed that data to a neural network. Their algorithm was able to recover all 100 equations! This diverse set of studies shows that the approach can be applied to a wide variety of topics, making it useful for researchers across disciplines.

While this approach has extensive implications for theory-building, the authors note some caveats to consider before using it. First, machine learning assumes that the future can be predicted from the past, so it’s best to use machine learning algorithms when assessing relatively stable phenomena. Second, machine learning cannot replace randomization: its techniques are most suitable for coming up with predictions, rather than for testing a theory about the relationships between variables. Third, there is the risk that ML techniques could amplify biases present in the data, leading to biased conclusions; such biases can be hard to detect yet have significant ethical consequences. Researchers must therefore have a strong conceptual understanding of the techniques they’re using, which is no easy feat in such a rapidly advancing field.

In a nutshell, while machine learning cannot replace researchers, it can take over some of the functions that humans currently perform, like pattern recognition, rote memorization, and arithmetic. However, people are needed for tasks that require more intuition and creativity, like explaining patterns, book writing, and art.

So, do we still need researchers? Yes, and machine learning can be a powerful tool for producing more robust research.

For more information on using machine learning algorithms in theory-building, see their article in Organization Science (Shrestha, He, Puranam, & von Krogh, forthcoming).


Dahlander, L., Fenger, M., Beretta, M., Kazami, S., & Frederiksen, L. (2020). A Machine-Learning Approach to Creative Forecasting of Ideas. In Academy of Management Proceedings (Vol. 2020, No. 1, p. 17027). Briarcliff Manor, NY 10510: Academy of Management.

He, V. F., Puranam, P., Shrestha, Y. R., & von Krogh, G. (2020). Resolving governance disputes in communities: A study of software license decisions. Strategic Management Journal, 41. doi: 10.1002/smj.3181

Shrestha, Y. R., He, V. F., Puranam, P., & von Krogh, G. (Forthcoming). Algorithm supported induction for building theory: How can we use prediction models to theorize? Organization Science.

Tidhar, R., & Eisenhardt, K. M. (2020). Get rich or die trying… finding revenue model fit using machine learning and multiple cases. Strategic Management Journal, 41, 1245–1273. doi: 10.1002/smj.3142

Udrescu, S. M., & Tegmark, M. (2020). AI Feynman: A physics-inspired method for symbolic regression. Science Advances, 6(16), eaay2631. doi: 10.1126/sciadv.aay2631
