Podcast: The Human-AI Oversight Paradox

With the ESSEC Knowledge Editor-in-Chief

Julia Smith, Editor-in-Chief of ESSEC Knowledge: Hello everyone and welcome to Be in the Know, the ESSEC Knowledge podcast sharing the research and expertise of ESSEC professors. Today, I'm here with Charles Ayoubi, Assistant Professor of Management, who's here to talk about his research on AI and decision-making. So Charles, could you tell me a little bit about your research interests and what led to these interests? 

Charles Ayoubi, Assistant Professor of Management: Thank you, Julia. It's really a pleasure to be here, and I'm very happy to talk about my research. I'm actually a former ESSEC student, and I've always had a double interest in engineering and business, so I naturally became interested in innovation, which bridges the two: the business side on one hand and the engineering side on the other.

Innovation, the subject that has always fascinated me, can be thought of as a three-step process: 

  • Step 1, come up with ideas. That's the idea generation process. 

  • Step 2, once all these ideas are generated, we need to think about them, evaluate them, and assess which ones we're going to invest in, put resources in, and go forward with. That's the idea evaluation step. 

  • Step 3 happens once ideas are selected: the chosen ones move into the last step, diffusion, where they spread into the economy.

I believe every single one of these steps is really important, so my research has been looking at how AI affects each one of these steps.

Julia Smith: It's really interesting, especially since I think we're all using AI in some capacity in our work, so it's interesting to think about how that is actually impacting us. I saw that the title of one of your recent papers is Narrative AI and the Human-AI Oversight Paradox. What do you mean by this? 

Charles Ayoubi: That's a great question. As I said, there are three steps. We all have great ideas, right? I imagine everyone listening has had an idea for launching a new startup or something like it. Then the question is, are these ideas good or not? And so, the key moment in the innovation process is the evaluation step. 

It's the pivotal moment when we decide which ideas are worth investing in. Along with my co-authors, Jacqueline Lane (Harvard Business School); Leonard Boussioux, Ying Hao Chen, Camila Lin (all three of the University of Washington); Rebecca Spens and Pooja Wagh (MIT Solve); and Pei-Hsin Wang (University of Washington), we ran an experiment. This is how I typically conduct research, by running field experiments. We did what we call an RCT, a randomized controlled trial. It's a bit like what we do with drug trials: we give some people the real drug and others a placebo. We did something similar for the evaluation process. 

We had people evaluate many startup ideas. Some did it the traditional way, just by getting the full idea and reading it. In the other two treatments, they got either an AI recommendation on whether or not to pass the solution, or both an AI recommendation and a narrative. That's why it's called narrative AI: today, LLMs, or generative AI, produce text, so you can ask them to give you an explanation or a narrative. We call it a narrative rather than an explanation because it's not really the AI explaining itself: it's the AI giving a rationale for why it decided to say yes or no. 

So why is it a paradox? Because we looked at what happens when you give people a text that justifies the decision. Our first intuition was that people make better decisions when they have more information. That's the natural assumption, right? If you want to make a good decision, you need more information. So if the AI gives you this extra narrative, it should let you see the rationale behind the recommendation and help you make a better decision.

What we find is that something else is happening, and this is where the paradox lies. 

Julia Smith: So how did you go about exploring this paradox in this study? 

Charles Ayoubi: We ran a field experiment. When you want a field experiment to be relevant, you need to do it in real life, so to speak. We had a fantastic collaboration with our partner, MIT Solve. MIT Solve is a big entrepreneurship platform that receives hundreds of startup applications from all over the world every year. You can go online and look: they get hundreds and hundreds of startups from Benin, Bangladesh, Singapore, China, you name it. People from every single country in the world submit their solutions to be selected for the MIT Solve competition. 

MIT Solve receives all these solutions every year, and they need to decide which ones to take forward in the process, eventually selecting finalists and winners. They came to us saying that the evaluation is really hard: every year they get more and more ideas out of the idea generation process. How do we select the right ones, and can AI help us? So we said, wait a minute, we have a great setting to run an experiment in which some people do it the traditional way, which has been super hard.

Let's see what happens when you give people some support: either just an AI recommendation telling them, 'Well, this is what the AI thinks,' or an AI recommendation with this narrative. That's how we proceeded with this field experiment to see the impact of AI on evaluation. 

Julia Smith: And the million-dollar question, what did you find?

Charles Ayoubi: What did we find? That's what I was hinting at with the paradox. One of our hypotheses was that people would be stimulated by the narrative, that it would make them think harder, and that they would therefore make better decisions in this last condition. Let's say there's some good news and some bad news about AI and evaluation, and I'll start with the positive.

The good news: we had experts evaluate all the solutions to get a sense of how good each one is, so we could see how good the participants in our experiment were at selecting the solutions that end up being really good versus the ones that end up being not so good. What we find is that when you have AI recommendations, you do better. The solutions you pass usually end up getting higher grades from the experts, the ones you decide not to pass to the next round end up getting lower grades, and overall participants with AI do better than those without. So it sounds like AI is helping.

The paradox kicks in when you look at the difference between the narrative and what we call black-box recommendations. AI is not perfect: some of the solutions it recommended rejecting, that is, not moving forward, ended up being rated really highly by the experts, so it might miss out on some potentially very impactful solutions. Now look at those solutions, the ones AI should have passed but didn't. What we find is that with the narrative, people follow AI a bit too much, whereas when they only have the AI recommendation, they're very likely to override it. So in the black-box condition, the AI tells you, 'You should remove this solution,' although it's quite good.

When you only have the recommendation, you keep this kind of critical thinking from the human perspective, and you say, 'No, the AI is not right.' Eighty percent of the time, evaluators overrode AI in the black-box case. When the narrative comes in, this goes down to 40%. This shows that, when you have a narrative, AI can convince you to do something you shouldn't be doing, namely removing a solution that should have been passed. One of the big risks with this new kind of AI, LLMs, is their ability to generate text that is very persuasive. That's a big risk, and we need to think about what it means. It can convince us to do things that we don't necessarily want to do or shouldn't be doing. 

Julia Smith: Yes, it's a little bit alarming, isn't it? What do you think it means for the use of AI, especially for people using it in the workplace? 

Charles Ayoubi: I think it says many things. As I said, there's some good news. AI can help a lot. Our partners at MIT Solve said it was great for the participants to feel like they get a second opinion. But it's great when it stays as a second opinion. Think about it as if you're asking a colleague for advice. You're not always going to believe what your colleague is saying. You're just going to consider it. Or like when you go see a doctor, you sometimes like to get a second opinion.

The second thing that's really important is that we all need to be aware of how persuasive these algorithms are. I want to mention a research paper that came out recently, suggesting that AI is very good at convincing people out of conspiracy theories. Or… into them. It's very good at convincing you of a conspiracy theory, making you believe it's true, or at making you stop believing in one. It has this very strong persuasive power, which can be great, but we need to make sure it's used for good.

Julia Smith: Yeah, so good news and bad news, I suppose. What do you think are the key takeaways from this paper? And where do we go from here, in your research and in the workplace?

Charles Ayoubi: That's a great question. What I usually say when I present this paper is, 'We need to be very aware.' We're not saying everyone should be using AI, or that no one should. In research we have what we call a positive approach, as opposed to a normative approach: we're not telling people what to do, we're trying to understand what happens when they do it. And what we observe, in every single stage of work and society, as you're saying, and also in the academic world, where people review papers and evaluate research, is that people are using AI either to summarize the work, to give an opinion, or to do the work in their place.

What we need to be aware of is that AI has its own biases. It might pull us toward those biases and convince us to go its way. It's key to keep in mind that we need our human judgment at every single step where we use AI. 

Julia Smith: It's very reassuring to hear that AI is best used as a tool and won't replace us quite yet, and that we still have a critical role to play in the process. So thank you so much, Charles, for sharing your insights today. I look forward to reading more of your work. 

Charles Ayoubi: Thank you, Julia. It was a pleasure.

Further reading

Lane, Jacqueline N., Leonard Boussioux, Charles Ayoubi, Ying Hao Chen, Camila Lin, Rebecca Spens, Pooja Wagh, and Pei-Hsin Wang, "Narrative AI and the Human-AI Oversight Paradox in Evaluating Early-Stage Innovations" (August 2, 2024). Harvard Business School Technology & Operations Mgt. Unit Working Paper No. 25-001; ESSEC Business School Research Paper. Available at SSRN: https://ssrn.com/abstract=4914367
