Can AI really reduce bias in hiring? A closer look at a "writing with AI" study

There's a new paper doing the rounds with a headline that sounds almost too good: Writing with AI Can Reduce Gender Bias in Hiring Evaluations. It's from a team at the University of Chicago (Liu, Lee and Bai), and it was published at CHI 2026, one of the most respected venues in human-computer interaction. The premise is clever, the experiment is careful, and the results are real.

So why am I not reaching for the champagne?

Because when you read past the title, the study tells a more complicated story about AI bias in hiring than the headline suggests. It's a story worth telling properly, because it shapes where I think AI actually belongs in recruitment, and where it really doesn't.


What the study actually did…

The researchers ran a preregistered experiment with 672 participants. Everyone reviewed two résumés for a financial analyst role: one from "Jennifer" and one from "John." The résumés were matched, so any difference in how people judged them came down to the names and the words evaluators chose.

Here's the interesting part. As participants typed up their evaluations, an AI writing assistant offered short autocomplete suggestions, a bit like Gmail's Smart Compose. The suggestions came in three flavours for Jennifer:

  • Stereotypical, leaning on warmth: approachable, supportive, empathetic.

  • Counter-stereotypical, leaning on competence: confident, analytical, ambitious.

  • Neutral, as a control.

John's suggestions stayed neutral throughout.

The findings were consistent, and in line with what we’d (sadly) expect. When the assistant fed people competence-flavoured words for Jennifer, they rated her as more competent, picked her as a trusted leader more often, and, most strikingly, closed the salary gap. In the control condition Jennifer was offered about $64,132 against John's $65,116, a gap that was statistically significant. With counter-stereotypical suggestions, the gap shrank to roughly $64,446 versus $64,802 and stopped being significant.
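To make those salary figures concrete, here's a quick back-of-the-envelope calculation. It's a rough sketch using only the rounded numbers quoted above, and the variable and function names are my own. It shows the size of the raw gap in each condition and says nothing about statistical significance, which needs the paper's full data.

```python
# Average offers as quoted in this post (rounded, in US dollars).
offers = {
    "control": {"Jennifer": 64132, "John": 65116},
    "counter_stereotypical": {"Jennifer": 64446, "John": 64802},
}


def salary_gap(condition):
    """Return John's offer minus Jennifer's offer for one condition."""
    pair = offers[condition]
    return pair["John"] - pair["Jennifer"]


control_gap = salary_gap("control")
counter_gap = salary_gap("counter_stereotypical")

# How much of the control-condition gap disappeared, as a percentage.
shrink_pct = 100 * (control_gap - counter_gap) / control_gap

print(f"Control gap: ${control_gap}")
print(f"Counter-stereotypical gap: ${counter_gap}")
print(f"Gap shrank by about {shrink_pct:.0f}%")
```

On these rounded figures, the gap falls from $984 to $356, a drop of roughly 64%.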

Proof that words affect salary, which is what we’ve been saying all along. If you change the words people reach for, then you change the judgment that follows. The psychology behind it is well established: language changes how we see people.

Which findings matter most?

  1. The bias moved. In the counter-stereotypical condition, the same participants who now saw Jennifer as a capable leader also rated her as less warm, less friendly, and less enjoyable to work with. She was picked as the more pleasant colleague 126 times, down from 149 and 158 in the other conditions. The researchers are honest about this and call it what it is: a backlash pattern, the familiar penalty competent women pay for being seen as competent. So the intervention bought Jennifer respect and charged her likeability for it. The bias didn't go away. It changed shape.

  2. The actual hiring decision barely changed. Jennifer was hired 44.6% of the time with counter-stereotypical suggestions, 41.5% in the control, and 39.7% in the stereotypical condition. The direction is encouraging, but the differences weren't statistically significant. The salary number moved; the yes-or-no call did not. For anyone who cares about who gets the job, that's the result that matters most, and it's the one that didn't move.

  3. A lot of the evaluation wasn't really written by the evaluator. Participants accepted around 73-77% of the AI's suggestions and rarely edited them. By the authors' own measure, about 39-41% of the words in the final write-ups came straight from the assistant. So when the tool nudged someone toward calling Jennifer "analytical," and they kept it, whose assessment is that? The evaluator borrowed a verdict and signed their name to it.
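For a sense of scale on the hiring decision, here's a small sketch of the raw differences in hire rates between conditions. It uses only the percentages quoted in this post, and the names are my own. Bear in mind that the paper's own tests found these differences were not statistically significant, which raw subtraction can't tell you.

```python
# Share of participants who chose to hire Jennifer, as quoted in this post (%).
hire_rate = {
    "counter_stereotypical": 44.6,
    "control": 41.5,
    "stereotypical": 39.7,
}


def point_difference(a, b):
    """Percentage-point difference between two conditions, to one decimal."""
    return round(hire_rate[a] - hire_rate[b], 1)


vs_control = point_difference("counter_stereotypical", "control")
vs_stereotypical = point_difference("counter_stereotypical", "stereotypical")

print(f"Counter-stereotypical vs control: {vs_control} points")
print(f"Counter-stereotypical vs stereotypical: {vs_stereotypical} points")
```

That's a difference of 3.1 points against the control and 4.9 points against the stereotypical condition: the right direction, but small enough that the study couldn't separate it from noise.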


Why this matters beyond one experiment…

None of this makes the study bad. The authors flag the limits clearly: the participants were online crowdworkers, not professional recruiters. It was one job, one pair of names, binary gender, a single AI system, and a lab setting. The effects were, in their words, at least temporary. Nobody has shown they survive contact with a real hiring process or last beyond the session.

The authors also raise the ethical question that sits underneath the whole thing, and I'm glad they did. An autocomplete that quietly steers the words you use is a nudge. A helpful one, maybe, if it pushes you toward fairness you already believe in. But it's still a system shaping your judgment at the exact moment you're making it, often without you clocking which moments. They compare it to Grammarly's tone suggestions, where you opt in knowing your expression will be shaped. That's a fair comparison. It's also a reason to be careful, because hiring is not the same as polishing an email.

The intervention works by managing the human evaluator at the point of decision. It treats the recruiter or hiring manager as the thing to be corrected, in real time, by software whispering better adjectives. Even when that nudge points the right way, it's a strange place to put your trust. You've made someone’s typing less biased while the AI is switched on, but you haven’t actually taught them anything.


Where I think AI actually belongs in recruitment

I run a business built on the belief that AI has a real role in fairer hiring, so I'm not coming at this as a sceptic. My view is about where the technology earns its place, and where it oversteps.

AI is good upstream, before a single candidate applies. It's much weaker, and much riskier, the closer it gets to judging actual people.

A job description is a document. It has no feelings to bruise, no career to derail. If software flags that "rockstar ninja who'll crush it" is quietly telling women not to bother, or that a wall of "essential" requirements is screening out brilliant people who'd thrive, you can fix the text and lose nothing. The stakes are low and the upside is wide, because a better-written advert reaches a wider, stronger pool of applicants from the start.

A candidate evaluation is the opposite. It's a judgment about a person, with a real consequence attached. Hand that to AI, whether it's scoring résumés or feeding evaluators their verdicts mid-sentence, and you've put a machine where human accountability needs to be. The Chicago study is a neat demonstration of why. Even a well-intentioned nudge at the evaluation stage produced backlash, left the hire decision unchanged, and blurred the line over who actually made the call.

That's the line I'd draw. Use AI to improve the inputs to a fair process. Keep humans firmly in charge of the judgments.


The job for AI is the job description

This is why we built JobFair. It works on the part of hiring where AI helps and does no harm: the job advert itself.

JobFair reads your job descriptions and flags the language and hidden barriers that quietly push good candidates away, from gendered wording to the unnecessary "requirements" that deter strong applicants who'd otherwise apply. The recommendations are grounded in published research, not vibes. Then it hands you clear, specific changes you can accept, reject, or rewrite. You stay in control of every word.

JobFair doesn't score candidates. It doesn't rank your shortlist. It doesn't sit over a recruiter's shoulder feeding them adjectives while they decide someone's future. It fixes the document, then gets out of the way.

That's our whole thing. Give recruiters and hiring managers sharper tools to do their own job better. Don't quietly replace their judgment with the model's.

The paper linked above is a valuable piece of research, and I'd encourage anyone serious about fair hiring to read it in full. But I read it as a careful caution, not a green light. Words shape how we see people, which is exactly why we should be thoughtful about who, or what, is choosing them.

Get the advert right, and you widen the pool before bias ever gets a look in. That's a job AI can do well, and it's the one I'd trust it with.
