Generative AI Can Help Performance Reviews, But Not the Way You Think

Performance reviews are a notoriously dysfunctional element of workplace management.

“The big problem with performance reviews is that they’re not really what they should be,” said Frank Cespedes, senior lecturer at Harvard Business School and author of "Sales Management That Works: How to Sell in a World That Never Stops Changing." “They're not individualized discussions and conversations with someone else about strengths, weaknesses, developmental needs and whatever else we know about human beings.”

Data backs up this dire picture. Only 26% of U.S. companies consider themselves to have effective performance management processes in place, according to consulting firm WTW’s 2022 Performance Reset Survey of 837 organizations. And only about half feel that managers in their companies are good at assessing or differentiating the performance of those they supervise.

With the rapid rise of AI-powered large language models (LLMs) like ChatGPT, people in workplaces across the country are wondering whether these powerful tools can help improve — or even take over — the performance review process.

The answer is that LLMs can help around the edges. But any benefit they can provide in this area is tempered by numerous drawbacks.

How Generative AI Can Help With Performance Reviews

Managing Time

Managers struggle to find time to write thoughtful performance reviews, particularly the time needed to track employees’ performance in real time over the course of months. An LLM can lighten this load. Throughout the year, a manager can periodically enter notes or feedback about a specific piece of work and ask the tool to condense it into a concise paragraph or bullet point. Collected over the review period, these blurbs can add up to a specific, comprehensive record of feedback that the manager can compile and interpret fairly quickly when review time arrives.

Generating Suggested Goals

Performance reviews often include setting goals for the coming period, and those goals are more useful when they are based on stated expectations, objectives and job functions. Managers can feed these elements into an LLM, along with an employee’s career aspirations and information about broader team and company goals. The LLM can use all of this to suggest performance goals suited to that employee and their position in the company, which the manager can then refine as needed.

Prompting Ideas and Language

One reason managers fall short on giving helpful performance reviews is difficulty phrasing constructive criticism. LLMs might help on this front by acting as a kind of sounding board for ideas and turns of phrase. A manager can enter unfiltered sentiments about a team member’s performance and prompt the tool to rephrase them in more professional or sensitive language. However, it’s important not to input employees’ names or personal information, and to stay highly alert to the possibility that the chatbot will introduce bias into its language (more on both issues below).

Planning for Employee Development

Performance reviews often include a discussion of how an employee can develop their skills and advance in their career. Managers who are short on time or ideas can ask an LLM to suggest appropriate development activities for specific team members. The more detailed the input about the person’s role, current skills, ambitions and performance shortfalls, the more useful the LLM’s suggestions are likely to be. Managers should then apply their own judgment to evaluate and edit this output.

How Generative AI Can Make Things Worse

Threatening Privacy and Security

The most prominent concern about LLMs in their current form is the security of the information that users input into the system. Managers should remember that a site like ChatGPT is simply that — a website. Such sites don’t have any particular privacy or security assurances, and as such they are not acceptable places to put private employee information.

Experts in AI suggest treating ChatGPT and similar tools like interns. Like an intern, an LLM should only have access to limited information of a non-sensitive nature. The risks of disregarding this potential security and privacy problem are unacceptably high.

Introducing Bias

Another major issue with LLMs is that they tend to introduce biased language into performance reviews. Large language models reflect the data they were trained on, and in many cases, including ChatGPT, that data is an unfiltered selection of available internet text.

Textio, a company whose advanced workplace language guidance tool helps managers and HR teams spot and correct bias in their performance feedback and recruiting, tested the ways in which ChatGPT introduces bias into performance reviews. It found widespread gender, racial and age bias in the chatbot's responses to generic prompts. For example, the system automatically uses she/her pronouns when asked to describe the work of a nurse, receptionist or kindergarten teacher. And when a worker is said to be “confident,” the LLM assumes the person is male three times more often than it assumes they are female.

The solution is not as easy as requesting that the tool police its own bias. Like the humans whose communication it was trained on, it interprets this instruction by simply becoming more formal. “When you ask it to remove bias it takes away contractions, it elevates language, it uses more passive verbs,” said Textio CEO Kieran Snyder. “It’s the same thing people do. People feel like they’re not as biased if they’re authoritative.”

Tools like Textio, Applied, Datapeople and TalVista all aim to help identify and remove bias, but there is no substitute for human judgment. “Managers should always apply a critical lens to AI-generated output and ground feedback in an incumbent’s job requirements and expectations to minimize bias,” said Grace Ewles, manager for HR Research & Advisory Services at McLean & Company.

Sowing Mistrust and Cynicism

Employees are already skeptical of the performance review process, since many have had negative experiences with it or at least haven’t gotten much out of it. Introducing LLM-produced reviews is likely to deepen that cynicism and mistrust unless the manager uses the LLM to make the review more thorough, thoughtful and helpful.

“In effect what many managers do is make the performance review a parade of platitudes, or even worse, they make it a sermon,” said Cespedes. “If ChatGPT simply accelerates that common bad practice, it’s a bad thing.”

As Ewles put it, “AI is not a replacement for human connection, empathy or supportive leadership.”

The Bottom Line

The bad news for managers who are struggling with performance reviews is that LLMs don’t provide a push-button solution. The good news is that tools like ChatGPT can help the process in various smaller ways, making it easier and faster to collect and generate feedback, work up employee goals and brainstorm professional development ideas.

LLMs are best used as enablers of the unique human work of giving constructive and helpful feedback to another person.

“Ultimately, we are talking about tools — admittedly, one that looks like a pretty darn powerful tool, but it's a tool, and its basic use is to free up time for things that humans do particularly well,” said Cespedes. “In this case, it's paying attention to their people and creating the right relationships and culture for that performance conversation. I’ve yet to see a machine that's able to do that."