Performance reviews are a notoriously dysfunctional element of workplace management.
“The big problem with performance reviews is that they’re not really what they should be,” said Frank Cespedes, senior lecturer at Harvard Business School and author of "Sales Management That Works: How to Sell in a World That Never Stops Changing." “They're not individualized discussions and conversations with someone else about strengths, weaknesses, developmental needs and whatever else we know about human beings.”
Data backs up this dire picture. Only 26% of U.S. companies consider themselves to have effective performance management processes in place, according to consulting firm WTW’s 2022 Performance Reset Survey of 837 organizations. And only about half feel that managers in their companies are good at assessing or differentiating the performance of those they supervise.
With the rapid rise of AI-powered large language models (LLMs) like ChatGPT, people in workplaces across the country are wondering whether these powerful tools can help improve — or even take over — the performance review process.
The answer is that LLMs can help around the edges. But any benefit they can provide in this area is tempered by numerous drawbacks.
How Generative AI Can Help With Performance Reviews
Managing Time
Managers struggle to find time to create thoughtful performance reviews, and in particular to track employees’ performance in real time over the course of months. To ease that burden, managers can periodically feed their observations or feedback into an LLM and request a concise paragraph or bullet point on each item. Collected throughout the year, these blurbs add up to a specific, comprehensive record of feedback that a manager can compile and interpret fairly quickly when it comes time for a review.
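For managers comfortable with a little scripting, the year-long collection habit could look something like the Python sketch below. It is only an illustration: the `Note` record, the `compile_notes` helper and the alias "emp-07" are hypothetical names, and the summaries are assumed to have been written or LLM-condensed beforehand.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Note:
    day: date
    employee_id: str  # hypothetical internal alias, not a real name
    text: str         # short summary, written or LLM-condensed earlier


def compile_notes(notes, employee_id):
    """Return one employee's notes as a dated, chronological bullet list."""
    own = sorted(
        (n for n in notes if n.employee_id == employee_id),
        key=lambda n: n.day,
    )
    return "\n".join(f"- {n.day.isoformat()}: {n.text}" for n in own)


notes = [
    Note(date(2023, 6, 2), "emp-07", "Led the client demo with little prep time."),
    Note(date(2023, 2, 14), "emp-07", "Missed the handoff deadline on the Q1 report."),
    Note(date(2023, 3, 1), "emp-12", "Mentored a new hire through onboarding."),
]
print(compile_notes(notes, "emp-07"))
```

Keeping the notes dated and in order makes it easier to spot patterns across the whole review period rather than only the last few weeks.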
Generating Suggested Goals
Performance reviews often include setting goals for the coming period, which is more helpful if these goals are based on stated expectations, objectives and job functions. Managers can feed these elements into an LLM, along with an employee’s career aspirations and information about overarching team and company goals. The LLM can use all this data to generate ideas for performance goals suited to that specific employee and their position in the company, which the manager can then tailor further as needed.
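As a rough sketch of what "feeding these elements" might look like, the Python snippet below assembles them into a single prompt. The function name `build_goal_prompt`, its parameters and the sample inputs are all assumptions made for illustration; the snippet only builds text and does not call any particular LLM service.

```python
def build_goal_prompt(role, expectations, aspirations, team_goals, n_goals=3):
    """Assemble role details into one prompt asking for suggested goals."""
    lines = [
        f"Suggest {n_goals} performance goals for the coming review period.",
        f"Role: {role}",
        "Stated expectations:",
        *[f"- {e}" for e in expectations],
        f"Career aspirations: {aspirations}",
        "Team and company goals:",
        *[f"- {g}" for g in team_goals],
        "Tie each goal to one expectation and one team or company goal.",
    ]
    return "\n".join(lines)


prompt = build_goal_prompt(
    role="Account manager",
    expectations=["Renew key accounts", "Respond to clients within one day"],
    aspirations="Move into team leadership",
    team_goals=["Grow renewal revenue"],
)
print(prompt)
```

The output is a starting point for the manager to paste into a chatbot and then edit, not a finished set of goals.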
Prompting Ideas and Language
One reason managers fall short on giving helpful performance reviews is difficulty phrasing constructive criticism. LLMs might be able to help on this front by acting as a kind of sounding board for ideas and turns of phrase. A manager can enter unfiltered sentiments about a certain team member’s performance and prompt the machine to rephrase them in more professional or sensitive language. However, it’s important not to input employees’ names or personal information, and to be highly alert to the possibility of the chatbot introducing bias into its language (more on both of those issues below).
Planning for Employee Development
Performance reviews often include a discussion of how an employee can develop their skills and progress in their careers. Managers who are short on time or ideas can ask an LLM to spit out appropriate development activities for specific team members. Detailed input about the person’s role, current skills, ambitions and performance shortfalls will prompt the most helpful ideas from the LLM. Managers can then use their judgment to analyze and edit this output.
How Generative AI Can Make Things Worse
Threatening Privacy and Security
The most prominent concern about LLMs in their current form is the security of the information that users input into the system. Managers should remember that a site like ChatGPT is simply that — a website. Such sites don’t have any particular privacy or security assurances, and as such they are not acceptable places to put private employee information.
Experts in AI suggest treating ChatGPT and similar tools like interns. Like an intern, an LLM should only have access to limited information of a non-sensitive nature. The risks of disregarding this potential security and privacy problem are unacceptably high.
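One way to act on the intern rule is to strip identifying details from notes before they go anywhere near a chatbot. The Python sketch below is a minimal illustration, assuming the manager supplies the list of names to remove; the `redact` function and its placeholders are hypothetical. Simple pattern matching like this will miss nicknames, project names and other identifying details, so it is a first pass and not a privacy guarantee.

```python
import re


def redact(text, names, placeholder="[EMPLOYEE]"):
    """Replace email addresses and the listed names with placeholders."""
    # Emails first, so a name inside an address is not half-replaced.
    text = re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "[EMAIL]", text)
    # Longest names first, so "Dana Lee" is handled before "Dana".
    for name in sorted(names, key=len, reverse=True):
        pattern = rf"\b{re.escape(name)}\b"
        text = re.sub(pattern, placeholder, text, flags=re.IGNORECASE)
    return text


raw = "Dana Lee missed the deadline; email dana.lee@example.com. Dana apologized."
print(redact(raw, ["Dana Lee", "Dana"]))
```

A human should still read the redacted text before sharing it, since context alone can identify someone.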
Introducing Bias
Another major issue with LLMs is that they tend to introduce biased language into performance reviews. Large language models reflect the data they were trained on, and in many cases, ChatGPT's included, they were trained on an unfiltered selection of available internet data.
Textio, a company whose advanced workplace language guidance tool helps managers and HR teams spot and correct bias in their performance feedback and recruiting, tested the ways in which ChatGPT introduces bias into performance reviews. It found widespread gender, racial and age bias in the chatbot's responses to generic prompts. For example, the system automatically uses she/her pronouns when asked to describe the work of a nurse, receptionist or kindergarten teacher. And when a worker is said to be “confident,” the LLM assumes the person is male three times more often than it assumes they are female.
The solution is not as easy as requesting that the tool police its own bias. Like the humans whose communication it was trained on, it interprets this instruction by simply becoming more formal. “When you ask it to remove bias it takes away contractions, it elevates language, it uses more passive verbs,” said Textio CEO Kieran Snyder. “It’s the same thing people do. People feel like they’re not as biased if they’re authoritative.”
Tools like Textio, Applied, Datapeople and TalVista all aim to help identify and remove bias, but there is no substitute for human judgment. “Managers should always apply a critical lens to AI-generated output and ground feedback in an incumbent’s job requirements and expectations to minimize bias,” said Grace Ewles, manager for HR Research & Advisory Services at McLean & Company.
Sowing Mistrust and Cynicism
Employees are already skeptical of the performance-review process, since many have had negative experiences with it, or at least didn't get a lot from it. Introducing LLM-produced reviews is likely to make them even more cynical and mistrustful of the process unless the manager is using the LLM to enhance the thoroughness, thoughtfulness and helpfulness of the review.
“In effect what many managers do is make the performance review a parade of platitudes, or even worse, they make it a sermon,” said Cespedes. “If ChatGPT simply accelerates that common bad practice, it’s a bad thing.”
As Ewles put it, “AI is not a replacement for human connection, empathy or supportive leadership.”
The Bottom Line
The bad news for managers who are struggling with performance reviews is that LLMs don’t provide a push-button solution. The good news is that tools like ChatGPT can help the process in various smaller ways, making it easier and faster to collect and generate feedback, work up employee goals and brainstorm professional development ideas.
LLMs are best used as enablers of the unique human work of giving constructive and helpful feedback to another person.
“Ultimately, we are talking about tools — admittedly, one that looks like a pretty darn powerful tool, but it's a tool, and its basic use is to free up time for things that humans do particularly well,” said Cespedes. “In this case, it's paying attention to their people and creating the right relationships and culture for that performance conversation. I’ve yet to see a machine that's able to do that.”