April 9, 2026

Why Performance Reviews Fail (And How to Actually Make Them Fair)

In this episode of The Science of Leading, Claire and Edwin unpack why so many performance reviews feel unfair, political, or just plain useless—and what leaders can do about it.

Drawing on OAD’s performance evaluation framework, they break down the most common pitfalls in team and individual reviews: recency bias and halo/horns effects, personality-based judgments disguised as "fit," confusing effort with real outcomes, and rating scales that mean something different in every department.

Claire surfaces the real questions managers are wrestling with—how do you keep reviews consistent, avoid bias, and still move fast?—while Edwin offers concrete, research-backed practices: anchoring reviews to shared team goals, insisting on specific evidence and work artifacts, using simple rating scales with behavioral anchors, and running calibration sessions that turn "manager opinion" into a defensible system.

If you’re an HR leader, people manager, or founder who wants performance reviews to drive clarity, accountability, and development instead of anxiety and politics, this episode gives you a practical roadmap for doing reviews fairly—and making them actually useful.

Chapter 1

Why So Many Performance Reviews Feel Wrong

Claire Monroe

Welcome back to The Science of Leading. I’m Claire, and I’m here with Edwin Carrington. Today we’re talking about something almost everyone quietly dreads at work… performance reviews.

Edwin Carrington

Yes, the annual ritual of anxiety.

Claire Monroe

Exactly. And what I wanna dig into is… why do so many reviews feel off? Like, you walk out thinking, “That didn’t really describe my work… but it still affects my future.”

Edwin Carrington

You’re describing the core problem. Reviews quietly define “what good looks like.” They tell people: this is what gets rewarded, this is what gets ignored. If that message is sloppy or inconsistent, people lose trust very quickly.

Claire Monroe

So even if leaders don’t say it out loud, the review is where the organization decides, “We care about outcomes” or “We care about optics.”

Edwin Carrington

That’s right. When reviews are vague, political, or based on memory, employees learn that performance doesn’t matter as much as being visible, or being liked. That crushes engagement. And your best people… they eventually leave.

Claire Monroe

Let’s name the biggest landmines. You talk a lot about recency bias. Can you explain that in plain terms?

Edwin Carrington

Recency bias is when the last few weeks become the whole story. A person can do solid work for nine months, then miss one deadline in the last month, and suddenly the review is, “You struggle with follow-through.” It’s not fair, but it’s very human.

Claire Monroe

And the opposite happens too, right? One great project in Q4 and everything else gets washed away.

Edwin Carrington

Exactly. That connects to the halo and horns effect. One strong trait, or one big mistake, colors every area. “You’re great with clients, so you must be great at time management.” Or, “You dropped the ball once, so you must be unreliable in general.”

Claire Monroe

Hmm. So instead of, “Your communication was rough in this one launch,” it becomes, “You’re bad at communication, full stop.”

Edwin Carrington

Right. And then we layer on personality judgments. Labels like “not a culture fit,” “too quiet,” “too intense.” None of those say anything about observable behavior. They’re just… manager feelings.

Claire Monroe

Why do you think so many leaders slip into that? Is it laziness, or just not having language for what they’re seeing?

Edwin Carrington

Often it’s a lack of structure. If you don’t have a framework, you default to personality. “I enjoy working with this person, therefore they’re strong.” Or, “They irritate me, therefore they’re weak.” We confuse effort with outcomes. “They stay late, they must be high performers.” Maybe. Or maybe they’re fixing avoidable mistakes.

Claire Monroe

So “works hard” becomes a stand‑in for “delivers results,” even if there’s no evidence. I’ve definitely heard, “She has such a great attitude,” used as a reason to rate someone highly, with zero mention of actual deliverables.

Edwin Carrington

Yes. Effort is only useful when it leads to something: better quality, faster delivery, fewer escalations, stronger collaboration. Otherwise we’re just rewarding busyness.

Claire Monroe

Okay, let’s flip this. If we’re not centering reviews on personality or vibes, what do we center them on?

Edwin Carrington

On shared team goals. Start with: what did this team exist to do in this timeframe? Delivery: did we meet the right deadlines? Quality: did we reduce rework and complaints? Collaboration: did handoffs go smoothly? Improvement: did we actually learn and fix recurring issues?

Claire Monroe

So instead of, “Ed is great to work with,” it becomes, “Ed improved our on‑time delivery by cleaning up handoffs between product and support.” Still human, but grounded.

Edwin Carrington

Exactly. First define the outcomes, then evaluate individuals and the team against those outcomes. That’s where tools like OAD can help as well—bringing in fast, validated personality data so you understand someone’s natural style. But that data is there to explain patterns and tailor development, not to replace evidence about results.

Claire Monroe

That’s an important distinction. Personality data should support a fair review, not become another excuse for bias. “Well, the assessment says you’re not a team player…” No thank you.

Edwin Carrington

Precisely. The goal is very simple: people should see a clear line between what they did, the impact it had, and what the organization values next cycle. When you get that wrong, you don’t just have a bad meeting—you damage trust and retention.

Claire Monroe

Alright, so in the next part, let’s get specific about how to turn those vague “vibes” into actual evidence in a review.

Chapter 2

Turning Vibes into Evidence

Claire Monroe

So Edwin, if I’m a manager listening to this, I might be thinking, “Okay, I get the problem… but what does a fair review actually look like on paper?”

Edwin Carrington

Let’s build it from the ground up. A useful review has a few core pieces. First: role and scope. What is this job actually responsible for, and who are the key stakeholders: customers, other teams, direct reports?

Claire Monroe

So you’re not rating someone on expectations you never wrote down.

Edwin Carrington

Exactly. Second: goals and outcomes for the timeframe. What was expected, what was delivered, and what changed. If priorities shifted three times, write that down—don’t quietly punish people for leadership thrash.

Claire Monroe

Then we get into competencies, right? Work ethic, time management, problem solving, communication, collaboration… those kinds of buckets.

Edwin Carrington

Yes. Keep the list small and relevant. Under each competency, you need evidence: specific examples, work artifacts like tickets or project docs, and any meaningful metrics. Finally, you apply a simple rating scale on top—a 3‑point or 5‑point scale with clear definitions.

Claire Monroe

Let’s talk about swapping those fuzzy phrases. Take “not a team player.” How do we rewrite that in a way that’s actually helpful?

Edwin Carrington

We translate it into behavior and impact. For example: “Did not share status updates on Project X, which led to missed handoffs and rework for the support team.” That’s observable. You can agree or disagree with the example, but at least you can see it.

Claire Monroe

And on the positive side, “great attitude” becomes something like, “When deadlines slipped in March, you stayed solution‑focused and helped coordinate a recovery plan that kept the launch within the quarter.”

Edwin Carrington

Exactly. There’s nothing wrong with being pleasant to work with. Just tie it to outcomes. Did their behavior improve delivery, quality, or collaboration?

Claire Monroe

What about artifacts? You mentioned tickets and docs. How much of that should show up in a review?

Edwin Carrington

Enough that someone outside the team could follow the story. “Here’s the project doc you led, here’s the customer feedback, here’s the before-and-after defect rate.” It doesn’t have to be a novel, but it can’t just be, “Trust me, they’re great.”

Claire Monroe

Okay, rating scales. People love to argue about 3‑point versus 5‑point. How do you think about that?

Edwin Carrington

If you need speed and you don’t have strong calibration habits, use three: Needs improvement, Meets expectations, Exceeds expectations. If you’re ready for more nuance, a 5‑point scale can work, but you must define what a 4 and a 5 look like in behavior. Otherwise, everyone becomes a 4.

Claire Monroe

And that’s where behavioral anchors come in, right? Like, for time management, a “3” might be “Consistently meets deadlines and communicates tradeoffs,” and a “5” might be “Improves team planning systems and reduces recurring delays.”

Edwin Carrington

Yes. Anchors turn the scale from a mood meter into a tool. Then you run calibration: managers compare ratings with evidence, first within a department, then across departments. You’re looking for patterns—one manager who rates everyone as “exceptional,” another who never gives above “meets expectations.”
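The pattern check Edwin describes for calibration can be sketched in a few lines. This is a minimal Python sketch, assuming ratings sit on a shared numeric scale and are available as (manager, rating) pairs. The function name `flag_rating_drift`, the sample names, and the 0.75 cutoff are illustrative assumptions, not part of OAD's framework.

```python
from collections import defaultdict
from statistics import mean


def flag_rating_drift(ratings, threshold=0.75):
    """Return managers whose average rating sits far from the overall average.

    ratings: list of (manager, rating) pairs on one shared numeric scale.
    threshold: hypothetical cutoff for "far"; tune it to your own scale.
    """
    by_manager = defaultdict(list)
    for manager, rating in ratings:
        by_manager[manager].append(rating)

    overall = mean([rating for _, rating in ratings])

    flagged = {}
    for manager, values in by_manager.items():
        gap = mean(values) - overall
        if abs(gap) > threshold:
            # Positive gap: rates higher than peers. Negative gap: rates lower.
            flagged[manager] = round(gap, 2)
    return flagged


# Made-up sample data on a 5-point scale.
sample = [
    ("Avery", 5), ("Avery", 5), ("Avery", 5),
    ("Blake", 3), ("Blake", 4), ("Blake", 3),
    ("Casey", 3), ("Casey", 2), ("Casey", 3),
]
print(flag_rating_drift(sample))
```

A flag here is only a prompt for the calibration conversation. The manager still has to bring the evidence behind each rating, and a team that really is stronger than its peers will also show a gap.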

Claire Monroe

Hmm… So calibration is basically, “Are we all using this scale the same way?”

Edwin Carrington

Exactly. And this is another place where behavioral data from something like the OAD Survey can be helpful. If you know a manager tends to avoid conflict, for example, you might expect them to inflate ratings. Or if a role needs a very specific behavioral profile, you can check whether the expectations you’re setting line up with how that person is wired—and tailor development instead of just slapping on a low score.

Claire Monroe

So you’re not using OAD to label people, you’re using it to ask better questions during calibration and to make coaching more precise.

Edwin Carrington

Exactly. You still come back to the same rule: every rating must connect to evidence—examples, artifacts, and impact on team goals. The structure isn’t there to make HR happy. It’s there to make the system fair and understandable.

Claire Monroe

Alright, next I wanna ask: once you’ve collected all this evidence and put in the ratings… how do you turn that into actual growth, not just a score?

Chapter 3

From Feedback to Actual Improvement

Claire Monroe

We’ve talked about what a good review looks like. Now let’s talk about the part that usually breaks down: turning that feedback into real improvement.

Edwin Carrington

Yes. Many companies stop at, “Here’s your rating, see you next year.” A review without a development plan is just a history lesson.

Claire Monroe

So what does a concrete development plan actually include? Because I see a lot of “Work on communication” with no specifics.

Edwin Carrington

You need three things. One: a single focus area, not a shopping list. Work ethic, time management, problem solving, or communication—pick one. Two: observable behaviors. “Provide weekly status updates,” “Summarize decisions after meetings,” “Flag blockers within 24 hours.” Three: checkpoints with dates.

Claire Monroe

So instead of “Get better at time management,” it becomes, “For the next 60 days, share weekly priorities, agree on deadlines, and hit those dates on two key projects.”

Edwin Carrington

Exactly. And you define what success looks like. “No last‑minute surprises for stakeholders on Projects A and B.” That makes the conversation in the next review cycle very clear: did it happen or not?
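The three parts of a development plan described above (one focus area, observable behaviors, dated checkpoints) map naturally onto a small record. The following is a rough Python sketch under stated assumptions: the `DevelopmentPlan` class, its field names, and the start date are hypothetical, and the 30-, 60-, and 90-day spacing follows the checklist given later in the episode.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DevelopmentPlan:
    focus_area: str                # a single focus area, not a shopping list
    behaviors: list[str]           # observable behaviors to practice
    success_criteria: str          # what "done" looks like at the next review
    start: date                    # hypothetical plan start date
    checkpoint_days: tuple[int, ...] = (30, 60, 90)

    def checkpoints(self) -> list[date]:
        """Return the calendar dates of each scheduled check-in."""
        return [self.start + timedelta(days=d) for d in self.checkpoint_days]


plan = DevelopmentPlan(
    focus_area="time management",
    behaviors=[
        "Share weekly priorities",
        "Agree on deadlines",
        "Hit agreed dates on two key projects",
    ],
    success_criteria="No last-minute surprises for stakeholders on Projects A and B",
    start=date(2026, 4, 13),
)

for checkpoint in plan.checkpoints():
    print(checkpoint.isoformat())
```

Keeping `focus_area` as a single string rather than a list is a deliberate choice in this sketch: it makes the "pick one" rule hard to ignore.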

Claire Monroe

Where do follow‑ups come in? Because if you only talk about this a year later, it’s already too late.

Edwin Carrington

You need a rhythm. Quarterly check‑ins to talk goals and blockers. A mid‑year review that looks more deeply at performance and the development plan. Then an annual review mainly for compensation and promotion, using all that prior evidence. That cadence reduces recency bias and almost eliminates “surprise” feedback.

Claire Monroe

And in between, you’re doing short, practical check‑ins. “How are the status updates going? Are we still seeing last‑minute rework, or has it dropped?”

Edwin Carrington

Exactly. And again, if you have something like OAD in the mix, you can make those plans more tailored. If you know someone is naturally low‑detail, for example, you might keep their time‑management plan very simple and build in visual tools, instead of just saying, “Be more organized.” You’re aligning development with how they’re actually wired.

Claire Monroe

So we’ve got structure, cadence, and behavioral insight all working together. Before we wrap, could we give managers a quick checklist for their next review cycle?

Edwin Carrington

Let’s do it. Here’s a simple one. One: Set the timeframe clearly, for example, “We’re reviewing from January 1st to March 31st,” and require evidence from inside that window. Two: Start with team goals. What were the top three outcomes this team owned across delivery, quality, collaboration, and improvement?

Claire Monroe

Three: For each person, define their role and scope in writing. What are they actually on the hook for, and who are their key stakeholders?

Edwin Carrington

Four: Use a small set of competencies: work ethic, time management, problem solving, communication, and collaboration. For each one, gather at least two specific examples and any relevant metrics or artifacts.

Claire Monroe

Five: Apply a simple rating scale—3‑point or 5‑point—with behavioral anchors. No rating without evidence. If you can’t back it up, you can’t score it.

Edwin Carrington

Six: Run calibration. Compare ratings across managers, look for drift, and adjust based on evidence, not hierarchy. Seven: Turn every constructive point into a development plan: one focus area, clear behaviors, and 30‑, 60‑, 90‑day checkpoints.

Claire Monroe

Eight: Lock in the follow‑up rhythm. Quarterly check‑ins, a mid‑year review, and an annual review that summarizes, instead of inventing a new story.

Edwin Carrington

And nine: Keep asking, “Would a reasonable person see this process as fair?” If not, adjust. Because fairness is what protects trust, engagement, and retention over time.
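Checklist items one, four, and five (evidence from inside the review window, at least two examples, no rating without evidence) can be combined into one simple check. Below is a minimal Python sketch of that idea. The `Evidence` class, the `rating_problems` function, the sample entries, and the year 2026 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Evidence:
    description: str   # specific example, metric, or artifact reference
    observed_on: date  # when the work happened


def rating_problems(rating, evidence, window_start, window_end,
                    scale_max=5, min_examples=2):
    """Return reasons a rating cannot stand yet; an empty list means it passes."""
    problems = []
    if not 1 <= rating <= scale_max:
        problems.append(f"rating {rating} is outside the 1-{scale_max} scale")

    # Only evidence dated inside the review window counts.
    in_window = [e for e in evidence
                 if window_start <= e.observed_on <= window_end]
    if len(in_window) < min_examples:
        problems.append(
            f"only {len(in_window)} example(s) inside the review window; "
            f"need at least {min_examples}"
        )
    return problems


# Hypothetical review window: January 1st to March 31st.
window = (date(2026, 1, 1), date(2026, 3, 31))
evidence = [
    Evidence("Led the Project X recovery plan", date(2026, 3, 10)),
    Evidence("Missed a handoff on an earlier launch", date(2025, 11, 2)),
]
print(rating_problems(4, evidence, *window))
```

In this sample, the second example falls outside the window, so the rating is held back until another in-window example is supplied. That is the same guard against recency bias and stale stories that the checklist asks managers to apply by hand.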

Claire Monroe

I love that. And if you wanna add a behavioral layer that doesn’t slow everything down, this is where something like the OAD Survey can be really powerful—fast data on how people naturally work, so your reviews and development plans are grounded in both behavior and results.

Edwin Carrington

Well put. When you combine clear goals, real evidence, structured ratings, and good behavioral insight, reviews stop being rituals and start being levers for performance and growth.

Claire Monroe

Alright, we’re gonna leave it there for today. Edwin, as always, thank you for the wisdom.

Edwin Carrington

Always a pleasure, Claire. And for those listening, remember: structure isn’t bureaucracy. It’s how you show people you’re paying attention.

Claire Monroe

If you want to see how OAD performs on your own roles and candidates, test OAD for free at OAD.ai. Thanks for listening to The Science of Leading. Edwin, talk to you next time.

Edwin Carrington

Looking forward to it. Goodbye everyone.