The Science of Leading

Beyond Gut Feel: Building Fair, Defensible Leadership Evaluation Systems

In this episode of The Science of Leading, Claire Monroe and Edwin Carrington break down how HR and People leaders can move beyond political, vibes-based leadership reviews and design a fair, defensible evaluation system that actually improves leadership performance.

Drawing on real-world best practices, they explore how to define a clear, role-specific leadership competency model, translate it into observable behaviors, and align it with your organization’s strategy and culture. They unpack how to combine tools like 360-degree feedback, validated assessments, performance reviews, and simulations without creating noise or burnout.

Claire and Edwin also discuss how to protect psychological safety and DEI, ensure anonymity and trust in feedback, and turn assessment data into concrete development plans, coaching priorities, and promotion decisions. If you’re an HR or People leader tired of biased talent conversations and fuzzy ratings, this episode will help you build a leadership evaluation system your executives can trust—and your leaders will actually use.

Chapter 1

Why Leadership Evaluation Fails (and What Fair Really Means)

Claire Monroe

Welcome back to The Science of Leading. I’m Claire, and today we’re talking about something that drives a lot of HR folks quietly up the wall: leadership evaluations.

Edwin Carrington

Mm-hm. The annual ritual where everyone pretends we’re assessing leadership… and instead we reward confidence, charisma, and good storytelling.

Claire Monroe

Exactly. I’ve sat in so many calibration meetings where the “strong leader” is just the person who talks a good game. Or the system is so complicated nobody even uses it. So Edwin, can we start with the biggest ways these evaluations fail?

Edwin Carrington

Let’s name three. First, gut-feel reviews. A manager thinks, “I like this person, they seem strong in meetings,” and that vibe turns into a rating. No structure, no shared standard. That’s charisma bias in action. Second, hero worship of results. “They hit their numbers, so they must be a great leader,” even if their team’s burned out and updating LinkedIn. Third, over-engineered frameworks. Fifteen competencies, six-point scales, a 40-page guide… and then in practice people ignore it and go right back to gut feel.

Claire Monroe

Yeah, the beautiful framework that lives in a slide deck, not in actual decisions. When you say “charisma bias,” what does that look like on the ground?

Edwin Carrington

It’s the leader who sounds brilliant in presentations, so we assume they’re strong at coaching, conflict resolution, all of it. But if you ask their direct reports, you hear, “We never get clear priorities,” or “They avoid tough conversations.” The show is great. The day-to-day leadership behaviors are not.

Claire Monroe

So let’s flip this. For an HR leader who’s tired of that pattern, how would you define a fair, defensible leadership evaluation system?

Edwin Carrington

I’d use four words. Role-specific, behavior-based, multi-method, and decision-linked. Role-specific means you’re clear about what “good” looks like for a team lead versus a senior VP. Behavior-based means you score what people actually do (how they run one-on-ones, how they make decisions) rather than labels like “is a visionary.” Multi-method means you never rely on one input. You combine 360 feedback, a validated survey (something practical like OAD), performance data, maybe a simulation. And decision-linked means you know exactly what this system is informing: who’s ready for promotion, who needs coaching, who’s a succession candidate, where you have leadership risk.

Claire Monroe

I like that you put decisions last, but also… it sounds like that’s where most people should start and don’t. They pick tools first and only later ask, “What are we going to do with all this data?”

Edwin Carrington

That’s right. Before you buy a tool or design a form, you should answer, very plainly: what decisions must this evaluation support in the next year or two? Promotion and succession are obvious. Development is another: what will guide coaching plans and training investments? And risk management: do we have leaders whose behaviors are damaging culture, increasing attrition, or creating ethical risk?

Claire Monroe

So if I’m building or fixing a system, I might literally write at the top of the doc, “This process informs: promotion, succession, development planning, and leadership risk calls.” And then sanity-check every question or tool against that list.

Edwin Carrington

Exactly. If a piece of data won’t change a decision, it’s probably noise. Fair and defensible does not mean exhaustive. It means you can explain, calmly, how you evaluated a leader, why you trust the data, and how it tied to an actual decision.

Claire Monroe

Okay, so we’ve named the failure modes and the north star. Next we’ll get into the mechanics: how you actually design that behavior-based, role-specific framework without turning it into a bureaucratic side quest.

Chapter 2

Designing a Behavior-Based, Role-Specific Evaluation Framework

Claire Monroe

Alright, Edwin, let’s get practical. I’m an HR leader. I want something my managers will actually use. Where do I start on this competency and behavior piece without drowning in complexity?

Edwin Carrington

Start small and start with your own reality. Three questions: What does this role really own? Where do leaders here typically fail? And what do your best leaders do differently? From that, build a short competency model by level. For example: team leads, mid-level leaders, senior leaders.

Claire Monroe

Can you walk through a simple version of that?

Edwin Carrington

Sure. Team leads: you might focus on coaching and feedback, basic prioritization, and managing team conflict. Mid-level leaders: cross-functional influence, managing managers, and decision quality across a broader scope. Senior leaders: strategic thinking, building other leaders, and culture shaping. That’s your skeleton. Then you translate each competency into observable behaviors.

Claire Monroe

This is where people tend to stay vague, right? “Great communicator,” “strategic,” all that.

Edwin Carrington

Yes, which is useless for scoring. Take “communication” for a team lead. Behavior examples: “Explains the top three priorities and why they matter,” “Checks understanding and adjusts when people look confused,” “Addresses conflict directly and respectfully instead of avoiding it.” Those are things a manager, a peer, or a direct report can actually observe.

Claire Monroe

And then we need some kind of rating scale that anchors those behaviors. How would you build that?

Edwin Carrington

That’s where behaviorally anchored rating scales—BARS—come in. You keep the scale simple, say one to five, and define what each point looks like. For example, for “clarity of priorities” for a team lead: One: “Rarely communicates priorities; team often unsure what matters.” Three: “Usually communicates priorities clearly; confusion gets corrected when raised.” Five: “Communicates priorities and trade-offs clearly and repeatedly; team stays aligned even when things change.” Now different raters can score more consistently, because they’re reacting to descriptions, not to their mood.
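
A minimal sketch of how the anchors quoted above could be stored and looked up in a 360 tool. It is an illustration, not something described in the episode: the names `CLARITY_OF_PRIORITIES` and `describe_rating` are hypothetical, and the handling of the unanchored points 2 and 4 is an assumption.

```python
# Anchors for "clarity of priorities" (team lead), copied from the three
# points quoted in the episode. Points 2 and 4 have no wording of their own.
CLARITY_OF_PRIORITIES = {
    1: "Rarely communicates priorities; team often unsure what matters.",
    3: "Usually communicates priorities clearly; confusion gets corrected when raised.",
    5: "Communicates priorities and trade-offs clearly and repeatedly; "
       "team stays aligned even when things change.",
}


def describe_rating(anchors, rating):
    """Return the anchor text for a rating, or name the anchors it sits between.

    Assumes the lowest and highest points of the scale are both anchored.
    """
    low, high = min(anchors), max(anchors)
    if not isinstance(rating, int) or not low <= rating <= high:
        raise ValueError(f"rating must be an integer from {low} to {high}")
    if rating in anchors:
        return anchors[rating]
    # Unanchored point: point the rater at the nearest anchors on either side.
    below = max(p for p in anchors if p < rating)
    above = min(p for p in anchors if p > rating)
    return f"Between anchor {below} and anchor {above}"
```

Showing the anchor text next to each scale point means raters react to the same descriptions, which is the consistency benefit described above.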

Claire Monroe

That already feels more fair. I can picture an HRBP putting those anchors right into the 360. Speaking of, how do you combine all these methods without making a noisy mess?

Edwin Carrington

You assign each method a job. Direct reports are your best source for day-to-day leadership behaviors: coaching, clarity, psychological safety. Peers see collaboration and influence. The manager sees scope and outcomes. A fast, validated survey like OAD adds a view into underlying traits and style—how this person is likely to behave under pressure. Simulations or behavioral interviews show how they think in real time. You don’t average everything blindly. You weight each source based on what it sees best.

Claire Monroe

Can you give an example of that weighting in practice without turning it into a math project?

Edwin Carrington

Sure. Imagine you’re assessing “conflict resolution” for a mid-level leader. You might treat direct report and peer 360 scores as primary data. The manager’s view is secondary. A simulation where they handle a tough stakeholder conversation is also primary. Their self-assessment and OAD style profile are supporting data—useful to explain patterns, but not the main evidence. If self and others disagree, that gap tells you about self-awareness.
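
The primary, secondary, and supporting split above can be sketched as a small weighted average. This is an illustration only: the episode gives no numeric weights, so the values in `ROLE_WEIGHTS`, the sample scores, and all names here are assumptions. The OAD style profile is left out because it is not a score on the same scale.

```python
# Assumed weights: primary sources count fully, secondary sources count half,
# supporting sources explain patterns but do not move the score.
ROLE_WEIGHTS = {"primary": 1.0, "secondary": 0.5, "supporting": 0.0}


def competency_score(inputs):
    """Weighted mean of scores; `inputs` maps source name -> (role, score)."""
    total_weight = sum(ROLE_WEIGHTS[role] for role, _ in inputs.values())
    if total_weight == 0:
        raise ValueError("at least one primary or secondary source is required")
    weighted_sum = sum(ROLE_WEIGHTS[role] * score for role, score in inputs.values())
    return weighted_sum / total_weight


# Hypothetical 1-to-5 scores for "conflict resolution" (mid-level leader).
conflict_resolution = {
    "direct_reports_360": ("primary", 3.0),
    "peers_360": ("primary", 3.5),
    "simulation": ("primary", 3.0),
    "manager": ("secondary", 4.0),
    "self_assessment": ("supporting", 4.5),
}

score = competency_score(conflict_resolution)
# A large positive gap suggests the leader rates themselves higher than others do.
self_gap = conflict_resolution["self_assessment"][1] - score
```

Keeping the self-assessment out of the score but reporting the gap separately mirrors the point that disagreement between self and others is a signal about self-awareness, not evidence about the competency itself.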

Claire Monroe

So the trick is: be explicit. “For this competency, here’s what we’re listening to most, here’s what we’re using as a cross-check.” That alone would be a big upgrade from the “everything counts equally… or maybe nothing does” approach.

Edwin Carrington

Exactly. And document it in plain language so any leader can understand it. The goal is a framework people actually trust and can use in real conversations—not a theoretical model that only lives with HR.

Chapter 3

From Data to Decisions: Turning Evaluation into Growth and Governance

Claire Monroe

So now we’ve run the 360, maybe an OAD survey, looked at performance data, maybe a simulation. I’ve got this thick packet on a leader. The question I always hear is, “Now what?” How do we turn that into growth instead of just an interesting PDF?

Edwin Carrington

You start by ruthlessly prioritizing. One or two development themes, not ten. For each theme, define a specific behavior change, and then a simple action plan with coaching support. For example, theme: “Avoids difficult conversations.” Behavior change: “Addresses performance issues within 72 hours, using a structured script.” Actions: weekly review of open issues, scheduled conversations, plus coaching on language.

Claire Monroe

So instead of, “Work on communication,” it’s, “In your weekly team meeting, state the top three priorities and ask each person to repeat what they heard.” That’s tangible.

Edwin Carrington

Exactly. Then you add measurement. Maybe a short pulse survey to direct reports: “My leader clearly communicates our top priorities” on a simple scale. You re-check after, say, three months. That’s how you connect evaluation to real behavior change.

Claire Monroe

What about trust? A lot of leaders get nervous when they hear “360” or “assessment.” How do we protect psychological safety while still being honest?

Edwin Carrington

You need clear rules. Anonymity thresholds for direct report feedback—no reporting a separate score if there are only one or two people. Very careful handling of comments to remove identifying details. And you’re transparent about boundaries: this data is primarily for development and succession, not as a surprise weapon in performance management. If you will use it in high-stakes decisions, say exactly how.
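
The anonymity threshold above can be sketched in a few lines. This is a hedged illustration: the cutoff of three raters follows from "only one or two people," while the function name `report_group_scores`, the group labels, and the choice to return `None` for suppressed groups are assumptions.

```python
from statistics import mean

# Groups with fewer raters than this never get a separate score.
MIN_RATERS = 3


def report_group_scores(scores_by_group, min_raters=MIN_RATERS):
    """Return the mean score per rater group, or None where the group is too small."""
    report = {}
    for group, scores in scores_by_group.items():
        if len(scores) >= min_raters:
            report[group] = round(mean(scores), 2)
        else:
            # Suppressed: one or two raters could be identified from a mean.
            report[group] = None
    return report


# Hypothetical feedback: four direct reports, two peers.
feedback = {
    "direct_reports": [4.0, 3.0, 5.0, 4.0],
    "peers": [3.0, 4.0],
}
report = report_group_scores(feedback)
```

In this example the direct-report group is reported and the two-person peer group is withheld; how to handle withheld groups (drop them or fold them into a combined category) is a policy choice the sketch leaves open.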

Claire Monroe

And there’s a DEI angle here too. If the process is vague, bias creeps in. If it’s behavior-based and multi-method, you have more to stand on when someone asks, “Why was this person promoted and not that one?”

Edwin Carrington

That’s right. Behavior anchors, multiple perspectives, and validated tools help reduce the impact of “people like me” bias. You can say, “Here’s the competency model, here’s the data from direct reports, peers, manager, and a survey like OAD, here’s how we interpreted it, and here’s the development plan.” It doesn’t make bias disappear, but it gives you a much fairer, more defensible system.

Claire Monroe

Last piece I want to hit is scaling. HR teams are busy. How do you make this something we can actually run every year, not just for a one-time pilot?

Edwin Carrington

Keep the governance light but real. Standard templates: one-page competency model per level, a common 360 questionnaire with behavior anchors, a simple summary report for leaders and for sponsors. A basic calibration process so ratings mean the same thing across functions. And a clear yearly cycle: when you collect data, when you run debriefs, when you review progress.

Claire Monroe

So almost like an operating rhythm: evaluate, debrief, coach, pulse-check, repeat.

Edwin Carrington

Exactly. And build feedback loops. Quarterly, ask: Are we learning the right things from this process? Are development plans actually changing behavior? Are there groups consistently under- or over-rated? Then refine. The system should get simpler and sharper over time, not heavier.

Claire Monroe

If someone listening is thinking, “Our current leadership reviews are mostly opinion and politics,” what’s a concrete next step they could take after this episode?

Edwin Carrington

Pilot a better way with a small group of leaders. Define a tight competency set, run a structured 360 with behavior anchors, add a quick, validated survey like OAD to get a style and trait perspective, and then build one-page development plans. Compare that to your usual process. If the conversations feel clearer and less political, you’re on the right track.

Claire Monroe

And if you want to experiment without a huge investment, you can actually test OAD for free at OAD.ai. Use it alongside your current leadership evaluations and see if it gives you cleaner, behavior-focused data you can defend in a promotion or succession discussion.

Edwin Carrington

Used that way, it’s not a magic answer. It’s one more reliable lens in a fairer system.

Claire Monroe

Edwin, thanks as always. This gives HR and People leaders a really practical path to move from “vibes-based” leadership reviews to something they can actually stand behind.

Edwin Carrington

My pleasure, Claire.

Claire Monroe

And thanks to all of you for listening. We’ll be back with more on designing systems that make leadership better, not just louder. For now, take care, and we’ll talk to you next time.