Why Leadership Training Shows Inconsistent Results
Organizations spend billions on leadership development annually. The implicit promise: train your leaders, and performance will follow.
The research tells a more complicated story. Leadership training can produce meaningful results. Meta-analyses show effect sizes of d = 0.60-0.63 for well-designed programs—a substantial impact by social science standards. Manager training reliably improves leader knowledge and confidence.
But here's the problem: most organizations never see those results. They invest in programs, collect positive feedback from participants, and watch nothing change.
The gap between what leadership training can do and what it typically does reveals a fundamental misunderstanding about how development actually works. Training is necessary but insufficient. Without the conditions that enable transfer, even excellent programs produce little lasting change.
What the Research Actually Shows
The evidence on leadership training effectiveness is more nuanced than either enthusiasts or skeptics suggest.
Meta-analyses consistently find that leadership training produces positive effects—when measured properly and when transfer conditions exist. The Lacerenza et al. (2017) meta-analysis of leadership training found overall effect sizes of d = 0.60-0.63 across 335 independent samples. That's meaningful impact.
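For readers unfamiliar with the metric: Cohen's d is the standardized mean difference, the gap between trained and untrained groups expressed in pooled standard deviation units. A minimal sketch, with made-up numbers chosen only to land near d = 0.6:

```python
import math

def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference between a trained and a control group."""
    # Pooled SD, weighted by each group's degrees of freedom.
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    )
    return (mean_t - mean_c) / pooled_sd

# Hypothetical post-program leadership ratings on a 5-point scale:
# trained leaders average 3.9, untrained 3.6, both groups SD 0.5.
print(round(cohens_d(3.9, 3.6, 0.5, 0.5, 100, 100), 2))  # -> 0.6
```

At d = 0.6, the average trained leader outscores roughly 73% of the untrained group, which is why social scientists treat it as a substantial effect.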
But the findings come with important caveats.
Effects vary dramatically by what you measure. Using Kirkpatrick's four-level framework:
Level 1 (Reaction): Did participants like the training? Effect size d = .63. Participants generally report positive experiences.
Level 2 (Learning): Did participants gain knowledge and skills? Effect size d = .73. Training reliably produces learning.
Level 3 (Behavior): Did participants change their on-the-job behavior? Effect size d = .82, the strongest effect of the four, but only when transfer conditions support application.
Level 4 (Results): Did training produce organizational outcomes? Effect size d = .72. Results depend entirely on whether behavior change occurred and was sustained.
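To make the four levels concrete, here is one way an evaluation plan might be laid out in practice. This is an illustrative sketch; the metric names and timings are hypothetical, not a standard:

```python
# Illustrative sketch of a four-level (Kirkpatrick) evaluation plan.
# Metric names and timings are hypothetical examples, not a standard.
evaluation_plan = {
    "level_1_reaction": {
        "question": "Did participants value the training?",
        "metrics": ["post-session satisfaction survey"],
        "when": "end of each session",
    },
    "level_2_learning": {
        "question": "Did they gain knowledge and skills?",
        "metrics": ["pre/post knowledge test", "skill demonstration"],
        "when": "start and end of the program",
    },
    "level_3_behavior": {
        "question": "Did on-the-job behavior change?",
        "metrics": ["360-degree feedback deltas", "manager observations"],
        "when": "30, 60, and 90 days after the program",
    },
    "level_4_results": {
        "question": "Did organizational outcomes improve?",
        "metrics": ["team engagement", "turnover", "performance KPIs"],
        "when": "6-18 months out, against a comparison group",
    },
}
```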
Here's the critical finding: most organizations evaluate only Levels 1-2. They stop at satisfaction and knowledge, never measuring whether behavior actually changed or whether results followed.
This creates a dangerous illusion. Positive feedback and post-test knowledge gains convince stakeholders that training "worked." Meanwhile, leaders return to environments that don't support new behaviors, nothing changes, and the cycle repeats.
The research is unambiguous: training without transfer climate support produces minimal lasting change. Yet organizations continue deploying programs as if knowledge acquisition automatically becomes behavior change.
The Transfer Climate Problem
Baldwin and Ford's (1988) foundational research on training transfer identified the key variables that determine whether learning becomes behavior. Three work-environment factors dominate:
Supervisor support. Do leaders' managers expect, reinforce, and model the trained behaviors? If a participant's boss doesn't support the new approach, it won't stick.
Peer encouragement. Do colleagues reinforce trained behaviors or undermine them? Social pressure in the work environment often overwhelms training content.
Practice opportunities. Do leaders have chances to apply new skills with feedback? Without repeated application in real situations, skills decay rapidly.
Most organizations address none of these factors. They send leaders to training, measure satisfaction, and expect transformation. The training itself might be excellent. The environment makes transfer impossible.
Consider a common scenario: A manager completes a program on coaching skills. They return to work Monday morning. Their inbox has 200 emails. Three direct reports need immediate decisions. A deadline looms. Their own manager asks why a project is behind schedule.
When, exactly, are they supposed to practice coaching? The organizational context—workload, expectations, time pressure—actively works against applying new skills.
“Training is an individual remedy deployed in systems where transfer is impossible. We keep sending people to programs without changing anything about the environment they return to.”
The Evaluation Gap
The way organizations evaluate training creates systematic blindness to whether it works.
Most training evaluation stops at Kirkpatrick Levels 1-2:
Level 1 surveys ask: "Did you find this valuable?" "Would you recommend it?" "Was the facilitator engaging?" Participants generally say yes—they got time away from work, the content seemed reasonable, and there's social pressure to be positive.
Level 2 assessments measure knowledge gained: pre-test vs. post-test comparisons, skill demonstrations during the program. Participants generally show improvement—they just learned the material.
Neither level tells you whether anything will change back at work.
Levels 3-4 require more effort:
Level 3 (Behavior) requires observation over time. Are leaders actually doing things differently? This means 30-day, 60-day, 90-day follow-ups. Manager observations. 360-degree feedback comparisons. Most organizations don't invest in this measurement.
Level 4 (Results) requires connecting leader behavior to outcomes. Did team engagement improve? Did turnover decrease? Did performance metrics shift? This requires tracking over 6-18 months and controlling for other variables.
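As a sketch of what Level 3 measurement might involve, assuming you already collect 360-degree ratings (the leaders, scores, and follow-up window here are all made up):

```python
# Hypothetical Level 3 check: did 360-degree coaching ratings move
# 90 days after training? Assumes paired pre/post ratings per leader
# on a 5-point scale; all data below is invented for illustration.
from statistics import mean, stdev

pre      = {"leader_a": 3.1, "leader_b": 3.4, "leader_c": 2.9, "leader_d": 3.6}
post_90d = {"leader_a": 3.8, "leader_b": 3.5, "leader_c": 3.4, "leader_d": 3.6}

deltas = [post_90d[k] - pre[k] for k in pre]
paired_d = mean(deltas) / stdev(deltas)  # within-person effect size

print(f"mean change: {mean(deltas):+.2f} points, paired d = {paired_d:.2f}")
```

Even this is only half the job: a credible Level 3-4 evaluation also needs a comparison group of untrained leaders, because ratings can drift for reasons that have nothing to do with the program.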
The result: organizations accumulate evidence that training is "well-received" while having no idea whether it produces results. Budget decisions get made based on participant satisfaction rather than organizational impact.
This isn't laziness—it's structural. L&D teams face pressure to demonstrate activity. Satisfaction scores are easy to collect and report. Behavior change measurement requires collaboration with operations, extended timelines, and methodological sophistication that most L&D functions don't have.
The incentives favor programs that feel good over programs that work.
The Trickle-Down Problem
Leadership training operates on an implicit assumption: improve leaders, and their teams will benefit. The research on this "trickle-down" effect is sobering.
Studies using multi-level sampling—measuring both leaders and their direct reports—show that leadership training reliably produces changes in leaders (proximal effects). Leaders report more confidence. They demonstrate new knowledge. They express intentions to behave differently.
But effects on subordinates (distal effects) are inconsistent. Some studies show improved team outcomes. Many show no change at the team level despite leader-level improvements.
Why the disconnect?
Timing matters. Trickle-down effects take time to emerge. Most studies don't follow up long enough: the sample-size-weighted mean follow-up is only 43 weeks, with high variability (a worked illustration of that weighted mean follows below). Effects that might emerge at 12-18 months go unmeasured.
Dosage matters. Brief interventions (a single workshop, a few hours of training) rarely produce lasting change. More intensive programs show better results, but organizations prefer shorter, cheaper options.
Context matters. Leader behavior change only produces team benefits if the organizational context supports it. A leader trying new approaches in a culture that punishes experimentation won't sustain the behavior—and their team won't benefit.
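To see why that 43-week figure skews short, here is the sample-size-weighted mean from the "Timing matters" point above, computed on invented study data:

```python
# Invented data: two large studies with short follow-ups outweigh
# one small study that tracked teams for a year and a half.
studies = [
    {"n": 400, "followup_weeks": 12},
    {"n": 350, "followup_weeks": 26},
    {"n": 60,  "followup_weeks": 78},
]

weighted_mean = sum(s["n"] * s["followup_weeks"] for s in studies) / sum(
    s["n"] for s in studies
)
print(f"weighted mean follow-up: {weighted_mean:.0f} weeks")  # ~23 weeks
```

Because big, short studies dominate the average, effects that need a year or more to surface are systematically under-sampled.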
What Actually Works
The research points to specific conditions that make leadership training effective:
Extended Duration with Spaced Practice
One-off workshops don't work. Effective programs extend over weeks or months with spaced sessions that allow practice between learning. The Lacerenza meta-analysis found that longer programs with multiple sessions produced significantly larger effects.
Manager Involvement and Accountability
When participants' managers are involved—setting expectations before training, debriefing after, observing application, providing feedback—transfer rates improve dramatically. Without manager involvement, training competes against daily pressures and loses.
Behavioral Practice with Feedback
Knowledge acquisition is easy. Behavior change is hard. Programs that include role-play, simulation, real-situation application with coaching, and behavioral feedback outperform content-delivery-only approaches.
Organizational Context Alignment
Training that contradicts organizational norms produces cynicism, not change. If you train leaders on empowerment while the organization rewards control, you've wasted everyone's time. Effective development aligns with—or works to shift—broader organizational culture.
Multi-Level Evaluation
Programs that measure behavior change and results—not just satisfaction and knowledge—create accountability for actual impact. What gets measured gets attention.
“Training without environmental support is an individual fix for a systemic problem. The program might be excellent. The context makes transfer impossible.”
What This Means Practically
If you're investing in leadership development, the research suggests several shifts:
Stop Buying Programs, Start Building Systems
Training is one component of development, not the whole thing. Effective leadership development requires: selection of the right people, quality training content, transfer climate support, practice opportunities, feedback mechanisms, and accountability for application.
Most organizations invest heavily in training content and neglect everything else. Flip the ratio.
Extend Evaluation Timelines
Demand Level 3-4 evaluation. If your L&D team says they can't measure behavior change, that's a capability gap to address—not a reason to accept satisfaction scores as success metrics.
Build 6-12 month evaluation windows into program design from the start. Budget for follow-up measurement.
Involve Managers in the Process
Before training: Have participants' managers set expectations and identify specific behaviors to develop.
During training: Keep managers informed of content so they can reinforce it.
After training: Require manager check-ins on application. Include manager observations in evaluation.
Address Transfer Climate Explicitly
Ask: What in our environment will support or undermine application of this training? Workload? Competing priorities? Manager behavior? Cultural norms?
If the honest answer is "the environment will undermine it," either change the environment or don't waste money on training.
Match Intensity to Objectives
Brief awareness sessions can shift attitudes. Behavior change requires extended engagement with practice and feedback. Organizational culture change requires sustained intervention over years.
Match program design to what you're actually trying to accomplish.
The Bottom Line
Leadership training works—under the right conditions. Meta-analyses show meaningful effect sizes. Well-designed programs with transfer support produce real behavior change that benefits teams and organizations.
But most leadership training doesn't work, because most organizations ignore the conditions that make it effective. They buy programs, collect satisfaction scores, and wonder why nothing changes.
The problem isn't training. It's the assumption that training alone produces transformation.
Organizations serious about leadership development need to think systemically: What conditions support transfer? How will we measure behavior change? What environmental factors need to shift? How will managers reinforce new behaviors?
Without those questions—and real answers—leadership training is an expensive way to feel like you're doing something about a problem you're not actually addressing.
Assess Your Organization's Development Conditions
A 5-minute assessment of the factors that determine whether training produces results.
Take the A.R.T. Assessment →