Making Leaders Willing to Dig In — The Evidence
The hardest problem in Jer's market isn't teaching character — it's getting successful, defended 45–60-year-olds to want the mirror. This is the fact-checked playbook: each approach verified against meta-analyses (empirical) and named theory (theoretical). Effect sizes quoted where they exist; weak or failed-replication claims flagged honestly — including the popular ones we should NOT use.
How to read the evidence
Every claim below carries the statistics it rests on. The letters, once:
- d (Cohen's d) — the size of a difference between two groups, measured in standard deviations: (treated group's average − control group's average) ÷ the spread of scores. Rule of thumb: 0.2 = small, 0.5 = medium, 0.8 = large. Concretely: d = .65 means the average person who got the intervention ended up better off than roughly 3 out of 4 people who didn't.
- g (Hedges' g) — the same number with a correction for small studies. Read it exactly like d.
- r (correlation) — how tightly two things move together, from −1 to +1: .1 weak, .3 moderate, .5 strong. r says "these travel together," not "one causes the other."
- N — the total number of people studied. "(94 studies, N=8,461)" means a meta-analysis pooling 94 separate experiments covering 8,461 people.
- meta / meta-analysis — a study of studies: instead of trusting any single experiment, it pools every available one on a question and reports the combined effect. Every headline number in this document is meta-analytic unless noted.
- ns — not statistically significant: the data can't rule out a zero effect.
Calibration: behavioral interventions rarely exceed d = .6 in honest meta-analyses, so on this list .3 is respectable and .65 is top of the class. Be suspicious of any program quoting d > 1 from a single study — that's usually how the claims this document rejects were sold.
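To make the legend concrete, here is a minimal sketch of how d, g, and the "better off than 3 out of 4" reading relate. The function names are illustrative, not from any cited paper. The percentile conversion (often called Cohen's U3) assumes both groups are roughly normally distributed with equal spread, and the small-sample correction is the common approximation rather than the exact formula.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def cohens_d(treated, control):
    """Difference in group means divided by the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treated) - mean(control)) / pooled


def hedges_g(treated, control):
    """Cohen's d shrunk slightly to correct for small samples (approximate correction)."""
    n = len(treated) + len(control)
    return cohens_d(treated, control) * (1 - 3 / (4 * n - 9))


def share_of_controls_outperformed(d):
    """Fraction of the control group scoring below the average treated person.

    Assumes normal distributions with equal spread in both groups.
    """
    return NormalDist().cdf(d)


# Rule-of-thumb sizes plus the largest effect quoted in this document.
for label, d in [("small", 0.2), ("medium", 0.5), ("if-then planning", 0.65), ("large", 0.8)]:
    print(f"{label:>16}: d = {d:.2f} -> {share_of_controls_outperformed(d):.0%} of controls")
```

Under those assumptions, d = .65 lands at about 74% of controls, which is where "roughly 3 out of 4" comes from. A d of 0.2 moves the average person from the 50th percentile to about the 58th.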
Ranked: what works for this audience
1. Motivational interviewing principles (Miller & Rollnick) — STRONG
- Evidence: Lundahl et al. 2010 meta (119 studies): g = 0.28 vs no-treatment; parity with longer treatments in less time. Mechanism metas (Magill 2014, 2018): MI-consistent behavior → client "change talk" (r = .55), which predicts change; confrontation evokes "sustain talk," which predicts failure.
- The key concept — the righting reflex: telling a defended person what's wrong and what to do generates counter-argument. Evoking their own reasons generates commitment.
- Apply: every results page and enrollment call asks, never asserts: "Which of these rings most true?" / "You said you've won at everything that doesn't matter — say more." Jer's natural style is already close to MI; this just bans the one failure mode (prescribing before evoking).
2. Self-affirmation before threat (Steele; Cohen & Sherman 2014) — STRONG
- Evidence: two independent metas (Epton et al. 2015; Sweeney & Moyer 2015): affirmation before threatening feedback → behavior d ≈ .27–.32. Boundary condition that decides the design: works only BEFORE the threat (Critcher et al. 2010 — once someone has rationalized, affirmation does nothing), and the affirmed value must be UNRELATED to the threatened domain.
- Apply: the Scorecard's affirmation screen (values question before results); never "you're clearly a strong leader, but…" — affirm family/faith/craft, then deliver the hard result.
3. Autonomy-supportive framing + real deselection (Self-Determination Theory) — STRONG
- Evidence: Ng et al. 2012 (184 datasets): autonomy support → need satisfaction → autonomous motivation → maintained behavior; Ntoumanis et al. 2021 intervention meta: effects persist at follow-up via autonomous motivation. Controlling language produces introjected motivation, which decays — fatal for a 9–15-month formation arc.
- The deselection paradox, corroborated: "but you are free to refuse" roughly doubled compliance in Carpenter's 2013 meta (honesty note: g = 0.11 and not statistically significant among the 7 most rigorous studies — treat as plausible, not proven); scarcity/commodity theory (Lynn 1991 meta) confirms restricted availability raises perceived value. Boundary: the screening and caps must be REAL — sophisticated buyers detect manufactured exclusivity and trust collapses.
- Apply: keep and sharpen Jer's "this isn't for you if…" — it's structurally autonomy-supportive. Add the explicit free-choice close everywhere: "Whatever you decide, the diagnostic is yours to keep."
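The "not statistically significant" caveat on g = 0.11 can be read as a confidence interval that still includes zero. The sketch below shows that logic only. The standard errors are hypothetical values chosen for illustration and are not taken from Carpenter 2013; the function names are likewise illustrative.

```python
from statistics import NormalDist


def confidence_interval(effect, standard_error, level=0.95):
    """Normal-approximation confidence interval around an effect size."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return effect - z * standard_error, effect + z * standard_error


def is_significant(effect, standard_error, level=0.95):
    """True when the interval excludes zero, i.e. a zero effect is ruled out."""
    low, high = confidence_interval(effect, standard_error, level)
    return low > 0 or high < 0


# HYPOTHETICAL standard errors, for illustration only.
for se in (0.08, 0.04):
    low, high = confidence_interval(0.11, se)
    verdict = "significant" if is_significant(0.11, se) else "ns"
    print(f"g = 0.11, SE = {se:.2f}: 95% CI [{low:.2f}, {high:.2f}] -> {verdict}")
```

The same point estimate can be "ns" or significant depending on how precisely it was measured. That is why a small g from only 7 studies reads as plausible rather than proven.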
4. Psychological safety engineering + the AA-derived group structure — STRONG
- Evidence: Frazier et al. 2017 meta (136 samples): psych safety robustly predicts voice, learning, performance; chief antecedent = leader behavior. The striking comparison: Cochrane 2020 (Kelly, Humphreys & Ferri) found that manualized 12-step facilitation beat CBT for continuous abstinence (RR 1.21, a risk ratio meaning participants were about 21% more likely to stay continuously abstinent; high-certainty evidence, persisting at 24–36 months). It is the rare case where a structured peer confession/accountability group outperforms professional treatment.
- Transferable mechanisms (structure, not outcome claims): leader discloses first; senior members model self-disclosure; accountability dyads (the sponsor analogue); long duration; structured moral inventory + disclosure (Steps 4–5 are literally an immunity map plus a Dinner of Truth).
- Apply: two cohort norms — (1) Jer (and later alumni) open every session with their own current character failure before anyone else speaks; (2) assigned accountability dyads between sessions. This converts the course's length from a marketing liability into its active ingredient.
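The RR quoted for 12-step facilitation is a risk ratio: the success rate in one group divided by the success rate in the other. A brief sketch of what RR 1.21 implies in absolute terms follows. The baseline rates are hypothetical, are not figures from the Cochrane review, and are there only to show that the same ratio means different absolute gains at different baselines.

```python
def risk_ratio(events_treated, n_treated, events_control, n_control):
    """Rate of the outcome in the treated group divided by the rate in the comparison group."""
    return (events_treated / n_treated) / (events_control / n_control)


def implied_treated_rate(control_rate, rr):
    """Outcome rate in the treated group implied by a comparison-group rate and a risk ratio."""
    return control_rate * rr


# HYPOTHETICAL comparison-group abstinence rates, for illustration only.
for control_rate in (0.20, 0.35, 0.50):
    treated_rate = implied_treated_rate(control_rate, 1.21)
    gain = treated_rate - control_rate
    print(f"comparison {control_rate:.0%} -> treated {treated_rate:.1%} (+{gain:.1%} absolute)")
```

A ratio of 1.21 is a relative gain of about 21%. The absolute gain in percentage points depends on the baseline, so neither number should be quoted as the other.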
5. Implementation intentions / facilitated WOOP (Gollwitzer; Oettingen) — STRONG
- Evidence: Gollwitzer & Sheeran 2006 (94 studies, N=8,461): if-then planning d = .65 on goal attainment, the largest reliable effect on this list. WOOP meta (Wang et al. 2021): g = .34 overall; g = .47 live-facilitated vs .28 self-guided, so facilitation adds roughly two-thirds to the effect. Boundary: it amplifies existing commitment but doesn't create it, and pure positive visualization reduces effort (Oettingen).
- Apply: every session's application step ends with a WOOP done live in cohort: wish → outcome → the inner obstacle (their named pattern from the Scorecard/immunity map) → if-then plan ("If I feel the urge to dominate the meeting, then I ask a question and wait").
Worth using, with eyes open
- Anticipated inaction regret — STRONG correlations (Brewer et al. 2016 meta, N=45,618: intentions r = .50): frame regret of not acting ("who will you have become at 70 if nothing changes?"), always paired with a feasible next step.
- Effort-justified admission (Aronson & Mills 1959; replicated; IKEA-effect descendants) — MODERATE: voluntary, meaning-linked effort at entry raises valuation and commitment. Apply: application essay + interview + pre-work before acceptance. Jer's mutuality screening, made effortful, becomes a commitment device.
- Public, difficult, group goal-setting (Epton et al. 2017 meta, d ≈ .34; strongest exactly when goals are public + difficult + group) — Apply: cohort launch ritual — each member states one formation goal aloud.
- Adult character change is real, so cite the right literature: Hudson & Fraley's volitional-change studies (2020 mega-analysis) show that adults of all ages who set trait-change goals and practice trait-consistent behavior achieve measured trait growth in ~16 weeks. This is the honest answer to "I am who I am at 55"; use it instead of pop growth-mindset claims.
- Narrative identity / life review (McAdams; Adler 2012: agency themes rise before mental-health improvement) — MODERATE, perfect demographic fit: the "re-author your first mountain" session (what it cost, what it was for) is empirically aligned, and it's the Clinton timeline exercise's research twin.
- Best Possible Self (Carrillo et al. 2019 meta: optimism d = .33, positive affect d = .51) — the hope counterweight after hard feedback; feeds WOOP's wish/outcome steps. Apply: "best possible self at 70" written exercise, session 1.
Do NOT build on these (fact-check results)
- Academic growth-mindset effect sizes — Sisk et al. 2018: d = 0.08; best-practice replications null (Macnamara & Burgoyne 2023). Real but tiny and concentrated in at-risk adolescents. Don't quote Dweck numbers at executives; use Hudson & Fraley instead.
- Noun/identity micro-copy ("be a leader who…" vs "lead") — the famous "be a voter" effect failed large-scale replication (Gerber et al. 2018). Fine as brand poetry; never claim it as a mechanism.
- Scarcity theatrics — only real caps, real screening.
- Certificates as incentive — SDT warning: external rewards can undermine autonomous motivation for intrinsically-framed work. Keep Jer's certificate but reframe it as a commemoration of a publicly stated commitment (ties to the Epton public-goal moderator), not a credential.
How the persuasion architecture matches Jer's instincts (the happy audit)
| Jer already does | The evidence says |
|---|---|
| Deselection ("not for you if…") | SDT autonomy support + choice framing — keep, sharpen, add free-choice close |
| Mutuality/fit screening | Effort-justification — make entry more effortful, not less |
| Crock-pot length | Psych-safety antecedents + AA duration — the length IS the mechanism; add leader-discloses-first + dyads to cash it in |
| "Heavy lifting" language | Anticipated-regret + difficult-goal moderators — heavy is attractive to the right buyer |
| Rohr-adjacent soul framing | Honor → reframe → invite (validate the first mountain as required curriculum, then make falling the elite move: "most successful people never make it to the second half") |
The three highest-leverage, lowest-cost additions (nothing existing changes): pre-results affirmation screen, MI-style reveal/enrollment copy, and facilitated WOOP as the homework spine.