Causal Identification / IV + RD

Instrumental Variables and Regression Discontinuity

IV isolates plausibly exogenous treatment variation, while RD uses local continuity around a cutoff. Both methods answer the same question: where does the counterfactual come from?

Mechanism Lab

Animation: IV isolates exogenous variation and RD reads the cutoff jump

The left panel shows Z shifting D before Y; the right panel shows treatment probability and outcome jumps around a running-variable cutoff.


Instrument

The instrument must create a first stage by shifting treatment probability or intensity.

Cov(Z,D) != 0


01 / Intuition

Core Intuition

IV is useful when treatment D is endogenous: use only the part of D moved by an instrument Z.

RD is useful when a clear cutoff c changes treatment status or treatment probability for units close to the threshold.

Neither design is made credible by a special regression command. Credibility comes from the instrument assumptions, continuity at the cutoff, first-stage strength, and diagnostics on the local sample.

02 / Math

LATE for IV and local jumps for RD

01 / First stage

Let Z be the instrument, D treatment, and Y the outcome. The instrument must shift treatment; without a first stage there is no identified treatment variation.

First stage:  E[D|Z=1] - E[D|Z=0] != 0
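As a minimal sketch, the first stage for a binary instrument is just a difference in treatment means. The simulated lottery below is hypothetical (take-up rates of 0.7 with an offer and 0.2 without are assumptions), and `first_stage_difference` is an illustrative helper, not a library function.

```python
import numpy as np

def first_stage_difference(z, d):
    """E[D|Z=1] - E[D|Z=0], estimated by sample means."""
    z = np.asarray(z)
    d = np.asarray(d, dtype=float)
    return d[z == 1].mean() - d[z == 0].mean()

# Hypothetical lottery: an offer raises enrollment probability from 0.2 to 0.7.
rng = np.random.default_rng(0)
n = 10_000
z = rng.integers(0, 2, size=n)                    # binary instrument (offer)
p_enroll = np.where(z == 1, 0.7, 0.2)             # assumed take-up rates
d = (rng.uniform(size=n) < p_enroll).astype(int)  # realized enrollment
print(round(first_stage_difference(z, d), 3))     # should be near 0.5
```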

02 / Wald / LATE

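Under independence, exclusion, relevance (a nonzero first stage), and monotonicity (no defiers), the reduced-form difference divided by the first-stage difference identifies the local average treatment effect (LATE) for compliers, the units whose treatment status is moved by the instrument.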

tau_LATE = {E[Y|Z=1] - E[Y|Z=0]} / {E[D|Z=1] - E[D|Z=0]}
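The simulation below is a hypothetical sketch of why the Wald ratio is a complier effect. The population shares (20% always-takers, 30% never-takers, 50% compliers) and the type-specific effects are assumptions chosen for illustration: compliers gain 2.0, while the population average effect is 0.2·5 + 0.3·1 + 0.5·2 = 2.3.

```python
import numpy as np

def wald_estimate(y, d, z):
    """Reduced-form difference divided by first-stage difference (binary Z)."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    reduced_form = y[z == 1].mean() - y[z == 0].mean()
    first_stage = d[z == 1].mean() - d[z == 0].mean()
    return reduced_form / first_stage

# Hypothetical population of always-takers, never-takers, and compliers.
rng = np.random.default_rng(1)
n = 100_000
types = rng.choice(["always", "never", "complier"], size=n, p=[0.2, 0.3, 0.5])
z = rng.integers(0, 2, size=n)
# Monotonicity holds by construction: nobody is a defier.
d = np.where(types == "always", 1, np.where(types == "never", 0, z))
# Heterogeneous effects: always-takers 5, never-takers 1, compliers 2.
effect = np.select([types == "always", types == "never"], [5.0, 1.0], default=2.0)
# Untreated outcomes differ by type, so comparing units by D alone is biased.
y0 = np.select([types == "always", types == "never"], [1.0, -1.0], default=0.0)
y = y0 + rng.normal(size=n) + d * effect  # Z affects Y only through D (exclusion)

print(round(wald_estimate(y, d, z), 2))  # close to the complier effect of 2.0
```

Because Z shifts only compliers, the reduced form picks up 0.5 × 2.0 and the first stage picks up 0.5, so the ratio targets the complier effect rather than the population average.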

03 / Covariance form

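With a single binary instrument and no controls, the Wald estimand equals the covariance ratio, which uses only the variation in D explained by Z. The same ratio is the IV estimand for any single instrument with one endogenous treatment.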

beta_IV = Cov(Z,Y) / Cov(Z,D)
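A quick numerical check, under an assumed data-generating process with an unobserved confounder `u`: for a binary Z the sample covariance ratio and the sample Wald ratio coincide exactly, while the naive OLS slope is pulled away from the true effect of 1.5. All variable names and coefficients here are illustrative.

```python
import numpy as np

def iv_cov_ratio(y, d, z):
    """Cov(Z,Y) / Cov(Z,D): the single-instrument IV estimand without controls."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    return np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

def wald_ratio(y, d, z):
    """Reduced-form difference over first-stage difference for a binary Z."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    return (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

rng = np.random.default_rng(2)
n = 50_000
u = rng.normal(size=n)                                   # unobserved confounder
z = rng.integers(0, 2, size=n)                           # binary instrument
d = (0.8 * z + u + rng.normal(size=n) > 0.5).astype(float)
y = 1.5 * d + 2.0 * u + rng.normal(size=n)               # constant effect of 1.5
ols = np.cov(d, y)[0, 1] / np.var(d, ddof=1)             # confounded OLS slope

print(iv_cov_ratio(y, d, z), wald_ratio(y, d, z), ols)
```

The exact equality follows because, for binary Z, both covariances equal p(1-p) times the corresponding group-mean difference, and that factor cancels in the ratio.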

04 / 2SLS derivation

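Two-stage least squares first projects the regressors onto the instruments and exogenous controls, then regresses Y on the projected regressors. In matrix form, X stacks the treatment D and the controls, Z stacks the excluded instruments and the same controls, and P_Z = Z(Z^T Z)^(-1) Z^T is the projection onto the column space of Z.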

D_hat = P_Z D
beta_2SLS = (X_hat^T X)^(-1) X_hat^T Y,  X_hat = P_Z X
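The matrix formula can be sketched directly in numpy. This is an illustrative implementation (`two_sls` is a hypothetical helper) that projects X onto the column space of Z with a least-squares solve instead of forming the n × n matrix P_Z; the simulated control `w`, instrument `z`, and confounder `u` are assumptions.

```python
import numpy as np

def two_sls(y, X, Z):
    """beta_2SLS = (X_hat^T X)^(-1) X_hat^T y, with X_hat = P_Z X."""
    # Regress every column of X on Z; fitted values are P_Z X.
    coef_first, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ coef_first
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

rng = np.random.default_rng(3)
n = 20_000
w = rng.normal(size=n)                        # exogenous control
z = rng.normal(size=n)                        # excluded instrument
u = rng.normal(size=n)                        # unobserved confounder
d = 1.0 * z + 0.5 * w + u + rng.normal(size=n)
y = 1.5 * d + 0.7 * w + 2.0 * u + rng.normal(size=n)

ones = np.ones(n)
X = np.column_stack([ones, w, d])             # regressors: constant, control, treatment
Z = np.column_stack([ones, w, z])             # instruments: constant, control, excluded z
beta = two_sls(y, X, Z)
print(beta)  # last entry should be near the true effect 1.5
```

With only a constant and one instrument, the same function reproduces the covariance ratio Cov(Z,Y) / Cov(Z,D), since the just-identified 2SLS estimator reduces to simple IV.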

05 / Sharp RD

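If treatment is fully determined by the running variable, D = 1[X >= c], and the conditional means of the potential outcomes are continuous in X at c, the treatment effect at the cutoff is the right limit of E[Y|X=x] minus the left limit.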

tau_RD = lim_{x↓c} E[Y|X=x] - lim_{x↑c} E[Y|X=x]
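One simple way to approximate the two limits, sketched under assumptions: fit a separate straight line on each side of the cutoff within a window (uniform weights here, not the kernel weighting used later) and read off each line's value at c. The running variable, cutoff of 70, and true jump of 4.0 are hypothetical.

```python
import numpy as np

def sharp_rd_limits(x, y, cutoff, bandwidth):
    """Right and left limits of E[Y|X=x] at the cutoff from side-specific
    linear fits with uniform weights inside the bandwidth."""
    centered = np.asarray(x, float) - cutoff
    y = np.asarray(y, float)
    right = (centered >= 0) & (centered <= bandwidth)
    left = (centered < 0) & (centered >= -bandwidth)
    # np.polyfit(..., 1) returns [slope, intercept]; the intercept is the fit at x = cutoff.
    right_limit = np.polyfit(centered[right], y[right], 1)[1]
    left_limit = np.polyfit(centered[left], y[left], 1)[1]
    return right_limit, left_limit

rng = np.random.default_rng(4)
n = 20_000
x = rng.uniform(40, 100, size=n)                          # hypothetical running variable
d = (x >= 70).astype(float)                               # sharp assignment: D = 1[X >= c]
y = 20 + 0.5 * x + 4.0 * d + rng.normal(scale=3, size=n)  # true jump of 4.0

right_limit, left_limit = sharp_rd_limits(x, y, cutoff=70, bandwidth=8)
print(right_limit - left_limit)
```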

06 / Fuzzy RD

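If the cutoff changes treatment probability but does not perfectly determine treatment, RD becomes a local Wald ratio: the outcome jump divided by the treatment-probability jump at c. Under continuity and local monotonicity, it identifies the effect for compliers at the cutoff.

tau_FRD = jump_Y(c) / jump_D(c),  where jump_V(c) = lim_{x↓c} E[V|X=x] - lim_{x↑c} E[V|X=x]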
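The fuzzy ratio can be sketched by estimating the same kind of jump twice, once for Y and once for D. The design below is hypothetical: take-up probability rises from 0.2 to 0.8 at a cutoff of 70, and the treatment effect is an assumed constant 3.0. The helper names are illustrative.

```python
import numpy as np

def jump_at_cutoff(x, v, cutoff, bandwidth):
    """Right limit minus left limit of E[v|X=x], from side-specific linear fits."""
    c = np.asarray(x, float) - cutoff
    v = np.asarray(v, float)
    right = (c >= 0) & (c <= bandwidth)
    left = (c < 0) & (c >= -bandwidth)
    # np.polyfit(..., 1) returns [slope, intercept]; intercept = fitted value at the cutoff.
    return np.polyfit(c[right], v[right], 1)[1] - np.polyfit(c[left], v[left], 1)[1]

def fuzzy_rd(x, y, d, cutoff, bandwidth):
    """Local Wald ratio: outcome jump divided by treatment-probability jump."""
    return jump_at_cutoff(x, y, cutoff, bandwidth) / jump_at_cutoff(x, d, cutoff, bandwidth)

# Hypothetical fuzzy design with a cutoff of 70.
rng = np.random.default_rng(5)
n = 40_000
x = rng.uniform(40, 100, size=n)
p_treat = np.where(x >= 70, 0.8, 0.2)
d = (rng.uniform(size=n) < p_treat).astype(float)
y = 10 + 0.3 * x + 3.0 * d + rng.normal(scale=2, size=n)  # constant effect of 3.0

print(jump_at_cutoff(x, d, 70, 8), fuzzy_rd(x, y, d, 70, 8))
```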

07 / Local linear estimation

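Applied RD usually fits local linear regressions within bandwidth h, often with a triangular kernel K that weights observations near the cutoff more heavily. In the objective below, tau is the jump at c, beta_l is the slope to the left of the cutoff, and beta_r is the change in slope to the right of it.

min_{alpha, tau, beta_l, beta_r} sum_i K((X_i-c)/h) [Y_i - alpha - tau 1{X_i>=c} - beta_l(X_i-c) - beta_r 1{X_i>=c}(X_i-c)]^2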
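The weighted objective above can be minimized in closed form as weighted least squares. This sketch (hypothetical helper `local_linear_objective_fit`, simulated data with an assumed jump of 4.0) also checks a useful equivalence: because the interaction lets intercept and slope differ on each side, the pooled tau equals the difference between two separate kernel-weighted fits, one per side.

```python
import numpy as np

def local_linear_objective_fit(x, y, cutoff, bandwidth):
    """Minimize the triangular-kernel weighted objective; returns
    (alpha, tau, beta_l, beta_r), with beta_r the slope change at the cutoff."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    k = np.maximum(1 - np.abs((x - cutoff) / bandwidth), 0)  # triangular kernel weights
    keep = k > 0                                  # zero-weight points do not affect the fit
    c, y, k = x[keep] - cutoff, y[keep], k[keep]
    r = (c >= 0).astype(float)
    X = np.column_stack([np.ones_like(c), r, c, r * c])
    sw = np.sqrt(k)
    # Weighted least squares equals ordinary least squares on sqrt(weight)-scaled rows.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

rng = np.random.default_rng(7)
n = 20_000
x = rng.uniform(40, 100, size=n)
y = 20 + 0.5 * x + 4.0 * (x >= 70) + rng.normal(scale=3, size=n)

alpha, tau, beta_l, beta_r = local_linear_objective_fit(x, y, cutoff=70, bandwidth=8)

# Same jump from two separate kernel-weighted fits, one per side of the cutoff.
# np.polyfit applies w to unsquared residuals, so pass sqrt(kernel weight).
c = x - 70
k = np.maximum(1 - np.abs(c / 8), 0)
right, left = (c >= 0) & (k > 0), (c < 0) & (k > 0)
right_int = np.polyfit(c[right], y[right], 1, w=np.sqrt(k[right]))[1]
left_int = np.polyfit(c[left], y[left], 1, w=np.sqrt(k[left]))[1]
print(tau, right_int - left_int)
```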

03 / Code

Python code: IV 2SLS and RD local linear estimation

The IV example uses linearmodels for 2SLS; the RD example writes a triangular-kernel local linear regression with statsmodels. Applied papers should add weak-instrument tests, bandwidth sensitivity, and manipulation diagnostics.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from linearmodels.iv import IV2SLS

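# IV example:
# outcome: test_score
# treatment: program_enroll (endogenous)
# instrument: lottery_offer
# controls: baseline_score, age, income
# clusters: school_id
# df is assumed to be a pandas DataFrame that already holds these columns.
iv_formula = (
    "test_score ~ 1 + baseline_score + age + income "
    "+ [program_enroll ~ lottery_offer]"
)
iv_model = IV2SLS.from_formula(iv_formula, data=df).fit(
    cov_type="clustered",
    clusters=df["school_id"],
)
print(iv_model.summary)
# First-stage diagnostics (partial R-squared, partial F-statistic) for weak-instrument checks.
print(iv_model.first_stage)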

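# RD example (sharp design: treated when assignment_score >= 70):
# running variable: assignment_score
# cutoff: 70
# outcome: test_score
def triangular_kernel(u):
    # Weight 1 at the cutoff, falling linearly to 0 at |u| = 1.
    return np.maximum(1 - np.abs(u), 0)

def local_linear_rd(data, outcome, running, cutoff, bandwidth):
    # Keep observations within the bandwidth on either side of the cutoff.
    sample = data[np.abs(data[running] - cutoff) <= bandwidth].copy()
    sample["right"] = (sample[running] >= cutoff).astype(int)
    sample["centered"] = sample[running] - cutoff
    # Interaction lets the slope differ on each side of the cutoff.
    sample["right_x_centered"] = sample["right"] * sample["centered"]
    weights = triangular_kernel(sample["centered"] / bandwidth)
    X = sm.add_constant(sample[["right", "centered", "right_x_centered"]])
    fit = sm.WLS(sample[outcome], X, weights=weights).fit(cov_type="HC1")
    # Coefficient on "right" is the jump at the cutoff; the CI is a conventional
    # heteroskedasticity-robust interval, not a bias-corrected one.
    return fit.params["right"], fit.conf_int().loc["right"], fit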

tau, ci, rd_fit = local_linear_rd(
    df,
    outcome="test_score",
    running="assignment_score",
    cutoff=70,
    bandwidth=8,
)
print({"rd_effect": tau, "ci_low": ci[0], "ci_high": ci[1]})

04 / Case

Case: lottery offers and score cutoffs in an education program

  • IV setting: a school program is oversubscribed, so lottery offers instrument actual enrollment. The offer affects participation but should not affect scores except through participation.
  • The IV report should show the reduced form, first stage, 2SLS/LATE, weak-instrument risk, exclusion restriction, and complier interpretation.
  • RD setting: scholarship eligibility is determined by a 70-point cutoff. Students at 69.8 and 70.2 should be locally comparable, while eligibility jumps at the cutoff.
  • The RD report should show threshold plots, local linear estimates, bandwidth sensitivity, covariate continuity, and whether running-variable density jumps at the cutoff.
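Two of the RD report checks listed above, bandwidth sensitivity and covariate continuity, can be sketched with the same side-specific jump estimator, plus a crude count comparison near the cutoff. The scholarship data are simulated under assumptions (no manipulation, a true jump of 4.0, a predetermined `baseline` covariate), and the count ratio is only a heuristic stand-in for a formal density test.

```python
import numpy as np

def rd_jump(x, v, cutoff, bandwidth):
    """Right limit minus left limit of E[v|X=x] from side-specific linear fits."""
    c = np.asarray(x, float) - cutoff
    v = np.asarray(v, float)
    right = (c >= 0) & (c <= bandwidth)
    left = (c < 0) & (c >= -bandwidth)
    return np.polyfit(c[right], v[right], 1)[1] - np.polyfit(c[left], v[left], 1)[1]

def count_ratio_near_cutoff(x, cutoff, width):
    """Crude density heuristic: units just right of the cutoff over units just left.
    Not a formal manipulation test."""
    x = np.asarray(x, float)
    n_right = np.sum((x >= cutoff) & (x < cutoff + width))
    n_left = np.sum((x < cutoff) & (x >= cutoff - width))
    return n_right / n_left

# Hypothetical scholarship data with a 70-point cutoff and no sorting.
rng = np.random.default_rng(6)
n = 20_000
score = rng.uniform(40, 100, size=n)
baseline = 0.8 * score + rng.normal(scale=5, size=n)    # predetermined covariate
eligible = (score >= 70).astype(float)
outcome = 0.5 * baseline + 4.0 * eligible + rng.normal(scale=3, size=n)

for h in (4, 6, 8, 10, 12):                              # bandwidth sensitivity
    print(h, round(rd_jump(score, outcome, 70, h), 2))
print("covariate jump:", round(rd_jump(score, baseline, 70, 8), 2))    # should be near 0
print("count ratio:", round(count_ratio_near_cutoff(score, 70, 2), 2))  # near 1 without sorting
```

Estimates that drift sharply across bandwidths, a jump in a predetermined covariate, or a lopsided count ratio would all be reasons to distrust the headline RD estimate.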

05 / Risks

Common Pitfalls

Using a strong predictor as an instrument without a credible exclusion story.
Ignoring a weak first stage and interpreting noisy 2SLS coefficients as precise causal effects.
Showing a clean RD plot while hiding manipulation, sorting, or bandwidth sensitivity near the cutoff.
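For the weak-first-stage pitfall, a minimal sketch of the first-stage F-statistic with one excluded instrument and no controls (homoskedastic errors assumed; `first_stage_f` is a hypothetical helper). The familiar rule of thumb of F above 10 is only a rough screen, not a guarantee of reliable 2SLS inference.

```python
import numpy as np

def first_stage_f(d, z):
    """Homoskedastic first-stage F for one excluded instrument and no controls;
    with a single instrument it equals the squared t-statistic on z."""
    d, z = np.asarray(d, float), np.asarray(z, float)
    Z = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(Z, d, rcond=None)
    resid = d - Z @ coef
    sigma2 = resid @ resid / (len(d) - Z.shape[1])
    var_slope = sigma2 * np.linalg.inv(Z.T @ Z)[1, 1]
    return coef[1] ** 2 / var_slope

rng = np.random.default_rng(8)
n = 1_000
z = rng.normal(size=n)
d_strong = 0.5 * z + rng.normal(size=n)   # hypothetical strong instrument
d_weak = 0.02 * z + rng.normal(size=n)    # hypothetical weak instrument
print(first_stage_f(d_strong, z), first_stage_f(d_weak, z))
```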

References