
Instrumental Variables and Regression Discontinuity

IV isolates plausibly exogenous treatment variation, while RD uses local continuity around a cutoff. Both methods answer the same question: where does the counterfactual come from?

Mechanism Lab

Animation: IV isolates exogenous variation and RD reads the cutoff jump

The left panel shows Z shifting D before Y; the right panel shows treatment probability and the outcome jumping at a running-variable cutoff.

Step 1 of 5, Instrument: the instrument must create a first stage by shifting treatment probability or intensity, Cov(Z,D) != 0.

01 / Intuition

Core Intuition

IV is useful when treatment D is endogenous: use only the part of D moved by an instrument Z.

RD is useful when a known cutoff c in a running variable X changes treatment status or treatment probability, so units just above and just below the threshold can be compared.

Neither design becomes credible through a special regression command; credibility comes from the instrument assumptions, continuity at the cutoff, first-stage strength, and diagnostics on the local sample.

02 / Math

LATE for IV and local jumps for RD

01 / First stage

Let Z be the instrument, D treatment, and Y the outcome. The instrument must shift treatment; without a first stage there is no identified treatment variation.

First stage:  E[D|Z=1] - E[D|Z=0] != 0
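A minimal simulated sketch of the first stage, assuming a randomized binary instrument and made-up shares of compliers (60%), always-takers (15%), and never-takers (25%), with no defiers. Compliers take treatment only when Z=1, so the first-stage difference should come out close to the complier share.

```python
import numpy as np

# Hypothetical population: the compliance-type shares are assumptions for illustration.
rng = np.random.default_rng(0)
n = 100_000
z = rng.integers(0, 2, size=n)          # randomized binary instrument
u = rng.random(n)
complier = u < 0.60                      # 60% compliers
always_taker = u >= 0.85                 # 15% always-takers; the remaining 25% never take it

# Compliers follow Z; always-takers are treated regardless; never-takers never are.
d = np.where(complier, z, always_taker.astype(int))

# Sample analogue of E[D|Z=1] - E[D|Z=0]
first_stage = d[z == 1].mean() - d[z == 0].mean()
print(f"first stage: {first_stage:.3f}")
```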

02 / Wald / LATE

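Under independence (Z is as good as randomly assigned), exclusion (Z affects Y only through D), and monotonicity (no defiers), the reduced-form difference divided by the first-stage difference identifies the local average treatment effect (LATE) for compliers, the units whose treatment status is changed by Z.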

tau_LATE = {E[Y|Z=1] - E[Y|Z=0]} / {E[D|Z=1] - E[D|Z=0]}
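A simulated sketch of the Wald ratio under the three assumptions, with hypothetical type-specific effects (compliers gain 2, always-takers 4, never-takers 1) and baseline outcomes that differ by type. The Wald ratio should land near the complier effect of 2, while a naive treated-versus-untreated comparison is pulled away by selection.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
z = rng.integers(0, 2, size=n)
u = rng.random(n)
complier = u < 0.60
always_taker = u >= 0.85
d = np.where(complier, z, always_taker.astype(int))   # no defiers: monotonicity holds

# Heterogeneous effects and type-dependent baselines (all numbers hypothetical).
effect = np.where(complier, 2.0, np.where(always_taker, 4.0, 1.0))
y0 = 10 + 3 * always_taker + rng.normal(0, 1, size=n)
y = y0 + effect * d                   # Z reaches Y only through D (exclusion)

reduced_form = y[z == 1].mean() - y[z == 0].mean()
first_stage = d[z == 1].mean() - d[z == 0].mean()
tau_late = reduced_form / first_stage
naive = y[d == 1].mean() - y[d == 0].mean()
print(f"Wald/LATE: {tau_late:.2f}  naive treated-vs-untreated: {naive:.2f}")
```

The Wald ratio recovers the complier effect, not the population average effect, which is why the estimand is called local.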

03 / Covariance form

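With a single binary instrument and no covariates, the Wald estimand equals the covariance ratio: Cov(Z,Y) = Var(Z){E[Y|Z=1] - E[Y|Z=0]}, likewise for D, so Var(Z) cancels. In either form, the estimand uses only the variation in D explained by Z.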

beta_IV = Cov(Z,Y) / Cov(Z,D)
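A quick numerical check of the covariance form on simulated data, assuming an unobserved confounder v that drives both D and Y and a true effect of 1.5 (hypothetical values). With a binary instrument, the covariance ratio reproduces the Wald ratio exactly in the sample, and both differ from the confounded OLS slope.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
z = rng.integers(0, 2, size=n)
v = rng.normal(size=n)                                     # unobserved confounder
d = ((0.8 * z + v + rng.normal(size=n)) > 0.5).astype(int)
y = 1.5 * d + 2.0 * v + rng.normal(size=n)                # true effect 1.5

wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]        # Cov(Z,Y) / Cov(Z,D)
beta_ols = np.cov(d, y)[0, 1] / np.var(d, ddof=1)         # uses all variation in D
print(f"Wald {wald:.3f}  covariance IV {beta_iv:.3f}  OLS {beta_ols:.3f}")
```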

04 / 2SLS derivation

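Two-stage least squares first projects the regressors onto the instruments, then regresses Y on the projected regressors. In matrix form, X stacks the constant, the controls, and D; Z stacks the constant, the controls, and the excluded instruments; and P_Z = Z(Z^T Z)^(-1) Z^T is the projection onto the column space of Z. Because the controls are in Z, they project onto themselves, and only D is effectively replaced by its first-stage fit.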

D_hat = P_Z D
beta_2SLS = (X_hat^T X)^(-1) X_hat^T Y,  X_hat = P_Z X
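A NumPy sketch of the projection formula on simulated data; the control w, the confounder v, and all coefficients (true effect of d equal to 2) are hypothetical. The function name `two_sls` is illustrative, not a library API.

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS via beta = (X_hat^T X)^(-1) X_hat^T y with X_hat = P_Z X."""
    # P_Z X = Z (Z^T Z)^(-1) Z^T X, computed without the n-by-n projection matrix
    x_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    return np.linalg.solve(x_hat.T @ X, x_hat.T @ y)

# Simulated data (hypothetical): w exogenous control, z excluded instrument,
# v unobserved confounder, true effect of d equal to 2.
rng = np.random.default_rng(3)
n = 100_000
w = rng.normal(size=n)
z = rng.integers(0, 2, size=n).astype(float)
v = rng.normal(size=n)
d = 0.5 * z + 0.3 * w + v + rng.normal(size=n)
y = 1.0 + 2.0 * d + 0.5 * w + 1.5 * v + rng.normal(size=n)

ones = np.ones(n)
X = np.column_stack([ones, w, d])     # regressors: constant, control, treatment
Zm = np.column_stack([ones, w, z])    # instruments: constant, control, excluded instrument
beta = two_sls(y, X, Zm)
print(beta)                           # order: [constant, w, d]
```

This example is just-identified (one excluded instrument for one endogenous regressor), so the result coincides with the simple IV formula (Z^T X)^(-1) Z^T y; the projection step matters once there are more instruments than endogenous regressors.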

05 / Sharp RD

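If treatment is fully determined by the running variable X, so that D = 1[X>=c], and the conditional means of the potential outcomes are continuous at c, the treatment effect at the cutoff is the right limit of E[Y|X=x] minus its left limit.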

tau_RD = lim_{x -> c+} E[Y|X=x] - lim_{x -> c-} E[Y|X=x]
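A minimal sketch of the sharp RD estimand on simulated data, assuming a cutoff of 70, a true jump of 5, and a bandwidth of 10 (all hypothetical). Each one-sided limit is approximated by a linear fit on that side of the cutoff, evaluated at c, with uniform weights inside the bandwidth.

```python
import numpy as np

def one_sided_limit(x, y, c, h, side):
    """Linear fit of y on (x - c) on one side of c within bandwidth h; returns the value at c."""
    if side == "right":
        mask = (x >= c) & (x <= c + h)
    else:
        mask = (x < c) & (x >= c - h)
    slope, intercept = np.polyfit(x[mask] - c, y[mask], deg=1)
    return intercept

# Simulated sharp design (hypothetical numbers): cutoff 70, true jump 5.
rng = np.random.default_rng(4)
n = 40_000
c = 70.0
x = rng.uniform(40, 100, size=n)                 # running variable
d = (x >= c).astype(int)                          # D = 1[X >= c]
y = 20 + 0.5 * (x - c) + 5.0 * d + rng.normal(0, 3, size=n)

tau_rd = one_sided_limit(x, y, c, 10.0, "right") - one_sided_limit(x, y, c, 10.0, "left")
print(f"sharp RD estimate: {tau_rd:.2f}")
```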

06 / Fuzzy RD

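If the cutoff changes treatment probability but does not fully determine treatment, RD becomes a local Wald ratio: the jump in E[Y|X=x] at c divided by the jump in E[D|X=x] at c. Under a monotonicity condition, it is a local average treatment effect for compliers at the cutoff.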

tau_FRD = jump_Y(c) / jump_D(c)
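A sketch of the fuzzy RD ratio, assuming (hypothetically) that crossing 70 raises treatment probability from 0.2 to 0.8 and that the effect of D is 3 for everyone. Both jumps are estimated with the same one-sided linear fits, so the ratio should be near 3 and the treatment jump near 0.6.

```python
import numpy as np

def jump_at_cutoff(x, v, c, h):
    """Right limit minus left limit of E[v|X=x] at c, each from a linear fit within h."""
    right = (x >= c) & (x <= c + h)
    left = (x < c) & (x >= c - h)
    right_at_c = np.polyfit(x[right] - c, v[right], deg=1)[1]   # intercept = fit at c
    left_at_c = np.polyfit(x[left] - c, v[left], deg=1)[1]
    return right_at_c - left_at_c

# Simulated fuzzy design (hypothetical numbers): crossing 70 raises P(D=1) from 0.2 to 0.8.
rng = np.random.default_rng(5)
n = 100_000
c, h = 70.0, 10.0
x = rng.uniform(40, 100, size=n)
p_treat = np.where(x >= c, 0.8, 0.2)
d = (rng.random(n) < p_treat).astype(float)
y = 20 + 0.5 * (x - c) + 3.0 * d + rng.normal(0, 3, size=n)   # true effect 3

jump_y = jump_at_cutoff(x, y, c, h)
jump_d = jump_at_cutoff(x, d, c, h)
tau_frd = jump_y / jump_d
print(f"jump_Y = {jump_y:.2f}, jump_D = {jump_d:.2f}, tau_FRD = {tau_frd:.2f}")
```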

07 / Local linear estimation

Applied RD usually fits local linear regressions within bandwidth h and uses a triangular kernel to weight observations near the cutoff more heavily.

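min_{alpha, tau, beta_l, beta_r} sum_i K((X_i-c)/h) [Y_i - alpha - tau 1{X_i>=c} - beta_l(X_i-c) - beta_r 1{X_i>=c}(X_i-c)]^2

Here tau is the jump at the cutoff, beta_l is the slope to the left of c, and beta_r is the change in slope to the right of c.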
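The objective can be minimised directly as weighted least squares. Below is a NumPy sketch on simulated data with an assumed cutoff of 70, bandwidth 8, true jump 4, and slope change 0.3 (all hypothetical); `kernel_weighted_rd` and `weighted_intercept` are illustrative names. As a check, the pooled interacted fit gives the same tau as two separate kernel-weighted line fits, one on each side of c.

```python
import numpy as np

def kernel_weighted_rd(x, y, c, h):
    """Minimise the triangular-kernel objective; returns (alpha, tau, beta_l, beta_r)."""
    k = np.maximum(1 - np.abs((x - c) / h), 0)   # triangular kernel weights
    keep = k > 0                                  # zero-weight points do not change the fit
    xc = x[keep] - c
    r = (xc >= 0).astype(float)                   # 1{X_i >= c}
    X = np.column_stack([np.ones(xc.size), r, xc, r * xc])
    sw = np.sqrt(k[keep])                         # weighted LS = OLS on sqrt-weighted rows
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
    return coef

def weighted_intercept(xc, y, k):
    """Intercept (value at the cutoff) of a kernel-weighted line y ~ 1 + xc."""
    sw = np.sqrt(k)
    A = np.column_stack([np.ones(xc.size), xc]) * sw[:, None]
    coef, *_ = np.linalg.lstsq(A, y * sw, rcond=None)
    return coef[0]

# Simulated data (hypothetical): slopes differ by side, true jump 4.
rng = np.random.default_rng(7)
n = 40_000
c, h = 70.0, 8.0
x = rng.uniform(40, 100, size=n)
r_all = (x >= c).astype(float)
y = 20 + 0.5 * (x - c) + 0.3 * r_all * (x - c) + 4.0 * r_all + rng.normal(0, 3, size=n)

alpha, tau, beta_l, beta_r = kernel_weighted_rd(x, y, c, h)

# The fully interacted pooled fit separates into one weighted fit per side.
k = np.maximum(1 - np.abs((x - c) / h), 0)
right = (x >= c) & (k > 0)
left = (x < c) & (k > 0)
tau_separate = (weighted_intercept(x[right] - c, y[right], k[right])
                - weighted_intercept(x[left] - c, y[left], k[left]))
print(f"tau = {tau:.3f}, separate one-sided fits = {tau_separate:.3f}")
```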

03 / Code

Python code: IV 2SLS and RD local linear estimation

The IV example uses linearmodels for 2SLS; the RD example writes a triangular-kernel local linear regression with statsmodels. Applied papers should add weak-instrument tests, bandwidth sensitivity, and manipulation diagnostics.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from linearmodels.iv import IV2SLS

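# IV example:
# df: a pandas DataFrame assumed to be loaded already, with the columns named below
# outcome: test_score
# treatment: program_enroll
# instrument: lottery_offer
# controls: baseline_score, age, income
# standard errors: clustered by school_id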
iv_formula = (
    "test_score ~ 1 + baseline_score + age + income "
    "+ [program_enroll ~ lottery_offer]"
)
iv_model = IV2SLS.from_formula(iv_formula, data=df).fit(
    cov_type="clustered",
    clusters=df["school_id"],
)
print(iv_model.summary)

# RD example:
# running variable: assignment_score
# cutoff: 70
# outcome: test_score
def triangular_kernel(u):
    return np.maximum(1 - np.abs(u), 0)

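def local_linear_rd(data, outcome, running, cutoff, bandwidth):
    # Keep observations within the bandwidth on either side of the cutoff
    sample = data[np.abs(data[running] - cutoff) <= bandwidth].copy()
    # Indicator for being at or above the cutoff (the jump term)
    sample["right"] = (sample[running] >= cutoff).astype(int)
    # Center the running variable so the "right" coefficient is the jump at the cutoff
    sample["centered"] = sample[running] - cutoff
    # Interaction lets the slope differ on each side of the cutoff
    sample["right_x_centered"] = sample["right"] * sample["centered"]
    # Triangular kernel: weights fall linearly to zero at the bandwidth edge
    weights = triangular_kernel(sample["centered"] / bandwidth)
    X = sm.add_constant(sample[["right", "centered", "right_x_centered"]])
    # Weighted least squares with heteroskedasticity-robust (HC1) standard errors
    fit = sm.WLS(sample[outcome], X, weights=weights).fit(cov_type="HC1")
    # Return the jump estimate, its confidence interval, and the full fit
    return fit.params["right"], fit.conf_int().loc["right"], fit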

tau, ci, rd_fit = local_linear_rd(
    df,
    outcome="test_score",
    running="assignment_score",
    cutoff=70,
    bandwidth=8,
)
print({"rd_effect": tau, "ci_low": ci[0], "ci_high": ci[1]})
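The code above assumes a DataFrame df already exists. To exercise it without real data, one option is a synthetic df with the same column names. Everything below is hypothetical: the helper `make_synthetic_df` is illustrative and every coefficient is invented, so estimates computed on it say nothing about a real program.

```python
import numpy as np
import pandas as pd

def make_synthetic_df(n=5_000, n_schools=50, seed=0):
    """Hypothetical education data using the column names from the examples above."""
    rng = np.random.default_rng(seed)
    school_id = rng.integers(0, n_schools, size=n)
    baseline_score = rng.normal(60, 10, size=n)
    age = rng.integers(14, 18, size=n)
    income = rng.lognormal(mean=10.5, sigma=0.5, size=n)
    motivation = rng.normal(size=n)                  # unobserved; makes enrollment endogenous
    lottery_offer = rng.integers(0, 2, size=n)       # randomized offer (the instrument)
    program_enroll = (-0.5 + 1.5 * lottery_offer + 0.8 * motivation
                      + rng.normal(size=n) > 0).astype(int)
    assignment_score = np.clip(baseline_score + rng.normal(0, 5, size=n), 0, 100)
    eligible = (assignment_score >= 70).astype(int)  # scholarship eligibility at the 70 cutoff
    test_score = (10 + 0.7 * baseline_score + 4.0 * program_enroll + 3.0 * eligible
                  + 2.0 * motivation + rng.normal(0, 5, size=n))
    return pd.DataFrame({
        "test_score": test_score,
        "program_enroll": program_enroll,
        "lottery_offer": lottery_offer,
        "baseline_score": baseline_score,
        "age": age,
        "income": income,
        "school_id": school_id,
        "assignment_score": assignment_score,
    })

df = make_synthetic_df()
print(df.head())
```

With this df in scope, the IV and RD snippets should run as written, provided linearmodels and statsmodels are installed. The offer here is randomized at the student level, so clustering by school_id is only illustrative.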

04 / Case

Case: lottery offers and score cutoffs in an education program

IV setting: a school program is oversubscribed, so lottery offers instrument actual enrollment. The offer affects participation but should not affect scores except through participation.
The IV report should show the reduced form, the first stage, the 2SLS/LATE estimate, weak-instrument diagnostics, the argument for the exclusion restriction, and the complier interpretation.
RD setting: scholarship eligibility is determined by a 70-point cutoff. Students at 69.8 and 70.2 should be locally comparable, while eligibility jumps at the cutoff.
The RD report should show threshold plots, local linear estimates, bandwidth sensitivity, covariate continuity, and whether the running-variable density jumps at the cutoff.
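A crude sketch of the density check, assuming the running-variable density is roughly flat within a narrow window b of the cutoff: without manipulation, a unit in the window is about equally likely to fall on either side, so the right-side count behaves like a Binomial(N, 1/2) draw. The manipulation pattern (scores just below 70 nudged above it) and the helper `cutoff_count_check` are hypothetical. Formal tests such as McCrary's density test fit a density on each side instead of assuming local flatness.

```python
import numpy as np

def cutoff_count_check(x, c, b):
    """Compare counts in [c-b, c) and [c, c+b); returns (left, right, z)."""
    left = int(np.sum((x >= c - b) & (x < c)))
    right = int(np.sum((x >= c) & (x < c + b)))
    total = left + right
    # Under a locally flat density, right ~ Binomial(total, 0.5)
    z = (right - 0.5 * total) / np.sqrt(0.25 * total)
    return left, right, z

rng = np.random.default_rng(6)
c, b = 70.0, 2.0
clean = rng.uniform(40, 100, size=20_000)

# Hypothetical manipulation: about half of the scores in [68, 70) are pushed into [70, 71).
nudged = clean.copy()
pushed = (nudged >= 68) & (nudged < 70) & (rng.random(nudged.size) < 0.5)
nudged[pushed] = rng.uniform(70, 71, size=pushed.sum())

_, _, z_clean = cutoff_count_check(clean, c, b)
_, _, z_nudged = cutoff_count_check(nudged, c, b)
print(f"z without manipulation: {z_clean:.2f}, with manipulation: {z_nudged:.2f}")
```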

05 / Risks

Common Pitfalls

Using a strong predictor as an instrument without a credible exclusion story.

Ignoring a weak first stage and interpreting noisy 2SLS coefficients as precise causal effects.

Showing a clean RD plot while hiding manipulation, sorting, or bandwidth sensitivity near the cutoff.
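A sketch of the most basic weak-instrument screen: the first-stage F-statistic for a single excluded instrument, shown on two hypothetical first stages (coefficients 0.5 and 0.01). A common rule of thumb, associated with Staiger and Stock (1997), treats F below about 10 as a warning sign. This homoskedastic version is only a first look; with clustered or heteroskedastic errors, as in the clustered IV example, a robust first-stage F is the more appropriate diagnostic. The helper `first_stage_f` is illustrative.

```python
import numpy as np

def first_stage_f(d, z):
    """First-stage F for one excluded instrument (no controls, homoskedastic errors).

    With a single instrument, F equals the squared t-statistic on z.
    """
    X = np.column_stack([np.ones(d.size), z])
    coef, *_ = np.linalg.lstsq(X, d, rcond=None)
    resid = d - X @ coef
    sigma2 = resid @ resid / (d.size - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return (coef[1] / se) ** 2

rng = np.random.default_rng(8)
n = 2_000
z = rng.integers(0, 2, size=n).astype(float)
strong_f = first_stage_f(0.5 * z + rng.normal(size=n), z)
weak_f = first_stage_f(0.01 * z + rng.normal(size=n), z)
print(f"strong: F = {strong_f:.1f}   weak: F = {weak_f:.1f}")
```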