Instrumental Variables and Regression Discontinuity
IV isolates plausibly exogenous treatment variation, while RD uses local continuity around a cutoff. Both methods answer the same question: where does the counterfactual come from?
Mechanism Lab
Animation: IV isolates exogenous variation and RD reads the cutoff jump
The left panel shows Z shifting D before Y; the right panel shows treatment probability and outcome jumps around a running-variable cutoff.
Step 1 / 5: Instrument
The instrument must create a first stage by shifting treatment probability or intensity.
Cov(Z,D) != 0
01 / Intuition
Core Intuition
IV is useful when treatment D is endogenous: it estimates the effect using only the part of D that is moved by an instrument Z.
RD is useful when a clear cutoff c changes treatment status or treatment probability for units close to the threshold.
Neither design is validated by running a particular regression command; credibility comes from the instrument assumptions, continuity at the cutoff, first-stage strength, and local-sample diagnostics.
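To make the endogeneity problem concrete, here is a minimal simulated sketch. Nothing in it comes from real data: the confounder `u`, the coefficients, and the sample size are assumptions chosen for illustration. It compares the OLS slope of Y on D with the ratio Cov(Z,Y)/Cov(Z,D), which uses only the Z-driven part of D.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data-generating process (all names and numbers are illustrative).
z = rng.normal(size=n)                       # instrument: independent of the confounder
u = rng.normal(size=n)                       # unobserved confounder
d = 0.8 * z + u + rng.normal(size=n)         # treatment depends on Z and on U
true_effect = 2.0
y = true_effect * d + 3.0 * u + rng.normal(size=n)   # U also drives the outcome

# OLS slope of Y on D: contaminated by the correlation between D and U.
ols_slope = np.cov(d, y)[0, 1] / np.var(d, ddof=1)

# IV slope: keeps only the variation in D that moves with Z.
iv_slope = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

print(f"true effect: {true_effect:.2f}")
print(f"OLS slope:   {ols_slope:.2f}")
print(f"IV slope:    {iv_slope:.2f}")
```

In a setup like this, the OLS slope should land well above the assumed effect because D and Y share the confounder, while the IV ratio should land close to it, up to sampling noise.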
02 / Math
LATE for IV and local jumps for RD
01 / First stage
Let Z be the instrument, D treatment, and Y the outcome. The instrument must shift treatment; without a first stage there is no identified treatment variation.
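A rough sketch of how the first stage might be checked on simulated data. The take-up rates (20% without the offer, 70% with it) are assumptions, and the F statistic shown is the classical homoskedastic one for a single instrument. A commonly cited rule of thumb treats a first-stage F below about 10 as a warning sign, but that is a heuristic rather than a formal test.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

z = rng.integers(0, 2, size=n)                       # binary instrument, e.g. an offer
# Hypothetical take-up: 20% are treated without the offer, 70% with it.
d = (rng.random(n) < 0.2 + 0.5 * z).astype(float)

# First stage as a difference in means: E[D|Z=1] - E[D|Z=0].
first_stage = d[z == 1].mean() - d[z == 0].mean()

# The same quantity from an OLS regression of D on a constant and Z,
# plus a classical F statistic for the single instrument.
X = np.column_stack([np.ones(n), z])
coef, *_ = np.linalg.lstsq(X, d, rcond=None)
resid = d - X @ coef
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)
f_stat = coef[1] ** 2 / cov[1, 1]

print(f"first stage: {first_stage:.3f}, F statistic: {f_stat:.1f}")
```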
First stage: E[D|Z=1] - E[D|Z=0] != 0
02 / Wald / LATE
Under independence, exclusion, and monotonicity, the reduced-form jump divided by the first-stage jump identifies the local average treatment effect for compliers.
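The complier interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the compliance shares, the type-specific effects (4 for compliers, 1 for always-takers, 0 for never-takers), and the baseline shift for always-takers are all made up. There are no defiers, and Z affects Y only through D.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical compliance types (monotonicity: no defiers).
types = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.2, 0.3])
z = rng.integers(0, 2, size=n)                       # randomized binary instrument

# Treatment actually taken by each type.
d = np.where(types == "always", 1, np.where(types == "never", 0, z))

# Heterogeneous effects by type (assumed values).
effect = np.select([types == "complier", types == "always"], [4.0, 1.0], default=0.0)
y0 = rng.normal(size=n) + (types == "always") * 2.0  # baseline outcome differs by type
y = y0 + effect * d                                  # Z enters only through D (exclusion)

reduced_form = y[z == 1].mean() - y[z == 0].mean()
first_stage = d[z == 1].mean() - d[z == 0].mean()
wald = reduced_form / first_stage

# Naive treated-vs-untreated comparison, for contrast.
naive = y[d == 1].mean() - y[d == 0].mean()

print(f"Wald / LATE: {wald:.2f}, naive difference: {naive:.2f}")
```

Under these assumptions the Wald ratio should sit near the complier effect rather than the population average effect, and the naive comparison should differ from both because always-takers have a different baseline.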
tau_LATE = {E[Y|Z=1] - E[Y|Z=0]} / {E[D|Z=1] - E[D|Z=0]}
03 / Covariance form
With a single instrument in a linear model, the IV estimand is a covariance ratio, which reduces to the Wald ratio when Z is binary. It uses only the variation in D explained by Z.
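For a binary instrument with P(Z=1) = p, Cov(Z,Y) = p(1-p){E[Y|Z=1] - E[Y|Z=0]}, and likewise for D, so the p(1-p) factors cancel in the ratio. The short check below illustrates this numerically on simulated data; the data-generating process is an arbitrary assumption, and only the equality of the two ratios matters.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

z = rng.integers(0, 2, size=n)                       # binary instrument
u = rng.normal(size=n)                               # unobserved confounder
d = (0.3 * u + 1.0 * z + rng.normal(size=n) > 0.5).astype(float)
y = 1.5 * d + u + rng.normal(size=n)

# Wald ratio: difference in mean outcomes over difference in mean treatment.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())

# Covariance ratio computed from the same sample.
cov_ratio = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

print(f"Wald: {wald:.6f}, covariance ratio: {cov_ratio:.6f}")
```

The two numbers should agree to floating-point precision in any sample, not only in expectation.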
beta_IV = Cov(Z,Y) / Cov(Z,D)
04 / 2SLS derivation
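Two-stage least squares first projects D onto the instruments and controls, then explains Y using the projected treatment. In matrix form, X collects the treatment and controls, Z collects the instruments and controls, and P_Z = Z (Z^T Z)^(-1) Z^T is the projection onto the column space of Z.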
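A minimal numpy sketch of this projection, assuming one endogenous treatment `d`, one exogenous control `w`, and one excluded instrument `z` (all simulated, with made-up coefficients). Here the matrix `X` stacks the constant, the control, and the treatment, while `Z` stacks the constant, the control, and the instrument. The n x n matrix P_Z is never formed; the projection is computed with a least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

w = rng.normal(size=n)                               # exogenous control
z = rng.normal(size=n)                               # excluded instrument
u = rng.normal(size=n)                               # unobserved confounder
d = 0.7 * z + 0.5 * w + u + rng.normal(size=n)       # endogenous treatment
y = 1.0 + 2.0 * d - 1.0 * w + 2.0 * u + rng.normal(size=n)

X = np.column_stack([np.ones(n), w, d])              # constant, control, treatment
Z = np.column_stack([np.ones(n), w, z])              # constant, control, instrument

# X_hat = P_Z X: fitted values from regressing each column of X on Z.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

# beta_2SLS = (X_hat^T X)^(-1) X_hat^T Y
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

# A literal second stage (OLS of Y on X_hat) gives the same point estimates.
beta_two_step = np.linalg.lstsq(X_hat, y, rcond=None)[0]

print("2SLS coefficients (const, w, d):", np.round(beta_2sls, 3))
```

The two routes agree on coefficients, but standard errors taken from a hand-run second stage are not valid because they use residuals built from D_hat rather than D. That is one reason to rely on a dedicated 2SLS routine for inference.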
D_hat = P_Z D
beta_2SLS = (X_hat^T X)^(-1) X_hat^T Y, X_hat = P_Z X
05 / Sharp RD
If treatment is fully determined by the running variable, D = 1{X >= c}, the treatment effect at the cutoff is the right-limit mean outcome minus the left-limit mean outcome.
tau_RD = lim_{x down c} E[Y|X=x] - lim_{x up c} E[Y|X=x]
06 / Fuzzy RD
If the cutoff changes treatment probability but does not perfectly determine treatment, RD becomes a local Wald ratio.
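One way to sketch the local Wald ratio is to estimate the jump in Y and the jump in D with the same triangular-kernel local linear fit and divide. The helper `local_jump`, the simulated running variable, the 0.2 to 0.8 jump in treatment probability, and the effect size of 3 are all assumptions for illustration.

```python
import numpy as np

def local_jump(x, v, cutoff, h):
    """Triangular-kernel local linear estimate of the jump in v at the cutoff."""
    xc = x - cutoff
    keep = np.abs(xc) < h
    xc, v = xc[keep], v[keep]
    right = (xc >= 0).astype(float)
    w = 1 - np.abs(xc) / h                           # triangular kernel weights
    X = np.column_stack([np.ones_like(xc), right, xc, right * xc])
    sw = np.sqrt(w)                                  # weighted least squares via rescaling
    coef, *_ = np.linalg.lstsq(X * sw[:, None], v * sw, rcond=None)
    return coef[1]                                   # coefficient on the cutoff indicator

rng = np.random.default_rng(5)
n = 100_000
cutoff, h = 70.0, 8.0

x = rng.uniform(40, 100, size=n)                     # hypothetical running variable
# Crossing the cutoff raises treatment probability from 0.2 to 0.8 (assumed values).
d = (rng.random(n) < np.where(x >= cutoff, 0.8, 0.2)).astype(float)
y = 0.05 * (x - cutoff) + 3.0 * d + rng.normal(size=n)

jump_y = local_jump(x, y, cutoff, h)                 # reduced-form jump in the outcome
jump_d = local_jump(x, d, cutoff, h)                 # first-stage jump in treatment
tau_frd = jump_y / jump_d

print(f"jump_Y: {jump_y:.3f}, jump_D: {jump_d:.3f}, fuzzy RD estimate: {tau_frd:.3f}")
```

This sketch gives a point estimate only. Inference for the ratio needs a method that accounts for both jumps being estimated, which is not shown here.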
tau_FRD = jump_Y(c) / jump_D(c)
07 / Local linear estimation
Applied RD usually fits local linear regressions within bandwidth h and uses a triangular kernel to weight observations near the cutoff more heavily.
min sum_i K((X_i-c)/h) [Y_i - alpha - tau 1{X_i>=c} - beta_l(X_i-c) - beta_r 1{X_i>=c}(X_i-c)]^2
03 / Code
Python code: IV 2SLS and RD local linear estimation
The IV example uses linearmodels for 2SLS; the RD example writes a triangular-kernel local linear regression with statsmodels. Applied papers should add weak-instrument tests, bandwidth sensitivity, and manipulation diagnostics.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from linearmodels.iv import IV2SLS

# Assumes df is a pandas DataFrame that already holds the columns named below.

# IV example:
# outcome: test_score
# treatment: program_enroll
# instrument: lottery_offer
# controls: baseline_score, age, income
iv_formula = (
    "test_score ~ 1 + baseline_score + age + income "
    "+ [program_enroll ~ lottery_offer]"
)
iv_model = IV2SLS.from_formula(iv_formula, data=df).fit(
    cov_type="clustered",
    clusters=df["school_id"],
)
print(iv_model.summary)

# RD example:
# running variable: assignment_score
# cutoff: 70
# outcome: test_score
def triangular_kernel(u):
    # Weight falls linearly from 1 at the cutoff to 0 at |u| = 1.
    return np.maximum(1 - np.abs(u), 0)


def local_linear_rd(data, outcome, running, cutoff, bandwidth):
    # Strict inequality: rows exactly one bandwidth away would get zero weight anyway.
    sample = data[np.abs(data[running] - cutoff) < bandwidth].copy()
    sample["right"] = (sample[running] >= cutoff).astype(int)
    sample["centered"] = sample[running] - cutoff
    sample["right_x_centered"] = sample["right"] * sample["centered"]
    weights = triangular_kernel(sample["centered"] / bandwidth)
    X = sm.add_constant(sample[["right", "centered", "right_x_centered"]])
    fit = sm.WLS(sample[outcome], X, weights=weights).fit(cov_type="HC1")
    # The coefficient on "right" is the estimated jump at the cutoff.
    return fit.params["right"], fit.conf_int().loc["right"], fit


tau, ci, rd_fit = local_linear_rd(
    df,
    outcome="test_score",
    running="assignment_score",
    cutoff=70,
    bandwidth=8,
)
print({"rd_effect": tau, "ci_low": ci[0], "ci_high": ci[1]})

04 / Case
Case: lottery offers and score cutoffs in an education program
- IV setting: a school program is oversubscribed, so lottery offers instrument actual enrollment. The offer affects participation but should not affect scores except through participation.
- The IV report should show the reduced form, first stage, 2SLS/LATE, weak-instrument risk, exclusion restriction, and complier interpretation.
- RD setting: scholarship eligibility is determined by a 70-point cutoff. Students at 69.8 and 70.2 should be locally comparable, while eligibility jumps at the cutoff.
- The RD report should show threshold plots, local linear estimates, bandwidth sensitivity, covariate continuity, and whether running-variable density jumps at the cutoff.
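The RD checks listed above can be sketched on simulated data. Everything here is hypothetical: the helper `rd_jump`, the uniform scores, the baseline covariate, and the assumed 5-point effect of eligibility. The helper fits separate weighted lines on each side of the cutoff and differences the intercepts, which gives the same jump as a single regression with an interaction term. The density check is a crude count comparison, not a formal manipulation test.

```python
import numpy as np

def rd_jump(x, v, cutoff, h):
    """Jump in v at the cutoff from separate triangular-kernel linear fits on each side."""
    xc = x - cutoff
    intercepts = []
    for side in (xc < 0, xc >= 0):
        m = side & (np.abs(xc) < h)
        k = 1 - np.abs(xc[m]) / h                    # triangular kernel weights
        # np.polyfit weights multiply the unsquared residuals, so pass sqrt(k).
        slope, intercept = np.polyfit(xc[m], v[m], deg=1, w=np.sqrt(k))
        intercepts.append(intercept)
    return intercepts[1] - intercepts[0]

rng = np.random.default_rng(6)
n = 60_000
score = rng.uniform(40, 100, size=n)                 # running variable
baseline = 50 + 0.3 * score + rng.normal(scale=5, size=n)   # predetermined covariate
eligible = (score >= 70).astype(float)
outcome = 0.4 * score + 5.0 * eligible + rng.normal(scale=5, size=n)

# Bandwidth sensitivity: estimates should be broadly stable across reasonable h.
by_bandwidth = {h: rd_jump(score, outcome, 70.0, h) for h in (4, 6, 8, 12)}

# Covariate continuity: a predetermined covariate should show no jump.
covariate_jump = rd_jump(score, baseline, 70.0, 8)

# Crude density check: counts just below and just above the cutoff.
n_below = np.sum((score >= 69) & (score < 70))
n_above = np.sum((score >= 70) & (score < 71))

print({h: round(float(t), 2) for h, t in by_bandwidth.items()})
print(f"covariate jump: {covariate_jump:.2f}, counts: {n_below} below, {n_above} above")
```

Because scores are simulated without sorting around the cutoff, the covariate jump should be near zero and the two counts should be similar. A large covariate jump or a pile-up on one side of the cutoff in real data would be a reason to doubt local comparability.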
05 / Risks
Common Pitfalls
References
- Angrist, Imbens, and Rubin (1996), Identification of Causal Effects Using Instrumental Variables. https://doi.org/10.1080/01621459.1996.10476902
- Imbens and Lemieux (2008), Regression Discontinuity Designs. https://doi.org/10.1016/j.jeconom.2007.05.001
- Lee and Lemieux (2010), Regression Discontinuity Designs in Economics. https://doi.org/10.1257/jel.48.2.281