Synthetic Control and SDID

Synthetic control builds a shadow treated unit from weighted untreated donors, using pre-policy fit to construct the post-policy counterfactual.

Mechanism Lab

Animation: how donor weights build the synthetic counterfactual

The animation reveals donor units, pushes their weights into the synthetic path, then compares the treated post-policy path with the synthetic counterfactual.

Step 1 of 5: Donor pool

Start with untreated units (j = 2, ..., J+1) that are institutionally comparable and unaffected by the policy.

01 / Intuition

Core Intuition

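When only one city, school, region, or country is treated, standard difference-in-differences (DID) may lack a natural control group. Synthetic control builds a comparison unit as a weighted average of donor units, with weights chosen so that the average tracks the treated unit before the policy.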

Weights should fit pre-policy outcomes and covariates, not post-policy outcomes.

Credibility comes from pre-fit quality, donor-pool justification, absence of concurrent shocks, placebo tests, and transparent reporting of weights and sample choices.

02 / Math

From donor weights to the post-treatment counterfactual

01 / Panel structure

Unit 1 is treated and units 2, ..., J+1 are untreated donors. Periods run t = 1, ..., T, and T0 is the last pre-treatment period.

Y_1t: treated unit outcome
Y_jt: donor unit outcome, j=2,...,J+1
t <= T0: pre-period,  t > T0: post-period

02 / Weight constraints

Synthetic-control weights are usually nonnegative and sum to one, making the synthetic unit a convex combination of donor units.

w_j >= 0,  sum_{j=2}^{J+1} w_j = 1

03 / Pre-treatment fit

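Let X_1 be the vector of treated-unit pre-policy features and X_0 the donor feature matrix, with one column per donor. V is a positive semidefinite matrix that sets the relative importance of each feature. Choose the weights so that the weighted donors match the treated unit's pre-period features.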

w_hat = argmin_w (X_1 - X_0 w)^T V (X_1 - X_0 w)
s.t. w >= 0, 1^T w = 1
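
The constrained fit above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the helper name `fit_weights`, the toy matrices, and the default choice V = identity are all introduced here for the example. The toy treated unit is built as an exact mix of two donors so the recovered weights are easy to check.

```python
import warnings

import numpy as np
from scipy.optimize import minimize


def fit_weights(X1, X0, V=None):
    """Minimize (X1 - X0 w)' V (X1 - X0 w) subject to w >= 0 and sum(w) = 1.

    X1: (k,) treated-unit features; X0: (k, J) donor features, one column per donor.
    V defaults to the identity matrix (an assumption of this sketch).
    """
    X1 = np.asarray(X1, dtype=float)
    X0 = np.asarray(X0, dtype=float)
    k, n_donors = X0.shape
    V = np.eye(k) if V is None else np.asarray(V, dtype=float)

    def objective(w):
        gap = X1 - X0 @ w
        return gap @ V @ gap

    result = minimize(
        objective,
        x0=np.full(n_donors, 1.0 / n_donors),  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_donors,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    if not result.success:
        warnings.warn(f"SLSQP did not converge: {result.message}")
    # Remove tiny negative values from numerical error, then renormalize.
    w = np.clip(result.x, 0.0, None)
    return w / w.sum()


# Toy data (hypothetical): 4 features, 3 donors.
X0 = np.array([
    [1.0, 4.0, 9.0],
    [2.0, 1.0, 7.0],
    [3.0, 5.0, 2.0],
    [0.5, 2.5, 6.0],
])
# The treated unit is exactly 0.7 * donor 0 + 0.3 * donor 1.
X1 = 0.7 * X0[:, 0] + 0.3 * X0[:, 1]

w_hat = fit_weights(X1, X0)
print(np.round(w_hat, 3))
```

With real data the treated unit rarely lies exactly inside the donors' convex hull, so the fitted gap is not zero and the choice of V affects which features are matched most closely.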

04 / Counterfactual path

After treatment, the weighted donor outcome path estimates what the treated unit would have experienced without treatment.

Y_1t(0)_hat = sum_{j=2}^{J+1} w_hat_j Y_jt,  for t > T0

05 / Effect path

The treatment effect in each post-period is the observed treated outcome minus the synthetic counterfactual. Averaging these gaps over the post-periods gives the average effect on the treated unit.

tau_t_hat = Y_1t - Y_1t(0)_hat
ATT_post = (1/(T-T0)) sum_{t=T0+1}^{T} tau_t_hat
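
The counterfactual path, effect path, and post-period average can be worked through on a small hand-checkable example. The numbers, the weights `w_hat`, and the variable names below are hypothetical and chosen only so the arithmetic is easy to follow; rows are periods and columns are donors.

```python
import numpy as np

# Hypothetical donor outcomes: 6 periods (rows) x 3 donors (columns).
Y0 = np.array([
    [10.0, 20.0, 30.0],
    [11.0, 21.0, 31.0],
    [12.0, 22.0, 32.0],
    [13.0, 23.0, 33.0],
    [14.0, 24.0, 34.0],
    [15.0, 25.0, 35.0],
])
# Hypothetical treated-unit outcomes for the same 6 periods.
Y1 = np.array([15.0, 16.0, 17.0, 18.0, 16.0, 15.0])

w_hat = np.array([0.5, 0.5, 0.0])  # assumed to come from the pre-period fit
T0 = 4                             # first 4 periods are pre-treatment

synthetic = Y0 @ w_hat             # weighted donor path in every period
gap = Y1 - synthetic               # zero before treatment in this toy example
tau_hat = gap[T0:]                 # effect path over the post-periods
att_post = tau_hat.mean()          # average over the T - T0 post-periods

print(tau_hat)   # per-period effects
print(att_post)  # post-period average effect
```

Here the synthetic path is 15, 16, 17, 18, 19, 20, so the post-period gaps are -3 and -5 and their average is -4.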

06 / Placebo inference

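Reassign treatment to each donor in turn and re-estimate its synthetic control. For each unit i, compare the root mean squared prediction error (RMSPE) of the gap after treatment with the RMSPE before treatment. The evidence is stronger when the treated unit's ratio is large relative to the placebo ratios.

ratio_i = RMSPE_post,i / RMSPE_pre,i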
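
A placebo run over units can be sketched as below. This is an illustration on simulated data, not a definitive procedure: the helper names `fit_weights` and `rmspe_ratio`, the simulated panel, and the effect size are assumptions of this example. It also makes one design choice worth stating: the truly treated unit is left out of the donor pool in placebo runs so that its post-period effect does not contaminate the placebo fits; some applications instead move it into the pool.

```python
import warnings

import numpy as np
from scipy.optimize import minimize


def fit_weights(y_pre, Y0_pre):
    """Nonnegative weights summing to one that minimize pre-period MSE."""
    n = Y0_pre.shape[1]
    res = minimize(
        lambda w: np.mean((y_pre - Y0_pre @ w) ** 2),
        x0=np.full(n, 1.0 / n),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    if not res.success:
        warnings.warn(f"SLSQP did not converge: {res.message}")
    w = np.clip(res.x, 0.0, None)
    return w / w.sum()


def rmspe_ratio(Y, unit, donor_cols, T0):
    """Post/pre RMSPE ratio for `unit`, using `donor_cols` as its donor pool."""
    w = fit_weights(Y[:T0, unit], Y[:T0, donor_cols])
    gap = Y[:, unit] - Y[:, donor_cols] @ w
    pre = np.sqrt(np.mean(gap[:T0] ** 2))
    post = np.sqrt(np.mean(gap[T0:] ** 2))
    return post / pre


# Simulated panel (rows = periods, columns = units); column 0 is treated.
rng = np.random.default_rng(0)
T, T0, n_units = 20, 14, 8
common = np.cumsum(rng.normal(size=T))            # shared trend
loadings = rng.uniform(0.5, 1.5, size=n_units)    # unit-specific exposure
Y = np.outer(common, loadings) + rng.normal(scale=0.3, size=(T, n_units))
Y[T0:, 0] += 5.0                                  # hypothetical treatment effect

treated = 0
donor_ids = [j for j in range(n_units) if j != treated]

# Ratio for the treated unit, then for each donor treated "in placebo".
ratios = {treated: rmspe_ratio(Y, treated, donor_ids, T0)}
for i in donor_ids:
    others = [j for j in donor_ids if j != i]
    ratios[i] = rmspe_ratio(Y, i, others, T0)

ratio_values = np.array(list(ratios.values()))
# Permutation-style p-value: share of units with a ratio at least as large
# as the treated unit's (the treated unit itself is counted).
p_value = np.mean(ratio_values >= ratios[treated])
print(ratios)
print(p_value)
```

With J donors the smallest attainable p-value is 1 / (J + 1), so a small donor pool limits how strong the placebo evidence can be.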

07 / SDID intuition

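Synthetic DID (SDID) combines donor weights omega with pre-period time weights lambda, blending synthetic-control weighting with DID-style before-after differencing. The post-period gap is compared with a lambda-weighted average of the pre-period gaps, so a constant level difference between the treated unit and the weighted donors cancels out.

tau_SDID = (Y_1,post - omega^T Y_0,post) - (Y_1,pre - omega^T Y_0,pre) lambda
Y_1,post, Y_0,post: post-period averages (scalar and J-vector)
Y_1,pre, Y_0,pre: pre-period outcomes (1 x T0 row vector and J x T0 matrix)
omega: donor weights (J-vector),  lambda: pre-period time weights (T0-vector)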
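
The double difference in the SDID formula can be illustrated with weights taken as given. This sketch only evaluates the formula; it does not estimate omega or lambda, which the full SDID procedure obtains from its own regularized fitting steps. The function name `sdid_estimate`, the toy numbers, and the uniform weights are assumptions of this example. Note that the arrays here are stored with periods in rows and donors in columns, the transpose of the J x T0 layout in the formula.

```python
import numpy as np


def sdid_estimate(Y1, Y0, T0, omega, lam):
    """Evaluate the SDID double difference for given weights.

    Y1: (T,) treated outcomes; Y0: (T, J) donor outcomes (periods in rows).
    omega: (J,) donor weights; lam: (T0,) pre-period time weights.
    """
    gap = Y1 - Y0 @ omega        # treated minus weighted donors, per period
    post_gap = gap[T0:].mean()   # simple average over post-periods
    pre_gap = gap[:T0] @ lam     # lambda-weighted average over pre-periods
    return post_gap - pre_gap


# Hypothetical panel: 5 periods, 2 donors, first 3 periods are pre-treatment.
Y0 = np.array([
    [1.0, 3.0],
    [2.0, 4.0],
    [3.0, 5.0],
    [4.0, 6.0],
    [5.0, 7.0],
])
# Treated unit sits 2 units above the donor average before treatment
# and 5 units above it afterwards.
Y1 = np.array([4.0, 5.0, 6.0, 10.0, 11.0])

T0 = 3
omega = np.array([0.5, 0.5])     # assumed donor weights
lam = np.full(T0, 1.0 / T0)      # assumed uniform time weights

tau_sdid = sdid_estimate(Y1, Y0, T0, omega, lam)
print(tau_sdid)
```

The pre-period gap of 2 is a level difference that convex donor weights alone could not remove; the before-after differencing subtracts it, leaving an estimate of 3.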

03 / Code

Python code: constrained optimization for synthetic-control weights

This skeleton uses `scipy.optimize.minimize` to estimate nonnegative donor weights that sum to one, then builds the synthetic path, effect path, and pre/post RMSPE.

import numpy as np
import pandas as pd
from scipy.optimize import minimize

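# Assumes `df` is a long-format DataFrame that is already loaded, with columns:
# unit, year, outcome
treated_unit = "City A"                # label of the treated unit in df["unit"]
pre_years = list(range(2010, 2020))    # 2010 through 2019
post_years = list(range(2020, 2025))   # 2020 through 2024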

panel = df.pivot(index="year", columns="unit", values="outcome").sort_index()
donors = [unit for unit in panel.columns if unit != treated_unit]

Y1_pre = panel.loc[pre_years, treated_unit].to_numpy()
Y0_pre = panel.loc[pre_years, donors].to_numpy()

def objective(weights):
    synthetic_pre = Y0_pre @ weights
    return np.mean((Y1_pre - synthetic_pre) ** 2)

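n_donors = len(donors)
# Weights must sum to one and lie in [0, 1], so the synthetic unit is a convex combination.
constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1}]
bounds = [(0, 1)] * n_donors
start = np.repeat(1 / n_donors, n_donors)

# SLSQP supports bounds together with equality constraints.
result = minimize(objective, start, method="SLSQP", bounds=bounds, constraints=constraints)
if not result.success:
    raise RuntimeError(f"Weight optimization failed: {result.message}")
weights = pd.Series(result.x, index=donors).sort_values(ascending=False)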

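# Apply the fitted weights in every year to get the synthetic path.
synthetic_path = panel[donors] @ weights
# Gap between the treated unit and its synthetic counterpart.
effect_path = panel[treated_unit] - synthetic_path

# Pre-period RMSPE measures fit quality; post-period RMSPE measures the size of the gap.
pre_rmspe = np.sqrt(np.mean(effect_path.loc[pre_years] ** 2))
post_rmspe = np.sqrt(np.mean(effect_path.loc[post_years] ** 2))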

print(weights[weights > 0.01])
print({"pre_rmspe": pre_rmspe, "post_rmspe": post_rmspe})
print(effect_path.loc[post_years])

04 / Case

Case: evaluating a city emissions policy with one treated unit

  • Question: did a city-level emissions policy introduced in 2020 reduce pollution?
  • The donor pool should include untreated cities with comparable institutions, industrial structure, and no major concurrent shocks.
  • The key graph shows not only the post-policy gap but also whether the treated city and the synthetic city track each other closely before the policy.
  • A credible report includes donor weights, pre-policy RMSPE, effect path, placebo distribution, leave-one-donor-out sensitivity, and donor-pool justification.
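
The leave-one-donor-out sensitivity check mentioned in the list above can be sketched as follows. This is an illustration on simulated data: the helper names `fit_weights` and `mean_post_gap`, the 0.01 weight threshold, and the simulated effect are assumptions of this example, not part of the case itself. The idea is to drop each donor that receives meaningful weight, refit, and see whether the estimated post-period gap changes much.

```python
import warnings

import numpy as np
from scipy.optimize import minimize


def fit_weights(y_pre, Y0_pre):
    """Nonnegative weights summing to one that minimize pre-period MSE."""
    n = Y0_pre.shape[1]
    res = minimize(
        lambda w: np.mean((y_pre - Y0_pre @ w) ** 2),
        x0=np.full(n, 1.0 / n),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    if not res.success:
        warnings.warn(f"SLSQP did not converge: {res.message}")
    w = np.clip(res.x, 0.0, None)
    return w / w.sum()


def mean_post_gap(Y1, Y0, w, T0):
    """Average treated-minus-synthetic gap over the post-periods."""
    return float(np.mean((Y1 - Y0 @ w)[T0:]))


# Simulated data (rows = periods, columns = donors).
rng = np.random.default_rng(1)
T, T0, J = 15, 10, 5
Y0 = rng.normal(size=(T, J)).cumsum(axis=0) + 10.0
true_w = np.array([0.6, 0.4, 0.0, 0.0, 0.0])
Y1 = Y0 @ true_w + rng.normal(scale=0.1, size=T)
Y1[T0:] -= 2.0  # hypothetical policy effect

baseline_w = fit_weights(Y1[:T0], Y0[:T0])
baseline_att = mean_post_gap(Y1, Y0, baseline_w, T0)

# Drop each donor with non-negligible weight and re-estimate.
loo_att = {}
for j in np.flatnonzero(baseline_w > 0.01):
    keep = [k for k in range(J) if k != j]
    w = fit_weights(Y1[:T0], Y0[:T0][:, keep])
    loo_att[int(j)] = mean_post_gap(Y1, Y0[:, keep], w, T0)

print(baseline_att)
print(loo_att)
```

If the estimate moves sharply when a single high-weight donor is removed, the result depends heavily on that donor and its inclusion deserves explicit justification.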

05 / Risks

Common Pitfalls

  • Including donors exposed to the same policy or another concurrent shock.
  • Interpreting a post-period gap when pre-treatment fit is poor.
  • Showing only the main path plot without weights, placebo checks, RMSPE, and high-weight donor sensitivity.
