Synthetic Control and SDID

Synthetic control builds a shadow treated unit from weighted untreated donors, using pre-policy fit to construct the post-policy counterfactual.

Mechanism Lab

Animation: how donor weights build the synthetic counterfactual

The animation reveals donor units, pushes their weights into the synthetic path, then compares the treated post-policy path with the synthetic counterfactual.

Donor pool

Start with untreated units that are institutionally comparable and unaffected by the policy.

j=2,...,J+1

01 / Intuition

Core Intuition

When only one city, school, region, or country is treated, standard difference-in-differences (DID) may lack a natural control group. Synthetic control builds a closer comparison by taking a weighted average of units in a donor pool.

Weights should fit pre-policy outcomes and covariates, not post-policy outcomes.

Credibility comes from pre-fit quality, donor-pool justification, absence of concurrent shocks, placebo tests, and transparent reporting of weights and sample choices.

02 / Math

From donor weights to the post-treatment counterfactual

01 / Panel structure

Unit 1 is treated and units 2, ..., J+1 are untreated donors. T0 is the last pre-treatment period and T is the last observed period.

Y_1t: treated unit outcome
Y_jt: donor unit outcome, j=2,...,J+1
t <= T0: pre-period,  t > T0: post-period

02 / Weight constraints

Synthetic-control weights are usually nonnegative and sum to one, making the synthetic unit a convex combination of donor units.

w_j >= 0,  sum_{j=2}^{J+1} w_j = 1
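
One practical consequence of the convexity constraint is that the synthetic outcome never extrapolates beyond the range of the donors in any period. The following minimal sketch checks this on made-up numbers; the arrays `Y0` and `w` are hypothetical and not taken from any dataset.

```python
import numpy as np

# Hypothetical donor outcomes: rows are periods, columns are donors.
Y0 = np.array([[4.0, 6.0, 9.0],
               [5.0, 7.0, 10.0]])
# Hypothetical weights satisfying w_j >= 0 and sum_j w_j = 1.
w = np.array([0.2, 0.5, 0.3])
assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)

# Convex combination of donors in each period.
synthetic = Y0 @ w

# In every period the synthetic value lies between the donor minimum and maximum.
inside = (synthetic >= Y0.min(axis=1)) & (synthetic <= Y0.max(axis=1))
print(synthetic, inside)
```

If the treated unit sits outside the donor range before the policy, no convex weights can reproduce it, which is one reason poor pre-treatment fit is informative.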

03 / Pre-treatment fit

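Let X_1 be the vector of treated-unit pre-policy features and X_0 the donor feature matrix, with one column per donor. V is a positive semidefinite matrix, often diagonal, that sets how much each feature counts in the fit. Choose weights that make the weighted donors match the treated unit's pre-period features.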

w_hat = argmin_w (X_1 - X_0 w)^T V (X_1 - X_0 w)
s.t. w >= 0, 1^T w = 1
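
The V-weighted objective can be sketched with `scipy.optimize.minimize`. This is an illustration under stated assumptions, not a reference implementation: the helper name `fit_weights` is hypothetical, V defaults to the identity matrix, and the toy feature matrix is constructed so that the treated unit is an exact convex combination of the first two donors.

```python
import numpy as np
from scipy.optimize import minimize

def fit_weights(X1, X0, V=None):
    """Minimize (X1 - X0 w)^T V (X1 - X0 w) subject to w >= 0 and sum(w) = 1.

    X1: (k,) treated features; X0: (k, J) donor features; V: (k, k) PSD matrix.
    """
    k, J = X0.shape
    V = np.eye(k) if V is None else V  # assumption: equal feature importance

    def loss(w):
        gap = X1 - X0 @ w
        return gap @ V @ gap

    res = minimize(
        loss,
        np.full(J, 1 / J),                # start from equal weights
        method="SLSQP",                   # supports bounds and equality constraints
        bounds=[(0, 1)] * J,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}],
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x

# Toy features (hypothetical): two features, three donors.
X0 = np.array([[1.0, 3.0, 5.0],
               [2.0, 2.0, 6.0]])
true_w = np.array([0.5, 0.5, 0.0])
X1 = X0 @ true_w                          # treated features are reproducible exactly

w_hat = fit_weights(X1, X0)
print(np.round(w_hat, 3))
```

In applied work V is often chosen in an outer loop, for example to minimize pre-period outcome prediction error; the sketch above treats it as given.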

04 / Counterfactual path

After treatment, the weighted donor outcome path estimates what the treated unit would have experienced without treatment.

Y_1t(0)_hat = sum_{j=2}^{J+1} w_hat_j Y_jt,  for t > T0

05 / Effect path

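The estimated treatment effect in each post-treatment period is the observed treated outcome minus the synthetic counterfactual. Averaging these gaps over the post-period gives the average effect on the treated unit.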

tau_t_hat = Y_1t - Y_1t(0)_hat
ATT_post = (1/(T-T0)) sum_{t=T0+1}^{T} tau_t_hat
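
As a worked example of the counterfactual and effect formulas, the sketch below uses small made-up arrays with weights taken as given. The numbers are hypothetical, and `T0` here counts pre-treatment periods so that it can be used directly as a slice boundary.

```python
import numpy as np

# Hypothetical panel with T = 5 periods and J = 2 donors.
Y1 = np.array([10.0, 11.0, 12.0, 15.0, 16.0])   # treated outcomes
Y0 = np.array([[ 9.0, 11.0],
               [10.0, 12.0],
               [11.0, 13.0],
               [12.0, 14.0],
               [13.0, 15.0]])                    # donor outcomes, shape (T, J)
w_hat = np.array([0.5, 0.5])                     # weights assumed already estimated
T0 = 3                                           # number of pre-treatment periods

synthetic = Y0 @ w_hat        # weighted donor path in every period
tau = Y1 - synthetic          # gap path: treated minus synthetic
att_post = tau[T0:].mean()    # average gap over post-treatment periods

print(tau, att_post)
```

The pre-period entries of `tau` are zero in this toy case because the weights reproduce the treated path exactly; in real data they measure pre-treatment fit.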

06 / Placebo inference

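Pretend, in turn, that each donor is the treated unit and rebuild its synthetic control from the remaining donors. For each unit i, compute the ratio of post-treatment to pre-treatment root mean squared prediction error (RMSPE), which scales the post-period gap by how well the unit was fitted beforehand. If the treated unit's ratio is large relative to the placebo ratios, the estimated effect is less likely to reflect chance or poor fit. The share of units whose ratio is at least as large as the treated unit's serves as a permutation-style p-value.

RMSPE_pre,i = sqrt( (1/T0) sum_{t=1}^{T0} (Y_it - Y_it(0)_hat)^2 )
RMSPE_post,i = sqrt( (1/(T-T0)) sum_{t=T0+1}^{T} (Y_it - Y_it(0)_hat)^2 )
ratio_i = RMSPE_post,i / RMSPE_pre,i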
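
The placebo procedure can be sketched as follows. The panel is simulated purely for illustration (a common factor, unit-specific levels and loadings, and an assumed true effect of 5), and the helper names `fit_weights` and `rmspe_ratio` are hypothetical. Weights are fitted on pre-period outcomes only, and the truly treated unit is kept out of every placebo donor pool.

```python
import numpy as np
from scipy.optimize import minimize

def fit_weights(y_pre, donors_pre):
    """Simplex-constrained least squares on pre-period outcomes."""
    n = donors_pre.shape[1]
    res = minimize(
        lambda w: np.mean((y_pre - donors_pre @ w) ** 2),
        np.full(n, 1 / n),
        method="SLSQP",
        bounds=[(0, 1)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}],
    )
    return res.x

def rmspe_ratio(Y, unit, pool, T0):
    """Post/pre RMSPE ratio when column `unit` is treated as the treated unit."""
    w = fit_weights(Y[:T0, unit], Y[:T0][:, pool])
    gap = Y[:, unit] - Y[:, pool] @ w
    pre = np.sqrt(np.mean(gap[:T0] ** 2))
    post = np.sqrt(np.mean(gap[T0:] ** 2))
    return post / pre

# Simulated panel: rows are periods, column 0 is the treated unit.
rng = np.random.default_rng(0)
T, N, T0 = 20, 8, 14                      # T0 counts pre-treatment periods
factor = np.sin(np.arange(T) / 3)         # bounded common factor
level = rng.uniform(8, 12, size=N)
load = rng.uniform(0.5, 1.5, size=N)
level[0], load[0] = level[1:].mean(), load[1:].mean()  # treated is reproducible by donors
Y = level + np.outer(factor, load) + rng.normal(scale=0.2, size=(T, N))
Y[T0:, 0] += 5.0                          # assumed true effect after treatment

treated, donor_ids = 0, list(range(1, N))
ratios = {treated: rmspe_ratio(Y, treated, donor_ids, T0)}
for j in donor_ids:
    pool = [k for k in donor_ids if k != j]   # placebo pool excludes the treated unit
    ratios[j] = rmspe_ratio(Y, j, pool, T0)

# Share of units with a ratio at least as large as the treated unit's.
p_value = np.mean([r >= ratios[treated] for r in ratios.values()])
print(ratios[treated], p_value)
```

With N units the smallest attainable p-value is 1/N, so a small donor pool limits how strong the placebo evidence can be.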

07 / SDID intuition

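Synthetic DID (SDID) combines donor weights omega with pre-period time weights lambda, blending synthetic-control weighting with DID-style before-after differencing. Instead of requiring the pre-period gap to be zero, it subtracts a lambda-weighted average of pre-period gaps from the average post-period gap.

tau_SDID = mean_post(Y_1,post - omega^T Y_0,post) - (Y_1,pre - omega^T Y_0,pre) lambda

Here mean_post averages over the post-treatment periods, and the entries of lambda are nonnegative and sum to one.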
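
To make the double differencing concrete, the sketch below evaluates the SDID expression on made-up numbers with both sets of weights taken as given. It is not a full SDID estimator: in the published method the unit weights are estimated with an intercept and a ridge penalty and the time weights are estimated from the donor panel, and none of that is reproduced here. Arrays are stored with periods in rows, so the donor term is written `Y0 @ omega`.

```python
import numpy as np

# Hypothetical data: 3 pre-periods, 2 post-periods, 2 donors.
Y1_pre = np.array([10.0, 11.0, 12.0])
Y1_post = np.array([16.0, 17.0])
Y0_pre = np.array([[ 8.0, 10.0],
                   [ 9.0, 11.0],
                   [10.0, 12.0]])         # shape (T_pre, J)
Y0_post = np.array([[11.0, 13.0],
                    [12.0, 14.0]])        # shape (T_post, J)

omega = np.array([0.5, 0.5])              # unit weights, assumed given
lam = np.array([0.2, 0.3, 0.5])           # time weights over pre-periods, assumed given

post_gap = (Y1_post - Y0_post @ omega).mean()   # average post-period synthetic gap
pre_gap = lam @ (Y1_pre - Y0_pre @ omega)       # lambda-weighted pre-period gap
tau_sdid = post_gap - pre_gap

print(post_gap, pre_gap, tau_sdid)
```

Here the synthetic unit sits one unit below the treated unit before the policy and four units below after it, so the double difference attributes three units to the policy rather than four.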

03 / Code

Python code: constrained optimization for synthetic-control weights

This skeleton uses `scipy.optimize.minimize` to estimate nonnegative donor weights that sum to one, then builds the synthetic path, effect path, and pre/post RMSPE.

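import numpy as np
import pandas as pd
from scipy.optimize import minimize

# df is assumed to be a long-format DataFrame loaded elsewhere.
# df columns:
# unit, year, outcome, treated_unit (only unit, year, outcome are used below)
treated_unit = "City A"
pre_years = list(range(2010, 2020))   # 2010-2019
post_years = list(range(2020, 2025))  # 2020-2024

# Wide panel: one row per year, one column per unit
panel = df.pivot(index="year", columns="unit", values="outcome").sort_index()
donors = [unit for unit in panel.columns if unit != treated_unit]

Y1_pre = panel.loc[pre_years, treated_unit].to_numpy()
Y0_pre = panel.loc[pre_years, donors].to_numpy()

def objective(weights):
    # Mean squared pre-period gap between treated and synthetic outcomes
    synthetic_pre = Y0_pre @ weights
    return np.mean((Y1_pre - synthetic_pre) ** 2)

n_donors = len(donors)
constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1}]  # weights sum to one
bounds = [(0, 1)] * n_donors                                   # nonnegative weights
start = np.repeat(1 / n_donors, n_donors)                      # equal-weight start

# SLSQP supports bounds together with equality constraints
result = minimize(objective, start, method="SLSQP", bounds=bounds, constraints=constraints)
if not result.success:
    raise RuntimeError(f"Weight optimization failed: {result.message}")
weights = pd.Series(result.x, index=donors)

# Synthetic path and gap over all years; the matrix product aligns on donor names
synthetic_path = panel[donors] @ weights
effect_path = panel[treated_unit] - synthetic_path

pre_rmspe = np.sqrt(np.mean(effect_path.loc[pre_years] ** 2))
post_rmspe = np.sqrt(np.mean(effect_path.loc[post_years] ** 2))

# Report donors with non-negligible weight, fit diagnostics, and the effect path
print(weights[weights > 0.01].sort_values(ascending=False))
print({"pre_rmspe": pre_rmspe, "post_rmspe": post_rmspe})
print(effect_path.loc[post_years])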

04 / Case

Case: evaluating a city emissions policy with one treated unit

Question: did a city-level emissions policy introduced in 2020 reduce pollution?
The donor pool should include untreated cities with comparable institutions, industrial structure, and no major concurrent shocks.
The key diagnostic is not only the post-policy gap but also whether the treated city and the synthetic city track each other closely before the policy.
A credible report includes donor weights, pre-policy RMSPE, effect path, placebo distribution, leave-one-donor-out sensitivity, and donor-pool justification.
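
The leave-one-donor-out check mentioned above can be sketched as follows. The data are simulated for illustration only, with 10 pre-policy and 5 post-policy periods and an assumed true effect of -3; the helper names `fit_weights` and `att_post` are hypothetical. Each donor is dropped in turn, the weights are refitted, and the post-period average gap is recomputed.

```python
import numpy as np
from scipy.optimize import minimize

def fit_weights(y_pre, donors_pre):
    """Simplex-constrained least squares on pre-period outcomes."""
    n = donors_pre.shape[1]
    res = minimize(
        lambda w: np.mean((y_pre - donors_pre @ w) ** 2),
        np.full(n, 1 / n),
        method="SLSQP",
        bounds=[(0, 1)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1}],
    )
    return res.x

def att_post(y, donors, T0):
    """Average post-period gap between the treated path and its synthetic control."""
    w = fit_weights(y[:T0], donors[:T0])
    gap = y - donors @ w
    return gap[T0:].mean()

# Simulated city panel: rows are years, columns are donor cities.
rng = np.random.default_rng(1)
T, J, T0 = 15, 7, 10                       # T0 counts pre-policy years
factor = np.cos(np.arange(T))              # bounded common shock
level = np.linspace(8, 12, J)              # donor-specific levels
load = np.tile([0.5, 1.5], J)[:J]          # donor-specific factor loadings
Y0 = level + np.outer(factor, load) + rng.normal(scale=0.2, size=(T, J))
Y1 = 10.0 + 1.0 * factor + rng.normal(scale=0.2, size=T)
Y1[T0:] -= 3.0                             # assumed true policy effect

baseline = att_post(Y1, Y0, T0)
# Drop each donor in turn and re-estimate.
loo = {j: att_post(Y1, np.delete(Y0, j, axis=1), T0) for j in range(J)}
spread = max(loo.values()) - min(loo.values())

print(round(baseline, 2), {j: round(v, 2) for j, v in loo.items()}, round(spread, 2))
```

A small spread relative to the baseline estimate suggests the result does not hinge on any single donor; a large change when a high-weight donor is removed is worth reporting explicitly.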

05 / Risks

Common Pitfalls

Including donors exposed to the same policy or another concurrent shock.

Interpreting a post-period gap when pre-treatment fit is poor.

Showing only the main path plot without weights, placebo checks, RMSPE, and high-weight donor sensitivity.