
DML Frontier: One Orthogonal Score for Many Causal Targets (AIPW, Auto-Debiasing, IV-DML, Policy Learning)

Basic DML covers just one case: the coefficient in a partially linear model. Any smooth (pathwise differentiable) causal functional has an orthogonal score that is first-order insensitive to nuisance estimation error. Understanding that score extends DML to multiple and continuous treatments, instruments, and optimal policies.

This page builds on DML and causal forests. The core idea: the target is often not a regression coefficient but a functional. Every smooth functional has an influence-function-based orthogonal score that, combined with cross-fitting, lets you estimate nuisances with ML without contaminating the target. AIPW (doubly robust), auto-debiasing, IV-DML, and policy learning are all instances of this one idea.


The principle at a glance

One thread runs through the DML frontier: write the target as a functional → take an influence-function-based Neyman-orthogonal score → cross-fit ML nuisances → debiased estimate. AIPW (robust ATE), auto-debiasing (Riesz, continuous/multiple treatments), IV-DML (endogeneity), and policy learning (assignment) are all instances.

Start Here

What you should be able to do

Understand the unified principle: target functional + Neyman-orthogonal score + cross-fitting.

Write the AIPW (doubly robust) score for the ATE and explain double robustness.

Know the Riesz representer / automatic debiasing: debias without hand-writing the propensity.

Understand IV-DML: estimate a (partialled-out) LATE under endogeneity.

Understand policy learning: from CATE to an interpretable optimal assignment rule.

Learning Path

Learning path: one orthogonal score, many causal targets

Follow this path: write the target as a functional, derive an orthogonal score from its influence function, cross-fit the nuisances, and then apply AIPW, autoDML, IV-DML, or policy learning depending on the problem.

Step 1
Functional
Write the target as a functional theta=psi(P) of the distribution.
theta=psi(P)
Step 2
Orthogonal
Use the influence function to build a score that is first-order insensitive to nuisance error.
d/dt E[psi(theta_0, eta_0 + t h)] |_{t=0} = 0
Step 3
Cross-fit
Fit the ML nuisances on the other folds and predict out of fold, so no unit's nuisance estimate uses its own data.
K folds
Step 4
Debias
AIPW / auto-debiasing give a root-n-consistent, asymptotically normal estimate with valid standard errors.
theta_hat
Step 5
Decide
IV-DML handles endogeneity; policy learning gives optimal assignment.
IV / policy
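The first four steps can be sketched end to end for the ATE. This is a minimal illustration under stated assumptions, not a reference implementation: one covariate, a cubic least-squares outcome model per arm and a Newton-fitted logistic propensity stand in for ML learners, propensities are clipped at 0.01, and the helpers `features`, `fit_logit`, and `aipw_crossfit` are hypothetical names.

```python
import numpy as np

def features(x):
    """Cubic polynomial basis; a stand-in for a flexible ML feature map."""
    return np.column_stack([np.ones_like(x), x, x**2, x**3])

def fit_logit(F, d, iters=25):
    """Logistic regression by Newton-Raphson; a stand-in for any ML classifier."""
    beta = np.zeros(F.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-F @ beta))
        hess = (F * (p * (1 - p))[:, None]).T @ F
        beta += np.linalg.solve(hess, F.T @ (d - p))
    return beta

def aipw_crossfit(Y, D, X, K=5, seed=0, clip=0.01):
    """Step 1 target: theta = E[m(1,X) - m(0,X)], the ATE."""
    n = len(Y)
    F = features(X)
    folds = np.random.default_rng(seed).integers(0, K, size=n)
    m1, m0, e = np.empty(n), np.empty(n), np.empty(n)
    for k in range(K):
        te, tr = folds == k, folds != k
        # Step 3: fit nuisances on the other folds, predict on fold k only.
        for d, m in ((1, m1), (0, m0)):
            rows = tr & (D == d)
            coef, *_ = np.linalg.lstsq(F[rows], Y[rows], rcond=None)
            m[te] = F[te] @ coef          # writes into m1 or m0 in place
        e[te] = 1 / (1 + np.exp(-F[te] @ fit_logit(F[tr], D[tr])))
    e = np.clip(e, clip, 1 - clip)        # guard against extreme propensities
    # Step 2: Neyman-orthogonal AIPW score evaluated at the cross-fit nuisances.
    psi = m1 - m0 + D * (Y - m1) / e - (1 - D) * (Y - m0) / (1 - e)
    # Step 4: theta solves E_n[psi - theta] = 0; the score variance gives the SE.
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)

rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=n)
D = (rng.uniform(size=n) < 1 / (1 + np.exp(-X))).astype(int)
Y = 1.0 * D + X + 0.5 * X**2 + rng.normal(size=n)   # true ATE = 1
theta, se = aipw_crossfit(Y, D, X)
print(f"ATE = {theta:.3f}, 95% CI = [{theta - 1.96 * se:.3f}, {theta + 1.96 * se:.3f}]")
```

Step 5 (choosing IV-DML or policy learning) depends on the question and is not part of this sketch.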

01 / Intuition

Core Intuition

Basic DML solves for theta in a partially linear model, but many targets (ATE, dose-response, LATE, optimal policy value) are functionals psi(P), not single coefficients.

Every smooth functional has an influence function, and the score it induces is first-order insensitive to nuisance perturbation (Neyman orthogonality) — the source of debiasing. AIPW is just the ATE case.

Automatic debiasing goes further: instead of hand-writing weights like 1/e(X), learn the Riesz representer directly from data, which is especially handy for complex or continuous treatments.

02 / Math

From one coefficient to a family of orthogonal scores

01 / The target is a functional, not a coefficient

Write the target as a functional theta=psi(P) of the distribution, e.g. ATE=E[m(1,X)-m(0,X)] with m(d,x)=E[Y|D=d,X=x].

theta = psi(P)

02 / AIPW / doubly robust score

The efficient-influence-function score for the ATE uses both the outcome regression m and the propensity e. Under unconfoundedness and overlap, it is consistent if either m or e is correctly specified: double robustness.

psi = m(1,X) − m(0,X) + D(Y−m(1,X))/e(X) − (1−D)(Y−m(0,X))/(1−e(X)) − theta

03 / Neyman orthogonality

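The AIPW score has zero first-order derivative with respect to perturbations of m and e at the truth, so the slower ML estimation error in the nuisances does not enter at first order. What remains is a second-order term bounded roughly by the product of the two errors, ||m_hat − m|| · ||e_hat − e||, so with cross-fitting and overlap it is enough for each nuisance to converge faster than n^(-1/4).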

d/dt E[psi(theta_0, eta_0 + t h)] |_{t=0} = 0
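Neyman orthogonality can be checked numerically. The sketch below assumes simulated data where the true m and e are known, nudges m(1, ·) along a direction h or e along a direction g, and takes a finite-difference derivative of the sample-mean score. The regression-only (plug-in) and IPW scores move at first order; the AIPW score does not. The directions h = 1 + X and g = e(1 − e) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))                  # true propensity
D = (rng.uniform(size=n) < e).astype(int)
Y = 1.0 * D + X + rng.normal(size=n)
m1, m0 = 1.0 + X, X                       # true outcome regressions m(1,X), m(0,X)

h = 1.0 + X                               # perturbation direction for m(1, .)
g = e * (1 - e)                           # perturbation direction for e(.)

def aipw(m1, m0, e):
    """AIPW score without the constant -theta (it does not affect derivatives)."""
    return m1 - m0 + D * (Y - m1) / e - (1 - D) * (Y - m0) / (1 - e)

def deriv(score, t=1e-4):
    """Central finite difference of t -> E_n[score(t)] at t = 0."""
    return (score(t).mean() - score(-t).mean()) / (2 * t)

d_plugin_m = deriv(lambda t: (m1 + t * h) - m0)        # regression-only score
d_aipw_m = deriv(lambda t: aipw(m1 + t * h, m0, e))
d_ipw_e = deriv(lambda t: D * Y / (e + t * g) - (1 - D) * Y / (1 - e - t * g))
d_aipw_e = deriv(lambda t: aipw(m1, m0, e + t * g))
print("d/dt along m: plug-in", round(d_plugin_m, 3), "| AIPW", round(d_aipw_m, 3))
print("d/dt along e: IPW    ", round(d_ipw_e, 3), "| AIPW", round(d_aipw_e, 3))
```

The AIPW derivatives are zero only up to sampling noise, which shrinks as n grows; the plug-in and IPW derivatives stay near 1 and -0.5 in this simulation.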

04 / Automatic debiasing / Riesz representer

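For a functional that is linear in the outcome regression gamma(D,X) = E[Y | D, X], theta = E[g(W; gamma)] with W = (Y, D, X), the Riesz representer alpha(D,X) is the function satisfying E[g(W; gamma)] = E[alpha(D,X) gamma(D,X)] for every gamma. It yields the debiased score g(W; gamma) + alpha(D,X)(Y − gamma(D,X)) − theta; for the ATE, alpha = D/e(X) − (1−D)/(1−e(X)), which recovers AIPW. Auto-debiasing learns alpha from data, for example by minimizing the Riesz loss E[alpha(D,X)^2 − 2 g(W; alpha)], so no propensity has to be hand-written. This is convenient for continuous or multiple treatments.

theta = E[g(W; gamma)] ;  debias with alpha: theta_hat = E_n[g(W; gamma_hat) + alpha_hat(D,X)·(Y − gamma_hat(D,X))]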
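A minimal sketch of automatic debiasing for the ATE, assuming a hypothetical linear-sieve dictionary for alpha (a cubic polynomial in X, separately for each arm). Minimizing the empirical Riesz loss then has a closed form, and the estimator never sees the propensity. Cross-fitting is omitted for brevity, and the arm-mean outcome model is deliberately crude so the correction term has something to fix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=n)
D = (rng.uniform(size=n) < 1 / (1 + np.exp(-X))).astype(float)  # propensity never used below
Y = 1.0 * D + X + rng.normal(size=n)                              # true ATE = 1

# Hypothetical dictionary b(d, x) = [d * phi(x), (1 - d) * phi(x)] with cubic phi,
# so alpha(d, x) = d * phi(x) @ rho1 + (1 - d) * phi(x) @ rho0.
phi = np.column_stack([np.ones(n), X, X**2, X**3])
M = phi.mean(axis=0)                                   # E_n[phi(X)]

# Riesz loss for the ATE: E_n[alpha(D,X)^2 - 2 * (alpha(1,X) - alpha(0,X))].
# Because D * (1 - D) = 0 it splits by arm; setting each gradient to zero gives:
rho1 = np.linalg.solve((phi * D[:, None]).T @ phi / n, M)
rho0 = -np.linalg.solve((phi * (1 - D)[:, None]).T @ phi / n, M)
alpha = D * (phi @ rho1) + (1 - D) * (phi @ rho0)      # learned Riesz representer

# Deliberately crude outcome model gamma(d, x): arm means that ignore X (confounded).
gamma1 = np.full(n, Y[D == 1].mean())
gamma0 = np.full(n, Y[D == 0].mean())
gamma = D * gamma1 + (1 - D) * gamma0

plug_in = np.mean(gamma1 - gamma0)                          # g(W; gamma) alone
debiased = np.mean(gamma1 - gamma0 + alpha * (Y - gamma))   # g + alpha * residual
print("plug-in (confounded):", round(plug_in, 3))
print("auto-debiased ATE   :", round(debiased, 3))
```

With this dictionary the learned alpha reweights each arm so that its weighted moments of phi(X) match the full-sample moments, which is why the correction removes the confounding that the arm means leave in.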

05 / IV-DML

Under endogenous treatment, use the instrument's orthogonal moment: partial the controls out of Y, D, and Z with l(X)=E[Y|X], r(X)=E[D|X], and h(X)=E[Z|X], then estimate the PLIV / LATE parameter. All three nuisances are cross-fit.

psi_IV = (Y − l(X) − theta(D − r(X)))(Z − h(X))
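A minimal IV-DML (PLIV) sketch, assuming one covariate, a simulated unobserved confounder U, and a cubic polynomial fit (`np.polyfit`) as a stand-in for ML nuisance learners; `crossfit` is a hypothetical helper. It solves the empirical version of psi_IV for theta and contrasts it with the residual-on-residual estimate, which ignores endogeneity.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 20000, 5
X = rng.normal(size=n)
U = rng.normal(size=n)                           # unobserved confounder of D and Y
Z = 0.5 * X + rng.normal(size=n)                 # instrument: shifts D, no direct effect on Y
D = 0.8 * Z + 0.5 * X + U + rng.normal(size=n)
Y = 2.0 * D + X + 0.5 * X**2 + U + rng.normal(size=n)   # true theta = 2

def crossfit(target, x, folds, deg=3):
    """Out-of-fold E[target | x] from a polynomial fit (a stand-in for an ML learner)."""
    pred = np.empty(len(target))
    for k in np.unique(folds):
        te = folds == k
        coef = np.polyfit(x[~te], target[~te], deg)
        pred[te] = np.polyval(coef, x[te])
    return pred

folds = rng.integers(0, K, size=n)
y_res = Y - crossfit(Y, X, folds)    # Y - l(X)
d_res = D - crossfit(D, X, folds)    # D - r(X)
z_res = Z - crossfit(Z, X, folds)    # Z - h(X)

# theta solves E_n[(y_res - theta * d_res) * z_res] = 0.
theta_iv = np.sum(y_res * z_res) / np.sum(d_res * z_res)
psi = (y_res - theta_iv * d_res) * z_res
se_iv = np.sqrt(np.mean(psi**2) / np.mean(d_res * z_res) ** 2 / n)
theta_plr = np.sum(y_res * d_res) / np.sum(d_res**2)   # ignores endogeneity
print(f"IV-DML theta = {theta_iv:.3f} (se {se_iv:.3f});  PLR theta = {theta_plr:.3f}")
```

In this simulation the residual-on-residual estimate is pushed upward by U, while the instrument-based estimate stays near the true theta = 2.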

06 / From CATE to optimal policy

Given the CATE tau(x) = E[Y(1) − Y(0) | X = x] and no treatment cost, the unconstrained optimal rule is pi*(x)=1{tau(x)>0}; policy learning instead maximizes the policy value V(pi) within a restricted, interpretable policy class.

V(pi) = E[Y(pi(X))] ;  pi* = argmax_pi V(pi)
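A minimal policy-learning sketch, assuming a randomized design with known propensity 0.5, a single covariate whose true CATE is tau(x) = x, and a hypothetical policy class of threshold rules 1{x > c}. AIPW scores Gamma_i estimate each unit's treatment gain; the threshold is chosen on one half of the sample, and the chosen rule's value is evaluated honestly on the other half.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
X = rng.normal(size=n)
e = np.full(n, 0.5)                          # randomized assignment: known propensity
D = (rng.uniform(size=n) < e).astype(int)
Y = X + D * X + rng.normal(size=n)           # tau(x) = x: only units with x > 0 benefit
half = rng.uniform(size=n) < 0.5             # learn on one half, evaluate on the other

def arm_models(rows):
    """Arm-wise linear outcome fits on `rows`; returns m(1, X), m(0, X) for every unit."""
    F = np.column_stack([np.ones(n), X])
    out = []
    for d in (1, 0):
        sel = rows & (D == d)
        coef, *_ = np.linalg.lstsq(F[sel], Y[sel], rcond=None)
        out.append(F @ coef)
    return out

# AIPW scores Gamma_i; each half uses outcome models fit on the other half.
m1, m0 = np.empty(n), np.empty(n)
for fit_rows, score_rows in ((~half, half), (half, ~half)):
    p1, p0 = arm_models(fit_rows)
    m1[score_rows], m0[score_rows] = p1[score_rows], p0[score_rows]
gamma = m1 - m0 + D * (Y - m1) / e - (1 - D) * (Y - m0) / (1 - e)

# Policy class: pi_c(x) = 1{x > c}. Gain over treating nobody = E[pi(X) * Gamma].
grid = np.linspace(-2, 2, 81)
c_hat = grid[np.argmax([np.mean((X[half] > c) * gamma[half]) for c in grid])]

# Honest evaluation of the chosen rule on the held-out half, with a standard error.
gain = (X[~half] > c_hat) * gamma[~half]
value, se = gain.mean(), gain.std(ddof=1) / np.sqrt(gain.size)
print(f"chosen threshold c = {c_hat:.2f}; held-out gain = {value:.3f} (se {se:.3f})")
print(f"treat-everyone gain on held-out half = {gamma[~half].mean():.3f}")
```

Restricting the search to one-parameter thresholds keeps the class small, so the in-sample maximum overstates the value only mildly; the held-out estimate is the number to report.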

03 / Code

Code cases: AIPW double robustness and policy value

Compute AIPW scores for the ATE, demonstrate double robustness, and compute a simple policy value from the CATE. These cases plug in fixed nuisance values; in practice the nuisances are cross-fit.

Case 1: AIPW combines an outcome model and a propensity model

The AIPW score is the outcome-model difference plus a propensity-weighted residual correction.

import numpy as np
m1, m0 = 5.0, 3.0          # outcome predictions for one unit
Y, D, e = 5.4, 1, 0.7      # observed
psi = (m1 - m0) + D * (Y - m1) / e - (1 - D) * (Y - m0) / (1 - e)
print("AIPW contribution:", round(psi, 3))

Expected output

AIPW contribution: 2.571

How to read this code

The first term is the effect implied by the outcome model.
The second term corrects the outcome-model residual with propensity weighting.
The two models insure each other — the source of double robustness.

Case 2: double robustness — one wrong model is still consistent

Deliberately misspecify the outcome model; with a correct propensity, AIPW stays close to the truth.

import numpy as np
rng = np.random.default_rng(2)
n = 20000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))           # correct propensity
D = (rng.uniform(size=n) < e).astype(int)
Y = 1.0 * D + X + rng.normal(size=n)   # true ATE = 1
m1 = m0 = np.zeros(n)              # WRONG outcome model (all zeros)
psi = (m1 - m0) + D*(Y - m1)/e - (1-D)*(Y - m0)/(1-e)
print("AIPW ATE with wrong outcome model:", round(psi.mean(), 3))

Expected output

AIPW ATE with wrong outcome model: ≈ 1.0 (three decimals are printed; the exact digits depend on the random draw)

How to read this code

The outcome model is entirely wrong (all zeros), but the propensity is correct.
AIPW still recovers the true ATE of 1.0 — double robustness.
If both nuisances are wrong, consistency is no longer guaranteed.
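The mirror-image check, as a minimal sketch on the same kind of simulated data: first the outcome model is correct but the propensity is wrong (a constant 0.5), then both are wrong.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
X = rng.normal(size=n)
e_true = 1 / (1 + np.exp(-X))
D = (rng.uniform(size=n) < e_true).astype(int)
Y = 1.0 * D + X + rng.normal(size=n)        # true ATE = 1

def aipw(m1, m0, e):
    """Sample mean of the AIPW score for the ATE, given nuisance values."""
    return np.mean((m1 - m0) + D * (Y - m1) / e - (1 - D) * (Y - m0) / (1 - e))

m1_ok, m0_ok = 1.0 + X, X                   # correct outcome model E[Y | D=d, X]
m_bad = np.zeros(n)                         # wrong outcome model
e_bad = np.full(n, 0.5)                     # wrong propensity: ignores X

outcome_ok = aipw(m1_ok, m0_ok, e_bad)
both_wrong = aipw(m_bad, m_bad, e_bad)
print("right outcome, wrong propensity:", round(outcome_ok, 3))
print("both wrong                     :", round(both_wrong, 3))
```

The first estimate stays near 1; the second drifts toward the confounded difference in means (about 1.8 in this simulation).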

Case 3: from CATE to policy value

Treat only units predicted to benefit (CATE>0) and evaluate the policy value with influence functions.

import numpy as np
rng = np.random.default_rng(3)
n = 5000
cate = rng.normal(loc=0.2, scale=1.0, size=n)   # estimated CATE
psi = cate + rng.normal(scale=0.3, size=n)       # IF values (toy)
pi = (cate > 0).astype(int)
print("treat-all value :", round(psi.mean(), 3))
print("targeted value  :", round((pi * psi).mean(), 3))

Expected output

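treat-all value : ≈ 0.20
targeted value  : ≈ 0.51
(Exact digits depend on the random draw; for CATE ~ N(0.2, 1) the population values are 0.2 and about 0.507.)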

How to read this code

Targeting units with CATE > 0 yields a higher value than treating everyone; both numbers are gains over treating no one.
Policy learning maximizes this value within an interpretable policy class.
Real applications need honest estimation and a constraint on policy-class complexity.

04 / Case

Case: continuous dose-response of a subsidy and targeted assignment

Question: a subsidy program with a continuous amount, where larger amounts often go to stronger applicants (confounding).
Use automatic debiasing / a Riesz representer to estimate the dose-response curve without hand-writing the propensity density for a continuous treatment.
If unobserved endogeneity (self-selection) is a concern, use IV-DML with an exogenous assignment-rule instrument to identify a local effect.
Finally use policy learning to turn the CATE into an interpretable "who and how much" rule, reporting overlap, the cross-fitting design, and confidence intervals for the policy value.

05 / Causal

Which frontier tool to use: match the problem

The DML frontier is not a fancier black box but the same orthogonal score instantiated for different causal targets. Common mappings follow.

01 / Average effect + high-dim controls → AIPW / DML

Use the doubly robust score plus cross-fitting for ATE/ATT; it stays consistent if either the outcome model or the propensity is estimated well.

02 / Continuous / multiple treatments → automatic debiasing (Riesz)

Avoid hand-writing a continuous-treatment propensity density; learn the Riesz representer for dose-response.

03 / Endogenous treatment → IV-DML

Use the instrument's orthogonal moment to estimate PLIV / LATE under high-dimensional controls.

psi_IV=(Y−l(X)−theta(D−r(X)))(Z−h(X))

04 / Resource targeting → policy learning

Turn the CATE into an optimal assignment within a restricted policy class and evaluate the policy value.

pi*=argmax E[Y(pi(X))]

Three red lines: (1) double robustness is not a free pass — both nuisances wrong still biases; (2) all IPW-type methods blow up under poor overlap, so diagnose overlap first; (3) always cross-fit, and use honest evaluation with uncertainty for policy learning.
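A minimal overlap diagnostic, assuming simulated data with strong selection on X (propensity logistic in 4X) and, to isolate the overlap issue, the true outcome model. It reports the propensity range, the share of units outside [0.05, 0.95], and the largest inverse-propensity weight, then trims to the overlap region. The 0.05 and 0.95 cutoffs are an illustrative choice, not a rule.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-4 * X))              # strong selection on X: poor overlap
D = (rng.uniform(size=n) < e).astype(int)
Y = 1.0 * D + X + rng.normal(size=n)      # constant effect, so the trimmed ATE is also 1

# 1) Diagnose overlap before estimating anything.
w = D / e + (1 - D) / (1 - e)             # inverse-propensity weights inside the AIPW score
outside = (e < 0.05) | (e > 0.95)
print(f"propensity range       : [{e.min():.2e}, {e.max():.6f}]")
print(f"share outside [.05,.95]: {outside.mean():.3f}")
print(f"largest weight         : {w.max():.1f}")

# 2) Trim to the overlap region and estimate there (this changes the estimand).
keep = ~outside
m1, m0 = 1.0 + X, X                       # true outcome model, to isolate the overlap issue
psi = m1 - m0 + D * (Y - m1) / e - (1 - D) * (Y - m0) / (1 - e)
theta_trim = psi[keep].mean()
se_trim = psi[keep].std(ddof=1) / np.sqrt(keep.sum())
print(f"trimmed AIPW ATE       : {theta_trim:.3f} +/- {1.96 * se_trim:.3f}")
```

Trimming changes the estimand to the average effect on the overlap region; here the effect is constant, so that target is still 1.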

06 / Risks

Common Pitfalls

Thinking DML can only estimate one coefficient in a partially linear model; it is a whole functional-plus-orthogonal-score framework.

Treating double robustness as unconditional robustness: badly wrong nuisances still bias the estimate.

Ignoring overlap: AIPW variance explodes when propensities approach 0/1, so trim or use overlap weights.

Learning nuisances and evaluating the score on the same data: flexible learners overfit each unit's own outcome, which reintroduces a bias that orthogonality alone does not remove. Always cross-fit.

Taking an estimated CATE as the optimal policy without honest evaluation and uncertainty for the policy value.
