
DML Frontier: One Orthogonal Score for Many Causal Targets (AIPW, Auto-Debiasing, IV-DML, Policy Learning)

Basic DML for the partially linear model is only one case. Any sufficiently smooth (pathwise differentiable) causal functional has an orthogonal score that is first-order insensitive to nuisance error; understanding this extends DML to multiple or continuous treatments, instruments, and optimal policy.

This page follows the DML / causal forests page. The core idea: the target is often not a regression coefficient but a functional; every sufficiently smooth functional has an influence-function-based orthogonal score that, with cross-fitting, lets you estimate nuisances with ML without their estimation error contaminating the target at first order. AIPW (doubly robust), auto-debiasing, IV-DML, and policy learning are all instances of this one idea.

Schematic

The principle at a glance

[Schematic: DML frontier, one orthogonal score, many targets. Flow: functional θ = ψ(P) → orthogonal score (influence function) → cross-fit ML (out-of-fold nuisances) → debiased θ̂ with valid CI. Instances: AIPW = robust ATE; auto-debias = Riesz; IV-DML = LATE; policy = assignment.]
One thread runs through the DML frontier: write the target as a functional → take an influence-function-based Neyman-orthogonal score → cross-fit ML nuisances → debiased estimate. AIPW (robust ATE), auto-debiasing (Riesz, continuous/multiple treatments), IV-DML (endogeneity), and policy learning (assignment) are all instances.

Start Here

What you should be able to do

01

Understand the unified principle: target functional + Neyman-orthogonal score + cross-fitting.

02

Write the AIPW (doubly robust) score for the ATE and explain double robustness.

03

Know the Riesz representer / automatic debiasing: debias without hand-writing the propensity.

04

Understand IV-DML: estimate a (partialled-out) LATE under endogeneity.

05

Understand policy learning: from CATE to an interpretable optimal assignment rule.

Learning Path

Learning path: one orthogonal score, many causal targets

Follow this path: write the target as a functional, take the influence function for an orthogonal score, cross-fit nuisances, and land on AIPW / autoDML / IV-DML / policy learning by problem.

  1. Step 1

    Functional

    Write the target as a functional theta=psi(P) of the distribution.

    theta=psi(P)

  2. Step 2

    Orthogonal

    Use the influence function for a score insensitive to nuisance error.

    d/dt E[psi]=0

  3. Step 3

    Cross-fit

    Estimate nuisances out of fold with ML to avoid contamination.

    K folds

  4. Step 4

    Debias

    AIPW / auto-debiasing give an asymptotically unbiased estimate with valid variance.

    theta_hat

  5. Step 5

    Decide

    IV-DML handles endogeneity; policy learning gives optimal assignment.

    IV / policy

01 / Intuition

Core Intuition

Basic DML solves for theta in a partially linear model, but many targets (ATE, dose-response, LATE, optimal policy value) are functionals psi(P), not single coefficients.

Every smooth functional has an influence function, and the score it induces is first-order insensitive to nuisance perturbation (Neyman orthogonality) — the source of debiasing. AIPW is just the ATE case.

Automatic debiasing goes further: instead of hand-writing weights like 1/e(X), learn the Riesz representer directly from data, which is especially handy for complex or continuous treatments.

02 / Math

From one coefficient to a family of orthogonal scores

01 / The target is a functional, not a coefficient

Write the target as a functional theta=psi(P) of the distribution, e.g. ATE=E[m(1,X)-m(0,X)] with m(d,x)=E[Y|D=d,X=x].

theta = psi(P)
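As a minimal sketch of what "the target is a functional" means, the ATE can be evaluated at the empirical distribution when X is discrete: replace m(d,x) by cell means and average over the observed X. The toy arrays and the helper name m_hat below are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical toy data: X is a binary covariate, D the treatment, Y the outcome.
X = np.array([0, 0, 0, 0, 1, 1, 1, 1])
D = np.array([0, 0, 1, 1, 0, 1, 1, 1])
Y = np.array([1.0, 3.0, 4.0, 6.0, 2.0, 5.0, 7.0, 9.0])

def m_hat(d, x):
    """Cell mean E_n[Y | D=d, X=x], the plug-in estimate of m(d, x)."""
    return Y[(D == d) & (X == x)].mean()

# psi(P_n): average m(1, X_i) - m(0, X_i) over the empirical distribution of X.
ate_plugin = np.mean([m_hat(1, x) - m_hat(0, x) for x in X])

# Naive contrast: raw treated mean minus raw control mean, ignoring X.
naive = Y[D == 1].mean() - Y[D == 0].mean()

print("plug-in ATE:", ate_plugin)
print("naive diff :", naive)
```

The two numbers differ because treatment is more common when X=1; the functional form makes explicit which distribution of X the effect is averaged over.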

02 / AIPW / doubly robust score

The efficient-influence-function score for the ATE uses both the outcome regression m and the propensity e. It is consistent if either m or e is correct — double robustness.

psi = m(1,X) − m(0,X) + D(Y−m(1,X))/e(X) − (1−D)(Y−m(0,X))/(1−e(X)) − theta

03 / Neyman orthogonality

The AIPW score has zero first-order derivative with respect to perturbations of m and e at the truth, so the slow ML estimation error in the nuisances does not enter at first order.

d/dt E[psi(theta_0, eta_0 + t h)] |_{t=0} = 0
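The derivative condition can be checked numerically. The sketch below assumes a toy design (the propensity, outcome regressions, and the constant perturbation direction are all illustrative choices, not from the text), shifts both m(1,X) and e(X) by t, and compares the resulting bias of the AIPW score with that of the plain plug-in score m(1,X) − m(0,X). D and Y are integrated out analytically so the comparison is free of sampling noise in the outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=200_000)

# Assumed "true" nuisances for this toy design (illustrative only):
e0 = 0.2 + 0.6 / (1 + np.exp(-X))   # propensity, bounded in (0.2, 0.8)
m1, m0 = 1.0 + X, X                 # outcome regressions, true ATE = 1

def aipw_bias(t):
    """E[AIPW score] - ATE when m1 and e are both shifted by t.

    Uses E[D | X] = e0 and E[Y | D=1, X] = m1 to integrate out D and Y.
    """
    m1_t, e_t = m1 + t, e0 + t
    # The control-arm correction is zero because m0 is left unperturbed.
    cond_mean = (m1_t - m0) + e0 * (m1 - m1_t) / e_t
    return cond_mean.mean() - 1.0

def plugin_bias(t):
    """E[m1_t - m0] - ATE for the non-orthogonal plug-in score."""
    return ((m1 + t) - m0).mean() - 1.0

for t in (0.1, 0.05, 0.025):
    print(f"t={t:<6} plug-in bias={plugin_bias(t):.5f}  AIPW bias={aipw_bias(t):.5f}")
```

Halving t halves the plug-in bias but cuts the AIPW bias by roughly a factor of four: in this design the AIPW bias works out to E[t²/(e(X)+t)], a product of the two nuisance errors, which is what first-order insensitivity means in practice.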

04 / Automatic debiasing / Riesz representer

For a linear functional theta=E[g(W)], a Riesz representer alpha gives a debiased score g(W)+alpha(X)(Y−...). Auto-debiasing learns alpha from data, avoiding hand-written propensities — convenient for continuous/multiple treatments.

theta = E[g(W)] ;  debias with alpha: theta_hat = E_n[g + alpha·(Y − pred)]
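One way to see how alpha can be learned without a propensity model: for the ATE, the representer minimizes E[alpha² − 2(alpha(1,X) − alpha(0,X))], and with a linear dictionary alpha = b(D,X)'rho this has a closed form. The sketch below is a simplified, unregularized illustration; the dictionary basis and the simulated design are assumptions, and practical auto-debiasing typically uses regularized or more flexible learners plus cross-fitting.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-0.5 * X))               # true propensity, only used to draw D
D = (rng.uniform(size=n) < e).astype(float)
Y = 1.0 * D + X + rng.normal(size=n)         # true ATE = 1

def basis(d, x):
    """Hypothetical dictionary b(d, x) used to approximate alpha."""
    return np.column_stack([d, d * x, 1 - d, (1 - d) * x])

B = basis(D, X)
# The ATE functional applied to each dictionary element: b(1, X) - b(0, X).
M = (basis(np.ones(n), X) - basis(np.zeros(n), X)).mean(axis=0)
G = B.T @ B / n

# First-order condition of E_n[alpha^2 - 2 (alpha(1,X) - alpha(0,X))]: G rho = M.
rho = np.linalg.solve(G, M)
alpha = B @ rho

# Deliberately useless outcome model (pred = 0), so the plug-in part g is 0
# and the whole estimate comes from the learned correction alpha * (Y - pred).
pred = np.zeros(n)
theta_hat = np.mean(0.0 + alpha * (Y - pred))
print("auto-debiased ATE:", round(theta_hat, 3))
```

No propensity is ever fitted: the weights alpha come from balancing the dictionary, which is why the same recipe carries over to functionals where the inverse-propensity form is awkward to write down.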

05 / IV-DML

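Under endogenous treatment, use an orthogonal moment built on the instrument (with X partialled out of Y, D, and Z) to estimate the coefficient of the partially linear IV model (PLIV), read as a LATE-type effect under the usual IV assumptions; the outcome, treatment, and instrument nuisances are all cross-fit.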

psi_IV = (Y − l(X) − theta(D − r(X)))(Z − h(X))
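Setting the sample average of psi_IV to zero gives theta in closed form: the ratio of residual covariances. The sketch below assumes a simulated design with an unobserved confounder U and uses cross-fitted linear regressions as a stand-in for ML nuisances; the helper crossfit_linear and all coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 10_000, 5
X = rng.normal(size=(n, 3))
U = rng.normal(size=n)                               # unobserved confounder
Z = X[:, 0] + rng.normal(size=n)                     # instrument, exogenous given X
D = Z + X[:, 1] + U + rng.normal(size=n)             # endogenous treatment
Y = 0.5 * D + X[:, 2] + 2 * U + rng.normal(size=n)   # true theta = 0.5

def crossfit_linear(target, X, folds):
    """Out-of-fold predictions of E[target | X] from a linear fit (stand-in for ML)."""
    A = np.column_stack([np.ones(len(X)), X])
    pred = np.empty(len(X))
    for k in np.unique(folds):
        test = folds == k
        coef, *_ = np.linalg.lstsq(A[~test], target[~test], rcond=None)
        pred[test] = A[test] @ coef
    return pred

folds = rng.permutation(n) % K
ry = Y - crossfit_linear(Y, X, folds)   # Y - l(X)
rd = D - crossfit_linear(D, X, folds)   # D - r(X)
rz = Z - crossfit_linear(Z, X, folds)   # Z - h(X)

# Solve E_n[(ry - theta * rd) * rz] = 0 for theta.
theta_iv = np.sum(ry * rz) / np.sum(rd * rz)

# Standard error from the score and its derivative in theta.
psi = (ry - theta_iv * rd) * rz
se = np.sqrt(np.mean(psi**2) / n) / abs(np.mean(rd * rz))

theta_ols = np.sum(ry * rd) / np.sum(rd * rd)   # partialled-out OLS, ignores endogeneity
print(f"IV-DML: {theta_iv:.3f} (se {se:.3f}), partialled-out OLS: {theta_ols:.3f}")
```

In this design the OLS version is pulled well above 0.5 by U, while the instrumented moment stays close to the true coefficient.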

06 / From CATE to optimal policy

With tau(x), the unconstrained optimal rule is pi*(x)=1{tau(x)>0}; policy learning maximizes the policy value V(pi) within a restricted, interpretable policy class.

V(pi) = E[Y(pi(X))] ;  pi* = argmax_pi V(pi)
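A minimal sketch of maximizing V(pi) within a restricted class: the class here is the set of threshold rules pi_c(x) = 1{x > c}, and the value gain over treating no one is estimated by averaging pi(X) times a per-unit score whose conditional mean is tau(X). The randomized design, the known propensity of 0.5, and the grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
X = rng.normal(size=n)
D = rng.integers(0, 2, size=n)              # randomized, known propensity 0.5
Y = X * D + rng.normal(size=n)              # tau(x) = x, so treating x > 0 is optimal

# AIPW score with a zero outcome model reduces to the IPW contrast;
# its conditional mean given X equals tau(X).
gamma = D * Y / 0.5 - (1 - D) * Y / 0.5

def value_gain(c):
    """Estimated V(pi_c) - V(treat none) for the rule pi_c(x) = 1{x > c}."""
    return np.mean((X > c) * gamma)

grid = np.linspace(-2, 2, 81)
gains = np.array([value_gain(c) for c in grid])
c_best = grid[gains.argmax()]
print("best threshold:", round(c_best, 2), " estimated gain:", round(gains.max(), 3))
```

The value curve is flat near its maximum, so the selected threshold is noisy even when the achieved value is close to optimal; the maximized in-sample gain is also an optimistic estimate of the rule's value.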

03 / Code

Code cases: AIPW double robustness and policy value

Compute the AIPW score, demonstrate double robustness, and compute a simple policy value from the CATE. The toy cases use fixed or known nuisances; when nuisances are estimated with ML, cross-fit them.

Case 1: AIPW combines an outcome model and a propensity model

The AIPW score is the outcome-model difference plus a propensity-weighted residual correction.

import numpy as np
m1, m0 = 5.0, 3.0          # outcome predictions for one unit
Y, D, e = 5.4, 1, 0.7      # observed
psi = (m1 - m0) + D * (Y - m1) / e - (1 - D) * (Y - m0) / (1 - e)
print("AIPW contribution:", round(psi, 3))

Expected output

AIPW contribution: 2.571

How to read this code

  • The first term is the effect implied by the outcome model.
  • The second term corrects the outcome-model residual with propensity weighting.
  • The two models insure each other — the source of double robustness.

Case 2: double robustness — one wrong model is still consistent

Deliberately misspecify the outcome model; with a correct propensity, AIPW stays close to the truth.

import numpy as np
rng = np.random.default_rng(2)
n = 20000
X = rng.normal(size=n)
e = 1 / (1 + np.exp(-X))           # correct propensity
D = (rng.uniform(size=n) < e).astype(int)
Y = 1.0 * D + X + rng.normal(size=n)   # true ATE = 1
m1 = m0 = np.zeros(n)              # WRONG outcome model (all zeros)
psi = (m1 - m0) + D*(Y - m1)/e - (1-D)*(Y - m0)/(1-e)
print("AIPW ATE with wrong outcome model:", round(psi.mean(), 3))

Expected output

AIPW ATE with wrong outcome model: ≈ 1.0 (exact digits depend on the random draw)

How to read this code

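  • The outcome model is entirely wrong (all zeros), but the propensity is correct; with a zero outcome model the AIPW score reduces to plain IPW.
  • AIPW still recovers the true ATE of 1.0 up to sampling noise, which is double robustness in action.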
  • If both nuisances are wrong, consistency is no longer guaranteed.
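The cases above plug in known or fixed nuisances. A sketch of the same estimator with estimated, cross-fitted nuisances might look as follows; it assumes a linear outcome model per arm and a logistic propensity fitted by plain Newton steps (both stand-ins for ML learners), with helper names fit_ols and fit_logit that are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 20_000, 5
X = rng.normal(size=n)
D = (rng.uniform(size=n) < 1 / (1 + np.exp(-X))).astype(float)
Y = 1.0 * D + X + rng.normal(size=n)          # true ATE = 1

A = np.column_stack([np.ones(n), X])          # design matrix for both nuisances

def fit_ols(A, y):
    return np.linalg.lstsq(A, y, rcond=None)[0]

def fit_logit(A, d, steps=25):
    """Plain Newton iterations for logistic regression (no regularization)."""
    beta = np.zeros(A.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-A @ beta))
        grad = A.T @ (d - p)
        hess = (A * (p * (1 - p))[:, None]).T @ A
        beta += np.linalg.solve(hess, grad)
    return beta

folds = rng.permutation(n) % K
psi = np.empty(n)
for k in range(K):
    te, tr = folds == k, folds != k
    # Nuisances are fit on the other folds only, then evaluated on fold k.
    b1 = fit_ols(A[tr & (D == 1)], Y[tr & (D == 1)])
    b0 = fit_ols(A[tr & (D == 0)], Y[tr & (D == 0)])
    e_hat = 1 / (1 + np.exp(-A[te] @ fit_logit(A[tr], D[tr])))
    e_hat = np.clip(e_hat, 0.01, 0.99)        # guard against extreme weights
    m1, m0 = A[te] @ b1, A[te] @ b0
    psi[te] = (m1 - m0 + D[te] * (Y[te] - m1) / e_hat
               - (1 - D[te]) * (Y[te] - m0) / (1 - e_hat))

theta_hat = psi.mean()
se = psi.std(ddof=1) / np.sqrt(n)             # variance of the score gives the SE
print(f"ATE = {theta_hat:.3f}, 95% CI = [{theta_hat - 1.96*se:.3f}, {theta_hat + 1.96*se:.3f}]")
```

The standard error comes directly from the sample variance of the out-of-fold scores, which is what makes the confidence interval a by-product of the orthogonal score rather than a separate bootstrap step.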

Case 3: from CATE to policy value

Treat only units predicted to benefit (CATE>0) and evaluate the policy value with influence functions.

import numpy as np
rng = np.random.default_rng(3)
n = 5000
cate = rng.normal(loc=0.2, scale=1.0, size=n)   # estimated CATE
psi = cate + rng.normal(scale=0.3, size=n)       # IF values (toy)
pi = (cate > 0).astype(int)
print("treat-all value :", round(psi.mean(), 3))
print("targeted value  :", round((pi * psi).mean(), 3))

Expected output

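treat-all value : ≈ 0.2
targeted value  : ≈ 0.5
(exact digits depend on the random draw; for CATE ~ N(0.2, 1) the population values are 0.2 and about 0.507)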

How to read this code

  • Targeting by CATE>0 yields higher value than treating everyone.
  • Policy learning maximizes this value within an interpretable policy class.
  • Real applications need honest estimation and a constraint on policy-class complexity.
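A sketch of what honest evaluation can look like: learn the rule on one half of the sample and report its value, with a standard error, on the other half. The randomized design, the linear CATE fit, and the helper gain_with_se are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000
X = rng.normal(size=n)
D = rng.integers(0, 2, size=n)                      # randomized, propensity 0.5
Y = 0.3 * X * D + rng.normal(size=n)                # tau(x) = 0.3 x
gamma = (2 * D - 1) * Y / 0.5                       # IPW score, E[gamma | X] = tau(X)

half = rng.permutation(n) < n // 2                  # True marks the learning half

# Learn a crude CATE on the learning half: linear fit of gamma on (1, X).
A = np.column_stack([np.ones(n), X])
coef = np.linalg.lstsq(A[half], gamma[half], rcond=None)[0]
pi = (A @ coef > 0).astype(float)                   # rule: treat if predicted CATE > 0

def gain_with_se(mask):
    """Mean and standard error of pi * gamma on the units selected by mask."""
    v = (pi * gamma)[mask]
    return v.mean(), v.std(ddof=1) / np.sqrt(mask.sum())

in_sample, in_se = gain_with_se(half)
honest, honest_se = gain_with_se(~half)
print(f"in-sample gain: {in_sample:.3f} (se {in_se:.3f})")
print(f"held-out gain : {honest:.3f} (se {honest_se:.3f})")
```

With a policy class this small the two numbers tend to be close; the in-sample figure becomes more optimistic as the class grows richer, which is the reason to report the held-out one.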

04 / Case

Case: continuous dose-response of a subsidy and targeted assignment

  • Question: a subsidy program with a continuous amount, where larger amounts often go to stronger applicants (confounding).
  • Use automatic debiasing / a Riesz representer to estimate the dose-response curve without hand-writing the propensity density for a continuous treatment.
  • If unobserved endogeneity (self-selection) is a concern, use IV-DML with an exogenous assignment-rule instrument to identify a local effect.
  • Finally use policy learning to turn the CATE into an interpretable "who and how much" rule, reporting overlap, the cross-fitting design, and confidence intervals for the policy value.

05 / Causal

Which frontier tool to use: match the problem

The DML frontier is not a fancier black box but the same orthogonal score instantiated for different causal targets. Common mappings follow.

01 / Average effect + high-dim controls → AIPW / DML

Use the doubly robust score plus cross-fitting for ATE/ATT, robust to nuisance misspecification.

02 / Continuous / multiple treatments → automatic debiasing (Riesz)

Avoid hand-writing a continuous-treatment propensity density; learn the Riesz representer for dose-response.

03 / Endogenous treatment → IV-DML

Use the instrument's orthogonal moment to estimate PLIV / LATE under high-dimensional controls.

psi_IV=(Y−l(X)−theta(D−r(X)))(Z−h(X))

04 / Resource targeting → policy learning

Turn the CATE into an optimal assignment within a restricted policy class and evaluate the policy value.

pi*=argmax E[Y(pi(X))]

Three red lines: (1) double robustness is not a free pass — both nuisances wrong still biases; (2) all IPW-type methods blow up under poor overlap, so diagnose overlap first; (3) always cross-fit, and use honest evaluation with uncertainty for policy learning.
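For red line (2), a few summary numbers usually suffice as a first overlap check. The sketch below assumes a vector of fitted propensities (simulated here as a stand-in) and reports their range, the share outside a trimming band, the largest inverse-probability weight, and the Kish effective sample size; the 0.05/0.95 band is a common but arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 10_000
X = rng.normal(size=n)
e_hat = 1 / (1 + np.exp(-2.5 * X))       # stand-in for cross-fitted propensities
D = (rng.uniform(size=n) < e_hat).astype(float)

# Inverse-probability weights used inside IPW/AIPW corrections.
w = D / e_hat + (1 - D) / (1 - e_hat)

lo, hi = 0.05, 0.95                      # illustrative trimming band
keep = (e_hat > lo) & (e_hat < hi)
ess = w.sum() ** 2 / (w ** 2).sum()      # Kish effective sample size

print(f"propensity range : [{e_hat.min():.4f}, {e_hat.max():.4f}]")
print(f"share outside [{lo}, {hi}]: {1 - keep.mean():.1%}")
print(f"max weight = {w.max():.1f}, effective sample size = {ess:.0f} of {n}")
print(f"max weight after trimming = {w[keep].max():.1f}")
```

Trimming caps the weights but also changes the population the estimate refers to, so the trimmed share belongs in the report alongside the estimate.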

06 / Risks

Common Pitfalls

  • Thinking DML can only estimate one coefficient in a partially linear model; it is a general framework of target functional plus orthogonal score.
  • Treating double robustness as unconditional robustness: if both nuisances are badly wrong, the estimate is still biased.
  • Ignoring overlap: AIPW variance explodes when propensities approach 0 or 1, so trim or use overlap weights, and note that either choice changes the target population.
  • Learning nuisances and estimating the target on the same data, which lets overfitting bias back in despite orthogonality; always cross-fit.
  • Taking an estimated CATE as the optimal policy without honest evaluation and uncertainty for the policy value.
