StatsPAI - Building the future of empirical research with AI
Building the Future of Empirical Research

Your AI Research Assistant for Publication-Ready Papers

StatsPAI empowers researchers with AI-assisted tools for paper outline planning, literature review synthesis from uploaded PDFs, and rigorous statistical modeling — dramatically accelerating your path from data to publication.

REAP - Rural Education Action Project
Stanford Rural Education Action Program - Center on China's Economy & Institutions

StatsPAI, together with LearnPy.online and CoPaper.AI, is committed to rapidly making Python the #1 language for empirical research

Traditional empirical research requires literature review, data cleaning, statistical modeling, and full paper writing, a tedious and time-consuming process. StatsPAI's AI assistant works alongside you: it helps structure your paper outline, synthesize literature reviews from your uploaded PDFs, build econometric models, refine your analysis, generate fully formatted papers, and provide Python/R/Stata code for full reproducibility. Let AI handle the tedious groundwork so you can focus on research and innovation.

Led by AI & Statistical Experts

About the Team

Bryce Wang @Stanford REAP

Founder

Researcher and engineer focused on Agentic AI and statistical empirical analysis. Author of "Building Intelligent Multi-Agent Systems".

Expertise

AI Agent Architecture
Statistical Analysis
Multi-Agent System Design
Empirical Research Methods

Scott Rozelle, PhD @Stanford REAP

Cofounder & Strategic Advisor

Helen F. Farnsworth Senior Fellow at Stanford University and Faculty Co-director of the Stanford Center on China's Economy and Institutions. Leading expert on agricultural policy, rural development, and education economics in China. Author of "Invisible China" and recipient of China's Friendship Award.

Expertise

Empirical Research Methods
Econometric Analysis
Causal Inference & RCTs
Development Economics

Our Mission

We believe empirical research should be accessible to everyone. By combining cutting-edge AI technology with deep statistical expertise, we're removing the technical barriers that prevent researchers from focusing on what matters most: asking the right questions and interpreting results.

Democratization

Making professional research tools accessible to researchers worldwide

Quality

Maintaining the highest standards of statistical rigor and methodology

Innovation

Pushing the boundaries of what AI can achieve in empirical analysis

Our Product Ecosystem

Three Platforms, One Mission

From the open-source foundation to the AI research assistant to free learning — we cover the full stack of empirical research in the AI era.

Available Now
CoPaper.AI

AI Research Assistant — Plan · Estimate · Iterate · Paper

Your AI-powered research assistant for academic paper writing. From outline planning to empirical data analysis and full paper generation — produce publication-ready work efficiently.

Paper outline planning with structured templates
Empirical data analysis (CSV, Excel, JSON, Parquet)
Publication-ready full paper generation
Robustness checks and statistical rigor
Try CoPaper.AI

Open Source · MIT
StatsPAI

Agent-Native Causal Inference · 800+ Functions · One import

The open-source Python package that consolidates Stata and R's causal inference ecosystems into one agent-native API — purpose-built for LLM-driven research and fully ergonomic for humans.

800+ estimators unified under sp.* with one CausalResult object
Agent-native schemas for OpenAI / Anthropic tool calling
DiD, RD, synth, DML, causal forest, neural causal, discovery…
Word / LaTeX / Excel / HTML publication output out of the box
Explore StatsPAI

Available Now
LearnPy.online

Free Interactive Learning

Interactive platform for learning Python, statistics, and econometrics. Code in your browser, get AI assistance, and build skills from basics to advanced topics.

Browser-based Python environment
AI chatbot for code help and explanations
Structured curriculum: Python → Stats → Econometrics
100% free with no signup required
Start Learning

All three platforms are designed to democratize access to advanced statistical analysis in the AI era

Core Features

Powered by Advanced AI

Everything you need for professional empirical research

Automated Research Workflow

From data cleaning to final report generation, our AI handles the entire research process automatically.

Publication-Ready Output

Generate comprehensive DOCX reports with methodology, results, robustness checks, and discussion chapters.

Intelligent Analysis

Leveraging Claude 4.5 and GPT-5 for sophisticated statistical modeling and evidence-based insights.

Minutes, Not Months

Complete what traditionally takes weeks of work in just minutes with AI-powered acceleration.

Comprehensive Statistics

Descriptive statistics, baseline models, robustness checks, and advanced econometric analysis.

Democratized Research

No technical barriers. Focus on your research questions, not implementation details.

Free Interactive Learning Platform

LearnPy.online

Master Python, Statistics, and Econometrics with interactive coding and AI assistance - completely free

Interactive Code Editor

Write and run Python code directly in your browser with instant feedback

AI-Powered Assistant

Get explanations, debug code, and learn concepts with intelligent chatbot support

Comprehensive Curriculum

From Python basics to advanced econometrics - structured learning paths

statistics_demo.py
# Python statistics example: one-sample t-test
import numpy as np
from scipy import stats

# Draw 1,000 values from N(mean=100, sd=15); the seed makes the run reproducible
rng = np.random.default_rng(42)
data = rng.normal(loc=100, scale=15, size=1000)

# Test the null hypothesis that the population mean equals 100
t_stat, p_value = stats.ttest_1samp(data, popmean=100)

# Display results
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

Start Learning Free

100% Free - No Credit Card Required

Flagship AI Research Platform
CoPaper.AI

From Data to Paper, Together

A reproducible paper in about 20 minutes.

CoPaper.AI is the AI research co-authoring platform from Stanford REAP. Upload your data, set your research direction, and collaborate with AI at every step to produce reproducible academic papers with full code, using methods that range from OLS, Logit/Probit, and mediation to DiD, IV, and RD.

38+ Econometric Methods
3,000+ Papers Assisted
200+ Universities Covered
5–30 min Avg. Generation Time

Sample Outputs

38 econometric methods, publication-quality figures

Every figure below was generated end-to-end by CoPaper.AI — ready to drop straight into your .docx, untouched.

[Figure: Regression table comparing OLS, FE, RE, 2SLS, GMM]

Multi-Model Regression Table

Side-by-side OLS / FE / RE / 2SLS / GMM with clustered SEs and standard diagnostics.
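To make "clustered SEs" concrete, here is a minimal NumPy sketch of OLS with cluster-robust standard errors on simulated data. The function name ols_cluster, the simulated variables, and the choice of small-sample correction are assumptions made for illustration; this is not CoPaper.AI's own code.

```python
import numpy as np

def ols_cluster(X, y, clusters):
    """OLS coefficients with cluster-robust (CR1) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Sum the score X_g' u_g within each cluster to build the sandwich "meat"
    meat = np.zeros((k, k))
    groups = np.unique(clusters)
    for g in groups:
        idx = clusters == g
        score = X[idx].T @ resid[idx]
        meat += np.outer(score, score)
    G = len(groups)
    correction = (G / (G - 1)) * ((n - 1) / (n - k))  # common small-sample factor
    vcov = correction * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(vcov))

# Simulated data: 50 clusters of 20 observations, errors correlated within cluster
rng = np.random.default_rng(0)
G, m = 50, 20
clusters = np.repeat(np.arange(G), m)
x = rng.normal(size=G * m) + rng.normal(size=G)[clusters]
u = rng.normal(size=G * m) + rng.normal(size=G)[clusters]
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones_like(x), x])
beta, se = ols_cluster(X, y, clusters)
print("coefficients:", beta.round(3))
print("clustered SEs:", se.round(3))
```

The loop over clusters keeps the sandwich formula visible; a production implementation would normally vectorize it or rely on an established econometrics library.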

[Figure: Event study parallel trends test chart]

Event Study · Parallel Trends

Dynamic treatment effects with pre-period placebo coefficients and 95% CIs.
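The sketch below shows one way such pre-period placebo and post-period coefficients can be estimated: a two-way fixed effects regression on treated-by-event-time dummies, with the period just before the event as the reference. It assumes a balanced simulated panel with a single event date and reports point estimates only, with no confidence intervals; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods, event_time = 200, 10, 5
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
treated = (unit < n_units // 2).astype(float)  # first half of the units is treated
rel = period - event_time                      # time relative to the event

# Simulated outcome: unit effects, a common trend, and a true effect of 1.0 after the event
y = (rng.normal(size=n_units)[unit] + 0.3 * period
     + 1.0 * treated * (rel >= 0)
     + rng.normal(scale=0.5, size=unit.size))

def demean_twoway(v):
    """Remove unit and period means (valid for a balanced panel)."""
    m = v.reshape(n_units, n_periods)
    m = m - m.mean(axis=1, keepdims=True) - m.mean(axis=0, keepdims=True) + m.mean()
    return m.ravel()

# Treated x event-time dummies, omitting k = -1 as the reference period
ks = [k for k in range(-event_time, n_periods - event_time) if k != -1]
D = np.column_stack([demean_twoway(treated * (rel == k)) for k in ks])
coef, *_ = np.linalg.lstsq(D, demean_twoway(y), rcond=None)
effects = dict(zip(ks, coef))

for k in ks:
    print(f"k = {k:+d}: {effects[k]:.3f}")
```

Coefficients for k < 0 should hover around zero if trends are parallel, while those for k >= 0 trace out the dynamic effect.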

[Figure: RDD visualization with binned scatter and local polynomial fit]

Regression Discontinuity

Binned scatter + local polynomial fit around the cutoff with robust bias-corrected inference.
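For readers new to the method, a sharp regression discontinuity estimate can be sketched as a kernel-weighted local linear fit on each side of the cutoff. The version below uses a fixed bandwidth chosen by hand and does not implement bandwidth selection or robust bias-corrected inference; rd_local_linear and the simulated data are assumptions for illustration.

```python
import numpy as np

def rd_local_linear(x, y, cutoff=0.0, bandwidth=0.5):
    """Jump at the cutoff from a local linear fit with triangular kernel weights."""
    xc = x - cutoff
    keep = np.abs(xc) < bandwidth
    xc, yk = xc[keep], y[keep]
    w = 1.0 - np.abs(xc) / bandwidth          # triangular kernel
    above = (xc >= 0).astype(float)
    # Columns: intercept, jump at the cutoff, slope below, change in slope above
    X = np.column_stack([np.ones_like(xc), above, xc, above * xc])
    sw = np.sqrt(w)                           # weighted least squares via rescaling
    coef, *_ = np.linalg.lstsq(X * sw[:, None], yk * sw, rcond=None)
    return coef[1]

# Simulated running variable with a true jump of 2.0 at zero
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=2000)
y = 0.5 * x + 2.0 * (x >= 0) + rng.normal(scale=0.3, size=2000)

tau = rd_local_linear(x, y, cutoff=0.0, bandwidth=0.5)
print(f"Estimated jump at the cutoff: {tau:.3f}")
```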

[Figure: Instrumental variables first-stage and reduced-form chart]

Instrumental Variables · 2SLS

First-stage F-statistic, reduced form, and 2SLS point estimates with weak-IV robust CIs.
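The three quantities named here fit together in a simple way when there is one instrument: the 2SLS estimate equals the reduced-form coefficient divided by the first-stage coefficient. The NumPy sketch below demonstrates this on simulated data. It does not compute weak-IV robust confidence intervals, and every variable name is an illustrative assumption.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(3)
n = 5000
z = rng.normal(size=n)                      # instrument
v = rng.normal(size=n)                      # unobserved confounder
d = 0.8 * z + v + rng.normal(size=n)        # endogenous regressor
y = 1.5 * d + 2.0 * v + rng.normal(size=n)  # true effect of d is 1.5

const = np.ones(n)
Z = np.column_stack([const, z])

# First stage: d on the instrument, plus the F-statistic for the single instrument
pi = ols(Z, d)
d_hat = Z @ pi
rss_u = np.sum((d - d_hat) ** 2)
rss_r = np.sum((d - d.mean()) ** 2)
f_stat = (rss_r - rss_u) / (rss_u / (n - 2))

# Reduced form: y on the instrument
rf = ols(Z, y)

# Second stage: y on the fitted values of d
beta_2sls = ols(np.column_stack([const, d_hat]), y)[1]
beta_ols = ols(np.column_stack([const, d]), y)[1]

print(f"First-stage F: {f_stat:.1f}")
print(f"OLS (biased):  {beta_ols:.3f}")
print(f"2SLS:          {beta_2sls:.3f}")
print(f"Reduced form / first stage: {rf[1] / pi[1]:.3f}")
```

Note that standard errors taken from a manual second stage are not valid; this sketch is meant for the point estimates only.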

[Figure: Dynamic event study coefficient plot]

Dynamic Event Study

Lead-and-lag coefficients around policy implementation with point-wise and uniform bands.

[Figure: Mediation path diagram with ACME and ADE]

Mediation Analysis

Direct and indirect effects decomposition with bootstrap CIs and sensitivity diagnostics.
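A minimal sketch of this decomposition, assuming linear models with no treatment-mediator interaction: the indirect effect is the product of the treatment-to-mediator and mediator-to-outcome coefficients, and a percentile bootstrap gives its confidence interval. Sensitivity diagnostics are not covered, and the function and variable names are illustrative.

```python
import numpy as np

def fit(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def mediation_effects(t, m, y):
    """Return (indirect, direct) effects from two linear regressions."""
    ones = np.ones_like(t)
    a = fit(np.column_stack([ones, t]), m)[1]               # treatment -> mediator
    _, direct, b = fit(np.column_stack([ones, t, m]), y)    # outcome on treatment and mediator
    return a * b, direct

# Simulated data: true indirect effect 0.5 * 0.8 = 0.4, true direct effect 0.3
rng = np.random.default_rng(4)
n = 1000
t = rng.integers(0, 2, size=n).astype(float)
m = 0.5 * t + rng.normal(size=n)
y = 0.3 * t + 0.8 * m + rng.normal(size=n)

indirect, direct = mediation_effects(t, m, y)

# Percentile bootstrap for the indirect effect
boot = []
for _ in range(500):
    idx = rng.integers(0, n, size=n)
    boot.append(mediation_effects(t[idx], m[idx], y[idx])[0])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"Indirect effect: {indirect:.3f}  (95% CI {ci_low:.3f} to {ci_high:.3f})")
print(f"Direct effect:   {direct:.3f}")
```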

[Figure: Robustness checks summary forest plot]

Robustness Checks Summary

Point estimates across alternative specifications — controls, samples, and SE choices — in one glance.
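One simple form of such a summary is to re-estimate the same coefficient under every combination of control variables and compare the results. The sketch below does this on simulated data where leaving out one confounder (c1) visibly shifts the estimate; the variable names and the set of specifications are assumptions for illustration.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)
n = 2000
controls = {name: rng.normal(size=n) for name in ("c1", "c2", "c3")}
d = 0.5 * controls["c1"] + rng.normal(size=n)            # treatment correlated with c1
y = 1.0 * d + 0.5 * controls["c1"] + rng.normal(size=n)  # true effect of d is 1.0

# Estimate the coefficient on d under every subset of controls
estimates = {}
for r in range(len(controls) + 1):
    for combo in combinations(controls, r):
        X = np.column_stack([np.ones(n), d] + [controls[c] for c in combo])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        estimates[combo] = beta[1]

for combo, b in sorted(estimates.items(), key=lambda kv: kv[1]):
    label = " + ".join(combo) or "no controls"
    print(f"{label:<15} {b:.3f}")
```

Specifications that include c1 should cluster near the true value, while those that omit it drift upward, which is exactly the pattern a forest plot of specifications is meant to expose.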

[Figure: Heckman selection model regression table]

Heckman Selection Model

Two-stage selection correction with inverse Mills ratio and selection-equation diagnostics.

[Figure: Descriptive statistics summary table]

Descriptive Statistics Table

Mean / SD / min / max by group with tests of balance — formatted for direct journal submission.
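A table like this can be sketched with pandas and SciPy: group-wise summary statistics followed by a two-sample test for each variable. The column names and simulated data below are assumptions, and Welch's t-test stands in for whichever balance test a given paper calls for.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(5)
df = pd.DataFrame({
    "group": np.repeat(["control", "treated"], 250),
    "age": rng.normal(40, 10, 500),
    "income": rng.lognormal(10, 0.5, 500),
})

variables = ["age", "income"]

# Mean / SD / min / max by group
summary = df.groupby("group")[variables].agg(["mean", "std", "min", "max"])

# Balance test: Welch's t-test comparing the two groups on each variable
balance = {}
for col in variables:
    control = df.loc[df["group"] == "control", col]
    treated = df.loc[df["group"] == "treated", col]
    balance[col] = float(stats.ttest_ind(control, treated, equal_var=False).pvalue)

print(summary.round(2))
for col, p in balance.items():
    print(f"{col}: balance test p-value = {p:.3f}")
```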

All figures and tables are sample outputs from real CoPaper.AI runs. Papers delivered as complete, citation-formatted .docx files.

You're the Author, AI Is Your Research Partner

Unlike tools that generate an entire paper with one click, CoPaper.AI pauses at every critical step — outline, variable selection, model specification, result interpretation — and waits for your input. Every decision reflects your academic judgment. The AI handles the heavy lifting; you steer the research.
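The pause-and-approve pattern described above can be sketched as a loop that refuses to advance until a reviewer signs off on each stage. This is a hypothetical illustration of the idea, not CoPaper.AI's internals: run_with_checkpoints, draft, and the reviewer are all invented stand-ins.

```python
def run_with_checkpoints(stages, draft, review):
    """Draft each stage, then wait on the reviewer until the draft is approved."""
    approved = {}
    for stage in stages:
        feedback = None
        while True:
            proposal = draft(stage, approved, feedback)
            feedback = review(stage, proposal)  # None means "approved"
            if feedback is None:
                break
        approved[stage] = proposal
    return approved

STAGES = ["outline", "variable selection", "model specification", "result interpretation"]

# Stand-in for the AI drafter (hypothetical)
def draft(stage, approved, feedback):
    note = f" (revised: {feedback})" if feedback else ""
    return f"draft of {stage}{note}"

# Stand-in for the human reviewer (hypothetical)
def make_reviewer():
    seen = set()
    def review(stage, proposal):
        # Request one revision of the model specification, approve everything else
        if stage == "model specification" and stage not in seen:
            seen.add(stage)
            return "cluster SEs by county"
        return None
    return review

result = run_with_checkpoints(STAGES, draft, make_reviewer())
for stage, text in result.items():
    print(f"{stage}: {text}")
```

Passing the already-approved stages into draft lets later stages build on earlier decisions, which is the point of approving them in order.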

Features

Full-pipeline AI paper co-authoring

From data upload to paper export — you participate, revise, and control quality at every step.

Multi-Dataset Upload

CSV, Excel, JSON, Parquet and more. Up to 20 datasets simultaneously. Auto-detects Excel multi-sheet files.
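If you want to prepare files the same way locally, a loader along these lines covers the listed formats with pandas. It is a sketch under assumptions: load_dataset and load_many are hypothetical helpers, the 20-file cap simply mirrors the limit stated above, and reading Excel or Parquet requires the optional openpyxl or pyarrow packages.

```python
import tempfile
from pathlib import Path

import pandas as pd

def load_dataset(path):
    """Return {name: DataFrame}; an Excel workbook yields one entry per sheet."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return {path.stem: pd.read_csv(path)}
    if suffix in {".xlsx", ".xls"}:
        sheets = pd.read_excel(path, sheet_name=None)  # sheet_name=None reads every sheet
        return {f"{path.stem}:{name}": frame for name, frame in sheets.items()}
    if suffix == ".json":
        return {path.stem: pd.read_json(path)}
    if suffix == ".parquet":
        return {path.stem: pd.read_parquet(path)}
    raise ValueError(f"Unsupported file type: {suffix}")

def load_many(paths, limit=20):
    """Load several files at once, enforcing a cap on the number of files."""
    if len(paths) > limit:
        raise ValueError(f"At most {limit} files can be loaded at once")
    datasets = {}
    for p in paths:
        datasets.update(load_dataset(p))
    return datasets

# Usage: write two small files to a temporary folder and load them back
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "wages.csv"
    json_path = Path(tmp) / "firms.json"
    pd.DataFrame({"wage": [10.0, 12.5], "educ": [12, 16]}).to_csv(csv_path, index=False)
    pd.DataFrame({"firm": ["a", "b"], "size": [5, 9]}).to_json(json_path)
    datasets = load_many([csv_path, json_path])

for name, frame in datasets.items():
    print(name, frame.shape)
```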

Intelligent Data Analysis

Automated EDA, variable definitions, and econometric modeling — OLS, FE, IV, threshold, DiD, RD, mediation, and more.

Human-in-the-Loop Every Step

Pauses at outline, variables, model specs, and interpretation — nothing moves forward without your sign-off.

Fully Reproducible Code

Every number and figure ships with Python code — plus Stata / R / EViews translations for complete reproducibility.

Publication-Ready Papers

Properly structured .docx with intro, literature review, data & methods, empirical results, and discussion — journal-ready.

AI Refinement & Review

Multi-pass polish and review — tightens prose, checks consistency across sections, and flags weak claims.

Four steps to your paper

1. Upload Data

Drag and drop your datasets (CSV, Excel, Stata, and more). The system automatically detects variable types and data structure.

2. Set Research Direction

Define your research question, choose methods, and select variables. Use AI-assisted inference or set everything manually.

3. AI Writes, You Guide

The AI generates each chapter step by step, pausing after every section for your review and feedback before continuing.

4. Refine & Export

Polish your paper with AI-powered refinement. Export a complete, publication-ready DOCX with one click.

100% Reproducible

Every regression, chart, and statistical result comes with complete Python code — plus Stata, R, and EViews translations for full reproducibility.

Human-in-the-Loop at Every Step

AI doesn't decide for you. At each stage — outline, data analysis, results interpretation — you review, revise, and approve before moving forward.

Publication-Ready Output

Generates properly structured papers with introduction, literature review, data & methods, empirical results, and discussion — ready for journal submission.

Real Impact

Real users, real outcomes

CoPaper.AI has helped researchers cross the finish line — from journal submission, to modeling competitions, to thesis defense.

Journal Publications

Multiple users have completed empirical analyses and paper writing through CoPaper.AI, successfully publishing in academic journals. Reproducible code and standardized format significantly improved submission efficiency.

Modeling Competition Awards

Users leveraged CoPaper.AI's rapid data analysis and paper generation to achieve outstanding results in empirical modeling competitions — prize-winning work turned around in hours instead of weeks.

Thesis Completion & Graduation

Undergraduate, master's, and PhD students used CoPaper.AI to complete empirical chapters and full theses — from research design to defense-ready drafts, all reproducible from code.

Flagship Open-Source Package
StatsPAI

The agent-native causal inference toolkit for the AI era

StatsPAI brings R's Causal Inference Task View and Stata's core econometrics commands into a single, consistent Python API — 800+ functions, one import, purpose-built for LLM-driven research workflows while remaining fully ergonomic for human researchers.

View on GitHub

800+ Functions
450+ Public API Surface
MIT License
3.9→3.13 Python Support

Agent-Native by Design

Every function ships with a self-describing schema — list_functions(), describe_function(), function_schema() — ready for OpenAI / Anthropic tool-calling out of the box.

One Import, 800+ Functions

DiD, RD, synthetic control, matching, DML, causal forests, neural causal, causal discovery, policy learning — all under sp.* with one consistent CausalResult object.

2025–2026 Frontier Methods

Callaway-Sant'Anna, Borusyak-Hull-Jaravel, Park-Xu shift-share IV, particle-filter assimilation, Bayesian causal forest — all re-implemented from the original papers.

Publication-Ready Pipeline

Word + Excel + LaTeX + HTML + Markdown export from every estimator. No more outreg2 / modelsummary dance — just .to_latex() and you're done.
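To illustrate why a structured result object makes export easy, here is a hypothetical stand-in written with a standard-library dataclass. EstimateSketch is not StatsPAI's CausalResult, and its fields and output layout are assumptions; it only shows how summary() and to_latex() can both be derived from the same stored numbers.

```python
from dataclasses import dataclass

@dataclass
class EstimateSketch:
    """Hypothetical stand-in for a structured estimation result."""
    method: str
    term: str
    estimate: float
    std_error: float

    def conf_int(self, z=1.96):
        """Normal-approximation confidence interval."""
        half = z * self.std_error
        return self.estimate - half, self.estimate + half

    def summary(self):
        lo, hi = self.conf_int()
        return (f"{self.method}: {self.term} = {self.estimate:.3f} "
                f"(SE {self.std_error:.3f}), 95% CI [{lo:.3f}, {hi:.3f}]")

    def to_latex(self):
        """One-column regression table with the standard error in parentheses."""
        lines = [
            r"\begin{tabular}{lc}",
            r"\hline",
            rf" & {self.method} \\",
            r"\hline",
            rf"{self.term} & {self.estimate:.3f} \\",
            rf" & ({self.std_error:.3f}) \\",
            r"\hline",
            r"\end{tabular}",
        ]
        return "\n".join(lines)

res = EstimateSketch(method="DiD", term="treated", estimate=0.25, std_error=0.1)
print(res.summary())
print(res.to_latex())
```

Because every output format reads from the same fields, the console summary and the LaTeX table cannot drift apart.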

Head-to-Head

vs Stata, R, and legacy Python

StatsPAI consolidates what used to require a $695/yr Stata license plus 20+ incompatible R packages — into one agent-native Python API.

Capability | Stata | R | Python (legacy) | StatsPAI
Unified API across methods
Agent-native schemas (LLM tool calling)
Modern ML causal (DML, forest, TMLE)
Neural causal (TARNet, CFRNet, DragonNet)
Word + LaTeX + Excel publication output
One-call robustness (spec_curve, assumption_audit)
Cost | $695+/yr | Free | Free | Free
Open source (MIT)
Legend: Full support · Partial / fragmented · Not available

Agent-Native Platform

Built for LLMs, ergonomic for humans

Stata and R were designed for humans with keyboards. StatsPAI is the first econometrics toolkit designed from the ground up for AI agents — while being just as clean to drive manually.

  • Every one of the 800+ estimators exposes an OpenAI / Anthropic tool-calling schema via function_schema().
  • Every result is a structured CausalResult object with .summary() / .plot() / .to_latex() / .cite() — no fragile string parsing.
  • Ships with an MCP server scaffold so Claude, ChatGPT, and custom agents can drive the full library via natural language.

agent_tool_calling.py
import statspai as sp

# 1. LLM discovers 800+ estimators
schemas = sp.list_functions()

# 2. Self-describing tool schema
spec = sp.function_schema("callaway_santanna")
#  → OpenAI / Anthropic tool-calling ready

# 3. Agent invokes the estimator and gets a structured result
#    (df is your panel DataFrame; the remaining arguments are elided here)
res = sp.callaway_santanna(data=df, ...)
res.summary()     # console
res.to_latex()    # publication
res.plot()        # figure
res.cite()        # bibtex
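
For readers curious what a tool-calling schema looks like, the sketch below derives one from an ordinary Python function signature and wraps it in the shapes used by OpenAI function tools and Anthropic tools. It is a generic illustration, not how StatsPAI builds its schemas: did_estimate is a hypothetical placeholder, and the type mapping is deliberately minimal.

```python
import inspect
import json

_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def parameters_schema(func):
    """JSON Schema object describing the parameters of func."""
    props, required = {}, []
    for name, p in inspect.signature(func).parameters.items():
        # Unannotated or unrecognized types fall back to "string" in this sketch
        props[name] = {"type": _JSON_TYPES.get(p.annotation, "string")}
        if p.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": props, "required": required}

def openai_tool(func):
    """Wrap the schema in the OpenAI function-tool layout."""
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": parameters_schema(func),
        },
    }

def anthropic_tool(func):
    """Wrap the schema in the Anthropic tool layout."""
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "input_schema": parameters_schema(func),
    }

# Hypothetical estimator used only for illustration; not a StatsPAI function
def did_estimate(outcome: str, treatment: str, time: str, n_boot: int = 200):
    """Difference-in-differences estimate for the named columns."""
    raise NotImplementedError

tool = openai_tool(did_estimate)
print(json.dumps(tool, indent=2))
```

Generating the schema from the signature keeps the tool description and the callable in sync, which matters when an agent has hundreds of functions to choose from.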

Proof, Not Promises

See it on GitHub & PyPI

A fast-moving, fully open-source project — 279+ commits, 54+ stars, 9+ releases, and a JOSS submission underway.

The Vision

To become the #1 causal inference tool of the AI era

The last 40 years of causal inference were built in Stata and R — closed, paid, and designed for human hands. The next 40 will be built in Python, open-source, and agent-native. StatsPAI is the foundation: one import, every frontier method, every result machine-readable, every table publication-ready — so the next generation of researchers and their AI collaborators can move at the speed of thought.

From Basics to Advanced Agent Systems

Deep Dive into Agentic AI

Agentic AI

Building Intelligent Multi-Agent Systems

A comprehensive guide to understanding and implementing intelligent multi-agent systems. Goes beyond drag-and-drop workflows to teach dynamic, intelligent decision-making in AI systems.

6 comprehensive modules from basic to advanced
3-5x more content than standard courses
Practical code examples and real implementations
Zero barrier to entry with clear explanations
Focus on Agentic AI thinking and architecture

In the AI era, understanding is king. Skip the non-essential details and go straight to the core of Agentic AI thinking.

Available in Chinese; English version coming soon

Get in Touch

Ready to transform your research workflow?

Join researchers worldwide who are accelerating their empirical analysis with StatsPAI.