De-AIGC: How AI-Text Detection Works, Its Limits, and Responsible Paper Revision
AI detectors judge "does this look machine-written?" from statistical signals like perplexity, burstiness, and probability curvature — but those signals are neither stable nor robust to rewriting. This page explains how detection works and where it breaks, and reframes "humanizing" as rewriting a draft with content you genuinely understand, in your own voice, with verifiable facts, and with AI use disclosed per journal policy — not as a how-to for deception.
[Schematic: the principle at a glance]
Start Here
What you should be able to do
Understand three detection signals: perplexity and burstiness, DetectGPT probability curvature, and generation watermarks.
Know why detectors are unreliable: high false-positive rates, systematic bias against non-native writing, and easy weakening by rewriting.
Read "humanizing" as improving authenticity, accuracy, and readability — not as circumventing academic integrity.
Know the AI-use disclosure policies of major journals / conferences, and where and how to state them.
Learning Path
Learning path: perplexity → burstiness → curvature → watermark → limits
Read AI-text detection along this path: start with plain perplexity and burstiness, move to DetectGPT curvature and watermarks, and end by recognizing the fundamental limit from distributional overlap.
Step 1
Perplexity
Machine text is less surprising on average, so perplexity is lower.
PPL=exp(−mean log p)
Step 2
Burstiness
Human sentence surprise varies more; machine text is smoother.
std/mean
Step 3
Curvature
DetectGPT: machine text tends to sit near a local log-probability maximum.
d(x)>0
Step 4
Watermark
Generation is biased toward a pseudo-random green list; statistically testable, but easily weakened by rewriting.
z-score
Step 5
Limits
Distributional overlap makes false positives unavoidable; a score is only a clue.
AUC<1
01 / Intuition
Core Intuition
LLMs tend to generate high-probability, low-surprise token sequences, so machine text has lower average perplexity and smaller sentence-level surprisal variance (burstiness) — the statistical basis of most detectors.
DetectGPT uses one insight: machine text usually sits near a local maximum of the model log-probability, so small paraphrases tend to lower the log-probability; human text need not have this curvature.
Watermarking biases token choice toward a pseudo-random "green list" at generation time, testable after the fact; but translation, rewriting, or switching models weakens it — showing that all detection signals rest on fragile distributional assumptions.
02 / Math
The statistics of detection signals and their fundamental limits
01 / Perplexity
Perplexity measures how "surprised" a model is by a text on average. Machine-generated text is low-perplexity to the model that wrote it — but carefully polished, conventional human text can also be low-perplexity, a key source of false positives.
PPL = exp(−(1/N) Σ_i log p(w_i | w_<i))
02 / Burstiness
Human writing varies more in sentence-level surprise (long complex sentences mixed with short ones); machine text is smoother. Burstiness captures this variation via the dispersion of per-sentence surprisal.
burstiness = std(s_j) / mean(s_j), where s_j = mean surprisal of sentence j
03 / DetectGPT probability curvature
Generate many lightly perturbed versions of the text (DetectGPT uses mask-and-refill rewrites) and compare the log-probability of the original with the mean log-probability of the perturbations. Machine text usually sits near a local maximum, so the gap is clearly positive; human text need not show this gap.
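The comparison described above can be sketched with toy numbers. This is a minimal illustration rather than the DetectGPT implementation: the log-probabilities are invented, the helper name `perturbation_discrepancy` is hypothetical, and in practice a scoring model and a perturbation model would supply these values. Dividing the gap by the spread of the perturbed scores, as in a normalized variant of the statistic, is included as an option.

```python
import numpy as np

def perturbation_discrepancy(logp_original, logp_perturbed, normalize=False):
    """Gap between the original log-prob and the mean perturbed log-prob."""
    logp_perturbed = np.asarray(logp_perturbed, dtype=float)
    gap = logp_original - logp_perturbed.mean()
    if normalize:
        # Scale by the sample standard deviation of the perturbed scores.
        gap = gap / logp_perturbed.std(ddof=1)
    return float(gap)

# Hypothetical sequence-level log-probs in nats (not measured data).
machine_like = perturbation_discrepancy(-42.0, [-47.5, -46.0, -48.2, -45.9])
human_like = perturbation_discrepancy(-62.7, [-63.4, -62.1, -63.9, -62.6])

print("machine-like gap:", round(machine_like, 3))
print("human-like gap  :", round(human_like, 3))
```

In this toy setup the machine-like text loses about 4.9 nats under perturbation while the human-like text loses about 0.3, which is the contrast the statistic relies on. Where to put the decision threshold is exactly the part that does not transfer reliably between models and domains.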
d(x) = log p(x) − E_x̃[log p(x̃)]; d > 0 → machine-leaning
04 / Generation watermarking
At generation, hash the previous token to split the vocabulary into green / red lists and bias toward green. Detect by counting green tokens with a z-test. The upside is provable statistics; the downside is that rewriting / translation quickly weakens it.
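The detection side of this scheme can be sketched on a toy vocabulary. Everything here is an assumption made for illustration: integer token ids, a 50-token vocabulary, a fixed key, and Python's `random` module standing in for the keyed hash. It is not the reference implementation from the watermarking paper.

```python
import math
import random

VOCAB_SIZE = 50   # toy vocabulary of integer token ids (assumption)
GAMMA = 0.5       # fraction of the vocabulary placed on the green list

def green_list(prev_token, key=12345):
    """Pseudo-random green list seeded by the previous token and a key."""
    rng = random.Random(key * VOCAB_SIZE + prev_token)
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))

def z_score(tokens):
    """One-proportion z-test on the number of green tokens."""
    T = len(tokens) - 1  # the first token has no predecessor to hash
    green = sum(tok in green_list(prev) for prev, tok in zip(tokens, tokens[1:]))
    return (green - GAMMA * T) / math.sqrt(T * GAMMA * (1 - GAMMA))

# Fully watermarked toy text: every token is drawn from the green list.
marked = [0]
for _ in range(20):
    marked.append(min(green_list(marked[-1])))

# Unwatermarked toy text: tokens drawn without looking at the green list.
rng = random.Random(7)
plain = [rng.randrange(VOCAB_SIZE) for _ in range(21)]

print("z (watermarked)  :", round(z_score(marked), 3))
print("z (unwatermarked):", round(z_score(plain), 3))
```

With 20 scored tokens that are all green, the statistic is 10 / sqrt(5), about 4.47. The unwatermarked sequence hits the green list only at chance rate, so its z-score stays near zero on average.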
z = (|green| − γT) / sqrt(T γ(1−γ))
05 / Why rewriting weakens every signal
Synonym substitution and reordering raise perplexity, scatter the watermark, and flatten the curvature. This is not an evasion guide but a statement of fragility: signals depend on the specific model and generation process, so a different writing style shifts the distribution.
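The fragility of the watermark signal can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that each rewritten token lands on the green list only at the chance rate γ, and it ignores the knock-on effect a replaced token has on the green list of the next position, so real dilution would likely be stronger. The function name `expected_z` and all numbers are hypothetical.

```python
import math

def expected_z(T, green_fraction, rewrite_fraction, gamma=0.5):
    """Expected watermark z-score after a fraction of tokens is rewritten."""
    kept = 1.0 - rewrite_fraction
    # Kept tokens stay green at the original rate; rewritten ones at chance.
    expected_green = T * (kept * green_fraction + rewrite_fraction * gamma)
    return (expected_green - gamma * T) / math.sqrt(T * gamma * (1 - gamma))

for r in (0.0, 0.25, 0.5, 0.75):
    print(f"rewritten {r:.0%}: expected z = {expected_z(200, 0.9, r):.2f}")
```

Under these assumptions the expected z-score shrinks linearly with the rewritten fraction, from about 11.3 with no rewriting to about 2.8 when three quarters of the tokens have changed. The signal belongs to the generation process, not to the ideas in the text.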
06 / The fundamental limit: distributional overlap
Human and machine text distributions overlap heavily, and "human-written then AI-polished" is a continuum, not a binary. So any threshold trades false positives against misses and the ROC cannot be perfect — which is why a detection score can only be a clue, never proof.
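The trade-off can be illustrated with two overlapping score distributions. The Gaussians below are hypothetical stand-ins for detector scores, not measurements; real score distributions are not Gaussian. The last lines evaluate the upper bound 1/2 + TV − TV²/2 on detector AUROC reported by Sadasivan et al. (2023) in the reference list, with TV the total variation distance between the two distributions.

```python
import math
from statistics import NormalDist

# Hypothetical detector-score distributions (higher = more machine-like).
human = NormalDist(mu=0.0, sigma=1.0)
machine = NormalDist(mu=1.0, sigma=1.0)

def rates(threshold):
    """FPR and TPR when every score above the threshold is flagged."""
    fpr = 1.0 - human.cdf(threshold)
    tpr = 1.0 - machine.cdf(threshold)
    return fpr, tpr

for t in (0.0, 0.5, 1.0, 2.0):
    fpr, tpr = rates(t)
    print(f"threshold {t:.1f}: FPR = {fpr:.3f}, TPR = {tpr:.3f}")

# AUC for two Gaussians: probability that a machine score exceeds a human one.
auc = NormalDist().cdf(
    (machine.mean - human.mean) / math.sqrt(human.variance + machine.variance)
)
# Total variation distance is one minus the overlap of the two densities.
tv = 1.0 - human.overlap(machine)
bound = 0.5 + tv - tv ** 2 / 2
print(f"AUC = {auc:.3f}, TV = {tv:.3f}, AUROC bound = {bound:.3f}")
```

Raising the threshold lowers the false-positive rate only by giving up true positives. In this toy setup the AUC is about 0.76 and the bound about 0.81, both well short of 1.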
TPR and FPR cannot both be ideal (overlap → AUC<1)
03 / Code
Code case: perplexity and burstiness from log-probs (the detection side)
This shows only the detection-side statistics: given per-token log-probabilities for a text, compute perplexity and sentence-level burstiness, and see why polished human text can be misflagged.
Case 1: how perplexity is computed
Perplexity is the exponential of the mean negative log-likelihood; lower means less "surprised".
import numpy as np
log_probs = np.array([-1.2, -0.9, -1.5, -0.7, -1.1]) # nats per token
ppl = np.exp(-log_probs.mean())
print("perplexity:", round(float(ppl), 3))
Expected output
perplexity: 2.945
How to read this code
- Machine-generated text is low-perplexity to the model that generated it.
- But conventional, clear human text can also be low-perplexity.
- So low perplexity does not mean "machine-written".
Case 2: the intuition behind burstiness
Human sentences vary more in surprise; machine text is smoother.
import numpy as np
human = np.array([3.1, 0.8, 2.9, 1.0, 3.4]) # varied sentence surprisal
machine = np.array([1.8, 1.9, 1.7, 2.0, 1.8]) # smooth
b = lambda s: round(float(s.std() / s.mean()), 3)
print("human burstiness :", b(human))
print("machine burstiness:", b(machine))
Expected output
human burstiness : 0.494
machine burstiness: 0.055
How to read this code
- Human text is bursty: sentence surprise varies a lot.
- Machine text is smoother, with low burstiness.
- But rewriting or mixed authorship makes the two converge.
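Case 2 starts from per-sentence surprisal values. A minimal sketch of how those values could be derived from per-token log-probs follows, assuming the sentence boundaries are already known as token counts. The function names and the numbers are hypothetical, and the population standard deviation is used to match Case 2.

```python
import numpy as np

def sentence_surprisals(token_log_probs, sentence_lengths):
    """Mean surprisal (negative log-prob, in nats) of each sentence."""
    token_log_probs = np.asarray(token_log_probs, dtype=float)
    if sum(sentence_lengths) != len(token_log_probs):
        raise ValueError("sentence lengths must add up to the token count")
    # Split the token stream at the cumulative sentence boundaries.
    boundaries = np.cumsum(sentence_lengths)[:-1]
    chunks = np.split(token_log_probs, boundaries)
    return np.array([-chunk.mean() for chunk in chunks])

def burstiness(surprisals):
    """Coefficient of variation of per-sentence surprisal."""
    return float(surprisals.std() / surprisals.mean())

log_probs = [-1.0, -3.0, -2.0, -0.5, -0.5, -4.0, -2.0]  # toy values
lengths = [3, 2, 2]                                      # tokens per sentence

s = sentence_surprisals(log_probs, lengths)
print("per-sentence surprisal:", s)
print("burstiness:", round(burstiness(s), 3))
```

Here the three sentences have mean surprisal 2.0, 0.5, and 3.0, giving a burstiness of about 0.56. Note that the result depends on how the text is split into sentences, which is one more source of instability in the score.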
Case 3: a false-positive demonstration
A non-native writer using short, regular sentences can be misflagged as AI.
import numpy as np
# a careful non-native writer: short, regular sentences -> low PPL, low burstiness
lp = np.array([-0.8, -0.7, -0.9, -0.6, -0.8, -0.7])
ppl = np.exp(-lp.mean())
sent = np.array([0.75, 0.80, 0.70]) # smooth per-sentence surprisal
burst = sent.std() / sent.mean()
print("PPL:", round(float(ppl), 2), "| burstiness:", round(float(burst), 3),
      "-> may be flagged, wrongly")
Expected output
PPL: 2.12 | burstiness: 0.054 -> may be flagged, wrongly
How to read this code
- Concise, regular human writing is also low-perplexity and low-burstiness.
- This is exactly the mechanism behind bias against non-native authors.
- Takeaway: a score is a clue, never a verdict.
04 / Case
Case: a non-native researcher misflagged after LLM language polishing
- Scenario: a non-native English researcher uses an LLM to polish the language of a paper and a detector flags it as "highly likely AI-generated".
- Problem: detection signals cannot separate "ghost-written by a machine" from "human-written then machine-polished", and they are systematically biased toward false positives on clean, concise non-native writing.
- Responsible path: keep versions and drafts of the writing process; restate the contribution and arguments in your own words so every sentence maps to content you truly understand; verify every citation, datum, and number.
- Transparent disclosure: per the target venue policy, state the scope of AI-tool use (e.g., "language polishing only") in methods, acknowledgments, or the cover letter, putting integrity and verifiability first.
05 / Risks
Common Pitfalls
References
- Mitchell et al. (2023), DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature, ICML. https://arxiv.org/abs/2301.11305
- Kirchenbauer et al. (2023), A Watermark for Large Language Models, ICML. https://arxiv.org/abs/2301.10226
- Sadasivan et al. (2023), Can AI-Generated Text be Reliably Detected? https://arxiv.org/abs/2303.11156
- Liang et al. (2023), GPT Detectors Are Biased Against Non-Native English Writers, Patterns. https://arxiv.org/abs/2304.02819