Input Perturbations Propagate Differently: A Mechanistic Analysis of GPT-2 Robustness

26 minute read

Published: May 27, 2026

Overview

Large language models (LLMs) are usually evaluated on clean text, but real-world inputs are rarely clean. User prompts may contain typos, OCR artifacts, formatting errors, badly copy-pasted fragments, or shuffled and partially corrupted content. These perturbations can change model outputs in ways that are hard to predict from the input alone.
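To make the perturbation types listed above concrete, the sketch below simulates a few of them with plain string transformations. It is a hypothetical illustration, not the code used for the experiments in this post: the function names, the OCR confusion table, and the perturbation rates are all assumptions chosen for readability.

```python
import random

# Hypothetical OCR confusion table (assumption): characters that OCR
# engines commonly misread, mapped to a plausible misreading.
OCR_CONFUSIONS = {"l": "1", "O": "0", "S": "5", "B": "8"}


def swap_adjacent_chars(text: str, rate: float, rng: random.Random) -> str:
    """Typo-style noise: swap a character with its right neighbour with probability `rate`."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the swapped pair so it is not swapped back
        else:
            i += 1
    return "".join(chars)


def ocr_noise(text: str, rate: float, rng: random.Random) -> str:
    """OCR-style noise: replace confusable characters with probability `rate`."""
    out = []
    for c in text:
        if c in OCR_CONFUSIONS and rng.random() < rate:
            out.append(OCR_CONFUSIONS[c])
        else:
            out.append(c)
    return "".join(out)


def shuffle_words(text: str, rng: random.Random) -> str:
    """Shuffled content: randomly permute whitespace-separated words."""
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)


def drop_chars(text: str, rate: float, rng: random.Random) -> str:
    """Partial corruption: delete each character independently with probability `rate`."""
    return "".join(c for c in text if rng.random() >= rate)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so perturbations are reproducible
    clean = "The quick brown fox jumps over the lazy dog."
    print(swap_adjacent_chars(clean, 0.1, rng))
    print(ocr_noise(clean, 0.5, rng))
    print(shuffle_words(clean, rng))
    print(drop_chars(clean, 0.1, rng))
```

Seeding a dedicated `random.Random` instance keeps each perturbed prompt reproducible, which matters when the same corrupted input has to be fed to the model more than once for comparison.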

Emily Liu

