
2026

Input Perturbations Propagate Differently: A Mechanistic Analysis of GPT-2 Robustness

26 minute read

Published:

Overview

Large language models (LLMs) are often evaluated on clean text, but real-world inputs are rarely clean. User prompts may contain typos, OCR artifacts, formatting errors, fragments mangled by copy-and-paste, or shuffled and partially corrupted content. These perturbations can change model outputs in ways that are hard to predict from the input alone.
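The Python sketch below shows how some of these perturbation types could be injected into a clean prompt. It is a minimal example, not the perturbation scheme used in this analysis. The function names (`typo_swap`, `ocr_noise`, `drop_chars`, `shuffle_words`), the `OCR_CONFUSIONS` table, and the per-position `rate` parameters are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical table of OCR-style confusions; illustrative only, not from this post.
OCR_CONFUSIONS = {"l": "1", "O": "0", "o": "0", "rn": "m", "e": "c", "S": "5"}


def typo_swap(text: str, rate: float, rng: random.Random) -> str:
    """Swap adjacent letters with probability `rate` at each position (typo-like noise)."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so it is not immediately swapped back
        else:
            i += 1
    return "".join(chars)


def ocr_noise(text: str, rate: float, rng: random.Random) -> str:
    """Replace OCR-confusable substrings with probability `rate` per match."""
    out = []
    i = 0
    while i < len(text):
        for src, dst in OCR_CONFUSIONS.items():
            if text.startswith(src, i) and rng.random() < rate:
                out.append(dst)
                i += len(src)
                break
        else:  # no confusion applied at this position: keep the original character
            out.append(text[i])
            i += 1
    return "".join(out)


def drop_chars(text: str, rate: float, rng: random.Random) -> str:
    """Delete each character independently with probability `rate` (partial corruption)."""
    return "".join(c for c in text if rng.random() >= rate)


def shuffle_words(text: str, rng: random.Random) -> str:
    """Randomly permute whitespace-separated words (shuffled content)."""
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the perturbed prompts are reproducible
    prompt = "The quick brown fox jumps over the lazy dog."
    print(typo_swap(prompt, 0.1, rng))
    print(ocr_noise(prompt, 0.5, rng))
    print(drop_chars(prompt, 0.1, rng))
    print(shuffle_words(prompt, rng))
```

Each function takes an explicit `random.Random` instance instead of the global generator. This keeps every perturbed prompt reproducible, so the same corrupted input can be fed to a model more than once and the results compared.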