Input Perturbations Propagate Differently: A Mechanistic Analysis of GPT-2 Robustness
Overview
Large language models (LLMs) are often evaluated on clean text, but real-world inputs are rarely clean. User prompts may contain typos, OCR artifacts, formatting errors, badly copied-and-pasted fragments, or shuffled and partially corrupted content. These perturbations can affect model outputs in ways that are difficult to predict from the input alone.
