In traditional software, a function takes an input and produces a definite output. When something malfunctions, the failure modes are finite — it either does the right thing or the wrong thing. You find the bug, patch the function, ship the fix. The malfunction surface is bounded.
AI models don’t behave this way. The same input can produce many different outputs depending on temperature, sampling, context, and conversation history. There is no fixed input-to-output mapping to patch.
That’s why patching an AI model isn’t like patching a deterministic system. The malfunction surface isn’t a list of bugs you can close one by one — it’s a continuous space of possible failures, and a clever prompt can always find a new path through it.
This is the root reason jailbreaking is trivial: you can’t enumerate the prompts that would break the model, so you can’t defend against them all.
Heard from Sander Schulhoff on Lenny’s Podcast. See also the broader security mentality.