OpenAI publishes a practical evaluation about monitorability of models' internal reasoning — that is, how much we can understand and watch what a model is thinking before it gives an answer. Why does this matter today? Because if AI makes complex decisions you can't directly supervise, watching its thought process can be a powerful control tool.
What monitorability of the chain-of-thought means
Monitorability is the ability of a monitoring system to predict relevant properties of an agent's behavior from observable signals. Those signals can be actions, final outputs, internal activations or, in recent models, the chain-of-thought (CoT): the step-by-step narrative the model generates before giving an answer.
Does this mean you see everything the model thinks? Not exactly. It means that, in many cases, the CoT contains useful clues about intentions, mistakes, biases or even attempts to game the reward. And if those clues stay reliable as models scale, the CoT can be a valuable control layer.
