Microsoft introduces SentinelStep for patient AI agents

Oct 20, 2025 | 3 minutes

Modern conversational agent tools can debug code, analyze spreadsheets, and plan complex trips. But when you ask them for something simpler and more human — like waiting and notifying you when something happens — many fail. Why? Because they don’t know when to check again without draining resources or losing context. (https://www.microsoft.com/en-us/research/blog/tell-me-when-building-agents-that-can-wait-monitor-and-act/)

What SentinelStep is and what problem it solves

Microsoft Research proposes SentinelStep, a technique to build agents that can wait, monitor, and act for hours or days without getting lost in the conversation or consuming all the context. In practice, SentinelStep wraps the agent in a workflow with dynamic polling and careful context management so the monitoring task runs until a condition is met. (https://www.microsoft.com/en-us/research/blog/tell-me-when-building-agents-that-can-wait-monitor-and-act/)

Take a concrete example: you want an agent to watch your email for a colleague’s reply, or to tell you if a product’s price drops in the next three days. The job is not to scrape a page or read an inbox once. It is to decide when to look again, how long to keep state, and how to avoid “hogging” the model’s context window. SentinelStep aims for that balance. (https://www.microsoft.com/en-us/research/blog/tell-me-when-building-agents-that-can-wait-monitor-and-act/)

The central challenge isn’t what the agent can do. The challenge is when and how often it should do it to be useful and efficient. (https://www.microsoft.com/en-us/research/blog/tell-me-when-building-agents-that-can-wait-monitor-and-act/)

How it works, in plain terms

SentinelStep works with three simple components: the actions that gather information, the condition that determines when the task is complete, and the polling interval that defines the cadence of checks. The logic is: every [polling interval] do [actions] until [condition] is true. The novelty is that the interval is estimated based on the task and adjusted dynamically, and the agent’s state is saved to avoid overflowing the context. (https://www.microsoft.com/en-us/research/blog/tell-me-when-building-agents-that-can-wait-monitor-and-act/)
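The "every [polling interval] do [actions] until [condition]" pattern can be sketched in a few lines of Python. This is only an illustration of the loop described above, not Microsoft's implementation: the names run_sentinel_loop and MonitorState, the "double the interval when nothing changed" rule, and the max_checks cap are all assumptions made for the example. The article only says the interval is estimated and adjusted dynamically, and that state is saved to keep the context small.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class MonitorState:
    """Compact state carried between checks instead of a growing transcript."""
    checks: int = 0
    last_observation: Any = None


def run_sentinel_loop(
    actions: Callable[[], Any],
    condition: Callable[[Any], bool],
    initial_interval: float,
    max_interval: float,
    max_checks: int,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[bool, MonitorState]:
    """Every `interval` seconds, run `actions` until `condition` holds."""
    state = MonitorState()
    interval = initial_interval
    while state.checks < max_checks:
        observation = actions()
        state.checks += 1
        if condition(observation):
            state.last_observation = observation
            return True, state
        # Hypothetical adjustment rule (not from the article): back off while
        # nothing changes, return to the initial cadence when something does.
        if observation == state.last_observation:
            interval = min(interval * 2, max_interval)
        else:
            interval = initial_interval
        # Keep only the latest observation, not the full history.
        state.last_observation = observation
        sleep(interval)
    return False, state


if __name__ == "__main__":
    # Simulated star counts for a repository; the sleep is stubbed out so the
    # demo finishes instantly while still recording the chosen intervals.
    readings = iter([10, 10, 10, 25])
    waits: list[float] = []
    done, final = run_sentinel_loop(
        actions=lambda: next(readings),
        condition=lambda stars: stars >= 20,
        initial_interval=1.0,
        max_interval=8.0,
        max_checks=10,
        sleep=waits.append,
    )
    print(done, final.checks, waits)
```

Passing sleep as a parameter is a design choice for testability: a real deployment would keep time.sleep (or a scheduler), while a test can record the intervals without waiting.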

In Microsoft’s demo, SentinelStep is integrated into a co-planning interface called Magentic-UI. There, the system suggests multi-step plans and prefilled parameters for monitoring steps; you can accept or tweak those parameters. The orchestrator assigns specialized agents (for example, to browse the web or run code) and controls when to restart or advance the flow. (https://www.microsoft.com/en-us/research/blog/tell-me-when-building-agents-that-can-wait-monitor-and-act/)
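To make the "prefilled parameters you can accept or tweak" idea concrete, here is a minimal sketch of how a suggested monitoring step might be represented. The MonitoringStep class and every field name in it are hypothetical and chosen for illustration; they are not taken from the Magentic-UI codebase.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MonitoringStep:
    """Hypothetical shape of one monitoring step in a co-planned workflow."""
    description: str               # what the assigned agent does on each check
    condition: str                 # natural-language completion condition
    polling_interval_seconds: int  # how long to wait between checks
    agent: str                     # which specialist agent runs the step


# A plan step as the system might prefill it (all values are illustrative).
suggested = MonitoringStep(
    description="Open the product page and read the current price",
    condition="The price is below 100 USD",
    polling_interval_seconds=3600,
    agent="web_surfer",
)

# The user accepts the step but shortens the interval before running it.
tweaked = replace(suggested, polling_interval_seconds=900)

print(suggested.polling_interval_seconds, tweaked.polling_interval_seconds)
```

Using a frozen dataclass with replace keeps the system's original suggestion intact, so the interface can show both the suggested and the edited values.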

Does it work in practice? Results and evaluation

Evaluating real monitoring tasks is tricky, because many events happen only once and aren’t repeatable. To address that, the team built SentinelBench, a set of synthetic web environments with configurable scenarios that let you repeat experiments. Examples include simulators of GitHub repositories gaining stars, Teams monitors, and flight availability trackers. (https://www.microsoft.com/en-us/research/blog/tell-me-when-building-agents-that-can-wait-monitor-and-act/)

In initial tests, SentinelStep improves reliability on long tasks: for 1-hour tasks, the success rate rose from 5.6% without SentinelStep to 33.3% with it; for 2-hour tasks, it reached 38.9% with SentinelStep. For short tasks, performance stays similar. The gain shows up exactly where patience matters. (https://www.microsoft.com/en-us/research/blog/tell-me-when-building-agents-that-can-wait-monitor-and-act/)

Availability, code and precautions

Microsoft has open-sourced SentinelStep as part of Magentic-UI. You can find the repository on GitHub and install the interface with pip install magentic-ui. The team warns that, as with any new technique, production deployment requires testing and validation tailored to each use case, and they point to a transparency note on privacy and security. (https://www.microsoft.com/en-us/research/blog/tell-me-when-building-agents-that-can-wait-monitor-and-act/)

If you’re an entrepreneur or developer, this lets you build assistants that are truly always attentive without being invasive or wasting tokens. Imagine an agent that watches a quote and notifies you the moment it crosses your threshold, or one that checks a support queue and acts only when a critical ticket arrives. Useful, right? (https://www.microsoft.com/en-us/research/blog/tell-me-when-building-agents-that-can-wait-monitor-and-act/)
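The quote-watching case hinges on the completion condition. One possible way to express "the moment it crosses your threshold" is an edge check that fires only on the first reading at or below the threshold. The function names below are hypothetical, and the sketch assumes the interesting direction is a downward crossing; the article does not specify either.

```python
from __future__ import annotations

from typing import Iterable, Optional


def crossed_below(previous: Optional[float], current: float, threshold: float) -> bool:
    """True only on the check where the value first reaches or drops below the threshold."""
    was_above = previous is None or previous > threshold
    return was_above and current <= threshold


def first_alert_index(quotes: Iterable[float], threshold: float) -> Optional[int]:
    """Return the index of the first reading that should trigger a notification."""
    previous: Optional[float] = None
    for index, quote in enumerate(quotes):
        if crossed_below(previous, quote, threshold):
            return index
        previous = quote
    return None


if __name__ == "__main__":
    # Illustrative readings, one per polling check.
    print(first_alert_index([105.0, 101.5, 99.0, 98.0], threshold=100.0))
```

Comparing against the previous reading, rather than only the current one, means a quote that stays under the threshold does not produce a new alert on every check.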

What now? Impact for users and developers

SentinelStep is an example of how research is adding patience to AI workflows. It’s not magic; it’s design: choosing smart polling frequencies, saving state, and orchestrating specialist agents. For you, that means fewer false alarms, fewer wasted resources, and automations that actually solve everyday tasks. (https://www.microsoft.com/en-us/research/blog/tell-me-when-building-agents-that-can-wait-monitor-and-act/)

If you want to read the original technical note or explore the code, check the Microsoft Research article and the repository. (https://www.microsoft.com/en-us/research/blog/tell-me-when-building-agents-that-can-wait-monitor-and-act/)

Think: what tasks would you hand off to an agent that can patiently wait for you? Start with something small and test it. Patience in automation can save you a lot of time.
