In part 2 of our series on the rise of Agentic Fraud Ops, Chen Zamir lays out the five-step sequence for handing your fraud reaction cycle over to AI agents, and explains why the order matters as much as the steps themselves. Catch up on part 1 here.
Most fraud teams that have started using agentic AI started in the right place. They’re piloting agents inside investigation: enriching alerts, structuring cases, and proposing resolutions for investigators to validate. For the most part, they are seeing clear efficiency gains from it.
The problem is that almost nobody is going further.
In a previous post, I argued that the master KPI in fraud isn’t precision or accuracy. It’s the reaction cycle: the time it takes your system to detect a gap and ship a fix for it.
Fraudsters are already deploying AI to adapt faster, so fraud teams should use agentic AI to match that speed.
Automating parts of your investigation process is just the first step in that journey.
Compressing that one step while the other four still run at human speed gets you a faster investigation function but the same broken loop. The rules your investigators are chasing are still degrading. The labels feeding your models are still arriving weeks late.
But when the whole cycle runs at machine speed, the system becomes something different. Decisions happen at population and ring level instead of one event at a time. One investigation labels 150,000 accounts. One policy proposal replaces dozens of manual reviews. The cost per decision drops by an order of magnitude.
Getting there takes five steps, and the order matters more than most fraud leaders expect. Each step produces an input the next one needs, or builds the capability the team needs to govern it.
Skip ahead and you get rule automation running on stale labels, policy automation nobody knows how to govern, and a year of budget spent on systems that didn’t compound.
The sequence you want to follow is this:
- Investigation: enrich alerts, structure cases, propose resolutions
- Cases: cluster related alerts into ring-level cases
- Labels: generate training signal continuously
- Segmentation: automate population treatment policies
- Detection: automate rule and model proposals
What follows walks through each step and why it comes where it does.

Where to start: Agentic Fraud Investigation
This is where agentic AI is most mature today, where your team’s expertise is likely concentrated, and where the savings show up first. Specifically, you want to cover three areas:
Enrichment moves from the investigator to the agent. The 15 minutes of pulling device data, IP context, account history, and external lookups gets done in parallel and attached to the case before the investigator opens it.
Case output becomes structured. Instead of paragraphs typed into a comment field that nothing else can read, cases get logged in consistent, machine-readable form. This is the part that matters most for everything downstream.
And finally, the agent proposes resolutions with its reasoning. The investigator’s role shifts from generating the call to validating it. That’s a new skill, and this is the lowest-stakes place for your team to start building it.
The efficiency payoff is real: a 30-minute investigation cuts to about five. The deeper payoff is that investigation output is better structured, standardized, and machine-readable.
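To make "parallel enrichment" and "machine-readable case output" concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the three lookup functions stand in for whatever device, IP, and account services you actually call, and the `CaseRecord` fields are one possible shape, not a standard schema.

```python
import asyncio
import json
from dataclasses import dataclass, field, asdict
from typing import Optional

# Hypothetical lookups; real ones would call device, IP, and account services.
async def lookup_device(alert_id: str) -> dict:
    await asyncio.sleep(0)  # stand-in for network latency
    return {"device_fingerprint": "fp-123"}

async def lookup_ip(alert_id: str) -> dict:
    await asyncio.sleep(0)
    return {"ip_country": "US"}

async def lookup_account(alert_id: str) -> dict:
    await asyncio.sleep(0)
    return {"account_age_days": 12}

@dataclass
class CaseRecord:
    alert_id: str
    enrichment: dict = field(default_factory=dict)
    proposed_resolution: str = "undecided"      # e.g. "fraud" or "legit"
    reasoning: list = field(default_factory=list)
    validated_by: Optional[str] = None          # set once an investigator signs off

async def enrich(alert_id: str) -> CaseRecord:
    # Run all lookups concurrently instead of one after another.
    results = await asyncio.gather(
        lookup_device(alert_id),
        lookup_ip(alert_id),
        lookup_account(alert_id),
    )
    merged = {}
    for result in results:
        merged.update(result)
    return CaseRecord(alert_id=alert_id, enrichment=merged)

case = asyncio.run(enrich("alert-1"))
case.proposed_resolution = "fraud"
case.reasoning.append("new account on a device seen in earlier fraud")

# Consistent JSON that downstream steps (clustering, labeling) can read.
case_json = json.dumps(asdict(case), sort_keys=True)
```

The point of the record is that the proposed resolution and its reasoning live in named fields rather than in a free-text comment, so later steps can consume them without parsing prose.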
Then build cases from alerts
Once cases are structured, AI agents can begin to cluster them. New alerts get matched against known cases (fraud rings) as they come in, by device fingerprint, funding pattern, or behavioral signature. Matches get attributed to the existing case. Non-matches become candidates for a new one.
The investigator’s queue stops being a pile of unrelated alerts and becomes a curated set of assembled cases. Ring expansion happens automatically and unlocks your team’s true efficiency as one ruling now labels thousands of events at once.
This step is impossible without the structure work that came before. Teams that skipped it can’t build clustering at all.
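As a rough illustration of matching incoming alerts against known rings, the sketch below attributes an alert to the first ring that shares any signal with it and opens a new candidate ring otherwise. The signal keys (`device_fingerprint`, `funding_source`, `behavior_signature`) and the "any shared signal is a match" rule are simplifying assumptions; a production system would likely use fuzzier matching and would merge rings that an alert bridges.

```python
from dataclasses import dataclass, field

@dataclass
class Ring:
    ring_id: str
    signals: set = field(default_factory=set)    # signal values attributed to this ring
    alert_ids: list = field(default_factory=list)

def alert_signals(alert: dict) -> set:
    # Hypothetical signal keys; prefix each value so different fields never collide.
    keys = ("device_fingerprint", "funding_source", "behavior_signature")
    return {f"{key}:{alert[key]}" for key in keys if alert.get(key)}

def attribute(alert: dict, rings: list) -> Ring:
    signals = alert_signals(alert)
    for ring in rings:
        if ring.signals & signals:        # any shared signal counts as a match
            ring.signals |= signals       # ring expansion: absorb the new signals
            ring.alert_ids.append(alert["id"])
            return ring
    # No match: the alert becomes a candidate for a new ring.
    ring = Ring(ring_id=f"ring-{len(rings) + 1}", signals=signals, alert_ids=[alert["id"]])
    rings.append(ring)
    return ring

rings = []
alerts = [
    {"id": "a1", "device_fingerprint": "fp-1", "funding_source": "card-9"},
    {"id": "a2", "device_fingerprint": "fp-2", "funding_source": "card-9"},
    {"id": "a3", "device_fingerprint": "fp-2"},
    {"id": "a4", "device_fingerprint": "fp-7"},
]
for alert in alerts:
    attribute(alert, rings)
```

Here `a2` joins the first ring through the shared funding source and brings its device fingerprint with it, which is why `a3` then lands in the same ring even though it shares nothing with `a1` directly.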
Then automate fraud detection labels
So far you may have gained in efficiency, but your reaction cycle is not meaningfully faster. This step is the prerequisite that changes everything, and I can't stress enough how important it is.
The biggest bottleneck in your reaction cycle is labels. Chargebacks arrive weeks after the transaction, sometimes months, and by the time you have confirmed ground truth, the attack pattern has already moved on.
I’ve seen teams with state-of-the-art detection still underperform for one reason: their feedback loop was just too slow.
With agentic AI, and by following the steps in the right order, you can close this gap with relative ease. You already have an AI agent proposing investigation outcomes, and those recommendations can be turned into labels at scale.
The reason it works is that labels for training don’t have to be perfect. No customer gets declined because an agent labeled their transaction as suspicious. The labels only feed model retrains and rule backtests, not customer-facing decisions.
Would imperfect labels wrongly assigned by agents undermine your decisions? Not really, because rules and ML models already work with noisy data. A modest error rate from an agent is no worse than the noise you already have in chargebacks and investigation rulings. And that tolerance for imperfection is what makes continuous labeling at scale possible.
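One way to picture agent rulings becoming training labels is sketched below. It rests on two assumptions that are not from this post: each agent ruling carries a confidence score that is reused as a sample weight, and a chargeback, when it eventually arrives, overrides the agent's label for the same event. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Label:
    event_id: str
    is_fraud: bool
    source: str      # "agent" or "chargeback"
    weight: float    # how much a retrain or backtest should trust this label

def label_from_agent(event_id: str, resolution: str, confidence: float) -> Label:
    # Agent labels arrive within hours; weight them by the agent's confidence.
    return Label(event_id, resolution == "fraud", "agent", confidence)

def label_from_chargeback(event_id: str) -> Label:
    # Confirmed ground truth, but it may arrive weeks later.
    return Label(event_id, True, "chargeback", 1.0)

def merge(labels: list) -> dict:
    # Keep one label per event; a chargeback overrides an agent label.
    best = {}
    for label in labels:
        current = best.get(label.event_id)
        if current is None or (label.source == "chargeback" and current.source != "chargeback"):
            best[label.event_id] = label
    return best

training_labels = merge([
    label_from_agent("e1", "fraud", 0.9),
    label_from_agent("e2", "legit", 0.8),
    label_from_chargeback("e2"),   # late ground truth corrects the agent
])
```

The fast agent label is usable immediately, and the slow, confirmed label replaces it later without anyone having to wait for it.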
By the end of this step, the slowest part of your reaction cycle just went from weeks to hours. That’s when the real possibilities open up.
Then take on risk segmentation
Segmentation is the layer that decides how you'll process and assess the risk of each event: which rules fire, which vendors get called, or whether step-up authentication is triggered.
Many teams tend to ignore this crucial step, and even the ones that don’t usually resort to set-and-forget practices. But that's the layer where you implement your fraud strategy, and ignoring it slows your ability to react to emerging threats.
A word of caution: segmentation likely doesn't yet have a defined owner, goals, KPIs, or a process for moving populations between segments.
The thing is, you simply can’t automate something nobody is managing manually first. The team has to learn what good looks like before it can govern an agent that proposes changes.
There's one more difference worth flagging: this is a new motion for your team. The earlier steps all followed the same pattern, where the agent observes the live data pipeline and processes what goes through it.
But recommending a change to your risk segmentation works differently. The agent analyzes historical treatment performance across populations and proposes policy changes, not case rulings. This isn’t just semantics: it changes how you test, validate, and observe agent decisions.
The good news is that the agent isn't creating new segments; it's moving populations between existing ones. For example, a group of 25-to-30-day-old accounts moves out of the high-risk bucket and into the medium-risk one because their behavior looks like established users.
Because the risk segments are known and bounded, they're easier to test and validate than new fraud detection rules.
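The "known and bounded" property is easy to express as a check. The sketch below is a hypothetical shape for a segment-move proposal: the segment names, the `MoveProposal` fields, and the human sign-off requirement are illustrative assumptions. The key point is that a proposal naming a segment that doesn't already exist is rejected before anyone reviews it.

```python
from dataclasses import dataclass

SEGMENTS = ("low_risk", "medium_risk", "high_risk")  # existing, bounded set

@dataclass(frozen=True)
class MoveProposal:
    population: str      # identifier for the population being moved
    from_segment: str
    to_segment: str
    rationale: str

def validate(proposal: MoveProposal, assignments: dict) -> list:
    problems = []
    if proposal.to_segment not in SEGMENTS:
        problems.append(f"unknown target segment: {proposal.to_segment}")
    if proposal.from_segment == proposal.to_segment:
        problems.append("proposal does not change the segment")
    if assignments.get(proposal.population) != proposal.from_segment:
        problems.append("population is not currently in from_segment")
    return problems

def apply_move(proposal: MoveProposal, assignments: dict, approved_by: str) -> dict:
    # Require a named human approver, then return an updated copy of the policy.
    if not approved_by:
        raise ValueError("a human approver is required")
    problems = validate(proposal, assignments)
    if problems:
        raise ValueError("; ".join(problems))
    updated = dict(assignments)
    updated[proposal.population] = proposal.to_segment
    return updated

assignments = {"accounts_25_30_days": "high_risk"}
proposal = MoveProposal(
    "accounts_25_30_days", "high_risk", "medium_risk",
    "behavior resembles established users",
)
updated = apply_move(proposal, assignments, approved_by="analyst-1")
```

Because `apply_move` returns a copy, the current policy stays untouched until the new one is explicitly swapped in, which keeps testing and rollback simple.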
Save the detection layer for last
Detection is the layer that classifies events. ML models score, rules layer on top, and together they handle most decisions automatically. But we shouldn't confuse automated rule fires with automated rule writing.
Automating detection last is the right call, for two reasons.
The first reason we've already covered: fuel. Detection automation needs fresh, trustworthy labels to be useful. Without automated, early labeling already in place, an agent proposing new rules is just proposing wrong rules faster.
The second reason: governing agents that create and tweak rules is complex, especially if your team hasn't built the muscle for writing rules themselves or managing agents that run deep data analysis.
Unlike segmentation, where the agent picks between existing options, detection is generative. It proposes things that didn't exist before: new rule patterns, new features, and possibly new ML models.
Mastering agentic segmentation gives your team the governance experience detection automation requires.
But once you’ve mastered this step, you're running your reaction cycle at machine speed. Alerts come in, get clustered into ring-level cases, get labeled automatically, and fuel agents running continuous analysis for segment changes and new rule recommendations.
What it actually takes
Let’s be honest, this isn’t a one-quarter project.
The teams furthest along are six to twelve months in, and they’d say they have meaningful work still ahead. Each step needs tooling, organizational change, and the team’s time to absorb a new way of working before the next step makes sense.
Fraud teams are facing a clear risk here. The pressure to adopt AI and scale back headcount is growing by the day, but there's little clarity on how to do it safely.
In an uncertain environment, it's crucial to understand your end goal: not just lower costs, but a better organization. Once you know what that means for your business, the right sequence follows.
As I outlined, the sequence is not only about where you can cut more costs. It’s also about clearing dependencies and growing new team muscles. Ignore these and you might need to hastily re-hire the team you thought was redundant. And we all know it’s not as simple as that.
Get the full picture
This is a roadmap overview. For the full picture, check out the whitepaper this post draws from.
It covers why agentic fraud ops are a reality all fraud teams need to prepare for, what such a system should look like, the headcount and skill mix it actually needs, the governance model that keeps continuous learning safe at scale, and 18-month rollout patterns: what works, where teams break it, and how to defend the sequence inside your organization.
If you’re the person at your company who’s going to fund, sequence, and defend this work internally, you’ll find many of the answers there.
When you evaluate platforms for this sequence, ask vendors for their answer to five questions, in this order. How do you verify agent identity? How do you infer agent intent? How do you measure behavior against expected patterns? How do you authorize specific actions and revoke them? How do you continuously monitor sessions across the user journey?
Vendors that have answers to all five are operating from a complete trust model. Vendors that have answers to two or three are still building one.





