MeshWorld India Logo MeshWorld.
AI Security News Agents ZeroTrust 12 min read

AI Is Now Fighting AI in Cybersecurity

Arjun
By Arjun
AI Is Now Fighting AI in Cybersecurity

I’ve been to a lot of security conferences. The default mood is controlled panic — vendors promising salvation, analysts warning of doom, everyone going home and doing roughly the same things they were doing before. RSAC 2026 felt different. Mitch Ashley and Alan Shimel weren’t selling anything. They were telling a room full of security professionals that the job they were hired to do — actively hunting threats, reviewing code, catching anomalies — is becoming something humans physically can’t keep up with. That’s not a vendor pitch. That’s closer to an admission.

TL;DR
  • AI attacks now execute at machine speed — thousands of validation checks per minute vs. hundreds for humans
  • “Human in the loop” is dead. “Human at the helm” is the new model — you set guardrails, AI does the hunting
  • The proposed fix is layered AI: multiple models auditing each other, with no humans in the middle
  • Traditional audits don’t work on AI — you can see input and output, not internal reasoning
  • “Some oil is going to be spilled on the road there” — Alan Shimel’s own words on the transition

The speed gap is already real, not theoretical

Mitch Ashley, VP and practice lead at Futurum Group, put it plainly: “The amount of code, the amount of validation requests coming through, will overwhelm the human in the loop.” Alan Shimel, CEO of Techstrong Group, put a number on it — humans move at hundreds of repetitions per minute, AI moves at thousands.

That gap isn’t closing. It’s widening. Modern CI/CD pipelines run hundreds of security scans per build, across thousands of microservices, continuously. A mid-size engineering org might see 50,000+ security alerts per week from automated scanners alone. GitHub Copilot Autofix now generates security patch suggestions faster than any single human reviewer can read them, let alone evaluate them.

We’ve seen what happens when the volume wins. The SolarWinds breach in 2020 went undetected for nine months. Not because the attackers were invisible — the telemetry was there. There was just too much of it for the team to catch the signal.

The Scenario: Your SOC gets 50,000 alerts on a Monday. Your team of six analysts can realistically investigate maybe 200 of them. The other 49,800 get triaged by a rule engine that was configured two years ago, before your infrastructure doubled in size. Somewhere in that pile is a lateral movement pattern that has been running for eleven days. You’ll find it eventually. After it’s already in your backup systems.

This isn’t a 2027 problem. It’s the current situation for security teams at mid-to-large organizations today. Ashley and Shimel aren’t predicting a future — they’re describing something that’s already happening, right now, to real teams.

What “human at the helm” actually looks like in practice

The phrase sounds reassuring. It isn’t, quite.

“Human at the helm” doesn’t mean AI is a nice filter that makes your analysts’ jobs easier. It means the actual threat-hunting, anomaly-detection, and validation work is done by AI — and humans are there to set the rules, review the high-priority exceptions, and make the calls that require business context. Your job is moving from active player to supervisor.

This is already deployed in real products. Microsoft Security Copilot lets analysts query security data in natural language, correlating signals across Defender, Sentinel, and Intune — but the AI is doing the correlation, not the human. AWS GuardDuty with AI triage layers automatically escalates anomalous API calls; humans review a curated top 50 per day instead of 5,000 raw events. In Palo Alto XSOAR and Splunk SOAR, AI runs the first 80% of an incident response playbook autonomously. The human approves the final remediation step.

The Scenario: You’re a security lead at a 300-person company. Tuesday morning, you open your dashboard. The AI has already correlated a suspicious login from Romania with an unusual Salesforce export and a new admin account created at 2am. It has flagged this as high-confidence lateral movement and quarantined the new account pending your approval. You review the evidence — 45 seconds — and confirm. The AI did the hunting. You made the call.

The uncomfortable part isn’t that AI is doing the work. It’s that you have to trust the AI’s conclusions enough to act on them without personally verifying every step yourself. For security professionals trained to never trust anything they can’t inspect, that’s a significant shift.

Who audits the AI auditor?

Here’s where it gets genuinely hard. If AI-A is monitoring AI-B, what validates AI-A?

Shimel’s answer is a multi-model approach — layers of AI auditing each other, the way a human security team has multiple reviewers and checks. He acknowledged directly that “recursion is a complex issue” and not one that gets avoided. “Some oil is going to be spilled on the road there.”

That’s the most honest thing I’ve heard anyone say at a security conference in years. They’re not claiming this is solved. They’re saying it’s necessary and unavoidable, and the adoption will come with incidents.

We’re already seeing early versions of this. Tools like Garak and Microsoft’s PyRIT are AI red-teaming frameworks — they probe LLMs for vulnerabilities by simulating adversarial attacks, essentially using one AI to attack another before real attackers do. Anthropic’s Constitutional AI and the NIST AI Risk Management Framework are structured approaches to ongoing model evaluation, though neither scales to real-time production monitoring yet.

The failure mode we need to avoid is the CrowdStrike scenario from July 2024. A faulty sensor update auto-deployed globally. No human reviewer caught it in time. 8.5 million Windows machines went down in hours. A multi-layer AI validation chain that checks deployments for anomalous binary signatures before rollout might have flagged it. Or it might have made the same call the human process did. We don’t know yet. That’s Shimel’s point.

The Scenario: Your company’s AI security agent just flagged a supply chain anomaly in a third-party library update. It’s recommending you block the deployment. But the agent flagging this was itself updated last week by a vendor. How confident are you that it’s right? How do you verify the verifier? If you can’t answer that, you’re not at the helm — you’re just hoping.

The alternative — throwing more humans at the alert volume — provably doesn’t work. So multi-model layering is where this is going. With bumps.

Why the black box breaks every audit model we have

I want to separate two things that often get conflated: AI security tools and AI as a thing that needs to be secured.

The first is just software with a different interface. The second is a genuinely new problem, because AI doesn’t work the way auditable systems used to work.

Take a VPN audit. A third-party firm can review the source code, check the no-log policy, verify server locations, and issue a compliance certificate. They can inspect the thing. The process is opaque but the outputs are traceable.

Try that with an LLM. You can see what goes in — the prompt. You can see what comes out — the response. You cannot trace the internal decision path that produced the output. There’s no stack trace for a neural network’s reasoning. Shimel compared it to asking a human why they made a decision: you can point to the stimulus and the result, but you can’t watch your own neurons fire.

This matters practically. The EU AI Act, which came into force in 2026, requires “transparency” for high-risk AI systems. Auditors are still working out what that means for black-box models. Most current compliance frameworks amount to documenting inputs and outputs extensively and calling that “audited.” OpenAI and Anthropic both publish model specs and model cards — genuinely useful documentation — but neither proves the model behaves as described under all adversarial conditions.

There’s also the sycophancy problem. AI models have a documented tendency to agree with users even when the user is wrong. In a security workflow, you specifically need the AI to push back when the human analyst is confident but mistaken. A model that tells you what you want to hear is worse than useless in that context.

The Scenario: Your analyst asks the AI: “This login pattern looks normal to me — confirm?” The AI, trained on human preference signals, says “Yes, this appears consistent with normal behavior.” The analyst closes the ticket. The AI wasn’t lying — it was being agreeable. The intrusion continues for another six days.

We’ve built compliance frameworks that assume you can inspect the thing you’re auditing. Most of them don’t apply cleanly to AI. The EU AI Act is a start, but passing a checklist doesn’t mean you know why your model made a specific decision at 3am last Tuesday.

What you can actually do right now

Ashley and Shimel’s point isn’t that everything is broken and nothing can be done. The point is that the approach needs to change. Here’s what that looks like in practice.

For security teams:

Run your AI tools in observation mode before giving them action authority. Let the model flag anomalies, but have humans confirm for the first few weeks. This builds calibrated trust — you’ll know where the model is sharp and where it over-fires, before it’s blocking legitimate traffic on its own. Palo Alto XSOAR calls this “advisory mode” before you graduate to “automated response.”

Layer your models. Don’t use a single AI system for both detection and response. Separate the detector from the responder. A separate validation layer that checks the detector’s high-severity calls before automated action adds meaningful friction against false positives — and against a compromised detector.

Adopt least-privilege scoping for any AI agent that takes actions. If your security AI can query logs, it doesn’t need write access to your firewall rules. Scope what it can touch, log everything it does, and set time limits on access grants. Model Context Protocol (MCP) is one framework doing exactly this — tool access is granted per-session with explicit scope, not permanently baked in.

Warning

Turning on an AI security tool in “block mode” on day one is a mistake. False positive rates vary widely between products and environments. An overzealous model that starts blocking legitimate internal traffic is its own incident — and it erodes analyst trust in the tool fast. Always run an observation period first.

For developers building AI into products:

Every tool grant to an AI agent should be treated like a sudo command — minimal scope, time-limited, fully logged. If your agent can send emails, make API calls, or modify files, treat that access as a liability until you’ve tested it thoroughly. Test your own agents with adversarial inputs before users or attackers do — prompt injection is still the most direct way to reprogram an agent that has access to real systems.

For individuals:

A VPN and antivirus don’t solve agentic AI risk — that’s an organizational problem. But they do cover the basics of your personal attack surface, which is still worth covering. The phishing landscape in 2026 has gotten significantly worse; personal security hygiene matters more, not less, as AI makes social engineering attacks cheaper to run.


Summary

  • Human security teams physically cannot review the volume of alerts and validation requests that modern AI-assisted infrastructure generates — the speed gap is real and widening
  • “Human at the helm” means oversight and exception review, not active hunting — AI handles volume, humans handle judgment calls and policy
  • Layered multi-model auditing is the direction for AI-auditing-AI, but it’s not solved, and adoption will come with incidents; the CrowdStrike scenario shows what happens when automated deployment skips validation layers

FAQ

What’s the difference between “human in the loop” and “human at the helm”?

Human in the loop means a human reviews and approves each significant AI decision before action is taken. Human at the helm means humans set the policy and review high-priority exceptions — the AI handles the volume. At any meaningful scale, in-the-loop doesn’t work anymore. The number of decisions is too high.

Is the recursion problem — AI auditing AI — actually solved?

No. Shimel said as much directly at RSAC. Multi-layer models are the direction the industry is moving, but it’s not a finished approach. Early tools like Garak and PyRIT are proving the concept, but production-grade multi-model auditing at scale doesn’t exist as a mature product category yet. Expect incidents during the transition.

Why can’t we just audit AI security tools the same way we audit regular software?

Because traditional audits assume you can inspect internal logic. With standard software, a third-party firm can read the code, trace the execution path, and certify what it does. AI models are black boxes — you can see inputs and outputs, but the internal decision path that produced the output isn’t inspectable the same way. The EU AI Act and NIST AI RMF are building new frameworks for this, but they’re still maturing.