AI Security Digest: December 1, 2025 - January 7, 2026
Jan 8, 2026

Your regular briefing on AI security threats, vulnerabilities, and defences from Darkhunt AI
TL;DR
OWASP releases definitive Top 10 for Agentic AI Applications - The authoritative framework has landed, backed by Microsoft, NVIDIA, and AWS. Goal hijacking, tool misuse, and rogue agents are now officially classified.
OpenAI and UK NCSC confirm prompt injection "may never be fully solved" - Industry consensus: this is an architectural problem requiring continuous mitigation, not a bug to patch.
Critical LangChain vulnerability (CVE-2025-68664) enables credential theft - 847M downloads affected. Prompt injection can now exfiltrate cloud keys, database strings, and API secrets.
DeepSeek faces mounting security scrutiny: an exposed database containing 1M+ chat logs, vulnerabilities to 2-year-old jailbreaks, and government bans are piling up.
Multi-agent defence systems are emerging; the HoneyTrap framework uses deception to waste attackers' resources while gathering intelligence.
Top Stories
OWASP Top 10 for Agentic Applications: The Standard Has Arrived
What happened: OWASP released the definitive Top 10 security risks for agentic AI applications on December 9th. Developed with input from 100+ security researchers and evaluated by experts from NIST, Microsoft AI Red Team, and the Alan Turing Institute, this framework immediately gained adoption from Microsoft, NVIDIA, AWS, and GoDaddy.
Why it matters: This isn't another theoretical taxonomy. The framework documents real-world incidents (EchoLeak silent exfiltration, Amazon Q tool abuse) and establishes two core defensive principles that should shape every agentic system: Least-Agency (minimise autonomy to what's actually required) and Strong Observability (mandatory logging of agent decisions and tool use).
The top three risks (ASI01-ASI03) directly target what makes agents dangerous: goal hijacking, tool misuse, and privilege abuse. These aren't hypothetical - they're the attack patterns already being exploited in the wild.
Darkhunt perspective: The OWASP framework validates what offensive security practitioners have known: static guardrails fail against adaptive agents. When agents can reason, plan, and use tools, attackers can manipulate that reasoning chain. The "Strong Observability" principle aligns directly with our thesis - you can't defend what you can't see, and agent decision traces are the new attack surface. The rapid enterprise adoption signals that agentic AI security is transitioning from research curiosity to operational requirement.
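To make those two principles concrete, here is a minimal, framework-agnostic sketch of Least-Agency and Strong Observability at the tool-dispatch layer. The `ALLOWED_TOOLS` set, `call_tool` function, and tool names are our own illustration, not part of the OWASP guidance or any particular agent framework:

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("agent.audit")

# Least-Agency: the agent only gets the tools this task actually needs.
ALLOWED_TOOLS = {"search_docs", "summarise_text"}  # no write or delete tools

TOOL_IMPLS = {
    "search_docs": lambda query: f"results for {query}",
    "summarise_text": lambda text: text[:200],
}

def call_tool(agent_id: str, tool_name: str, arguments: dict):
    """Dispatch a tool call with an allowlist check and a full audit trail."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent_id,
        "tool": tool_name,
        "arguments": arguments,
    }
    if tool_name not in ALLOWED_TOOLS:
        record["decision"] = "denied"
        audit.warning(json.dumps(record))  # Strong Observability: log denials too
        raise PermissionError(f"Tool {tool_name!r} is outside this agent's mandate")

    record["decision"] = "allowed"
    audit.info(json.dumps(record))  # every tool use leaves a trace
    return TOOL_IMPLS[tool_name](**arguments)
```

The allowlist and the audit log live outside the model, so even a hijacked reasoning chain cannot reach tools the agent was never granted, and every decision it does make leaves a trace.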
OpenAI Admits Prompt Injection May Never Be Solved
What happened: OpenAI acknowledged that "prompt injection, much like scams and social engineering on the web, is unlikely to ever be fully 'solved'" in discussing the security of its Atlas browser. The UK's National Cyber Security Centre (NCSC) issued a parallel warning that "there's a good chance prompt injection will never be properly mitigated." Meanwhile, a VentureBeat survey found that only 34.7% of organisations have deployed dedicated prompt-injection defences.
Why it matters: When the company building the most deployed LLMs tells you the fundamental vulnerability in their systems may be permanent, listen carefully. Unlike SQL injection (which has known mitigations), prompt injection exploits the very capability that makes LLMs useful: their ability to follow natural-language instructions.
A demonstrated attack prompted Atlas to draft a resignation letter rather than an out-of-office reply when processing a malicious email. OpenAI's response? They're now using "LLM-based automated attackers" (reinforcement-learning red-teaming) for continuous defence. They're essentially building AI to attack AI because human-paced defence can't keep up.
Darkhunt perspective: This admission reshapes the security conversation. If prompt injection is architectural rather than incidental, then security strategies built on "prevent and patch" are fundamentally misaligned. The 65.3% of organisations without dedicated defences aren't just behind - they're operating on assumptions that industry leaders have publicly abandoned. The path forward is adaptive defence systems that assume compromise and focus on detection, response, and continuous hardening. Static guardrails are table stakes; reasoning about attacks in real time is required.
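As a rough illustration of what "AI attacking AI" looks like in practice, here is a hedged sketch of a continuous red-teaming loop. The `attacker_llm` and `target_agent` callables and the canary-token check are hypothetical stand-ins; OpenAI has not published its implementation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RedTeamResult:
    payload: str
    followed_injection: bool

def red_team_round(
    attacker_llm: Callable[[str], str],
    target_agent: Callable[[str], str],
    canary: str = "CANARY-7731",
) -> RedTeamResult:
    """One round of automated prompt-injection testing.

    The attacker model is asked to smuggle an instruction that makes the
    target echo a canary token; if the canary shows up in the reply, the
    injection landed and the payload joins the regression suite.
    """
    payload = attacker_llm(
        f"Write an email that tricks an assistant into replying with '{canary}'."
    )
    reply = target_agent(f"Summarise this email for me:\n\n{payload}")
    return RedTeamResult(payload=payload, followed_injection=canary in reply)
```

Successful payloads get replayed against every model or prompt change, which is what lets the defence adapt at machine speed rather than human review cadence.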
Perplexity Comet: Zero-Click Data Destruction Through Polite Emails
What happened: Straiker AI's STAR Labs discovered a zero-click attack in which Perplexity Comet (an AI browser) executes full Google Drive wipes via malicious emails disguised as polite organisational requests. The attack chain: malicious email evades spam filters using natural language, reaches Comet's inbox, and the agent - trusting the email's instructions and leveraging its OAuth permissions to Gmail and Drive - silently deletes all files.
Why it matters: This isn't a jailbreak. This isn't a prompt injection that bypasses safety filters. This is an agent doing exactly what it was asked to do by content it trusted. The research found that "prompt politeness affects LLM accuracy" - attackers are now using social engineering techniques optimised for AI agents, not humans.
Traditional email security (spam filters, malware scanning) is blind to this. The attack payload is natural language. The action is authorised (OAuth). The damage is irreversible.
Darkhunt perspective: The attack surface for agentic AI isn't just the model - it's every connector, every integration, every permission granted. Perplexity Comet had OAuth access to Drive. No additional verification was required for destructive operations. This is what happens when agents inherit permissions without inheriting judgment. The mitigation isn't model-level guardrails; it's treating agent connectors as first-class attack surfaces requiring their own security controls. Tool-level actions are invisible to model-focused defences.
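Here is a hedged sketch of what connector-level control could look like, independent of any model guardrail. The scope names, `ConnectorGuard` class, and `confirm_with_user` callback are illustrative, not Perplexity's or Google's actual APIs:

```python
# Destructive scopes that should never execute on an agent's say-so alone.
DESTRUCTIVE_SCOPES = {"drive.delete", "drive.trash", "gmail.send"}

class ConnectorGuard:
    def __init__(self, confirm_with_user):
        # Out-of-band confirmation callback: a UI prompt, push notification, etc.
        self._confirm = confirm_with_user

    def execute(self, action: str, scope: str, details: dict):
        if scope in DESTRUCTIVE_SCOPES:
            question = f"Agent wants to run {action} ({scope}) on {details}. Approve?"
            if not self._confirm(question):
                raise PermissionError(f"Destructive action {action!r} rejected by user")
        return self._dispatch(action, details)

    def _dispatch(self, action: str, details: dict):
        # A real implementation would call the provider API with the OAuth token.
        return {"action": action, "status": "executed", "details": details}

# Usage: the confirmation channel must bypass the agent entirely, otherwise a
# prompt-injected agent could "approve" its own destructive request.
guard = ConnectorGuard(confirm_with_user=lambda msg: input(f"{msg} [y/N] ") == "y")
```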
Attack Vectors & Vulnerabilities
Critical Infrastructure Vulnerabilities
LangGrinch (CVE-2025-68664) - CVSS 9.3
The "LangGrinch" vulnerability in langchain-core enables prompt injection to exfiltrate environment variables, including cloud credentials, database connection strings, and API keys. With 847M downloads affected, the attack works through a serialisation/deserialization flaw - attackers steer agents to generate crafted outputs that are later misinterpreted as trusted objects. Twelve distinct exploitable flows were identified through routine operations (persist, stream, reconstruct). Patches available in langchain-core 1.2.5 and 0.3.81.
MCP Sampling Attack Vectors
Palo Alto Networks Unit 42 discovered three attack categories exploiting the Model Context Protocol's bidirectional communication pattern:
Resource theft: Hidden prompts invisibly drain compute tokens
Conversation hijacking: Persistent session-wide behaviour modification
Covert tool invocation: Unauthorised file operations
The root cause: MCP servers can craft prompts for client LLMs without adequate verification. Detection requires scanning for injection markers ([INST], role-play patterns), monitoring unexpected tool calls, and flagging abnormal token usage.
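Here is a minimal detection sketch that follows those three signals. The request shape, marker list, and token budget are simplified assumptions for illustration, not the MCP SDK's actual types or Unit 42's tooling:

```python
import re

# Injection markers to scan for in server-initiated sampling prompts.
INJECTION_MARKERS = [
    r"\[INST\]",                                  # instruction-template tokens
    r"(?i)ignore (all|previous) instructions",    # classic override phrasing
    r"(?i)you are now",                           # role-play / persona switch
]

EXPECTED_TOOLS = {"read_resource", "list_resources"}   # per-server allowlist
TOKEN_BUDGET_PER_REQUEST = 2_000                       # flag resource theft

def inspect_sampling_request(prompt: str, requested_tools: list[str],
                             estimated_tokens: int) -> list[str]:
    """Return findings for one server-initiated sampling request."""
    findings = []
    for pattern in INJECTION_MARKERS:
        if re.search(pattern, prompt):
            findings.append(f"injection marker matched: {pattern}")
    unexpected = set(requested_tools) - EXPECTED_TOOLS
    if unexpected:
        findings.append(f"unexpected tool invocation: {sorted(unexpected)}")
    if estimated_tokens > TOKEN_BUDGET_PER_REQUEST:
        findings.append(f"token use {estimated_tokens} exceeds budget of "
                        f"{TOKEN_BUDGET_PER_REQUEST} (possible resource theft)")
    return findings
```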
Defensive Developments
HoneyTrap: Deception as Defence
A novel multi-agent defence framework introduces a paradigm shift from blocking attacks to deceiving attackers. Four collaborative agents work together, as sketched in code below:
Threat Interceptor: Detects incoming jailbreak attempts
Misdirection Controller: Generates convincing but harmless responses
Forensic Tracker: Logs attack patterns and attacker behaviour
System Harmoniser: Coordinates the deception while protecting the real system
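Here is a rough structural sketch of how those four roles might compose, based on our reading of the description rather than the authors' reference implementation; the heuristics inside each class are placeholders:

```python
from dataclasses import dataclass, field

@dataclass
class ForensicTracker:
    """Logs attack patterns and attacker behaviour."""
    incidents: list = field(default_factory=list)

    def record(self, prompt: str, verdict: str):
        self.incidents.append({"prompt": prompt, "verdict": verdict})

class ThreatInterceptor:
    """Detects incoming jailbreak attempts (placeholder heuristic here)."""
    def is_jailbreak(self, prompt: str) -> bool:
        return "ignore previous instructions" in prompt.lower()

class MisdirectionController:
    """Generates convincing but harmless responses."""
    def decoy_response(self, prompt: str) -> str:
        return "Sure, here is the information you asked for: [decoy content]"

class SystemHarmoniser:
    """Coordinates the deception while the real system stays untouched."""
    def __init__(self, real_agent):
        self.real_agent = real_agent
        self.interceptor = ThreatInterceptor()
        self.misdirection = MisdirectionController()
        self.forensics = ForensicTracker()

    def handle(self, prompt: str) -> str:
        if self.interceptor.is_jailbreak(prompt):
            self.forensics.record(prompt, "deceived")
            return self.misdirection.decoy_response(prompt)
        self.forensics.record(prompt, "benign")
        return self.real_agent(prompt)
```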
Results across GPT-4, GPT-3.5-turbo, Gemini-1.5-pro, and Llama-3.1:
68.77% reduction in attack success rates
118.11% improvement in Mislead Success Rate (attackers believe they succeeded)
149.16% increase in Attack Resource Consumption
The framework introduces new metrics beyond traditional Attack Success Rate: how effectively can you waste attacker resources while gathering intelligence?
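For illustration, here is how those two metrics could be computed from logged attack attempts, using assumed definitions: Mislead Success Rate as the share of attacks that took the decoy and stopped probing, and Attack Resource Consumption as average attacker tokens per attempt. The paper's exact formulas may differ:

```python
def mislead_success_rate(attempts: list[dict]) -> float:
    """Share of attacks that received a decoy and believed they succeeded."""
    deceived = [a for a in attempts if a["served_decoy"] and a["attacker_stopped"]]
    return len(deceived) / len(attempts) if attempts else 0.0

def attack_resource_consumption(attempts: list[dict]) -> float:
    """Average tokens the attacker spent per attempt."""
    return (sum(a["attacker_tokens"] for a in attempts) / len(attempts)
            if attempts else 0.0)

attempts = [
    {"served_decoy": True, "attacker_stopped": True, "attacker_tokens": 1800},
    {"served_decoy": True, "attacker_stopped": False, "attacker_tokens": 4200},
    {"served_decoy": False, "attacker_stopped": False, "attacker_tokens": 900},
]
print(mislead_success_rate(attempts), attack_resource_consumption(attempts))
```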