← openxiv:cs.CR.2026.00001 · cs.CR
MCP Neural Shield: Sub-Millisecond Zero-Day Defense Against Tool Poisoning in LLM Agent Ecosystems via Quantized Semantic Classification
Explainer at the level of an undergraduate in the field. Read the original paper.
Assumes 1–2 courses of background. Domain terms may appear without definition.
Large Language Models (LLMs) like ChatGPT can now use external tools—such as file readers or database queryers—through a standard called the Model Context Protocol (MCP). However, this protocol has a security flaw: any tool description can secretly contain malicious instructions. An attacker can trick the LLM into executing harmful actions (like sending private files to a remote server) because the LLM cannot tell the difference between a tool's legitimate description and a hidden command. Existing defenses require analyzing the entire flow of tool usage, which is too slow (50–150 ms) and memory-heavy for real-time applications like coding assistants. To solve this, the authors developed **MCP Neural Shield**, a lightweight security layer that checks each tool’s description **before** the LLM ever sees it. The system uses a small neural network trained on thousands of safe and malicious tool descriptions to detect attacks based on the *meaning* of the text, not just keywords. This allows it to catch even never-before-seen (zero-day) attacks that bypass simple keyword filters. The classifier runs in under 0.1 milliseconds on modern hardware and uses only 110 KB of storage—far smaller than previous approaches. In tests, it correctly identified every malicious tool and never mistakenly blocked a safe one, achieving perfect scores on several standard benchmarks. MCP Neural Shield is available as a free, open-source Python tool that works with existing MCP servers without any modifications, making it easy to add strong security to AI agents that use external tools.
Explainers are best-effort summaries — they round corners. For the authoritative claims, read the paper itself.