← openxiv:cs.CR.2026.00001 · cs.CR

MCP Neural Shield: Sub-Millisecond Zero-Day Defense Against Tool Poisoning in LLM Agent Ecosystems via Quantized Semantic Classification

Explainer at the level of a researcher in an adjacent area. Read the original paper.

For a curious high-schooler For an undergraduate in the field For a researcher in an adjacent area

Assumes deep technical literacy. Bridges to the closest neighbouring fields.

**Problem Statement**: The Model Context Protocol (MCP) exposes LLM agents to indirect prompt injection via tool poisoning and shadowing, as servers supply unverified schemas that the LLM treats as benign metadata. Existing GNN-based defenses require full execution graphs, incur >150 MB checkpoints and 50–150 ms latency, making them unsuitable for latency-sensitive local agent workflows. **Method**: MCP Neural Shield is a lightweight security proxy that intercepts tool schemas at the MCP transport layer without protocol modifications. It serializes each schema into natural language, encodes it with a quantized all-MiniLM-L6-v2 Sentence Transformer to 384‑d embeddings, then classifies via an int8‑optimized three‑layer MLP. A deterministic keyword pre‑filter provides zero‑latency coverage for known signatures, while an MD5‑keyed LRU cache reduces hot‑path inference to under 0.1 ms on Apple M3 Max hardware. Training uses 4,301 schemas (2,903 safe, 1,398 adversarial) augmented via Semantic Cross‑Pollination to prevent shortcut learning. **Main Results**: On a 20% held‑out validation split (861 schemas) and a full 2,448‑schema benchmark (MCPTox, MCPSecBench, MCPToolBench++), the system achieves 100% true‑positive rate and 0% false‑positive rate (F1 = 1.000). The quantized checkpoint is ~110 KB with a runtime footprint under 15 MB. **Limitations**: The classifier is English‑only and may miss adversarial payloads in other languages or obfuscated forms (Base64, Unicode). Perfect benchmark performance partly reflects near‑duplicates from the augmentation pipeline; adaptive adversaries could shift the decision boundary though an adjustable threshold is provided. Execution‑layer attacks and obfuscated instructions that evade the encoder’s tokenization are out of scope.

AI-generated (deepseek-v4-flash) · created 2026-05-27

Explainers are best-effort summaries — they round corners. For the authoritative claims, read the paper itself.