MCP Neural Shield: Sub-Millisecond Zero-Day Defense Against Tool Poisoning in LLM Agent Ecosystems via Quantized Semantic Classification

Vidipt Vashist, Independent Researcher

cs.CR 2026-05-27 openxiv:cs.CR.2026.00001

Abstract

The Model Context Protocol (MCP) has emerged as the dominant standard for connecting Large Language Model (LLM) agents to external tool ecosystems via dynamic JSON-RPC capability discovery. However, the protocol's design, in which clients unconditionally trust server-supplied tool schemas, creates a structural attack surface for indirect prompt injection. Adversaries can embed directive payloads in tool descriptions (Tool Poisoning) or register spoofed tools that mimic privileged system utilities (Tool Shadowing), effectively turning the LLM into a confused deputy that executes unauthorized actions on the attacker's behalf. Existing mitigations based on Graph Neural Networks (GNNs) require full client-server execution graphs, produce checkpoints exceeding 150 MB, and add 50-150 ms of inference latency, constraints that make them incompatible with latency-sensitive local agent workflows such as IDE coding assistants. We present MCP Neural Shield (mcp-neural-shield), a lightweight, deployable security proxy that operates natively within the MCP transport layer without requiring protocol modifications. The system combines a quantized all-MiniLM-L6-v2 Sentence Transformer with an int8-optimized three-layer Multi-Layer Perceptron (MLP) to classify each tool schema in isolation before the LLM ingests it. To mitigate shortcut learning, we construct a training corpus of 4,301 schemas (2,903 safe and 1,398 adversarial) using a structured Semantic Cross-Pollination augmentation strategy, and supplement the neural classifier with a deterministic keyword verification layer. Evaluated on an independent 20% held-out validation split of 861 schemas (581 safe, 280 adversarial) and on a full 2,448-schema benchmark comprising MCPTox, MCPSecBench, and MCPToolBench++, the system achieves a 100.00% True Positive Rate (TPR) and a 0.00% False Positive Rate (FPR), with F1 = 1.000, on both partitions. An MD5-keyed LRU embedding cache reduces hot-path inference latency to under 0.1 ms on Apple M3 Max hardware, and the full model checkpoint is approximately 110 KB. The framework is available as open source on PyPI (pip install mcp-neural-shield) and supports zero-code deployment via a universal stdio passthrough CLI wrapper.
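To illustrate the idea of an MD5-keyed LRU embedding cache, the following is a minimal sketch and not the mcp-neural-shield implementation. The class name EmbeddingCache, the max_size parameter, and the toy_embed stand-in for the sentence encoder are all hypothetical names chosen for this example. MD5 serves here only as a fast content fingerprint for cache lookup, not as a security primitive.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, List


class EmbeddingCache:
    """LRU cache keyed by the MD5 digest of a tool-schema string (illustrative)."""

    def __init__(self, embed_fn: Callable[[str], List[float]], max_size: int = 1024):
        self._embed_fn = embed_fn      # expensive encoder call (assumed interface)
        self._max_size = max_size      # hypothetical capacity limit
        self._store: "OrderedDict[str, List[float]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str) -> str:
        # Fingerprint the schema text; identical schemas map to the same key.
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> List[float]:
        key = self._key(text)
        if key in self._store:
            # Hot path: mark the entry as most recently used and skip the encoder.
            self._store.move_to_end(key)
            self.hits += 1
            return self._store[key]
        # Cold path: compute the embedding, store it, evict the oldest if full.
        self.misses += 1
        vec = self._embed_fn(text)
        self._store[key] = vec
        if len(self._store) > self._max_size:
            self._store.popitem(last=False)
        return vec

    def __len__(self) -> int:
        return len(self._store)


def toy_embed(text: str) -> List[float]:
    # Placeholder for a real sentence encoder; returns a tiny deterministic vector.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


cache = EmbeddingCache(toy_embed, max_size=2)
first = cache.get("read_file: reads a file from disk")
second = cache.get("read_file: reads a file from disk")  # served from the cache
```

In this sketch, a repeated tool schema costs one hash and one dictionary lookup instead of an encoder forward pass, which is the kind of saving a hot-path cache is meant to provide.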
