openxiv:psy.cog.2026.00001 · psy.cog

Beyond Reward Maximization: A Pressure-Driven Cognitive Architecture for Continual Cross-Domain Generalization and Stability

Explainer at the level of an undergraduate in the field.

Assumes 1–2 courses of background. Domain terms may appear without definition.

Modern AI systems struggle with two key challenges: learning new information without forgetting old knowledge (catastrophic forgetting), and applying what they have learned in one domain to problems in a completely different one. This paper introduces RAVANA, a cognitive architecture that tackles both problems by mimicking how biological brains manage multiple competing needs. Unlike typical AI agents that simply try to maximize a single reward signal, RAVANA balances several internal “pressures”, such as prediction error, novelty, memory stability, and internal dissonance, which together guide its learning and behavior.

The system includes specialized components: a recursive learning model that builds hierarchical concepts, a typed graph structure for representing relationships, mechanisms for local pattern completion (such as predictive coding), and a “sleep” phase that consolidates and replays past experiences to prevent forgetting.

Initial tests on synthetic benchmarks show promising results. On simple within-domain tasks, accuracy rose from 0% to 100% after architectural improvements. On cross-domain analogies, the system achieved 14.3% top-1 accuracy (correct answer ranked first) and 71.4% top-10 accuracy (correct answer among the ten highest-ranked candidates). Most notably, combining replay, elastic weight consolidation, and Bayesian updates reduced catastrophic forgetting from 12.0% to 0.0% over a stream of 15,000 experiences.

The authors caution that these are proof-of-concept results from small, artificial tasks and that independent replication is needed. Nevertheless, RAVANA offers a concrete computational hypothesis: robust, flexible intelligence may arise not from maximizing a single reward, but from maintaining a dynamic balance between prediction, analogy, memory consolidation, and self-stability.
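The explainer does not say how RAVANA combines its pressures into a decision. As a minimal sketch, assume each pressure has a hypothetical preferred level (a set point) and the agent picks whichever action keeps the weighted squared deviation from those set points smallest. This contrasts with choosing the action that maximizes one scalar reward. The pressure names, set points, and weights below are illustrative assumptions, not the paper's method.

```python
# Hypothetical pressure names, set points, and weights; none come from the paper.
SET_POINTS = {
    "prediction_error": 0.0,
    "novelty": 0.3,  # a nonzero target: some novelty is preferred, not none
    "memory_instability": 0.0,
    "dissonance": 0.0,
}
WEIGHTS = {
    "prediction_error": 1.0,
    "novelty": 0.5,
    "memory_instability": 1.0,
    "dissonance": 0.8,
}


def imbalance(pressures: dict[str, float]) -> float:
    """Weighted squared distance of each pressure from its set point.

    Zero means every pressure sits exactly at its preferred level."""
    return sum(
        w * (pressures[name] - SET_POINTS[name]) ** 2 for name, w in WEIGHTS.items()
    )


def choose_action(predicted: dict[str, dict[str, float]]) -> str:
    """Pick the action whose predicted pressures are most balanced,
    rather than the action with the largest single reward."""
    return min(predicted, key=lambda action: imbalance(predicted[action]))


# Hypothetical predicted pressures after taking each action.
predicted = {
    "exploit": {"prediction_error": 0.1, "novelty": 0.0,
                "memory_instability": 0.1, "dissonance": 0.1},
    "explore": {"prediction_error": 0.3, "novelty": 0.4,
                "memory_instability": 0.1, "dissonance": 0.1},
}
print(choose_action(predicted))  # exploit
```

With these weights, "exploit" wins (imbalance 0.073 versus 0.113). Raising the novelty weight to 5.0 flips the choice to "explore", which illustrates how shifting the balance between pressures, rather than rescaling one reward, changes behavior.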
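The “sleep” phase replays past experiences. One common way to keep a bounded memory of an unbounded experience stream is a replay buffer filled by reservoir sampling, which keeps every experience seen so far with equal probability. This is a minimal sketch under that assumption; the explainer does not specify RAVANA's actual buffer policy, and `mixed_batch` is a hypothetical helper that combines new experiences with replayed ones.

```python
import random


class ReplayBuffer:
    """Fixed-capacity memory of past experiences, filled by reservoir sampling
    so that every experience seen so far is equally likely to be stored."""

    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items: list = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, experience) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(experience)
        else:
            # Keep the new experience with probability capacity / seen,
            # overwriting a uniformly chosen stored one.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = experience

    def sample(self, k: int) -> list:
        """Draw up to k distinct stored experiences for replay."""
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_experiences: list, buffer: ReplayBuffer,
                replay_ratio: float = 0.5) -> list:
    """Hypothetical helper: combine fresh experiences with replayed ones,
    so training on new data also rehearses old data."""
    n_replay = int(len(new_experiences) * replay_ratio)
    return list(new_experiences) + buffer.sample(n_replay)


buffer = ReplayBuffer(capacity=500)
for step in range(15_000):  # a stream the same length as the one reported
    buffer.add(step)
batch = mixed_batch(["new_a", "new_b", "new_c", "new_d"], buffer)
print(len(buffer.items), len(batch))  # 500 6
```

Because the stored sample stays roughly uniform over the whole stream, early experiences keep being rehearsed long after they were first seen.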
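Elastic weight consolidation, one of the three anti-forgetting ingredients mentioned above, has a standard form. While the model learns a new task, each parameter is pulled back toward the value it had after earlier tasks, in proportion to its estimated importance (a diagonal Fisher information term). The sketch below computes that penalty in plain Python. How RAVANA estimates importance or sets the strength `lam` is not described in the explainer, so those are assumptions, and `total_loss` is a hypothetical helper.

```python
def ewc_penalty(theta, theta_old, fisher, lam):
    """Elastic weight consolidation penalty:
        (lam / 2) * sum_i F_i * (theta_i - theta_old_i) ** 2

    theta     -- current parameters while learning the new task
    theta_old -- parameters saved after the earlier tasks
    fisher    -- diagonal Fisher information: importance of each parameter
    lam       -- overall consolidation strength (an assumed hyperparameter)
    """
    if not (len(theta) == len(theta_old) == len(fisher)):
        raise ValueError("theta, theta_old and fisher must have equal length")
    return 0.5 * lam * sum(
        f * (t - t0) ** 2 for t, t0, f in zip(theta, theta_old, fisher)
    )


def total_loss(new_task_loss, theta, theta_old, fisher, lam=1.0):
    """Hypothetical helper: new-task loss plus the penalty protecting old knowledge."""
    return new_task_loss + ewc_penalty(theta, theta_old, fisher, lam)


# Parameter 0 moved by 1.0 but is only mildly important (F = 4);
# parameter 1 is very important (F = 100) but has not moved.
print(ewc_penalty([1.0, 2.0], [0.0, 2.0], [4.0, 100.0], lam=1.0))  # 2.0
```

Moving the unimportant parameter by 1.0 costs 2.0, while moving the important one by only 0.5 would cost 12.5. The penalty therefore lets the model adapt freely where old tasks do not care and resists change where they do.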
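The reported numbers rely on two metrics. Top-k accuracy counts a query as correct if the right answer appears among the k highest-ranked candidates. Forgetting is commonly measured as the drop from a task's best accuracy to its final accuracy. The explainer does not give the exact definition behind the 12.0% and 0.0% figures, so `average_forgetting` below is an assumption, and the analogy candidates are made-up examples.

```python
def top_k_accuracy(ranked_candidates: list[list[str]], targets: list[str], k: int) -> float:
    """Fraction of queries whose correct answer is among the first k ranked candidates."""
    hits = sum(
        1 for ranked, target in zip(ranked_candidates, targets) if target in ranked[:k]
    )
    return hits / len(targets)


def average_forgetting(accuracy_history: list[list[float]]) -> float:
    """Mean drop from each task's best accuracy to its final accuracy.

    accuracy_history[t] lists the accuracy on task t measured after each
    training stage; the last entry is the accuracy at the end of training."""
    drops = [max(history) - history[-1] for history in accuracy_history]
    return sum(drops) / len(drops)


# Made-up analogy queries: each list is the system's ranked answers.
ranked = [["sun", "atom", "cell"], ["atom", "cell", "sun"], ["cell", "sun", "atom"]]
targets = ["sun", "sun", "sun"]
print(top_k_accuracy(ranked, targets, k=1))  # 0.3333333333333333
```

Top-k accuracy can only rise as k grows, which is why a top-10 score sits well above a top-1 score. Under this definition, a forgetting score of 0.0 means no earlier task ended below its best accuracy.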

AI-generated (deepseek-v4-flash) · created 2026-05-28

Explainers are best-effort summaries — they round corners. For the authoritative claims, read the paper itself.