Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures

> **Source: [研报客](https://pc.yanbaoke.cn)**

# Summary of "Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures"

## Core Content

This survey explores the evolution of **Multi-Agent Systems (MASs)** from **Classical MASs (CMASs)** to **Large Foundation Model-based MASs (LMASs)**. It emphasizes the transition from environment-specific, task-driven coordination mechanisms to more **general-purpose, cognitively empowered systems** that leverage **Large Foundation Models (LFMs)** for **semantic-level reasoning**, **planning**, and **zero/few-shot generalization**. The paper presents a comprehensive review of both paradigms, comparing their **architectures**, **operating mechanisms**, **adaptability**, and **applications**, while also identifying **key challenges** and **future research opportunities** in the development of MASs.

## Main Contributions

1. **Comprehensive Overview**: A detailed review of core theories and recent advances in both CMASs and LMASs.
2. **Comparative Analysis**: A systematic comparison of CMASs and LMASs from theoretical and practical perspectives, highlighting their **similarities**, **differences**, and the **paradigm shift** they represent.
3. **Future Directions**: Identification of **open challenges** and **potential research opportunities** for the development of future MASs.

## Key Dimensions of Classical MASs (CMASs)

CMASs are structured around a **closed-loop coordination framework**, with four fundamental dimensions:

- **Perception**: Involves processing **agent state**, **sensor data**, and **signals from other agents**. It includes **early**, **intermediate**, and **late fusion** methods, with **intermediate fusion** being the dominant approach due to its balance of **communication efficiency** and **perception performance**.
- **Communication**: Refers to the **exchange of information** among agents.
  It is analyzed from three perspectives:
  - **Topological**: Communication networks are modeled as **graph structures** with nodes representing agents and edges representing communication links.
  - **Frequency**: Communication can be **event-triggered** or **time-triggered**, with the former reducing **redundant data** and improving **bandwidth efficiency**.
  - **Content**: Includes **explicit** and **implicit** communication, where the latter relies on **environment-mediated interactions**.
- **Decision-Making**: Governed by **cooperative**, **competitive**, and **hybrid** interaction modes. It is categorized into **model-based** and **learning-based** approaches:
  - **Model-based**: Uses **rule-based**, **game theory**, and **evolutionary optimization** to derive decision strategies.
  - **Learning-based**: Embraces **multi-agent reinforcement learning (MARL)** for **end-to-end optimization** in complex, partially observable environments.
- **Control**: Focuses on **distributed mechanisms** for **coordination**, **stability**, and **global objectives**. Two main paradigms are:
  - **Consensus Control**: Ensures **asymptotic convergence** of agent states to a common value, with recent studies incorporating **learning-based methods** to achieve **closed-loop stability**.
  - **Formation Control**: Enables agents to **form and maintain spatial structures**, with approaches ranging from **leader-follower** to **learning-based** methods.

These dimensions are **interconnected**, forming a **closed-loop system** that enables **system-level intelligence**.

## Key Dimensions of Large Foundation Model-based MASs (LMASs)

LMASs integrate **LFMs** to enhance **reasoning**, **planning**, and **generalization** capabilities. The paper reviews LMASs across four core aspects:

- **Core Modules**:
  - **Role Definition**: Assigns **distinct responsibilities** to agents, with **manually defined** and **task-adaptive** roles.
  - **Perception**: Structured into **semantic**, **situational**, and **cognitive** levels, enabling **task-aware interpretation** and **anticipatory understanding**.
  - **Planning**: Transforms **high-level instructions** into **executable steps**, with strategies including **structured planning**, **feedback-driven optimization**, and **reliability enhancement**.
  - **Memory**: Stores and retrieves **task-relevant knowledge**, with **short-term** and **long-term** components, enhancing **reasoning efficiency** and **task execution**.
  - **Execution**: Converts **reasoning and planning** into **actions**, leveraging **language generation** and **external tools** for **task-specific performance**.
- **Interaction Mechanisms**:
  - **Collaborative Architecture**: Includes **cooperative**, **competitive**, and **hybrid** paradigms, enabling **dynamic interaction**.
  - **Task Orchestration**: Ranges from **structured workflows** to **autonomous coordination**, with frameworks like **ReSo** and **MAS²** supporting **self-organization** and **self-evolution**.
  - **Communication**: Emphasizes **structured protocols** (e.g., **MCP**, **A2A**, **ANP**) and **cost reduction** strategies, such as **pruning-based methods**.
  - **Human-Agent Interaction**: Supports **goal alignment**, **ethical oversight**, and **iterative feedback**, with frameworks like **AutoGen** and **InteRecAgent** facilitating **collaborative decision-making**.
- **Hierarchical Optimization**:
  - **Model Layer**: Optimizes **reasoning and decision-making** through **supervised fine-tuning (SFT)** and **reinforcement learning (RL)**.
  - **Knowledge Layer**: Enables **cross-task experience accumulation** and **transfer**, enhancing **generalization** and **adaptability**.
  - **System Layer**: Supports **cross-environment task execution** across **device-edge-cloud infrastructures**.
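
The **short-term** and **long-term** memory components listed under Core Modules can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name `AgentMemory`, the fixed-size short-term window, and the word-overlap ranking are hypothetical and do not reproduce any framework covered by the survey. A real LMAS would more plausibly rank stored items by embedding similarity.

```python
from collections import deque


class AgentMemory:
    """Hypothetical two-tier memory for one agent (illustrative only)."""

    def __init__(self, short_term_size=3):
        # Assumes short_term_size >= 1.
        self.short_term = deque(maxlen=short_term_size)  # recent context window
        self.long_term = []                              # archived items

    def observe(self, text):
        """Record a new item; the item pushed out of the window is archived."""
        if len(self.short_term) == self.short_term.maxlen:
            self.long_term.append(self.short_term[0])
        self.short_term.append(text)

    def recall(self, query, k=2):
        """Return up to k archived items ranked by word overlap with the query.

        Word overlap is a stand-in for a real retrieval score.
        """
        query_words = set(query.lower().split())
        scored = []
        for item in self.long_term:
            overlap = len(query_words & set(item.lower().split()))
            if overlap > 0:
                scored.append((overlap, item))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:k]]


memory = AgentMemory(short_term_size=2)
for note in [
    "route A is blocked",
    "battery low on drone 2",
    "route B is clear",
    "drone 3 joined",
]:
    memory.observe(note)

recalled = memory.recall("which route is blocked")
```

Here the two most recent notes stay in the short-term window, the two older ones move to long-term storage, and `recall` retrieves only the archived note that shares words with the query.
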
## Comparative Analysis

| **Aspect** | **CMASs** | **LMASs** |
|---|---|---|
| **Architecture** | Task-specific, model-based, or rule-based | General-purpose, LFM-based, with **semantic-level reasoning** |
| **Operating Mechanism** | Relies on **explicit models** or **learning-based training** | Leverages **pretrained knowledge** and **reasoning** for **zero/few-shot** generalization |
| **Adaptability** | Limited by **explicit modeling** and **task-specific design** | Enhanced adaptability through **generalization** and **dynamic reasoning** |
| **Applications** | Suitable for **structured**, **high-precision**, and **stability-critical** tasks | Better suited for **open-ended**, **reasoning-intensive**, and **complex** applications |

## Future Perspectives

The paper identifies several **key challenges** and **research opportunities** in the development of MASs:

- **Scalability**: Enhancing **system scalability** in complex and dynamic environments.
- **Robustness**: Improving **robustness** and **reliability** of LFM-based systems.
- **Interpretability**: Increasing **model interpretability** to ensure **trustworthy** and **transparent** decision-making.
- **Human-in-the-Loop**: Developing **effective human supervision** mechanisms to guide **goal alignment** and **ethical compliance**.
- **Generalization**: Enabling **zero-shot** and **few-shot learning** to improve **task adaptability**.
- **Integration with Embodied Intelligence**: Combining **LFMs** with **sensory input** to enable **physical interaction** and **autonomous execution**.

## Conclusion

LMASs represent a **paradigm shift** in multi-agent systems, moving from **task-specific** and **model-dependent** frameworks to **general-purpose**, **cognitively empowered**, and **semantically rich** systems.
While CMASs remain essential for **reliable low-level control**, LMASs offer **greater flexibility**, **adaptability**, and **intelligence** through **semantic reasoning**, **planning**, and **generalization**. This survey aims to **bridge the gap** between these paradigms, promoting **synergistic co-evolution** and **joint advancement** in future research.
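
To make the kind of low-level control that CMASs provide more concrete, the **consensus control** paradigm summarized earlier can be sketched with the standard discrete-time averaging protocol: each agent repeatedly nudges its state toward the states of its neighbors. This is a textbook rule under stated assumptions (a fixed, connected, undirected communication graph, synchronous updates, and a step size below the reciprocal of the maximum node degree), not the learning-based variants the survey mentions. The topology, initial states, and step size are hypothetical.

```python
def consensus_step(states, neighbors, step_size):
    """One synchronous update: each agent moves toward its neighbors' states."""
    return [
        x_i + step_size * sum(states[j] - x_i for j in neighbors[i])
        for i, x_i in enumerate(states)
    ]


def run_consensus(states, neighbors, step_size=0.3, iterations=200):
    """Apply the averaging update repeatedly and return the final states."""
    for _ in range(iterations):
        states = consensus_step(states, neighbors, step_size)
    return states


# Hypothetical undirected path graph: 0 - 1 - 2 - 3 (maximum degree 2).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
initial = [1.0, 5.0, 9.0, 3.0]

final = run_consensus(initial, neighbors)
# Each value approaches the average of the initial states, 4.5.
```

Because the graph is undirected, every update preserves the sum of the states, so the common value the agents converge to is the initial average. With a directed or time-varying graph the limit, and even convergence itself, would need a separate argument.
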