2408.08435_ADAS: Automated Design of Agentic Systems¶

https://arxiv.org/abs/2408.08435
组织: 加拿大英属哥伦比亚大学, Vector 研究所, 加拿大CIFAR AI Chair
作者: 胡胜然，吕聪和Jeff Clune
研究人员正在投入大量精力开发强大的通用代理，其中基础模型被用作代理系统中的模块（例如，Chain-of-Thought、Self-Reflection、Toolformer）。然而，机器学习的历史告诉我们，手工设计的解决方案最终会被学习的解决方案所取代。我们制定了一个新的研究领域，代理系统的自动化设计（ADAS），旨在自动创建强大的代理系统设计，包括发明新颖的构建块和/或以新的方式组合它们。我们进一步证明，ADAS 中存在一种未开发但有前途的方法，可以在代码中定义代理，并且可以由在代码中编程更好的元代理自动发现新代理。鉴于编程语言是图灵完备的，这种方法理论上可以学习任何可能的代理系统：包括新颖的提示、工具使用、控制流及其组合。我们提出了一种名为 Meta Agent Search 的简单而有效的算法来演示这一想法，其中 Meta Agent 根据不断增长的先前发现档案迭代编程有趣的新代理。通过跨多个领域（包括编码、科学和数学）的广泛实验，表明我们的算法可以逐步发明具有新颖设计的代理，这些代理的性能大大优于最先进的手工设计的代理。重要的是，我们始终观察到一个令人惊讶的结果，即 Meta Agent Search 发明的代理即使在跨域和模型传输时也能保持卓越的性能，这证明了它们的稳健性和通用性。只要我们安全地开发它，我们的工作就说明了一个令人兴奋的新研究方向的潜力，即自动设计更强大的代理系统以造福人类。
Researchers are investing substantial effort in developing powerful general-purpose agents, wherein Foundation Models are used as modules within agentic systems (e.g. Chain-of-Thought, Self-Reflection, Toolformer). However, the history of machine learning teaches us that hand-designed solutions are eventually replaced by learned solutions. We formulate a new research area, Automated Design of Agentic Systems (ADAS), which aims to automatically create powerful agentic system designs, including inventing novel building blocks and/or combining them in new ways. We further demonstrate that there is an unexplored yet promising approach within ADAS where agents can be defined in code and new agents can be automatically discovered by a meta agent programming ever better ones in code. Given that programming languages are Turing Complete, this approach theoretically enables the learning of any possible agentic system: including novel prompts, tool use, control flows, and combinations thereof. We present a simple yet effective algorithm named Meta Agent Search to demonstrate this idea, where a meta agent iteratively programs interesting new agents based on an ever-growing archive of previous discoveries. Through extensive experiments across multiple domains including coding, science, and math, we show that our algorithm can progressively invent agents with novel designs that greatly outperform state-of-the-art hand-designed agents. Importantly, we consistently observe the surprising result that agents invented by Meta Agent Search maintain superior performance even when transferred across domains and models, demonstrating their robustness and generality. Provided we develop it safely, our work illustrates the potential of an exciting new research direction toward automatically designing ever-more powerful agentic systems to benefit humanity.
GitHub: https://github.com/ShengranHu/ADAS
一作博客: https://www.shengranhu.com/ADAS/
一作twitter: https://x.com/shengranhu

https://img.zhaoweiguo.com/uPic/2024/08/P2WgYU.png — Overview of the proposed algorithm **Meta Agent Search** and examples of discovered agents.¶

https://img.zhaoweiguo.com/uPic/2024/08/gsO2Po.png — Figure 2 | The three key components of Automated Design of Agentic Systems (ADAS). The search space determines which agentic systems can be represented in ADAS. The search algorithm specifies how the ADAS method explores the search space. The evaluation function defines how to evaluate a candidate agent on target objectives such as performance.¶

Main prompt for the meta agent:

You are an expert machine learning researcher testing different agentic systems.

[Brief Description of the Domain]
[Framework Code]
[Output Instructions and Examples]
[Discovered Agent Archive] (initialized with baselines, updated at every iteration)

# Your task
You are deeply familiar with prompting techniques and the agent works from the literature. Your goal is
to maximize the performance by proposing interestingly new agents ......
Use the knowledge from the archive and inspiration from academic literature to propose the next
interesting agentic system design.

https://img.zhaoweiguo.com/uPic/2024/08/UxRM2O.png — ARC 挑战赛的 Meta Agent Search 结果。（a） Meta Agent Search 根据不断增长的 archive of previous discoveries 逐步发现高性能代理。我们通过评估代理五次，来报告保留测试集的中位准确率和 95% bootstrap 置信区间。（b） Meta Agent Search 在 ARC 挑战赛中发现的最佳代理的可视化。¶

We compared against five state-of-the-art hand-designed agents:

1) Chain-of-Thought (COT) (Wei et al., 2022),
    which instructs the agent to output the reasoning before answering to improve complex problem-solving through intermediate steps;
2) Self-Consistency with Chain-of- Thought (COT-SC) (Wang et al., 2023b),
    which ensembles multiple parallel answers from COT to produce a more accurate answer;
3) Self-Refine (Madaan et al., 2024; Shinn et al., 2023),
    which allows iterative self-reflection to correct mistakes made in previous attempts;
4) LLM-Debate (Du et al., 2023),
    which enables different LLMs to debate with each other, leveraging diverse perspectives to find better answers;
5) Quality-Diversity, a simplified version of Intelligent Go-Explore (Lu et al., 2024c),
    which produces and ensembles diverse answers to better explore potential solutions.

https://img.zhaoweiguo.com/uPic/2024/08/aoM0yg.png

Prompt¶

https://img.zhaoweiguo.com/uPic/2024/08/3Jd0rh.png

https://img.zhaoweiguo.com/uPic/2024/08/b41rQb.png