Data Interpreter: An LLM Agent For Data Science¶
作者:Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Danyang Li, Jiaqi Chen, Jiayi Zhang, Jinlin Wang, Li Zhang, Lingyao Zhang, Min Yang, Mingchen Zhuge, Taicheng Guo, Tuo Zhou, Wei Tao, Wenyi Wang, Xiangru Tang, Xiangtao Lu, Xiawu Zheng, Xinbing Liang, Yaying Fei, Yuheng Cheng, Zongze Xu, Chenglin Wu
作者:洪思瑞, 林一章, 刘邦, 刘邦邦, 吴斌豪, 李丹阳, 陈佳琪, 张佳怡, 王金林, 张丽, 张凌瑶, 杨敏, 诸葛明晨, 郭泰成, 拓周, 陶伟, 王文一, 唐相如, 卢向涛, 郑郑霞武, 梁新兵, 费亚英, 程宇恒, 徐宗泽, 吴成林
简介:Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a solution designed to solve with code that emphasizes three pivotal techniques to augment problem-solving in data science: 1) dynamic planning with hierarchical graph structures for real-time data adaptability;2) tool integration dynamically to enhance code proficiency during execution, enriching the requisite expertise;3) logical inconsistency identification in feedback, and efficiency enhancement through experience recording. We evaluate the Data Interpreter on various data science and real-world tasks. Compared to open-source baselines, it demonstrated superior performance, exhibiting significant improvements in machine learning tasks, increasing from 0.86 to 0.95.
简介:基于大型语言模型 (LLM) 的代理已显示出显著的有效性。但是,在需要实时数据调整、由于各种任务之间的复杂依赖关系而需要优化专业知识以及识别逻辑错误以进行精确推理的能力的数据科学场景中,它们的性能可能会受到影响。在这项研究中,我们介绍了数据解释器,这是一种旨在使用代码求解的解决方案,它强调三种关键技术来增强数据科学中的问题解决能力:1)具有分层图形结构的动态规划,以实现实时数据适应性;2)动态集成工具,提高执行过程中的代码熟练程度,丰富必要的专业知识;3)识别反馈中的逻辑不一致,通过经验记录提高效率。我们在各种数据科学和现实世界任务中评估数据解释器。与开源基线相比,它表现出卓越的性能,在机器学习任务方面表现出显着改进,从 0.86 增加到 0.95。
INTRODUCTION¶
LLM-based agent, called the Data Interpreter, designed specifically for the field of data science. This agent follows a
plan-code-verify
approach to fulfill human requirements by breaking down tasks, executing code, and verifying feed- back.