新溪-gordon
V2026.02
General Definitions
Evaluation Metrics
Common Evaluation Metrics
Accuracy
Precision
Recall
F1 Score
Visualizing Precision and Recall
Confusion Matrix
ROC Curve (Receiver Operating Characteristic Curve)
Recall@k
Core Idea in One Sentence
Formula and Computation
Worked Example
Why Recall@k?
Key Properties and Caveats
Summary
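The Recall@k outline above (formula, worked example, caveats) can be condensed into a minimal Python sketch; the function name and toy data are illustrative, not from the source:

```python
def recall_at_k(ranked, relevant, k):
    """Recall@k: fraction of ALL relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

# 3 relevant items overall, only "a" is retrieved in the top 2 -> 1/3
print(recall_at_k(["a", "b", "c", "d"], {"a", "d", "e"}, 2))
```

Note that the denominator is the total number of relevant items, not k, so a short cutoff caps the achievable score.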
Precision@k
Core Idea in One Sentence
Formula and Computation
Worked Example
Why Precision@k?
Key Properties and Caveats
Summary
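Mirroring the Recall@k discussion, Precision@k divides by the cutoff k rather than by the number of relevant items. A minimal sketch (names are my own):

```python
def precision_at_k(ranked, relevant, k):
    """Precision@k: fraction of the top-k results that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

# "a" and "c" are relevant; top-2 contains only "a" -> 1/2
print(precision_at_k(["a", "b", "c"], {"a", "c"}, 2))
```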
HR@k
1. Core Definition: What Is HR@N?
2. Formula
3. Worked Example
4. Properties and Interpretation
5. Relation to and Comparison with Other Metrics
6. Typical Use Cases
Summary
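As the HR@k outline above suggests, hit rate is a per-user binary signal averaged over users: did the top-k list contain at least one relevant item? A minimal sketch under that reading (names are my own):

```python
def hit_rate_at_k(sessions, k):
    """HR@k: share of users/sessions whose top-k list contains >= 1 relevant item.

    sessions: iterable of (ranked_list, relevant_set) pairs, one per user.
    """
    hits = sum(
        1 for ranked, relevant in sessions
        if any(item in relevant for item in ranked[:k])
    )
    return hits / len(sessions)

# user 1 gets a hit in the top-2, user 2 does not -> HR@2 = 0.5
print(hit_rate_at_k([(["a", "b"], {"b"}), (["c", "d"], {"x"})], 2))
```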
NDCG@k
Core Idea in One Sentence
From CG to DCG to NDCG
Step-by-Step Calculation Example
Why Is NDCG@k So Important?
Summary
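The CG → DCG → NDCG progression above can be sketched directly: DCG discounts graded gains by log2 of the rank, and NDCG normalizes by the DCG of the ideal ordering. A minimal sketch using the linear-gain form (the exponential form 2^rel - 1 is a common variant):

```python
import math

def dcg_at_k(gains, k):
    """DCG@k: graded gains discounted by log2(rank + 1), ranks 1-based."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k):
    """NDCG@k: DCG normalized by the DCG of the ideal (descending) ordering."""
    idcg = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0
```

An already-ideal ordering scores exactly 1.0; swapping graded items lowers the score smoothly rather than all-or-nothing.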
MRR@k
1. Core Concept: What Is MRR@k?
2. Why MRR@k?
3. How to Compute MRR@k
4. Worked Example
5. Properties and Caveats of MRR@k
6. Differences from Other Metrics
Why Are MRR@10 and Recall@1000 Often Used Together?
Summary
Summary
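The MRR@k computation described above reduces to: for each query, take the reciprocal rank of the first relevant hit within the top-k (0 if none), then average. A minimal sketch (names are my own):

```python
def mrr_at_k(queries, k):
    """MRR@k: mean of 1/rank of the FIRST relevant hit; 0 if none in the top-k.

    queries: iterable of (ranked_list, relevant_set) pairs.
    """
    total = 0.0
    for ranked, relevant in queries:
        for rank, item in enumerate(ranked[:k], start=1):
            if item in relevant:
                total += 1.0 / rank
                break  # only the first hit counts
    return total / len(queries)

# query 1: first hit at rank 2 -> 0.5; query 2: no hit -> 0; mean = 0.25
print(mrr_at_k([(["a", "b"], {"b"}), (["c", "d"], {"x"})], 2))
```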
MAP@k
One-Sentence Intuition
Unpacking the Acronym
A Thorough Worked Example
Why Is MAP@k So Important?
Summary
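Unpacking the acronym as the outline does (Mean of Average Precision over queries), a minimal sketch follows; the normalization by min(|relevant|, k) is one common convention, so treat it as an assumption:

```python
def average_precision_at_k(ranked, relevant, k):
    """AP@k: average of the precision values at each relevant hit in the top-k."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i  # precision at this hit's rank
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0

def map_at_k(queries, k):
    """MAP@k: mean of AP@k over all (ranked_list, relevant_set) queries."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in queries) / len(queries)
```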
AUC (Area Under the ROC Curve)
Why AUC?
Detailed Breakdown
LogLoss (Logarithmic Loss)
Why LogLoss?
Detailed Breakdown
Summary and Comparison
How to Interpret Them in Real Business Settings
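The AUC and LogLoss sections above contrast a ranking-quality view with a calibration view. Both can be sketched from first principles: AUC as the probability that a random positive outranks a random negative, LogLoss as the mean negative log-likelihood (O(P·N) pairwise AUC is fine for small examples; production code would sort once):

```python
import math

def auc(labels, scores):
    """AUC: probability a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log_loss(labels, probs, eps=1e-15):
    """LogLoss: mean negative log-likelihood of the true labels; clips p for stability."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)
```

AUC is invariant to any monotone rescaling of the scores; LogLoss is not, which is why it is the one that detects miscalibrated probabilities.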
Jaccard Similarity Coefficient
1. What Is It?
2. Formula
3. Core Properties
4. A Simple Example
5. Jaccard Distance
6. Main Use Cases
7. Pros and Cons
Summary
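The formula and the Jaccard-distance variant above fit in a few lines; the two-empty-sets convention (similarity 1.0) is an assumption, since the ratio 0/0 is otherwise undefined:

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined here as 1.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def jaccard_distance(a, b):
    """Jaccard distance = 1 - similarity (a proper metric on sets)."""
    return 1.0 - jaccard(a, b)

# intersection {2, 3}, union {1, 2, 3, 4} -> 2/4 = 0.5
print(jaccard({1, 2, 3}, {2, 3, 4}))
```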
PASS@k
1. Intuitive Definition
2. Mathematical Definition
3. Why It Is Useful
5. One-Sentence Summary
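The mathematical definition above is usually computed with the unbiased combinatorial estimator popularized by the HumanEval paper: generate n samples, count the c that pass, and estimate pass@k as 1 - C(n-c, k)/C(n, k), i.e. one minus the probability that a random k-subset contains only failures:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples generated, c: samples that pass all tests, k: sampling budget.
    """
    if n - c < k:  # too few failing samples to fill a k-subset -> at least one pass
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 2 of 4 samples pass -> pass@1 = 1 - C(2,1)/C(4,1) = 0.5
print(pass_at_k(4, 2, 1))
```

This estimator avoids the high variance of naively averaging "any of k random samples passed" over repeated draws.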
General Memory
Summary and Outlook
Memory Types
Short-Term Memory
Long-Term Memory
Episodic Memory
Semantic Memory
Working Memory
Procedural Memory
Sensory Memory
Diagram
Necessity and Challenges of Long-Term Memory
References
[Definition] The Cattell–Horn–Carroll Theory
Background: Core Content and Evolution
The Three-Stratum Hierarchy
Significance of CHC Theory
Summary
Reciprocal Rank Fusion (RRF) Algorithm
Formula
Computation Steps
Pros and Cons
Use Cases
Summary
Survey Papers
Nearest-Neighbor Search
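The RRF formula and computation steps above amount to: score each document by summing 1/(k + rank) over every input ranking that contains it, then sort. A minimal sketch with the conventional smoothing constant k = 60:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over input lists of 1 / (k + rank_d)."""
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" is ranked high by both lists, so it wins the fused ranking
print(rrf([["a", "b", "c"], ["b", "c", "a"]]))
```

Because only ranks (not raw scores) enter the formula, RRF needs no score normalization across heterogeneous retrievers, which is why it is a common default for hybrid (BM25 + dense) retrieval.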
2508.09834❇️_Overview_LLM: Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
Summary
Abstract
1 Introduction
2 Linear Sequence Modeling
3 Sparse Sequence Modeling
4 Efficient Full Attention
5 Sparse Mixture-of-Experts
6 Hybrid Architectures
7 Diffusion Large Language Models
8 Applications to Other Modalities
9 Conclusion and Future Directions
Evaluation Benchmarks
Evaluation Benchmarks
02xx.xxxxx_BLEU: a Method for Automatic Evaluation of Machine Translation
Summary
Abstract
Worked Example
1. Introduction
2.The Baseline BLEU Metric
3.The BLEU Evaluation
4.The Human Evaluation
5.BLEU vs The Human Evaluation
6.Conclusion
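The baseline BLEU metric outlined above rests on modified (clipped) n-gram precision: each candidate n-gram counts at most as often as it appears in any single reference, which defeats degenerate outputs that repeat a common word. A minimal sketch of that one component (full BLEU also combines n = 1..4 geometrically and applies a brevity penalty, omitted here):

```python
from collections import Counter

def modified_ngram_precision(candidate, references, n):
    """BLEU's clipped n-gram precision for a tokenized candidate and references."""
    def ngrams(toks):
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    cand = ngrams(candidate)
    max_ref = Counter()
    for ref in references:
        for g, c in ngrams(ref).items():
            max_ref[g] = max(max_ref[g], c)  # clip by the best single reference
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0
```

On the paper's classic degenerate example, the candidate "the the the the the the the" against the reference "the cat is on the mat" scores 2/7 rather than 7/7, because "the" is clipped at its reference count of 2.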
0401.xxxxx_ROUGE: A Package for Automatic Evaluation of Summaries
Summary
Abstract
1.Introduction
2.ROUGE-N: N-gram Co-Occurrence Statistics
3.ROUGE-L: Longest Common Subsequence
4 ROUGE-W: Weighted Longest Common Subsequence
5.ROUGE-S: Skip-Bigram Co-Occurrence Statistics
6 Evaluations of ROUGE
7 Conclusions
1803.01937_ROUGE2.0: Updated and Improved Measures for Evaluation of Summarization Tasks
Abstract
1. Problems with the current ROUGE measures
2. ROUGE 2.0
1804.08771_SacreBLEU: A Call for Clarity in Reporting BLEU Scores
BLEU
Summary
Abstract
1 Introduction
2 Problem Description
3 A way forward
4 Summary
2303.08896_SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Summary
LLM Summary
Abstract
1 Introduction
2 Background and Related Work
3 Grey-Box Factuality Assessment
4 Black-Box Factuality Assessment
5 SelfCheckGPT
6 Data and Annotation
7 Experiments
8 Conclusions
Limitations
Ethics Statement
Acknowledgments
Appendix A Models and Implementation
Appendix B SelfCheckGPT with QA
Appendix C SelfCheckGPT with Prompt
Appendix D Additional Experimental Results
2306.05685_Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Summary
LLM Summary
Abstract
1 Introduction
2 MT-Bench and Chatbot Arena
3 LLM as a Judge
4 Agreement Evaluation
5 Human Preference Benchmark and Standardized Benchmark
6 Discussion
7 Conclusion
Appendix A Prompt templates
Appendix B Case Study
Appendix C Data Collection
Appendix D Additional Experimental Results
Appendix E Training Details of Vicuna Models
Appendix F Exploring Vicuna as a judge
2403.04132_Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Summary
Abstract
1 Introduction
2 Related Work
2 Related Work (summary)
3 Human Preference Data Collection
3 Human Preference Data Collection (summary)
4 From Pairwise Comparisons to Rankings
5 Efficient Approximate Ranking
6 Data Analysis
7 Experiments
7 Experiments (summary)
8 Discussion
8 Discussion (summary)
9 Conclusion
9 Conclusion (summary)
Acknowledgments
Acknowledgments (summary)
Appendix A Confidence Interval Simulation Study
Appendix A Confidence Interval Simulation Study (summary)
Appendix B The Nonparametric Bradley-Terry Model
Appendix B: The Nonparametric Bradley-Terry Model (summary)
Appendix C Valid P-Value
1. Definition of the p-value
2. An Equivalent Expression of the p-value
3. Key Steps of the Validity Proof
4. Conclusion of the Proof
Summary
Appendix D Sample Prompts
2404.04475_AlpacaEval LC: A Simple Way to Debias Automatic Evaluators
Summary
From Deepseek
Abstract
1 Introduction
2 Background and Problem Setting
3 Length-Controlled AlpacaEval
4 Results
5 Discussion
2511.03506_HaluMem: Evaluating Hallucinations in Memory Systems of Agents
Summary
Abstract
1 Introduction
2 Related Work
3 Problem Definition
4 Methodology for Constructing HaluMem
5 Evaluation Framework of HaluMem
6 Experiments
7 Conclusion
Appendix A Supplementary Details of HaluMem
Appendix B Special Configurations for Some Memory Systems
Appendix C Annotation Guidelines and Instructions
Appendix D Prompts
Appendix E Examples from the Process of Constructing HaluMem
Datasets - Agent
2308.03688_AgentBench: Evaluating LLMs as Agents
Summary
From Deepseek
Dataset Examples
2312.14033_T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
Summary
Abstract
1 Introduction
2 T-Eval
3 Experiments
4 Discussion
5 Related Work
6 Conclusion
Appendix A T-Eval Benchmark Details
Appendix B Implementation Details
Appendix C Detailed Evaluation Metrics
Appendix D API Documentation
2406.12045_τ-bench: A Benchmark for Tool-Agent-User Interaction
Summary
Abstract
1.Introduction
2.Related Work
3.τ-bench: A benchmark for Tool-Agent-User Interaction
4. Benchmark Construction
5.Experiments
6.Discussion
2506.07982_𝜏²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
Summary
LLM Summary
Abstract
1 Introduction
2 Related Work
3 \(\tau^{2}\)-bench: Evaluating Agents in a Dual-Control Environment
4 Experiments
5 Conclusion
Broader Impact
Appendix
Appendix A Telecom Domain
Appendix B Verifying Original \(\tau^{2}\)-bench
Appendix C Prompts
Appendix D Domain Policies
Appendix E User Simulator Quality
Datasets - QA
2109.07958_TruthfulQA: Measuring How Models Mimic Human Falsehoods
Summary
LLM Summary
Abstract
1 Introduction
2 The TruthfulQA Benchmark
3 Experiments
4 Results
5 Discussion
6 Related Work
7 Conclusion
8 Ethics and Impact
Appendix A Additional examples from TruthfulQA
Appendix B Additional results
Appendix C Dataset construction
Appendix D Human evaluations
Appendix E Prompts
Appendix F Checking for data quality and disagreement
2311.12022_GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Summary
Abstract
1.Introduction
2.Data Collection
3.Dataset Analysis
4.Baseline
5.Related Work
6.Limitations
7.Conclusion
2411.04368_SimpleQA: Measuring short-form factuality in large language models
Abstract
1.Introduction
2.Data Collection and Verification
4.Measuring calibration
Appendix B Guessing strategy and F-score
Datasets - Long Context
2308.14508_LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
From Deepseek
2402.05136_LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K
Summary
Abstract
1. Introduction
2. Related Work
3 LV-Eval Benchmark
4 Evaluation
Appendix
Appendix C Detailed Evaluation Results
Appendix D Detailed Ablation Results
2404.06654_RULER: What’s the Real Context Size of Your Long-Context Language Models?
Summary
LLM Summary
Abstract
1 Introduction
2 Related Work
3 The Ruler Benchmark
4 Experiments & Results
5 Task Error Analysis
6 Model Analysis
7 Conclusion
8 Limitations
Appendix A Models
Appendix B Task Configurations
Appendix C Task Correlation Analysis
Appendix D Prompt Templates
Appendix E Passkey Retrieval and Vanilla NIAH Results
Appendix F Additional Results
2407.11963_NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context
Summary
Abstract
1 Introduction
2 Related Work
3 Tasks and Datasets
4 Experiments
4.1.5 Impact of Language: Which Model Performs Better under the Bilingual Scenario?
5 Conclusion and Future Work
Appendix A Evaluated Models
Appendix B NeedleBench Prompt Examples
Appendix C Error Analysis Examples
Datasets - RAG
1809.09600_HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Summary
Abstract
1 Introduction
2 Data Collection
3 Processing and Benchmark Settings
4 Dataset Analysis
5 Experiments
6 Related Work
7 Conclusions
Appendix A Data Collection Details
Appendix A Data Collection Details (summary)
Appendix B Further Data Analysis
Appendix C Full Wiki Setting Details
2401.15391_MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Summary
LLM Summary
Abstract
1 Introduction
2 RAG with multi-Hop queries
3 A Benchmarking Dataset: MultiHop-RAG
4 Benchmarking RAG system using MultiHop-RAG
5 Related Work
6 Conclusion
Limitations
Appendix A: GPT-4 Prompts Used for Data Generation
Appendix B: Dataset Examples
Datasets - Graph
2402.07630_G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
Summary
Worked Example
LLM Summary
Abstract
1 Introduction
2 Related Work
3 Formalization
4 Proposed GraphQA Benchmark
5 G-Retriever
6 Experiments
7 Conclusion
Acknowledgment
Appendix A Impact Statements
Appendix B Experiment
Appendix C GraphQA Benchmark
Appendix D Graph Retrieval-Augmented Generation (GraphRAG)
Appendix E Discussion on the Complexity
Appendix E Discussion on the Complexity (summary)
Appendix F Hallucination in Graph LLMs
Appendix G Demonstrations
Datasets - Coding
2107.03374_HumanEval: Evaluating Large Language Models Trained on Code
Summary
Abstract
1.Introduction
2.Evaluation Framework
3.Code Fine-Tuning
4.Supervised Fine-Tuning
5.Docstring Generation
6.Limitations
7.Broader Impacts and Hazard Analysis
8.Related Work
9.Conclusions
2108.07732_MBPP: Program Synthesis with Large Language Models
Abstract
1 Introduction
2 Datasets
3 Model and Methods
4 MBPP Synthesis Results
5 Human-Model Collaboration Results
6 Program Execution Results
7 MathQA Results
8 Related Work
9 Risks and Limitations
10 Conclusion
Appendix A
2310.06770_SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Summary
LLM Summary
Abstract
1 Introduction
2 SWE-bench
3 SWE-Llama: Fine-tuning CodeLlama for SWE-bench
4 Experimental Setup
5 Results
6 Related Work
7 Discussion
8 Ethics Statement
9 Reproducibility Statement
Appendix
Appendix A Benchmark Details
Appendix B Additional Details on Training SWE-Llama
Appendix C Additional Results
Appendix D Additional Experimental Details
Appendix E Societal Impact
Appendix F In-depth Analysis of SWE-Llama Generations
2402.16694_HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
Abstract
1. Introduction
2. Related work
3. HumanEval-XL
4. Experiments
5. Conclusion
Acknowledgments
Appendix A Experiment Settings
Appendix B Comprehensive Experiment Results
2403.07974_LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Summary
LLM Summary
Abstract
1 Introduction
2 Holistic Evaluation
3 Benchmark Curation
4 Experiment Setup
5 Results
6 Related Work
7 Limitations
8 Conclusion
Appendix A Dataset
Appendix B UI
Appendix C Experimental Setup
Appendix D Results
Appendix E Qualitative Examples
2407.10499_CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
Summary
Abstract
1 Introduction
2 Related Works
3 CIBench
4 Experiments
5 Conclusion
Appendix A Dataset Details
Appendix B Construction Prompts and Rules
Appendix C Experiment Example Demo
Appendix D Subjective Visualization Evaluation
Appendix E Dataset Error Analysis
Appendix F Human Annotator
Appendix G Ethical Consideration
2410.03859_SWE-bench-Multimodal: Do AI Systems Generalize to Visual Software Domains?
Summary
Abstract
1 Introduction
2 SWE-bench Multimodal
3 Evaluating on SWE-bench M
4 Results
5 Related Work
6 Conclusion
Appendix A Dataset
Appendix B Collection
Appendix C Experiments
Appendix D Human Validation
Appendix E Limitations
2410.06992_SWE-Bench+: Enhanced Coding Benchmark for LLMs
Summary
Abstract
1 Introduction
2 Robustness Analysis of SWE-Bench
3 Building SWE-Bench+
4 Robustness of SWE-Bench+
5 Effectiveness-aware Evaluation
6 Related Work
7 Conclusion
2501.01257_CodeForces: Benchmarking Competition-level Code Generation of LLMs on CodeForces
Summary
Abstract
1 Introduction
2 Related Work
3 CodeForces Benchmark
4 Evaluation on Existing LLMs
5 Analysis Experiments
6 Discussion
7 Conclusion
8 Ethical Statement
Appendix A Model Cards
Appendix B Decoding Hyperparameters
Appendix C Analysis of Our Elo Rating Calculation System
Appendix D Human-comparable Elo Rating
Appendix E Problem Demonstration
Appendix F Special Judge
Datasets - Math
2103.03874_MATH: Measuring Mathematical Problem Solving With the MATH Dataset
2110.14168_GSM8K: Training Verifiers to Solve Math Word Problems
Summary
LLM Summary
Abstract
1 Introduction
2 Dataset
3 Related Work
4 Methods
5 Additional Experiments
6 Conclusion
Appendix A Dataset Details
Appendix B Hyperparameters
Appendix C Calculator Annotations
Appendix D Example Model Solutions
Appendix E Verifier Details
Appendix F Verifier Visualization
2405.12209_MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
Abstract
1 Introduction
2 Methodology
3 Experiments and Analysis
4 Discussion
5 Related Work
6 Conclusion
7 Limitations
8 Ethical Considerations
Appendix A MathBench Statistics
Appendix B Detailed Experimental Results
Appendix C Extra Analysis
Datasets - Images
2306.13394_MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Summary
LLM Summary
Abstract
1 Introduction
2 MME Evaluation Suite
3 Experiments
4 Analysis
5 Conclusion
2307.06281_MMBench: Is Your Multi-modal Model an All-around Player?
Summary
Abstract
1 Introduction
2 Related Work
3 The construction of MMBench
4 Evaluation Strategy
5 Evaluation Results
6 Conclusion
Appendix A More Details about the Data
Appendix B More Details on MMBench Construction
Appendix C More Details on LLM-based Choice Extraction
Appendix D Evaluation Settings and Results
2307.16125_SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
Summary
LLM Summary
Abstract
1 Introduction
2 Related Work
3 SEED-Bench
4 Evaluation Results
5 Conclusion
2311.12793_ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Summary
LLM Summary
Abstract
1 Introduction
2 Related Work
3 ShareGPT4V Dataset
4 ShareGPT4V-7B Model
4.1 Model Architecture
4.2 Pre-training
4.3 Supervised Fine-Tuning (SFT)
Summary
5 Experiments
6 Conclusion
Appendix A Data Sources
Appendix B Caption Analysis
Appendix C Prompts
Appendix D Examples
2506.18095_ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
Summary
Abstract
1 Introduction
2 ShareGPT-4o-Image
3 Janus-4o: Fine-Tuning with ShareGPT-4o-Image
4 Experiments
5 Conclusion
Appendix A Related Work
Appendix B Image Generation Categories
Appendix C Prompts for Generation
Appendix D Document Pipeline
Appendix E Ethical Considerations and Societal Impact
Datasets
1804.07461_GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Summary
Abstract
1 Introduction
2 Related Work
2 Related Work (summary)
3 Tasks
3.1 Single-Sentence Tasks
3.2 Similarity and Paraphrase Tasks
3.3 Inference Tasks
3.4 Evaluation
4 Diagnostic Dataset
4 Diagnostic Dataset (summary)
5 Baselines
5 Baselines (summary)
6 Benchmark Results
6 Benchmark Results (summary)
7 Analysis
8 Conclusion
8 Conclusion (summary)
Acknowledgments
Acknowledgments (summary)
Appendix A Additional Benchmark Details
Appendix B Additional Baseline Details
Appendix B Additional Baseline Details (summary)
Appendix C Development Set Results
Appendix C Development Set Results (summary)
Appendix D Benchmark Website Details
Appendix D Benchmark Website Details (summary)
Appendix E Additional Diagnostic Data Details
Appendix E: Additional Diagnostic Data Details (summary)
Summary
2009.03300_MMLU: Measuring Massive Multitask Language Understanding
Summary
Abstract
1.Introduction
2.Related Work
3.A Multitask Test
4.Experiments
5.Discussion
6.Conclusion
2305.08322_C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
Summary
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
Abstract
1 Introduction
2 The C-Eval Evaluation Suite
3 Experiment
4 Related Work
5 Discussion
Acknowledgement
Appendix A Author Contributions
Appendix B Detailed Stats of C-Eval
Appendix C Explanation Data Generation
Appendix D Evaluation Prompts
Appendix E Details of the models being evaluated
Appendix F Breakdown of Model Performance
Appendix G Option Bias
Appendix H Compute and Resources Used for Evaluation
2306.09212_CMMLU: Measuring massive multitask language understanding in Chinese
Summary
Abstract
1 Introduction
2 Related Work
3 CMMLU
4 Experiments
Impact of model size on performance
5 Conclusion
Appendix A Comparison to concurrent benchmarks
Appendix B CMMLU Subjects
Appendix C CMMLU Examples
Appendix D CMMLU Difficulty Distribution
Appendix E Emergent Ability shown in CMMLU subjects
Appendix F Models being Evaluated
Appendix G Strategies for Estimating Model Choices
Appendix H Regular expression matching algorithms
Appendix I Correlation to other Benchmarks
Appendix J Breakdown of Model Performance
J.3 The effect of chain-of-thought prompt
2307.15020_SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark
Summary
LLM Summary
Abstract
1 Introduction
2 Related Work
3 SuperCLUE Benchmark
4 Experiments
5 Additional Analysis
6 Conclusion
Appendix A Evaluation Process
Appendix B Capability Categories
2311.12983_GAIA: a benchmark for General AI Assistants
Summary
Abstract
1.Introduction
2.Related work
3.GAIA
4.LLMs results on GAIA
5.Discussion
6.Limitations
Appendix A Extended related work
Appendix C Extended description of GAIA
Appendix D Extended description of our question design framework
2311.18743_AlignBench: Benchmarking Chinese Alignment of Large Language Models
Summary
Abstract
1 Introduction
2 Dataset
3 Methods
4 Human Evaluation on AlignBench
5 AlignBench: Benchmarking Results
6 Related Work
7 Conclusion
Appendix A Appendix
2404.07972_OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Summary
Abstract
1. Introduction
2. OSWORLD Environment
3. OSWORLD Benchmark
4. Benchmarking LLM and VLM Agent Baselines
5. Analysis
6. Related Work
7. Conclusion and Future Work
A. Details of OSWORLD Environment
C. Details of Baseline Methods
D. Examples of Qualitative Analysis
2406.04770_WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
Summary
Abstract
1 Introduction
2 WildBench Data Curation
3 Automatic Evaluation with WildBench
4 Results & Analysis
5 Related Works
6 Conclusion and Future Directions
Appendix A Task Categories
Appendix B More Information on WildBench Data
Appendix C More Information on WildBench Evaluation
Appendix D Prompt Template for Pairwise Evaluation Metric WB-Reward
Appendix E Prompt Template for Individual Evaluation Metric WB-Score
Appendix F Full WildBench Leaderboard
2501.14249_HLE: Humanity’s Last Exam
Abstract
1.Introduction
2.Related Work
3.Dataset
4.Evaluation
5.Discussion
Memory
Surveys
2505.00675_❇️Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
Summary
Abstract
1 Introduction
2 Memory Foundations
3 From Operations to Key Research Topics
4 Memory In Practice
Overall Summary
5 Memory in Humans and AI Systems
6 Open Challenges and Future Directions
Appendix A GPT-based Pipeline Selection
Appendix B Relative Citation Index
Appendix C Chord Analysis of Interactions Among Memory Types, Operations, Topics, and Venues
2512.13564❇️_MemorySurvey: Memory in the Age of AI Agents: A Survey
Summary
From Moonlight
Abstract
1. Introduction
2. Preliminaries: Formalizing Agents and Memory
3. Form: What Carries Memory?
4. Functions: Why Agents Need Memory?
5. Dynamics: How Memory Operates and Evolves?
6. Resources and Frameworks
7. Positions and Frontiers
8. Conclusion
RL
2508.19828_Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
Summary
From Moonlight
Prompt
Algorithm
Abstract
1 Introduction
1 Introduction (summary)
2 Related Work
2 Related Work (summary)
3 Method
3 Method (summary)
4 Experiments
4 Experiments (summary)
5 Conclusion
5 Conclusion (summary)
Limitations
Limitations (summary)
Appendix A Case Study of Behavior of Agents before and after Fine-tuning
Appendix A: Case Study of Agent Behavior Before and After Fine-Tuning (summary)
Appendix B Dataset Details
Appendix C Prompts
Appendix C Prompts (summary)
Appendix D Implementation Details
Appendix D Implementation Details (summary)
Appendix E Algorithm
Appendix F Extended Results and Type-Level Analysis
Appendix F Extended Results and Type-Level Analysis (summary)
2601.01885_AgeMem: Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents
Summary
From Moonlight
Abstract
1 Introduction
2 Background and Related Work
3 Method
4 Experiments
5 Conclusion
Limitations
Appendix A Detailed Design and Implementation of AgeMem
Appendix B Case Study: AgeMem in Action
Appendix C Experimental Implementation
Appendix D Additional Results
General
1911.00172_kNN-LMs: Generalization through Memorization: Nearest Neighbor Language Models
Summary
Abstract
1 Introduction
2 Nearest Neighbor Language Modeling
3 Experimental Setup
4 Experiments
5 Tuning Nearest Neighbor Search
6 Analysis
7 Related Work
8 Conclusion and Future Work
Appendix A Appendix
2304.13343_SCMemory: Enhancing Large Language Model with Self-Controlled Memory Framework
Summary
Abstract
1 Introduction
2 Self-Controlled Memory
Summary
3 Experiments
4 Related Work
5 Conclusion
Limitations
Ethical Considerations
Appendix A Prompt List
Appendix B Long-term Dialogue QA Cases
Appendix C Book Summarization Cases
Appendix D Meeting Summarization Cases
2305.11792_Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs
Summary
Abstract
1 Introduction
2 Related Work
3 Method
Summary
4 Datasets Collection
5 Experiment
5.1 LLM Families and Evaluation Details
5.2 Main Experiments
5.3 Human Evaluation
6 Analysis
7 Discussion
8 Conclusion
Limitations
Ethics Statement
Acknowledgement
Appendix A Templates
Appendix B Different Method of Evaluation
Appendix C Discussion
Appendix D Helpfulness Analysis of Planning Step
2305.17144_GITM: Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
From Deepseek
2306.03901_ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory
Summary
Abstract
1 Introduction
2 Related Work
3 ChatDB
4 Evaluation
4 Evaluation (summary)
5 Conclusion
5 Conclusion (summary)
2308.10144_ExpeL: LLM Agents Are Experiential Learners
From Deepseek
Abstract
1 Introduction
2 Related Work
3 Preliminaries
4 ExpeL: An Experiential Learning Agent
5 Experiments
6 Conclusion and Limitations
Acknowledgement
Appendix A Detailed Related Works
Appendix B Broader Impacts
Appendix C Computational Resources
Appendix D Environment Details
Appendix E Environment, Agent, Retrieval Parameters
Appendix F Prompt Templates
Appendix G Example Insights
Appendix H Emergent Abilities Showcase
Appendix I Example Trajectories
Appendix J Additional Quantitative Results
2309.02427_❇️CoALA: Cognitive Architectures for Language Agents
Summary
From Deepseek
Abstract
1 Introduction
2 Background: From Strings to Symbolic AGI
3 Connections between Language Models and Production Systems
4 Cognitive Architectures for Language Agents (CoALA): A Conceptual Framework
5 Case Studies
6 Actionable Insights
7 Discussion
8 Conclusion
2310.08560_MemGPT: Towards LLMs as Operating Systems
Summary
From Deepseek
Abstract
1 Introduction
2 MemGPT (MemoryGPT)
Summary
3 Experiments
4 Related Work
5 Conclusion
6 Appendix
2311.08719_Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory
Summary
From Deepseek
Abstract
1 INTRODUCTION
2 RELATED WORK
3 METHODOLOGY
4. Experiment
5. Conclusion
2312.17653_❇️LARP: Language-Agent Role Play for Open-World Games
Summary
Abstract
1 Introduction
2 Related Work
3 Cognitive Architecture
4 Environment Interaction
5 Personalities
6 Discussions
7 Conclusion
2402.04624_MemoryLLM: Towards Self-Updatable Large Language Models
Summary
Abstract
1 Introduction
2 Preliminaries
3 MemoryLLM
4 Experiments
5 Related Work
6 Conclusion and Future Work
Impact Statement
Appendix A Details in Methodology
Appendix B Implementation Details
Appendix C Additional Experiments
2402.09727_ReadAgent: A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Summary
Others' Summary
From Deepseek
2404.11672_MemLLM: Finetuning LLMs to Use Explicit Read-Write Memory
Summary
Abstract
1 Introduction
2 Related work
3 Methodology
4 Experiments
5 Conclusion
Limitations
Appendix A Memory-write Decoding Method
Appendix B Filtering Ambiguous Queries
Appendix C Memory-read Data Generation
Appendix D Hyperparameters Details
Appendix E Filtering Prompt
2404.13501_LLM_Agent_Memory_Survey: A Survey on the Memory Mechanism of Large Language Model based Agents
Summary
Others' Summary
Abstract
1 Introduction
2 Related Surveys
3 What is the Memory of LLM-based Agent
4 Why We Need the Memory in LLM-based Agent
5 How to Implement the Memory of LLM-based Agent
5.1 Memory Sources
5.2 Memory Forms
5.3 Memory Operations
6 How to Evaluate the Memory in LLM-based Agent
7 Memory-enhanced Agent Applications
8 Limitations & Future Directions
9 Conclusion
9 Conclusion (summary)
Acknowledgement
Acknowledgments (summary)
2407.01178_❇️Memory3: Language Modeling with Explicit Memory
Summary
LLM Summary
Abstract
1 Introduction
2 | Memory Circuitry Theory
3 | Design
4 | Pretraining Data
5 | Pretrain
6 | Fine-tuning and Alignment
6 | Fine-tuning and Alignment (summary)
7 | Evaluation
8 | Conclusion
8 | Conclusion (summary)
Acknowledgement
Acknowledgments (summary)
Appendix A Cost Estimation
A.1 | Implicit Memory
A.2 | Explicit Memory
A.3 | External Information
Summary and Comparison
Note: The Knowledge Retention Problem (Remark 9)
Appendix B Vector Compression
Appendix B Vector Compression (summary)
Appendix C Supplementary Evaluation Results
Appendix C Supplementary Evaluation Results (summary)
2410.15665_LongTermMemory: The Foundation of AI Self-Evolution
Summary
Others' Summary
From Deepseek
Abstract
1 Introduction
2 AI Self-Evolution
Summary
3 LTM for AI Self-Evolution
4 How to Construct LTM?
5 How can LTM be used to achieve model self-Evolution?
6 The Practice of model self-evolution based on LTM
7 Our Future Plans
8 Conclusion
Appendix A RTG prompt
2502.00592_M+: Extending MemoryLLM with Scalable Long-Term Memory
Summary
Abstract
1 Introduction
2 Related Work
3 Methodology
4 Experiments
5 Conclusion and Future Work
Impact Statement
Appendix A Justifications of using deepspeed-stage-2
Appendix B Experiments on datasets NaturalQA
Appendix C Statistics of the Dataset of Long Documents
Appendix D Additional Training Details
Appendix E Discussions
2502.12110_A-Mem: Agentic Memory for LLM Agents
Summary
From Deepseek
Abstract
1 Introduction
2 Related Work
3 Methodology
4 Experiment
5 Conclusions
6 Limitations
Appendix A Experiment
Appendix B Prompt Templates and Examples
2504.15965_❇️From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs
Summary
Abstract
1 Introduction
2 Overview
3 Personal Memory
4 System Memory
5 Open Problems and Future Directions
6 Conclusion
2504.19413_❇️Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Summary
Others' Summary
Abstract
1 Introduction
2 Proposed Methods
Summary
3 Experimental Setup
Summary
4 Evaluation Results, Analysis and Discussion.
5 Conclusion and Future Work
6 Acknowledgments
Appendix A Prompts
Appendix B Algorithm
Appendix C Selected Baselines
2505.22101_MemOS: An Operating System for Memory-Augmented Generation (MAG) in LLM (Short Version)
Summary
Abstract
1 Introduction
2 Memory in Large Language Models
3 MemOS Design Philosophy
4 MemOS
4.1 Memory Types in MemOS
4.2 MemCube: The Core Resource
4.3 MemOS Architecture
4.4 System Execution Flow
Summary
5 Conclusion
2506.06326❇️_MemoryOS: Memory OS of AI Agent
Summary
From Moonlight
Abstract
1 Introduction
2 Related Work
3 MemoryOS
4 Experiments
5 Conclusion
2505.22101_❇️MemOS: A Memory OS for AI System
Summary
LLM Summary
Abstract
1 Introduction
2 Memory in Large Language Models
3 MemOS Design Philosophy
4 Memory Modeling in MemOS
5 Architecture of MemOS
6 Evaluation
7 MemOS for Architecture Innovation and Applications
8 Conclusion
2508.09874_Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
Summary
Abstract
1 Introduction
2 Background
3 Memory Decoder
4 Experimental Setup
5 Results
6 Analysis
7 Related Work
8 Conclusion
9 Limitations
Appendix A Interpolation hyperparameter \(\alpha\) of all tasks
Appendix B Analysis of DAPT Performance on Downstream Tasks
Appendix C Knowledge-Intensive Reasoning Task Corpus Composition
Appendix D Domain-Specific Downstream Tasks
Appendix E Comparison with DAPT Model Interpolation
Appendix F In-Context Learning Performance Analysis
Appendix G Characteristics of k-NN Distributions
Appendix H Alternative Loss Functions for Imitating k-NN Distributions
2509.06269_REMI: A Novel Causal Schema Memory Architecture for Personalized Lifestyle Recommendation Agents
Summary
Abstract
1. Introduction
2. Research Objectives
3. Related Work
4. Proposed Method
5. Evaluation Framework
6. Results and Findings
7. Discussion
8. Conclusion
2509.24704_MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Summary
From Moonlight
From Deepseek&OpenAI
2510.18866_❇️LightMem: Lightweight and Efficient Memory-Augmented Generation
Summary
Abstract
1 Introduction
2 Preliminary
3 LightMem Architecture
4 Experiments
5 Related Work
6 Conclusion and Future Work
Appendix A Usage of LLMs
Appendix B Methodology Details
Appendix C Experiment Details
Appendix C Experiment Details (summary)
Appendix D Prompts
Appendix D Prompt Design (summary)
2512.18746_MemEvolve: Meta-Evolution of Agent Memory Systems
From Moonlight
2601.02163_EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning
Summary
Diagrams
From Moonlight
General - GitHub
2509.00xxxx_MemU: A forward-looking but still immature memory framework
Main Content
2511.00xxx_MemMachine
Main Content
Memory-Related Agents
2504.10147_PersonalRAG❇️: A Survey of Personalization: From RAG to Agent
Summary
Abstract
1. Introduction
2. What is Personalization
3. How to Adopt Personalization
4. Where to Adopt Personalization
5. Evaluation and Dataset
6. Challenges and Future Directions
7. Conclusion
2506.07398❇️_G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems
Summary
1. Introduction
2 Related Works
3 Preliminary
4 G-Memory
5 Experiment
6 Conclusion & Limitation
A Experimental Details
B Additional Experiment Results
C Prompt Set
D Discussion with Related Works
2507.02259_MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
Summary
Abstract
1 Introduction
2 Related Work
3 The Proposed MemAgent
Summary
4 Experiments
5 Conclusion
6 Computation Complexity
7 Complete Out-Of-Domain Task Results
2507.07957_MIRIX: Multi-Agent Memory System for LLM-Based Agents
Summary
From Moonlight
Abstract
1 Introduction
2 Application & Use Cases
3 Methodology
4 Experiments
5 Related Work
6 Conclusion and Future Work
Appendix A Full Experimental Results with Different Runs
2509.25140❇️_ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
Summary
Abstract
1 Introduction
2 Related Work
3 Methodology
4 Experiments
5 Analysis
6 Conclusion
7 Acknowledgments
Appendix A Experiment Details
Appendix B Details for Experiment Settings
Appendix C Additional Analyses
Appendix D Future Directions
Appendix E Limitations
Memory-Related Datasets
2305.10250_MemoryBank: Enhancing Large Language Models with Long-Term Memory
Summary
From Deepseek
Abstract
1 Introduction
2 MemoryBank: A Novel Memory Mechanism Tailored for LLMs
Summary
3 SiliconFriend: An AI Chatbot Companion Powered by MemoryBank
The Three LLMs Used
Development Stages of SiliconFriend
Summary
Key Takeaways
4 Experiments
5 Related Works
5 Related Works (summary)
6 Conclusion
6 Conclusion (summary)
2308.08239_MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation
Summary
Abstract
1 Introduction
2 Related Work
3 Methodology
4 Experiments
5 Conclusion
Appendix A Basic Published Datasets
Appendix B Involved Prompts
Appendix C Instruction Design Challenges
1. Introduction
2. Prompt Copy
3. Catastrophic Forgetting
4. Prompt Misplacement
5. Example Task Description
Summary
2402.17753_LoCoMo❇️: Evaluating Very Long-Term Conversational Memory of LLM Agents
Summary
Others' Summary
Abstract
1 Introduction
2 Related Work
3 Generative Pipeline for LoCoMo
4 LoCoMo Evaluation Benchmark
5 Experimental Setup
6 Experimental Results
7 Conclusion
8 Limitations
9 Broader Impacts
Appendix Overview
Appendix A Generative Pipeline for LoCoMo
Appendix B Dataset
Appendix C Experimental Setup
Appendix D Results
2410.10813_LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Summary
Abstract
1 Introduction
2 Related Work
3 LongMemEval
4 A Unified View of Long-Term Memory Assistants
5 Experiment Results
6 Conclusion
Reproducibility Statement
Ethics Statement
Appendix A Supplemental Details for LongMemEval
Appendix B A Human Study on Commercial Memory Chatbots
Appendix C Unified Memory View
Appendix D Memory Optimizations: Implementation Details
Appendix E Extended Analyses
2506.21605_MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents
总结
Abstract
1 Introduction
2 Related Works
3 Dataset Construction
4 Benchmark
5 Conclusion
Limitations
Ethics Statement
Acknowledgments
Appendix A Case Studies
Appendix B Detailed Data Statistics
Appendix C Data Creation Prompt
Appendix D Result Details
2507.05257_MemoryAgentBench: Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
总结
From Deepseek
Abstract
1 Introduction
2 Related Work
3 MemoryAgentBench
4 Experiments
5 Conclusion and Future Work
Appendix A Details of Dataset
Appendix B Prompts
Appendix C Detailed Experimental Results
Appendix D Experimental Settings
2510.27246_BEAM: Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
总结
Abstract
1 Introduction
2 BEAM: Benchmarking memory Capabilities of LLMs
3 LIGHT: Improving Memory Capabilities of LLMs
4 Experiments
5 Related Work
6 Conclusion
Acknowledgments
Appendix A Detailed Related Work
Appendix B Benchmark Design
Appendix C Detailed Experiments
Appendix D Nugget Design
Appendix E Examples from Different Components of BEAM
Appendix F Case Study
Appendix G Prompts
多模态记忆
2506.05813_MAPLE: Multi-Agent Adaptive Planning with Long-Term Memory for Table Reasoning
总结
Abstract
1 Introduction
2 MAPLE Framework
3 Experiments
4 Conclusion
Limitations
Appendix A Related Work
Appendix B Cognitive Architecture
Appendix C Memory Evolution Algorithm
Appendix D Case Study
Appendix E Additional Experimental Results
Appendix F Example Prompts
附录 F 示例提示(Example Prompts)
2508.09736_M3-Agent❇️: Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
总结
Abstract
1 Introduction
2 Related Work
3 Datasets
4 Approach
5 Experiments
6 Conclusion and Future Work
7 Acknowledgment
8 M3-Bench-robot
9 M3-Bench-web
10 Implementation Details of Tools
11 Demonstration Data Synthesis for Memorization
12 Evaluation of Memorization
13 RL Training Details
14 Case Study
15 Prompt Templates
2509.11914_EgoMem: Lifelong Memory Agent for Full-duplex Omnimodal Models
总结
Abstract
1 Introduction
2 Task Definition and Preliminaries
3 EgoMem
4 Training Details
5 Experiments
6 Conclusion and Future Challenges
Acknowledgments
2510.12422_VideoLucy: Deep Memory Backtracking for Long Video Understanding
总结
Abstract
1 Introduction
2 Method
3 EgoMem Benchmark
4 Experiments
5 Related Work
6 Conclusion
7 Acknowledgments
Appendix A Appendix
参数记忆
1907.05242_PKM: Large Memory Layers with Product Keys
总结
From Moonlight
Abstract
1 Introduction
2 Related work
3 Learnable product key memories
4 Experiments
5 Conclusion
2305.02437_Selfmem: Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory
总结
From Moonlight
2407.04153_PEER: Mixture of A Million Experts
总结
From Moonlight
Abstract
1 Introduction
2 Method
3 Experiments
4 Related Works
5 Conclusion
Acknowledgments
2412.09764_Memory+: Memory Layers at Scale
总结
From Moonlight
Abstract
1 Introduction
1 引言(Introduction)
2 Related work
2 相关工作(Related Work)
3 Memory Augmented Architectures
4 Experimental setup
4 实验设置(Experimental setup)
5 Scaling results
5 扩展结果总结
6 Implications and shortcomings of the work
6 工作的意义与不足
2508.18756_UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
总结
From Moonlight
Abstract
1 Introduction
2 Related Work
3 Approach
4 Experiments
5 Conclusion
6 Optimized Initialization
7 Evaluation Benchmark
8 Open-source model hyperparameters
图结构记忆
1905.05460_Cognitive Graph for Multi-Hop Reading Comprehension at Scale
Abstract
1 Introduction
2 Cognitive Graph QA Framework
3 Implementation
4 Experiment
5 Related work
6 Discussion and Conclusion
2405.14831_HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
Abstract
1 Introduction
2 HippoRAG
3 Experimental Setup
4 Results
5 Discussions
6 Related Work
7 Conclusions & Limitations
Appendices
Appendix A HippoRAG Pipeline Example
Appendix B Dataset Comparison
Appendix C Ablation Statistics
附录 C 消融实验统计(Ablation Statistics)
Appendix D Intrinsic OpenIE Evaluation
附录 D 内在的 OpenIE 评估
Appendix E Case Study on Path-Finding Multi-Hop QA
附录E:路径查找多跳问答案例研究总结
Appendix F Error Analysis
Appendix G Cost and Efficiency Comparison
附录 G 成本与效率对比
Appendix H Implementation Details & Compute Requirements
附录 H 实现细节与计算需求
Appendix I LLM Prompts
附录I:大语言模型提示
应用-推荐
Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions
论文基本信息
核心内容简介
重要性与影响
总结
08xx.xxxxx_SVD++: Factorization meets the neighborhood: a multifaceted collaborative filtering model
SVD++
Neighborhood Models
Latent Factor Models(潜在因子模型)
Recommender systems: An overview of different approaches to recommendations
论文简介
核心内容总结
总结
1902.07153_SGCN: Simplifying Graph Convolutional Networks
总结
前提知识
Abstract
1 Introduction
2 Simple Graph Convolution
3 Spectral Analysis
4 Related Works
5 Experiments and Discussion
6 Conclusion
Acknowledgement
Appendix A The spectrum of Δ̃_sym
Appendix B Experiment Details
Appendix C Additional Experiments
1905.08108_NGCF: Neural Graph Collaborative Filtering
总结
Abstract
1. Introduction
2. Methodology
3. Related Work
4. Experiments
5. Conclusion and Future Work
2001.10167_RGCF: Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach
总结
Abstract
Introduction
Preliminaries and Related Work
Linear Residual Graph Convolutional Collaborative Filtering
Experiments
Conclusions
2002.02126_LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation
总结
Abstract
1. Introduction
2. Preliminaries
3. Method
4. Experiments
5. Related Work
6. Conclusion and Future Work
2010.10783_SGL: Self-supervised Graph Learning for Recommendation
总结
Abstract
1. Introduction
2. Preliminaries
3. Methodology
4. Experiments
5. Related Work
6. Conclusion and Future Work
Appendix A Gradient of InfoNCE Loss w.r.t. node representation
2112.08679_SimGCL: Are Graph Augmentations Necessary? Simple Graph Contrastive Learning for Recommendation
总结
Abstract
1. Introduction
2. Investigation of Graph Contrastive Learning in Recommendation
3. SimGCL: Simple Graph Contrastive Learning for Recommendation
4. Experimental Results
5. Related Work
6. Conclusion
Acknowledgement
2202.06200_NCL: Improving Graph Collaborative Filtering with Neighborhood-enriched Contrastive Learning
总结
Abstract
1. Introduction
2. Preliminary
3. Methodology
4. Experiments
5. Related work
6. Conclusion And Future Work
Appendix A Pseudo-code for NCL
Appendix B Case Study on Selected Neighbors
2203.13366_RLP_P5: A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)
总结
Abstract
1. Introduction
2. Related Work
3. Personalized Prompt Collection
4. The P5 Paradigm and Model
5. Experiments
6. Conclusions and Future Work
Acknowledgment
Appendix D Full List of Personalized Prompts for Amazon Datasets
2302.08191_LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation
总结
Abstract
1 Introduction
2 Related Work
3 Methodology
4 Evaluation
5 Conclusion
Appendix A Details of the Baselines
Appendix B Performance Comparison with Baselines (Continued)
Appendix C Theoretical Analysis
Appendix D Calculation of Complexity
Appendix E Performance Results under the New Setting
2303.14524_ChatRec: Towards Interactive and Explainable LLMs-Augmented Recommender System
总结
Abstract
1 Introduction
2 Related Work
3 Method
4 Experiment
5 Conclusion
Appendix 0.A Implementation Details
2305.00447_TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation
总结
Abstract
1. Introduction
2. TALLRec
3. Experiments
4. Related Work
5. Conclusion
2305.07001_InstructRec: Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach
总结
Abstract
1. Introduction
2. Methodology
3. Experiments
4. Conclusion and Future Work
Appendix A Instruction Templates for Traditional Recommendation
Appendix B Instruction Templates for Traditional Product Search
Appendix C Instruction Templates for Personalized Search
2306.10933_KAR: Towards Open-World Recommendation with Knowledge Augmentation from Large Language Models
总结
Abstract
1. Introduction
2. Related Work
3. Preliminaries
4. Methodology
5. Experiment
6. Broader Impact
7. Conclusion
2308.11131_ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation
总结
Abstract
1. Introduction
2. Preliminaries
3. Methodology
4. Experiment
5. Related Work
6. Conclusion
Appendix A Prompt Illustration
Appendix B Data Preprocessing
Appendix C Baseline Implementation
总结
Appendix D Additional Experiments
2310.15950_RLMRec: Representation Learning with Large Language Models for Recommendation
总结
Abstract
1. Introduction
2. Related Work
3. Methodology
4. Evaluation
5. Conclusion
Appendix A Supplementary Material
2311.01343_CLLM4Rec: Collaborative Large Language Model for Recommender Systems
总结
Abstract
1. Introduction
本节贡献(Contribution)
2. Related Work
2. 相关工作
3. Methodology
4. Empirical Study
5. Conclusion
Acknowledgment
Appendix A Technical Details
Appendix B Experiments
2502.18965_OneRec: Unifying Retrieve and Rank with Generative Recommender and Preference Alignment
总结
Abstract
1. Introduction
2. Related Work
3. Methods
4. System Deployment
5. Experiment
总结
6. Conclusion
2508.20900_OneRec-V2 Technical Report
总结 From Moonlight
Abstract
1 Introduction
2 Lazy Decoder-Only Architecture
3 Preference Alignment with Real-World User Interactions
4 Online A/B Test
5 Conclusion, Limitations, and Future Directions
Appendix
Appendix A Contributions
Appendix B Computational Complexity of Different Architecture
Appendix C Empirical Results
Appendix D Online Performance with Caching Disabled
2510.11639_OneRec-Think: In-Text Reasoning for Generative Recommendation
总结
Abstract
1 Introduction
2 Related Work
3 Preliminary
4 Methodology
5 Experiments
6 Conclusion
Limitations
Ethics Statement
Appendix A Appendix
2511.11255_Align3GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation
总结
Abstract
Introduction
Related Works
Methodology
Experiments
Conclusion
LLM 模型
NLP 模型
1810.04805_BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
1 Introduction
2 Related Work
3 BERT
Appendix A Additional Details for BERT
18xx_GPT1: Improving Language Understanding by Generative Pre-Training
Abstract
1. Introduction
2. Related Work
3. Framework
4 Experiments
5 Analysis
6 Conclusion
引文口碑
要点解读
19xx_GPT2: Language Models are Unsupervised Multitask Learners
The Illustrated GPT-2
参考
2006.03654_DeBERTa: Decoding-enhanced BERT with Disentangled Attention
总结
Abstract
1 Introduction
2 Background
3 The DeBERTa Architecture
4 Scale Invariant Fine-Tuning
4 尺度不变微调 (Scale Invariant Fine-Tuning)
5 Experiment
6 Conclusions
7 Acknowledgments
Appendix A Appendix
2012.00413_CPM: A Large-scale Generative Chinese Pre-trained Language Model
2302.13971_LLaMA: Open and Efficient Foundation Language Models
2307.09288_Llama 2: Open Foundation and Fine-Tuned Chat Models
2309.16609_Qwen Technical Report
1. Introduction
2. Pretraining
3. Alignment
4. CODE-QWEN: SPECIALIZED MODEL FOR CODING
5. MATH-QWEN: SPECIALIZED MODEL FOR MATHEMATICS REASONING
6. Related Work
7. Conclusion
A.1 MORE TRAINING DETAILS
A.2 EVALUATION
2310.19341_Skywork: A More Open Bilingual Foundation Model
总结
LLM 总结
Abstract
1 Introduction
2 Methodology
3 Pre-training
4 Evaluation
5 Discussion
6 Limitation
7 Conclusion
Appendix A Details on GPT-7B vs. LLaMA-7B Experiment
Appendix B Preliminary Experiments on Distributed Training
Appendix C More Benchmark Results
Appendix D Details on LM Test Sets
2401.14196_DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence
2404.06395_MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
5. Two Stage Pre-training Strategy
6. Model
7 MiniCPM Family
2405.04434_DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
2406.12793_ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
2407.10671_Qwen2 Technical Report
Abstract
1. Introduction
2. Tokenizer & Model
3. Pre-training
4. Post-training
5. Evaluation
6. Conclusion
2412.15115_Qwen2.5
Abstract
1. Introduction
2. Architecture and Tokenizer
3. Pre-training
4. Post-training
5. Evaluation
6. Conclusion
2505.09388_Qwen3
Abstract
1. Introduction
2. Architecture
3. Pre-training
4. Post-training
5. Conclusion
2508.06471_GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
总结
From Moonlight
Abstract
1 Introduction
2 Pre-Training
3 Post-Training: Expert Model Iteration
4 Evaluation
5 Conclusion
6 Contribution
多模态模型
2112.15093_CTR: Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study
Abstract
1. Introduction
2. Preliminaries
3. Datasets
4. Baselines
5. An Empirical Study
6. Conclusions
Appendix A Details of PRAB
Appendix C Visualization of Failure Cases
2304.08485_LLaVA: Visual Instruction Tuning
Abstract
1. Introduction
2. Related Work
3. GPT-assisted Visual Instruction Data Generation
4. Visual Instruction Tuning
5. Experiments
6. Conclusion
2308.12966_Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Methodology
Training
Evaluation
B. Data Format Details of Training
2310.03744_LLaVA2: Improved Baselines with Visual Instruction Tuning
Abstract
1. Introduction
2. Related Work
3. Approach
4. Empirical Evaluation
5. Open Problems in LMMs
6. Conclusion
A. Implementation Details
B. Qualitative Results
2312.07533_VILA: On Pre-training for Visual Language Models
Abstract
1. Introduction
2. Background
3. On Pre-training for Visual Language Models
4. Experiments
5. Related Work
6. Conclusion
2403.05525_DeepSeek-VL: Towards Real-World Vision-Language Understanding
Abstract
2408.01800_MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Abstract
1. Introduction
2. Related Work
3. Model Architecture
4. Training
5. End-side Deployment
6. Experiments
7. Conclusion
2409.17146_Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
Abstract
1. Introduction
2. Architecture
3. Data
4. Training
5. Evaluation
6. Ablations
Appendix A: Model Details
Appendix B: Training Details
Appendix C: Evaluation Results
Appendix D: Result Details
Appendix E Ablations Details
Appendix F Data Details
Appendix G Dataset Examples
Appendix H Related Work
2410.13848_Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
总结
LLM总结
Abstract
1 Introduction
2 Related Work
3 Janus: A Simple, Unified and Flexible Multimodal Framework
4 Experiments
5 Conclusion
Appendix
Appendix A Details of Semantic Tokenizer Mentioned in Ablation Study
Appendix B Additional Qualitative Results
2411.00774_Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Abstract
1. Introduction
2. Model
3. Experiments
4. Conclusion and Future Work
2412.04468_NVILA: Efficient Frontier Visual Language Models
Abstract
1. Introduction
2. Approach
3. Experiments
4. More Capabilities
5. Related Work
6. Conclusion
2502.13923_Qwen2.5-VL
Abstract
1. Introduction
2. Approach
3. Experiments
4. Conclusion
2505.14683_BAGEL: Emerging Properties in Unified Multimodal Pretraining
总结
From Deepseek
LLM 总结
Abstract
1 Introduction
2 Model
3 Data
4 Training
5 Evaluation
6 Emerging Properties
7 Main Results
8 Conclusion
9 Acknowledgement
2506.13642_Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
Abstract
1 Introduction
2 Related Work
3 Stream-Omni
3.2.1 Data Construction
4 Experiments
5 Results and Analyses
6 Conclusion
Limitations
Appendix A Construction of InstructOmni
Appendix B Construction of SpokenVisIT
Appendix C Case Study
2507.05595_PaddleOCR 3.0 Technical Report
总结
Abstract
1 Introduction
2 Core Capabilities
3 Codebase Architecture Design
4 Deployment
5 Conclusion
Appendix A Acknowledgments
Appendix B Usage of command and API details
Appendix C More details on MCP host configuration
2510.14528_PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
总结
Abstract
1 Introduction
2 PaddleOCR-VL
3 Dataset
4 Evaluation
5 Conclusion
Appendix A Training Dataset Details
Appendix B Supported Languages
Appendix C Inference Performance on Different Hardware Configurations
Appendix D Real-world Samples
Appendix E Compare with Others
Embedding 模型
2506.05176_Qwen3_Embedding: Advancing Text Embedding and Reranking Through Foundation Models
总结
Abstract
1 Introduction
2 Model Architecture
3 Models Training
4 Evaluation
4.1 Settings 评估设置
4.2 Main Results 主要结果
4.3 Analysis 分析
总结
5 Conclusion
Appendix A Appendix
LLM 音频
2005.08100_Conformer: Convolution-augmented Transformer for Speech Recognition
LLM总结
Abstract
1 Introduction
2 Conformer Encoder
3 Experiments
4 Conclusion
2106.07447_HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
总结
LLM 总结
Abstract
I Introduction
II Method
III Related Work
IV Experimental Details
V Results
VI Conclusion
2112.02418_YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
关键概念
Abstract
1. Introduction
2. YourTTS Model
3. Experiments
4. Results and Discussion
5. Zero-Shot Voice Conversion
6. Speaker Adaptation
7. Conclusions, limitations and future work
2212.04356_whisper: Robust Speech Recognition via Large-Scale Weak Supervision
Abstract
1. Introduction
2. Approach
3. Experiments
4. Analysis and Ablations
5. Related Work
6. Limitations and Future Work
7. Conclusions
A. Evaluation Datasets
B Compared Models
C. Text Standardization
2301.02111_Vall-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Abstract
1. Introduction
2. Related Work
3. Background: Speech Quantization
4. VALL-E
5. Experiments
6. Conclusion, Limitations, and Future Work
2303.03926_VALL-E_X: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Abstract
1. Introduction
2. Related Work
3 Cross-Lingual Codec Language Model
4. VALL-E X Application
5. Experiments
6. Conclusion
A. Appendix
2406.05370_VALL-E2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Abstract
1. Introduction
2. Related Work
3. VALL-E 2
4. Experiments
5. Conclusion
2407.05407_CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
Abstract
1. Introduction
2. CosyVoice: A Scalable TTS model using Supervised Semantic Tokens
3. Dataset
4. Experimental Settings
6. Conclusion
2407.10759_Qwen2-Audio Technical Report
Abstract
1. Introduction
2. Methodology
3. Experiments
5. Conclusion
2410.00037_Moshi: a speech-text foundation model for real-time dialogue
Abstract
1. Introduction
2. Related Work
3. Model
4. Datasets and Training
5. Evaluation
6. Safety
7. Conclusion
2412.10117_CosyVoice2: Scalable Streaming Speech Synthesis with Large Language Models
Abstract
1. Introduction
2. CosyVoice 2
3. Experimental Settings
4. Experimental Results
5. Conclusion
2501.06282_MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
Abstract
1. Introduction
2. Related Work
3. MinMo
4. Experiments
5. Conclusion
6. Limitations
A. Prompts for Voice Understanding Tasks
2505.02707_Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Abstract
1. Introduction
2. Related Work
3. Voila: Voice-Language Foundation Models
4. Experiments
5. Conclusion
2505.17589_CosyVoice3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
From LLM
Abstract
1. Introduction
2. CosyVoice 3
3. The Multilingual Data Pipeline
4. Experimental Settings
5. Experimental Results
6. Conclusion
7. Limitations
2512.20156_Fun-Audio-Chat Technical Report
总结
From Moonlight
Abstract
1 Introduction
2 Methodology
3 Experiments
4 Conclusion
5 Limitations
5 局限性(Limitations)
6 Contributions and Acknowledgments
LLM 视频
2301.12597_BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Abstract
1 Introduction
2 Related Work
3 Method
4 Experiment
5 Limitation
6 Conclusion
2308.01390_OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Abstract
1 Introduction
2 Related work
3 Approach
4 Results
5 Discussion
6 Conclusion
Appendix A Extended results
Appendix B Additional notes on filtering MMC4
Appendix C Synthetic data prompt
Appendix D Image credits
2503.20215_Qwen2.5-Omni Technical Report
Abstract
1. Introduction
2. Architecture
3. Pre-training
4. Post-training
5. Evaluation
6. Conclusion
LLM MoE
2408.15664_Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
2410.07490_MoDEM: Mixture of Domain Expert Models
2601.07372_Engram: Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
总结
From Moonlight
Abstract
1 Introduction
2 Architecture
3 Scaling Laws and Sparsity Allocation
4 Large Scale Pre-training
5 Long Context Training
6 Analysis
7 Related Work
8 Conclusion
Appendix A Detailed Model Architecture and Hyper Parameters
Appendix B Full Benchmark Curves
Appendix C Case Study of Tokenizer Compression
商业模型
2303.08774_GPT-4 Technical Report
2312.11805_Gemini: A Family of Highly Capable Multimodal Models
Abstract
1. Introduction
2. Model Architecture
3. Training Infrastructure
5. Evaluation
6. Post-Training Models
7. Responsible Deployment
8. Discussion and Conclusion
2403.05530_Gemini1.5: Unlocking multimodal understanding across millions of tokens of context
2406.02430_Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Abstract
1 Introduction
2 Method
3 Experiments
4 Model extensions
5 Model applications, limitations, and safety
6 Authors (alphabetical order)
7 Acknowledgement
2407.04675_Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Abstract
1 Introduction
2 Motivation
3 Methods
4 Model and Evaluation
5 Conclusion
Appendix A Appendix
2503.20020_Gemini2: Gemini Robotics: Bringing AI into the Physical World
2504.xxxxx_Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning
2505.07062_Seed1.5-VL Technical Report
Abstract
1 Introduction
2 Architecture
3 Pre-training
3.2 Training Recipe
4 Post-training
4.4 Hybrid Reinforcement Learning
5 Training Infrastructure
6 Evaluation
6.1.3 Video Task Evaluation
6.3.2 Comparison with State-of-the-arts
7 Conclusion and Next Steps
8 Contributions and Acknowledgments
9 Qualitative examples
9.7 Visual Reasoning: Visual Pattern Recognition
9.19 Failure Cases: Combinatorial Search I
10 Evaluation Details
DREAM-1K
LLM 周边技术
Framework
1712.05889_Ray: A Distributed Framework for Emerging AI Applications
Abstract
1. Introduction
2. Motivation and Requirements
3. Programming and Computation Model
4. Architecture
5. Evaluation
6 Related Work
7 Discussion and Experiences
8. Conclusion
1910.02054_DeepSpeed_ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Abstract
1. Extended Introduction
2. Related Work
3 Where Did All the Memory Go?
4 ZeRO: Insights and Overview
5 Deep Dive into ZeRO-DP
6 Deep Dive into ZeRO-R
7 Communication Analysis of ZeRO-DP
8. Communication Analysis of ZeRO-R
9. Step Towards 1 Trillion Parameters
10. Implementation and Evaluation
11. Concluding Remarks
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Transformers: State-of-the-Art Natural Language Processing
2210.XX_Ray v2 Architecture
Overview
Architecture Overview
Object Management
Task Management
Resource Management and Scheduling
Actor management
Global Control Service
Cluster Management
Appendix
2309.06180_vLLM: Efficient Memory Management for Large Language Model Serving with PagedAttention
总结
1. Introduction
2. Background
3. Memory Challenges in LLM Serving
4. Method
5. Implementation
6. Evaluation
7. Ablation Studies
10. Conclusion
2312.07104_SGLang❇️: Efficient Execution of Structured Language Model Programs
总结
OpenAI GPT-4总结
Qwen-Plus总结
Abstract
1 Introduction
2 Programming Model
3 Efficient KV Cache Reuse with RadixAttention
4 Efficient Constrained Decoding with Compressed Finite State Machine
5 Efficient Endpoint Calling with API Speculative Execution
6 Evaluation
7 Related Work
8 Future Directions and Conclusion
Acknowledgement
Appendix A Additional Details on RadixAttention
Appendix B Additional Details on Compressed Finite State Machine
Appendix C Additional Experimental Setups and Results
Appendix D Compiler Mode
Appendix D 编译器模式
大模型调优
2101.00190_Prefix-Tuning: Optimizing Continuous Prompts for Generation
2103.10385_p-tuning: GPT Understands, Too
2104.08691_Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning
2106.09685_LoRA: Low-Rank Adaptation of Large Language Models
2401.01335_Self-Play: Fine-Tuning Converts Weak Language Models to Strong Language Models
2402.09353_DoRA: Weight-Decomposed Low-Rank Adaptation
2402.12354_LoRA+: Efficient Low Rank Adaptation of Large Models
2403.03507_GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
2403.13372_LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
竞争框架
3. Efficient Fine-Tuning Techniques
4 LlamaFactory Framework
6 Conclusion and Future Work
2510.08396_FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts
总结
From Moonlight
Abstract
1 Introduction
1 引言(Introduction)
2 Revisiting MoE-based LoRA Methods
第2章:重新审视基于MoE的LoRA方法
3 FlyLoRA
4 Experiments
5 Discussion
6 Related Work
7 Conclusion
8 Acknowledgments
8 致谢
NeurIPS Paper Checklist
NeurIPS 论文检查清单总结
1. 声明(Claims)
2. 限制(Limitations)
3. 理论假设与证明(Theory assumptions and proofs)
4. 实验结果可复现性(Experimental result reproducibility)
5. 数据与代码开放访问(Open access to data and code)
6. 实验设置/细节(Experimental setting/details)
7. 实验统计显著性(Experiment statistical significance)
8. 实验计算资源(Experiments compute resources)
9. 伦理准则(Code of ethics)
10. 广泛影响(Broader impacts)
11. 安全措施(Safeguards)
12. 现有资产许可(Licenses for existing assets)
13. 新资产(New assets)
14. 众包与人类受试者研究(Crowdsourcing and research with human subjects)
15. 人类受试者研究的IRB批准(IRB approvals)
16. 大语言模型使用声明(Declaration of LLM usage)
Appendix A Theoretical Analysis
附录 A 理论分析总结
A.1 稀疏随机投影的距离保持性质
A.2 Top-k 激活促进秩级解耦
A.3 随机投影诱导近似子空间正交性
总结归纳
Appendix B Additional Results
附录 B:附加实验结果总结
B.1 更大模型上的评估
B.2 更多基线方法的比较
B.3 训练时间与内存消耗
B.4 高级模型合并技术的多任务性能
B.5 负载均衡策略的消融实验
B.6 K选择策略的消融实验
B.7 矩阵 A 初始化方案的消融实验
B.8 合并与非合并场景的性能差距分析
总体总结
Appendix C Detailed Experimental Setting
Appendix D Limitations and Future Work
附录 D 局限性与未来工作
Appendix E Broader Impact
通用技术
🏀常用
余弦退火
2505.06708_Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
总结
Abstract
1 Introduction
2 Gated-Attention Layer
3 Experiments
4 Analysis: Non-Linearity, Sparsity, and Attention-Sink-Free
5 Related Works
6 Conclusion
6 结论
Limitations
局限性
Appendix A Supplement Experiments
2510.29xxx_NL: Nested Learning: The Illusion of Deep Learning Architecture
总结 From Zhihu
总结 From Moonlight
长上下文
2510.07318_AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling
From Moonlight
Abstract
1 Introduction
2 Related work
3 Method
4 Experiments
5 Conclusion and discussion
Acknowledgement
Acknowledgement(致谢)
6 AHN instantiation
7 Additional benchmark results
大模型编辑
2405.16720_LAW: Large Scale Knowledge Washing
总结
Abstract
1 Introduction
2 Related Work
3 Preliminary
4 Problem Setup
5 Methodology
6 Experiments
总结
7 Conclusion, Limitation, and Future Work
Ethics Statement
Reproducibility Statement
Appendix A Mathematical Details of Preliminary
Appendix B Implementation Details
Appendix C Additional Experiments
2410.00487_SELF-PARAM: Self-Updatable Large Language Models by Integrating Context into Model Parameters
总结
Abstract
1 Introduction
2 Related Work
3 Methodology
4 Experiments
5 Conclusion and Future Work
Ethics Statement
Reproducibility Statement
Appendix A Additional Settings
Appendix B Additional Experiments
2410.02355_AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
总结
Abstract
1 Introduction
2 Preliminary
3 Method
4 Experiment
5 Related Work
6 Limitations & Future Discussion
7 Conclusion
Ethics Statement
Reproducibility
Acknowledgement
Appendix A Experimental Setup
Appendix B Implementation Details of Current Model Editing & Related Proofs
Appendix C More Experimental Results
Appendix D Visualizing the Counterfact and ZSRE Datasets Through Examples
分布式模型
1701.06538_MoE: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
1806.03377_PipeDream: Fast and Efficient Pipeline Parallel DNN Training
Abstract
1. Introduction
2. Background & Related Work
3. Parallel Training in PipeDream
4. Implementation
5. Evaluation
1811.06965_GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
收集
1. Introduction
2. The GPipe Library
3. Performance Analyses
4. Image Classification
5. Massively Multilingual Machine Translation
6. Design Features and Trade-Offs
1909.08053_Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
收集
Abstract
1. Introduction
2. Background and Challenges
3. Model Parallel Transformers
19xx_PipeDream: Generalized Pipeline Parallelism for DNN Training
收集
ABSTRACT
1. Introduction
2. BACKGROUND AND RELATED WORK
3. Pipeline Parallelism
4. Implementation
6. Conclusion
2006.09503_PipeDream-2BW: Memory-Efficient Pipeline-Parallel DNN Training
Abstract
2006.15704_PyTorch Distributed: Experiences on Accelerating Data Parallel Training
2006.16668_GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
2104.04473_Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Abstract
2205.14135_FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Abstract
1. Introduction
2 Background
3. FLASHATTENTION: Algorithm, Analysis, and Extensions
4. Experiments
5. Limitations and Future Directions
Appendix A Related Work
Appendix B Algorithm Details
Appendix C Proofs
Appendix D Extension Details
2307.08691_FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Abstract
1. Introduction
2. Background
3. FlashAttention-2: Algorithm, Parallelism, and Work Partitioning
4. Empirical Validation
5. Discussion and Future Directions
通用
LLM 量化
通用
混合精度
浮点数格式
weight-only quantization
2110.02861_bitsandbytes: 8-bit Optimizers via Block-wise Quantization
Abstract
1. Background
2. 8-bit Optimizers
3. 8-bit vs 32-bit Optimizer Performance for common Benchmarks
4. Analysis
5. Related Work
2206.01861_ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Abstract
1. Introduction
2. Related Work
3. Background and Challenges
4. Methodology
5. Results
6. Conclusions
Appendix A Background
Appendix D Details about System Optimization
2206.09557_LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
Abstract
1. Introduction
2. Background
3. Design Methodology of LUT-GEMM
4. Experimental results
5. Accelerating Quantized OPT-175B
6. Conclusion
Appendix A LLM Inference Latency Breakdown
Appendix B Detailed Implementation
2208.07339_LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
相关参考
Abstract
1. Introduction
2. Background
3. Int8 Matrix Multiplication at Scale
4. Emergent Large Magnitude Features in Transformers at Scale
5. Related Work
6. Discussion and Limitations
7. Broader Impacts
其他
2209.05433_FP8: FP8 Formats For Deep Learning
Abstract
1. Introduction
2. Aspects of FP8 Usage in Deep Learning
3. FP8 Binary Interchange Format
示例讲解
4. Empirical Results
5. Conclusions
2210.17323_GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Abstract
1. Introduction
2. Related Work
3. Background
4. The GPTQ Algorithm
5. Experimental Validation
6. Summary and Limitations
2211.10438_SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Abstract
1. Introduction
2. Preliminaries
3. Review of Quantization Difficulty
4. SmoothQuant
5. Experiments
6. Related Work
7. Conclusion
Appendix A. Discussion on Weight-Only Quantization
2305.14314_QLoRA: Efficient Finetuning of Quantized LLMs
关键词
Abstract
1. Introduction
2. Background
3. QLoRA Finetuning
4. QLoRA vs. Standard Finetuning
5. Pushing the Chatbot State-of-the-art with QLoRA
6. Qualitative Analysis
7. Related Work
8. Limitations and Discussion
2306.00978_AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Abstract
1. Introduction
2. Related Work
3. AWQ: Activation-aware Weight Quantization
4. TinyChat: Mapping AWQ onto Edge Platforms
5. Experiments
6. Conclusion
2309.05516_AutoRound: Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Abstract
1. Introduction
2. Related Work
3. Methodology
4. Experiments
5. Conclusion
图神经网络模型
1812.08434_GNNs: Graph Neural Networks: A Review of Methods and Applications
论文解读
结论
Abstract
1. Introduction
2. General design pipeline of GNNs
3. Instantiations of computational modules
4. Variants considering graph type and scale(不同图类型与规模的GNN变体)
5. Variants for different training settings
6. A design example of GNN
7. Analyses of GNNs
8. Applications
✅ 总结表格(图像 vs 文本):
9. Open problems
10. Conclusion
Appendix A. Datasets
LLM 安全
2312.06674_Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
LLM强化学习
🏀常用
三大模型
时序差分残差
Bradley-Terry模型
马尔可夫决策过程
动态规划
贝尔曼方程
Q-learning
❇️1502.05477_TRPO: Trust Region Policy Optimization
总结
Abstract
1 Introduction
2 Preliminaries
3 Monotonic Improvement Guarantee for General Stochastic Policies
4 Optimization of Parameterized Policies
5 Sample-Based Estimation of the Objective and Constraint
6 Practical Algorithm
7 Connections with Prior Work
8 Experiments
9 Discussion
Appendix A Proof of Policy Improvement Bound
Appendix B Perturbation Theory Proof of Policy Improvement Bound
Appendix C Efficiently Solving the Trust-Region Constrained Optimization Problem
Appendix D Approximating Factored Policies with Neural Networks
Appendix E Experiment Parameters
Appendix F Learning Curves for the Atari Domain
1602.01783_A3C: Asynchronous Methods for Deep Reinforcement Learning
总结
Abstract
1 Introduction
2 Related Work
3 Reinforcement Learning Background
4 Asynchronous RL Framework
5 Experiments
6 Conclusions and Discussion
7 Optimization Details
8 Experimental Setup
9 Continuous Action Control Using the MuJoCo Physics Simulator
❇️1707.06347_PPO: Proximal Policy Optimization Algorithms
总结
From DeepSeek
示例-FromDeepseek
❇️2203.02155_InstructGPT: Training language models to follow instructions with human feedback
总结
Abstract
1. Introduction
2. Related work
3. Methods and experimental details
4. Results
5. Discussion
Appendix A Additional prompt data details
Appendix B Additional human data collection details
Appendix C Additional model details
Appendix D Automatic evaluation details
❇️2305.18290_DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
总结
Abstract
1 Introduction
2 Related Work
3 Preliminaries
4 Direct Preference Optimization
5 Theoretical Analysis of DPO
6 Experiments
7 Discussion
Author Contributions
Appendix A Mathematical Derivations
Appendix B DPO Implementation Details and Hyperparameters
Appendix C Further Details on the Experimental Set-Up
Appendix D Additional Empirical Results
2310.12036_ΨPO: A General Theoretical Paradigm to Understand Learning from Human Preferences
From Moonlight
2402.03300_DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
From Moonlight
Abstract
1 Introduction
1 Introduction(引言)
1.1 Contributions(贡献)
1.2 Summary of Evaluations and Metrics(评估与指标总结)
2 Math Pre-Training
2 Math Pre-Training(数学预训练)
总结
3 Supervised Fine-Tuning
3 Supervised Fine-Tuning(监督微调)
总体总结
4 Reinforcement Learning
5 Discussion
5. 讨论
6 Conclusion, Limitation, and Future Work
6 结论、局限与未来工作
Appendix A Appendix
2409.19256_❇️HybridFlow: A Flexible and Efficient RLHF Framework
总结
LLM总结
From Moonlight
Abstract
1. Introduction
2. Background and Motivation
3. HybridFlow Overview
4. Hybrid Programming Model
5. 3D-HybridEngine
6. Auto Device Mapping
7. Implementation
8. Evaluation
9. Discussions
10. Related Work
11. Conclusion
Appendix A Primitive APIs in HybridFlow
Appendix A HybridFlow 中的基本 API
Appendix B Transfer Protocols
附录 B:数据传输协议(Transfer Protocols)
表4:各模型类提供的关键函数
算法 2:自动并行算法(Auto Parallelism Algorithm)
Appendix C Auto-Parallelism Algorithm
附录C 自动并行算法
❇️2503.14476_DAPO: An Open-Source LLM Reinforcement Learning System at Scale
总结
Abstract
1 Introduction
2 Preliminary
3 DAPO
4 Experiments
5 Conclusion
Contributions
Acknowledgments
6 Dataset Transformation
7 Supplementary Case
其他
1703.03864_Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2305.14387_AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
2401.08417_CPO: Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
2403.00409_Provably Robust DPO: Aligning Language Models with Noisy Feedback
2504.02495_DeepSeek-GRM: Inference-Time Scaling for Generalist Reward Modeling
2504.13958_ToolRL: Reward is All Tool Learning Needs
其他
2305.20050_Let’s Verify Step by Step
1. 研究背景
2. 监督方法对比
3. 核心发现
总结
2408.03314_Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
1. Introduction
3. How to Scale Test-Time Computation Optimally
5. Scaling Test-Time Compute via Verifiers
6. Refining the Proposal Distribution
其他
2412.14135_Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
FromGPT
1. Introduction
2. Background
3. Policy Initialization
4. Reward Design
5. Search
6. Learning
7 Open-source o1 Project
8. Future Directions
机器学习
近邻搜索
10xx.xxxxx_PQ: Product Quantization for Nearest Neighbor Search
总结
From Deepseek
From Deepseek 全文总结
周边概念
1603.09320_HNSW: Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
总结
From Deepseek
From Deepseek 全文总结
2007.00808_ANCE: Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
总结
Abstract
1 Introduction
2 Preliminaries
3 Analyses on The Convergence of Dense Retrieval Training
4 Approximate Nearest Neighbor Noise Contrastive Estimation
5 Experimental Methodologies
6 Evaluation Results
7 Related Work
8 Conclusion
Appendix A Appendix
总体总结
Embedding
1603.09320_HNSW: Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
总结
From Deepseek
2004.04906_DPR: Dense Passage Retrieval for Open-Domain Question Answering
总结
Abstract
1 Introduction
2 Background
3 Dense Passage Retriever (DPR)
4 Experimental Setup
5 Experiments: Passage Retrieval
6 Experiments: Question Answering
7 Related Work
8 Conclusion
Acknowledgments
Appendix A Distant Supervision
Appendix B Alternative Similarity Functions & Triplet Loss
Appendix C Qualitative Analysis
Appendix D Joint Training of Retriever and Reader
2205.12035_RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder
总结
Abstract
1 Introduction
2 Related works
3 Methodology
4 Experimental Studies
5 Conclusion
6 Limitations
2205.13147_MRL: Matryoshka Representation Learning
总结
DeepSeek 总结
Abstract
1 Introduction
2 Related Work
3 Matryoshka Representation Learning
4 Applications
5 Further Analysis and Ablations
6 Discussion and Conclusions
Acknowledgments
Appendix A Code for Matryoshka Representation Learning
Appendix B Datasets
Appendix C Matryoshka Representation Learning Model Training
Appendix D Classification Results
Appendix E Image Retrieval
Appendix F Adaptive Retrieval
Appendix G Few-shot and Sample Efficiency
Appendix H Robustness Experiments
Appendix I In Practice Costs
Appendix J Analysis of Model Disagreement
Appendix K Ablation Studies
ML Vision
1506.02640_You Only Look Once: Unified, Real-Time Object Detection
Abstract
1612.08242_YOLO9000: Better, Faster, Stronger
Abstract
1804.02767_YOLOv3
2004.10934_YOLOv4: Optimal Speed and Accuracy of Object Detection
Abstract
2205.00159_SVTR: Scene Text Recognition with a Single Visual Model
Abstract
1. Introduction
2. Method
3. Experiments
4. Conclusion
2207.02696_YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Abstract
2303.05499_Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
2304.08485_Visual Instruction Tuning
2402.13616_YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Abstract
2405.14458_YOLOv10: Real-Time End-to-End Object Detection
Abstract
2411.15858_SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
定义
Abstract
1. Introduction
2. Related Work
3. Methods
4 Experiments
5. Conclusion
8. More detail of real-world datasets
ML
2108.00941_Human-in-the-loop: A Survey of Human-in-the-loop for Machine Learning
总结
Abstract
1 Introduction
2 Data Processing
3 Model Training and Inference
4 System construction and Application
5 Discussion and Future Directions
6 Conclusion
2112.09332_WebGPT: Browser-assisted question-answering with human feedback
2203.11147_GopherCite: Teaching language models to support answers with verified quotes
2304.09848_Generative_Search: Evaluating Verifiability in Generative Search Engines
2305.14251_FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
2305.14627_ALCE: Enabling Large Language Models to Generate Text with Citations
NLI 在引用质量评估中的应用
论文中用的prompt
2307.02185_Citation: A Key to Building Responsible and Accountable Large Language Models
2307.16883_HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
AI Agent
通用 Agent
2210.03629_ReAct
2303.08268_Chat-with-the-Environment
正文
2303.11366_Reflexion: Language Agents with Verbal Reinforcement Learning
2303.16434_TaskMatrix.AI
大脑
接口平台
API 选择器
2304.03442_Generative-Agents
Generative Agent Architecture
2307.07924_ChatDev: Communicative Agents for Software Development
2308.00352_MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
2308.04026_AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
2308.08155_AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
2308.10848_AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
理念
2310.06117_Step-Back: Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
2312.04511_LLMCompiler: An LLM Compiler for Parallel Function Calling
总结
Abstract
1. Introduction
2. Related Work
2.1. Latency Optimization in LLMs(LLMs的延迟优化)
2.2. Plan and Solve Strategy(计划与求解策略)
2.3. Tool-Augmented LLMs(工具增强的LLMs)
3. Methodology
3.1. Function Calling Planner(功能调用规划器)
3.2. Task Fetching Unit(任务获取单元)
3.3. Executor(执行器)
3.4. 动态重规划(Dynamic Replanning)
4. LLMCompiler Details
4.1. 用户提供的信息(User-Supplied Information)
4.2. 流式Planner(Streamed Planner)
5. Results
6. Conclusions
致谢(Acknowledgements)
A. Accuracy Analysis: ReAct vs. LLMCompiler
B. Failure Case Analysis of LLMCompiler
C. Related Work
D. Experimental Details
E. Analysis
总结
F. Additional Discussions about Related Works
G. User-Supplied Examples for LLMCompiler Configuration
G.1 电影推荐示例提示语(Movie Recommendation Example Prompts)
G.2 24点游戏示例提示语(Game of 24 Example Prompts)
H. Pre-defined LLMCompiler Planner Prompts
I. ParallelQA Benchmark Generation
J. Details of the Game of 24 and the Tree-of-Thoughts Approach
K. Details of WebShop Experiments
2402.18679_MetaGPT_DI: Data Interpreter: An LLM Agent For Data Science
INTRODUCTION
2407.07061_IoA: Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
2.1 OVERVIEW OF IOA
2.2 ARCHITECTURE OF IOA
2.3 KEY MECHANISMS
2.5 Putting It All Together
2408.08435_ADAS: Automated Design of Agentic Systems
Prompt
2410.10762_AFlow: Automating Agentic Workflow Generation
Introduction
PRELIMINARY
2410.17238_SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning
1 Introduction
2 Related Works
3 Method
2410.21012_FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval
Introduction
2504.01990_Advances and Challenges in Foundation Agents
2506.12508_AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving
Abstract
1. Introduction
3. AgentOrchestra
4. Experiments
2510.08842_Maple: A Multi-agent System for Portable Deep Learning across Clusters
总结
Abstract
I Introduction
II Background
III System Design
IV Implementation
V Experiments
VI Error Analysis
VII Related Work
VIII Conclusion
DeepResearch
2509.13313_ReSum: Unlocking Long-Horizon Search Intelligencevia Context Summarization
总结
From Moonlight
Abstract
摘要(Abstract)总结
1 Introduction
1 引言(Introduction)
2 Preliminary
2. 预备知识(Preliminary)
3 Methodology
3 方法(Methodology)
4 Experiments and Analysis
5 Related Works
5 相关工作总结
6 Conclusion
6 结论(Conclusion)
Appendix A Algorithm Pseudo-Code
Appendix B Prompt
Appendix C Implementation Details
附录 C 实现细节(Appendix C Implementation Details)
Appendix D Discussion with MEM1
Appendix E Supplementary Materials for Experiments
附录 E 实验补充材料
for user goal Extract number of specimens used in the study comparing jump performances of C. canis and C. felis felis as follows: …
章节标题:Jump Performance Comparison of Ctenocephalides canis and Ctenocephalides felis felis
2510.21618_❇️DeepAgent: A General Reasoning Agent with Scalable Toolsets
总结
From Moonlight
Abstract
1. Introduction
2. Related Work
3. Methodology
4. Experimental Settings
5. Experimental Results
6. Conclusion
Appendix A Datasets
Appendix B Baselines
Appendix C Implementation Details
Appendix D Memory Schema
Appendix E Case Study
视觉 Agent&AIOS
2108.03353_Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Abstract
1. Introduction
2. Related Work
3. Dataset Creation
4. Model Design
其它
2209.08199_ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
Abstract
1. Introduction
2. Related Work
3. Problem Setting: Tasks and Metrics
4. Data Annotation
5. Dataset Analysis
6. Experiments and Baselines
7. Conclusion
8. Limitations
9. Ethical Considerations
A. Data Annotation Details
B. Data Examples
2212.06817_RT-1: ROBOTICS TRANSFORMER FOR REAL-WORLD CONTROL AT SCALE
ABSTRACT
1. Introduction
2. Related Work
3. Preliminaries
4. System Overview
5. RT-1: ROBOTICS TRANSFORMER
6. EXPERIMENTS
7. CONCLUSIONS, LIMITATIONS AND FUTURE WORK
B. MODEL CARD
C. MODEL AND DATA
D. EXPERIMENTS
2312.13771_AppAgent: Multimodal Agents as Smartphone Users
3.1 Environment and Action Space
3.2 Exploration Phase
3.3 Deployment Phase
2401.10935_SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Abstract
1. Introduction
2. Related work
3. Approach
4. ScreenSpot: A Grounding Benchmark
5. Experiments
6. Conclusion
Limitations
Ethical considerations
A. Details of SeeClick Pre-training
B ScreenSpot Annotation & Evaluation
C. Downstream Agent Tasks
2402.04615_ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Abstract
1. Introduction
2. Methodology
3. Automatic data generation
4. Data Mixtures
5. Experiments and Results
6. Conclusions
A Definitions of Metrics
B. Screen Schema Examples
C. Prompts For LLM Generated Content
D. Screen Navigation Generated Examples
F. ScreenQA Short Answers Generation
G. Complex Question Answering Datasets
H. New Benchmarks Repositories
2402.07939_UFO: A UI-Focused Agent for Windows OS Interaction
Abstract
1. Introduction
2. Related Work
3. The Design of UFO
4. Experiment
5. Limitations & Lessons Learned
6. Conclusion
2403.16971_AIOS: LLM Agent Operating System
Abstract
1. Introduction
2. The Architecture of AIOS
3. AIOS Kernel
4 Evaluation
Appendix E Discussion
2406.01014_Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
2411.00820_AutoGLM: Autonomous Foundation Agents for GUIs
总结
Abstract
1 Introduction
2 AutoGLM: Techniques and Insights
3 Results
3.1 在 Web 上的评估
3.2 在 Android 上的评估
4 Conclusion
2411.02059_TableGPT2: A Large Multimodal Model with Tabular Data Integration
Abstract
2501.11733_Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
Abstract
1. Introduction
2. Mobile-Agent-E
3. Experiments
4. Results
5. Related Work
6. Conclusion and Future Work
Appendix A Full Trajectory Comparison Example with Previous SOTA
Appendix B Error Recovery with Escalation to Manager
Appendix C Remaining Limitations
Appendix D All Tasks in Mobile-Eval-E Benchmark
Appendix E Atomic Operation Space
Appendix F Full list of Self-Evolved Shortcuts
Appendix G Full list of Self-Evolved Tips
2501.12326_UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Abstract
1. Introduction
2. Evolution Path of GUI Agents
3. Core Capabilities of Native Agent Model
4. UI-TARS
5. Experiment
6. Conclusion
2502.14282_PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Abstract
1. Introduction
2. PC-Agent
3. Experiments
4. Related Work
5. Conclusion
2504.14603_UFO2: The Desktop AgentOS
Abstract
1. Introduction
2. Background
3. System Design of UFO2
4. Picture-in-Picture Interface
5. Implementation and Specialized Engineering Design
6. Evaluation
7. Discussion & Future Work
8. Related Work
9. Conclusion
2508.04037_SEA: Self-Evolution Agent with Step-wise Reward for Computer Use
总结
Abstract
I Introduction
I 引言
II Related Works
总结
III Method
总结
IV Experiments
IV 实验
V Conclusion
V 结论
音频 Agent
2509.06221_Beamforming-LLM: What, Where and When Did I Miss?
总结
Abstract
1. Introduction
2. Related Work
3. Methods
4. Results
5. Discussion and Conclusion
Tools
2205.00445_MRKL
2302.04761_Toolformer: Language Models Can Teach Themselves to Use Tools
2303.17580_HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
2307.16789_ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
总结
LLM总结
Abstract
1 Introduction
2 Dataset Construction
3 Experiments
4 Related Work
5 Conclusion
Appendix
Appendix A Implementation Details
AGI
1905.10985_AI-GA: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence
2408.06292_The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
RAG
2005.11401_Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
2312.10997_Retrieval-Augmented Generation for Large Language Models: A Survey
II. Overview of RAG
II-A Naive RAG
II-B Advanced RAG
II-C Modular RAG
II-D RAG vs Fine-tuning
III. Retrieval
III-A Retrieval Source
III-B Indexing Optimization
III-C Query Optimization
III-D Embedding
III-E Adapter
IV. Generation
IV-A Context Curation
IV-B LLM Fine-tuning
V. Augmentation process in RAG
V-A Iterative Retrieval
V-B Recursive Retrieval
V-C Adaptive Retrieval
VI. Task and Evaluation
VI-A Downstream Task
VI-B Evaluation Target
VI-C Evaluation Aspects
VI-D Evaluation Benchmarks and Tools
VII. Discussion and Future Prospects
VII-A RAG vs Long Context
VII-B RAG Robustness
VII-C Hybrid Approaches
VII-D Scaling laws of RAG
VII-E Production-Ready RAG
VII-F Multi-modal RAG
2401.15884_CRAG: Corrective Retrieval Augmented Generation
2403.14403_Adaptive-RAG
2404.12457_RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
总结
Abstract
1. Introduction
1. 引言概述
2. 现有工作与局限
3. RAGCache系统
4. 实验结果
5. 主要贡献
2. Background
3. RAG System Characterization
一、性能瓶颈分析
二、优化机会分析 —— 缓存中间状态
总结
4. RAGCache Overview
主要内容总结如下:
总结
5. RAGCache Design
5.1. Cache Structure and Replacement Policy
5.2. Cache-aware Reordering
5.3 动态推测流水线(Dynamic Speculative Pipelining)
总结
6. Implementation
系统实现
向量搜索优化(Pipelined Vector Search)
容错机制(Fault Tolerance)
7. Evaluation
7.1 总体性能
7.2 通用设置下的案例研究
7.3 消融研究
7.4 调度时间
总结
8. Discussion
9. Related Work
10. Conclusion
2404.16130_GraphRAG: From Local to Global: A GraphRAG Approach to Query-Focused Summarization
总结
LLM 总结
Abstract
1 Introduction
2 Background
2.1 RAG方法与系统
2.2 知识图谱在LLM与RAG中的应用
2.3 自适应基准测试
2.4 RAG评估标准
3 Methods
3.1 GraphRAG 工作流程
3.2 全局理解问题生成
3.3 全局理解评估标准
总结
4 Analysis
4.1 实验1
4.2 实验2
总结
5 Results
5.1 实验一:不同方法在摘要任务中的表现比较
5.2 实验二:基于声明的指标评估
总结
6 Discussion
6.1 评估方法的局限性
6.2 未来工作
更广泛的影响
7 Conclusion
Appendix A Entity and Relationship Extraction Approach
1. 实体与关系抽取方法
2. 自我反思(Self-Reflection)技术
3. 分块大小与抽取效果的关系
4. 实验结果(图3)
总结
Appendix B Example Community Detection
Appendix C Context Window Selection
Appendix D Example Answer Comparison
Appendix E System Prompts
E.1 实体实例生成(Element Instance Generation)
E.2 社区摘要生成(Community Summary Generation)
E.3 社区问题回答生成(Community Answer Generation)
E.4 全局问题回答生成(Global Answer Generation)
Appendix F Evaluation Prompts
F.1 Relative Assessment Prompt
F.2 Relative Assessment Metrics
Appendix G Statistical Analysis
统计方法:
主要结果总结:
总体趋势:
重要结论:
2405.16506_GRAG: Graph Retrieval-Augmented Generation
总结
LLM 总结
Abstract
1 Introduction
2 Related Work
2.1 Prompt Tuning
2.2 LLMs在图相关任务中的应用
2.3 图上的检索方法
3 Problem Formalization
4 Methodology
概述
4.1 文本子图检索
文本子图索引(Indexing)
文本子图排序(Ranking)
文本子图软剪枝(Soft Pruning)
总结
4.2 Textual Graph Augmented Generation
1. 文本视图(Text View of Textual Graphs)
2. 图视图(Graph View of Textual Graphs)
3. 生成阶段(Generation Phase)
总结
5 Experiments
总结:第五章 实验部分
6 Conclusion
7 Limitations
Acknowledgments
Appendix A Appendix
附录A 总结
总结
2406.13213_Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata
2410.05779_LightRAG: Simple and Fast Retrieval-Augmented Generation
总结
Abstract
1 Introduction
2 Retrieval-Augmented Generation
3 The LightRAG Architecture
一、LightRAG架构概述
二、基于图的文本索引(Graph-based Text Indexing)
三、双层检索范式(Dual-level Retrieval Paradigm)
四、检索增强的答案生成(Retrieval-Augmented Answer Generation)
五、复杂度分析
总结
4 Evaluation
1. 实验设置(4.1 Experimental Settings)
2. LightRAG 与现有 RAG 方法的对比(4.2 RQ1)
3. 消融实验(4.3 RQ2)
总结
4.4 Case Study (RQ3)
4.4 案例研究(RQ3)总结:
4.5 模型成本与适应性分析(RQ4)总结:
总体结论:
5 Related Work
第5章 相关工作(总结)
6 Conclusion
7 Appendix
2410.10450_KBLaM: Knowledge Base augmented Language Model
Abstract
1. Introduction
2. Related work
3. Background
Self-attention layer
4. Augmenting LLM with the KB
Knowledge tokens
Rectangular Attention: Injecting knowledge token into prompt tokens
KB length generalization through attention score scaling
5. KB instruction tuning
6. EXPERIMENTS
6.1 EXPERIMENT SETTING
6.2 EXPERIMENT RESULTS
总结亮点
7. CONCLUSION
8. LIMITATIONS AND FUTURE WORK
Appendix A Extended related work
Appendix B Ablation study
Appendix C Sample KB
SAMPLE Q&A
PROMPT
PROMPT FOR SYNTHETIC KB GENERATION
Prompt for open-ended Q&A generation
PROMPT FOR GPT EVALUATION OF OPEN-ENDED Q&A
PROMPT FOR LLAMA EVALUATION
QUESTION TEMPLATE
SAMPLE OUTPUT
SYNTHETIC KB
ENRON
2504.03137_LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph
Abstract
Introduction
Related Work
LLM Prompt Engineering
KG-based LLM Reasoning
Preliminaries
1. Knowledge Graph (KG)
2. Anchor Entities
3. Relation Link
4. Reasoning Path
Methodology
Stage1: Reasoning Graph Retrieval
Stage2: Knowledge Embedding
Stage3: Knowledge Prompts Mixed Reasoning
Experiments
Conclusion
GraphRAG 官方文档
Indexing
> Indexing Architecture
> Indexing Dataflow
> Prompt Tuning
Query
论文池
2501.12948❇️_DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
From Moonlight
三句摘要
关键词
摘要
Abstract
1. Introduction(引言)
2. Approach(方法)
3. Experiment(实验)
4. Discussion(讨论)
5. Conclusion, Limitations, and Future Work(结论、局限与未来工作)
6. A Contributions and Acknowledgments(贡献与致谢)
总结重点
1 Introduction
1.1 Contributions(贡献)
后训练:基于基础模型的大规模强化学习
蒸馏:小模型也能很强大
1.2 Summary of Evaluation Results(评估结果概要)
推理任务
知识任务
其他任务
小结
2 Approach
2.1 Overview(概述)
2.2 DeepSeek-R1-Zero: 强化学习应用于基础模型
2.2.1 强化学习算法
2.2.2 奖励建模
2.2.3 训练模板
2.2.4 DeepSeek-R1-Zero的性能、自进化过程与“顿悟时刻”
2.3 DeepSeek-R1: 强化学习结合冷启动
2.3.1 冷启动(Cold Start)
2.3.2 推理导向的强化学习
2.3.3 拒收采样与监督微调(SFT)
2.3.4 面向所有场景的强化学习
2.4 蒸馏:将推理能力赋予小型模型
总结
3 Experiment
3 Experiment 实验部分总结
3.1 DeepSeek-R1 评估
3.2 蒸馏模型评估
总体总结
4 Discussion
4 讨论
4.1 知识蒸馏 与 强化学习
4.2 不成功的尝试
总结
5 Conclusion, Limitations, and Future Work
总结
局限性与未来工作
总结重点
Appendix
1. 附录的作用
2. 常见附录内容
3. 附录的编写规范
4. 注意事项
Appendix A Contributions and Acknowledgments
附录 A 贡献与致谢
贡献者
特别说明
生成信息
2504.03182_Graphiti: Bridging Graph and Relational Database Queries
总结
From Moonlight
三句摘要
关键词
摘要
Abstract
关键点总结:
附加信息:
1. Introduction
背景与问题
研究目标
核心贡献
方法流程(图1)
实验与实现
总结贡献
2. Motivating Example
2.1 图数据库与关系数据库的对应关系
2.2 SQL 与 Cypher 查询的语义差异
2.3 数据库转换器(Database Transformer)
2.4 诱导关系模式(Induced Relational Schema)
2.5 语法导向的转译(Syntax-Directed Transpilation)
2.6 查询等价性验证
总结
3. Preliminaries
3. Preliminaries(预备知识)
总结
4. Problem Statement
4. 问题陈述(Problem Statement)总结
4.1 数据库转换语言(Language for Database Transformers)
4.2 等价性检查问题(Equivalence Checking Problem)
总结
5. Equivalence Checking Algorithm
概述
5.1. 诱导关系模式和标准转换器推断
5.2. 语法导向的转译
5.3. 归约到SQL等价性检查
6. Evaluation
Benchmarks(基准测试集)
6.1. 使用 BMC 后端的 Graphiti 评估(VeriEQL)
6.2. 使用演绎验证器的 Graphiti 评估(Mediator)
6.3. 转译质量评估
总结
7. Related Work
1. SQL 的自动推理(Automated reasoning for SQL)
2. 数据库实例之间的迁移(Migration between database instances)
3. 数据表示重构(Data representation refactoring)
4. 图数据库查询语言(Graph database query languages)
5. 数据库查询测试(Testing database queries)
6. Cypher 查询转译工具(Transpiling Cypher queries)
总结
8. Limitation
主要局限:
重点说明:
实用性验证:
未来方向:
总结:
9. Conclusion and Future Work
9. 结论与未来工作
Appendix A Semantics of Cypher Queries
查询语义
子句语义
路径模式语义
表达式语义
谓词语义
Appendix B Transpilation of Cypher Predicates and Expressions
1. 表达式的转译规则(Figure 21)
2. 谓词的转译规则(Figure 22)
示例 B.1
总结
Appendix C An Equivalent Cypher Query of Motivating Example
原始 Cypher 查询的问题
修正后的 Cypher 查询
查询结构(重点内容)
总结
Appendix D Qualitative Analysis of Manually-Written Buggy Queries
1. 使用嵌套 MATCH 而非存在性模式(Existential Pattern)
2. 错误使用路径模式(Path Pattern)进行 OPTIONAL MATCH
3. 同一标签的节点或边使用不当
总结
Appendix E Comparing Graphiti’s Transpiler with OpenCypherTranspiler
原文结构总结
1. 总体比较
2. 转译结果表格分析(Table 5)
3. OpenCypherTranspiler 的典型错误示例
总结
Appendix F Proofs
定理 F.1(翻译的正确性)
引理 F.2
引理 F.3
引理 F.4
引理 F.5
定理 F.6(翻译的完备性)
引理 F.7
引理 F.8
引理 F.9
引理 F.10
引理 F.11
引理 F.12
定理 F.13(正确性)
定理 F.14(完备性)
2505.00675_Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
From Moonlight
三句摘要
关键词
摘要
Abstract
1 Introduction
2 Memory Foundations
2 记忆基础
2.1 记忆分类
2.2 记忆操作
2.3 记忆管理
2.4 记忆使用
小结
3 From Operations to Key Research Topics
3 从操作到关键研究主题
3.1 长期记忆
3.2 长上下文
3.3 参数化记忆修改
3.4 多源记忆
未来方向
4 Memory In Practice
4 Memory In Practice(记忆在实践中的应用)
4.1 Applications(应用)
4.2 Products(产品)
4.3 Tools(工具)
5 Memory in Humans and AI Systems
5 人类与人工智能系统的记忆
记忆系统的基本功能与结构
人类与人工智能记忆的差异
面向未来:记忆系统带来的挑战
表2:人类与智能体记忆的关键差异
6 Open Challenges and Future Directions
6 开放性挑战与未来方向
6.1 专题方向
6.2 更广泛视角(Broader Perspectives)
Appendix A GPT-based Pipeline Selection
附录 A 基于 GPT 的流水线选择
Appendix B Relative Citation Index
1. 论文“年龄”计算方法
2. 引用与年龄关系的建模方式
3. 数据收集与处理
4. RCI 的计算公式
5. RCI 的应用与发现
6. 图表说明
总结
Appendix C Chord Analysis of Interactions Among Memory Types, Operations, Topics, and Venues
附录C 记忆交互的和声分析:类型、操作、主题和会议场所
C.1 记忆类型、操作和主题的交互
C.2 记忆交互在会议场所中
附录C相关图表和表格总结
附录C总结
2507.19849_Agentic Reinforced Policy Optimization
总结
From Moonlight
三句摘要
关键词
摘要
Agentic Reinforced Policy Optimization
摘要(Abstract)
1. 引言(Introduction)
2. 相关工作(Related Work)
3. 方法(Methodology)
4. 实验(Experiments)
5. 讨论(Discussion)
6. 结论(Conclusion)
Agentic Reinforced Policy Optimization
标题:
Agentic Reinforced Policy Optimization(ARPO)
作者与单位
联系方式与项目链接
备注说明
总结
Abstract
摘要(Abstract)总结
1 Introduction
1 引言(Introduction)
背景与动机
现有方法的局限性
问题分析与观察
提出方法:ARPO
实验与结果
主要贡献总结
2 Preliminary
2 预备知识(Preliminary)
2.1 基于智能体的强化学习(Agentic Reinforcement Learning)
2.2 推理过程中的Token熵分析(Analyzing Token Entropy in Agentic Reasoning)
2.3 智能体工具设计(Agentic Tool Design)
3 Agentic Reinforce Policy Optimization
3. Agentic Reinforce Policy Optimization (ARPO)
总结
4 Experiment
4 实验
4.1 数据集
4.2 基线方法
4.3 训练指南
4.4 评估指标
4.5 主要结果
4.6 定量分析
4.7 ARPO的扩展性分析
5 Related Work
5 相关工作(Related Work)总结
5.1 可验证奖励的强化学习(Reinforcement Learning with Verifiable Reward)
5.2 代理式强化学习(Agentic Reinforcement Learning)
总结
6 Conclusion
6 结论(Conclusion)
核心内容讲解:
小结:
Appendix
Appendix A Datasets
A.1 Mathematical Reasoning Benchmarks
A.2 Knowledge-Intensive Reasoning Benchmarks
A.3 Deep Search Benchmarks
总结
Appendix B Baselines
附录 B 基线模型
总结
Appendix C Implementation Details
附录 C 实现细节总结
总结
Appendix D Theoretical Analysis and Proofs
附录 D 理论分析与证明
D.1 软优势估计的理论分析
D.2 GPG 定理的理论证明
Appendix E The Algorithm Workflow of ARPO
附录 E:ARPO 算法流程
输入参数
算法流程
输出
重点内容总结
不重要内容精简
Appendix F Case Study
附录 F 案例研究
表 4:HLE 数据集中的一个例子
表 5:GAIA 数据集中的一个例子
表 6:GAIA 数据集中的另一个例子
表 7:HLE 数据集中的另一个例子
表 8:AIME24 数据集中的一个例子
表 9:HotpotQA 数据集中的一个例子
2511.20857_Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
总结
From Moonlight
三句摘要
关键词
摘要
Abstract
1 Introduction
核心问题:记忆系统的静态性
现有基准的局限性
解决方案:Evo-Memory
覆盖任务类型与记忆模块
新方法:ExpRAG 与 ReMem
贡献总结
2 Related Work
2.1 测试时学习
2.2 自演化记忆
图3:ReMem 智能体框架概述
3 Evo-Memory: Evaluating Self-Evolving Memory in LLM Agents
概述
3.1 问题设定(Problem Formulation)
3.2 ExpRAG: Experience Retrieval and Aggregation
3.3 ReMem: Synergizing Reasoning, Acting, and Memory
总结
4 Experiments
4.1 实验设置
4.2 实验
4.3 结果分析(RQ1)
4.4 记忆改进分析(RQ2)
4.5 任务序列:简单 vs. 困难(RQ3)
4.6 反馈分析(RQ4)
4.7 时间步性能(RQ5)
5 Conclusion
附录(Appendix)
2.1 Test-time Learning(测试时学习)
2.2 Self-evolving Memory(自演化记忆)
3.1 Problem Formulation(问题定义)
3.2 ExpRAG: Experience Retrieval and Aggregation(经验检索与聚合)
3.3 ReMem: Synergizing Reasoning, Acting, and Memory(推理、行为与记忆的协同)
4.1 Experimental Setup(实验设置)
4.2 Experiments(实验设计)
4.3 Analysis of Results (RQ1)(结果分析 - 研究问题1)
4.4 Analysis of Memory Improvement (RQ2)(记忆改进分析 - 研究问题2)
4.5 Task Sequence: Easy vs. Hard (RQ3)(任务顺序影响 - 研究问题3)
4.6 Analysis of Feedback (RQ4)(反馈机制分析 - 研究问题4)
4.7 Performance w.r.t Time Steps (RQ5)(时间步长性能分析 - 研究问题5)
A Experimental Details(实验细节)
A.1 Datasets(数据集详情)
A.2 Configuration(配置参数)
A.3 Evaluation(评估细节)
A.4 Methods(方法实现细节)
B Experiments(补充实验)
B.1 Additional Experiments(附加实验)
B.2 Additional Analysis of Memory Pruning(记忆剪枝分析)
B.3 Additional Comparative Curves on Single-turn Tasks(单轮任务对比曲线)
C Prompts(提示模板)
D Limitations(局限性)
E Use of Large Language Models(大语言模型使用说明)
Appendix A Experimental Details
A.1 数据集(Datasets)
A.2 配置(Configuration)
A.3 评估(Evaluation)
A.4 方法(Methods)
总结
Appendix B Experiments
B.1 附加实验
B.2 记忆剪枝的附加分析
B.3 单轮任务的附加对比曲线
总结
总结:
Appendix D Limitations
Appendix E Use of Large Language Models
1. 使用目的
2. 使用范围
3. 结论
2512.10696_Framework for Experience-Driven Agent Evolution
总结
图解
From Moonlight
三句摘要
关键词
摘要
Abstract
核心创新点(ReMe 的三大机制)
实验与结果
总结
1 Introduction
1.1 背景与动机
1.2 理想程序性记忆系统的三大核心标准
1.3 当前方法的局限性
1.4 提出的方法:ReMe 框架
1.5 实验结果与贡献
1.6 主要贡献
2 Related Works
2.1 增强记忆的LLM智能体(Memory-enhanced LLM Agents)
2.2 经验学习策略(Experience Learning Strategies)
图2:ReMe框架概述(图示说明)
3 Methodology
3.1 ReMe 概览
3.2 经验获取
3.3 经验复用
3.4 经验精炼
表1:ReMe 与基线模型在 BFCL-V3 和 AppWorld 上的性能对比
总结
4 Experiments
4.1 实验设置
4.2 主要结果
4.3 消融研究
4.4 更多分析
总结
5 Conclusion
Limitations
1. 固定的经验检索策略
2. 经验验证机制的局限性
3. 模型规模与总结能力的关系
Appendix A Dataset Details
BFCL-V3
AppWorld
Appendix B Baseline Details
LangMem
A-Mem
总结
Appendix C Implementation Details
C.1 经验获取(Experience Acquisition)
C.2 经验检索(Experience Retrieval)
Appendix D Experience Examples
1. ReMe 方法的经验提取示例
2. 经验粒度的影响分析
3. 不同粒度经验的结构与内容对比
总结
Appendix E Additional Experimental Results
E.1 Retrieval Key Analysis(检索键分析)
E.2 Prompt Examples for Experience Extraction(经验提取的提示示例)
总结
论文池-sum
论文待回收池
2009.01325_Learning to summarize from human feedback
From Moonlight
三句摘要
关键词
摘要
Abstract
研究背景
研究方法
研究成果
分析与验证
研究意义
1 Introduction
背景与问题
研究目标与任务选择
方法概述
主要贡献
长期意义
2 Related work
与我们工作最直接相关的工作
其他使用人类反馈的研究
强化学习与自动评价指标
模型结构与预训练方法的改进
3 Method and experiment details
3.1 高层方法论(High-level methodology)
3.2 数据集与任务
3.3 收集人类反馈(Collecting human feedback)
3.4 模型(Models)
4 Results
4.1 基于人类反馈的 Reddit 帖子摘要
4.2 迁移到新闻文章摘要
4.3 理解奖励模型
4.4 摘要自动评估指标分析
5 Discussion
1. Limitations(局限性)
2. Future directions(未来方向)
3. Broader impacts(更广泛影响)
4. Acknowledgements(致谢)
Appendix A TL;DR dataset details
数据集构成
数据预处理步骤
数据集局限性说明
Appendix B Further model training details
B.1 超参数设置
B.2 输入格式
总结重点
Appendix C Human data collection details
C.1 Process for ensuring high-quality human data
C.2 Assessing human feedback quality
C.3 Labeler demographics
C.4 Labeler website
C.5 Instructions for labelers
C.6 Composition of the labeled dataset
C.7 Example comparison tasks
Appendix D Choice of baselines
Appendix E CNN/DM lead-3 vs reference summaries
主要发现
控制长度后的分析
对摘要方法的质疑
标注者行为分析
参考摘要表现差的原因
结论
Appendix F Controlling for summary length
1. 控制摘要长度的背景与方法
2. 实验结果与分析
3. CNN/DM数据集上的长度控制实验
Appendix G Additional results
G.1 价值函数消融实验
G.2 沿质量维度评估策略
G.3 最优-N 优化研究
G.4 ROUGE分数
G.5 二元组重叠统计
G.6 奖励模型验证集
G.7 不同评估指标的一致性
总结
Appendix H Samples
H.1 随机样本
H.2 过度优化样本
2305.16300_Random-Access Infinite Context Length for Transformers
Abstract
1 Introduction
2 Related Work
3 Methodology
总体思路
方法详解
位置编码处理
与其他方法的对比
总结
3.3 Memory & Computation
4 Experiments
4.1 语言建模实验
4.2 微调预训练模型
总结
5 Future Work
6 Conclusion
Acknowledgment
Appendix A Grouped Softmax Example
Appendix B Dataset Description
Appendix C Number of Unique Retrieved Blocks
Appendix D Context Miss Token
Appendix E Positional Augmentation
Appendix F Additional Extensions and Details
1. 掩码语言建模(Masked Language Modeling)
2. 与 Flash Attention 的结合
3. 检索块数量与块大小的权衡
总结
Appendix G Offloading KV Cache to CPU
2405.17935_Tool Learning with Large Language Models: A Survey
总结
From Moonlight
三句摘要
关键词
摘要
Abstract
关键词总结:
重点内容强调:
不重要内容精简:
1 Introduction
核心观点:
1.1 历史背景与工具的重要性
1.2 当前技术趋势:LLMs 的发展与局限
1.3 工具学习的兴起
1.4 研究现状与趋势
1.5 本文结构与贡献
1.6 与其他综述的比较
1.7 本文结构图(Figure 2)
1.8 GitHub 资源
总结:
2 Background
2 背景(Background)
什么是工具(What is a Tool?)
什么是工具学习(What is Tool Learning?)
总结
3 Why Tool Learning?
3 为什么需要工具学习?
3.1 知识获取
3.2 专业能力增强
3.3 自动化与效率提升
3.4 交互增强
3.5 增强可解释性与用户信任
3.6 提升鲁棒性与适应性
总结图示(图3)
4 How Tool Learning?
4 工具学习的机制
4.1 工具学习的整体范式
4.2 任务规划(Task Planning)
4.3 工具选择(Tool Selection)
4.4 工具调用(Tool Calling)
4.5 响应生成(Response Generation)
表格:工具学习基准数据集汇总
总结
5 Benchmarks, Toolkits, and Evaluation
5. Benchmarks(基准测试)
5.1.1 通用基准(General Benchmarks)
5.1.2 特定任务基准(Other Benchmarks)
5.2 Toolkits(工具包)
5.3 Evaluation(评估方法)
5.3.1 任务规划(Task Planning)
5.3.2 工具选择(Tool Selection)
5.3.3 工具调用(Tool Calling)
5.3.4 响应生成(Response Generation)
总结
6 Challenges and Future Directions
6 挑战与未来方向(Challenges and Future Directions)
6.1 工具学习中的高延迟问题(High Latency in Tool Learning)
6.2 严谨而全面的评估体系(Rigorous and Comprehensive Evaluation)
6.3 全面且易获取的工具集(Comprehensive and Accessible Tools)
6.4 安全与鲁棒的工具学习(Safe and Robust Tool Learning)
6.5 统一的工具学习框架(Unified Tool Learning Framework)
6.6 真实世界的工具学习基准(Real-World Benchmark for Tool Learning)
6.7 多模态工具学习(Tool Learning with Multi-Modal)
总结
7 Conclusion
7 结论(总结)
主要内容结构如下:
1. 引言与基础概念
2. 工具学习的重要性
3. 工具学习的四个阶段
4. 评估方法与基准测试
5. 挑战与未来方向
最后
其他信息:
2409.20163_MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants
From Moonlight
三句摘要
关键词
摘要
Abstract
摘要总结
1 Introduction
1 Introduction
本文的主要贡献如下:
后续章节安排如下:
2 Related Works
2 相关工作
LLM-based agents 的应用与记忆机制
LLM-based agents 记忆能力的评估
知识库问答(KBQA)与记忆评估的关联
本文工作的贡献
3 Methods
3.1 Overview of MemSim
3.2 Bayesian Relation Network
3.3 Causal Generation Mechanism
3.4 MemDaily: A Dataset in the Daily-life Scenario
总结
4 Evaluations
4 评估(Evaluations)
4.1 用户画像评估(Evaluation on User Profiles)
评估指标
基线方法
评估结果
4.2 用户消息评估(Evaluation on User Messages)
评估指标
基线方法
评估结果
4.3 问题与答案评估(Evaluation on Questions and Answers)
评估结果
总结
5 Benchmark
5 Benchmark 总结
5.1 Experimental Settings(实验设置)
5.2 Memory Mechanisms 的有效性(Effectiveness of Memory Mechanisms)
5.3 Memory Mechanisms 的效率(Efficiency of Memory Mechanisms)
总结
6 Limitations and Conclusions
6 局限与结论
Appendix A Proof in Bayesian Relation Network
附录 A 贝叶斯关系网络的证明
A.1 定理 1(因子化)的证明
A.2 定理 2(祖先采样)的证明
总体总结
Appendix B Extensive Evaluation on User Messages by GPT-4o
附录 B GPT-4o 对用户消息的广泛评估
表 10:GPT-4o 对用户消息评估的结果
Appendix C Extensive Benchmark on More Composite Datasets
附录 C:在更多复合数据集上的广泛基准测试
C.1 MemDaily-10 的结果
C.2 MemDaily-50 的结果
C.3 MemDaily-200 的结果
总结
Appendix D Case Studies
D.1 Case Study on Generated User Profiles
D.2 Case Study on User Messages
D.3 Case Study on Questions and Answers
总结
2411.00489_Human-inspired Perspectives: A Survey on AI Long-term Memory
总结
From Moonlight
三句摘要
关键词
摘要
Human-inspired Perspectives: A Survey on AI Long-term Memory
1. 引言(Introduction)
2. 人类长期记忆的结构与机制(Structure and Mechanisms of Human Long-term Memory)
3. AI系统中的长期记忆建模(Modeling Long-term Memory in AI Systems)
4. 与人类记忆的类比分析(Human-inspired Analysis of AI Memory Systems)
5. 应用场景(Applications of AI Long-term Memory)
6. 挑战与未来方向(Challenges and Future Directions)
总结
Human-inspired Perspectives: A Survey on AI Long-term Memory
1. 引言(Introduction)
2. 人类长期记忆的结构与机制(Structure and Mechanisms of Human Long-term Memory)
3. AI系统中的长期记忆建模(Modeling Long-term Memory in AI Systems)
3.1 编码阶段建模(Encoding)
3.2 巩固阶段建模(Consolidation)
3.3 提取阶段建模(Retrieval)
4. 人类启发的AI长期记忆系统(Human-inspired AI Long-term Memory Systems)
5. 挑战与未来方向(Challenges and Future Directions)
6. 结论(Conclusion)
总结评价
Human-inspired Perspectives: A Survey on AI Long-term Memory
第一章:引言(Introduction)
内容概述:
重点内容:
其他:
第二章:人类长期记忆机制(Human Long-term Memory Mechanisms)
内容概述:
重点内容:
其他:
第三章:AI系统中的长期记忆建模(Modeling Long-term Memory in AI Systems)
内容概述:
分类与重点内容:
其他:
第四章:评估与挑战(Evaluation and Challenges)
内容概述:
重点内容:
其他:
第五章:未来方向(Future Directions)
内容概述:
重点内容:
第六章:结论(Conclusion)
内容概述:
附录与表格(如有)
表格内容(假设):
总结
Labs
5 Meta
5 Meta
Abstract
Abstract(摘要)
3. Long-term Memory in Human Brain(人脑中的长期记忆)
3.1 Human Memory Hierarchy(人类记忆层次)
3.2 Human Memory Processing(人类记忆处理机制)
3.3 Summary(小结)
4. Long-term Memory of AI: on Storage Formats(AI长期记忆:存储格式)
4.1 Non-Parametric Memory(非参数记忆)
4.2 Parametric Memory(参数记忆)
4.3 Summary(小结)
5. Long-term Memory of AI: on Human Perspectives(AI长期记忆:人类视角)
5.1 Episodic Memory(情景记忆)
5.2 Semantic Memory(语义记忆)
5.3 Procedural Memory(程序性记忆)
5.4 Summary(小结)
6. A New Cognitive Architecture for Long-term Memory(新的长期记忆认知架构)
6.1 Cognitive Architecture of Self-Adaptive Long-term Memory (SALM)
7. Next Steps of AI Long-term Memory(AI长期记忆的未来方向)
7.1 Measures of AI Long-term Memory(AI长期记忆的评估指标)
7.2 Application of AI Long-term Memory(AI长期记忆的应用前景)
总结
1 Introduction
1 引言(Introduction)
核心观点:
人类记忆对AI的启发:
研究空白与本文贡献:
文章结构概览:
2 Research Background and Methodologies
2 研究背景与方法
总结
3 Long-term Memory in Human Brain
第三章:人脑中的长期记忆
图表与数据说明
重点总结
数学与算法要点
总结
4 Long-term Memory of AI: on Storage Formats
第4章:AI的长期记忆:存储形式
概述
4.1 非参数记忆(Non-Parametric Memory)
4.1.1 存储方式
4.1.2 检索方法
4.1.3 遗忘机制
4.2 参数记忆(Parametric Memory)
4.2.1 存储机制
4.2.2 检索机制
4.2.3 遗忘机制
4.3 总结
与人类长期记忆的相似性(见图5):
总体结论
5 Long-term Memory of AI: on Human Perspectives
5 人工智能的长期记忆:从人类视角出发
5.1 情景记忆(Episodic Memory)
5.2 语义记忆(Semantic Memory)
5.3 程序记忆(Procedural Memory)
5.4 总结(Summary)
6 A New Cognitive Architecture for Long-term Memory
6 面向长期记忆的新认知架构(A New Cognitive Architecture for Long-term Memory)
7 Next Steps of AI Long-term Memory
7 AI长期记忆的未来方向
小结
8 Conclusion
8 总结(Conclusion)
重点内容总结:
数学公式、算法与数据:
2501.00332_MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation
总结
From Moonlight
三句摘要
关键词
摘要
Abstract
核心内容讲解:
小结:
1 Introduction
背景与问题
解决方案:检索增强生成(RAG)
问题与挑战
提出的方法:MAIN-RAG
主要贡献
总结
2 Preliminaries
2.1 符号与目标(Notations and Objectives)
2.2 噪声检索文档的影响(Impact of Noisy Retrieval Documents)
2.3 相关工作(Related Works)
3 Multi-Agent Filtering RAG (MAIN-RAG)
3.1 MAIN-RAG 中 LLM 智能体的定义
3.2 相关性判断的量化
3.3 自适应判断阈值 τ_q
总结要点
4 Experiments
4.1 任务与数据集
4.2 基线模型
4.3 实验设置
4.4 定量分析(RQ1)
4.5 自适应判断阈值 τ_q 的消融实验(RQ2)
4.6 τ_q 的案例研究(RQ3)
总结
5 Conclusion and Future Work
主要结论
未来工作
总结
6 Limitations
实验范围的限制
环境影响的考量
总结
Appendix A Computation Infrastructure
附录A 计算基础设施
Appendix B Performance Comparison among MAIN-RAG and Its Variant Baselines
核心结论:
关键分析:
图表支持:
总结:
Appendix C System Instructions of Agent-1 (Predictor), Agent-2 (Judge), and Agent-3 (Final-Predictor)
Agent-1(预测器)的系统指令
Agent-2(评判器)的系统指令
Agent-3(最终预测器)的系统指令
图 11:三个 Agent 的系统指令图示
总结
Appendix D Case Studies of Different Adaptive Judge Bar τ_q in MAIN-RAG
案例研究 1(高 τq)
案例研究 2(低 τq)
案例研究 3(中等 τq)
图 12 - 15:不同数据集与 LLM 的案例对比
总结
2503.09149_MemVid: Memory-enhanced Retrieval Augmentation for Long Video Understanding
From Moonlight
三行摘要
关键词
摘要
Abstract
1. Introduction
1. 引言(Introduction)总结
贡献总结:
2. Related Work
2. Related Work
2.1. 大型视觉-语言模型(Large Vision-language Models)
2.2. 长视频视觉-语言模型(Long Large Vision-language Models)
2.3. 基于检索增强的视频理解(Retrieval-augmented Video Understanding)
3. Methodology
3. Methodology
总结
4. Experiments
4. 实验总结
4.1. 实验设置
4.2. 总体结果
4.3. 消融实验
4.4. 泛化性分析
4.5. 效率分析
4.6. 案例分析
总结
5. Conclusion
5. 结论
2505.02099_MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents
From Moonlight
三句摘要
关键词
摘要
Abstract
研究背景
研究现状
本文贡献
项目开源
重点内容
总结
1. Introduction
1. 引言(Introduction)
记忆模块的重要性
现有研究的不足
MemEngine:一个统一且模块化的记忆库
总结
2. Comparison with Relevant Libraries
2. 与相关库的比较
已有库分类
MemEngine 的优势
对比表格详解(Table 1)
3. MemEngine Library
3. MemEngine Library
3.1. Overview(概述)
3.2. Memory Models(记忆模型)
3.3. Memory Operations(记忆操作)
3.4. Memory Functions(记忆功能)
3.5. Memory Configurations(记忆配置)
3.6. Memory Utilities(记忆工具)
总结
4. Usage of MemEngine
4. MemEngine 的使用方式
4.1 使用预实现的记忆模型
4.2 定制新的记忆模型
5. Conclusion
5. 结论
致谢
2505.11271_Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models
Abstract
重点内容强调
补充信息
1 Introduction
1.1 现代大语言模型(LLMs)的应用与挑战
1.2 链式流程中的中间输出与缓存机会
1.3 现有优化方法与语义缓存
1.4 语义缓存的应用场景
1.5 本文的贡献
1.6 实验结果与结论
1.7 实际意义与价值
2 Related Work
2 相关工作
2.1 提示缓存(Prompt Caching,基于KV的方法)
2.2 语义缓存(Semantic Caching)
2.3 其他缓存方法
2.4 本文方法与现有方法的比较
总结
3 System design and Methodology
3 系统设计与方法论
3.1 观察与系统设计
3.2 我们的语义缓存方法
4 Experimental setup
4 实验设计
4.1 模拟设计
4.2 数据集
4.3 问题之间的相似性
4.4 摘要
4.5 评估指标
5 Results and discussion
5 实验结果与讨论
5.1 检索方法的比较分析
5.2 延迟细节
5.3 不同相似度阈值与摘要长度的影响
5.4 选择相似度阈值:效用与缓存命中率的权衡
5.5 影响回答生成的因素
5.6 对现实系统的影响
5.7 挑战与限制
总结
6 Conclusion and future work
6 结论与未来工作
技术增强
可扩展性与实际部署
隐私问题
更广泛的应用
2505.13308_Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
总结
From Moonlight
三句摘要
关键词
摘要
Abstract
1 Introduction
1 引言(Introduction)
1.1 大型语言模型(LLMs)的推理挑战
1.2 现有改进方法及其局限性
1.3 提出的替代方法:测试时实例级适应(TTIA)
1.4 现有TTIA方法的局限性
1.5 本文贡献:LatentSeek 框架
1.6 实验结果与性能提升
总结:
2 Test-Time Instance-Level Policy Gradient in Latent Space
2 测试时实例级潜在空间策略梯度
2.1 问题定义:测试时实例级推理
2.2 潜在空间中的策略梯度推理
2.3 LatentSeek 算法
总结
3 Empirical Results
3. Empirical Results 总结
3.1 Experimental Setup(实验设置)
3.2 State-of-the-art Test-time Reasoning Performance(测试时推理性能)
3.3 Ideal Experiment: Perfect Sparse Reward Model(理想实验:完美稀疏奖励模型)
3.4 Test-Time Scaling: scaling up the iteration of LatentSeek(测试时扩展:增加 LatentSeek 迭代次数)
3.5 Algorithmic Statistics(算法统计)
3.6 Qualitative Analysis(定性分析)
总结
4 Related Work
4 相关工作(Related Work)总结
一、语言模型的推理能力(Reasoning in Language Models)
二、语言模型的强化学习(Reinforcement Learning for Language Models)
三、可控生成与测试时优化(Controllable Generation and Test-Time Optimization)
四、提示调优与软提示(Prompt Tuning and Soft Prompt)
总体总结:
5 Conclusion
5 结论
主要内容:
总结:
Acknowledgement
Acknowledgement(致谢)
Appendix A Discussion and future works
A. 讨论与未来工作
Reward Models(奖励模型)
Latent Optimization(潜在空间优化)
Large Base Model(大基础模型)
Appendix B Methods of Test-Time Instance-Level Reasoning
附录 B 测试时实例级推理方法
总结
Appendix C Theoretical Analysis
附录 C 理论分析总结
C.1 预备知识:多证明者交互证明与 NEXP
C.2 理论分析:独立更新
C.3 定理 C.10 与推论 C.11 的证明
总结
Appendix D Derivation of Policy Gradient
附录 D 策略梯度的推导
1. 初始目标函数
2. 对 \(\mathbf{z}\) 求梯度
3. 利用对数导数技巧
4. 利用策略的分解形式
5. 得到最终结果
总结
Appendix E Additional Experimental Results
附录 E:更多实验结果总结
总结
Appendix F Experimental Details
附录 F 实验细节总结
F.1 提示设计
F.2 模型主干
F.3 基线方法
F.4 GSM8K实验
数据集
实验细节
F.5 MATH-500实验
数据集
实验细节
F.6 AIME2024实验
数据集
实验细节
评估提示模板
计算量估计
Appendix G Detailed FLOPs Calculation
附录 G:详细 FLOPs 计算总结
G.1 前向传播 FLOPs 估算
G.2 Genius 方法的总 FLOPs
G.3 LatentSeek 方法的总 FLOPs
G.4 效率阈值分析
总结
Appendix H Qualitative Analysis and Case Studies
附录 H 定性分析与案例研究(Qualitative Analysis and Case Studies)
1. 生成序列的词云分析(Wordclouds of the First Three Words)
2. 案例研究(Case Studies)
关键发现总结
总结
Appendix I Computational Resources
附录I 计算资源
Appendix J The Use of Large Language Models (LLMs)
附录 J:大语言模型(LLMs)的使用
2506.22815_Memory as a Service (MaaS): Rethinking Contextual Memory as Service-Oriented Modules for Collaborative Agents
From Moonlight
三句摘要
关键词
摘要
Abstract
1 Introduction
2 Related Works
2.1 个体内部内存的持久性
2.2 跨实体内存共享
总结
3 MaaS: A Service-Oriented Memory Perspective
3.1 Core Principles: From Local State to Callable Service
3.2 The MaaS Architecture: Granting Public Service Capabilities to Private Memory
高层实现架构(High-Level Implementation)
4 MaaS Design Space and Application Scenarios
4.1 内部实体(Intra-Entity)
4.2 跨实体(Inter-Entity)
4.3 群体级(Group-Level)
总结
5 Open Research Agenda
5.1 公共维度带来的挑战:治理与协议(Challenges Arising from Public-Side: Governance and Protocols)
5.2 私有维度带来的挑战:安全与信任(Challenges Arising from Privacy-Side: Security and Trust)
5.3 交互涌现带来的挑战:生态系统与伦理(Challenges from Interaction Emergence: Ecosystem and Ethics)
总结
6 Conclusion: A Timely Perspective
2506.24019_Ella: Embodied Social Agents with Lifelong Memory
Abstract
1 Introduction
1 引言(Introduction)总结
研究背景与动机
本文的贡献与方法
本文核心贡献总结(重点内容)
总结
2 Related Work
2 相关工作
2.1 具身社交智能
2.2 智能体记忆
图2说明(Figure 2)
3 Problem Setting
3 问题设定
1. 智能体与社交群组
2. 智能体的初始知识
3. 模拟环境与交互机制
4. 控制评估与干预方式
总结
4 Ella: Embodied Lifelong Learning Agent
4 Ella: Embodied Lifelong Learning Agent
4.1 Name-centric Semantic Memory(名称中心语义记忆)
4.2 Spatiotemporal Episodic Memory(时空情景记忆)
4.3 Planning, Reaction, and Communication(规划、反应与通信)
总结
5 Experiments
5 实验结果总结
5.1 实验设置
5.2 实验结果
总结
6 Limitations
6 限制(Limitations)
Leverage the graph structure of the name-centric semantic memory.
Lifelong simulation of a community of agents in a visually rich, physics-realistic environment is computationally expensive.
All agents’ thinking processes are assumed to finish synchronously.
7 Conclusion
7 结论
Appendix A Broader Impact
Appendix A 更广泛的影响
Appendix B Additional Experiment Details
附录 B 实验附加细节总结
B.1 虚拟社区 (Virtual Community)
B.2 计算资源 (Compute)
总结要点:
Appendix C Additional Implementation Details
Appendix C Additional Implementation Details(附录C 额外的实现细节)
Appendix D Prompt Templates
Appendix D Prompt Templates
Figure 8: 生成日常计划的提示模板
Figure 9: 生成反应的提示模板
Figure 10: 生成语言输出的提示模板
Figure 11: 生成对话总结的提示模板
Figure 12: 从对话中提取知识的提示模板
总体说明
2507.10524_Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
总结
From Moonlight
三句摘要
关键词
摘要
Abstract
1 Introduction
背景与动机
递归 Transformer 与挑战
MoR:统一框架
概念与意义
贡献总结(Contributions)
总结
2 Method
2.1 Preliminary
2.2 Mixture-of-Recursions (MoR)
总结
3 Experiments
表格总结(Table 3)
3.1 主要结果
3.2 IsoFLOP 分析
3.3 推理吞吐量评估
总结
4 Ablation Studies
4.1 Parameter Sharing Strategies
4.2 Routing Strategies
4.3 KV Caching Strategies
总结
5 Analysis
5.1 Compute-optimal Scaling Analysis
5.2 Routing Analysis
5.3 Test-time Scaling Analysis
总结
6 Related Work
Recursive Transformers(递归Transformer)
Adaptive Computation(自适应计算)
Routing Mechanism(路由机制)
Key-value Caching(键值缓存)
Latent Reasoning(隐式推理)
7 Conclusion
7.1 局限性与未来工作
7.2 致谢
Appendix A Details of Design Choices for Mixture-of-Recursions
A.1 参数共享策略(Parameter-sharing Strategy)
A.2 路由策略(Routing Strategy)
A.3 KV 缓存策略(KV Caching Strategy)
总结
Appendix B Experimental Setup
训练设置
评估设置
模型架构细节
表6:模型架构参数总结(重点)
Appendix C Expanded Results of IsoFLOP Analysis
总体比较
Transformer的FLOPs近似计算
带检查点复用的梯形学习率调度
结果概览
Appendix D Details of Experimental Settings for Throughput Measurement
实验系统与评估方法
模型吞吐量对比设置
批处理设置
实现细节
Appendix E Expanded Results of Parameter Sharing Strategy
Middle-Cycle 是最稳定的选择
持续预训练(up-training)下的表现
Appendix F Expanded Results of Design Choices for Router
F.1 设计配置细节
F.2 路由器性能评估指标
F.3 路由器设计的扩展评估结果
Appendix G Expanded Results of KV Cache Sharing Mechanism
G.1 递归 Transformer 中的关键值表示趋势
G.2 KV 缓存共享策略的性能比较
总结
Appendix H Expanded Qualitative Results
H.1 Analysis on Adaptive Computation Paths
H.2 Analysis on Router Weights
总结
2509.08151_Trust Semantics Distillation for Collaborator Selection via Memory-Augmented Agentic AI
第一章:引言(Introduction)
第二章:相关工作(Related Work)
第三章:方法论(Methodology)
3.1 信任语义建模(Trust Semantics Modeling)
3.2 记忆增强智能体架构(Memory-Augmented Agentic Architecture)
3.3 信任语义蒸馏(Trust Semantics Distillation)
第四章:实验与评估(Experiments and Evaluation)
第五章:讨论(Discussion)
第六章:结论与未来工作(Conclusion and Future Work)
Abstract
摘要(Abstract)总结:
核心问题:
解决方案:
实验结果:
重点内容:
非重点内容(简略):
I Introduction
I Introduction(引言)
背景与动机
协作伙伴选择的关键性
信任评估的挑战
本文贡献
II Agentic AI-Aided Teacher-Student Architecture for Trust Semantics Evaluation
II 基于智能体AI的师生架构用于信任语义评估
II-A LAM驱动的智能体AI用于信任语义评估
II-B 师生代理架构
图表说明
总结
III Task-Specific Trust Semantics Distillation
III 任务特定信任语义蒸馏(Task-Specific Trust Semantics Distillation)
总结
IV Experimental Analysis
IV 实验分析(总结)
1. 协作者评估时间(Collaborator Evaluation Time)
2. 数据收集次数(Number of Data Collections)
3. 协作者选择准确性(Collaborator Selection Accuracy)
总结
V Future Directions
V 未来方向(Future Directions)
• 环境变化对动态信任的影响评估与缓解(Evaluation and Mitigation of Environmental Changes on Dynamic Trust)
• 预测性信任提炼(Predictive Trust Distillation)
• 数据缺失场景下的信任语义提取(Trust Semantics Extraction Under Data-Missing Scenarios)
VI Conclusion
VI 结论
核心内容讲解:
研究优势与创新点(重点内容):
总结:
2510.26493_The Context of Context Engineering
总结
From Moonlight
三句摘要
关键词
摘要
Abstract
1 Introduction
背景与问题提出
上下文工程的兴起
对上下文工程的误解与历史回顾
核心观点:上下文工程是“熵减”过程
上下文工程的演进阶段
论文贡献
后续章节安排
2 Theoretical Framework
2.1 形式化定义(Formal Definition)
2.2 阶段划分(Stage Characterization)
总结
3 Historical Evolution
3.1 超过20年前:1.0时代
3.2 20年后:2.0时代
4 Context Collection and Storage
设计考虑(Design Considerations)
基本设计原则
4.1 典型策略(Era 1.0 和 Era 2.0)
4.2 人类级上下文生态系统(Era 3.0)
总结
5 Context Management
5.1 文本上下文处理
5.2 多模态上下文处理
5.3 上下文组织
5.4 上下文抽象
6 Context Usage
6.1 系统内上下文共享
6.2 跨系统上下文共享
6.3 上下文选择与理解
6.4 主动用户需求推断
6.5 终身上下文的保存与更新
6.6 新兴工程实践
7 Applications
7.1 命令行工具(CLI)
7.2 深度研究(Deep Research)
7.3 脑机接口(Brain-Computer Interfaces)
8 Challenges and Future Directions
情境收集仍受限且效率低下
大规模情境的存储与管理
模型对情境的理解能力有限
长文本情境处理的性能瓶颈
相关情境的筛选问题
数字存在(Digital Presence)
总结
9 Conclusion
2511.21689_ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
总结
From Blog
From Moonlight
三句摘要
关键词
摘要
Abstract
核心内容讲解:
1 Introduction
核心问题与背景
现有方法的局限
协调范式的核心思想
协调器的实现挑战
ToolOrchestra 方法概述
实验结果与贡献
主要贡献总结
2 Agentic Problem Formulation
2.1 任务建模
2.2 多轮交互流程
总结
3 ToolOrchestra
3.1 统一工具调用接口
3.2 端到端代理强化学习
3.3 数据合成
总结
4 Experimental Setting
4.1 工具(Tools)
4.2 基线模型(Baselines)
4.3 评估配置(Evaluation Configuration)
4.4 训练配置(Training Configuration)
表格 1:Orchestrator-8B 与基线模型对比
总结
5 Experimental Results
1. 基线方法表现不佳
2. 工具与模型结合提升性能
3. Orchestrator-8B 表现突出
4. 关键优势
5. 结论
6 Analysis
6.1 工具使用分析(Tool Use Analysis)
6.2 成本分析(Cost Analysis)
6.3 泛化能力(Generalization)
6.4 用户偏好(User Preferences)
总结
7 Related Work
7.1 从工具学习到通用智能体(From Tool Learning to Generalist Agents)
7.2 从工具使用的准确性到效率与可控性(From Tool-Use Accuracy to Efficiency and Controllability)
总结
8 Conclusion
主要内容总结:
1. 方法概述
2. 核心贡献
3. 实验结果
4. 未来展望
重点内容强调:
数学/算法相关说明:
总结:
Appendix A Pilot Study
实验设置
实验结果
关键结论
重点强调
Appendix B Evaluation Benchmarks
1. Humanity’s Last Exam (HLE)
2. FRAMES
3. τ²-Bench
Appendix C Model description for Qwen3-32B
数学与定量推理
科学领域知识
逻辑推理能力
人文学科知识
编程与函数调用能力
总体评价
Appendix D Tools in training
• Query Writer(查询生成器)
• Web Search(网络搜索)
• Local Search(本地搜索)
• Code Writer + Interpreter(代码生成与执行)
• Math Models(数学模型)
• Generalist Models(通用模型)
总结
Appendix E Third-party API
总结说明:
Appendix F Humane preference example
内容总结:
工具列表(Tools)
偏好指令(Preference instruction, P_I)
偏好向量(Preference vector, P)
总结
Appendix G Use of LLMs Disclosure
Appendix H Generalization of pricing configurations
实验设置与方法
实验结果
结论
Appendix I Data Synthesis
Appendix J Breakdown of ToolScale
重点分析:
总结:
Appendix K Data synthesis prompts and examples
表6:生成领域主题的模型提示
表7:生成数据库模式的模型提示
表8:生成数据库条目的模型提示
表9:验证数据库条目的模型提示
表10:生成函数的模型提示
表11:生成意图的模型提示
表12:生成任务的模型提示
表13:演化任务的模型提示
表14:数据库模式示例
Appendix L Calculation of rewards for preference-aware benchmark
附录 L 偏好感知基准奖励的计算
奖励计算方法
表格分析
总结
其他
数据集&数据蒸馏
1811.10959v3_Dataset Distillation
ABSTRACT
LLM总结
1. INTRODUCTION
3. APPROACH
2502.20653_Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
Abstract
1. Introduction
2. Related Work
7. Conclusion
通用
Dataset distillation
3D
2003.08934_NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Abstract
1. Introduction
2. Related Work
3. Neural Radiance Field Scene Representation
4. Volume Rendering with Radiance Fields
5. Optimizing a Neural Radiance Field
6. Result
7. Conclusion
2203.08586_Deep vanishing point detection: Geometric priors make dataset variations vanish
概念
Abstract
1. Introduction
2. Related Work
3. Geometric priors for VP detection
4. Experiments
5. Conclusion and limitations
2312.14132_DUSt3R: Geometric 3D Vision Made Easy
关键词
相关概念
Abstract
1. Introduction
2. Related Work
3. Method
4. Experiments with DUSt3R
5. Conclusion
Appendix A
附录概览
Appendix B. Qualitative results
Appendix C. Extended Related Work
Appendix D. 多视角姿态估计(Multi-view Pose Estimation)
Appendix E. 视觉定位(Visual Localization)
Appendix F. Training details
2406.09756_MASt3R: Grounding Image Matching in 3D with MASt3R
前言
Abstract
1. Introduction
🧠 思维导图式总结
2. Related works
🧠 总结思维导图
3. Method
4. Experimental results
5. Conclusion
Appendix
Appendix A Additional Qualitative Results
B. Fast Reciprocal Matching
C. Coarse-to-Fine
D. Detailed experimental settings
2412.09401_SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos
术语
Abstract
1. Introduction
2. Related Work
3. Method
4. Experiments
5. Conclusion
6. 致谢
Appendix
Appendix A Implementation details
Appendix B Details for experimental settings
Appendix C Additional comparisons and analyses
D. More visual results
2412.12392_MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
GPT
先验知识
Abstract
1. Introduction
2. Related Work
3. Method
4. Results
5. Limitations and Future Work(局限与未来工作)
🧾 6. Conclusion(总结)
🧠 总结一句话版:
8. Initialisation(初始化)
9. Runtime Breakdown(运行时分析)
10. Evaluation Setup(评估设置)
11. EuRoC 结果总结
2503.11651_VGGT: Visual Geometry Grounded Transformer
Abstract
1. Introduction
2. Related Work
3. Method
4. Experiments
5. Discussions
6. Conclusions
Appendix A Formal Definitions
Appendix B Implementation Details
Appendix C Additional Experiments
Appendix D Qualitative Examples
Appendix E Related Work
其他
2204.00598_Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
总结
Abstract
1 Introduction
2 Problem Setting, Background, and Related Work
3 Socratic Models
4 Evaluation: Methods and Results
5 Applications: Methods and Demonstrations
6 Discussion
Acknowledgments and Disclosure of Funding
Appendix A Overview
Appendix B Unsupervised Socratic Model Selection
Appendix C Additional Notes on Experiments
Appendix D Egocentric Perception Appendix
Appendix E Scaling Up Socratic Video Search
Appendix F Additional Notes on Robot Experiments
Appendix G Socratic Deductive Reasoning
Appendix H Broader Impact: Energy and Resource Consumption
A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
The Basic Idea Behind CRC Algorithms
Polynomial Arithmetic
Binary Arithmetic with No Carries
一个可用的实例
Choosing A Poly
A Straightforward CRC Implementation
A Table-Driven Implementation
A Slightly Mangled Table-Driven Implementation
参考
Distributed Representations of Sentences and Documents
2403.03507_GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
GitHub:
https://github.com/jiaweizzhao/GaLore
https://arxiv.org/abs/2403.03507