新溪-gordon
V2026.03
  • 通用定义
    • 评测标准
      • 常用评测标准
      • 准确率(Accuracy)
      • 精确率(Precision, 精准率)
      • 召回率(Recall)
      • F1 Score
      • 可视化精度和召回率
        • 混淆矩阵(confusion matrix)
        • 受试者工作特征曲线(ROC 曲线,Receiver Operating Characteristic curve)
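上面几个基础指标(准确率、精确率、召回率、F1)都可以直接从混淆矩阵算出。下面是一段最小的 Python 草稿(`binary_metrics` 为示例命名,仅演示二分类情形,非某个库的正式实现):

```python
def binary_metrics(tp, fp, fn, tn):
    """由二分类混淆矩阵计算 Accuracy / Precision / Recall / F1。"""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# 例:TP=40, FP=10, FN=20, TN=30
# Accuracy = 70/100,Precision = 40/50,Recall = 40/60
print(binary_metrics(tp=40, fp=10, fn=20, tn=30))
```

注意 F1 是精确率与召回率的调和平均,任一为 0 时 F1 记为 0。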
      • Recall@k
        • 核心思想一句话概括
        • 公式与计算
        • 举例说明
        • 为什么需要 Recall@k?
        • 重要特性和注意事项
        • 总结
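上面 Recall@k 的公式与计算步骤,可以用一段最小的 Python 草稿来演示(函数名 `recall_at_k` 为示例假设,非某个库的正式实现):

```python
def recall_at_k(ranked, relevant, k):
    """Recall@k = 前 k 个结果中命中的相关项数 / 相关项总数。"""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

# 例:共 5 个相关项,前 10 个结果命中 3 个 → Recall@10 = 3/5 = 0.6
ranked = ["d1", "x1", "d2", "x2", "d3", "x3", "x4", "x5", "x6", "x7"]
relevant = {"d1", "d2", "d3", "d4", "d5"}
print(recall_at_k(ranked, relevant, 10))  # → 0.6
```

分母是相关项总数(与 k 无关),这也是它与 Precision@k 的关键区别。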
      • Precision@k
        • 核心思想一句话概括
        • 公式与计算
        • 举例说明
        • 为什么需要 Precision@k?
        • 重要特性和注意事项
        • 总结
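对照上面的公式,Precision@k 的分母换成了 k 本身。以下是一段示例性的 Python 草稿(`precision_at_k` 为示例命名):

```python
def precision_at_k(ranked, relevant, k):
    """Precision@k = 前 k 个结果中的相关项数 / k。"""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

# 例:前 5 个结果里有 3 个相关 → Precision@5 = 3/5 = 0.6
print(precision_at_k(["d1", "x", "d2", "y", "d3"], {"d1", "d2", "d3"}, 5))
```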
      • HR@k
        • 一、核心定义:什么是 HR@N?
        • 二、计算公式
        • 三、举例说明
        • 四、指标的特点与解读
        • 五、与其他指标的关系和对比
        • 六、典型应用场景
        • 总结
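按上面的定义,HR@k 统计的是"前 k 个结果中至少命中一个相关项"的用户(或查询)比例。一个最小的 Python 草稿如下(`hr_at_k` 为示例命名):

```python
def hr_at_k(user_results, k):
    """HR@k(Hit Ratio):前 k 个结果含任一相关项的用户所占比例。"""
    hits = sum(
        any(item in relevant for item in ranked[:k])
        for ranked, relevant in user_results
    )
    return hits / len(user_results)

users = [
    (["a", "b", "c"], {"b"}),  # 命中(b 在前 3)
    (["x", "y", "z"], {"q"}),  # 未命中
]
print(hr_at_k(users, 3))  # → 0.5
```

它只关心"是否命中",不关心命中位置,这是它与 NDCG、MRR 的主要差异。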
      • NDCG@k
        • 核心思想一句话概括
        • 从 CG 到 DCG 再到 NDCG
        • 计算步骤举例
        • 为什么 NDCG@k 如此重要?
        • 总结
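上面"从 CG 到 DCG 再到 NDCG"的推导可以落成几行 Python(折扣取常见的 1/log2(i+1) 形式;`dcg_at_k`、`ndcg_at_k` 为示例命名):

```python
import math

def dcg_at_k(rels, k):
    """DCG@k = Σ rel_i / log2(i + 1),位置 i 从 1 开始。"""
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels[:k], start=1))

def ndcg_at_k(rels, k):
    """NDCG@k = DCG@k / IDCG@k(理想排序下的 DCG)。"""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# rels 是按模型排序位置给出的相关性分级
print(round(ndcg_at_k([3, 2, 3, 0, 1, 2], 6), 4))
```

完美排序时 DCG 等于 IDCG,NDCG 恒为 1;指标因此可以跨查询比较。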
      • MRR@k
        • 一、核心概念:什么是 MRR@K?
        • 二、为什么需要 MRR@K?
        • 三、如何计算 MRR@K?
        • 四、举例说明
        • 五、MRR@K 的特点和注意事项
        • 六、与其他指标的区别
        • 为什么 MRR@10 常和 Recall@1000 一起使用?
        • 总结
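上面 MRR@K 的计算(每个查询取首个命中位置的倒数、超出 K 记 0,再对查询取均值)可以用如下 Python 草稿演示(`mrr_at_k` 为示例命名):

```python
def mrr_at_k(queries, k):
    """queries: 若干 (排序结果, 相关项集合);返回各查询倒数排名的均值。"""
    total = 0.0
    for ranked, relevant in queries:
        for rank, item in enumerate(ranked[:k], start=1):
            if item in relevant:
                total += 1.0 / rank   # 只取首个命中
                break
    return total / len(queries)

queries = [
    (["a", "b", "c"], {"b"}),  # 首个命中在第 2 位 → 1/2
    (["x", "y", "z"], {"x"}),  # 第 1 位 → 1
    (["p", "q", "r"], {"w"}),  # 前 3 未命中 → 0
]
print(mrr_at_k(queries, 3))  # (0.5 + 1 + 0) / 3 = 0.5
```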
      • MAP@k
        • 一句话理解
        • 拆解首字母缩略词(acronym)
        • 通过一个例子彻底搞懂
        • 为什么MAP@K如此重要?
        • 总结
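上面的例子可以压缩成几行 Python:先对单个查询算 AP@K(在每个命中位置取 Precision@rank 再平均),再对所有查询取均值得到 MAP@K。分母取 min(相关项数, K) 是一种常见约定,具体实现可能不同;函数名均为示例假设:

```python
def average_precision_at_k(ranked, relevant, k):
    """AP@k:在每个命中位置累加 Precision@rank,再除以 min(|relevant|, k)。"""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k)

def map_at_k(queries, k):
    """MAP@k:各查询 AP@k 的算术平均。"""
    return sum(average_precision_at_k(r, rel, k) for r, rel in queries) / len(queries)

# 命中位置 1、3、5 → AP@5 = (1/1 + 2/3 + 3/5) / 3
print(round(average_precision_at_k(["a", "x", "b", "y", "c"], {"a", "b", "c"}, 5), 4))
```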
      • AUC (Area Under the ROC Curve)
        • 为什么要用AUC?
        • 详细拆解:
      • LogLoss(Logarithmic Loss)
        • 为什么要用LogLoss?
        • 详细拆解:
        • 总结与对比
        • 在实际业务中如何看?
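LogLoss 的定义(二分类下对每个样本取 -[y·ln p + (1-y)·ln(1-p)] 再平均)可以用下面的 Python 草稿演示;为避免 log(0),通常把预测概率裁剪到 (eps, 1-eps),这里的 `log_loss` 为示例命名,非 sklearn 的正式实现:

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """二分类 LogLoss;p 先裁剪到 (eps, 1 - eps) 再取对数。"""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# 预测越自信且正确,损失越低;自信但预测错会被重罚
print(round(log_loss([1, 0, 1], [0.9, 0.1, 0.8]), 4))
```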
      • Jaccard 相似系数
        • 一、是什么?
        • 二、计算公式
        • 三、核心性质
        • 四、一个简单的例子
        • 五、Jaccard 距离
        • 六、主要应用场景
        • 七、优缺点
        • 总结
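上面的计算公式(交集大小除以并集大小)几行代码即可实现;两个空集的取值是约定问题,这里记为 1,也有实现记 0(`jaccard` 为示例命名):

```python
def jaccard(a, b):
    """Jaccard 相似系数 = |A ∩ B| / |A ∪ B|。"""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # 约定:两个空集视为完全相似(部分实现返回 0)
    return len(a & b) / len(a | b)

# 交集 {香蕉, 橙子} 大小 2,并集大小 4 → 0.5
print(jaccard({"苹果", "香蕉", "橙子"}, {"香蕉", "橙子", "葡萄"}))
# Jaccard 距离 = 1 - Jaccard 相似系数
```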
      • PASS@k
        • 一、定义直观解释
        • 二、数学定义
        • 三、为什么有用
        • 四、总结一句话
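上面的数学定义对应的是常见的无偏估计:对每题采样 n 个解、其中 c 个通过时,pass@k = 1 - C(n-c, k)/C(n, k)(HumanEval 论文中给出的形式)。一个最小的 Python 草稿如下:

```python
from math import comb

def pass_at_k(n, c, k):
    """pass@k 的无偏估计:n 个采样中 c 个正确时,任取 k 个至少一个正确的概率。"""
    if n - c < k:
        return 1.0  # 错误样本不足 k 个,必然命中正确解
    return 1.0 - comb(n - c, k) / comb(n, k)

# k = 1 时退化为通过率 c/n
print(pass_at_k(n=10, c=3, k=1))
```

直接用 (c/n)^k 之类的朴素公式会有偏,这也是论文采用组合数形式的原因。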
    • 通用记忆
      • 总结与展望
      • 记忆类型
        • 短期记忆(Short-Term Memory)
        • 长期记忆
        • 情节记忆(Episodic Memory)
        • 语义记忆(Semantic Memory)
        • 工作记忆(Working Memory)
        • 程序性记忆(Procedural Memory)
        • 感觉记忆(Sensory Memory)
        • 图示
        • 长期记忆的必要性与挑战
        • 参考
      • 【定义】Cattell–Horn–Carroll理论
        • 背景:核心内容与演变
        • 三层层级系统
        • CHC 理论的意义
        • 总结
      • Reciprocal Rank Fusion (RRF) 算法
        • 公式
        • 计算步骤
        • 优点与缺点
        • 应用场景
        • 总结
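上面 RRF 的公式与计算步骤可以落成一段很短的 Python:每个文档在每个候选列表中的得分为 1/(k + rank),跨列表求和后按总分重排(k 常取 60;`rrf` 为示例命名):

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Reciprocal Rank Fusion:score(d) = Σ_r 1 / (k + rank_r(d))。"""
    scores = defaultdict(float)
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# 融合两路检索结果(如 BM25 与向量召回)
bm25 = ["d1", "d2", "d3"]
dense = ["d3", "d1", "d4"]
print(rrf([bm25, dense]))  # → ['d1', 'd3', 'd2', 'd4']
```

注意 RRF 只用名次不用原始分数,因此无需对不同检索器的分数做归一化,这正是它的主要优点。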
    • 通用概念
      • Ralph Loop
        • 核心思想:把失败变成数据,让AI自己搞定一切
        • 它是如何工作的?
        • 核心优势
        • 产业影响与未来
  • 综述论文
    • 近邻搜索
      • 2508.09834❇️_Overview_LLM: Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Linear Sequence Modeling
        • 3 Sparse Sequence Modeling
        • 4 Efficient Full Attention
        • 5 Sparse Mixture-of-Experts
        • 6 Hybrid Architectures
        • 7 Diffusion Large Language Models
        • 8 Applications to Other Modalities
        • 9 Conclusion and Future Directions
  • 评测基准
    • 评测基准
      • 02xx.xxxxx_BLEU: a Method for Automatic Evaluation of Machine Translation
        • 总结
        • Abstract
        • 示例讲解
        • 1. Introduction
        • 2. The Baseline BLEU Metric
        • 3. The BLEU Evaluation
        • 4. The Human Evaluation
        • 5. BLEU vs The Human Evaluation
        • 6. Conclusion
      • 0401.xxxxx_ROUGE: A Package for Automatic Evaluation of Summaries
        • 总结
        • Abstract
        • 1. Introduction
        • 2. ROUGE-N: N-gram Co-Occurrence Statistics
        • 3. ROUGE-L: Longest Common Subsequence
        • 4. ROUGE-W: Weighted Longest Common Subsequence
        • 5. ROUGE-S: Skip-Bigram Co-Occurrence Statistics
        • 6. Evaluations of ROUGE
        • 7. Conclusions
      • 1803.01937_ROUGE2.0: Updated and Improved Measures for Evaluation of Summarization Tasks
        • Abstract
        • 1. Problems with the current ROUGE measures
        • 2. ROUGE 2.0
      • 1804.08771_SacreBLEU: A Call for Clarity in Reporting BLEU Scores
        • BLEU
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Problem Description
        • 3 A way forward
        • 4 Summary
      • 2303.08896_SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Background and Related Work
        • 3 Grey-Box Factuality Assessment
        • 4 Black-Box Factuality Assessment
        • 5 SelfCheckGPT
        • 6 Data and Annotation
        • 7 Experiments
        • 8 Conclusions
        • Limitations
        • Ethics Statement
        • Acknowledgments
        • Appendix A Models and Implementation
        • Appendix B SelfCheckGPT with QA
        • Appendix C SelfCheckGPT with Prompt
        • Appendix D Additional Experimental Results
      • 2306.05685_Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 MT-Bench and Chatbot Arena
        • 3 LLM as a Judge
        • 4 Agreement Evaluation
        • 5 Human Preference Benchmark and Standardized Benchmark
        • 6 Discussion
        • 7 Conclusion
        • Appendix A Prompt templates
        • Appendix B Case Study
        • Appendix C Data Collection
        • Appendix D Additional Experimental Results
        • Appendix E Training Details of Vicuna Models
        • Appendix F Exploring Vicuna as a judge
      • 2403.04132_Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 2 相关工作(Related Work)
        • 3 Human Preference Data Collection
        • 3 人类偏好数据收集
        • 4 From Pairwise Comparisons to Rankings
        • 5 Efficient Approximate Ranking
        • 6 Data Analysis
        • 7 Experiments
        • 7 实验
        • 8 Discussion
        • 8 讨论
        • 9 Conclusion
        • 9 结论
        • Acknowledgments
        • 致谢
        • Appendix A Confidence Interval Simulation Study
        • 附录 A 置信区间模拟研究
        • Appendix B The Nonparametric Bradley-Terry Model
        • 附录 B:非参数 Bradley-Terry 模型
        • Appendix C Valid P-Value
        • 1. p值的定义
        • 2. p值的等价表达式
        • 3. 有效性证明的关键步骤
        • 4. 证明结论
        • 总结
        • Appendix D Sample Prompts
      • 2404.04475_AlpacaEval LC: A Simple Way to Debias Automatic Evaluators
        • 总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 Background and Problem Setting
        • 3 Length-Controlled AlpacaEval
        • 4 Results
        • 5 Discussion
      • 2511.03506_HaluMem: Evaluating Hallucinations in Memory Systems of Agents
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Problem Definition
        • 4 Methodology for Constructing HaluMem
        • 5 Evaluation Framework of HaluMem
        • 6 Experiments
        • 7 Conclusion
        • Appendix A Supplementary Details of HaluMem
        • Appendix B Special Configurations for Some Memory Systems
        • Appendix C Annotation Guidelines and Instructions
        • Appendix D Prompts
        • Appendix E Examples from the Process of Constructing HaluMem
    • 数据集-Agent
      • 2308.03688_AgentBench: Evaluating LLMs as Agents
        • 总结
        • From Deepseek
        • 数据集示例
      • 2312.14033_T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
        • 总结
        • Abstract
        • 1 Introduction
        • 2 T-Eval
        • 3 Experiments
        • 4 Discussion
        • 5 Related Work
        • 6 Conclusion
        • Appendix A T-Eval Benchmark Details
        • Appendix B Implementation Details
        • Appendix C Detailed Evaluation Metrics
        • Appendix D API Documentation
      • 2406.12045_τ-bench: A Benchmark for Tool-Agent-User Interaction
        • 总结
        • Abstract
        • 1.Introduction
        • 2.Related Work
        • 3. τ-bench: A Benchmark for Tool-Agent-User Interaction
        • 4. Benchmark Construction
        • 5.Experiments
        • 6. Discussion
      • 2506.07982_𝜏²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 τ²-bench: Evaluating Agents in a Dual-Control Environment
        • 4 Experiments
        • 5 Conclusion
        • Broader Impact
        • Appendix
        • Appendix A Telecom Domain
        • Appendix B Verifying Original τ²-bench
        • Appendix C Prompts
        • Appendix D Domain Policies
        • Appendix E User Simulator Quality
    • 数据集-QA
      • 2109.07958_TruthfulQA: Measuring How Models Mimic Human Falsehoods
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 The TruthfulQA Benchmark
        • 3 Experiments
        • 4 Results
        • 5 Discussion
        • 6 Related Work
        • 7 Conclusion
        • 8 Ethics and Impact
        • Appendix A Additional examples from TruthfulQA
        • Appendix B Additional results
        • Appendix C Dataset construction
        • Appendix D Human evaluations
        • Appendix E Prompts
        • Appendix F Checking for data quality and disagreement
      • 2311.12022_GPQA: A Graduate-Level Google-Proof Q&A Benchmark
        • 总结
        • Abstract
        • 1.Introduction
        • 2.Data Collection
        • 3.Dataset Analysis
        • 4.Baseline
        • 5.Related Work
        • 6.Limitations
        • 7.Conclusion
      • 2411.04368_SimpleQA: Measuring short-form factuality in large language models
        • Abstract
        • 1.Introduction
        • 2.Data Collection and Verification
        • 4.Measuring calibration
        • Appendix B Guessing strategy and F-score
    • 数据集-长文本
      • 2308.14508_LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
        • From Deepseek
      • 2402.05136_LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3 LV-Eval Benchmark
        • 4 Evaluation
        • Appendix
        • Appendix C Detailed Evaluation Results
        • Appendix D Detailed Ablation Results
      • 2404.06654_RULER: What’s the Real Context Size of Your Long-Context Language Models?
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 The Ruler Benchmark
        • 4 Experiments & Results
        • 5 Task Error Analysis
        • 6 Model Analysis
        • 7 Conclusion
        • 8 Limitations
        • Appendix A Models
        • Appendix B Task Configurations
        • Appendix C Task Correlation Analysis
        • Appendix D Prompt Templates
        • Appendix E Passkey Retrieval and Vanilla NIAH Results
        • Appendix F Additional Results
      • 2407.11963_NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Tasks and Datasets
        • 4 Experiments
        • 4.1.5 Impact of Language: Which Model Performs Better under the Bilingual Scenario?
        • 5 Conclusion and Future Work
        • Appendix A Evaluated Models
        • Appendix B NeedleBench Prompt Examples
        • Appendix C Error Analysis Examples
    • 数据集-RAG
      • 1809.09600_HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Data Collection
        • 3 Processing and Benchmark Settings
        • 4 Dataset Analysis
        • 5 Experiments
        • 6 Related Work
        • 7 Conclusions
        • Appendix A Data Collection Details
        • 附录A 数据收集细节
        • Appendix B Further Data Analysis
        • Appendix C Full Wiki Setting Details
      • 2401.15391_MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
        • 总结
        • LLM总结
        • Abstract
        • 1 Introduction
        • 2 RAG with multi-Hop queries
        • 3 A Benchmarking Dataset: MultiHop-RAG
        • 4 Benchmarking RAG system using MultiHop-RAG
        • 5 Related Work
        • 6 Conclusion
        • Limitations
        • Appendix A: GPT-4 Prompts Used for Data Generation
        • Appendix B: Dataset Examples
    • 数据集-图
      • 2402.07630_G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
        • 总结
        • 示例讲解
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Formalization
        • 4 Proposed GraphQA Benchmark
        • 5 G-Retriever
        • 6 Experiments
        • 7 Conclusion
        • Acknowledgment
        • Appendix A Impact Statements
        • Appendix B Experiment
        • Appendix C GraphQA Benchmark
        • Appendix D Graph Retrieval-Augmented Generation (GraphRAG)
        • Appendix E Discussion on the Complexity
        • 附录E 复杂性讨论总结
        • Appendix F Hallucination in Graph LLMs
        • Appendix G Demonstrations
    • 数据集-编程
      • 2107.03374_HumanEval: Evaluating Large Language Models Trained on Code
        • 总结
        • Abstract
        • 1.Introduction
        • 2.Evaluation Framework
        • 3.Code Fine-Tuning
        • 4.Supervised Fine-Tuning
        • 5.Docstring Generation
        • 6.Limitations
        • 7.Broader Impacts and Hazard Analysis
        • 8.Related Work
        • 9.Conclusions
      • 2108.07732_MBPP: Program Synthesis with Large Language Models
        • Abstract
        • 1 Introduction
        • 2 Datasets
        • 3 Model and Methods
        • 4 MBPP Synthesis Results
        • 5 Human-Model Collaboration Results
        • 6 Program Execution Results
        • 7 MathQA Results
        • 8 Related Work
        • 9 Risks and Limitations
        • 10 Conclusion
        • Appendix A
      • 2310.06770_SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 SWE-bench
        • 3 SWE-Llama: Fine-tuning CodeLlama for SWE-bench
        • 4 Experimental Setup
        • 5 Results
        • 6 Related Work
        • 7 Discussion
        • 8 Ethics Statement
        • 9 Reproducibility Statement
        • Appendix
        • Appendix A Benchmark Details
        • Appendix B Additional Details on Training SWE-Llama
        • Appendix C Additional Results
        • Appendix D Additional Experimental Details
        • Appendix E Societal Impact
        • Appendix F In-depth Analysis of SWE-Llama Generations
      • 2402.16694_HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
        • Abstract
        • 1. Introduction
        • 2. Related work
        • 3. HumanEval-XL
        • 4. Experiments
        • 5. Conclusion
        • Acknowledgments
        • Appendix A Experiment Settings
        • Appendix B Comprehensive Experiment Results
      • 2403.07974_LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
        • 总结
        • LLM总结
        • Abstract
        • 1 Introduction
        • 2 Holistic Evaluation
        • 3 Benchmark Curation
        • 4 Experiment Setup
        • 5 Results
        • 6 Related Work
        • 7 Limitations
        • 8 Conclusion
        • Appendix A Dataset
        • Appendix B UI
        • Appendix C Experimental Setup
        • Appendix D Results
        • Appendix E Qualitative Examples
      • 2407.10499_CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Works
        • 3 CIBench
        • 4 Experiments
        • 5 Conclusion
        • Appendix A Dataset Details
        • Appendix B Construction Prompts and Rules
        • Appendix C Experiment Example Demo
        • Appendix D Subjective Visualization Evaluation
        • Appendix E Dataset Error Analysis
        • Appendix F Human Annotator
        • Appendix G Ethical Consideration
      • 2410.03859_SWE-bench-Multimodal: Do AI Systems Generalize to Visual Software Domains?
        • 总结
        • Abstract
        • 1 Introduction
        • 2 SWE-bench Multimodal
        • 3 Evaluating on SWE-bench M
        • 4 Results
        • 5 Related Work
        • 6 Conclusion
        • Appendix A Dataset
        • Appendix B Collection
        • Appendix C Experiments
        • Appendix D Human Validation
        • Appendix E Limitations
      • 2410.06992_SWE-Bench+: Enhanced Coding Benchmark for LLMs
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Robustness Analysis of SWE-Bench
        • 3 Building SWE-Bench+
        • 4 Robustness of SWE-Bench+
        • 5 Effectiveness-aware Evaluation
        • 6 Related Work
        • 7 Conclusion
      • 2501.01257_CodeForces: Benchmarking Competition-level Code Generation of LLMs on CodeForces
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 CodeForces Benchmark
        • 4 Evaluation on Existing LLMs
        • 5 Analysis Experiments
        • 6 Discussion
        • 7 Conclusion
        • 8 Ethical Statement
        • Appendix A Model Cards
        • Appendix B Decoding Hyperparameters
        • Appendix C Analysis of Our Elo Rating Calculation System
        • Appendix D Human-comparable Elo Rating
        • Appendix E Problem Demonstration
        • Appendix F Special Judge
    • 数据集-数学
      • 2103.03874_MATH: Measuring Mathematical Problem Solving With the MATH Dataset
      • 2110.14168_GSM8K: Training Verifiers to Solve Math Word Problems
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Dataset
        • 3 Related Work
        • 4 Methods
        • 5 Additional Experiments
        • 6 Conclusion
        • Appendix A Dataset Details
        • Appendix B Hyperparameters
        • Appendix C Calculator Annotations
        • Appendix D Example Model Solutions
        • Appendix E Verifier Details
        • Appendix F Verifier Visualization
      • 2405.12209_MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
        • Abstract
        • 1 Introduction
        • 2 Methodology
        • 3 Experiments and Analysis
        • 4 Discussion
        • 5 Related Work
        • 6 Conclusion
        • 7 Limitations
        • 8 Ethical Considerations
        • Appendix A MathBench Statistics
        • Appendix B Detailed Experimental Results
        • Appendix C Extra Analysis
    • 数据集-图片
      • 2306.13394_MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 MME Evaluation Suite
        • 3 Experiments
        • 4 Analysis
        • 5 Conclusion
      • 2307.06281_MMBench: Is Your Multi-modal Model an All-around Player?
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 The construction of MMBench
        • 4 Evaluation Strategy
        • 5 Evaluation Results
        • 6 Conclusion
        • Appendix A More Details about the Data
        • Appendix B More Details on MMBench Construction
        • Appendix C More Details on LLM-based Choice Extraction
        • Appendix D Evaluation Settings and Results
      • 2307.16125_SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 SEED-Bench
        • 4 Evaluation Results
        • 5 Conclusion
      • 2311.12793_ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 ShareGPT4V Dataset
        • 4 ShareGPT4V-7B Model
        • 4.1 模型架构
        • 4.2 预训练
        • 4.3 监督微调(SFT)
        • 总结
        • 5 Experiments
        • 6 Conclusion
        • Appendix A Data Sources
        • Appendix B Caption Analysis
        • Appendix C Prompts
        • Appendix D Examples
      • 2506.18095_ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
        • 总结
        • Abstract
        • 1 Introduction
        • 2 ShareGPT-4o-Image
        • 3 Janus-4o: Fine-Tuning with ShareGPT-4o-Image
        • 4 Experiments
        • 5 conclusion
        • Appendix A Related Work
        • Appendix B Image Generation Categories
        • Appendix C Prompts for Generation
        • Appendix D Document Pipeline
        • Appendix E Ethical Considerations and Societal Impact
    • 数据集
      • 1804.07461_GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 2 相关工作总结
        • 3 Tasks
        • 3.1 Single-Sentence Tasks
        • 3.2 Similarity and Paraphrase Tasks
        • 3.3 Inference Tasks
        • 3.4 Evaluation
        • 4 Diagnostic Dataset
        • 4 诊断数据集(Diagnostic Dataset)
        • 5 Baselines
        • 5 Baselines 总结
        • 6 Benchmark Results
        • 6 Benchmark Results(基准测试结果)
        • 7 Analysis
        • 8 Conclusion
        • 8 结论
        • Acknowledgments
        • 致谢
        • Appendix A Additional Benchmark Details
        • Appendix B Additional Baseline Details
        • Appendix B Additional Baseline Details(附录B 其他基线细节)
        • Appendix C Development Set Results
        • Appendix C Development Set Results 总结
        • Appendix D Benchmark Website Details
        • Appendix D Benchmark Website Details(附录 D 基准网站详情)
        • Appendix E Additional Diagnostic Data Details
        • 附录 E:额外的诊断数据细节
        • 总结
      • 2009.03300_MMLU: Measuring Massive Multitask Language Understanding
        • 总结
        • Abstract
        • 1.Introduction
        • 2.Related Work
        • 3.A Multitask Test
        • 4.Experiments
        • 5.Discussion
        • 6.Conclusion
      • 2305.08322_C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 The C-Eval Evaluation Suite
        • 3 Experiment
        • 4 Related Work
        • 5 Discussion
        • Acknowledgement
        • Appendix A Author Contributions
        • Appendix B Detailed Stats of C-Eval
        • Appendix C Explanation Data Generation
        • Appendix D Evaluation Prompts
        • Appendix E Details of the models being evaluated
        • Appendix F Breakdown of Model Performance
        • Appendix G Option Bias
        • Appendix H Compute and Resources Used for Evaluation
      • 2306.09212_CMMLU: Measuring massive multitask language understanding in Chinese
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 CMMLU
        • 4 Experiments
        • Impact of model size on performance
        • 5 Conclusion
        • Appendix A Comparison to concurrent benchmarks
        • Appendix B CMMLU Subjects
        • Appendix C CMMLU Examples
        • Appendix D CMMLU Difficulty Distribution
        • Appendix E Emergent Ability shown in CMMLU subjects
        • Appendix F Models being Evaluated
        • Appendix G Strategies for Estimating Model Choices
        • Appendix H Regular expressions matching algorithms
        • Appendix I Correlation to other Benchmarks
        • Appendix J Breakdown of Model Performance
        • J.3 The effect of chain-of-thought prompt
      • 2307.15020_SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 SuperCLUE Benchmark
        • 4 Experiments
        • 5 Additional Analysis
        • 6 Conclusion
        • Appendix A Evaluation Process
        • Appendix B Capability Categories
      • 2311.12983_GAIA: a benchmark for General AI Assistants
        • 总结
        • Abstract
        • 1.Introduction
        • 2.Related work
        • 3.GAIA
        • 4.LLMs results on GAIA
        • 5.Discussion
        • 6.Limitations
        • Appendix A Extended related work
        • Appendix C Extended description of GAIA
        • Appendix D Extended description of our question design framework
      • 2311.18743_AlignBench: Benchmarking Chinese Alignment of Large Language Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Dataset
        • 3 Methods
        • 4 Human Evaluation on AlignBench
        • 5 AlignBench: Benchmarking Results
        • 6 Related Work
        • 7 Conclusion
        • Appendix A Appendix
      • 2404.07972_OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
        • 总结
        • Abstract
        • 1. Introduction
        • 2. OSWORLD Environment
        • 3. OSWORLD Benchmark
        • 4. Benchmarking LLM and VLM Agent Baselines
        • 5. Analysis
        • 6. Related Work
        • 7. Conclusion and Future Work
        • A. Details of OSWORLD Environment
        • C. Details of Baseline Methods
        • D. Examples of Qualitative Analysis
      • 2406.04770_WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
        • 总结
        • Abstract
        • 1 Introduction
        • 2 WildBench Data Curation
        • 3 Automatic Evaluation with WildBench
        • 4 Results & Analysis
        • 5 Related Works
        • 6 Conclusion and Future Directions
        • Appendix A Task Categories
        • Appendix B More Information on WildBench Data
        • Appendix C More Information on WildBench Evaluation
        • Appendix D Prompt Template for Pairwise Evaluation Metric WB-Reward
        • Appendix E Prompt Template for Individual Evaluation Metric WB-Score
        • Appendix F Full WildBench Leaderboard
      • 2501.14249_HLE: Humanity’s Last Exam
        • Abstract
        • 1.Introduction
        • 2.Related Work
        • 3.Dataset
        • 4.Evaluation
        • 5.Discussion
  • 记忆
    • 综述
      • 2404.13501_LLM_Agent_Memory_Survey: A Survey on the Memory Mechanism of Large Language Model based Agents
        • 总结
        • 别人的总结
        • Abstract
        • 1 Introduction
        • 2 Related Surveys
        • 3 What is the Memory of LLM-based Agent
        • 4 Why We Need the Memory in LLM-based Agent
        • 5 How to Implement the Memory of LLM-based Agent
        • 5.1 Memory Sources(记忆来源)
        • 5.2 Memory Forms(记忆形式)
        • 5.3 Memory Operations(记忆操作)
        • 6 How to Evaluate the Memory in LLM-based Agent
        • 7 Memory-enhanced Agent Applications
        • 8 Limitations & Future Directions
        • 9 Conclusion
        • 9 结论
        • Acknowledgement
        • 致谢
      • 2505.00675_❇️Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
        • 总结
        • 总结2
        • Abstract
        • 1 Introduction
        • 2 Memory Foundations
        • 3 From Operations to Key Research Topics
        • 4 Memory In Practice
        • 总体总结:
        • 5 Memory in Humans and AI Systems
        • 6 Open Challenges and Future Directions
        • Appendix A GPT-based Pipeline Selection
        • Appendix B Relative Citation Index
        • Appendix C Chord Analysis of Interactions Among Memory Types, Operations, Topics, and Venues
      • 2507.21046_A Survey of Self-Evolving Agents What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence
        • 总结
        • 图解
        • 关键点收集
      • 2512.13564❇️_MemorySurvey: Memory in the Age of AI Agents: A Survey
        • 总结
        • From Moonlight
        • Abstract
        • 1. Introduction
        • 2. Preliminaries: Formalizing Agents and Memory
        • 3. Form: What Carries Memory?
        • 4. Functions: Why Agents Need Memory?
        • 5. Dynamics: How Memory Operates and Evolves?
        • 6. Resources and Frameworks
        • 7. Positions and Frontiers
        • 8. Conclusion
      • Survey on AI Memory: Theories, Taxonomies, Evaluations, and Emerging Trends
        • 总结
        • From Moonlight
    • 上下文
      • 2510.04618_ACE: Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
        • 总结
        • From Moonlight
      • 附录
        • GENERATOR_PROMPT
        • CURATOR_PROMPT
        • CURATOR_PROMPT_NO_GT
        • REFLECTOR_PROMPT
        • REFLECTOR_PROMPT_NO_GT
    • 演进
      • 2504.15228_SICA: A Self-Improving Coding Agent
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Methods
        • 4 Experiments and Results
        • 5 Conclusion
        • 6 Safety Considerations
        • Appendix A Agent Prompts
        • Appendix B Example Traces
        • Appendix C Function Calling Interface
        • Appendix D Additional Result Details
      • 2512.18746_MemEvolve: Meta-Evolution of Agent Memory Systems
        • 总结
        • From Moonlight
      • 2602.02474_MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
        • 总结
        • 关键摘录
        • 附录-SKILL
        • 附录-PROMPT
      • 2603.02766_EvoSkill: Automated Skill Discovery for Multi-Agent Systems
        • 图解
        • 总结
      • 2603.18743_Memento-Skills: Let Agents Design Agents
        • 总结
    • RL
      • 2508.19828_Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
        • 总结
        • From Moonlight
        • Prompt
        • Algorithm
        • Abstract
        • 1 Introduction
        • 1 引言(Introduction)
        • 2 Related Work
        • 2 相关工作(Related Work)总结
        • 3 Method
        • 3 方法总结
        • 4 Experiments
        • 4 实验(Experiments)
        • 5 Conclusion
        • 5 结论(Conclusion)
        • Limitations
        • 局限性(Limitations)总结
        • Appendix A Case Study of Behavior of Agents before and after Fine-tuning
        • 附录A:微调前后智能体行为的案例研究总结
        • Appendix B Dataset Details
        • Appendix C Prompts
        • 附录 C 提示(Prompts)
        • Appendix D Implementation Details
        • 附录 D 实现细节(总结)
        • Appendix E Algorithm
        • Appendix F Extended Results and Type-Level Analysis
        • 附录 F 扩展结果与类型级分析
      • 2601.01885_AgeMem: Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Background and Related Work
        • 3 Method
        • 4 Experiments
        • 5 Conclusion
        • Limitations
        • Appendix A Detailed Design and Implementation of AgeMem
        • Appendix B Case Study: AgeMem in Action
        • Appendix C Experimental Implementation
        • Appendix D Additional Results
    • 通用
      • 1911.00172_kNN-LMs: Generalization through Memorization: Nearest Neighbor Language Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Nearest Neighbor Language Modeling
        • 3 Experimental Setup
        • 4 Experiments
        • 5 Tuning Nearest Neighbor Search
        • 6 Analysis
        • 7 Related Work
        • 8 Conclusion and Future Work
        • Appendix A Appendix
      • 2304.13343_SCMemory: Enhancing Large Language Model with Self-Controlled Memory Framework
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Self-Controlled Memory
        • 总结
        • 3 Experiments
        • 4 Related Work
        • 5 Conclusion
        • Limitations
        • Ethical Considerations
        • Appendix A Prompt List
        • Appendix B Long-term Dialogue QA Cases
        • Appendix C Book Summarization Cases
        • Appendix D Meeting Summarization Cases
      • 2305.11792_Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Method
        • 总结
        • 4 Datasets Collection
        • 5 Experiment
        • 5.1 LLMs 家族与评估细节
        • 5.2 主要实验
        • 5.3 人工评估
        • 6 Analysis
        • 7 Discussion
        • 8 Conclusion
        • Limitations
        • Ethics Statement
        • Acknowledgement
        • Appendix A Templates
        • Appendix B Different Method of Evaluation
        • Appendix C Discussion
        • Appendix D Helpfulness Analysis of Planning Step
      • 2305.17144_GITM: Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
        • From Deepseek
      • 2306.03901_ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 ChatDB
        • 4 Evaluation
        • 4 评估
        • 5 Conclusion
        • 5 结论
      • 2308.10144_ExpeL: LLM Agents Are Experiential Learners
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Preliminaries
        • 4 ExpeL: An Experiential Learning Agent
        • 5 Experiments
        • 6 Conclusion and Limitations
        • Acknowledgement
        • Appendix A Detailed Related Works
        • Appendix B Broader Impacts
        • Appendix C Computational Resources
        • Appendix D Environment Details
        • Appendix E Environment, Agent, Retrieval Parameters
        • Appendix F Prompt Templates
        • Appendix G Example Insights
        • Appendix H Emergent Abilities Showcase
        • Appendix I Example Trajectories
        • Appendix J Additional Quantitative Results
      • 2309.02427_❇️CoALA: Cognitive Architectures for Language Agents
        • 总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 Background: From Strings to Symbolic AGI
        • 3 Connections between Language Models and Production Systems
        • 4 Cognitive Architectures for Language Agents (CoALA): A Conceptual Framework
        • 5 Case Studies
        • 6 Actionable Insights
        • 7 Discussion
        • 8 Conclusion
      • 2310.08560_MemGPT: Towards LLMs as Operating Systems
        • 总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 MemGPT (MemoryGPT)
        • 总结
        • 3 Experiments
        • 4 Related Work
        • 5 Conclusion
        • 6 Appendix
      • 2311.08719_Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory
        • 总结
        • From Deepseek
        • Abstract
        • 1 INTRODUCTION
        • 2 RELATED WORK
        • 3 METHODOLOGY
        • 4. Experiment
        • 5. Conclusion
      • 2312.17653_❇️LARP: Language-Agent Role Play for Open-World Games
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Cognitive Architecture
        • 4 Environment Interaction
        • 5 Personalities
        • 6 Discussions
        • 7 Conclusion
      • 2402.04624_MemoryLLM: Towards Self-Updatable Large Language Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Preliminaries
        • 3 MemoryLLM
        • 4 Experiments
        • 5 Related Work
        • 6 Conclusion and Future Work
        • Impact Statement
        • Appendix A Details in Methodology
        • Appendix B Implementation Details
        • Appendix C Additional Experiments
      • 2402.09727_ReadAgent: A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
        • 总结
        • 别人的总结
        • From Deepseek
      • 2404.11672_MemLLM: Finetuning LLMs to Use Explicit Read-Write Memory
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related work
        • 3 Methodology
        • 4 Experiments
        • 5 Conclusion
        • Limitations
        • Appendix A Memory-write Decoding Method
        • Appendix B Filtering Ambiguous Queries
        • Appendix C Memory-read Data Generation
        • Appendix D Hyperparameters Details
        • Appendix E Filtering Prompt
      • 2407.01178_❇️Memory3: Language Modeling with Explicit Memory
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 | Memory Circuitry Theory
        • 3 | Design
        • 4 | Pretraining Data
        • 5 | Pretrain
        • 6 | Fine-tuning and Alignment
        • 6 | 微调与对齐
        • 7 | Evaluation
        • 8 | Conclusion
        • 8 | 结论
        • Acknowledgement
        • 致谢
        • Appendix A Cost Estimation
        • A.1 | Implicit Memory
        • A.2 | Explicit Memory
        • A.3 | External Information
        • 总结与对比
        • 附注:知识保留问题(Remark 9)
        • Appendix B Vector Compression
        • 附录 B 向量压缩
        • Appendix C Supplementary Evaluation Results
        • 附录 C 补充评估结果总结
      • 2410.15665_LongTermMemory: The Foundation of AI Self-Evolution
        • 总结
        • 别人的总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 AI Self-Evolution
        • 总结
        • 3 LTM for AI Self-Evolution
        • 4 How to Construct LTM?
        • 5 How can LTM be used to achieve model self-Evolution?
        • 6 The Practice of model self-evolution based on LTM
        • 7 Our Future Plans
        • 8 Conclusion
        • Appendix A RTG prompt
      • 2502.00592_M+: Extending MemoryLLM with Scalable Long-Term Memory
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Methodology
        • 4 Experiments
        • 5 Conclusion and Future Work
        • Impact Statement
        • Appendix A Justifications of using deepspeed-stage-2
        • Appendix B Experiments on datasets NaturalQA
        • Appendix C Statistics of the Dataset of Long Documents
        • Appendix D Additional Training Details
        • Appendix E Discussions
      • 2502.12110_A-Mem: Agentic Memory for LLM Agents
        • 总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Methodology
        • 4 Experiment
        • 5 Conclusions
        • 6 Limitations
        • Appendix A Experiment
        • Appendix B Prompt Templates and Examples
      • 2504.15965_❇️From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Overview
        • 3 Personal Memory
        • 4 System Memory
        • 5 Open Problems and Future Directions
        • 6 Conclusion
      • 2504.19413_❇️Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
        • 总结
        • 别人的总结
        • Abstract
        • 1 Introduction
        • 2 Proposed Methods
        • 总结
        • 3 Experimental Setup
        • 总结
        • 4 Evaluation Results, Analysis and Discussion
        • 5 Conclusion and Future Work
        • 6 Acknowledgments
        • Appendix A Prompts
        • Appendix B Algorithm
        • Appendix C Selected Baselines
      • 2505.22101_MemOS: An Operating System for Memory-Augmented Generation (MAG) in LLM (Short Version)
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Memory in Large Language Models
        • 3 MemOS Design Philosophy
        • 4 MemOS
        • 4.1 MemOS 中的记忆类型
        • 4.2 记忆立方体(MemCube):核心资源
        • 4.3 MemOS 架构
        • 4.4 系统执行流程
        • 总结
        • 5 Conclusion
      • 2506.06326❇️_MemoryOS: Memory OS of AI Agent
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 MemoryOS
        • 4 Experiments
        • 5 Conclusion
      • 2505.22101_❇️MemOS: A Memory OS for AI System
        • 总结
        • LLM 总结:
        • Abstract
        • 1 Introduction
        • 2 Memory in Large Language Models
        • 3 MemOS Design Philosophy
        • 4 Memory Modeling in MemOS
        • 5 Architecture of MemOS
        • 6 Evaluation
        • 7 MemOS for Architecture Innovation and Applications
        • 8 Conclusion
      • 2508.09874_Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Background
        • 3 Memory Decoder
        • 4 Experimental Setup
        • 5 Results
        • 6 Analysis
        • 7 Related Work
        • 8 Conclusion
        • 9 Limitations
        • Appendix A Interpolation hyperparameter \(\alpha\) of all tasks
        • Appendix B Analysis of DAPT Performance on Downstream Tasks
        • Appendix C Knowledge-Intensive Reasoning Task Corpus Composition
        • Appendix D Domain-Specific Downstream Tasks
        • Appendix E Comparison with DAPT Model Interpolation
        • Appendix F In-Context Learning Performance Analysis
        • Appendix G Characteristics of \(k\)-NN Distributions
        • Appendix H Alternative Loss Functions for Imitating \(k\)-NN Distributions
      • 2509.06269_REMI: A Novel Causal Schema Memory Architecture for Personalized Lifestyle Recommendation Agents
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Research Objectives
        • 3. Related Work
        • 4. Proposed Method
        • 5. Evaluation Framework
        • 6. Results and Findings
        • 7. Discussion
        • 8. Conclusion
      • 2509.24704_MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
        • 总结
        • From Moonlight
        • From Deepseek&OpenAI
      • 2510.18866_❇️LightMem: Lightweight and Efficient Memory-Augmented Generation
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Preliminary
        • 3 LightMem Architecture
        • 4 Experiments
        • 5 Related Work
        • 6 Conclusion and Future Work
        • Appendix A Usage of LLMs
        • Appendix B Methodology Details
        • Appendix C Experiment Details
        • 附录 C 实验细节总结
        • Appendix D Prompts
        • 附录 D 提示(Prompt)设计
      • 2601.02163_EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning
        • 总结
        • 图解
        • From Moonlight
    • 通用-Github
      • 2509.00xxxx_MemU: 一个前瞻性很强但尚不成熟的记忆框架
        • 主要内容
      • 2511.00xxx_MemMachine
        • 主要内容
    • 记忆相关Agent
      • 2504.10147_PersonalRAG❇️: A Survey of Personalization: From RAG to Agent
        • 总结
        • Abstract
        • 1. Introduction
        • 2. What is Personalization
        • 3. How to Adopt Personalization
        • 4. Where to Adopt Personalization
        • 5. Evaluation and Dataset
        • 6. Challenges and Future Directions
        • 7. Conclusion
      • 2506.07398❇️_G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems
        • 总结
        • 1. Introduction
        • 2 Related Works
        • 3 Preliminary
        • 4 G-Memory
        • 5 Experiment
        • 6 Conclusion & Limitation
        • A Experimental Details
        • B Additional Experiment Results
        • C Prompt Set
        • D Discussion with Related Works
      • 2507.02259_MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 The Proposed MemAgent
        • 总结
        • 4 Experiments
        • 5 Conclusion
        • 6 Computation Complexity
        • 7 Complete Out-Of-Domain Task Results
      • 2507.07957_MIRIX: Multi-Agent Memory System for LLM-Based Agents
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Application & Use Cases
        • 3 Methodology
        • 4 Experiments
        • 5 Related Work
        • 6 Conclusion and Future Work
        • Appendix A Full Experimental Results with Different Runs
      • 2509.25140❇️_ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Methodology
        • 4 Experiments
        • 5 Analysis
        • 6 Conclusion
        • 7 Acknowledgments
        • Appendix A Experiment Details
        • Appendix B Details for Experiment Settings
        • Appendix C Additional Analyses
        • Appendix D Future Directions
        • Appendix E Limitations
    • 记忆相关数据集
      • 2305.10250_MemoryBank: Enhancing Large Language Models with Long-Term Memory
        • 总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 MemoryBank: A Novel Memory Mechanism Tailored for LLMs
        • 总结
        • 3 SiliconFriend: An AI Chatbot Companion Powered by MemoryBank
        • 使用的三种大语言模型
        • SiliconFriend 的开发阶段
        • 总结
        • 重点总结
        • 4 Experiments
        • 5 Related Works
        • 5 相关工作(Related Works)
        • 6 Conclusion
        • 6 结论
      • 2308.08239_MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Methodology
        • 4 Experiments
        • 5 Conclusion
        • Appendix A Basic Published Datasets
        • Appendix B Involved Prompts
        • Appendix C Instruction Design Challenges
        • 1. 引言
        • 2. Prompt Copy(提示复制)
        • 3. Catastrophic Forgetting(灾难性遗忘)
        • 4. Prompt Misplacement(提示错位)
        • 5. 示例任务说明
        • 总结
      • 2402.17753_LoCoMo❇️: Evaluating Very Long-Term Conversational Memory of LLM Agents
        • 总结
        • 别人的总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Generative Pipeline for LoCoMo
        • 4 LoCoMo Evaluation Benchmark
        • 5 Experimental Setup
        • 6 Experimental Results
        • 7 Conclusion
        • 8 Limitations
        • 9 Broader Impacts
        • Appendix Overview
        • Appendix A Generative Pipeline for LoCoMo
        • Appendix B Dataset
        • Appendix C Experimental Setup
        • Appendix D Results
      • 2410.10813_LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 LongMemEval
        • 4 A Unified View of Long-Term Memory Assistants
        • 5 Experiment Results
        • 6 Conclusion
        • Reproducibility Statement
        • Ethics Statement
        • Appendix A Supplemental Details for LongMemEval
        • Appendix B A Human Study on Commercial Memory Chatbots
        • Appendix C Unified Memory View
        • Appendix D Memory Optimizations: Implementation Details
        • Appendix E Extended Analyses
      • 2506.21605_MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Works
        • 3 Dataset Construction
        • 4 Benchmark
        • 5 Conclusion
        • Limitations
        • Ethics Statement
        • Acknowledgments
        • Appendix A Case Studies
        • Appendix B Detailed Data Statistics
        • Appendix C Data Creation Prompt
        • Appendix D Result Details
      • 2507.05257_MemoryAgentBench: Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
        • 总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 MemoryAgentBench
        • 4 Experiments
        • 5 Conclusion and Future Work
        • Appendix A Details of Dataset
        • Appendix B Prompts
        • Appendix C Detailed Experimental Results
        • Appendix D Experimental Settings
      • 2510.27246_BEAM: Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
        • 总结
        • Abstract
        • 1 Introduction
        • 2 BEAM: Benchmarking memory Capabilities of LLMs
        • 3 LIGHT: Improving Memory Capabilities of LLMs
        • 4 Experiments
        • 5 Related Work
        • 6 Conclusion
        • Acknowledgments
        • Appendix A Detailed Related Work
        • Appendix B Benchmark Design
        • Appendix C Detailed Experiments
        • Appendix D Nugget Design
        • Appendix E Examples from Different Components of BEAM
        • Appendix F Case Study
        • Appendix G Prompts
      • 2601.06966_RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction
        • 总结
        • 图解
    • 多模态记忆
      • 2506.05813_MAPLE: Multi-Agent Adaptive Planning with Long-Term Memory for Table Reasoning
        • 总结
        • Abstract
        • 1 Introduction
        • 2 MAPLE Framework
        • 3 Experiments
        • 4 Conclusion
        • Limitations
        • Appendix A Related Work
        • Appendix B Cognitive Architecture
        • Appendix C Memory Evolution Algorithm
        • Appendix D Case Study
        • Appendix E Additional Experimental Results
        • Appendix F Example Prompts
        • 附录 F 示例提示(Example Prompts)
      • 2508.09736_M3-Agent❇️: Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Datasets
        • 4 Approach
        • 5 Experiments
        • 6 Conclusion and Future Work
        • 7 Acknowledgment
        • 8 M3-Bench-robot
        • 9 M3-Bench-web
        • 10 Implementation Details of Tools
        • 11 Demonstration Data Synthesis for Memorization
        • 12 Evaluation of Memorization
        • 13 RL Training Details
        • 14 Case Study
        • 15 Prompt Templates
      • 2509.11914_EgoMem: Lifelong Memory Agent for Full-duplex Omnimodal Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Task Definition and Preliminaries
        • 3 EgoMem
        • 4 Training Details
        • 5 Experiments
        • 6 Conclusion and Future Challenges
        • Acknowledgments
      • 2510.12422_VideoLucy: Deep Memory Backtracking for Long Video Understanding
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Method
        • 3 EgoMem Benchmark
        • 4 Experiments
        • 5 Related Work
        • 6 Conclusion
        • 7 Acknowledgments
        • Appendix A Appendix
    • 参数记忆
      • 1907.05242_PKM: Large Memory Layers with Product Keys
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Related work
        • 3 Learnable product key memories
        • 4 Experiments
        • 5 Conclusion
      • 2305.02437_Selfmem: Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory
        • 总结
        • From Moonlight
      • 2407.04153_PEER: Mixture of A Million Experts
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Method
        • 3 Experiments
        • 4 Related Works
        • 5 Conclusion
        • Acknowledgments
      • 2412.09764_Memory+: Memory Layers at Scale
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 1 引言(Introduction)
        • 2 Related work
        • 2 相关工作(Related Work)
        • 3 Memory Augmented Architectures
        • 4 Experimental setup
        • 4 实验设置(Experimental setup)
        • 5 Scaling results
        • 5 扩展结果总结
        • 6 Implications and shortcomings of the work
        • 6 工作的意义与不足
      • 2508.18756_UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Approach
        • 4 Experiments
        • 5 Conclusion
        • 6 Optimized Initialization
        • 7 Evaluation Benchmark
        • 8 Open-source model hyperparameters
    • 图结构记忆
      • 1905.05460_Cognitive Graph for Multi-Hop Reading Comprehension at Scale
        • Abstract
        • 1 Introduction
        • 2 Cognitive Graph QA Framework
        • 3 Implementation
        • 4 Experiment
        • 5 Related work
        • 6 Discussion and Conclusion
      • 2405.14831_HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
        • Abstract
        • 1 Introduction
        • 2 HippoRAG
        • 3 Experimental Setup
        • 4 Results
        • 5 Discussions
        • 6 Related Work
        • 7 Conclusions & Limitations
        • Appendices
        • Appendix A HippoRAG Pipeline Example
        • Appendix B Dataset Comparison
        • Appendix C Ablation Statistics
        • 附录 C 消融实验统计(Ablation Statistics)
        • Appendix D Intrinsic OpenIE Evaluation
        • 附录 D 内在的 OpenIE 评估
        • Appendix E Case Study on Path-Finding Multi-Hop QA
        • 附录E:路径查找多跳问答案例研究总结
        • Appendix F Error Analysis
        • Appendix G Cost and Efficiency Comparison
        • 附录 G 成本与效率对比
        • Appendix H Implementation Details & Compute Requirements
        • 附录 H 实现细节与计算需求
        • Appendix I LLM Prompts
        • 附录I:大语言模型提示
    • 应用-推荐
      • Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions
        • 论文基本信息
        • 核心内容简介
        • 重要性与影响
        • 总结
      • 08xx.xxxxx_SVD++: Factorization meets the neighborhood: a multifaceted collaborative filtering model
        • SVD++
        • Neighborhood Models
        • Latent Factor Models(潜在因子模型)
      • Recommender systems: An overview of different approaches to recommendations
        • 论文简介
        • 核心内容总结
        • 总结
      • 1902.07153_SGCN: Simplifying Graph Convolutional Networks
        • 总结
        • 前提知识
        • Abstract
        • 1 Introduction
        • 2 Simple Graph Convolution
        • 3 Spectral Analysis
        • 4 Related Works
        • 5 Experiments and Discussion
        • 6 Conclusion
        • Acknowledgement
        • Appendix A The spectrum of \(\tilde{\Delta}_{\mathrm{sym}}\)
        • Appendix B Experiment Details
        • Appendix C Additional Experiments
      • 1905.08108_NGCF: Neural Graph Collaborative Filtering
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Methodology
        • 3. Related Work
        • 4. Experiments
        • 5. Conclusion and Future Work
      • 2001.10167_RGCF: Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach
        • 总结
        • Abstract
        • Introduction
        • Preliminaries and Related Work
        • Linear Residual Graph Convolutional Collaborative Filtering
        • Experiments
        • Conclusions
      • 2002.02126_LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Preliminaries
        • 3. Method
        • 4. Experiments
        • 5. Related Work
        • 6. Conclusion and Future Work
      • 2010.10783_SGL: Self-supervised Graph Learning for Recommendation
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Preliminaries
        • 3. Methodology
        • 4. Experiments
        • 5. Related Work
        • 6. Conclusion and Future Work
        • Appendix A Gradient of InfoNCE Loss w.r.t. node representation
      • 2112.08679_SimGCL: Are Graph Augmentations Necessary? Simple Graph Contrastive Learning for Recommendation
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Investigation of Graph Contrastive Learning in Recommendation
        • 3. SimGCL: Simple Graph Contrastive Learning for Recommendation
        • 4. Experimental Results
        • 5. Related Work
        • 6. Conclusion
        • Acknowledgement
      • 2202.06200_NCL: Improving Graph Collaborative Filtering with Neighborhood-enriched Contrastive Learning
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Preliminary
        • 3. Methodology
        • 4. Experiments
        • 5. Related work
        • 6. Conclusion And Future Work
        • Appendix A Pseudo-code for NCL
        • Appendix B Case Study on Selected Neighbors
      • 2203.13366_RLP_P5: A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Personalized Prompt Collection
        • 4. The P5 Paradigm and Model
        • 5. Experiments
        • 6. Conclusions and Future Work
        • Acknowledgment
        • Appendix D Full List of Personalized Prompts for Amazon Datasets
      • 2302.08191_LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Methodology
        • 4 Evaluation
        • 5 Conclusion
        • Appendix A Details of the Baselines
        • Appendix B Performance Comparison with Baselines (Continued)
        • Appendix C Theoretical Analysis
        • Appendix D Calculation of Complexity
        • Appendix E Performance Results under the New Setting
      • 2303.14524_ChatRec: Towards Interactive and Explainable LLMs-Augmented Recommender System
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Method
        • 4 Experiment
        • 5 Conclusion
        • Appendix 0.A Implementation Details
      • 2305.00447_TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation
        • 总结
        • Abstract
        • 1. Introduction
        • 2. TALLRec
        • 3. Experiments
        • 4. Related Work
        • 5. Conclusion
      • 2305.07001_InstructRec: Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Methodology
        • 3. Experiments
        • 4. Conclusion and Future Work
        • Appendix A Instruction Templates for Traditional Recommendation
        • Appendix B Instruction Templates for Traditional Product search
        • Appendix C Instruction Templates for Personalized Search
      • 2306.10933_KAR: Towards Open-World Recommendation with Knowledge Augmentation from Large Language Models
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Preliminaries
        • 4. Methodology
        • 5. Experiment
        • 6. Broader Impact
        • 7. Conclusion
      • 2308.11131_ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Preliminaries
        • 3. Methodology
        • 4. Experiment
        • 5. Related Work
        • 6. Conclusion
        • Appendix A Prompt Illustration
        • Appendix B Data Preprocessing
        • Appendix C Baseline Implementation
        • 总结
        • Appendix D Additional Experiments
      • 2310.15950_RLMRec: Representation Learning with Large Language Models for Recommendation
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methodology
        • 4. Evaluation
        • 5. Conclusion
        • Appendix A Supplementary Material
      • 2311.01343_CLLM4Rec: Collaborative Large Language Model for Recommender Systems
        • 总结
        • Abstract
        • 1. Introduction
        • 本节贡献(Contribution)
        • 2. Related Work
        • 2. 相关工作
        • 3. Methodology
        • 4. Empirical Study
        • 5. Conclusion
        • Acknowledgment
        • Appendix A Technical Details
        • Appendix B Experiments
      • 2502.18965_OneRec: Unifying Retrieve and Rank with Generative Recommender and Preference Alignment
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methods
        • 4. System Deployment
        • 5. Experiment
        • 总结
        • 6. Conclusion
      • 2508.20900_OneRec-V2 Technical Report
        • 总结 From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Lazy Decoder-Only Architecture
        • 3 Preference Alignment with Real-World User Interactions
        • 4 Online A/B Test
        • 5 Conclusion, Limitations, and Future Directions
        • Appendix
        • Appendix A Contributions
        • Appendix B Computational Complexity of Different Architecture
        • Appendix C Empirical Results
        • Appendix D Online Performance with Caching Disabled
      • 2510.11639_OneRec-Think: In-Text Reasoning for Generative Recommendation
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Preliminary
        • 4 Methodology
        • 5 Experiments
        • 6 Conclusion
        • Limitations
        • Ethics Statement
        • Appendix A Appendix
      • 2511.11255_Align3GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation
        • 总结
        • Abstract
        • Introduction
        • Related Works
        • Methodology
        • Experiments
        • Conclusion
  • LLM 模型
    • NLP 模型
      • 1810.04805_BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
        • 1 Introduction
        • 2 Related Work
        • 3 BERT
        • Appendix A Additional Details for BERT
      • 18xx_GPT1: Improving Language Understanding by Generative Pre-Training
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Framework
        • 4 Experiments
        • 5 Analysis
        • 6 Conclusion
        • 引文口碑
        • 要点解读
      • 19xx_GPT2: Language Models are Unsupervised Multitask Learners
        • The Illustrated GPT-2
        • 参考
      • 2006.03654_DeBERTa: Decoding-enhanced BERT with Disentangled Attention
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Background
        • 3 The DeBERTa Architecture
        • 4 Scale Invariant Fine-Tuning
        • 4 尺度不变微调 (Scale Invariant Fine-Tuning)
        • 5 Experiment
        • 6 Conclusions
        • 7 Acknowledgments
        • Appendix A Appendix
      • 2012.00413_CPM: A Large-scale Generative Chinese Pre-trained Language Model
      • 2302.13971_LLaMA: Open and Efficient Foundation Language Models
      • 2307.09288_Llama 2: Open Foundation and Fine-Tuned Chat Models
      • 2309.16609_Qwen Technical Report
        • 1. Introduction
        • 2. Pretraining
        • 3. Alignment
        • 4. CODE-QWEN: SPECIALIZED MODEL FOR CODING
        • 5. MATH-QWEN: SPECIALIZED MODEL FOR MATHEMATICS REASONING
        • 6. Related Work
        • 7. Conclusion
        • A.1 MORE TRAINING DETAILS
        • A.2 EVALUATION
      • 2310.19341_Skywork: A More Open Bilingual Foundation Model
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Methodology
        • 3 Pre-training
        • 4 Evaluation
        • 5 Discussion
        • 6 Limitation
        • 7 Conclusion
        • Appendix A Details on GPT-7B vs. LLaMA-7B Experiment
        • Appendix B Preliminary Experiments on Distributed Training
        • Appendix C More Benchmark Results
        • Appendix D Details on LM Test Sets
      • 2401.14196_DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence
      • 2404.06395_MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
        • 5. Two Stage Pre-training Strategy
        • 6. Model
        • 7 MiniCPM Family
      • 2405.04434_DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
      • 2406.12793_ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
      • 2407.10671_Qwen2 Technical Report
        • Abstract
        • 1. Introduction
        • 2. Tokenizer & Model
        • 3. Pre-training
        • 4. Post-training
        • 5. Evaluation
        • 6. Conclusion
      • 2412.15115_Qwen2.5
        • Abstract
        • 1. Introduction
        • 2. Architecture and Tokenizer
        • 3. Pre-training
        • 4. Post-training
        • 5. Evaluation
        • 6. Conclusion
      • 2505.09388_Qwen3
        • Abstract
        • 1. Introduction
        • 2. Architecture
        • 3. Pre-training
        • 4. Post-training
        • 5. Conclusion
      • 2508.06471_GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Pre-Training
        • 3 Post-Training: Expert Model Iteration
        • 4 Evaluation
        • 5 Conclusion
        • 6 Contribution
    • 多模态模型
      • 2112.15093_CTR: Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study
        • Abstract
        • 1. Introduction
        • 2. Preliminaries
        • 3. Datasets
        • 4. Baselines
        • 5. An Empirical Study
        • 6. Conclusions
        • Appendix A Details of PRAB
        • Appendix C Visualization of Failure Cases
      • 2304.08485_LLaVA: Visual Instruction Tuning
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. GPT-assisted Visual Instruction Data Generation
        • 4. Visual Instruction Tuning
        • 5. Experiments
        • 6. Conclusion
      • 2308.12966_Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
        • Methodology
        • Training
        • Evaluation
        • B. Data Format Details of Training
      • 2310.03744_LLaVA2: Improved Baselines with Visual Instruction Tuning
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Approach
        • 4. Empirical Evaluation
        • 5. Open Problems in LMMs
        • 6. Conclusion
        • A. Implementation Details
        • B. Qualitative Results
      • 2312.07533_VILA: On Pre-training for Visual Language Models
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. On Pre-training for Visual Language Models
        • 4. Experiments
        • 5. Related Work
        • 6. Conclusion
      • 2403.05525_DeepSeek-VL: Towards Real-World Vision-Language Understanding
        • Abstract
      • 2408.01800_MiniCPM-V: A GPT-4V Level MLLM on Your Phone
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Model Architecture
        • 4. Training
        • 5. End-side Deployment
        • 6. Experiments
        • 7. Conclusion
      • 2409.17146_Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
        • Abstract
        • 1. Introduction
        • 2. Architecture
        • 3. Data
        • 4. Training
        • 5. Evaluation
        • 6. Ablations
        • Appendix A: Model Details
        • Appendix B: Training Details
        • Appendix C: Evaluation Results
        • Appendix D: Result Details
        • Appendix E Ablations Details
        • Appendix F Data Details
        • Appendix G Dataset Examples
        • Appendix H Related Work
      • 2410.13848_Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
        • 总结
        • LLM总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Janus: A Simple, Unified and Flexible Multimodal Framework
        • 4 Experiments
        • 5 Conclusion
        • Appendix
        • Appendix A Details of Semantic Tokenizer Mentioned in Ablation Study
        • Appendix B Additional Qualitative Results
      • 2411.00774_Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
        • Abstract
        • 1. Introduction
        • 2. Model
        • 3. Experiments
        • 4. Conclusion and Future Work
      • 2412.04468_NVILA: Efficient Frontier Visual Language Models
        • Abstract
        • 1. Introduction
        • 2. Approach
        • 3. Experiments
        • 4. More Capabilities
        • 5. Related Work
        • 6. Conclusion
      • 2502.13923_Qwen2.5-VL
        • Abstract
        • 1. Introduction
        • 2. Approach
        • 3. Experiments
        • 4. Conclusion
      • 2505.14683_BAGEL: Emerging Properties in Unified Multimodal Pretraining
        • 总结
        • From Deepseek
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Model
        • 3 Data
        • 4 Training
        • 5 Evaluation
        • 6 Emerging Properties
        • 7 Main Results
        • 8 Conclusion
        • 9 Acknowledgement
      • 2506.13642_Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Stream-Omni
        • 3.2.1 Data Construction
        • 4. Experiments
        • 5. Results and Analyses
        • 6. Conclusion
        • Limitations
        • Appendix A Construction of InstructOmni
        • Appendix B Construction of SpokenVisIT
        • Appendix C Case Study
      • 2507.05595_PaddleOCR 3.0 Technical Report
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Core Capabilities
        • 3 Codebase Architecture Design
        • 4 Deployment
        • 5 Conclusion
        • Appendix A Acknowledgments
        • Appendix B Usage of command and API details
        • Appendix C More details on MCP host configuration
      • 2510.14528_PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
        • 总结
        • Abstract
        • 1 Introduction
        • 2 PaddleOCR-VL
        • 3 Dataset
        • 4 Evaluation
        • 5 Conclusion
        • Appendix A Training Dataset Details
        • Appendix B Supported Languages
        • Appendix C Inference Performance on Different Hardware Configurations
        • Appendix D Real-world Samples
        • Appendix E Compare with Others
    • Embedding 模型
      • 2506.05176_Qwen3_Embedding: Advancing Text Embedding and Reranking Through Foundation Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Model Architecture
        • 3 Models Training
        • 4 Evaluation
        • 4.1 Settings
        • 4.2 Main Results
        • 4.3 Analysis
        • 总结
        • 5 Conclusion
        • Appendix A Appendix
    • LLM 音频
      • 2005.08100_Conformer: Convolution-augmented Transformer for Speech Recognition
        • LLM总结
        • Abstract
        • 1 Introduction
        • 2 Conformer Encoder
        • 3 Experiments
        • 4 Conclusion
      • 2106.07447_HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
        • 总结
        • LLM 总结
        • Abstract
        • I Introduction
        • II Method
        • III Related Work
        • IV Experimental Details
        • V Results
        • VI Conclusion
      • 2112.02418_YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
        • 关键概念
        • Abstract
        • 1. Introduction
        • 2. YourTTS Model
        • 3. Experiments
        • 4. Results and Discussion
        • 5. Zero-Shot Voice Conversion
        • 6. Speaker Adaptation
        • 7. Conclusions, limitations and future work
      • 2212.04356_Whisper: Robust Speech Recognition via Large-Scale Weak Supervision
        • Abstract
        • 1. Introduction
        • 2. Approach
        • 3. Experiments
        • 4. Analysis and Ablations
        • 5. Related Work
        • 6. Limitations and Future Work
        • 7. Conclusions
        • A. Evaluation Datasets
        • B Compared Models
        • C. Text Standardization
      • 2301.02111_Vall-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Background: Speech Quantization
        • 4. VALL-E
        • 5. Experiments
        • 6. Conclusion, Limitations, and Future Work
      • 2303.03926_VALL-E_X: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3 Cross-Lingual Codec Language Model
        • 4. VALL-E X Application
        • 5. Experiments
        • 6. Conclusion
        • A. Appendix
      • 2406.05370_VALL-E2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. VALL-E 2
        • 4. Experiments
        • 5. Conclusion
      • 2407.05407_CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
        • Abstract
        • 1. Introduction
        • 2. CosyVoice: A Scalable TTS model using Supervised Semantic Tokens
        • 3. Dataset
        • 4. Experimental Settings
        • 6. Conclusion
      • 2407.10759_Qwen2-Audio Technical Report
        • Abstract
        • 1. Introduction
        • 2. Methodology
        • 3. Experiments
        • 5. Conclusion
      • 2410.00037_Moshi: a speech-text foundation model for real-time dialogue
        • Abstract
        • 1.Introduction
        • 2.Related Work
        • 3.Model
        • 4. Datasets and Training
        • 5. Evaluation
        • 6.Safety
        • 7.Conclusion
      • 2412.10117_CosyVoice2: Scalable Streaming Speech Synthesis with Large Language Models
        • Abstract
        • 1. Introduction
        • 2. CosyVoice 2
        • 3. Experimental Settings
        • 4. Experimental Results
        • 5. Conclusion
      • 2501.06282_MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
        • Abstract
        • 1.Introduction
        • 2.Related Work
        • 3.MinMo
        • 4.Experiments
        • 5.Conclusion
        • 6.Limitations
        • A. Prompts for Voice Understanding Tasks
      • 2505.02707_Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Voila: Voice-Language Foundation Models
        • 4. Experiments
        • 5. Conclusion
      • 2505.17589_CosyVoice3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
        • From LLM
        • Abstract
        • 1.Introduction
        • 2.CosyVoice 3
        • 3.The Multilingual Data Pipeline
        • 4.Experimental Settings
        • 5.Experimental Results
        • 6.Conclusion
        • 7.Limitations
      • 2512.20156_Fun-Audio-Chat Technical Report
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Methodology
        • 3 Experiments
        • 4 Conclusion
        • 5 Limitations
        • 6 Contributions and Acknowledgments
    • LLM 视频
      • 2301.12597_BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Method
        • 4 Experiment
        • 5 Limitation
        • 6 Conclusion
      • 2308.01390_OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
        • Abstract
        • 1 Introduction
        • 2 Related work
        • 3 Approach
        • 4 Results
        • 5 Discussion
        • 6 Conclusion
        • Appendix A Extended results
        • Appendix B Additional notes on filtering MMC4
        • Appendix C Synthetic data prompt
        • Appendix D Image credits
      • 2503.20215_Qwen2.5-Omni Technical Report
        • Abstract
        • 1. Introduction
        • 2. Architecture
        • 3. Pre-training
        • 4. Post-training
        • 5. Evaluation
        • 6. Conclusion
    • LLM MoE
      • 2408.15664_AUXILIARY-LOSS-FREE LOAD BALANCING STRATEGY FOR MIXTURE-OF-EXPERTS
      • 2410.07490_MoDEM: Mixture of Domain Expert Models
      • 2601.07372_Engram: Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Architecture
        • 3 Scaling Laws and Sparsity Allocation
        • 4 Large Scale Pre-training
        • 5 Long Context Training
        • 6 Analysis
        • 7 Related Work
        • 8 Conclusion
        • Appendix A Detailed Model Architecture and Hyper Parameters
        • Appendix B Full Benchmark Curves
        • Appendix C Case Study of Tokenizer Compression
    • 商业模型
      • 2303.08774_GPT-4 Technical Report
      • 2312.11805_Gemini: A Family of Highly Capable Multimodal Models
        • Abstract
        • 1. Introduction
        • 2. Model Architecture
        • 3. Training Infrastructure
        • 5. Evaluation
        • 6. Post-Training Models
        • 7. Responsible Deployment
        • 8. Discussion and Conclusion
      • 2403.05530_Gemini1.5: Unlocking multimodal understanding across millions of tokens of context
      • 2406.02430_Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
        • Abstract
        • 1 Introduction
        • 2 Method
        • 3 Experiments
        • 4 Model extensions
        • 5 Model applications, limitations, and safety
        • 6 Authors (alphabetical order)
        • 7 Acknowledgement
      • 2407.04675_Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
        • Abstract
        • 1 Introduction
        • 2 Motivation
        • 3 Methods
        • 4 Model and Evaluation
        • 5 Conclusion
        • Appendix A Appendix
      • 2503.20020_Gemini2: Gemini Robotics: Bringing AI into the Physical World
      • 2504.xxxxx_Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning
      • 2505.07062_Seed1.5-VL Technical Report
        • Abstract
        • 1 Introduction
        • 2 Architecture
        • 3 Pre-training
        • 3.2 Training Recipe
        • 4 Post-training
        • 4.4 Hybrid Reinforcement Learning
        • 5 Training Infrastructure
        • 6 Evaluation
        • 6.1.3 Video Task Evaluation
        • 6.3.2 Comparison with State-of-the-arts
        • 7 Conclusion and Next Steps
        • 8 Contributions and Acknowledgments
        • 9 Qualitative examples
        • 9.7 Visual Reasoning: Visual Pattern Recognition
        • 9.19 Failure Cases: Combinatorial Search I
        • 10 Evaluation Details
        • DREAM-1K
  • LLM 周边技术
    • Framework
      • 1712.05889_Ray: A Distributed Framework for Emerging AI Applications
        • Abstract
        • 1. Introduction
        • 2. Motivation and Requirements
        • 3. Programming and Computation Model
        • 4. Architecture
        • 5. Evaluation
        • 6 Related Work
        • 7 Discussion and Experiences
        • 8. Conclusion
      • 1910.02054_DeepSpeed_ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
        • Abstract
        • 1. Extended Introduction
        • 2. Related Work
        • 3 Where Did All the Memory Go?
        • 4 ZeRO: Insights and Overview
        • 5 Deep Dive into ZeRO-DP
        • 6 Deep Dive into ZeRO-R
        • 7 Communication Analysis of ZeRO-DP
        • 8. Communication Analysis of ZeRO-R
        • 9. Step Towards 1 Trillion Parameters
        • 10. Implementation and Evaluation
        • 11. Concluding Remarks
      • PyTorch: An Imperative Style, High-Performance Deep Learning Library
      • Transformers: State-of-the-Art Natural Language Processing
      • 2210.XX_Ray v2 Architecture
        • Overview
        • Architecture Overview
        • Object Management
        • Task Management
        • Resource Management and Scheduling
        • Actor Management
        • Global Control Service
        • Cluster Management
        • Appendix
      • 2309.06180_vLLM: Efficient Memory Management for Large Language Model Serving with PagedAttention
        • 总结
        • 1. Introduction
        • 2. Background
        • 3. Memory Challenges in LLM Serving
        • 4. Method
        • 5. Implementation
        • 6. Evaluation
        • 7. Ablation Studies
        • 10. Conclusion
      • ❇️2312.07104_SGLang: Efficient Execution of Structured Language Model Programs
        • 总结
        • OpenAI GPT-4总结
        • Qwen-Plus总结
        • Abstract
        • 1 Introduction
        • 2 Programming Model
        • 3 Efficient KV Cache Reuse with RadixAttention
        • 4 Efficient Constrained Decoding with Compressed Finite State Machine
        • 5 Efficient Endpoint Calling with API Speculative Execution
        • 6 Evaluation
        • 7 Related Work
        • 8 Future Directions and Conclusion
        • Acknowledgement
        • Appendix A Additional Details on RadixAttention
        • Appendix B Additional Details on Compressed Finite State Machine
        • Appendix C Additional Experimental Setups and Results
        • Appendix D Compiler Mode
    • 大模型调优
      • 2101.00190_Prefix-Tuning: Optimizing Continuous Prompts for Generation
      • 2103.10385_p-tuning: GPT Understands, Too
      • 2104.08691_Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning
      • 2106.09685_LoRA: Low-Rank Adaptation of Large Language Models
      • 2401.01335_Self-Play: Fine-Tuning Converts Weak Language Models to Strong Language Models
      • 2402.09353_DoRA: Weight-Decomposed Low-Rank Adaptation
      • 2402.12354_LoRA+: Efficient Low Rank Adaptation of Large Models
      • 2403.03507_GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
      • 2403.13372_LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
        • 竞争框架
        • 3. Efficient Fine-Tuning Techniques
        • 4 LlamaFactory Framework
        • 6 Conclusion and Future Work
      • 2510.08396_FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Revisiting MoE-based LoRA Methods
        • 3 FlyLoRA
        • 4 Experiments
        • 5 Discussion
        • 6 Related Work
        • 7 Conclusion
        • 8 Acknowledgments
        • NeurIPS Paper Checklist
        • Appendix A Theoretical Analysis
        • Appendix B Additional Results
        • Appendix C Detailed Experimental Setting
        • Appendix D Limitations and Future Work
        • Appendix E Broader Impact
    • 通用技术
      • 🏀常用
        • 余弦退火
      • 2505.06708_Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Gated-Attention Layer
        • 3 Experiments
        • 4 Analysis: Non-Linearity, Sparsity, and Attention-Sink-Free
        • 5 Related Works
        • 6 Conclusion
        • Limitations
        • Appendix A Supplement Experiments
      • 2510.29xxx_NL: Nested Learning: The Illusion of Deep Learning Architecture
        • 总结 From Zhihu
        • 总结 From Moonlight
    • 长上下文
      • 2510.07318_AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Related work
        • 3 Method
        • 4 Experiments
        • 5 Conclusion and discussion
        • Acknowledgement
        • 6 AHN instantiation
        • 7 Additional benchmark results
    • 大模型编辑
      • 2405.16720_LAW: Large Scale Knowledge Washing
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Preliminary
        • 4 Problem Setup
        • 5 Methodology
        • 6 Experiments
        • 总结
        • 7 Conclusion, Limitation, and Future Work
        • Ethics Statement
        • Reproducibility Statement
        • Appendix A Mathematical Details of Preliminary
        • Appendix B Implementation Details
        • Appendix C Additional Experiments
      • 2410.00487_SELF-PARAM: Self-Updatable Large Language Models by Integrating Context into Model Parameters
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Methodology
        • 4 Experiments
        • 5 Conclusion and Future Work
        • Ethics Statement
        • Reproducibility Statement
        • Appendix A Additional Settings
        • Appendix B Additional Experiments
      • 2410.02355_AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Preliminary
        • 3 Method
        • 4 Experiment
        • 5 Related Work
        • 6 Limitations & Future Discussion
        • 7 Conclusion
        • Ethics Statement
        • Reproducibility
        • Acknowledgement
        • Appendix A Experimental Setup
        • Appendix B Implementation Details of Current Model Editing & Related Proofs
        • Appendix C More Experimental Results
        • Appendix D Visualizing the Counterfact and ZSRE Datasets Through Examples
    • 分布式模型
      • 1701.06538_MoE: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
      • 1806.03377_PipeDream: Fast and Efficient Pipeline Parallel DNN Training
        • Abstract
        • 1. Introduction
        • 2. Background & Related Work
        • 3. Parallel Training in PipeDream
        • 4. Implementation
        • 5. Evaluation
      • 1811.06965_GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
        • 收集
        • 1. Introduction
        • 2. The GPipe Library
        • 3. Performance Analyses
        • 4. Image Classification
        • 5. Massive Massively Multilingual Machine Translation
        • 6. Design Features and Trade-Offs
      • 1909.08053_Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
        • 收集
        • Abstract
        • 1. Introduction
        • 2. Background and Challenges
        • 3. Model Parallel Transformers
      • 19xx_PipeDream: Generalized Pipeline Parallelism for DNN Training
        • 收集
        • ABSTRACT
        • 1. Introduction
        • 2. BACKGROUND AND RELATED WORK
        • 3. Pipeline Parallelism
        • 4. Implementation
        • 6. Conclusion
      • 2006.09503_PipeDream-2BW: Memory-Efficient Pipeline-Parallel DNN Training
        • Abstract
      • 2006.15704_PyTorch Distributed: Experiences on Accelerating Data Parallel Training
      • 2006.16668_GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
      • 2104.04473_Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
        • Abstract
      • 2205.14135_FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
        • Abstract
        • 1. Introduction
        • 2 Background
        • 3. FLASHATTENTION: Algorithm, Analysis, and Extensions
        • 4. Experiments
        • 5. Limitations and Future Directions
        • Appendix A Related Work
        • Appendix B Algorithm Details
        • Appendix C Proofs
        • Appendix D Extension Details
      • 2307.08691_FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. FlashAttention-2: Algorithm, Parallelism, and Work Partitioning
        • 4. Empirical Validation
        • 5. Discussion and Future Directions
      • 通用
    • LLM 量化
      • 通用
        • 混合精度
        • 浮点数格式
        • weight-only quantization
      • 2110.02861_bitsandbytes: 8-bit Optimizers via Block-wise Quantization
        • Abstract
        • 1. Background
        • 2. 8-bit Optimizers
        • 3. 8-bit vs 32-bit Optimizer Performance for common Benchmarks
        • 4. Analysis
        • 5. Related Work
      • 2206.01861_ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Background and Challenges
        • 4. Methodology
        • 5. Results
        • 6. Conclusions
        • Appendix A Background
        • Appendix D Details about System Optimization
      • 2206.09557_LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. Design Methodology of LUT-GEMM
        • 4. Experimental results
        • 5. Accelerating Quantized OPT-175B
        • 6. Conclusion
        • Appendix A LLM Inference Latency Breakdown
        • Appendix B Detailed Implementation
      • 2208.07339_LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
        • 相关参考
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. Int8 Matrix Multiplication at Scale
        • 4. Emergent Large Magnitude Features in Transformers at Scale
        • 5. Related Work
        • 6. Discussion and Limitations
        • 7. Broader Impacts
        • 其他
      • 2209.05433_FP8: FP8 Formats For Deep Learning
        • Abstract
        • 1. Introduction
        • 2. Aspects of FP8 Usage in Deep Learning
        • 3. FP8 Binary Interchange Format
        • 示例讲解
        • 4. Empirical Results
        • 5. Conclusions
      • 2210.17323_GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Background
        • 4. The GPTQ Algorithm
        • 5. Experimental Validation
        • 6. Summary and Limitations
      • 2211.10438_SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
        • Abstract
        • 1. Introduction
        • 2. Preliminaries
        • 3. Review of Quantization Difficulty
        • 4. SmoothQuant
        • 5. Experiments
        • 6. Related Work
        • 7. Conclusion
        • Appendix A. Discussion on Weight-Only Quantization
      • 2305.14314_QLoRA: Efficient Finetuning of Quantized LLMs
        • 关键词
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. QLoRA Finetuning
        • 4. QLoRA vs. Standard Finetuning
        • 5. Pushing the Chatbot State-of-the-art with QLoRA
        • 6. Qualitative Analysis
        • 7. Related Work
        • 8. Limitations and Discussion
      • 2306.00978_AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. AWQ: Activation-aware Weight Quantization
        • 4. TinyChat: Mapping AWQ onto Edge Platforms
        • 5. Experiments
        • 6. Conclusion
      • 2309.05516_AutoRound: Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methodology
        • 4. Experiments
        • 5. Conclusion
    • 图神经网络模型
      • 1812.08434_GNNs: Graph Neural Networks: A Review of Methods and Applications
        • 论文解读
        • 结论
        • Abstract
        • 1. Introduction
        • 2. General design pipeline of GNNs
        • 3. Instantiations of computational modules
        • 4. Variants considering graph type and scale
        • 5. Variants for different training settings
        • 6. A design example of GNN
        • 7. Analyses of GNNs
        • 8. Applications
        • ✅ 总结表格(图像 vs 文本):
        • 9. Open problems
        • 10. Conclusion
        • Appendix A. Datasets
    • LLM 安全
      • 2312.06674_Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
    • LLM强化学习
      • 🏀常用
        • 三大模型
        • 时序差分残差
        • Bradley-Terry模型
        • 马尔可夫决策过程
        • 动态规划
        • 贝尔曼方程
        • Q-learning
      • ❇️1502.05477_TRPO: Trust Region Policy Optimization
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Preliminaries
        • 3 Monotonic Improvement Guarantee for General Stochastic Policies
        • 4 Optimization of Parameterized Policies
        • 5 Sample-Based Estimation of the Objective and Constraint
        • 6 Practical Algorithm
        • 7 Connections with Prior Work
        • 8 Experiments
        • 9 Discussion
        • Appendix A Proof of Policy Improvement Bound
        • Appendix B Perturbation Theory Proof of Policy Improvement Bound
        • Appendix C Efficiently Solving the Trust-Region Constrained Optimization Problem
        • Appendix D Approximating Factored Policies with Neural Networks
        • Appendix E Experiment Parameters
        • Appendix F Learning Curves for the Atari Domain
      • 1602.01783_A3C: Asynchronous Methods for Deep Reinforcement Learning
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Reinforcement Learning Background
        • 4 Asynchronous RL Framework
        • 5 Experiments
        • 6 Conclusions and Discussion
        • 7 Optimization Details
        • 8 Experimental Setup
        • 9 Continuous Action Control Using the MuJoCo Physics Simulator
      • ❇️1707.06347_PPO: Proximal Policy Optimization Algorithms
        • 总结
        • From DeepSeek
        • 示例-FromDeepseek
      • ❇️2203.02155_InstructGPT: Training language models to follow instructions with human feedback
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related work
        • 3. Methods and experimental details
        • 4. Results
        • 5. Discussion
        • Appendix A Additional prompt data details
        • Appendix B Additional human data collection details
        • Appendix C Additional model details
        • Appendix D Automatic evaluation details
      • ❇️2305.18290_DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Preliminaries
        • 4 Direct Preference Optimization
        • 5 Theoretical Analysis of DPO
        • 6 Experiments
        • 7 Discussion
        • Author Contributions
        • Appendix A Mathematical Derivations
        • Appendix B DPO Implementation Details and Hyperparameters
        • Appendix C Further Details on the Experimental Set-Up
        • Appendix D Additional Empirical Results
      • 2310.12036_ΨPO: A General Theoretical Paradigm to Understand Learning from Human Preferences
        • From Moonlight
      • 2402.03300_DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 1.1 Contributions
        • 1.2 Summary of Evaluations and Metrics
        • 2 Math Pre-Training
        • 总结
        • 3 Supervised Fine-Tuning
        • 总体总结
        • 4 Reinforcement Learning
        • 5 Discussion
        • 6 Conclusion, Limitation, and Future Work
        • Appendix A Appendix
      • ❇️2409.19256_HybridFlow: A Flexible and Efficient RLHF Framework
        • 总结
        • LLM总结
        • From Moonlight
        • Abstract
        • 1. Introduction
        • 2. Background and Motivation
        • 3. HybridFlow Overview
        • 4. Hybrid Programming Model
        • 5. 3D-HybridEngine
        • 6. Auto Device Mapping
        • 7. Implementation
        • 8. Evaluation
        • 9. Discussions
        • 10. Related Work
        • 11. Conclusion
        • Appendix A Primitive APIs in HybridFlow
        • Appendix B Transfer Protocols
        • Table 4: Key Functions Provided by Each Model Class
        • Algorithm 2: Auto Parallelism Algorithm
        • Appendix C Auto-Parallelism Algorithm
      • ❇️2503.14476_DAPO: An Open-Source LLM Reinforcement Learning System at Scale
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Preliminary
        • 3 DAPO
        • 4 Experiments
        • 5 Conclusion
        • Contributions
        • Acknowledgments
        • 6 Dataset Transformation
        • 7 Supplementary Case
      • 其他
        • 1703.03864_Evolution Strategies: as a Scalable Alternative to Reinforcement Learning
        • 2305.14387_AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
        • 2401.08417_CPO: Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
        • 2403.00409_Provably Robust DPO: Aligning Language Models with Noisy Feedback
        • 2504.02495_DeepSeek-GRM: Inference-Time Scaling for Generalist Reward Modeling
        • 2504.13958_ToolRL: Reward is All Tool Learning Needs
    • 其他
      • 2305.20050_Let’s Verify Step by Step
        • 1. 研究背景
        • 2. 监督方法对比
        • 3. 核心发现
        • 总结
      • 2408.03314_Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
        • 1. Introduction
        • 3. How to Scale Test-Time Computation Optimally
        • 5. Scaling Test-Time Compute via Verifiers
        • 6. Refining the Proposal Distribution
        • 其他
      • 2412.14135_Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
        • FromGPT
        • 1. Introduction
        • 2. Background
        • 3. Policy Initialization
        • 4. Reward Design
        • 5. Search
        • 6. Learning
        • 7 Open-source o1 Project
        • 8. Future Directions
  • 机器学习
    • 近邻搜索
      • 10xx.xxxxx_PQ: Product Quantization for Nearest Neighbor Search
        • 总结
        • From Deepseek
        • From Deepseek 全文总结
        • 周边概念
      • 1603.09320_HNSW: Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
        • 总结
        • From Deepseek
        • From Deepseek 全文总结
      • 2007.00808_ANCE: Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Preliminaries
        • 3 Analyses on The Convergence of Dense Retrieval Training
        • 4 Approximate Nearest Neighbor Noise Contrastive Estimation
        • 5 Experimental Methodologies
        • 6 Evaluation Results
        • 7 Related Work
        • 8 Conclusion
        • Appendix A Appendix
        • 总体总结
    • Embedding
      • 1603.09320_HNSW: Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
        • 总结
        • From Deepseek
      • 2004.04906_DPR: Dense Passage Retrieval for Open-Domain Question Answering
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Background
        • 3 Dense Passage Retriever (DPR)
        • 4 Experimental Setup
        • 5 Experiments: Passage Retrieval
        • 6 Experiments: Question Answering
        • 7 Related Work
        • 8 Conclusion
        • Acknowledgments
        • Appendix A Distant Supervision
        • Appendix B Alternative Similarity Functions & Triplet Loss
        • Appendix C Qualitative Analysis
        • Appendix D Joint Training of Retriever and Reader
      • 2205.12035_RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related works
        • 3 Methodology
        • 4 Experimental Studies
        • 5 Conclusion
        • 6 Limitations
      • 2205.13147_MRL: Matryoshka Representation Learning
        • 总结
        • DeepSeek 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Matryoshka Representation Learning
        • 4 Applications
        • 5 Further Analysis and Ablations
        • 6 Discussion and Conclusions
        • Acknowledgments
        • Appendix A Code for Matryoshka Representation Learning
        • Appendix B Datasets
        • Appendix C Matryoshka Representation Learning Model Training
        • Appendix D Classification Results
        • Appendix E Image Retrieval
        • Appendix F Adaptive Retrieval
        • Appendix G Few-shot and Sample Efficiency
        • Appendix H Robustness Experiments
        • Appendix I In Practice Costs
        • Appendix J Analysis of Model Disagreement
        • Appendix K Ablation Studies
    • ML Vision
      • 1506.02640_You Only Look Once: Unified, Real-Time Object Detection
        • Abstract
      • 1612.08242_YOLO9000: Better, Faster, Stronger
        • Abstract
      • 1804.02767_YOLOv3
      • 2004.10934_YOLOv4: Optimal Speed and Accuracy of Object Detection
        • Abstract
      • 2205.00159_SVTR: Scene Text Recognition with a Single Visual Model
        • Abstract
        • 1. Introduction
        • 2. Method
        • 3. Experiments
        • 4. Conclusion
      • 2207.02696_YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
        • Abstract
      • 2303.05499_Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
      • 2304.08485_Visual Instruction Tuning
      • 2402.13616_YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
        • Abstract
      • 2405.14458_YOLOv10: Real-Time End-to-End Object Detection
        • Abstract
      • 2411.15858_SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
        • 定义
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methods
        • 4 Experiments
        • 5. Conclusion
        • 8. More detail of real-world datasets
    • ML
      • 2108.00941_Human-in-the-loop: A Survey of Human-in-the-loop for Machine Learning
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Data Processing
        • 3 Model Training and Inference
        • 4 System construction and Application
        • 5 Discussion and Future Directions
        • 6 Conclusion
      • 2112.09332_WebGPT: Browser-assisted question-answering with human feedback
      • 2203.11147_GopherCite: Teaching language models to support answers with verified quotes
      • 2304.09848_Generative_Search: Evaluating Verifiability in Generative Search Engines
      • 2305.14251_FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
      • 2305.14627_ALCE: Enabling Large Language Models to Generate Text with Citations
        • NLI 在引用质量评估中的应用
        • 论文中用的prompt
      • 2307.02185_Citation: A Key to Building Responsible and Accountable Large Language Models
      • 2307.16883_HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
  • AI Agent
    • 通用 Agent
      • 2210.03629_ReAct
      • 2303.08268_Chat-with-the-Environment
        • 正文
      • 2303.11366_Reflexion: Language Agents with Verbal Reinforcement Learning
      • 2303.16434_TaskMatrix.AI
        • 大脑
        • 接口平台
        • API 选择器
      • 2304.03442_Generative-Agents
        • Generative Agent Architecture
      • 2307.07924_ChatDev: Communicative Agents for Software Development
      • 2308.00352_MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
      • 2308.04026_AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
      • 2308.08155_AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
      • 2308.10848_AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
        • 理念
      • 2310.06117_Step-Back: Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
      • 2312.04511_LLMCompiler: An LLM Compiler for Parallel Function Calling
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 2.1. Latency Optimization in LLMs(LLMs的延迟优化)
        • 2.2. Plan and Solve Strategy(计划与求解策略)
        • 2.3. Tool-Augmented LLMs(工具增强的LLMs)
        • 3. Methodology
        • 3.1. Function Calling Planner(功能调用规划器)
        • 3.2. Task Fetching Unit(任务获取单元)
        • 3.3. Executor(执行器)
        • 3.4. 动态重规划(Dynamic Replanning)
        • 4. LLMCompiler Details
        • 4.1. 用户提供的信息(User-Supplied Information)
        • 4.2. 流式Planner(Streamed Planner)
        • 5. Results
        • 6. Conclusions
        • 致谢(Acknowledgements)
        • A. Accuracy Analysis: ReAct vs. LLMCompiler
        • B. Failure Case Analysis of LLMCompiler
        • C. Related Work
        • D. Experimental Details
        • E. Analysis
        • 总结
        • F. Additional Discussions about Related Works
        • G. User-Supplied Examples for LLMCompiler Configuration
        • G.1 电影推荐示例提示语(Movie Recommendation Example Prompts)
        • G.2 24点游戏示例提示语(Game of 24 Example Prompts)
        • H. Pre-defined LLMCompiler Planner Prompts
        • I. ParallelQA Benchmark Generation
        • J. Details of the Game of 24 and the Tree-of-Thoughts Approach
        • K. Details of WebShop Experiments
      • 2402.18679_MetaGPT_DI: Data Interpreter: An LLM Agent For Data Science
        • INTRODUCTION
      • 2407.07061_IoA: Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
        • 2.1 OVERVIEW OF IOA
        • 2.2 ARCHITECTURE OF IOA
        • 2.3 KEY MECHANISMS
        • 2.5 Putting It All Together
      • 2408.08435_ADAS: Automated Design of Agentic Systems
        • Prompt
      • 2410.10762_AFlow: Automating Agentic Workflow Generation
        • Introduction
        • PRELIMINARY
      • 2410.17238_SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning
        • 1 Introduction
        • 2 Related Works
        • 3 Method
      • 2410.21012_FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval
        • Introduction
      • 2504.01990_Advances and Challenges in Foundation Agents
      • 2506.12508_AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving
        • Abstract
        • 1. Introduction
        • 3. AgentOrchestra
        • 4. Experiments
      • 2510.08842_Maple: A Multi-agent System for Portable Deep Learning across Clusters
        • 总结
        • Abstract
        • I Introduction
        • II Background
        • III System Design
        • IV Implementation
        • V Experiments
        • VI Error Analysis
        • VII Related Work
        • VIII Conclusion
    • DeepResearch
      • 2509.13313_ReSum: Unlocking Long-Horizon Search Intelligencevia Context Summarization
        • 总结
        • From Moonlight
        • Abstract
        • 摘要(Abstract)总结
        • 1 Introduction
        • 1 引言(Introduction)
        • 2 Preliminary
        • 2. 预备知识(Preliminary)
        • 3 Methodology
        • 3 方法(Methodology)
        • 4 Experiments and Analysis
        • 5 Related Works
        • 5 相关工作总结
        • 6 Conclusion
        • 6 结论(Conclusion)
        • Appendix A Algorithm Pseudo-Code
        • Appendix B Prompt
        • Appendix C Implementation Details
        • 附录 C 实现细节(Appendix C Implementation Details)
        • Appendix D Discussion with MEM1
        • Appendix E Supplementary Materials for Experiments
        • 附录 E 实验补充材料
        • 示例 user goal:“Extract number of specimens used in the study comparing jump performances of C. canis and C. felis felis”…
        • 章节标题:Jump Performance Comparison of Ctenocephalides canis and Ctenocephalides felis felis
      • 2510.21618_❇️DeepAgent: A General Reasoning Agent with Scalable Toolsets
        • 总结
        • From Moonlight
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methodology
        • 4. Experimental Settings
        • 5. Experimental Results
        • 6. Conclusion
        • Appendix A Datasets
        • Appendix B Baselines
        • Appendix C Implementation Details
        • Appendix D Memory Schema
        • Appendix E Case Study
    • 视觉 Agent&AIOS
      • 2108.03353_Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Dataset Creation
        • 4. Model Design
        • 其它
      • 2209.08199_ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Problem Setting: Tasks and Metrics
        • 4. Data Annotation
        • 5. Dataset Analysis
        • 6. Experiments and Baselines
        • 7. Conclusion
        • 8. Limitations
        • 9. Ethical Considerations
        • A. Data Annotation Details
        • B. Data Examples
      • 2212.06817_RT-1: ROBOTICS TRANSFORMER FOR REAL-WORLD CONTROL AT SCALE
        • ABSTRACT
        • 1. Introduction
        • 2. Related Work
        • 3. Preliminaries
        • 4. System Overview
        • 5. RT-1: ROBOTICS TRANSFORMER
        • 6. EXPERIMENTS
        • 7. CONCLUSIONS, LIMITATIONS AND FUTURE WORK
        • B. MODEL CARD
        • C. MODEL AND DATA
        • D. EXPERIMENTS
      • 2312.13771_AppAgent: Multimodal Agents as Smartphone Users
        • 3.1 Environment and Action Space
        • 3.2 Exploration Phase
        • 3.3 Deployment Phase
      • 2401.10935_SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
        • Abstract
        • 1. Introduction
        • 2. Related work
        • 3. Approach
        • 4. ScreenSpot: A Grounding Benchmark
        • 5. Experiments
        • 6. Conclusion
        • Limitations
        • Ethical considerations
        • A. Details of SeeClick Pre-training
        • B ScreenSpot Annotation & Evaluation
        • C. Downstream Agent Tasks
      • 2402.04615_ScreenAI: A Vision-Language Model for UI and Infographics Understanding
        • Abstract
        • 1. Introduction
        • 2. Methodology
        • 3. Automatic data generation
        • 4. Data Mixtures
        • 5. Experiments and Results
        • 6. Conclusions
        • A Definitions of Metrics
        • B. Screen Schema Examples
        • C. Prompts For LLM Generated Content
        • D. Screen Navigation Generated Examples
        • F. ScreenQA Short Answers Generation
        • G. Complex Question Answering Datasets
        • H. New Benchmarks Repositories
      • 2402.07939_UFO: A UI-Focused Agent for Windows OS Interaction
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. The Design of UFO
        • 4. Experiment
        • 5. Limitations & Lessons Learned
        • 6. Conclusion
      • 2403.16971_AIOS: LLM Agent Operating System
        • Abstract
        • 1. Introduction
        • 2. The Architecture of AIOS
        • 3. AIOS Kernel
        • 4 Evaluation
        • Appendix E Discussion
      • 2406.01014_Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
      • 2411.00820_AutoGLM: Autonomous Foundation Agents for GUIs
        • 总结
        • Abstract
        • 1 Introduction
        • 2 AutoGLM: Techniques and Insights
        • 3 Results
        • 3.1 在 Web 上的评估
        • 3.2 在 Android 上的评估
        • 4 Conclusion
      • 2411.02059_TableGPT2: A Large Multimodal Model with Tabular Data Integration
        • Abstract
      • 2501.11733_Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
        • Abstract
        • 1. Introduction
        • 2. Mobile-Agent-E
        • 3. Experiments
        • 4. Results
        • 5. Related Work
        • 6. Conclusion and Future Work
        • Appendix A Full Trajectory Comparison Example with Previous SOTA
        • Appendix B Error Recovery with Escalation to Manager
        • Appendix C Remaining Limitations
        • Appendix D All Tasks in Mobile-Eval-E Benchmark
        • Appendix E Atomic Operation Space
        • Appendix F Full list of Self-Evolved Shortcuts
        • Appendix G Full list of Self-Evolved Tips
      • 2501.12326_UI-TARS: Pioneering Automated GUI Interaction with Native Agents
        • Abstract
        • 1. Introduction
        • 2. Evolution Path of GUI Agents
        • 3. Core Capabilities of Native Agent Model
        • 4. UI-TARS
        • 5. Experiment
        • 6. Conclusion
      • 2502.14282_PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
        • Abstract
        • 1. Introduction
        • 2. PC-Agent
        • 3. Experiments
        • 4. Related Work
        • 5. Conclusion
      • 2504.14603_UFO2: The Desktop AgentOS
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. System Design of UFO2
        • 4. Picture-in-Picture Interface
        • 5. Implementation and Specialized Engineering Design
        • 6. Evaluation
        • 7. Discussion & Future Work
        • 8. Related Work
        • 9. Conclusion
      • 2508.04037_SEA: Self-Evolution Agent with Step-wise Reward for Computer Use
        • 总结
        • Abstract
        • I Introduction
        • I 引言
        • II Related Works
        • II 相关工作
        • 总结
        • III Method
        • 总结
        • IV Experiments
        • IV 实验
        • V Conclusion
        • V 结论
    • 音频 Agent
      • 2509.06221_Beamforming-LLM: What, Where and When Did I Miss?
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methods
        • 4. Results
        • 5. Discussion and Conclusion
    • Tools
      • 2205.00445_MRKL
      • 2302.04761_Toolformer: Language Models Can Teach Themselves to Use Tools
      • 2303.17580_HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
      • 2307.16789_ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
        • 总结
        • LLM总结
        • Abstract
        • 1 Introduction
        • 2 Dataset Construction
        • 3 Experiments
        • 4 Related Work
        • 5 Conclusion
        • Appendix
        • Appendix A Implementation Details
    • AGI
      • 1905.10985_AI-GA: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence
      • 2408.06292_The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
  • RAG
    • 2005.11401_Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
    • 2312.10997_Retrieval-Augmented Generation for Large Language Models: A Survey
      • II. Overview of RAG
        • II-A Naive RAG
        • II-B Advanced RAG
        • II-C Modular RAG
        • II-D RAG vs Fine-tuning
      • III. Retrieval
        • III-A Retrieval Source
        • III-B Indexing Optimization
        • III-C Query Optimization
        • III-D Embedding
        • III-E Adapter
      • IV. Generation
        • IV-A Context Curation
        • IV-B LLM Fine-tuning
      • V. Augmentation process in RAG
        • V-A Iterative Retrieval
        • V-B Recursive Retrieval
        • V-C Adaptive Retrieval
      • VI. Task and Evaluation
        • VI-A Downstream Task
        • VI-B Evaluation Target
        • VI-C Evaluation Aspects
        • VI-D Evaluation Benchmarks and Tools
      • VII. Discussion and Future Prospects
        • VII-A RAG vs Long Context
        • VII-B RAG Robustness
        • VII-C Hybrid Approaches
        • VII-D Scaling laws of RAG
        • VII-E Production-Ready RAG
        • VII-F Multi-modal RAG
    • 2401.15884_CRAG: Corrective Retrieval Augmented Generation
    • 2403.14403_Adaptive-RAG
    • 2404.12457_RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
      • 总结
      • Abstract
      • 1. Introduction
        • 1. 引言概述
        • 2. 现有工作与局限
        • 3. RAGCache系统
        • 4. 实验结果
        • 5. 主要贡献
      • 2. Background
      • 3. RAG System Characterization
        • 一、性能瓶颈分析
        • 二、优化机会分析 —— 缓存中间状态
        • 总结
      • 4. RAGCache Overview
        • 主要内容总结如下:
        • 总结
      • 5. RAGCache Design
        • 5.1. Cache Structure and Replacement Policy
        • 5.2. Cache-aware Reordering
        • 5.3 动态推测流水线(Dynamic Speculative Pipelining)
        • 总结
      • 6. Implementation
        • 系统实现
        • 向量搜索优化(Pipelined Vector Search)
        • 容错机制(Fault Tolerance)
      • 7. Evaluation
        • 7.1 总体性能
        • 7.2 通用设置下的案例研究
        • 7.3 消融研究
        • 7.4 调度时间
        • 总结
      • 8. Discussion
      • 9. Related Work
      • 10. Conclusion
    • 2404.16130_GraphRAG: From Local to Global: A GraphRAG Approach to Query-Focused Summarization
      • 总结
      • LLM 总结
      • Abstract
      • 1 Introduction
      • 2 Background
        • 2.1 RAG方法与系统
        • 2.2 知识图谱在LLM与RAG中的应用
        • 2.3 自适应基准测试
        • 2.4 RAG评估标准
      • 3 Methods
        • 3.1 GraphRAG 工作流程
        • 3.2 全局理解问题生成
        • 3.3 全局理解评估标准
        • 总结
      • 4 Analysis
        • 4.1 实验1
        • 4.2 实验2
        • 总结
      • 5 Results
        • 5.1 实验一:不同方法在摘要任务中的表现比较
        • 5.2 实验二:基于声明的指标评估
        • 总结
      • 6 Discussion
        • 6.1 评估方法的局限性
        • 6.2 未来工作
        • 更广泛的影响
      • 7 Conclusion
      • Appendix A Entity and Relationship Extraction Approach
        • 1. 实体与关系抽取方法
        • 2. 自我反思(Self-Reflection)技术
        • 3. 分块大小与抽取效果的关系
        • 4. 实验结果(图3)
        • 总结
      • Appendix B Example Community Detection
      • Appendix C Context Window Selection
      • Appendix D Example Answer Comparison
      • Appendix E System Prompts
        • E.1 实体实例生成(Element Instance Generation)
        • E.2 社区摘要生成(Community Summary Generation)
        • E.3 社区问题回答生成(Community Answer Generation)
        • E.4 全局问题回答生成(Global Answer Generation)
      • Appendix F Evaluation Prompts
        • F.1 Relative Assessment Prompt
        • F.2 Relative Assessment Metrics
      • Appendix G Statistical Analysis
        • 统计方法:
        • 主要结果总结:
        • 总体趋势:
        • 重要结论:
    • 2405.16506_GRAG: Graph Retrieval-Augmented Generation
      • 总结
      • LLM 总结
      • Abstract
      • 1 Introduction
      • 2 Related Work
        • 2.1 Prompt Tuning
        • 2.2 LLMs在图相关任务中的应用
        • 2.3 图上的检索方法
      • 3 Problem Formalization
      • 4 Methodology
        • 概述
        • 4.1 文本子图检索
        • 文本子图索引(Indexing)
        • 文本子图排序(Ranking)
        • 文本子图软剪枝(Soft Pruning)
        • 总结
        • 4.2 Textual Graph Augmented Generation
        • 1. 文本视图(Text View of Textual Graphs)
        • 2. 图视图(Graph View of Textual Graphs)
        • 3. 生成阶段(Generation Phase)
        • 总结
      • 5 Experiments
        • 总结:第五章 实验部分
      • 6 Conclusion
      • 7 Limitations
      • Acknowledgments
      • Appendix A Appendix
        • 附录A 总结
        • 总结
    • 2406.13213_Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata
    • 2410.05779_LightRAG: Simple and Fast Retrieval-Augmented Generation
      • 总结
      • Abstract
      • 1 Introduction
      • 2 Retrieval-Augmented Generation
      • 3 The LightRAG Architecture
        • 一、LightRAG架构概述
        • 二、基于图的文本索引(Graph-based Text Indexing)
        • 三、双层检索范式(Dual-level Retrieval Paradigm)
        • 四、检索增强的答案生成(Retrieval-Augmented Answer Generation)
        • 五、复杂度分析
        • 总结
      • 4 Evaluation
        • 1. 实验设置(4.1 Experimental Settings)
        • 2. LightRAG 与现有 RAG 方法的对比(4.2 RQ1)
        • 3. 消融实验(4.3 RQ2)
        • 总结
        • 4.4 Case Study (RQ3)
        • 4.4 案例研究(RQ3)总结:
        • 4.5 模型成本与适应性分析(RQ4)总结:
        • 总体结论:
      • 5 Related Work
        • 第5章 相关工作(总结)
      • 6 Conclusion
      • 7 Appendix
    • 2410.10450_KBLaM: Knowledge Base augmented Language Model
      • Abstract
      • 1. Introduction
      • 2. Related work
      • 3. Background
        • Self-attention layer
      • 4. Augmenting LLM with the KB
        • Knowledge tokens
        • Rectangular Attention: Injecting knowledge token into prompt tokens
        • KB length generalization through attention score scaling
      • 5. KB instruction tuning
      • 6. EXPERIMENTS
        • 6.1 EXPERIMENT SETTING
        • 6.2 EXPERIMENT RESULTS
        • 总结亮点
      • 7. CONCLUSION
      • 8. LIMITATIONS AND FUTURE WORK
      • Appendix A Extended related work
      • Appendix B Ablation study
      • Appendix C Sample KB
      • SAMPLE Q&A
      • PROMPT
        • PROMPT FOR SYNTHETIC KB GENERATION
        • Prompt for open-ended Q&A generation
        • PROMPT FOR GPT EVALUATION OF OPEN-ENDED Q&A
        • PROMPT FOR LLAMA EVALUATION
        • QUESTION TEMPLATE
      • SAMPLE OUTPUT
        • SYNTHETIC KB
        • ENRON
    • 2504.03137_LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph
      • Abstract
      • Introduction
      • Related Work
        • LLM Prompt Engineering
        • KG-based LLM Reasoning
      • Preliminaries
        • 1. Knowledge Graph (KG)
        • 2. Anchor Entities
        • 3. Relation Link
        • 4. Reasoning Path
      • Methodology
        • Stage1: Reasoning Graph Retrieval
        • Stage2: Knowledge Embedding
        • Stage3: Knowledge Prompts Mixed Reasoning
      • Experiments
      • Conclusion
    • GraphRAG 官方文档
      • Indexing
        • Indexing Architecture
        • Indexing Dataflow
        • Prompt Tuning
      • Query
  • 论文池
    • 2501.12948❇️_DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 1. Introduction(引言)
        • 2. Approach(方法)
        • 3. Experiment(实验)
        • 4. Discussion(讨论)
        • 5. Conclusion, Limitations, and Future Work(结论、局限与未来工作)
        • A Contributions and Acknowledgments(贡献与致谢)
        • 总结重点
      • 1 Introduction
      • 1.1 Contributions(贡献)
        • 后训练:基于基础模型的大规模强化学习
        • 蒸馏:小模型也能很强大
      • 1.2 Summary of Evaluation Results(评估结果概要)
        • 推理任务
        • 知识任务
        • 其他任务
        • 小结
      • 2 Approach
      • 2.1 Overview(概述)
      • 2.2 DeepSeek-R1-Zero: 强化学习应用于基础模型
        • 2.2.1 强化学习算法
        • 2.2.2 奖励建模
        • 2.2.3 训练模板
        • 2.2.4 DeepSeek-R1-Zero的性能、自进化过程与“顿悟时刻”
      • 2.3 DeepSeek-R1: 强化学习结合冷启动
        • 2.3.1 冷启动(Cold Start)
        • 2.3.2 推理导向的强化学习
        • 2.3.3 拒绝采样与监督微调(SFT)
        • 2.3.4 面向所有场景的强化学习
      • 2.4 蒸馏:将推理能力赋予小型模型
        • 总结
      • 3 Experiment
        • 3 Experiment 实验部分总结
        • 3.1 DeepSeek-R1 评估
        • 3.2 蒸馏模型评估
        • 总体总结
      • 4 Discussion
      • 4 讨论
        • 4.1 知识蒸馏与强化学习
        • 4.2 不成功的尝试
        • 总结
      • 5 Conclusion, Limitations, and Future Work
        • 总结
        • 局限性与未来工作
        • 总结重点
      • Appendix
        • 1. 附录的作用
        • 2. 常见附录内容
        • 3. 附录的编写规范
        • 4. 注意事项
      • Appendix A Contributions and Acknowledgments
      • 附录 A 贡献与致谢
        • 贡献者
        • 特别说明
        • 生成信息
    • 2504.03182_Graphiti: Bridging Graph and Relational Database Queries
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 关键点总结:
        • 附加信息:
      • 1. Introduction
        • 背景与问题
        • 研究目标
        • 核心贡献
        • 方法流程(图1)
        • 实验与实现
        • 总结贡献
      • 2. Motivating Example
        • 2.1 图数据库与关系数据库的对应关系
        • 2.2 SQL 与 Cypher 查询的语义差异
        • 2.3 数据库转换器(Database Transformer)
        • 2.4 诱导关系模式(Induced Relational Schema)
        • 2.5 语法导向的转译(Syntax-Directed Transpilation)
        • 2.6 查询等价性验证
        • 总结
      • 3. Preliminaries
        • 3. Preliminaries(预备知识)
        • 总结
      • 4. Problem Statement
        • 4. 问题陈述(Problem Statement)总结
        • 4.1 数据库转换语言(Language for Database Transformers)
        • 4.2 等价性检查问题(Equivalence Checking Problem)
        • 总结
      • 5. Equivalence Checking Algorithm
        • 概述
        • 5.1. 诱导关系模式和标准转换器推断
        • 5.2. 语法导向的转译
        • 5.3. 归约到SQL等价性检查
      • 6. Evaluation
        • Benchmarks(基准测试集)
        • 6.1. 使用 BMC 后端的 Graphiti 评估(VeriEQL)
        • 6.2. 使用演绎验证器的 Graphiti 评估(Mediator)
        • 6.3. 转译质量评估
        • 总结
      • 7. Related Work
        • 1. SQL 的自动推理(Automated reasoning for SQL)
        • 2. 数据库实例之间的迁移(Migration between database instances)
        • 3. 数据表示重构(Data representation refactoring)
        • 4. 图数据库查询语言(Graph database query languages)
        • 5. 数据库查询测试(Testing database queries)
        • 6. Cypher 查询转译工具(Transpiling Cypher queries)
        • 总结
      • 8. Limitation
        • 主要局限:
        • 重点说明:
        • 实用性验证:
        • 未来方向:
        • 总结:
      • 9. Conclusion and Future Work
        • 9. 结论与未来工作
      • Appendix A Semantics of Cypher Queries
        • 查询语义
        • 子句语义
        • 路径模式语义
        • 表达式语义
        • 谓词语义
      • Appendix B Transpilation of Cypher Predicates and Expressions
        • 1. 表达式的转译规则(Figure 21)
        • 2. 谓词的转译规则(Figure 22)
        • 示例 B.1
        • 总结
      • Appendix C An Equivalent Cypher Query of Motivating Example
        • 原始 Cypher 查询的问题
        • 修正后的 Cypher 查询
        • 查询结构(重点内容)
        • 总结
      • Appendix D Qualitative Analysis of Manually-Written Buggy Queries
        • 1. 使用嵌套 MATCH 而非存在性模式(Existential Pattern)
        • 2. 错误使用路径模式(Path Pattern)进行 OPTIONAL MATCH
        • 3. 同一标签的节点或边使用不当
        • 总结
      • Appendix E Comparing Graphiti’s Transpiler with OpenCypherTranspiler
        • 原文结构总结
        • 1. 总体比较
        • 2. 转译结果表格分析(Table 5)
        • 3. OpenCypherTranspiler 的典型错误示例
        • 总结
      • Appendix F Proofs
        • 定理 F.1(翻译的正确性)
        • 引理 F.2
        • 引理 F.3
        • 引理 F.4
        • 引理 F.5
        • 定理 F.6(翻译的完备性)
        • 引理 F.7
        • 引理 F.8
        • 引理 F.9
        • 引理 F.10
        • 引理 F.11
        • 引理 F.12
        • 定理 F.13(正确性)
        • 定理 F.14(完备性)
    • 2507.19849_Agentic Reinforced Policy Optimization
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 1 Introduction
        • 背景与动机
        • 现有方法的局限性
        • 问题分析与观察
        • 提出方法:ARPO
        • 实验与结果
        • 主要贡献总结
      • 2 Preliminary
        • 2.1 基于智能体的强化学习(Agentic Reinforcement Learning)
        • 2.2 推理过程中的Token熵分析(Analyzing Token Entropy in Agentic Reasoning)
        • 2.3 智能体工具设计(Agentic Tool Design)
      • 3 Agentic Reinforce Policy Optimization
        • 3. Agentic Reinforce Policy Optimization (ARPO)
        • 总结
      • 4 Experiment
        • 4.1 数据集
        • 4.2 基线方法
        • 4.3 训练指南
        • 4.4 评估指标
        • 4.5 主要结果
        • 4.6 定量分析
        • 4.7 ARPO的扩展性分析
      • 5 Related Work
        • 5.1 可验证奖励的强化学习(Reinforcement Learning with Verifiable Reward)
        • 5.2 代理式强化学习(Agentic Reinforcement Learning)
        • 总结
      • 6 Conclusion
        • 核心内容讲解:
        • 小结:
      • Appendix A Datasets
        • A.1 Mathematical Reasoning Benchmarks
        • A.2 Knowledge-Intensive Reasoning Benchmarks
        • A.3 Deep Search Benchmarks
        • 总结
      • Appendix B Baselines
        • 附录 B 基线模型
        • 总结
      • Appendix C Implementation Details
        • 附录 C 实现细节总结
        • 总结
      • Appendix D Theoretical Analysis and Proofs
        • D.1 软优势估计的理论分析
        • D.2 GPG 定理的理论证明
      • Appendix E The Algorithm Workflow of ARPO
        • 输入参数
        • 算法流程
        • 输出
        • 重点内容总结
        • 不重要内容精简
      • Appendix F Case Study
        • 表 4:HLE 数据集中的一个例子
        • 表 5:GAIA 数据集中的一个例子
        • 表 6:GAIA 数据集中的另一个例子
        • 表 7:HLE 数据集中的另一个例子
        • 表 8:AIME24 数据集中的一个例子
        • 表 9:HotpotQA 数据集中的一个例子
    • 2511.20857_Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 1 Introduction
        • 核心问题:记忆系统的静态性
        • 现有基准的局限性
        • 解决方案:Evo-Memory
        • 覆盖任务类型与记忆模块
        • 新方法:ExpRAG 与 ReMem
        • 贡献总结
      • 2 Related Work
        • 2.1 测试时学习
        • 2.2 自演化记忆
        • 图3:ReMem 智能体框架概述
      • 3 Evo-Memory: Evaluating Self-Evolving Memory in LLM Agents
        • 概述
        • 3.1 问题设定(Problem Formulation)
        • 3.2 ExpRAG: Experience Retrieval and Aggregation
        • 3.3 ReMem: Synergizing Reasoning, Acting, and Memory
        • 总结
      • 4 Experiments
        • 4.1 实验设置
        • 4.2 实验
        • 4.3 结果分析(RQ1)
        • 4.4 记忆改进分析(RQ2)
        • 4.5 任务序列:简单 vs. 困难(RQ3)
        • 4.6 反馈分析(RQ4)
        • 4.7 时间步性能(RQ5)
      • 5 Conclusion
      • 附录(Appendix)
        • 1. 2.1 Test-time Learning(测试时学习)
        • 2. 2.2 Self-evolving Memory(自演化记忆)
        • 3.1 Problem Formulation(问题定义)
        • 3.2 ExpRAG: Experience Retrieval and Aggregation(经验检索与聚合)
        • 3.3 ReMem: Synergizing Reasoning, Acting, and Memory(推理、行为与记忆的协同)
        • 4.1 Experimental Setup(实验设置)
        • 4.2 Experiments(实验设计)
        • 4.3 Analysis of Results (RQ1)(结果分析 - 研究问题1)
        • 4.4 Analysis of Memory Improvement (RQ2)(记忆改进分析 - 研究问题2)
        • 4.5 Task Sequence: Easy vs. Hard (RQ3)(任务顺序影响 - 研究问题3)
        • 4.6 Analysis of Feedback (RQ4)(反馈机制分析 - 研究问题4)
        • 4.7 Performance w.r.t Time Steps (RQ5)(时间步长性能分析 - 研究问题5)
      • 6. A Experimental Details(实验细节)
        • A.1 Datasets(数据集详情)
        • A.2 Configuration(配置参数)
        • A.3 Evaluation(评估细节)
        • A.4 Methods(方法实现细节)
      • 7. B Experiments(补充实验)
        • B.1 Additional Experiments(附加实验)
        • B.2 Additional Analysis of Memory Pruning(记忆剪枝分析)
        • B.3 Additional Comparative Curves on Single-turn Tasks(单轮任务对比曲线)
      • 8. C Prompts(提示模板)
      • 9. D Limitations(局限性)
      • 10. E Use of Large Language Models(大语言模型使用说明)
      • Appendix A Experimental Details
        • A.1 数据集(Datasets)
        • A.2 配置(Configuration)
        • A.3 评估(Evaluation)
        • A.4 方法(Methods)
        • 总结
      • Appendix B Experiments
        • B.1 附加实验
        • B.2 记忆剪枝的附加分析
        • B.3 单轮任务的附加对比曲线
        • 总结
        • 总结:
      • Appendix D Limitations
      • Appendix E Use of Large Language Models
        • 1. 使用目的
        • 2. 使用范围
        • 3. 结论
    • 2512.10696_Framework for Experience-Driven Agent Evolution
      • 总结
      • 图解
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 核心创新点(ReMe 的三大机制)
        • 实验与结果
        • 总结
      • 1 Introduction
        • 1.1 背景与动机
        • 1.2 理想程序性记忆系统的三大核心标准
        • 1.3 当前方法的局限性
        • 1.4 提出的方法:ReMe 框架
        • 1.5 实验结果与贡献
        • 1.6 主要贡献
      • 2 Related Works
        • 2.1 增强记忆的LLM智能体(Memory-enhanced LLM Agents)
        • 2.2 经验学习策略(Experience Learning Strategies)
        • 图2:ReMe框架概述(图示说明)
      • 3 Methodology
        • 3.1 ReMe 概览
        • 3.2 经验获取
        • 3.3 经验复用
        • 3.4 经验精炼
        • 表1:ReMe 与基线模型在 BFCL-V3 和 AppWorld 上的性能对比
        • 总结
      • 4 Experiments
        • 4.1 实验设置
        • 4.2 主要结果
        • 4.3 消融研究
        • 4.4 更多分析
      • 总结
      • 5 Conclusion
      • Limitations
        • 1. 固定的经验检索策略
        • 2. 经验验证机制的局限性
        • 3. 模型规模与总结能力的关系
      • Appendix A Dataset Details
        • BFCL-V3
        • AppWorld
      • Appendix B Baseline Details
        • LangMem
        • A-Mem
        • 总结
      • Appendix C Implementation Details
        • C.1 经验获取(Experience Acquisition)
        • C.2 经验检索(Experience Retrieval)
      • Appendix D Experience Examples
        • 1. ReMe 方法的经验提取示例
        • 2. 经验粒度的影响分析
        • 3. 不同粒度经验的结构与内容对比
        • 总结
      • Appendix E Additional Experimental Results
        • E.1 Retrieval Key Analysis(检索键分析)
        • E.2 Prompt Examples for Experience Extraction(经验提取的提示示例)
        • 总结
    • 2601.03192_MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory
      • 总结
        • 关键图片
        • 1. 文献背景、研究目的与问题概述
        • 2. 研究方法、关键数据与主要发现
        • 3. 新颖概念通俗解释
        • 4. 优缺点评价与后续研究方向
      • “阶段B”是如何“桥接”并最终筛选出最优记忆的
        • 核心公式回顾
        • “桥接”过程详解:从阶段A到阶段B
        • 结论:阶段B是如何“桥接”的?
      • Q值的计算与更新机制
        • 一、Q值的本质
        • 二、Q值的更新公式(核心)
        • 三、具体计算示例
        • 四、为什么这样设计?
        • 五、完整的工作流程
        • 六、关键细节补充
        • 总结
    • 2601.11969_MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 摘要(Abstract)
      • 1 Introduction
        • 1.1 背景与问题提出
        • 1.2 研究问题与贡献
        • 1.3 与现有基准的对比
        • 1.4 MemoryRewardBench 设计
        • 1.5 实验设置与主要发现
        • 1.6 记忆管理模式图示
      • 2 Related Work
        • 2.1 内存管理评估(Memory Management Evaluation)
        • 2.2 奖励模型(Reward Model)
        • 总结
      • 3 Introducing MemoryRewardBench
        • 3.1 内存管理模式
        • 3.2 任务概述
        • 3.3 基准构建
        • 总结
      • 4 Evaluation
        • 4.1 设置(Settings)
        • 4.2 总体观察(Overall Observation)
      • 5 Ablation Study
        • RM Selection and Notation(RM选择与符号说明)
        • 5.1 Effect of Memory Management Patterns(记忆管理模式的影响)
        • 5.2 Effect of RM Evaluation Criteria(RM评估标准的影响)
        • 5.3 Effect of Memory Management Trajectory Length(记忆管理轨迹长度的影响)
        • 5.4 Effect of Memory Augmentation Strategy(记忆增强策略的影响)
      • 总结
      • 6 Conclusion
        • 6 结论(Conclusion)
      • Appendix A Comparison between LongRewardBench and Existing Memory Benchmarks
        • 1. 总体对比(Table 1)
        • 2. 构建细节(Table 4)
        • 3. 案例分析
        • 总结
      • Appendix B Benchmark Construction
        • B.1 长上下文推理
        • B.2 多轮对话理解
        • B.3 长文本生成
        • 总结重点
      • Appendix C Evaluation Settings
        • 附录 C 评估设置
        • 总结:
      • Appendix D Details of Ablation Study
        • 附录 D 消融研究细节总结
        • 总结
    • 2603.10165_OpenClaw-RL: Train Any Agent Simply by Talking
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 核心
        • 摘要
      • Abstract
      • 1. Introduction
        • 核心观点概述:
        • 1.1 Waste 1 — Evaluative signals(评估信号)
        • 1.2 Waste 2 — Directive signals(指导信号)
        • 1.3 OpenClaw-RL 框架介绍
        • 1.4 Contributions(贡献)
      • 2 Problem Setting
        • 核心设定:
        • 1. 状态(State)
        • 2. 动作(Action)
        • 3. 状态转移(Transition)
        • 4. 奖励(Reward)
        • 与传统 RLVR 的对比
        • 总结
      • 3 OpenClaw-RL Infrastructure: Unified System for Personal and General Agents
        • 3.1 四个解耦组件的异步流水线
        • 3.2 面向个性化智能体的会话感知环境服务器
        • 3.3 可扩展性:从单用户个性化到大规模部署
        • 3.4 支持多种现实世界场景
        • 3.5 非阻塞记录与可观测性
        • 总结图示(图3)
      • 4. Learning from Next-State Signals: Unified RL Across Interaction Types
        • 4.1 Binary RL for Personal Agent
        • 4.2 Hindsight-Guided On-Policy Distillation (OPD) for Personal Agent
        • 4.3 Combine Binary and OPD Methods
        • 4.4 Step-wise Reward for General Agentic RL
        • 总结
      • 5 Experiments
        • 5.1 个人代理设置(Personal Agent Setup)
        • 5.2 通用代理设置(General Agent Setup)
        • 5.3 个人代理轨道:从对话信号中学习(Learning from Conversational Signals)
        • 5.4 通用代理:统一的终端、GUI、SWE 和工具调用 RL(Unified RL Across Terminal, GUI, SWE, and Tool-Call)
        • 总结
      • 6 Related Work
        • RL for LLMs(重点内容)
        • Agentic RL and tool-use(重点内容)
        • Process reward models(重点内容)
        • On-policy distillation and hindsight methods(重点内容)
        • RL training infrastructure(重点内容)
        • 总结
      • 7 Conclusion
        • 核心观点:
        • 交互类型:
        • 学习方法:
        • 最终成果:
        • 总结:
      • Appendix A Algorithm Pseudocode
        • Algorithm 1 Binary RL Pipeline(每轮主流程,双轨制)
        • Algorithm 2 OPD Pipeline(个性化代理轨道)
      • Appendix B More Optimization Examples
        • B.1 学生设置
        • B.2 教师设置
        • 总结
      • Appendix C Prompt Templates
        • C.1 个性化代理:PRM 判定提示(Binary RL, Personal)
        • C.2 个性化代理:OPD 回顾提示(OPD Hindsight Hint Prompt)
        • C.3 个性化代理:模拟器评估提示(Personalization Score Prompt)
        • C.4 通用代理:PRM 判定提示(General Agent: PRM Judge Prompt)
        • 总结
      • Appendix D Hyperparameters
        • 表5:不同设置下的超参数表
        • 重点内容解析:
        • 总结:
  • 论文池-sum
  • 论文待回收池
    • 2009.01325_Learning to summarize from human feedback
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 研究背景
        • 研究方法
        • 研究成果
        • 分析与验证
        • 研究意义
      • 1 Introduction
        • 背景与问题
        • 研究目标与任务选择
        • 方法概述
        • 主要贡献
        • 长期意义
      • 2 Related work
        • 与我们工作最直接相关的工作
        • 其他使用人类反馈的研究
        • 强化学习与自动评价指标
        • 模型结构与预训练方法的改进
      • 3 Method and experiment details
        • 3.1 高层方法论(High-level methodology)
        • 3.2 数据集与任务
        • 3.3 收集人类反馈(Collecting human feedback)
        • 3.4 模型(Models)
      • 4 Results
        • 4.1 基于人类反馈的 Reddit 帖子摘要
        • 4.2 迁移到新闻文章摘要
        • 4.3 理解奖励模型
        • 4.4 摘要自动评估指标分析
      • 5 Discussion
        • 1. Limitations(局限性)
        • 2. Future directions(未来方向)
        • 3. Broader impacts(更广泛影响)
        • 4. Acknowledgements(致谢)
      • Appendix A TL;DR dataset details
        • 数据集构成
        • 数据预处理步骤
        • 数据集局限性说明
      • Appendix B Further model training details
        • B.1 超参数设置
        • B.2 输入格式
        • 总结重点
      • Appendix C Human data collection details
        • C.1 Process for ensuring high-quality human data
        • C.2 Assessing human feedback quality
        • C.3 Labeler demographics
        • C.4 Labeler website
        • C.5 Instructions for labelers
        • C.6 Composition of the labeled dataset
        • C.7 Example comparison tasks
      • Appendix D Choice of baselines
      • Appendix E CNN/DM lead-3 vs reference summaries
        • 主要发现
        • 控制长度后的分析
        • 对摘要方法的质疑
        • 标注者行为分析
        • 参考摘要表现差的原因
        • 结论
      • Appendix F Controlling for summary length
        • 1. 控制摘要长度的背景与方法
        • 2. 实验结果与分析
        • 3. CNN/DM数据集上的长度控制实验
      • Appendix G Additional results
        • G.1 价值函数消融实验
        • G.2 沿质量维度评估策略
        • G.3 最优-N 优化研究
        • G.4 ROUGE分数
        • G.5 二元组重叠统计
        • G.6 奖励模型验证集
        • G.7 不同评估指标的一致性
        • 总结
      • Appendix H Samples
        • H.1 随机样本
        • H.2 过度优化样本
    • 2305.16300_Random-Access Infinite Context Length for Transformers
      • Abstract
      • 1 Introduction
      • 2 Related Work
      • 3 Methodology
        • 总体思路
        • 方法详解
        • 位置编码处理
        • 与其他方法的对比
        • 总结
        • 3.3 Memory & Computation
      • 4 Experiments
        • 4.1 语言建模实验
        • 4.2 微调预训练模型
        • 总结
      • 5 Future Work
      • 6 Conclusion
      • Acknowledgment
      • Appendix A Grouped Softmax Example
      • Appendix B Dataset Description
      • Appendix C Number of Unique Retrieved Blocks
      • Appendix D Context Miss Token
      • Appendix E Positional Augmentation
      • Appendix F Additional Extensions and Details
        • 1. 掩码语言建模(Masked Language Modeling)
        • 2. 与 Flash Attention 的结合
        • 3. 检索块数量与块大小的权衡
        • 总结
      • Appendix G Offloading KV Cache to CPU
    • 2405.17935_Tool Learning with Large Language Models: A Survey
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 关键词总结:
        • 重点内容强调:
        • 不重要内容精简:
      • 1 Introduction
        • 核心观点:
        • 1.1 历史背景与工具的重要性
        • 1.2 当前技术趋势:LLMs 的发展与局限
        • 1.3 工具学习的兴起
        • 1.4 研究现状与趋势
        • 1.5 本文结构与贡献
        • 1.6 与其他综述的比较
        • 1.7 本文结构图(Figure 2)
        • 1.8 GitHub 资源
        • 总结:
      • 2 Background
      • 2 背景(Background)
        • 什么是工具(What is a Tool?)
        • 什么是工具学习(What is Tool Learning?)
        • 总结
      • 3 Why Tool Learning?
      • 3 为什么需要工具学习?
        • 3.1 知识获取
        • 3.2 专业能力增强
        • 3.3 自动化与效率提升
        • 3.4 交互增强
        • 3.5 增强可解释性与用户信任
        • 3.6 提升鲁棒性与适应性
        • 总结图示(图3)
      • 4 How Tool Learning?
      • 4 工具学习的机制
        • 4.1 工具学习的整体范式
        • 4.2 任务规划(Task Planning)
        • 4.3 工具选择(Tool Selection)
        • 4.4 工具调用(Tool Calling)
        • 4.5 响应生成(Response Generation)
        • 表格:工具学习基准数据集汇总
        • 总结
      • 5 Benchmarks, Toolkits, and Evaluation
      • 5. Benchmarks(基准测试)
        • 5.1.1 通用基准(General Benchmarks)
        • 5.1.2 特定任务基准(Other Benchmarks)
      • 5.2 Toolkits(工具包)
      • 5.3 Evaluation(评估方法)
        • 5.3.1 任务规划(Task Planning)
        • 5.3.2 工具选择(Tool Selection)
        • 5.3.3 工具调用(Tool Calling)
        • 5.3.4 响应生成(Response Generation)
      • 总结
      • 6 Challenges and Future Directions
      • 6 挑战与未来方向(Challenges and Future Directions)
        • 6.1 工具学习中的高延迟问题(High Latency in Tool Learning)
        • 6.2 严谨而全面的评估体系(Rigorous and Comprehensive Evaluation)
        • 6.3 全面且易获取的工具集(Comprehensive and Accessible Tools)
        • 6.4 安全与鲁棒的工具学习(Safe and Robust Tool Learning)
        • 6.5 统一的工具学习框架(Unified Tool Learning Framework)
        • 6.6 真实世界的工具学习基准(Real-World Benchmark for Tool Learning)
        • 6.7 多模态工具学习(Tool Learning with Multi-Modal)
      • 总结
      • 7 Conclusion
      • 7 结论(总结)
        • 主要内容结构如下:
        • 1. 引言与基础概念
        • 2. 工具学习的重要性
        • 3. 工具学习的四个阶段
        • 4. 评估方法与基准测试
        • 5. 挑战与未来方向
        • 最后
        • 其他信息:
    • 2409.20163_MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 摘要总结
      • 1 Introduction
      • 1 引言(Introduction)
        • 本文的主要贡献如下:
        • 后续章节安排如下:
      • 2 Related Works
      • 2 相关工作
        • LLM-based agents 的应用与记忆机制
        • LLM-based agents 记忆能力的评估
        • 知识库问答(KBQA)与记忆评估的关联
        • 本文工作的贡献
      • 3 Methods
      • 3.1 Overview of MemSim
      • 3.2 Bayesian Relation Network
      • 3.3 Causal Generation Mechanism
      • 3.4 MemDaily: A Dataset in the Daily-life Scenario
      • 总结
      • 4 Evaluations
      • 4 评估(Evaluations)
      • 4.1 用户画像评估(Evaluation on User Profiles)
        • 评估指标
        • 基线方法
        • 评估结果
      • 4.2 用户消息评估(Evaluation on User Messages)
        • 评估指标
        • 基线方法
        • 评估结果
      • 4.3 问题与答案评估(Evaluation on Questions and Answers)
        • 评估结果
        • 总结
      • 5 Benchmark
      • 5 Benchmark 总结
        • 5.1 Experimental Settings(实验设置)
        • 5.2 Memory Mechanisms 的有效性(Effectiveness of Memory Mechanisms)
        • 5.3 Memory Mechanisms 的效率(Efficiency of Memory Mechanisms)
        • 总结
      • 6 Limitations and Conclusions
      • 6 局限与结论
      • Appendix A Proof in Bayesian Relation Network
      • 附录 A 贝叶斯关系网络的证明
        • A.1 定理 1(因子化)的证明
        • A.2 定理 2(祖先采样)的证明
      • 总体总结
      • Appendix B Extensive Evaluation on User Messages by GPT-4o
      • 附录 B GPT-4o 对用户消息的广泛评估
        • 表 10:GPT-4o 对用户消息评估的结果
      • Appendix C Extensive Benchmark on More Composite Datasets
      • 附录 C:在更多复合数据集上的广泛基准测试
        • C.1 MemDaily-10 的结果
        • C.2 MemDaily-50 的结果
        • C.3 MemDaily-200 的结果
        • 总结
      • Appendix D Case Studies
      • D.1 Case Study on Generated User Profiles
      • D.2 Case Study on User Messages
      • D.3 Case Study on Questions and Answers
        • 总结
    • 2411.00489_Human-inspired Perspectives: A Survey on AI Long-term Memory
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Human-inspired Perspectives: A Survey on AI Long-term Memory
        • 1. 引言(Introduction)
        • 2. 人类长期记忆的结构与机制(Structure and Mechanisms of Human Long-term Memory)
        • 3. AI系统中的长期记忆建模(Modeling Long-term Memory in AI Systems)
        • 4. 与人类记忆的类比分析(Human-inspired Analysis of AI Memory Systems)
        • 5. 应用场景(Applications of AI Long-term Memory)
        • 6. 挑战与未来方向(Challenges and Future Directions)
      • 总结
    • Human-inspired Perspectives: A Survey on AI Long-term Memory
      • 1. 引言(Introduction)
      • 2. 人类长期记忆的结构与机制(Structure and Mechanisms of Human Long-term Memory)
      • 3. AI系统中的长期记忆建模(Modeling Long-term Memory in AI Systems)
        • 3.1 编码阶段建模(Encoding)
        • 3.2 巩固阶段建模(Consolidation)
        • 3.3 提取阶段建模(Retrieval)
      • 4. 人类启发的AI长期记忆系统(Human-inspired AI Long-term Memory Systems)
      • 5. 挑战与未来方向(Challenges and Future Directions)
      • 6. 结论(Conclusion)
      • 总结评价
      • Human-inspired Perspectives: A Survey on AI Long-term Memory
      • 第一章:引言(Introduction)
        • 内容概述:
        • 重点内容:
        • 其他:
      • 第二章:人类长期记忆机制(Human Long-term Memory Mechanisms)
        • 内容概述:
        • 重点内容:
        • 其他:
      • 第三章:AI系统中的长期记忆建模(Modeling Long-term Memory in AI Systems)
        • 内容概述:
        • 分类与重点内容:
        • 其他:
      • 第四章:评估与挑战(Evaluation and Challenges)
        • 内容概述:
        • 重点内容:
        • 其他:
      • 第五章:未来方向(Future Directions)
        • 内容概述:
        • 重点内容:
      • 第六章:结论(Conclusion)
        • 内容概述:
      • 附录与表格(如有)
        • 表格内容(假设):
      • 总结
      • Labs
      • 5 Meta
      • Abstract
      • Abstract(摘要)
      • 3. Long-term Memory in Human Brain(人脑中的长期记忆)
        • 3.1 Human Memory Hierarchy(人类记忆层次)
        • 3.2 Human Memory Processing(人类记忆处理机制)
        • 3.3 Summary(小结)
      • 4. Long-term Memory of AI: on Storage Formats(AI长期记忆:存储格式)
        • 4.1 Non-Parametric Memory(非参数记忆)
        • 4.2 Parametric Memory(参数记忆)
        • 4.3 Summary(小结)
      • 5. Long-term Memory of AI: on Human Perspectives(AI长期记忆:人类视角)
        • 5.1 Episodic Memory(情景记忆)
        • 5.2 Semantic Memory(语义记忆)
        • 5.3 Procedural Memory(程序性记忆)
        • 5.4 Summary(小结)
      • 6. A New Cognitive Architecture for Long-term Memory(新的长期记忆认知架构)
        • 6.1 Cognitive Architecture of Self-Adaptive Long-term Memory (SALM)
      • 7. Next Steps of AI Long-term Memory(AI长期记忆的未来方向)
        • 7.1 Measures of AI Long-term Memory(AI长期记忆的评估指标)
        • 7.2 Application of AI Long-term Memory(AI长期记忆的应用前景)
      • 总结
      • 1 Introduction
      • 1 引言(Introduction)
        • 核心观点:
        • 人类记忆对AI的启发:
        • 研究空白与本文贡献:
        • 文章结构概览:
      • 2 Research Background and Methodologies
        • 2 研究背景与方法
        • 总结
      • 3 Long-term Memory in Human Brain
        • 第三章:人脑中的长期记忆
        • 图表与数据说明
        • 重点总结
        • 数学与算法要点
        • 总结
      • 4 Long-term Memory of AI: on Storage Formats
      • 第4章:AI的长期记忆:存储形式
        • 概述
      • 4.1 非参数记忆(Non-Parametric Memory)
        • 4.1.1 存储方式
        • 4.1.2 检索方法
        • 4.1.3 遗忘机制
      • 4.2 参数记忆(Parametric Memory)
        • 4.2.1 存储机制
        • 4.2.2 检索机制
        • 4.2.3 遗忘机制
      • 4.3 总结
        • 与人类长期记忆的相似性(见图5):
        • 总体结论
      • 5 Long-term Memory of AI: on Human Perspectives
      • 5 人工智能的长期记忆:从人类视角出发
        • 5.1 情景记忆(Episodic Memory)
        • 5.2 语义记忆(Semantic Memory)
        • 5.3 程序记忆(Procedural Memory)
        • 5.4 总结(Summary)
      • 6 A New Cognitive Architecture for Long-term Memory
        • 6 面向长期记忆的新认知架构(A New Cognitive Architecture for Long-term Memory)
      • 7 Next Steps of AI Long-term Memory
        • 7 AI长期记忆的未来方向
        • 小结
      • 8 Conclusion
      • 8 总结(Conclusion)
        • 重点内容总结:
        • 数学公式、算法与数据:
    • 2501.00332_MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 核心内容讲解:
        • 小结:
      • 1 Introduction
        • 背景与问题
        • 解决方案:检索增强生成(RAG)
        • 问题与挑战
        • 提出的方法:MAIN-RAG
        • 主要贡献
        • 总结
      • 2 Preliminaries
        • 2.1 符号与目标(Notations and Objectives)
        • 2.2 噪声检索文档的影响(Impact of Noisy Retrieval Documents)
        • 2.3 相关工作(Related Works)
      • 3 Multi-Agent Filtering RAG (MAIN-RAG)
        • 3.1 MAIN-RAG 中 LLM 智能体的定义
        • 3.2 相关性判断的量化
        • 3.3 自适应判断阈值 τ_q
        • 总结要点
      • 4 Experiments
        • 4.1 任务与数据集
        • 4.2 基线模型
        • 4.3 实验设置
        • 4.4 定量分析(RQ1)
        • 4.5 自适应判断阈值 τ_q 的消融实验(RQ2)
        • 4.6 τ_q 的案例研究(RQ3)
        • 总结
      • 5 Conclusion and Future Work
        • 主要结论
        • 未来工作
        • 总结
      • 6 Limitations
        • 实验范围的限制
        • 环境影响的考量
        • 总结
      • Appendix A Computation Infrastructure
      • 附录A 计算基础设施
      • Appendix B Performance Comparison among MAIN-RAG and Its Variant Baselines
        • 核心结论:
        • 关键分析:
        • 图表支持:
        • 总结:
      • Appendix C System Instructions of Agent-1 (Predictor), Agent-2 (Judge), and Agent-3 (Final-Predictor)
        • Agent-1(预测器)的系统指令
        • Agent-2(评判器)的系统指令
        • Agent-3(最终预测器)的系统指令
        • 图 11:三个 Agent 的系统指令图示
        • 总结
      • Appendix D Case Studies of Different Adaptive Judge Bar τ_q in MAIN-RAG
        • 案例研究 1(高 τq)
        • 案例研究 2(低 τq)
        • 案例研究 3(中等 τq)
        • 图 12 - 15:不同数据集与 LLM 的案例对比
        • 总结
    • 2503.09149_MemVid: Memory-enhanced Retrieval Augmentation for Long Video Understanding
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 1. Introduction
      • 1. 引言(Introduction)总结
        • 贡献总结:
      • 2. Related Work
      • 2. 相关工作(Related Work)
        • 2.1. 大型视觉-语言模型(Large Vision-language Models)
        • 2.2. 长视频视觉-语言模型(Long Large Vision-language Models)
        • 2.3. 基于检索增强的视频理解(Retrieval-augmented Video Understanding)
      • 3. Methodology
        • 3. 方法论(Methodology)
        • 总结
      • 4. Experiments
      • 4. 实验总结
        • 4.1. 实验设置
        • 4.2. 总体结果
        • 4.3. 消融实验
        • 4.4. 泛化性分析
        • 4.5. 效率分析
        • 4.6. 案例分析
      • 总结
      • 5. Conclusion
      • 5. 结论
    • 2505.02099_MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 研究背景
        • 研究现状
        • 本文贡献
        • 项目开源
        • 重点内容
        • 总结
      • 1. Introduction
      • 1. 引言(Introduction)
        • 记忆模块的重要性
        • 现有研究的不足
        • MemEngine:一个统一且模块化的记忆库
        • 总结
      • 2. Comparison with Relevant Libraries
      • 2. 与相关库的比较
        • 已有库分类
        • MemEngine 的优势
        • 对比表格详解(Table 1)
      • 3. MemEngine Library
      • 3. MemEngine Library
        • 3.1. Overview(概述)
        • 3.2. Memory Models(记忆模型)
        • 3.3. Memory Operations(记忆操作)
        • 3.4. Memory Functions(记忆功能)
        • 3.5. Memory Configurations(记忆配置)
        • 3.6. Memory Utilities(记忆工具)
      • 总结
      • 4. Usage of MemEngine
      • 4. MemEngine 的使用方式
        • 4.1 使用预实现的记忆模型
        • 4.2 定制新的记忆模型
      • 5. Conclusion
        • 5. 结论
        • 致谢
    • 2505.11271_Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models
      • Abstract
        • 重点内容强调
        • 补充信息
      • 1 Introduction
        • 1.1 现代大语言模型(LLMs)的应用与挑战
        • 1.2 链式流程中的中间输出与缓存机会
        • 1.3 现有优化方法与语义缓存
        • 1.4 语义缓存的应用场景
        • 1.5 本文的贡献
        • 1.6 实验结果与结论
        • 1.7 实际意义与价值
      • 2 Related Work
      • 2 相关工作
        • 2.1 提示缓存(Prompt Caching,基于KV的方法)
        • 2.2 语义缓存(Semantic Caching)
        • 2.3 其他缓存方法
        • 2.4 本文方法与现有方法的比较
        • 总结
      • 3 System design and Methodology
      • 3 系统设计与方法论
        • 3.1 观察与系统设计
        • 3.2 我们的语义缓存方法
      • 4 Experimental setup
      • 4 实验设计
        • 4.1 模拟设计
        • 4.2 数据集
        • 4.3 问题之间的相似性
        • 4.4 摘要
        • 4.5 评估指标
      • 5 Results and discussion
      • 5 实验结果与讨论
        • 5.1 检索方法的比较分析
        • 5.2 延迟细节
        • 5.3 不同相似度阈值与摘要长度的影响
        • 5.4 选择相似度阈值:效用与缓存命中率的权衡
        • 5.5 影响回答生成的因素
        • 5.6 对现实系统的影响
        • 5.7 挑战与限制
        • 总结
      • 6 Conclusion and future work
      • 6 结论与未来工作
        • 技术增强
        • 可扩展性与实际部署
        • 隐私问题
        • 更广泛的应用
    • 2505.13308_Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 1 Introduction
      • 1 引言(Introduction)
        • 1.1 大型语言模型(LLMs)的推理挑战
        • 1.2 现有改进方法及其局限性
        • 1.3 提出的替代方法:测试时实例级适应(TTIA)
        • 1.4 现有TTIA方法的局限性
        • 1.5 本文贡献:LatentSeek 框架
        • 1.6 实验结果与性能提升
        • 总结:
      • 2 Test-Time Instance-Level Policy Gradient in Latent Space
      • 2 测试时实例级潜在空间策略梯度
        • 2.1 问题定义:测试时实例级推理
        • 2.2 潜在空间中的策略梯度推理
        • 2.3 LatentSeek 算法
        • 总结
      • 3 Empirical Results
        • 3. Empirical Results 总结
        • 3.1 Experimental Setup(实验设置)
        • 3.2 State-of-the-art Test-time Reasoning Performance(测试时推理性能)
        • 3.3 Ideal Experiment: Perfect Sparse Reward Model(理想实验:完美稀疏奖励模型)
        • 3.4 Test-Time Scaling: scaling up the iteration of LatentSeek(测试时扩展:增加 LatentSeek 迭代次数)
        • 3.5 Algorithmic Statistics(算法统计)
        • 3.6 Qualitative Analysis(定性分析)
        • 总结
      • 4 Related Work
      • 4 相关工作(Related Work)总结
        • 一、语言模型的推理能力(Reasoning in Language Models)
        • 二、语言模型的强化学习(Reinforcement Learning for Language Models)
        • 三、可控生成与测试时优化(Controllable Generation and Test-Time Optimization)
        • 四、提示调优与软提示(Prompt Tuning and Soft Prompt)
        • 总体总结:
      • 5 Conclusion
      • 5 结论
        • 主要内容:
        • 总结:
      • Acknowledgement
      • Acknowledgement(致谢)
      • Appendix A Discussion and future works
      • A. 讨论与未来工作
        • Reward Models(奖励模型)
        • Latent Optimization(潜在空间优化)
        • Large Base Model(大基础模型)
      • Appendix B Methods of Test-Time Instance-Level Reasoning
        • 附录 B 测试时实例级推理方法
        • 总结
      • Appendix C Theoretical Analysis
      • 附录 C 理论分析总结
        • C.1 预备知识:多证明者交互证明与 NEXP
        • C.2 理论分析:独立更新
        • C.3 定理 C.10 与推论 C.11 的证明
      • 总结
      • Appendix D Derivation of Policy Gradient
      • 附录 D 策略梯度的推导
        • 1. 初始目标函数
        • 2. 对 z 求梯度
        • 3. 利用对数导数技巧
        • 4. 利用策略的分解形式
        • 5. 得到最终结果
        • 总结
      • Appendix E Additional Experimental Results
        • 附录 E:更多实验结果总结
        • 总结
      • Appendix F Experimental Details
    • 附录 F 实验细节总结
      • F.1 提示设计
      • F.2 模型主干
      • F.3 基线方法
      • F.4 GSM8K实验
        • 数据集
        • 实验细节
      • F.5 MATH-500实验
        • 数据集
        • 实验细节
      • F.6 AIME2024实验
        • 数据集
        • 实验细节
      • 评估提示模板
      • 计算量估计
      • Appendix G Detailed FLOPs Calculation
      • 附录 G:详细 FLOPs 计算总结
        • G.1 前向传播 FLOPs 估算
        • G.2 Genius 方法的总 FLOPs
        • G.3 LatentSeek 方法的总 FLOPs
        • G.4 效率阈值分析
        • 总结
      • Appendix H Qualitative Analysis and Case Studies
      • 附录 H 定性分析与案例研究(Qualitative Analysis and Case Studies)
        • 1. 生成序列的词云分析(Wordclouds of the First Three Words)
        • 2. 案例研究(Case Studies)
        • 关键发现总结
        • 总结
      • Appendix I Computational Resources
      • 附录I 计算资源
      • Appendix J The Use of Large Language Models (LLMs)
        • 附录 J:大语言模型(LLMs)的使用
    • 2506.22815_Memory as a Service (MaaS): Rethinking Contextual Memory as Service-Oriented Modules for Collaborative Agents
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 1 Introduction
      • 2 Related Works
        • 2.1 个体内部内存的持久性
        • 2.2 跨实体内存共享
        • 总结
      • 3 MaaS: A Service-Oriented Memory Perspective
        • 3.1 Core Principles: From Local State to Callable Service
        • 3.2 The MaaS Architecture: Granting Public Service Capabilities to Private Memory
        • 高层实现架构(High-Level Implementation)
      • 4 MaaS Design Space and Application Scenarios
        • 4.1 内部实体(Intra-Entity)
        • 4.2 跨实体(Inter-Entity)
        • 4.3 群体级(Group-Level)
        • 总结
      • 5 Open Research Agenda
        • 5.1 公共维度带来的挑战:治理与协议(Challenges Arising from Public-Side: Governance and Protocols)
        • 5.2 私有维度带来的挑战:安全与信任(Challenges Arising from Privacy-Side: Security and Trust)
        • 5.3 交互涌现带来的挑战:生态系统与伦理(Challenges from Interaction Emergence: Ecosystem and Ethics)
        • 总结
      • 6 Conclusion: A Timely Perspective
    • 2506.24019_Ella: Embodied Social Agents with Lifelong Memory
      • Abstract
      • 1 Introduction
      • 1 引言(Introduction)总结
        • 研究背景与动机
        • 本文的贡献与方法
        • 本文核心贡献总结(重点内容)
        • 总结
      • 2 Related Work
      • 2 相关工作
        • 2.1 具身社交智能
        • 2.2 智能体记忆
        • 图2说明(Figure 2)
      • 3 Problem Setting
      • 3 问题设定
        • 1. 智能体与社交群组
        • 2. 智能体的初始知识
        • 3. 模拟环境与交互机制
        • 4. 控制评估与干预方式
        • 总结
      • 4 Ella: Embodied Lifelong Learning Agent
      • 4 Ella: Embodied Lifelong Learning Agent
        • 4.1 Name-centric Semantic Memory(名称中心语义记忆)
        • 4.2 Spatiotemporal Episodic Memory(时空情景记忆)
        • 4.3 Planning, Reaction, and Communication(规划、反应与通信)
        • 总结
      • 5 Experiments
      • 5 实验结果总结
        • 5.1 实验设置
        • 5.2 实验结果
        • 总结
      • 6 Limitations
      • 6 限制(Limitations)
        • Leverage the graph structure of the name-centric semantic memory.
        • Lifelong simulation of a community of agents in a visually rich, physics-realistic environment is computationally expensive.
        • All agents’ thinking processes are assumed to finish synchronously.
      • 7 Conclusion
      • 7 结论
      • Appendix A Broader Impact
      • Appendix A 更广泛的影响
      • Appendix B Additional Experiment Details
      • 附录 B 实验附加细节总结
        • B.1 虚拟社区 (Virtual Community)
        • B.2 计算资源 (Compute)
        • 总结要点:
      • Appendix C Additional Implementation Details
        • Appendix C Additional Implementation Details(附录C 额外的实现细节)
      • Appendix D Prompt Templates
      • 附录 D 提示模板
        • Figure 8: 生成日常计划的提示模板
        • Figure 9: 生成反应的提示模板
        • Figure 10: 生成语言输出的提示模板
        • Figure 11: 生成对话总结的提示模板
        • Figure 12: 从对话中提取知识的提示模板
        • 总体说明
    • 2507.10524_Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 1 Introduction
        • 背景与动机
        • 递归 Transformer 与挑战
        • MoR:统一框架
        • 概念与意义
        • 贡献总结(Contributions)
        • 总结
      • 2 Method
        • 2.1 Preliminary
        • 2.2 Mixture-of-Recursions (MoR)
        • 总结
Survey on AI Memory: Theories, Taxonomies, Evaluations, and Emerging Trends

  • GitHub: https://github.com/BAI-LAB/Survey-on-AI-Memory

Summary

From Moonlight

Three-Sentence Summary

  1. 🎯 This survey tackles the fragmentation of current AI memory research, providing a unified theoretical framework and a comprehensive overview for LLM-driven agents.

  2. 📖 The authors propose a structured "4W Memory Taxonomy" that analyzes AI memory in detail along its lifecycle, information type, storage form, and modality, systematically organizing existing work.

  3. 🤖 The work also systematically reviews memory architectures, functions, and evaluation methods in single-agent and multi-agent systems, and discusses emerging trends and challenges, offering a roadmap for the future development of AI memory.

Keywords

  • 4W Memory Taxonomy: a structured framework for systematically classifying AI memory mechanisms along four dimensions: "When-What-HoW-Which".

  • Cognitive Psychology: the discipline that studies human thought processes and memory mechanisms, providing theoretical inspiration for the design of AI memory systems.

  • Atkinson–Shiffrin Tri-Store Model: a classic model in cognitive psychology that conceptualizes human memory as three interacting stores: sensory memory, short-term memory, and long-term memory.

  • Complementary Learning Systems Theory: a theory in cognitive neuroscience holding that the brain's memory architecture arises from the interplay between the hippocampus and the neocortex, with the hippocampus handling rapid encoding and indexing and the neocortex handling gradual updating and long-term storage.

  • Index and Content Separation Pattern: an AI memory design pattern that maintains compact episode keys paired with a retriever pointing to detailed, rich content in long-term storage, effectively overcoming the inherent capacity limits of the context window.

  • Multiphase Consolidation Pattern: an AI memory design pattern that, at key moments, transforms recent episodic traces into summaries, reflections, and reusable skills, organizing long-term memory into structured and generalized forms.

  • Structured Coordination Pattern: an AI memory design pattern that organizes the active workspace into specialized buffers supervised by a central controller, enabling parallel, interference-free maintenance of verbal, visual, and tool-related outputs, with dynamic integration at decision time.

  • Agent Memory: a functional workflow that supports autonomous behavior and complex task execution through coordination of the perception-planning-action loop.

  • LLM Memory: refers mainly to the computational core of large language models, existing in two specific states: parametric memory in pretrained model weights and runtime memory managed through the context window.

  • Memory vs. Knowledge: the distinction between memory as a store that evolves dynamically with interaction, and knowledge as a static precipitate consolidated for stability and reuse.

  • Memory vs. Context: distinguishes memory, an application-level state manager that persists beyond the model's ephemeral execution cycle to encapsulate broader user interactions and agent history, from context, the immediate execution environment within an LLM.

  • Memory vs. Experience: memory is the foundational record of specific interactions, preserving raw data points; experience is a higher-order cognitive construct that synthesizes abstract, transferable patterns from those raw traces.

  • Memory Lifecycle: the dimension examining the temporal span of memory in AI agent systems, i.e., how long a memory exists and over what scope it remains accessible.

  • Transient Memory: memory that exists only briefly during immediate input processing, serving as a temporary buffer for sensory input before further processing or storage.

  • Session Memory: memory that persists over a single task execution or conversational session but is not retained after the session ends.

  • Persistent Memory: memory that persists beyond a single session and is accessible across multiple interactions, tasks, and even different agent instances.

  • Memory Type: the dimension examining the nature of the knowledge a memory captures, including procedural skills, declarative facts, metacognitive reflections, and personalized models.

  • Procedural Memory: encapsulates executable knowledge, including skills, action workflows, and tool use; its primary goal is to facilitate goal-directed action.

  • Declarative Memory: stores factual knowledge and perceptual observations, including environmental observations, factual knowledge bases, event logs, and contextual descriptions.

  • Metacognitive Memory: knowledge about the agent's own thinking and capabilities, enabling it to track its performance, identify strengths and weaknesses, reflect on past behavior, and adjust strategies.

  • Personalized Memory: stores information about other agents and users, such as their preferences, behaviors, and relationships, enabling an AI agent to remember and model individuals.

  • Memory Storage: the dimension examining how memory is represented and stored in AI agent systems, determining its physical form, access patterns, and computational characteristics.

  • Implicit Storage: stores memory inside the model architecture, either in its trained weights (parametric memory) or in hidden states during processing (latent memory).

  • Parametric Memory: memory embedded directly into model parameters and weights through training procedures such as pretraining or fine-tuning.

  • Latent Memory: memory that is not explicitly stored in model parameters but is implicitly represented through the model's learned latent space.

  • Explicit Storage: stores memory outside the model, in formats such as text, vectors, or graphs, making it easier to retrieve, update, and interpret.

  • Raw Memory: stores information in textual, visual, and auditory formats, including dialogue histories, compressed conversations, and other representations; the most interpretable storage approach.

  • Vector Memory: stores information as dense, continuous-valued vectors in a high-dimensional semantic space, typically produced by neural network models.

  • Graph Memory: encodes information as an explicit network of entities and their relationships, typically implemented with graph databases, well suited to representing complex connections.

  • Modality Type: classifies AI memory by the format of information it handles, into single-modal and multimodal memory.

  • Single-modal Memory: focuses on storing, updating, and retrieving information from a single modality, with text being the most mature and widely adopted form.

  • Multimodal Memory: integrates information from multiple modalities, including text, images, audio, and video, enabling agents to perceive and reason about complex environments.

  • Single-Agent System: AI memory architectures and functional mechanisms designed for an individual agent.

  • Memory Architecture: describes how information is organized and the design variations within AI memory systems.

  • Hierarchical Memory Architecture: a memory architecture for agent systems that uses tiered storage structures and dynamic management to resolve the tension between the LLM's limited context window and the unbounded need for long-term information storage and retrieval.

  • OS-like Memory Architectures: architectures that borrow from operating-system design, adopting tiered storage and dynamic management mechanisms to address memory consistency and resource allocation in long-term interaction.

  • Cognitive Evolution Memory Architectures: architectures that emulate human cognitive processes or incorporate theory of mind, enabling agents to develop evolvable memory and strategy systems for self-optimization.

  • Graph and Temporal Memory Architectures: organize information through graph structures (e.g., entity-relation graphs) or temporal models to capture complex relationships, aiming to strengthen agent memory.

  • Memory Storage Function: a core module of AI agents whose main role is to transform fragmented observational data into structured, persistent memory records and to establish a standardized indexing scheme supporting later retrieval.

  • Memory Retrieval: precisely retrieving and integrating information from a large-scale memory store to guide the generation process, thereby mitigating hallucination and strengthening reasoning.

  • Memory Updating: the process of revising, replacing, or consolidating existing stored content, ensuring the agent prevents recurring errors by correcting wrong or outdated information and integrating fragmented knowledge as new data arrives.

  • Self-Evolution: the agent's ability to dynamically iterate on and optimize its acquired knowledge, skills, and behavioral strategies through continuous interaction and task execution.

  • Association (AI Memory): the process of integrating multimodal signals (text, vision, audio, interaction) into a coherent situational model to construct memory.

  • Multi-Agent Systems (MAS): systems in which multiple agents interact and collaborate through memory mechanisms to achieve collective intelligence.

  • Communication Mechanisms (MAS): the modes of communication that mediate memory sharing in multi-agent systems, spanning two primary modalities: explicit and implicit.

  • Explicit Communication (MAS): the deliberate transmission of symbolic information between agents, ranging from unstructured natural-language dialogue to highly structured, formalized protocols.

  • Implicit Communication (MAS): allows multi-agent systems to coordinate without direct, intentional inter-agent messaging, instead inferring others' intentions or states by observing a shared environment or shared representations of internal state.

  • Memory Sharing Mechanisms (MAS): the mechanisms in multi-agent systems for sharing memory in service of collective intelligence, typically optimized at the task level and the step level.

  • Task-Level Memory Sharing: mechanisms that consolidate experience from different task executions to promote long-term evolution and cross-domain knowledge transfer.

  • Step-Level Memory Sharing: the dynamic allocation of specific information to relevant agents during the fine-grained execution phases of a single collaborative workflow.

  • Evaluation Metrics (AI Memory): standards for measuring the performance of AI memory systems, covering memory retrieval capability, dynamic updating capability, advanced cognitive capability, and system efficiency.

  • Memory Retrieval Capability: evaluates the memory system's ability to locate information relevant to the current query accurately and comprehensively.

  • Dynamic Updating Capability: evaluates the memory system's ability to keep its knowledge base fresh and correct, going beyond static information retrieval.

  • Advanced Cognitive Capability: represents the agent's ability to go beyond simple storage and retrieval and use memory for higher-order reasoning.

  • System Efficiency (AI Memory): assesses the engineering feasibility of an AI memory module in real deployments, covering latency, token overhead, and storage efficiency.

  • Evaluation Benchmarks (AI Memory): test suites used to systematically organize memory evaluation, categorized by the core characteristics of their evaluation tasks.

Abstract

The survey "Survey on AI Memory: Theories, Taxonomies, Evaluations, and Emerging Trends" comprehensively examines memory mechanisms in AI systems, stressing their central role in enabling LLM-driven agents to adapt dynamically, reason about complex problems, and learn from experience. It aims to bridge the gap between computational mechanisms and the human-like memory processes of cognitive psychology through a unified theoretical framework, and proposes a structured "4W Memory Taxonomy" for systematically analyzing memory systems.

2 Theoretical Foundations. The survey first lays the conceptual groundwork for memory-augmented agents, blending biological principles with computational definitions:

  • 2.1 Interdisciplinary Foundations:

    • Atkinson–Shiffrin Tri-Store Model: conceptualizes human memory as three interacting stores, sensory memory, short-term/working memory, and long-term memory, coordinated by control processes such as attention, rehearsal, and retrieval. Sensory memory captures fleeting, high-fidelity input; short-term memory is a limited-capacity active workspace; long-term memory is a vast repository of enduring facts, events, and skills.

    • Working Memory Model: proposed by Baddeley and Hitch, recasts short-term storage as a multi-component working-memory system. A limited-capacity central executive directs attention and coordinates resources, supported by the phonological loop, the visuospatial sketchpad, and the episodic buffer, the last of which integrates information into unified multimodal events.

    • Complementary Learning Systems Theory (CLS): conceptualizes the brain's memory architecture as a collaborative partnership between the hippocampus and the neocortex. The hippocampus rapidly encodes and indexes new episodes, while the neocortex serves as deep storage, protecting existing knowledge through gradual updates. This complementary division of labor lets new events be captured quickly and later, during quiet periods such as sleep, replayed by the hippocampus to guide stable, incremental consolidation in the neocortex.

    • Implications for AI Memory Design:

      1. Index and content separation pattern: overcomes context-window limits by maintaining compact episodic keys and a retriever linked to detailed content in long-term storage, enabling efficient retrieval.

      2. Multiphase consolidation pattern: at strategic moments, transforms recent episodic traces into summaries, reflections, and reusable skills, structuring and generalizing long-term memory.

      3. Structured coordination pattern: organizes the active workspace into specialized buffers supervised by a central controller, enabling parallel maintenance and dynamic integration of verbal, visual, and tool-related outputs.
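The index-and-content separation pattern above can be sketched in a few lines. This is an illustrative toy only (the `EpisodicStore` class and its methods are our invention, not from the survey): a compact key index is small enough to sit in the context window, while full episode content lives in external storage and is fetched on demand.

```python
# Minimal sketch of the index-and-content separation pattern:
# a compact in-context index of episode keys points into a larger
# external store that holds the full episode content.

class EpisodicStore:
    def __init__(self):
        self.index = {}    # episode key -> storage id (compact, fits in context)
        self.content = {}  # storage id -> full episode text (external, unbounded)
        self._next_id = 0

    def write(self, key: str, episode: str) -> None:
        self.index[key] = self._next_id
        self.content[self._next_id] = episode
        self._next_id += 1

    def retrieve(self, key: str) -> str:
        # Only the small index is consulted in-context; the detailed
        # content is pulled from long-term storage on demand.
        return self.content[self.index[key]]

store = EpisodicStore()
store.write("meeting-2024-01-05", "Discussed the quarterly roadmap with Alice.")
print(store.retrieve("meeting-2024-01-05"))
```

Real systems replace the dictionary index with embedding keys and a retriever, but the division of labor is the same: keys stay small, content stays external.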

  • 2.2 Boundaries of "Memory" in AI:

    • AI Memory vs. Agent Memory vs. LLM Memory:

      • LLM memory: mainly the low-level mechanisms of the computational core, parametric memory in pretrained model weights and runtime memory managed through the context window, focused on the accuracy of immediate generation.

      • Agent memory: extends this into a functional workflow that systematically supports autonomous behavior (the perception-planning-action loop), structuring data into procedural, declarative, and metacognitive formats to enable experiential learning and strategy refinement.

      • AI memory: the broadest definition, covering information persistence and evolution; an overarching cognitive notion of lifelong learning whose goal is continual adaptation and human alignment.

    • Memory vs. Knowledge: memory is dynamic storage that evolves through interaction, spanning parametric weights and non-parametric external stores such as vector databases; knowledge is a static precipitate of curated patterns, ontologies, and general abstractions. Memory concerns "what just happened"; knowledge concerns "how things usually work." The boundary is permeable: memory can be consolidated into knowledge, and knowledge guides memory formation.

    • Memory vs. Context: context chiefly denotes the immediate execution environment inside the LLM, the bounded computational buffer the model needs for a given inference step; memory is an application-level state manager that lives beyond the model's ephemeral execution cycle, maintaining the larger scope of user interactions and agent history. Context is cleared or overwritten during processing, whereas memory persists at the system-interface level.

    • Memory vs. Experience: memory is the foundational record of specific interactions, preserving raw data points, a static repository of "what happened" without inherent generality; experience is a higher-order cognitive construct that synthesizes raw traces into abstract, transferable patterns, letting the agent generalize lessons from past contexts to new tasks. Through reflection and consolidation, the agent distills raw episodic records into refined cognitive strategies and stores them back into memory.

3 Taxonomy of AI Memory. The survey proposes the "4W Memory Taxonomy" framework (When-What-HoW-Which) to classify AI memory systems systematically:

  • 3.1 Classification by Memory Lifecycle: examines the temporal span and persistence of memory.

    • Transient Memory: extremely short-lived, existing only during immediate input processing as a temporary buffer for perceptual input (e.g., the KV cache in Transformers, the Minecraft pixel input processed by Voyager). Highly volatile.

    • Session Memory: persists across a single task execution or conversational interaction but is not retained after the session ends (e.g., an LLM's context window, session-level information in MemoryOS). Actively maintains and manipulates information.

    • Persistent Memory: outlives individual sessions and is accessible across multiple interactions, tasks, and even agent instances. Stored in external databases, file systems, or model parameters; divided into parametric and non-parametric forms.

  • 3.2 Classification by Memory Type: examines the nature and functional role of the knowledge a memory captures.

    • Procedural Memory: encapsulates executable knowledge, including skills, action workflows, and tool use (e.g., multi-step planning sequences in MemGPT, decision patterns captured by MemoryBank, social-interaction patterns in Generative Agents).

    • Declarative Memory: stores factual knowledge and perceptual observations, covering episodic and semantic memory (e.g., the memory stream in Generative Agents, observations in ReAct, visual environment representations in VideoAgent).

    • Metacognitive Memory: knowledge about the agent's own thinking and capabilities, enabling it to track performance, reflect on its actions, and adjust strategies (e.g., Reflexion's reflective verbal memory, Memento's self-knowledge memory).

    • Personalized Memory: stores information about other agents and users, such as preferences, behaviors, and relationships (e.g., user preferences in Mem0, user portraits in MemoryBank, the user-related memory system in MemoryOS).

  • 3.3 Classification by Memory Storage: examines the physical form, access patterns, and computational characteristics of memory.

    • Implicit Storage: memory stored inside the model architecture.

      • Parametric Memory: embedded directly in model parameters and weights through training (pretraining, fine-tuning, LoRA) (e.g., tool-use patterns in Toolformer, role-playing abilities in Baijia). Enables fast inference without explicit retrieval, but suffers from catastrophic forgetting, high update cost, and limited interpretability.

      • Latent Memory: implicitly represented through the model's learned latent space (e.g., MemoRAG's compact hidden-state memory, MemoryLLM's memory tokens). Used to capture complex, high-dimensional knowledge representations.

    • Explicit Storage: memory stored outside the model as text, vectors, or graphs.

      • Raw Memory: stores information in textual, visual, or auditory form, such as dialogue histories and compressed conversations (e.g., file-based text storage in MemoryOS, episodic recall in AMem). The most interpretable form, integrating seamlessly with the LLM context window.

      • Vector Memory: stores information as dense, continuous-valued vectors in a high-dimensional semantic space, enabling efficient similarity-based retrieval (e.g., via FAISS) (examples: MemOS's vectorized memory, Mem0's vector-encoded memory).

      • Graph Memory: encodes information as an explicit network of entities and relations, typically in a graph database such as Neo4j (e.g., Zep's temporal knowledge graph, Mem0's graph-structured memory, Cognee's knowledge graph). Well suited to complex relational reasoning.
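As a toy illustration of graph memory (standing in for what a graph database such as Neo4j would provide; the `GraphMemory` API here is invented for the sketch), facts can be held as (head, relation, tail) triples with associative neighbor lookup:

```python
# Illustrative sketch of graph memory: facts stored as
# (head, relation, tail) triples with simple neighbor lookup.
from collections import defaultdict

class GraphMemory:
    def __init__(self):
        self.edges = defaultdict(list)  # head entity -> [(relation, tail)]

    def add(self, head, relation, tail):
        self.edges[head].append((relation, tail))

    def neighbors(self, entity, relation=None):
        # Associative lookup: follow edges out of an entity,
        # optionally filtered by relation type.
        return [t for r, t in self.edges[entity] if relation is None or r == relation]

g = GraphMemory()
g.add("Alice", "works_at", "AcmeCorp")
g.add("Alice", "friend_of", "Bob")
print(g.neighbors("Alice"))              # all neighbors
print(g.neighbors("Alice", "works_at"))  # filtered by relation
```

Multi-hop queries (follow `neighbors` repeatedly) are what make this representation attractive for relational reasoning compared with flat text or vector stores.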

  • 3.4 Classification by Modality Type: examines the format of the information a memory handles.

    • Single-modal memory: handles information from a single modality, with text as the most mature and widely adopted form. Computationally efficient, with long effective memory spans (e.g., MemoryOS's hierarchical storage, Mem0's summarization and selective updating, Zep's graph-based structure).

    • Multimodal memory: integrates information across modalities (text, images, audio, video), usually supported by multimodal foundation models.

      • Raw Modality Representation: encodes raw multimodal data into high-dimensional embedding vectors for fast access and feature reuse (e.g., Memory-QA's visual memory, MovieChat's embedded memory, the embedded multimodal memories of Optimus and JARVIS-1).

      • Socratic Representation Paradigm: applies multimodal-to-text abstraction, converting heterogeneous multimodal inputs into structured textual descriptions (e.g., Ego-LLaVA's visual experiences rendered as text, the visual-stream descriptions of MIRIX and MM-VID, M3-Agent's textual episodic and semantic memories). Language acts as a unified cross-modal intermediary, lowering storage overhead and improving interpretability.

4 AI Memory in Single-Agent Systems. This section systematically surveys AI memory architectures and functional mechanisms designed for single-agent systems.

  • 4.1 Typical Agent Memory Architecture:

    • Hierarchical Memory Architecture: resolves the tension between the LLM's limited context window and long-term information-storage needs through tiered storage structures and dynamic management, mirroring the hierarchical organization of human memory (e.g., HMT's hierarchical sensory/short-term/long-term memory, H-MEM's four levels of semantic abstraction).

    • OS-like Memory Architectures: borrow from operating-system design, using tiered storage and dynamic management to handle memory consistency and resource allocation in long-term interaction (e.g., MemGPT's paging technique, MemoryOS's "heat-driven" segmented paging, MemOS's MemCube units for unified management of heterogeneous knowledge).

    • Cognitive Evolution Memory Architectures: emulate human cognitive processes or incorporate Theory of Mind, letting agents develop evolvable memory and strategy systems for self-optimization (e.g., AUGUSTUS's "encode-store-retrieve-act" loop, Nemori's "predict-calibrate" cycle).

    • Graph and Temporal Memory Architectures: use graph structures (especially knowledge graphs) to model complex relational dependencies and encode temporal dynamics for precise memory-lifecycle management and improved reasoning accuracy (e.g., Zep's temporal knowledge graph, Mem0's graph-structured memory, MemTree's tree-shaped hierarchy).

  • 4.2 Basic Functions of AI Memory:

    • Memory Storage: converts scattered observations into structured, persistent memory records and establishes a standardized indexing scheme for later retrieval. Each memory unit carries a timestamp, a source identifier, and structured semantic fields as index dimensions. Stored content falls into procedural, declarative, metacognitive, and personalized memory; storage formats divide into explicit (directly accessible and interpretable) and implicit (encoded in model parameters).

    • Memory Retrieval: precisely retrieves and integrates information from a large-scale memory store to guide generation, reducing hallucination and strengthening reasoning. Main approaches:

      • Vector-based retrieval: maps discrete memory content into an embedding space and retrieves by semantic-similarity computation (as in RAG architectures).

      • Hierarchical retrieval: structures memory into levels of semantic abstraction, locating macro-level intent before drilling into details.

      • Graph-based retrieval: represents memory elements as interconnected nodes and edges, emulating human associative memory (e.g., Zep's Graphiti engine, Mem0).

      • Multimodal retrieval: integrates visual information with semantic tags, broadening the scope of retrieval tasks (e.g., HippoMM's structuring of audio-visual streams, MovieChat's condensation of dense observations into sparse records).
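The core of vector-based retrieval can be shown in a few lines. This is a deliberately tiny sketch: the hand-made three-dimensional "embeddings" stand in for real encoder outputs, and production systems would use an ANN index such as FAISS rather than a linear scan.

```python
# Sketch of vector-based retrieval: memories and the query live in the
# same embedding space; the top-k most similar memories are returned.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve_top_k(query_vec, memory, k=2):
    # Rank every stored memory by cosine similarity to the query.
    scored = sorted(memory, key=lambda m: cosine(query_vec, m["vec"]), reverse=True)
    return [m["text"] for m in scored[:k]]

memory = [
    {"text": "user likes espresso",   "vec": [0.9, 0.1, 0.0]},
    {"text": "meeting moved to 3pm",  "vec": [0.0, 0.2, 0.9]},
    {"text": "user prefers oat milk", "vec": [0.8, 0.3, 0.1]},
]
print(retrieve_top_k([1.0, 0.2, 0.0], memory, k=2))
# -> ['user likes espresso', 'user prefers oat milk']
```

A query "what drinks does the user like" would embed near the two preference memories, so both are surfaced while the unrelated scheduling note is left behind.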

    • Memory Updating: revises, replaces, or consolidates existing stored content, correcting wrong or outdated information and integrating fragmented knowledge as new data arrives. Four modes:

      • Incremental updates: continuously inject newly perceived experience into the memory bank without disturbing existing knowledge (e.g., Zep's lossless incremental synthesis, the latent-space expansion of MemoryLLM/M+).

      • Corrective updates: fix outdated or incorrect knowledge inside the model (e.g., H-MEM's dynamic weight modulation, WISE's dual parametric-memory scheme).

      • Consolidation updates: optimize the storage structure through semantic abstraction and summarization of fragmented memories, improving retrieval efficiency (e.g., MemoryOS's "heat-driven" consolidation, LightMem's cognition-inspired sleep-time consolidation, MemoryField's gravitational-field fusion).

      • Forgetting updates: algorithmically delete or suppress redundant, sensitive, or low-value information (e.g., MEOW's "inverted facts" labels).
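The four update modes can be contrasted on a toy key-value store. The `MemoryBank` class and its method names are ours, purely for illustration; real systems apply these operations to embeddings, graphs, or model weights rather than lists of strings.

```python
# Illustrative sketch of the four update modes on a simple store.

class MemoryBank:
    def __init__(self):
        self.records = {}

    def incremental_update(self, key, value):
        # Add new information without touching existing entries.
        self.records.setdefault(key, []).append(value)

    def corrective_update(self, key, old, new):
        # Replace outdated or wrong content in place.
        self.records[key] = [new if v == old else v for v in self.records.get(key, [])]

    def consolidate(self, key, summarize):
        # Collapse fragmented entries into a single summary record.
        self.records[key] = [summarize(self.records.get(key, []))]

    def forget(self, key):
        # Actively delete low-value or sensitive content.
        self.records.pop(key, None)

m = MemoryBank()
m.incremental_update("user", "lives in Paris")
m.incremental_update("user", "lives in Pariss")  # a typo enters memory
m.corrective_update("user", "lives in Pariss", "lives in Paris")
m.consolidate("user", lambda vals: "; ".join(sorted(set(vals))))
print(m.records["user"])
# -> ['lives in Paris']
```

The example walks one fact through incremental growth, correction of a bad entry, and consolidation of duplicates; `forget` would then remove the key entirely.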

  • 4.3 Advanced Functions of AI Memory:

    • Self-Evolution: the agent's ability to dynamically iterate on and optimize its acquired knowledge, skills, and behavioral strategies through continuous interaction and task execution. AI experience is distilled into evolvable structures (adaptive goals, adjustable constraints, updated causal relations, iterative action patterns), reducing the cost of incremental learning and improving robustness to noise and novelty (e.g., LightSearcher's distillation of reasoning and tool-call trajectories, Voyager's growing library of executable skills).

    • Association: integrates multimodal signals (text, vision, audio, interaction) into a coherent situational model to construct memory. Fusion over entities, timestamps, and locations, cross-attention mechanisms, and schema linking reduce ambiguity, improve reference resolution, and preserve memory continuity across frames and dialogues (e.g., M3-Agent's action/fact memory graphs, Mem-0g's entity/relation retention).

  • 4.4 Limitations of Single-Agent Memory Paradigms:

    • Memory Misalignment: agents' views of the global state diverge, so outputs rest on stale or fragmented private memory, undermining system coherence.

    • The Redundancy Cycle: without a unified record of progress, agents duplicate one another's work, wasting compute and storage.

    • Stagnation of Collective Intelligence: memory isolation blocks the accumulation of shared knowledge, leaving valuable insights (such as API solutions) siloed; later agents cannot build on a common knowledge base, stalling the evolution of collective intelligence.

5 Memory Mechanisms in Multi-Agent Systems (MAS). This section examines the infrastructure underlying collective intelligence in MAS, aiming to bridge transient interaction and persistent knowledge and to overcome the limits of isolated memory models.

  • 5.1 Communication Mechanisms in MAS: effective collaboration in MAS depends on communication realized through memory sharing, in two primary modalities: explicit symbolic exchange and implicit state coordination.

    • Explicit Communication: the deliberate transmission of symbolic information between agents.

      • Unstructured Natural Language: agents coordinate through natural-language interaction guided by explicit role prompts (e.g., ChatDev). Highly flexible, but prone to ambiguity, redundancy, and heavy token consumption.

      • Structured Data Schemas: restrict inter-agent information exchange to predefined, machine-interpretable formats (e.g., MetaGPT mandates UML diagrams). High-fidelity and reliable, transmitting structured data as memory fragments.

      • Dynamic Allocation: information no longer flows through static one-to-one interactions; it is routed on demand from a shared memory space to the relevant agents (e.g., RCR-Router). Decoupling information production from consumption enables more flexible and scalable agent collaboration.

    • Implicit Communication: coordination without direct inter-agent messaging, achieved through processing internal to individual agent programs.

      • Latent Representation: agents directly share their internal continuous latent representations (hidden embeddings) rather than discrete natural-language tokens (e.g., LatentMAS, "Dense Communication" and "Thought-to-Thought" interaction), aiming for high expressive power.

      • Compressed Knowledge: applies compression to the model's final hidden states to optimize inference efficiency while preserving semantic fidelity (e.g., the Interlat framework). Efficient compression lets latent states be transmitted directly, so agents can better exploit subtle internal information.

  • 5.2 Memory Sharing Mechanism in MAS: shared memory underpins collective intelligence, with research optimizing at two granularities: task level and step level.

    • Task-Level Memory Sharing: consolidates experience from different task executions to promote long-term evolution and cross-domain transfer.

      • Homogeneous Experience Accumulation: agent teams accumulate experience while executing a given task, turning raw histories into evolvable wisdom; memory abstraction distills high-level insights, procedural skills, and abstract strategies (e.g., G-Memory, SEDM's distillation of reasoning trajectories, MemoryOS++'s group-experience search).

      • Heterogeneous Information Transfer: facilitates exchange among agents executing heterogeneous tasks, building a shared knowledge pool from which agents retrieve and replicate solution paths already validated by peers (e.g., MS's lateral knowledge transfer).

    • Step-Level Memory Sharing: dynamically allocates specific information to relevant agents during the fine-grained execution phases of a single collaborative workflow. It tackles the "noise-context trade-off" in multi-agent collaboration via context routing rather than broadcasting global state: the system analyzes each agent's functional role and the task's current phase and delivers only the critical information fragments (e.g., RCR-Router).
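Step-level context routing can be sketched as a filter over shared memory. The tagging scheme below is invented for illustration (real routers such as RCR-Router use learned or role-aware policies, not hand-written tags), but it shows the key idea: each agent receives only the fragments relevant to its role instead of the full global state.

```python
# Sketch of step-level context routing: rather than broadcasting the
# whole shared memory, deliver each agent only the entries tagged as
# relevant to its role.

shared_memory = [
    {"tags": {"frontend"},            "note": "button colour changed to blue"},
    {"tags": {"backend"},             "note": "auth API now returns 401 on expiry"},
    {"tags": {"frontend", "backend"}, "note": "release frozen on Friday"},
]

def route(memory, role):
    # Select only the fragments whose tags mention this agent's role.
    return [m["note"] for m in memory if role in m["tags"]]

print(route(shared_memory, "backend"))
# -> ['auth API now returns 401 on expiry', 'release frozen on Friday']
```

The frontend agent would instead receive the colour change and the release freeze, so each context stays short and on-topic, which is exactly the noise-context trade-off the survey describes.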

6 Evaluation of AI Memory. The survey proposes a comprehensive taxonomy for evaluating the memory of LLM-driven agents, organized along four core dimensions.

  • 6.1 Evaluation Metrics for Memory Mechanisms:

    • Memory Retrieval Capability:

      • Retrieval Performance: directly assesses the quality of the memory module, focusing on coverage, precision, and ranking quality (metrics: Recall@k, Precision@k, NDCG@k).

      • Response Correctness: indirectly assesses retrieval quality through downstream task success, where the answer can be found verbatim in the source text (metrics: Accuracy, F1-Score, BLEU-N, ROUGE-L).
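The three retrieval-performance metrics just named can be computed directly from a ranked list and a set of gold memories. A single-query sketch with binary relevance (memory IDs here are made up):

```python
# Recall@k, Precision@k, and NDCG@k for one query with binary relevance.
import math

def precision_at_k(ranked, relevant, k):
    hits = sum(1 for d in ranked[:k] if d in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    hits = sum(1 for d in ranked[:k] if d in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    # DCG discounts each hit by log2 of its (1-based) rank + 1;
    # the ideal DCG puts all relevant items at the top.
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(ranked[:k]) if d in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal

ranked = ["m3", "m7", "m1", "m9"]   # retriever output, best first
relevant = {"m1", "m3"}             # gold memories for this query
print(precision_at_k(ranked, relevant, 3))   # 2 of top-3 are relevant
print(recall_at_k(ranked, relevant, 3))      # both gold memories found
print(round(ndcg_at_k(ranked, relevant, 3), 3))
```

Benchmark scores are then the mean of these per-query values; NDCG additionally rewards placing the relevant memories earlier in the ranking, which plain recall ignores.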

    • Dynamic Updating Capability:

      • Memory Modification: assesses the system's ability to correctly revise existing records when new, conflicting information arrives (metrics: Update Accuracy, Hallucination Rate, Omission Rate).

      • Memory Writing: assesses the faithfulness and completeness of converting raw interaction text into stored memories (metrics: Memory Recall, Memory Accuracy, F1-score).

      • Memory Forgetting: the algorithmic, selective removal of specific data's influence on the model's parametric memory while leaving unrelated knowledge intact (metrics: Truth Ratio, ROUGE-L Recall).

    • Advanced Cognitive Capability:

      • Generalization: the LLM's ability to transfer and apply acquired knowledge or skills to unseen tasks (metric: Success Rate).

      • Temporal Perception: the agent's ability to maintain and update a coherent timeline of events and entity states across interactions (metrics: Kendall's τ, Accuracy).

      • Personalization: the agent's ability to use long-term memory to deliver services tailored to a user's history, identity, and behavioral patterns (metrics: Accuracy, Human- or LLM-based Scoring).
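Kendall's τ, used above for temporal perception, scores how well an agent's reconstructed event order matches the true chronology: it counts pairwise agreements minus disagreements over all event pairs. A minimal sketch (the event names are invented; libraries such as `scipy.stats.kendalltau` provide a production version):

```python
# Kendall's tau over event orderings: +1 for a perfectly reconstructed
# timeline, -1 for a fully reversed one.
from itertools import combinations

def kendall_tau(true_order, predicted_order):
    pos = {e: i for i, e in enumerate(predicted_order)}
    concordant = discordant = 0
    for a, b in combinations(true_order, 2):  # a precedes b in truth
        if pos[a] < pos[b]:
            concordant += 1
        else:
            discordant += 1
    n = len(true_order)
    return (concordant - discordant) / (n * (n - 1) / 2)

true_order = ["signup", "purchase", "refund", "churn"]
predicted  = ["signup", "refund", "purchase", "churn"]
print(kendall_tau(true_order, predicted))
```

Here one of the six event pairs is swapped (purchase/refund), giving τ = (5 - 1) / 6 ≈ 0.67, so the metric degrades gracefully with each local ordering mistake.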

    • System Efficiency: assesses the engineering feasibility of real-world deployment, critical for scalability and user experience.

      • Latency: the precise time cost of specific operations such as retrieval or writing (metric: Percentile Latency).

      • Token Overhead: the total number of tokens a single interaction turn consumes by packing retrieved memories into the prompt (metric: Tokens Consumed).

      • Storage Efficiency: the memory module's ability to minimize its physical storage footprint under tight constraints while retaining all critical information (metric: Storage Cost).

  • 6.2 Evaluation Benchmarks:

    • Static Memory Evaluation: emphasizes memory retrieval from fixed, non-updating inputs (e.g., LoCoMo, EpisodicGen, LongBench, RULER, HotpotQA).

    • Dynamic Memory Evaluation: assesses the agent's core ability to manage memory updates and adapt to evolving context (e.g., MemoryAgentBench, MemoryBench, HaluMem, DialSim, BEAM).

    • Personalization Memory Evaluation: assesses the key ability to synthesize and maintain evolving user profiles and personalized preferences (e.g., MemBench, MemSim, PERSONAMEM, LongMemEval, PerLTQA, PREFEVAL).

    • Environment Memory Evaluation: assesses how effectively memory supports sequential action in complex external environments (e.g., WebChoreArena, MT-Mind2Web, StoryBench).

    • Multimodal Memory Evaluation: tests memory's ability to align and retrieve spatiotemporal information across heterogeneous non-textual modalities (e.g., Video-MME, MLVU, LVBench, M3-Bench, EgoSchema, EgoLifeQA, Memory-QA, MMNeedle).

  • 6.3 Evaluation Challenges:

    • Dataset Construction: a lack of unified, high-quality datasets.

    • Ambiguity in Performance Attribution: the difficulty of isolating memory's contribution to end performance.

    • Dilemma of Evaluation Metrics: no single metric captures the full complexity of memory.

By synthesizing cognitive theory with engineering benchmarks, the survey provides a coherent roadmap for both the theoretical understanding and the technical development of AI memory.


© Copyright 2010-2025, 新溪-gordon.

ICP filing: 京ICP备16018553号
Built with Sphinx using a theme provided by Read the Docs.