新溪-gordon
V2026.02
  • 通用定义
    • 评测标准
      • 常用评测标准
      • 准确率(Accuracy)
      • 精确率(Precision, 精准率)
      • 召回率(Recall)
      • F1 Score
      • 可视化精度和召回率
        • 混淆矩阵(confusion matrix)
        • 受试者工作特征曲线(ROC 曲线,Receiver Operating Characteristic curve)
      • Recall@k
        • 核心思想一句话概括
        • 公式与计算
        • 举例说明
        • 为什么需要 Recall@k?
        • 重要特性和注意事项
        • 总结
      • Precision@k
        • 核心思想一句话概括
        • 公式与计算
        • 举例说明
        • 为什么需要 Precision@k?
        • 重要特性和注意事项
        • 总结
      • HR@k
        • 一、核心定义:什么是 HR@N?
        • 二、计算公式
        • 三、举例说明
        • 四、指标的特点与解读
        • 五、与其他指标的关系和对比
        • 六、典型应用场景
        • 总结
      • NDCG@k
        • 核心思想一句话概括
        • 从 CG 到 DCG 再到 NDCG
        • 计算步骤举例
        • 为什么 NDCG@k 如此重要?
        • 总结
      • MRR@k
        • 一、核心概念:什么是 MRR@K?
        • 二、为什么需要 MRR@K?
        • 三、如何计算 MRR@K?
        • 四、举例说明
        • 五、MRR@K 的特点和注意事项
        • 六、与其他指标的区别
        • 为什么 MRR@10 常和 Recall@1000 一起使用?
        • 总结
      • MAP@k
        • 一句话理解
        • 拆解首字母缩略词(acronym)
        • 通过一个例子彻底搞懂
        • 为什么 MAP@K 如此重要?
        • 总结
      • AUC (Area Under the ROC Curve)
        • 为什么要用 AUC?
        • 详细拆解:
      • LogLoss(Logarithmic Loss)
        • 为什么要用 LogLoss?
        • 详细拆解:
        • 总结与对比
        • 在实际业务中如何看?
      • Jaccard 相似系数
        • 一、是什么?
        • 二、计算公式
        • 三、核心性质
        • 四、一个简单的例子
        • 五、Jaccard 距离
        • 六、主要应用场景
        • 七、优缺点
        • 总结
      • PASS@k
        • 一、定义直观解释
        • 二、数学定义
        • 三、为什么有用
        • 五、总结一句话
    • 通用记忆
      • 总结与展望
      • 记忆类型
        • 短期记忆(Short-Term Memory)
        • 长期记忆
        • 情节记忆(Episodic Memory)
        • 语义记忆(Semantic Memory)
        • 工作记忆(Working Memory)
        • 程序性记忆(Procedural Memory)
        • 感觉记忆(Sensory Memory)
        • 图示
        • 长期记忆的必要性与挑战
        • 参考
      • 【定义】Cattell–Horn–Carroll 理论
        • 背景:核心内容与演变
        • 三层层级系统
        • CHC 理论的意义
        • 总结
      • Reciprocal Rank Fusion (RRF) 算法
        • 公式
        • 计算步骤
        • 优点与缺点
        • 应用场景
        • 总结
  • 综述论文
    • 近邻搜索
      • 2508.09834❇️_Overview_LLM: Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Linear Sequence Modeling
        • 3 Sparse Sequence Modeling
        • 4 Efficient Full Attention
        • 5 Sparse Mixture-of-Experts
        • 6 Hybrid Architectures
        • 7 Diffusion Large Language Models
        • 8 Applications to Other Modalities
        • 9 Conclusion and Future Directions
  • 评测基准
    • 评测基准
      • 02xx.xxxxx_BLEU: a Method for Automatic Evaluation of Machine Translation
        • 总结
        • Abstract
        • 示例讲解
        • 1. Introduction
        • 2. The Baseline BLEU Metric
        • 3. The BLEU Evaluation
        • 4. The Human Evaluation
        • 5. BLEU vs The Human Evaluation
        • 6. Conclusion
      • 0401.xxxxx_ROUGE: A Package for Automatic Evaluation of Summaries
        • 总结
        • Abstract
        • 1. Introduction
        • 2. ROUGE-N: N-gram Co-Occurrence Statistics
        • 3. ROUGE-L: Longest Common Subsequence
        • 4. ROUGE-W: Weighted Longest Common Subsequence
        • 5. ROUGE-S: Skip-Bigram Co-Occurrence Statistics
        • 6. Evaluations of ROUGE
        • 7. Conclusions
      • 1803.01937_ROUGE2.0: Updated and Improved Measures for Evaluation of Summarization Tasks
        • Abstract
        • 1. Problems with the current ROUGE measures
        • 2. ROUGE 2.0
      • 1804.08771_SacreBLEU: A Call for Clarity in Reporting BLEU Scores
        • BLEU
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Problem Description
        • 3 A way forward
        • 4 Summary
      • 2303.08896_SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Background and Related Work
        • 3 Grey-Box Factuality Assessment
        • 4 Black-Box Factuality Assessment
        • 5 SelfCheckGPT
        • 6 Data and Annotation
        • 7 Experiments
        • 8 Conclusions
        • Limitations
        • Ethics Statement
        • Acknowledgments
        • Appendix A Models and Implementation
        • Appendix B SelfCheckGPT with QA
        • Appendix C SelfCheckGPT with Prompt
        • Appendix D Additional Experimental Results
      • 2306.05685_Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 MT-Bench and Chatbot Arena
        • 3 LLM as a Judge
        • 4 Agreement Evaluation
        • 5 Human Preference Benchmark and Standardized Benchmark
        • 6 Discussion
        • 7 Conclusion
        • Appendix A Prompt templates
        • Appendix B Case Study
        • Appendix C Data Collection
        • Appendix D Additional Experimental Results
        • Appendix E Training Details of Vicuna Models
        • Appendix F Exploring Vicuna as a judge
      • 2403.04132_Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 2 相关工作(Related Work)
        • 3 Human Preference Data Collection
        • 3 人类偏好数据收集
        • 4 From Pairwise Comparisons to Rankings
        • 5 Efficient Approximate Ranking
        • 6 Data Analysis
        • 7 Experiments
        • 7 实验
        • 8 Discussion
        • 8 讨论
        • 9 Conclusion
        • 9 结论
        • Acknowledgments
        • 致谢
        • Appendix A Confidence Interval Simulation Study
        • 附录 A 置信区间模拟研究
        • Appendix B The Nonparametric Bradley-Terry Model
        • 附录 B:非参数 Bradley-Terry 模型
        • Appendix C Valid P-Value
        • 1. p值的定义
        • 2. p值的等价表达式
        • 3. 有效性证明的关键步骤
        • 4. 证明结论
        • 总结
        • Appendix D Sample Prompts
      • 2404.04475_AlpacaEval LC: A Simple Way to Debias Automatic Evaluators
        • 总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 Background and Problem Setting
        • 3 Length-Controlled AlpacaEval
        • 4 Results
        • 5 Discussion
      • 2511.03506_HaluMem: Evaluating Hallucinations in Memory Systems of Agents
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Problem Definition
        • 4 Methodology for Constructing HaluMem
        • 5 Evaluation Framework of HaluMem
        • 6 Experiments
        • 7 Conclusion
        • Appendix A Supplementary Details of HaluMem
        • Appendix B Special Configurations for Some Memory Systems
        • Appendix C Annotation Guidelines and Instructions
        • Appendix D Prompts
        • Appendix E Examples from the Process of Constructing HaluMem
    • 数据集-Agent
      • 2308.03688_AgentBench: Evaluating LLMs as Agents
        • 总结
        • From Deepseek
        • 数据集示例
      • 2312.14033_T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
        • 总结
        • Abstract
        • 1 Introduction
        • 2 T-Eval
        • 3 Experiments
        • 4 Discussion
        • 5 Related Work
        • 6 Conclusion
        • Appendix A T-Eval Benchmark Details
        • Appendix B Implementation Details
        • Appendix C Detailed Evaluation Metrics
        • Appendix D API Documentation
      • 2406.12045_τ-bench: A Benchmark for Tool-Agent-User Interaction
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. τ-bench: A Benchmark for Tool-Agent-User Interaction
        • 4. Benchmark Construction
        • 5. Experiments
        • 6. Discussion
      • 2506.07982_τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 τ²-bench: Evaluating Agents in a Dual-Control Environment
        • 4 Experiments
        • 5 Conclusion
        • Broader Impact
        • Appendix
        • Appendix A Telecom Domain
        • Appendix B Verifying Original τ²-bench
        • Appendix C Prompts
        • Appendix D Domain Policies
        • Appendix E User Simulator Quality
    • 数据集-QA
      • 2109.07958_TruthfulQA: Measuring How Models Mimic Human Falsehoods
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 The TruthfulQA Benchmark
        • 3 Experiments
        • 4 Results
        • 5 Discussion
        • 6 Related Work
        • 7 Conclusion
        • 8 Ethics and Impact
        • Appendix A Additional examples from TruthfulQA
        • Appendix B Additional results
        • Appendix C Dataset construction
        • Appendix D Human evaluations
        • Appendix E Prompts
        • Appendix F Checking for data quality and disagreement
      • 2311.12022_GPQA: A Graduate-Level Google-Proof Q&A Benchmark
        • 总结
        • Abstract
        • 1.Introduction
        • 2.Data Collection
        • 3.Dataset Analysis
        • 4.Baseline
        • 5.Related Work
        • 6.Limitations
        • 7.Conclusion
      • 2411.04368_SimpleQA: Measuring short-form factuality in large language models
        • Abstract
        • 1.Introduction
        • 2.Data Collection and Verification
        • 4.Measuring calibration
        • Appendix B Guessing strategy and F-score
    • 数据集-长文本
      • 2308.14508_LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
        • From Deepseek
      • 2402.05136_LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3 LV-Eval Benchmark
        • 4 Evaluation
        • Appendix
        • Appendix C Detailed Evaluation Results
        • Appendix D Detailed Ablation Results
      • 2404.06654_RULER: What’s the Real Context Size of Your Long-Context Language Models?
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 The Ruler Benchmark
        • 4 Experiments & Results
        • 5 Task Error Analysis
        • 6 Model Analysis
        • 7 Conclusion
        • 8 Limitations
        • Appendix A Models
        • Appendix B Task Configurations
        • Appendix C Task Correlation Analysis
        • Appendix D Prompt Templates
        • Appendix E Passkey Retrieval and Vanilla NIAH Results
        • Appendix F Additional Results
      • 2407.11963_NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Tasks and Datasets
        • 4 Experiments
        • 4.1.5 Impact of Language: Which Model Performs Better under the Bilingual Scenario?
        • 5 Conclusion and Future Work
        • Appendix A Evaluated Models
        • Appendix B NeedleBench Prompt Examples
        • Appendix C Error Analysis Examples
    • 数据集-RAG
      • 1809.09600_HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Data Collection
        • 3 Processing and Benchmark Settings
        • 4 Dataset Analysis
        • 5 Experiments
        • 6 Related Work
        • 7 Conclusions
        • Appendix A Data Collection Details
        • 附录A 数据收集细节
        • Appendix B Further Data Analysis
        • Appendix C Full Wiki Setting Details
      • 2401.15391_MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
        • 总结
        • LLM总结
        • Abstract
        • 1 Introduction
        • 2 RAG with multi-Hop queries
        • 3 A Benchmarking Dataset: MultiHop-RAG
        • 4 Benchmarking RAG system using MultiHop-RAG
        • 5 Related Work
        • 6 Conclusion
        • Limitations
        • Appendix A: GPT-4 Prompts Used for Data Generation
        • Appendix B: Dataset Examples
    • 数据集-图
      • 2402.07630_G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering
        • 总结
        • 示例讲解
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Formalization
        • 4 Proposed GraphQA Benchmark
        • 5 G-Retriever
        • 6 Experiments
        • 7 Conclusion
        • Acknowledgment
        • Appendix A Impact Statements
        • Appendix B Experiment
        • Appendix C GraphQA Benchmark
        • Appendix D Graph Retrieval-Augmented Generation (GraphRAG)
        • Appendix E Discussion on the Complexity
        • 附录E 复杂性讨论总结
        • Appendix F Hallucination in Graph LLMs
        • Appendix G Demonstrations
    • 数据集-编程
      • 2107.03374_HumanEval: Evaluating Large Language Models Trained on Code
        • 总结
        • Abstract
        • 1.Introduction
        • 2.Evaluation Framework
        • 3.Code Fine-Tuning
        • 4.Supervised Fine-Tuning
        • 5.Docstring Generation
        • 6.Limitations
        • 7.Broader Impacts and Hazard Analysis
        • 8.Related Work
        • 9.Conclusions
      • 2108.07732_MBPP: Program Synthesis with Large Language Models
        • Abstract
        • 1 Introduction
        • 2 Datasets
        • 3 Model and Methods
        • 4 MBPP Synthesis Results
        • 5 Human-Model Collaboration Results
        • 6 Program Execution Results
        • 7 MathQA Results
        • 8 Related Work
        • 9 Risks and Limitations
        • 10 Conclusion
        • Appendix A Appendix
      • 2310.06770_SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 SWE-bench
        • 3 SWE-Llama: Fine-tuning CodeLlama for SWE-bench
        • 4 Experimental Setup
        • 5 Results
        • 6 Related Work
        • 7 Discussion
        • 8 Ethics Statement
        • 9 Reproducibility Statement
        • Appendix
        • Appendix A Benchmark Details
        • Appendix B Additional Details on Training SWE-Llama
        • Appendix C Additional Results
        • Appendix D Additional Experimental Details
        • Appendix E Societal Impact
        • Appendix F In-depth Analysis of SWE-Llama Generations
      • 2402.16694_HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. HumanEval-XL
        • 4. Experiments
        • 5. Conclusion
        • Acknowledgments
        • Appendix A Experiment Settings
        • Appendix B Comprehensive Experiment Results
      • 2403.07974_LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
        • 总结
        • LLM总结
        • Abstract
        • 1 Introduction
        • 2 Holistic Evaluation
        • 3 Benchmark Curation
        • 4 Experiment Setup
        • 5 Results
        • 6 Related Work
        • 7 Limitations
        • 8 Conclusion
        • Appendix A Dataset
        • Appendix B UI
        • Appendix C Experimental Setup
        • Appendix D Results
        • Appendix E Qualitative Examples
      • 2407.10499_CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Works
        • 3 CIBench
        • 4 Experiments
        • 5 Conclusion
        • Appendix A Dataset Details
        • Appendix B Construction Prompts and Rules
        • Appendix C Experiment Example Demo
        • Appendix D Subjective Visualization Evaluation
        • Appendix E Dataset Error Analysis
        • Appendix F Human Annotator
        • Appendix G Ethical Consideration
      • 2410.03859_SWE-bench-Multimodal: Do AI Systems Generalize to Visual Software Domains?
        • 总结
        • Abstract
        • 1 Introduction
        • 2 SWE-bench Multimodal
        • 3 Evaluating on SWE-bench M
        • 4 Results
        • 5 Related Work
        • 6 Conclusion
        • Appendix A Dataset
        • Appendix B Collection
        • Appendix C Experiments
        • Appendix D Human Validation
        • Appendix E Limitations
      • 2410.06992_SWE-Bench+: Enhanced Coding Benchmark for LLMs
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Robustness Analysis of SWE-Bench
        • 3 Building SWE-Bench+
        • 4 Robustness of SWE-Bench+
        • 5 Effectiveness-aware Evaluation
        • 6 Related Work
        • 7 Conclusion
      • 2501.01257_CodeForces: Benchmarking Competition-level Code Generation of LLMs on CodeForces
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 CodeForces Benchmark
        • 4 Evaluation on Existing LLMs
        • 5 Analysis Experiments
        • 6 Discussion
        • 7 Conclusion
        • 8 Ethical Statement
        • Appendix A Model Cards
        • Appendix B Decoding Hyperparameters
        • Appendix C Analysis of Our Elo Rating Calculation System
        • Appendix D Human-comparable Elo Rating
        • Appendix E Problem Demonstration
        • Appendix F Special Judge
    • 数据集-数学
      • 2103.03874_MATH: Measuring Mathematical Problem Solving With the MATH Dataset
      • 2110.14168_GSM8K: Training Verifiers to Solve Math Word Problems
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Dataset
        • 3 Related Work
        • 4 Methods
        • 5 Additional Experiments
        • 6 Conclusion
        • Appendix A Dataset Details
        • Appendix B Hyperparameters
        • Appendix C Calculator Annotations
        • Appendix D Example Model Solutions
        • Appendix E Verifier Details
        • Appendix F Verifier Visualization
      • 2405.12209_MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
        • Abstract
        • 1 Introduction
        • 2 Methodology
        • 3 Experiments and Analysis
        • 4 Discussion
        • 5 Related Work
        • 6 Conclusion
        • 7 Limitations
        • 8 Ethical Considerations
        • Appendix A MathBench Statistics
        • Appendix B Detailed Experimental Results
        • Appendix C Extra Analysis
    • 数据集-图片
      • 2306.13394_MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 MME Evaluation Suite
        • 3 Experiments
        • 4 Analysis
        • 5 Conclusion
      • 2307.06281_MMBench: Is Your Multi-modal Model an All-around Player?
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 The construction of MMBench
        • 4 Evaluation Strategy
        • 5 Evaluation Results
        • 6 Conclusion
        • Appendix A More Details about the Data
        • Appendix B More Details on MMBench Construction
        • Appendix C More Details on LLM-based Choice Extraction
        • Appendix D Evaluation Settings and Results
      • 2307.16125_SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 SEED-Bench
        • 4 Evaluation Results
        • 5 Conclusion
      • 2311.12793_ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 ShareGPT4V Dataset
        • 4 ShareGPT4V-7B Model
        • 4.1 模型架构
        • 4.2 预训练
        • 4.3 监督微调(SFT)
        • 总结
        • 5 Experiments
        • 6 Conclusion
        • Appendix A Data Sources
        • Appendix B Caption Analysis
        • Appendix C Prompts
        • Appendix D Examples
      • 2506.18095_ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
        • 总结
        • Abstract
        • 1 Introduction
        • 2 ShareGPT-4o-Image
        • 3 Janus-4o: Fine-Tuning with ShareGPT-4o-Image
        • 4 Experiments
        • 5 conclusion
        • Appendix A Related Work
        • Appendix B Image Generation Categories
        • Appendix C Prompts for Generation
        • Appendix D Document Pipeline
        • Appendix E Ethical Considerations and Societal Impact
    • 数据集
      • 1804.07461_GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 2 相关工作总结
        • 3 Tasks
        • 3.1 Single-Sentence Tasks
        • 3.2 Similarity and Paraphrase Tasks
        • 3.3 Inference Tasks
        • 3.4 Evaluation
        • 4 Diagnostic Dataset
        • 4 诊断数据集(Diagnostic Dataset)
        • 5 Baselines
        • 5 Baselines 总结
        • 6 Benchmark Results
        • 6 Benchmark Results(基准测试结果)
        • 7 Analysis
        • 8 Conclusion
        • 8 结论
        • Acknowledgments
        • 致谢
        • Appendix A Additional Benchmark Details
        • Appendix B Additional Baseline Details
        • Appendix B Additional Baseline Details(附录B 其他基线细节)
        • Appendix C Development Set Results
        • Appendix C Development Set Results 总结
        • Appendix D Benchmark Website Details
        • Appendix D Benchmark Website Details(附录 D 基准网站详情)
        • Appendix E Additional Diagnostic Data Details
        • 附录 E:额外的诊断数据细节
        • 总结
      • 2009.03300_MMLU: Measuring Massive Multitask Language Understanding
        • 总结
        • Abstract
        • 1.Introduction
        • 2.Related Work
        • 3.A Multitask Test
        • 4.Experiments
        • 5.Discussion
        • 6.Conclusion
      • 2305.08322_C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 The C-Eval Evaluation Suite
        • 3 Experiment
        • 4 Related Work
        • 5 Discussion
        • Acknowledgement
        • Appendix A Author Contributions
        • Appendix B Detailed Stats of C-Eval
        • Appendix C Explanation Data Generation
        • Appendix D Evaluation Prompts
        • Appendix E Details of the models being evaluated
        • Appendix F Breakdown of Model Performance
        • Appendix G Option Bias
        • Appendix H Compute and Resources Used for Evaluation
      • 2306.09212_CMMLU: Measuring massive multitask language understanding in Chinese
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 CMMLU
        • 4 Experiments
        • Impact of model size on performance
        • 5 Conclusion
        • Appendix A Comparison to concurrent benchmarks
        • Appendix B CMMLU Subjects
        • Appendix C CMMLU Examples
        • Appendix D CMMLU Difficulty Distribution
        • Appendix E Emergent Ability shown in CMMLU subjects
        • Appendix F Models being Evaluated
        • Appendix G Strategies for Estimating Model Choices
        • Appendix H Regular expression matching algorithms
        • Appendix I Correlation to other Benchmarks
        • Appendix J Breakdown of Model Performance
        • J.3 The effect of chain-of-thought prompt
      • 2307.15020_SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 SuperCLUE Benchmark
        • 4 Experiments
        • 5 Additional Analysis
        • 6 Conclusion
        • Appendix A Evaluation Process
        • Appendix B Capability Categories
      • 2311.12983_GAIA: a benchmark for General AI Assistants
        • 总结
        • Abstract
        • 1.Introduction
        • 2.Related work
        • 3.GAIA
        • 4.LLMs results on GAIA
        • 5.Discussion
        • 6.Limitations
        • Appendix A Extended related work
        • Appendix C Extended description of GAIA
        • Appendix D Extended description of our question design framework
      • 2311.18743_AlignBench: Benchmarking Chinese Alignment of Large Language Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Dataset
        • 3 Methods
        • 4 Human Evaluation on AlignBench
        • 5 AlignBench: Benchmarking Results
        • 6 Related Work
        • 7 Conclusion
        • Appendix A Appendix
      • 2404.07972_OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
        • 总结
        • Abstract
        • 1. Introduction
        • 2. OSWORLD Environment
        • 3. OSWORLD Benchmark
        • 4. Benchmarking LLM and VLM Agent Baselines
        • 5. Analysis
        • 6. Related Work
        • 7. Conclusion and Future Work
        • A. Details of OSWORLD Environment
        • C. Details of Baseline Methods
        • D. Examples of Qualitative Analysis
      • 2406.04770_WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild
        • 总结
        • Abstract
        • 1 Introduction
        • 2 WildBench Data Curation
        • 3 Automatic Evaluation with WildBench
        • 4 Results & Analysis
        • 5 Related Works
        • 6 Conclusion and Future Directions
        • Appendix A Task Categories
        • Appendix B More Information on WildBench Data
        • Appendix C More Information on WildBench Evaluation
        • Appendix D Prompt Template for Pairwise Evaluation Metric WB-Reward
        • Appendix E Prompt Template for Individual Evaluation Metric WB-Score
        • Appendix F Full WildBench Leaderboard
      • 2501.14249_HLE: Humanity’s Last Exam
        • Abstract
        • 1.Introduction
        • 2.Related Work
        • 3.Dataset
        • 4.Evaluation
        • 5.Discussion
  • 记忆
    • 综述
      • 2505.00675_❇️Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Memory Foundations
        • 3 From Operations to Key Research Topics
        • 4 Memory In Practice
        • 总体总结:
        • 5 Memory in Humans and AI Systems
        • 6 Open Challenges and Future Directions
        • Appendix A GPT-based Pipeline Selection
        • Appendix B Relative Citation Index
        • Appendix C Chord Analysis of Interactions Among Memory Types, Operations, Topics, and Venues
      • 2512.13564❇️_MemorySurvey: Memory in the Age of AI Agents: A Survey
        • 总结
        • From Moonlight
        • Abstract
        • 1. Introduction
        • 2. Preliminaries: Formalizing Agents and Memory
        • 3. Form: What Carries Memory?
        • 4. Functions: Why Agents Need Memory?
        • 5. Dynamics: How Memory Operates and Evolves?
        • 6. Resources and Frameworks
        • 7. Positions and Frontiers
        • 8. Conclusion
    • RL
      • 2508.19828_Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
        • 总结
        • From Moonlight
        • Prompt
        • Algorithm
        • Abstract
        • 1 Introduction
        • 1 引言(Introduction)
        • 2 Related Work
        • 2 相关工作(Related Work)总结
        • 3 Method
        • 3 方法总结
        • 4 Experiments
        • 4 实验(Experiments)
        • 5 Conclusion
        • 5 结论(Conclusion)
        • Limitations
        • 局限性(Limitations)总结
        • Appendix A Case Study of Behavior of Agents before and after Fine-tuning
        • 附录A:微调前后智能体行为的案例研究总结
        • Appendix B Dataset Details
        • Appendix C Prompts
        • 附录 C 提示(Prompts)
        • Appendix D Implementation Details
        • 附录 D 实现细节(总结)
        • Appendix E Algorithm
        • Appendix F Extended Results and Type-Level Analysis
        • 附录 F 扩展结果与类型级分析
      • 2601.01885_AgeMem: Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Background and Related Work
        • 3 Method
        • 4 Experiments
        • 5 Conclusion
        • Limitations
        • Appendix A Detailed Design and Implementation of AgeMem
        • Appendix B Case Study: AgeMem in Action
        • Appendix C Experimental Implementation
        • Appendix D Additional Results
    • 通用
      • 1911.00172_kNN-LMs: Generalization through Memorization: Nearest Neighbor Language Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Nearest Neighbor Language Modeling
        • 3 Experimental Setup
        • 4 Experiments
        • 5 Tuning Nearest Neighbor Search
        • 6 Analysis
        • 7 Related Work
        • 8 Conclusion and Future Work
        • Appendix A Appendix
      • 2304.13343_SCMemory: Enhancing Large Language Model with Self-Controlled Memory Framework
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Self-Controlled Memory
        • 总结
        • 3 Experiments
        • 4 Related Work
        • 5 Conclusion
        • Limitations
        • Ethical Considerations
        • Appendix A Prompt List
        • Appendix B Long-term Dialogue QA Cases
        • Appendix C Book Summarization Cases
        • Appendix D Meeting Summarization Cases
      • 2305.11792_Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Method
        • 总结
        • 4 Datasets Collection
        • 5 Experiment
        • 5.1 LLMs 家族与评估细节
        • 5.2 主要实验
        • 5.3 人工评估
        • 6 Analysis
        • 7 Discussion
        • 8 Conclusion
        • Limitations
        • Ethics Statement
        • Acknowledgement
        • Appendix A Templates
        • Appendix B Different Method of Evaluation
        • Appendix C Discussion
        • Appendix D Helpfulness Analysis of Planning Step
      • 2305.17144_GITM: Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
        • From Deepseek
      • 2306.03901_ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 ChatDB
        • 4 Evaluation
        • 4 评估
        • 5 Conclusion
        • 5 结论
      • 2308.10144_ExpeL: LLM Agents Are Experiential Learners
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Preliminaries
        • 4 ExpeL: An Experiential Learning Agent
        • 5 Experiments
        • 6 Conclusion and Limitations
        • Acknowledgement
        • Appendix A Detailed Related Works
        • Appendix B Broader Impacts
        • Appendix C Computational Resources
        • Appendix D Environment Details
        • Appendix E Environment, Agent, Retrieval Parameters
        • Appendix F Prompt Templates
        • Appendix G Example Insights
        • Appendix H Emergent Abilities Showcase
        • Appendix I Example Trajectories
        • Appendix J Additional Quantitative Results
      • 2309.02427_❇️CoALA: Cognitive Architectures for Language Agents
        • 总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 Background: From Strings to Symbolic AGI
        • 3 Connections between Language Models and Production Systems
        • 4 Cognitive Architectures for Language Agents (CoALA): A Conceptual Framework
        • 5 Case Studies
        • 6 Actionable Insights
        • 7 Discussion
        • 8 Conclusion
      • 2310.08560_MemGPT: Towards LLMs as Operating Systems
        • 总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 MemGPT (MemoryGPT)
        • 总结
        • 3 Experiments
        • 4 Related Work
        • 5 Conclusion
        • 6 Appendix
      • 2311.08719_Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory
        • 总结
        • From Deepseek
        • Abstract
        • 1 INTRODUCTION
        • 2 RELATED WORK
        • 3 METHODOLOGY
        • 4. Experiment
        • 5. Conclusion
      • 2312.17653_❇️LARP: Language-Agent Role Play for Open-World Games
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Cognitive Architecture
        • 4 Environment Interaction
        • 5 Personalities
        • 6 Discussions
        • 7 Conclusion
      • 2402.04624_MemoryLLM: Towards Self-Updatable Large Language Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Preliminaries
        • 3 MemoryLLM
        • 4 Experiments
        • 5 Related Work
        • 6 Conclusion and Future Work
        • Impact Statement
        • Appendix A Details in Methodology
        • Appendix B Implementation Details
        • Appendix C Additional Experiments
      • 2402.09727_ReadAgent: A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
        • 总结
        • 别人的总结
        • From Deepseek
      • 2404.11672_MemLLM: Finetuning LLMs to Use Explicit Read-Write Memory
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related work
        • 3 Methodology
        • 4 Experiments
        • 5 Conclusion
        • Limitations
        • Appendix A Memory-write Decoding Method
        • Appendix B Filtering Ambiguous Queries
        • Appendix C Memory-read Data Generation
        • Appendix D Hyperparameters Details
        • Appendix E Filtering Prompt
      • 2404.13501_LLM_Agent_Memory_Survey: A Survey on the Memory Mechanism of Large Language Model based Agents
        • 总结
        • 别人的总结
        • Abstract
        • 1 Introduction
        • 2 Related Surveys
        • 3 What is the Memory of LLM-based Agent
        • 4 Why We Need the Memory in LLM-based Agent
        • 5 How to Implement the Memory of LLM-based Agent
        • 5.1 Memory Sources(记忆来源)
        • 5.2 Memory Forms(记忆形式)
        • 5.3 Memory Operations(记忆操作)
        • 6 How to Evaluate the Memory in LLM-based Agent
        • 7 Memory-enhanced Agent Applications
        • 8 Limitations & Future Directions
        • 9 Conclusion
        • 9 结论
        • Acknowledgement
        • 致谢
      • 2407.01178_❇️Memory3: Language Modeling with Explicit Memory
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 | Memory Circuitry Theory
        • 3 | Design
        • 4 | Pretraining Data
        • 5 | Pretrain
        • 6 | Fine-tuning and Alignment
        • 6 | 微调与对齐
        • 7 | Evaluation
        • 8 | Conclusion
        • 8 | 结论
        • Acknowledgement
        • 致谢
        • Appendix A Cost Estimation
        • A.1 | Implicit Memory
        • A.2 | Explicit Memory
        • A.3 | External Information
        • 总结与对比
        • 附注:知识保留问题(Remark 9)
        • Appendix B Vector Compression
        • 附录 B 向量压缩
        • Appendix C Supplementary Evaluation Results
        • 附录 C 补充评估结果总结
      • 2410.15665_LongTermMemory: The Foundation of AI Self-Evolution
        • 总结
        • 别人的总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 AI Self-Evolution
        • 总结
        • 3 LTM for AI Self-Evolution
        • 4 How to Construct LTM?
        • 5 How can LTM be used to achieve model self-Evolution?
        • 6 The Practice of model self-evolution based on LTM
        • 7 Our Future Plans
        • 8 Conclusion
        • Appendix A RTG prompt
      • 2502.00592_M+: Extending MemoryLLM with Scalable Long-Term Memory
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Methodology
        • 4 Experiments
        • 5 Conclusion and Future Work
        • Impact Statement
        • Appendix A Justifications of using deepspeed-stage-2
        • Appendix B Experiments on datasets NaturalQA
        • Appendix C Statistics of the Dataset of Long Documents
        • Appendix D Additional Training Details
        • Appendix E Discussions
      • 2502.12110_A-Mem: Agentic Memory for LLM Agents
        • 总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Methodology
        • 4 Experiment
        • 5 Conclusions
        • 6 Limitations
        • Appendix A Experiment
        • Appendix B Prompt Templates and Examples
      • 2504.15965_❇️From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Overview
        • 3 Personal Memory
        • 4 System Memory
        • 5 Open Problems and Future Directions
        • 6 Conclusion
      • 2504.19413_❇️Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
        • 总结
        • 别人的总结
        • Abstract
        • 1 Introduction
        • 2 Proposed Methods
        • 总结
        • 3 Experimental Setup
        • 总结
        • 4 Evaluation Results, Analysis and Discussion.
        • 5 Conclusion and Future Work
        • 6 Acknowledgments
        • Appendix A Prompts
        • Appendix B Algorithm
        • Appendix C Selected Baselines
      • 2505.22101_MemOS: An Operating System for Memory-Augmented Generation (MAG) in LLM (Short Version)
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Memory in Large Language Models
        • 3 MemOS Design Philosophy
        • 4 MemOS
        • 4.1 MemOS 中的记忆类型
        • 4.2 记忆立方体(MemCube):核心资源
        • 4.3 MemOS 架构
        • 4.4 系统执行流程
        • 总结
        • 5 Conclusion
      • 2506.06326❇️_MemoryOS: Memory OS of AI Agent
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 MemoryOS
        • 4 Experiments
        • 5 Conclusion
      • 2505.22101_❇️MemOS: A Memory OS for AI System
        • 总结
        • LLM 总结:
        • Abstract
        • 1 Introduction
        • 2 Memory in Large Language Models
        • 3 MemOS Design Philosophy
        • 4 Memory Modeling in MemOS
        • 5 Architecture of MemOS
        • 6 Evaluation
        • 7 MemOS for Architecture Innovation and Applications
        • 8 Conclusion
      • 2508.09874_Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Background
        • 3 Memory Decoder
        • 4 Experimental Setup
        • 5 Results
        • 6 Analysis
        • 7 Related Work
        • 8 Conclusion
        • 9 Limitations
        • Appendix A Interpolation hyperparameter α of all tasks
        • Appendix B Analysis of DAPT Performance on Downstream Tasks
        • Appendix C Knowledge-Intensive Reasoning Task Corpus Composition
        • Appendix D Domain-Specific Downstream Tasks
        • Appendix E Comparison with DAPT Model Interpolation
        • Appendix F In-Context Learning Performance Analysis
        • Appendix G Characteristics of k-NN Distributions
        • Appendix H Alternative Loss Functions for Imitating k-NN Distributions
      • 2509.06269_REMI: A Novel Causal Schema Memory Architecture for Personalized Lifestyle Recommendation Agents
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Research Objectives
        • 3. Related Work
        • 4. Proposed Method
        • 5. Evaluation Framework
        • 6. Results and Findings
        • 7. Discussion
        • 8. Conclusion
      • 2509.24704_MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
        • 总结
        • From Moonlight
        • From Deepseek&OpenAI
      • 2510.18866_❇️LightMem: Lightweight and Efficient Memory-Augmented Generation
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Preliminary
        • 3 LightMem Architecture
        • 4 Experiments
        • 5 Related Work
        • 6 Conclusion and Future Work
        • Appendix A Usage of LLMs
        • Appendix B Methodology Details
        • Appendix C Experiment Details
        • 附录 C 实验细节总结
        • Appendix D Prompts
        • 附录 D 提示(Prompt)设计
      • 2512.18746_MemEvolve: Meta-Evolution of Agent Memory Systems
        • From Moonlight
      • 2601.02163_EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning
        • 总结
        • 图解
        • From Moonlight
    • 通用-Github
      • 2509.00xxxx_MemU: 一个前瞻性很强但尚不成熟的记忆框架
        • 主要内容
      • 2511.00xxx_MemMachine
        • 主要内容
    • 记忆相关Agent
      • 2504.10147_PersonalRAG❇️: A Survey of Personalization: From RAG to Agent
        • 总结
        • Abstract
        • 1. Introduction
        • 2. What is Personalization
        • 3. How to Adopt Personalization
        • 4. Where to Adopt Personalization
        • 5. Evaluation and Dataset
        • 6. Challenges and Future Directions
        • 7. Conclusion
      • 2506.07398❇️_G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems
        • 总结
        • 1. Introduction
        • 2 Related Works
        • 3 Preliminary
        • 4 G-Memory
        • 5 Experiment
        • 6 Conclusion & Limitation
        • A Experimental Details
        • B Additional Experiment Results
        • C Prompt Set
        • D Discussion with Related Works
      • 2507.02259_MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 The Proposed MemAgent
        • 总结
        • 4 Experiments
        • 5 Conclusion
        • 6 Computation Complexity
        • 7 Complete Out-Of-Domain Task Results
      • 2507.07957_MIRIX: Multi-Agent Memory System for LLM-Based Agents
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Application & Use Cases
        • 3 Methodology
        • 4 Experiments
        • 5 Related Work
        • 6 Conclusion and Future Work
        • Appendix A Full Experimental Results with Different Runs
      • 2509.25140❇️_ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Methodology
        • 4 Experiments
        • 5 Analysis
        • 6 Conclusion
        • 7 Acknowledgments
        • Appendix A Experiment Details
        • Appendix B Details for Experiment Settings
        • Appendix C Additional Analyses
        • Appendix D Future Directions
        • Appendix E Limitations
    • 记忆相关数据集
      • 2305.10250_MemoryBank: Enhancing Large Language Models with Long-Term Memory
        • 总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 MemoryBank: A Novel Memory Mechanism Tailored for LLMs
        • 总结
        • 3 SiliconFriend: An AI Chatbot Companion Powered by MemoryBank
        • 使用的三种大语言模型
        • SiliconFriend 的开发阶段
        • 总结
        • 重点总结
        • 4 Experiments
        • 5 Related Works
        • 5 相关工作(Related Works)
        • 6 Conclusion
        • 6 结论
      • 2308.08239_MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Methodology
        • 4 Experiments
        • 5 Conclusion
        • Appendix A Basic Published Datasets
        • Appendix B Involved Prompts
        • Appendix C Instruction Design Challenges
        • 1. 引言
        • 2. Prompt Copy(提示复制)
        • 3. Catastrophic Forgetting(灾难性遗忘)
        • 4. Prompt Misplacement(提示错位)
        • 5. 示例任务说明
        • 总结
      • 2402.17753_LoCoMo❇️: Evaluating Very Long-Term Conversational Memory of LLM Agents
        • 总结
        • 别人的总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Generative Pipeline for LoCoMo
        • 4 LoCoMo Evaluation Benchmark
        • 5 Experimental Setup
        • 6 Experimental Results
        • 7 Conclusion
        • 8 Limitations
        • 9 Broader Impacts
        • Appendix Overview
        • Appendix A Generative Pipeline for LoCoMo
        • Appendix B Dataset
        • Appendix C Experimental Setup
        • Appendix D Results
      • 2410.10813_LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 LongMemEval
        • 4 A Unified View of Long-Term Memory Assistants
        • 5 Experiment Results
        • 6 Conclusion
        • Reproducibility Statement
        • Ethics Statement
        • Appendix A Supplemental Details for LongMemEval
        • Appendix B A Human Study on Commercial Memory Chatbots
        • Appendix C Unified Memory View
        • Appendix D Memory Optimizations: Implementation Details
        • Appendix E Extended Analyses
      • 2506.21605_MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Works
        • 3 Dataset Construction
        • 4 Benchmark
        • 5 Conclusion
        • Limitations
        • Ethics Statement
        • Acknowledgments
        • Appendix A Case Studies
        • Appendix B Detail Data Statics
        • Appendix C Data Creation Prompt
        • Appendix D Result Details
      • 2507.05257_MemoryAgentBench: Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
        • 总结
        • From Deepseek
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 MemoryAgentBench
        • 4 Experiments
        • 5 Conclusion and Future Work
        • Appendix A Details of Dataset
        • Appendix B Prompts
        • Appendix C Detailed Experimental Results
        • Appendix D Experimental Settings
      • 2510.27246_BEAM: Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs
        • 总结
        • Abstract
        • 1 Introduction
        • 2 BEAM: Benchmarking memory Capabilities of LLMs
        • 3 LIGHT: Improving Memory Capabilities of LLMs
        • 4 Experiments
        • 5 Related Work
        • 6 Conclusion
        • Acknowledgments
        • Appendix A Detailed Related Work
        • Appendix B Benchmark Design
        • Appendix C Detailed Experiments
        • Appendix D Nugget Design
        • Appendix E Examples from Different Components of BEAM
        • Appendix F Case Study
        • Appendix G Prompts
    • 多模态记忆
      • 2506.05813_MAPLE: Multi-Agent Adaptive Planning with Long-Term Memory for Table Reasoning
        • 总结
        • Abstract
        • 1 Introduction
        • 2 MAPLE Framework
        • 3 Experiments
        • 4 Conclusion
        • Limitations
        • Appendix A Related Work
        • Appendix B Cognitive Architecture
        • Appendix C Memory Evolution Algorithm
        • Appendix D Case Study
        • Appendix E Additional Experimental Results
        • Appendix F Example Prompts
        • 附录 F 示例提示(Example Prompts)
      • 2508.09736_M3-Agent❇️: Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Datasets
        • 4 Approach
        • 5 Experiments
        • 6 Conclusion and Future Work
        • 7 Acknowledgment
        • 8 M3-Bench-robot
        • 9 M3-Bench-web
        • 10 Implementation Details of Tools
        • 11 Demonstration Data Synthesis for Memorization
        • 12 Evaluation of Memorization
        • 13 RL Training Details
        • 14 Case Study
        • 15 Prompt Templates
      • 2509.11914_EgoMem: Lifelong Memory Agent for Full-duplex Omnimodal Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Task Definition and Preliminaries
        • 3 EgoMem
        • 4 Training Details
        • 5 Experiments
        • 6 Conclusion and Future Challenges
        • Acknowledgments
      • 2510.12422_VideoLucy: Deep Memory Backtracking for Long Video Understanding
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Method
        • 3 EgoMem Benchmark
        • 4 Experiments
        • 5 Related Work
        • 6 Conclusion
        • 7 Acknowledgments.
        • Appendix A Appendix
    • 参数记忆
      • 1907.05242_PKM: Large Memory Layers with Product Keys
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Related work
        • 3 Learnable product key memories
        • 4 Experiments
        • 5 Conclusion
      • 2305.02437_Selfmem: Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory
        • 总结
        • From Moonlight
      • 2407.04153_PEER: Mixture of A Million Experts
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Method
        • 3 Experiments
        • 4 Related Works
        • 5 Conclusion
        • Acknowledgments
      • 2412.09764_Memory+: Memory Layers at Scale
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 1 引言(Introduction)
        • 2 Related work
        • 2 相关工作(Related Work)
        • 3 Memory Augmented Architectures
        • 4 Experimental setup
        • 4 实验设置(Experimental setup)
        • 5 Scaling results
        • 5 扩展结果总结
        • 6 Implications and shortcomings of the work
        • 6 工作的意义与不足
      • 2508.18756_UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Approach
        • 4 Experiments
        • 5 Conclusion
        • 6 Optimized Initialization
        • 7 Evaluation Benchmark
        • 8 Open-source model hyperparameters
    • 图结构记忆
      • 1905.05460_Cognitive Graph for Multi-Hop Reading Comprehension at Scale
        • Abstract
        • 1 Introduction
        • 2 Cognitive Graph QA Framework
        • 3 Implementation
        • 4 Experiment
        • 5 Related work
        • 6 Discussion and Conclusion
      • 2405.14831_HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
        • Abstract
        • 1 Introduction
        • 2 HippoRAG
        • 3 Experimental Setup
        • 4 Results
        • 5 Discussions
        • 6 Related Work
        • 7 Conclusions & Limitations
        • Appendices
        • Appendix A HippoRAG Pipeline Example
        • Appendix B Dataset Comparison
        • Appendix C Ablation Statistics
        • 附录 C 消融实验统计(Ablation Statistics)
        • Appendix D Intrinsic OpenIE Evaluation
        • 附录 D 内在的 OpenIE 评估
        • Appendix E Case Study on Path-Finding Multi-Hop QA
        • 附录E:路径查找多跳问答案例研究总结
        • Appendix F Error Analysis
        • Appendix G Cost and Efficiency Comparison
        • 附录 G 成本与效率对比
        • Appendix H Implementation Details & Compute Requirements
        • 附录 H 实现细节与计算需求
        • Appendix I LLM Prompts
        • 附录I:大语言模型提示
    • 应用-推荐
      • Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions
        • 论文基本信息
        • 核心内容简介
        • 重要性与影响
        • 总结
      • 08xx.xxxxx_SVD++: Factorization meets the neighborhood: a multifaceted collaborative filtering model
        • SVD++
        • Neighborhood Models
        • Latent Factor Models(潜在因子模型)
      • Recommender systems: An overview of different approaches to recommendations
        • 论文简介
        • 核心内容总结
        • 总结
      • 1902.07153_SGCN: Simplifying Graph Convolutional Networks
        • 总结
        • 前提知识
        • Abstract
        • 1 Introduction
        • 2 Simple Graph Convolution
        • 3 Spectral Analysis
        • 4 Related Works
        • 5 Experiments and Discussion
        • 6 Conclusion
        • Acknowledgement
        • Appendix A The spectrum of Δ̃_sym
        • Appendix B Experiment Details
        • Appendix C Additional Experiments
      • 1905.08108_NGCF: Neural Graph Collaborative Filtering
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Methodology
        • 3. Related Work
        • 4. Experiments
        • 5. Conclusion and Future Work
      • 2001.10167_RGCF: Revisiting Graph based Collaborative Filtering: A Linear Residual Graph Convolutional Network Approach
        • 总结
        • Abstract
        • Introduction
        • Preliminaries and Related Work
        • Linear Residual Graph Convolutional Collaborative Filtering
        • Experiments
        • Conclusions
      • 2002.02126_LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Preliminaries
        • 3. Method
        • 4. Experiments
        • 5. Related Work
        • 6. Conclusion and Future Work
      • 2010.10783_SGL: Self-supervised Graph Learning for Recommendation
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Preliminaries
        • 3. Methodology
        • 4. Experiments
        • 5. Related Work
        • 6. Conclusion and Future Work
        • Appendix A Gradient of InfoNCE Loss w.r.t. node representation
      • 2112.08679_SimGCL: Are Graph Augmentations Necessary? Simple Graph Contrastive Learning for Recommendation
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Investigation of Graph Contrastive Learning in Recommendation
        • 3. SimGCL: Simple Graph Contrastive Learning for Recommendation
        • 4. Experimental Results
        • 5. Related Work
        • 6. Conclusion
        • Acknowledgement
      • 2202.06200_NCL: Improving Graph Collaborative Filtering with Neighborhood-enriched Contrastive Learning
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Preliminary
        • 3. Methodology
        • 4. Experiments
        • 5. Related work
        • 6. Conclusion And Future Work
        • Appendix A Pseudo-code for NCL
        • Appendix B Case Study on Selected Neighbors
      • 2203.13366_RLP_P5: A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5)
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Personalized Prompt Collection
        • 4. The P5 Paradigm and Model
        • 5. Experiments
        • 6. Conclusions and Future Work
        • Acknowledgment
        • D FULL LIST OF PERSONALIZED PROMPTS FOR AMAZON DATASETS
      • 2302.08191_LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Methodology
        • 4 Evaluation
        • 5 Conclusion
        • Appendix A Details of the Baselines
        • Appendix B Performance Comparison with Baselines (Continued)
        • Appendix C Theoretical Analysis
        • Appendix D Calculation of Complexity
        • Appendix E Performance Results under the New Setting
      • 2303.14524_ChatRec: Towards Interactive and Explainable LLMs-Augmented Recommender System
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Method
        • 4 Experiment
        • 5 Conclusion
        • Appendix 0.A Implementation Details
      • 2305.00447_TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation
        • 总结
        • Abstract
        • 1. Introduction
        • 2. TALLRec
        • 3. Experiments
        • 4. Related Work
        • 5. Conclusion
      • 2305.07001_InstructRec: Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Methodology
        • 3. Experiments
        • 4. Conclusion and Future Work
        • Appendix A Instruction Templates for Traditional Recommendation
        • Appendix B Instruction Templates for Traditional Product search
        • Appendix C Instruction Templates for Personalized Search
      • 2306.10933_KAR: Towards Open-World Recommendation with Knowledge Augmentation from Large Language Models
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Preliminaries
        • 4. Methodology
        • 5. Experiment
        • 6. Broader Impact
        • 7. Conclusion
      • 2308.11131_ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Preliminaries
        • 3. Methodology
        • 4. Experiment
        • 5. Related Work
        • 6. Conclusion
        • Appendix A Prompt Illustration
        • Appendix B Data Preprocessing
        • Appendix C Baseline Implementation
        • 总结
        • Appendix D Additional Experiments
      • 2310.15950_RLMRec: Representation Learning with Large Language Models for Recommendation
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methodology
        • 4. Evaluation
        • 5. Conclusion
        • Appendix A Supplementary Material
      • 2311.01343_CLLM4Rec: Collaborative Large Language Model for Recommender Systems
        • 总结
        • Abstract
        • 1. Introduction
        • 本节贡献(Contribution)
        • 2. Related Work
        • 2. 相关工作
        • 3. Methodology
        • 4. Empirical Study
        • 5. Conclusion
        • Acknowledgment
        • Appendix A Technical Details
        • Appendix B Experiments
      • 2502.18965_OneRec: Unifying Retrieve and Rank with Generative Recommender and Preference Alignment
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methods
        • 4. System Deployment
        • 5. Experiment
        • 总结
        • 6. Conclusion
      • 2508.20900_OneRec-V2 Technical Report
        • 总结 From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Lazy Decoder-Only Architecture
        • 3 Preference Alignment with Real-World User Interactions
        • 4 Online A/B Test
        • 5 Conclusion, Limitations, and Future Directions
        • Appendix
        • Appendix A Contributions
        • Appendix B Computational Complexity of Different Architecture
        • Appendix C Empirical Results
        • Appendix D Online Performance with Caching Disabled
      • 2510.11639_OneRec-Think: In-Text Reasoning for Generative Recommendation
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Preliminary
        • 4 Methodology
        • 5 Experiments
        • 6 Conclusion
        • Limitations
        • Ethics Statement
        • Appendix A Appendix
      • 2511.11255_Align3GR: Unified Multi-Level Alignment for LLM-based Generative Recommendation
        • 总结
        • Abstract
        • Introduction
        • Related Works
        • Methodology
        • Experiments
        • Conclusion
  • LLM 模型
    • NLP 模型
      • 1810.04805_BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
        • 1 Introduction
        • 2 Related Work
        • 3 BERT
        • Appendix A Additional Details for BERT
      • 18xx_GPT1: Improving Language Understanding by Generative Pre-Training
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Framework
        • 4 Experiments
        • 5 Analysis
        • 6 Conclusion
        • 引文口碑
        • 要点解读
      • 19xx_GPT2: Language Models are Unsupervised Multitask Learners
        • The Illustrated GPT-2
        • 参考
      • 2006.03654_DeBERTa: Decoding-enhanced BERT with Disentangled Attention
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Background
        • 3 The DeBERTa Architecture
        • 4 Scale Invariant Fine-Tuning
        • 4 尺度不变微调 (Scale Invariant Fine-Tuning)
        • 5 Experiment
        • 6 Conclusions
        • 7 Acknowledgments
        • Appendix A Appendix
      • 2012.00413_CPM: A Large-scale Generative Chinese Pre-trained Language Model
      • 2302.13971_LLaMA: Open and Efficient Foundation Language Models
      • 2307.09288_Llama 2: Open Foundation and Fine-Tuned Chat Models
      • 2309.16609_Qwen Technical Report
        • 1. Introduction
        • 2. Pretraining
        • 3. Alignment
        • 4. CODE-QWEN: SPECIALIZED MODEL FOR CODING
        • 5. MATH-QWEN: SPECIALIZED MODEL FOR MATHEMATICS REASONING
        • 6. Related Work
        • 7. Conclusion
        • A.1 MORE TRAINING DETAILS
        • A.2 EVALUATION
      • 2310.19341_Skywork: A More Open Bilingual Foundation Model
        • 总结
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Methodology
        • 3 Pre-training
        • 4 Evaluation
        • 5 Discussion
        • 6 Limitation
        • 7 Conclusion
        • Appendix A Details on GPT-7B vs. LLaMA-7B Experiment
        • Appendix B Preliminary Experiments on Distributed Training
        • Appendix C More Benchmark Results
        • Appendix D Details on LM Test Sets
      • 2401.14196_DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence
      • 2404.06395_MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
        • 5. Two Stage Pre-training Strategy
        • 6. Model
        • 7 MiniCPM Family
      • 2405.04434_DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
      • 2406.12793_ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
      • 2407.10671_Qwen2 Technical Report
        • Abstract
        • 1. Introduction
        • 2. Tokenizer & Model
        • 3. Pre-training
        • 4. Post-training
        • 5. Evaluation
        • 6. Conclusion
      • 2412.15115_Qwen2.5
        • Abstract
        • 1. Introduction
        • 2. Architecture and Tokenizer
        • 3. Pre-training
        • 4. Post-training
        • 5. Evaluation
        • 6. Conclusion
      • 2505.09388_Qwen3
        • Abstract
        • 1. Introduction
        • 2. Architecture
        • 3. Pre-training
        • 4. Post-training
        • 5. Conclusion
      • 2508.06471_GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Pre-Training
        • 3 Post-Training: Expert Model Iteration
        • 4 Evaluation
        • 5 Conclusion
        • 6 Contribution
    • 多模态模型
      • 2112.15093_CTR: Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study
        • Abstract
        • 1. Introduction
        • 2. Preliminaries
        • 3. Datasets
        • 4. Baselines
        • 5. An Empirical Study
        • 6. Conclusions
        • Appendix A Details of PRAB
        • Appendix C Visualization of Failure Cases.
      • 2304.08485_LLaVA: Visual Instruction Tuning
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. GPT-assisted Visual Instruction Data Generation
        • 4. Visual Instruction Tuning
        • 5. Experiments
        • 6. Conclusion
      • 2308.12966_Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
        • Methodology
        • Training
        • Evaluation
        • B. Data Format Details of Training
      • 2310.03744_LLaVA2: Improved Baselines with Visual Instruction Tuning
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Approach
        • 4. Empirical Evaluation
        • 5. Open Problems in LMMs
        • 6. Conclusion
        • A. Implementation Details
        • B. Qualitative Results
      • 2312.07533_VILA: On Pre-training for Visual Language Models
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. On Pre-training for Visual Language Models
        • 4. Experiments
        • 5. Related Work
        • 6. Conclusion
      • 2403.05525_DeepSeek-VL: Towards Real-World Vision-Language Understanding
        • Abstract
      • 2408.01800_MiniCPM-V: A GPT-4V Level MLLM on Your Phone
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Model Architecture
        • 4. Training
        • 5. End-side Deployment
        • 6. Experiments
        • 7. Conclusion
      • 2409.17146_Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
        • Abstract
        • 1. Introduction
        • 2. Architecture
        • 3. Data
        • 4. Training
        • 5. Evaluation
        • 6. Ablations
        • Appendix A: Model Details
        • Appendix B: Training Details
        • Appendix C: Evaluation Results
        • Appendix D: Result Details
        • Appendix E: Ablation Details
        • Appendix F: Data Details
        • Appendix G: Dataset Examples
        • Appendix H: Related Work
      • 2410.13848_Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
        • 总结
        • LLM总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Janus: A Simple, Unified and Flexible Multimodal Framework
        • 4 Experiments
        • 5 Conclusion
        • Appendix
        • Appendix A Details of Semantic Tokenizer Mentioned in Ablation Study
        • Appendix B Additional Qualitative Results
      • 2411.00774_Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
        • Abstract
        • 1. Introduction
        • 2. Model
        • 3. Experiments
        • 4. Conclusion and Future Work
      • 2412.04468_NVILA: Efficient Frontier Visual Language Models
        • Abstract
        • 1. Introduction
        • 2. Approach
        • 3. Experiments
        • 4. More Capabilities
        • 5. Related Work
        • 6. Conclusion
      • 2502.13923_Qwen2.5-VL
        • Abstract
        • 1. Introduction
        • 2. Approach
        • 3. Experiments
        • 4. Conclusion
      • 2505.14683_BAGEL: Emerging Properties in Unified Multimodal Pretraining
        • 总结
        • From Deepseek
        • LLM 总结
        • Abstract
        • 1 Introduction
        • 2 Model
        • 3 Data
        • 4 Training
        • 5 Evaluation
        • 6 Emerging Properties
        • 7 Main Results
        • 8 Conclusion
        • 9 Acknowledgement
      • 2506.13642_Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Stream-Omni
        • 3.2.1 Data Construction
        • 4 Experiments
        • 5 Results and Analyses
        • 6 Conclusion
        • Limitations
        • Appendix A Construction of InstructOmni
        • Appendix B Construction of SpokenVisIT
        • Appendix C Case Study
      • 2507.05595_PaddleOCR 3.0 Technical Report
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Core Capabilities
        • 3 Codebase Architecture Design
        • 4 Deployment
        • 5 Conclusion
        • Appendix A Acknowledgments
        • Appendix B Usage of command and API details
        • Appendix C More details on MCP host configuration
      • 2510.14528_PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
        • 总结
        • Abstract
        • 1 Introduction
        • 2 PaddleOCR-VL
        • 3 Dataset
        • 4 Evaluation
        • 5 Conclusion
        • Appendix A Training Dataset Details
        • Appendix B Supported Languages
        • Appendix C Inference Performance on Different Hardware Configurations
        • Appendix D Real-world Samples
        • Appendix E Compare with Others
    • Embedding 模型
      • 2506.05176_Qwen3_Embedding: Advancing Text Embedding and Reranking Through Foundation Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Model Architecture
        • 3 Models Training
        • 4 Evaluation
        • 4.1 Settings 评估设置
        • 4.2 Main Results 主要结果
        • 4.3 Analysis 分析
        • 总结
        • 5 Conclusion
        • Appendix A Appendix
    • LLM 音频
      • 2005.08100_Conformer: Convolution-augmented Transformer for Speech Recognition
        • LLM总结
        • Abstract
        • 1 Introduction
        • 2 Conformer Encoder
        • 3 Experiments
        • 4 Conclusion
      • 2106.07447_HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
        • 总结
        • LLM 总结
        • Abstract
        • I Introduction
        • II Method
        • III Related Work
        • IV Experimental Details
        • V Results
        • VI Conclusion
      • 2112.02418_YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
        • 关键概念
        • Abstract
        • 1. Introduction
        • 2. YourTTS Model
        • 3. Experiments
        • 4. Results and Discussion
        • 5. Zero-Shot Voice Conversion
        • 6. Speaker Adaptation
        • 7. Conclusions, limitations and future work
      • 2212.04356_whisper: Robust Speech Recognition via Large-Scale Weak Supervision
        • Abstract
        • 1. Introduction
        • 2. Approach
        • 3. Experiments
        • 4. Analysis and Ablations
        • 5. Related Work
        • 6. Limitations and Future Work
        • 7. Conclusions
        • A. Evaluation Datasets
        • B. Compared Models
        • C. Text Standardization
      • 2301.02111_Vall-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Background: Speech Quantization
        • 4. VALL-E
        • 5. Experiments
        • 6. Conclusion, Limitations, and Future Work
      • 2303.03926_VALL-E_X: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3 Cross-Lingual Codec Language Model
        • 4. VALL-E X Application
        • 5. Experiments
        • 6. Conclusion
        • A. Appendix
      • 2406.05370_VALL-E2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. VALL-E 2
        • 4. Experiments
        • 5. Conclusion
      • 2407.05407_CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
        • Abstract
        • 1. Introduction
        • 2. CosyVoice: A Scalable TTS model using Supervised Semantic Tokens
        • 3. Dataset
        • 4. Experimental Settings
        • 6. Conclusion
      • 2407.10759_Qwen2-Audio Technical Report
        • Abstract
        • 1. Introduction
        • 2. Methodology
        • 3. Experiments
        • 5. Conclusion
      • 2410.00037_Moshi: a speech-text foundation model for real-time dialogue
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Model
        • 4. Datasets and Training
        • 5. Evaluation
        • 6. Safety
        • 7. Conclusion
      • 2412.10117_CosyVoice2: Scalable Streaming Speech Synthesis with Large Language Models
        • Abstract
        • 1. Introduction
        • 2. CosyVoice 2
        • 3. Experimental Settings
        • 4. Experimental Results
        • 5. Conclusion
      • 2501.06282_MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. MinMo
        • 4. Experiments
        • 5. Conclusion
        • 6. Limitations
        • A. Prompts for Voice Understanding Tasks
      • 2505.02707_Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Voila: Voice-Language Foundation Models
        • 4. Experiments
        • 5. Conclusion
      • 2505.17589_CosyVoice3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
        • From LLM
        • Abstract
        • 1. Introduction
        • 2. CosyVoice 3
        • 3. The Multilingual Data Pipeline
        • 4. Experimental Settings
        • 5. Experimental Results
        • 6. Conclusion
        • 7. Limitations
      • 2512.20156_Fun-Audio-Chat Technical Report
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Methodology
        • 3 Experiments
        • 4 Conclusion
        • 5 Limitations
        • 6 Contributions and Acknowledgments
    • LLM 视频
      • 2301.12597_BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Method
        • 4 Experiment
        • 5 Limitation
        • 6 Conclusion
      • 2308.01390_OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
        • Abstract
        • 1 Introduction
        • 2 Related work
        • 3 Approach
        • 4 Results
        • 5 Discussion
        • 6 Conclusion
        • Appendix A Extended results
        • Appendix B Additional notes on filtering MMC4
        • Appendix C Synthetic data prompt
        • Appendix D Image credits
      • 2503.20215_Qwen2.5-Omni Technical Report
        • Abstract
        • 1. Introduction
        • 2. Architecture
        • 3. Pre-training
        • 4. Post-training
        • 5. Evaluation
        • 6. Conclusion
    • LLM MoE
      • 2408.15664_AUXILIARY-LOSS-FREE LOAD BALANCING STRATEGY FOR MIXTURE-OF-EXPERTS
      • 2410.07490_MoDEM: Mixture of Domain Expert Models
      • 2601.07372_Engram: Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Architecture
        • 3 Scaling Laws and Sparsity Allocation
        • 4 Large Scale Pre-training
        • 5 Long Context Training
        • 6 Analysis
        • 7 Related Work
        • 8 Conclusion
        • Appendix A Detailed Model Architecture and Hyper Parameters
        • Appendix B Full Benchmark Curves
        • Appendix C Case Study of Tokenizer Compression
    • 商业模型
      • 2303.08774_GPT-4 Technical Report
      • 2312.11805_Gemini: A Family of Highly Capable Multimodal Models
        • Abstract
        • 1. Introduction
        • 2. Model Architecture
        • 3. Training Infrastructure
        • 5. Evaluation
        • 6. Post-Training Models
        • 7. Responsible Deployment
        • 8. Discussion and Conclusion
      • 2403.05530_Gemini1.5: Unlocking multimodal understanding across millions of tokens of context
      • 2406.02430_Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
        • Abstract
        • 1 Introduction
        • 2 Method
        • 3 Experiments
        • 4 Model extensions
        • 5 Model applications, limitations, and safety
        • 6 Authors (alphabetical order)
        • 7 Acknowledgement
      • 2407.04675_Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
        • Abstract
        • 1 Introduction
        • 2 Motivation
        • 3 Methods
        • 4 Model and Evaluation
        • 5 Conclusion
        • Appendix A Appendix
      • 2503.20020_Gemini2: Gemini Robotics: Bringing AI into the Physical World
      • 2504.xxxxx_Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning
      • 2505.07062_Seed1.5-VL Technical Report
        • Abstract
        • 1 Introduction
        • 2 Architecture
        • 3 Pre-training
        • 3.2 Training Recipe
        • 4 Post-training
        • 4.4 Hybrid Reinforcement Learning
        • 5 Training Infrastructure
        • 6 Evaluation
        • 6.1.3 Video Task Evaluation
        • 6.3.2 Comparison with State-of-the-arts
        • 7 Conclusion and Next Steps
        • 8 Contributions and Acknowledgments
        • 9 Qualitative examples
        • 9.7 Visual Reasoning_ Visual Pattern Recognition
        • 9.19 Failure Cases_ Combinatorial Search I
        • 10 Evaluation Details
        • DREAM-1K
  • LLM 周边技术
    • Framework
      • 1712.05889_Ray: A Distributed Framework for Emerging AI Applications
        • Abstract
        • 1. Introduction
        • 2. Motivation and Requirements
        • 3. Programming and Computation Model
        • 4. Architecture
        • 5. Evaluation
        • 6 Related Work
        • 7 Discussion and Experiences
        • 8. Conclusion
      • 1910.02054_DeepSpeed_ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
        • Abstract
        • 1. Extended Introduction
        • 2. Related Work
        • 3. Where Did All the Memory Go?
        • 4. ZeRO: Insights and Overview
        • 5. Deep Dive into ZeRO-DP
        • 6. Deep Dive into ZeRO-R
        • 7. Communication Analysis of ZeRO-DP
        • 8. Communication Analysis of ZeRO-R
        • 9. Step Towards 1 Trillion Parameters
        • 10. Implementation and Evaluation
        • 11. Concluding Remarks
      • PyTorch: An Imperative Style, High-Performance Deep Learning Library
      • Transformers: State-of-the-Art Natural Language Processing
      • 2210.XX_Ray v2 Architecture
        • Overview
        • Architecture Overview
        • Object Management
        • Task Management
        • Resource Management and Scheduling
        • Actor management
        • Global Control Service
        • Cluster Management
        • Appendix
      • 2309.06180_vLLM: Efficient Memory Management for Large Language Model Serving with PagedAttention
        • 总结
        • 1. Introduction
        • 2. Background
        • 3. Memory Challenges in LLM Serving
        • 4. Method
        • 5. Implementation
        • 6. Evaluation
        • 7. Ablation Studies
        • 10. Conclusion
      • 2312.07104_SGLang❇️: Efficient Execution of Structured Language Model Programs
        • 总结
        • OpenAI GPT-4总结
        • Qwen-Plus总结
        • Abstract
        • 1 Introduction
        • 2 Programming Model
        • 3 Efficient KV Cache Reuse with RadixAttention
        • 4 Efficient Constrained Decoding with Compressed Finite State Machine
        • 5 Efficient Endpoint Calling with API Speculative Execution
        • 6 Evaluation
        • 7 Related Work
        • 8 Future Directions and Conclusion
        • Acknowledgement
        • Appendix A Additional Details on RadixAttention
        • Appendix B Additional Details on Compressed Finite State Machine
        • Appendix C Additional Experimental Setups and Results
        • Appendix D Compiler Mode
    • 大模型调优
      • 2101.00190_Prefix-Tuning: Optimizing Continuous Prompts for Generation
      • 2103.10385_p-tuning: GPT Understands, Too
      • 2104.08691_Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning
      • 2106.09685_LoRA: Low-Rank Adaptation of Large Language Models
      • 2401.01335_Self-Play: Fine-Tuning Converts Weak Language Models to Strong Language Models
      • 2402.09353_DoRA: Weight-Decomposed Low-Rank Adaptation
      • 2402.12354_LoRA+: Efficient Low Rank Adaptation of Large Models
      • 2403.03507_GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
      • 2403.13372_LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
        • 竞争框架
        • 3. Efficient Fine-Tuning Techniques
        • 4 LlamaFactory Framework
        • 6 Conclusion and Future Work
      • 2510.08396_FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts
        • 总结
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Revisiting MoE-based LoRA Methods
        • 3 FlyLoRA
        • 4 Experiments
        • 5 Discussion
        • 6 Related Work
        • 7 Conclusion
        • 8 Acknowledgments
        • NeurIPS Paper Checklist
        • NeurIPS 论文检查清单总结
          • 1. 声明(Claims)
          • 2. 限制(Limitations)
          • 3. 理论假设与证明(Theory assumptions and proofs)
          • 4. 实验结果可复现性(Experimental result reproducibility)
          • 5. 数据与代码开放访问(Open access to data and code)
          • 6. 实验设置/细节(Experimental setting/details)
          • 7. 实验统计显著性(Experiment statistical significance)
          • 8. 实验计算资源(Experiments compute resources)
          • 9. 伦理准则(Code of ethics)
          • 10. 广泛影响(Broader impacts)
          • 11. 安全措施(Safeguards)
          • 12. 现有资产许可(Licenses for existing assets)
          • 13. 新资产(New assets)
          • 14. 众包与人类受试者研究(Crowdsourcing and research with human subjects)
          • 15. 人类受试者研究的IRB批准(IRB approvals)
          • 16. 大语言模型使用声明(Declaration of LLM usage)
        • Appendix A Theoretical Analysis
        • 附录 A 理论分析总结
          • A.1 稀疏随机投影的距离保持性质
          • A.2 Top-k 激活促进秩级解耦
          • A.3 随机投影诱导近似子空间正交性
        • 总结归纳
        • Appendix B Additional Results
        • 附录 B:附加实验结果总结
          • B.1 更大模型上的评估
          • B.2 更多基线方法的比较
          • B.3 训练时间与内存消耗
          • B.4 高级模型合并技术的多任务性能
          • B.5 负载均衡策略的消融实验
          • B.6 K选择策略的消融实验
          • B.7 矩阵 A 初始化方案的消融实验
          • B.8 合并与非合并场景的性能差距分析
        • 总体总结
        • Appendix C Detailed Experimental Setting
        • Appendix D Limitations and Future Work
        • Appendix E Broader Impact
    • 通用技术
      • 🏀常用
        • 余弦退火
      • 2505.06708_Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Gated-Attention Layer
        • 3 Experiments
        • 4 Analysis: Non-Linearity, Sparsity, and Attention-Sink-Free
        • 5 Related Works
        • 6 Conclusion
        • Limitations
        • Appendix A Supplement Experiments
      • 2510.29xxx_NL: Nested Learning: The Illusion of Deep Learning Architecture
        • 总结 From Zhihu
        • 总结 From Moonlight
    • 长上下文
      • 2510.07318_AHN: Artificial Hippocampus Networks for Efficient Long-Context Modeling
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 2 Related work
        • 3 Method
        • 4 Experiments
        • 5 Conclusion and discussion
        • Acknowledgement
        • 6 AHN instantiation
        • 7 Additional benchmark results
    • 大模型编辑
      • 2405.16720_LAW: Large Scale Knowledge Washing
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Preliminary
        • 4 Problem Setup
        • 5 Methodology
        • 6 Experiments
        • 总结
        • 7 Conclusion, Limitation, and Future Work
        • Ethics Statement
        • Reproducibility Statement
        • Appendix A Mathematical Details of Preliminary
        • Appendix B Implementation Details
        • Appendix C Additional Experiments
      • 2410.00487_SELF-PARAM: Self-Updatable Large Language Models by Integrating Context into Model Parameters
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Methodology
        • 4 Experiments
        • 5 Conclusion and Future Work
        • Ethics Statement
        • Reproducibility Statement
        • Appendix A Additional Settings
        • Appendix B Additional Experiments
      • 2410.02355_AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Preliminary
        • 3 Method
        • 4 Experiment
        • 5 Related Work
        • 6 Limitations & Future Discussion
        • 7 Conclusion
        • Ethics Statement
        • Reproducibility
        • Acknowledgement
        • Appendix A Experimental Setup
        • Appendix B Implementation Details of Current Model Editing & Related Proofs
        • Appendix C More Experimental Results
        • Appendix D Visualizing the Counterfact and ZSRE Datasets Through Examples
    • 分布式模型
      • 1701.06538_MoE: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
      • 1806.03377_PipeDream: Fast and Efficient Pipeline Parallel DNN Training
        • Abstract
        • 1. Introduction
        • 2. Background & Related Work
        • 3. Parallel Training in PipeDream
        • 4. Implementation
        • 5. Evaluation
      • 1811.06965_GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
        • 收集
        • 1. Introduction
        • 2. The GPipe Library
        • 3. Performance Analyses
        • 4. Image Classification
        • 5. Massively Multilingual Machine Translation
        • 6. Design Features and Trade-Offs
      • 1909.08053_Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
        • 收集
        • Abstract
        • 1. Introduction
        • 2. Background and Challenges
        • 3. Model Parallel Transformers
      • 19xx_PipeDream: Generalized Pipeline Parallelism for DNN Training
        • 收集
        • ABSTRACT
        • 1. Introduction
        • 2. BACKGROUND AND RELATED WORK
        • 3. Pipeline Parallelism
        • 4. Implementation
        • 6. Conclusion
      • 2006.09503_PipeDream-2BW: Memory-Efficient Pipeline-Parallel DNN Training
        • Abstract
      • 2006.15704_PyTorch Distributed: Experiences on Accelerating Data Parallel Training
      • 2006.16668_GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
      • 2104.04473_Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
        • Abstract
      • 2205.14135_FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. FLASHATTENTION: Algorithm, Analysis, and Extensions
        • 4. Experiments
        • 5. Limitations and Future Directions
        • Appendix A Related Work
        • Appendix B Algorithm Details
        • Appendix C Proofs
        • Appendix D Extension Details
      • 2307.08691_FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. FlashAttention-2: Algorithm, Parallelism, and Work Partitioning
        • 4. Empirical Validation
        • 5. Discussion and Future Directions
      • 通用
    • LLM 量化
      • 通用
        • 混合精度
        • 浮点数格式
        • weight-only quantization
      • 2110.02861_bitsandbytes: 8-bit Optimizers via Block-wise Quantization
        • Abstract
        • 1. Background
        • 2. 8-bit Optimizers
        • 3. 8-bit vs 32-bit Optimizer Performance for common Benchmarks
        • 4. Analysis
        • 5. Related Work
      • 2206.01861_ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Background and Challenges
        • 4. Methodology
        • 5. Results
        • 6. Conclusions
        • Appendix A Background
        • Appendix D Details about System Optimization
      • 2206.09557_LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. Design Methodology of LUT-GEMM
        • 4. Experimental results
        • 5. Accelerating Quantized OPT-175B
        • 6. Conclusion
        • Appendix A LLM Inference Latency Breakdown
        • Appendix B Detailed Implementation
      • 2208.07339_LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
        • 相关参考
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. Int8 Matrix Multiplication at Scale
        • 4. Emergent Large Magnitude Features in Transformers at Scale
        • 5. Related Work
        • 6. Discussion and Limitations
        • 7. Broader Impacts
        • 其他
      • 2209.05433_FP8: FP8 Formats For Deep Learning
        • Abstract
        • 1. Introduction
        • 2. Aspects of FP8 Usage in Deep Learning
        • 3. FP8 Binary Interchange Format
        • 示例讲解
        • 4. Empirical Results
        • 5. Conclusions
      • 2210.17323_GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Background
        • 4. The GPTQ Algorithm
        • 5. Experimental Validation
        • 6. Summary and Limitations
      • 2211.10438_SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
        • Abstract
        • 1. Introduction
        • 2. Preliminaries
        • 3. Review of Quantization Difficulty
        • 4. SmoothQuant
        • 5. Experiments
        • 6. Related Work
        • 7. Conclusion
        • Appendix A. Discussion on Weight-Only Quantization
      • 2305.14314_QLoRA: Efficient Finetuning of Quantized LLMs
        • 关键词
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. QLoRA Finetuning
        • 4. QLoRA vs. Standard Finetuning
        • 5. Pushing the Chatbot State-of-the-art with QLoRA
        • 6. Qualitative Analysis
        • 7. Related Work
        • 8. Limitations and Discussion
      • 2306.00978_AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. AWQ: Activation-aware Weight Quantization
        • 4. TinyChat: Mapping AWQ onto Edge Platforms
        • 5. Experiments
        • 6. Conclusion
      • 2309.05516_AutoRound: Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methodology
        • 4. Experiments
        • 5. Conclusion
    • 图神经网络模型
      • 1812.08434_GNNs: Graph Neural Networks: A Review of Methods and Applications
        • 论文解读
        • 结论
        • Abstract
        • 1. Introduction
        • 2. General design pipeline of GNNs
        • 3. Instantiations of computational modules
        • 4. Variants considering graph type and scale(不同图类型与规模的GNN变体)
        • 5. Variants for different training settings
        • 6. A design example of GNN
        • 7. Analyses of GNNs
        • 8. Applications
        • ✅ 总结表格(图像 vs 文本):
        • 9. Open problems
        • 10. Conclusion
        • Appendix A. Datasets
    • LLM 安全
      • 2312.06674_Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
    • LLM强化学习
      • 🏀常用
        • 三大模型
        • 时序差分残差
        • Bradley-Terry模型
        • 马尔可夫决策过程
        • 动态规划
        • 贝尔曼方程
        • Q-learning
      • ❇️1502.05477_TRPO: Trust Region Policy Optimization
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Preliminaries
        • 3 Monotonic Improvement Guarantee for General Stochastic Policies
        • 4 Optimization of Parameterized Policies
        • 5 Sample-Based Estimation of the Objective and Constraint
        • 6 Practical Algorithm
        • 7 Connections with Prior Work
        • 8 Experiments
        • 9 Discussion
        • Appendix A Proof of Policy Improvement Bound
        • Appendix B Perturbation Theory Proof of Policy Improvement Bound
        • Appendix C Efficiently Solving the Trust-Region Constrained Optimization Problem
        • Appendix D Approximating Factored Policies with Neural Networks
        • Appendix E Experiment Parameters
        • Appendix F Learning Curves for the Atari Domain
      • 1602.01783_A3C: Asynchronous Methods for Deep Reinforcement Learning
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Reinforcement Learning Background
        • 4 Asynchronous RL Framework
        • 5 Experiments
        • 6 Conclusions and Discussion
        • 7 Optimization Details
        • 8 Experimental Setup
        • 9 Continuous Action Control Using the MuJoCo Physics Simulator
      • ❇️1707.06347_PPO: Proximal Policy Optimization Algorithms
        • 总结
        • From DeepSeek
        • 示例-FromDeepseek
      • ❇️2203.02155_InstructGPT: Training language models to follow instructions with human feedback
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related work
        • 3. Methods and experimental details
        • 4. Results
        • 5. Discussion
        • Appendix A Additional prompt data details
        • Appendix B Additional human data collection details
        • Appendix C Additional model details
        • Appendix D Automatic evaluation details
      • ❇️2305.18290_DPO: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Preliminaries
        • 4 Direct Preference Optimization
        • 5 Theoretical Analysis of DPO
        • 6 Experiments
        • 7 Discussion
        • Author Contributions
        • Appendix A Mathematical Derivations
        • Appendix B DPO Implementation Details and Hyperparameters
        • Appendix C Further Details on the Experimental Set-Up
        • Appendix D Additional Empirical Results
      • 2310.12036_ΨPO: A General Theoretical Paradigm to Understand Learning from Human Preferences
        • From Moonlight
      • 2402.03300_DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
        • From Moonlight
        • Abstract
        • 1 Introduction
        • 1.1 Contributions(贡献)
        • 1.2 Summary of Evaluations and Metrics(评估与指标总结)
        • 2 Math Pre-Training
        • 总结
        • 3 Supervised Fine-Tuning
        • 总体总结
        • 4 Reinforcement Learning
        • 5 Discussion
        • 6 Conclusion, Limitation, and Future Work
        • Appendix A Appendix
      • 2409.19256_❇️HybridFlow: A Flexible and Efficient RLHF Framework
        • 总结
        • LLM总结
        • From Moonlight
        • Abstract
        • 1. Introduction
        • 2. Background and Motivation
        • 3. HybridFlow Overview
        • 4. Hybrid Programming Model
        • 5. 3D-HybridEngine
        • 6. Auto Device Mapping
        • 7. Implementation
        • 8. Evaluation
        • 9. Discussions
        • 10. Related Work
        • 11. Conclusion
        • Appendix A Primitive APIs in HybridFlow
        • Appendix B Transfer Protocols
        • 表4:各模型类提供的关键函数
        • 算法 2:自动并行算法(Auto Parallelism Algorithm)
        • Appendix C Auto-Parallelism Algorithm
      • ❇️2503.14476_DAPO: An Open-Source LLM Reinforcement Learning System at Scale
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Preliminary
        • 3 DAPO
        • 4 Experiments
        • 5 Conclusion
        • Contributions
        • Acknowledgments
        • 6 Dataset Transformation
        • 7 Supplementary Case
      • 其他
        • 1703.03864_Evolution Strategies: as a Scalable Alternative to Reinforcement Learning
        • 2305.14387_AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
        • 2401.08417_CPO: Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
        • Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
        • 2403.00409_Provably Robust DPO: Aligning Language Models with Noisy Feedback
        • 2504.02495_DeepSeek-GRM: Inference-Time Scaling for Generalist Reward Modeling
        • 2504.13958_ToolRL: Reward is All Tool Learning Needs
    • 其他
      • 2305.20050_Let’s Verify Step by Step
        • 1. 研究背景
        • 2. 监督方法对比
        • 3. 核心发现
        • 总结
      • 2408.03314_Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
        • 1. Introduction
        • 3. How to Scale Test-Time Computation Optimally
        • 5. Scaling Test-Time Compute via Verifiers
        • 6. Refining the Proposal Distribution
        • 其他
      • 2412.14135_Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
        • FromGPT
        • 1. Introduction
        • 2. Background
        • 3. Policy Initialization
        • 4. Reward Design
        • 5. Search
        • 6. Learning
        • 7 Open-source o1 Project
        • 8. Future Directions
  • 机器学习
    • 近邻搜索
      • 10xx.xxxxx_PQ: Product Quantization for Nearest Neighbor Search
        • 总结
        • From Deepseek
        • From Deepseek 全文总结
        • 周边概念
      • 1603.09320_HNSW: Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
        • 总结
        • From Deepseek
        • From Deepseek 全文总结
      • 2007.00808_ANCE: Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Preliminaries
        • 3 Analyses on The Convergence of Dense Retrieval Training
        • 4 Approximate Nearest Neighbor Noise Contrastive Estimation
        • 5 Experimental Methodologies
        • 6 Evaluation Results
        • 7 Related Work
        • 8 Conclusion
        • Appendix A Appendix
        • 总体总结
    • Embedding
      • 1603.09320_HNSW: Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
        • 总结
        • From Deepseek
      • 2004.04906_DPR: Dense Passage Retrieval for Open-Domain Question Answering
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Background
        • 3 Dense Passage Retriever (DPR)
        • 4 Experimental Setup
        • 5 Experiments: Passage Retrieval
        • 6 Experiments: Question Answering
        • 7 Related Work
        • 8 Conclusion
        • Acknowledgments
        • Appendix A Distant Supervision
        • Appendix B Alternative Similarity Functions & Triplet Loss
        • Appendix C Qualitative Analysis
        • Appendix D Joint Training of Retriever and Reader
      • 2205.12035_RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Related works
        • 3 Methodology
        • 4 Experimental Studies
        • 5 Conclusion
        • 6 Limitations
      • 2205.13147_MRL: Matryoshka Representation Learning
        • 总结
        • DeepSeek 总结
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Matryoshka Representation Learning
        • 4 Applications
        • 5 Further Analysis and Ablations
        • 6 Discussion and Conclusions
        • Acknowledgments
        • Appendix A Code for Matryoshka Representation Learning
        • Appendix B Datasets
        • Appendix C Matryoshka Representation Learning Model Training
        • Appendix D Classification Results
        • Appendix E Image Retrieval
        • Appendix F Adaptive Retrieval
        • Appendix G Few-shot and Sample Efficiency
        • Appendix H Robustness Experiments
        • Appendix I In Practice Costs
        • Appendix J Analysis of Model Disagreement
        • Appendix K Ablation Studies
    • ML Vision
      • 1506.02640_You Only Look Once: Unified, Real-Time Object Detection
        • Abstract
      • 1612.08242_YOLO9000: Better, Faster, Stronger
        • Abstract
      • 1804.02767_YOLOv3
      • 2004.10934_YOLOv4: Optimal Speed and Accuracy of Object Detection
        • Abstract
      • 2205.00159_SVTR: Scene Text Recognition with a Single Visual Model
        • Abstract
        • 1. Introduction
        • 2. Method
        • 3. Experiments
        • 4. Conclusion
      • 2207.02696_YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
        • Abstract
      • 2303.05499_Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
      • 2304.08485_Visual Instruction Tuning
      • 2402.13616_YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
        • Abstract
      • 2405.14458_YOLOv10: Real-Time End-to-End Object Detection
        • Abstract
      • 2411.15858_SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
        • 定义
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methods
        • 4 Experiments
        • 5. Conclusion
        • 8. More detail of real-world datasets
    • ML
      • 2108.00941_Human-in-the-loop: A Survey of Human-in-the-loop for Machine Learning
        • 总结
        • Abstract
        • 1 Introduction
        • 2 Data Processing
        • 3 Model Training and Inference
        • 4 System construction and Application
        • 5 Discussion and Future Directions
        • 6 Conclusion
      • 2112.09332_WebGPT: Browser-assisted question-answering with human feedback
      • 2203.11147_GopherCite: Teaching language models to support answers with verified quotes
      • 2304.09848_Generative_Search: Evaluating Verifiability in Generative Search Engines
      • 2305.14251_FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
      • 2305.14627_ALCE: Enabling Large Language Models to Generate Text with Citations
        • NLI 在引用质量评估中的应用
        • 论文中用的prompt
      • 2307.02185_Citation: A Key to Building Responsible and Accountable Large Language Models
      • 2307.16883_HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
  • AI Agent
    • 通用 Agent
      • 2210.03629_ReAct
      • 2303.08268_Chat-with-the-Environment
        • 正文
      • 2303.11366_Reflexion: Language Agents with Verbal Reinforcement Learning
      • 2303.16434_TaskMatrix.AI
        • 大脑
        • 接口平台
        • API 选择器
      • 2304.03442_Generative-Agents
        • Generative Agent Architecture
      • 2307.07924_ChatDev: Communicative Agents for Software Development
      • 2308.00352_MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
      • 2308.04026_AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
      • 2308.08155_AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
      • 2308.10848_AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
        • 理念
      • 2310.06117_Step-Back: Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
      • 2312.04511_LLMCompiler: An LLM Compiler for Parallel Function Calling
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 2.1. Latency Optimization in LLMs(LLMs的延迟优化)
        • 2.2. Plan and Solve Strategy(计划与求解策略)
        • 2.3. Tool-Augmented LLMs(工具增强的LLMs)
        • 3. Methodology
        • 3.1. Function Calling Planner(功能调用规划器)
        • 3.2. Task Fetching Unit(任务获取单元)
        • 3.3. Executor(执行器)
        • 3.4. 动态重规划(Dynamic Replanning)
        • 4. LLMCompiler Details
        • 4.1. 用户提供的信息(User-Supplied Information)
        • 4.2. 流式Planner(Streamed Planner)
        • 5. Results
        • 6. Conclusions
        • 致谢(Acknowledgements)
        • A. Accuracy Analysis: ReAct vs. LLMCompiler
        • B. Failure Case Analysis of LLMCompiler
        • C. Related Work
        • D. Experimental Details
        • E. Analysis
        • 总结
        • F. Additional Discussions about Related Works
        • G. User-Supplied Examples for LLMCompiler Configuration
        • G.1 电影推荐示例提示语(Movie Recommendation Example Prompts)
        • G.2 24点游戏示例提示语(Game of 24 Example Prompts)
        • H. Pre-defined LLMCompiler Planner Prompts
        • I. ParallelQA Benchmark Generation
        • J. Details of the Game of 24 and the Tree-of-Thoughts Approach
        • K. Details of WebShop Experiments
      • 2402.18679_MetaGPT_DI: Data Interpreter: An LLM Agent For Data Science
        • INTRODUCTION
      • 2407.07061_IoA: Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
        • 2.1 OVERVIEW OF IOA
        • 2.2 ARCHITECTURE OF IOA
        • 2.3 KEY MECHANISMS
        • 2.5 Putting It All Together
      • 2408.08435_ADAS: Automated Design of Agentic Systems
        • Prompt
      • 2410.10762_AFlow: Automating Agentic Workflow Generation
        • Introduce
        • PRELIMINARY
      • 2410.17238_SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning
        • 1 Introduction
        • 2 Related Works
        • 3 Method
      • 2410.21012_FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval
        • Introduce
      • 2504.01990_Advances and Challenges in Foundation Agents
      • 2506.12508_AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving
        • Abstract
        • 1.Introduction
        • 3.AgentOrchestra
        • 4.Experiments
      • 2510.08842_Maple: A Multi-agent System for Portable Deep Learning across Clusters
        • 总结
        • Abstract
        • I Introduction
        • II Background
        • III System Design
        • IV Implementation
        • V Experiments
        • VI Error Analysis
        • VII Related Work
        • VIII Conclusion
    • DeepResearch
      • 2509.13313_ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
        • 总结
        • From Moonlight
        • Abstract
        • 摘要(Abstract)总结
        • 1 Introduction
        • 1 引言(Introduction)
        • 2 Preliminary
        • 2. 预备知识(Preliminary)
        • 3 Methodology
        • 3 方法(Methodology)
        • 4 Experiments and Analysis
        • 5 Related Works
        • 5 相关工作总结
        • 6 Conclusion
        • 6 结论(Conclusion)
        • Appendix A Algorithm Pseudo-Code
        • Appendix B Prompt
        • Appendix C Implementation Details
        • 附录 C 实现细节(Appendix C Implementation Details)
        • Appendix D Discussion with MEM1
        • Appendix E Supplementary Materials for Experiments
        • 附录 E 实验补充材料
        • for user goal Extract number of specimens used in the study comparing jump performances of C. canis and C. felis felis as follows: …
        • 章节标题:Jump Performance Comparison of Ctenocephalides canis and Ctenocephalides felis felis
      • 2510.21618_❇️DeepAgent: A General Reasoning Agent with Scalable Toolsets
        • 总结
        • From Moonlight
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methodology
        • 4. Experimental Settings
        • 5. Experimental Results
        • 6. Conclusion
        • Appendix A Datasets
        • Appendix B Baselines
        • Appendix C Implementation Details
        • Appendix D Memory Schema
        • Appendix E Case Study
    • 视觉 Agent&AIOS
      • 2108.03353_Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Dataset Creation
        • 4. Model Design
        • 其它
      • 2209.08199_ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Problem Setting: Tasks and Metrics
        • 4. Data Annotation
        • 5. Dataset Analysis
        • 6. Experiments and Baselines
        • 7. Conclusion
        • 8. Limitations
        • 9. Ethical Considerations
        • A. Data Annotation Details
        • B. Data Examples
      • 2212.06817_RT-1: ROBOTICS TRANSFORMER FOR REAL-WORLD CONTROL AT SCALE
        • ABSTRACT
        • 1. Introduction
        • 2. Related Work
        • 3. Preliminaries
        • 4. System Overview
        • 5. RT-1: ROBOTICS TRANSFORMER
        • 6. EXPERIMENTS
        • 7. CONCLUSIONS, LIMITATIONS AND FUTURE WORK
        • B. MODEL CARD
        • C. MODEL AND DATA
        • D. EXPERIMENTS
      • 2312.13771_AppAgent: Multimodal Agents as Smartphone Users
        • 3.1 Environment and Action Space
        • 3.2 Exploration Phase
        • 3.3 Deployment Phase
      • 2401.10935_SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
        • Abstract
        • 1. Introduction
        • 2. Related work
        • 3. Approach
        • 4. ScreenSpot: A Grounding Benchmark
        • 5. Experiments
        • 6. Conclusion
        • Limitations
        • Ethical considerations
        • A. Details of SeeClick Pre-training
        • B ScreenSpot Annotation & Evaluation
        • C. Downstream Agent Tasks
      • 2402.04615_ScreenAI: A Vision-Language Model for UI and Infographics Understanding
        • Abstract
        • 1. Introduction
        • 2. Methodology
        • 3. Automatic data generation
        • 4. Data Mixtures
        • 5. Experiments and Results
        • 6. Conclusions
        • A Definitions of Metrics
        • B. Screen Schema Examples
        • C. Prompts For LLM Generated Content
        • D. Screen Navigation Generated Examples
        • F. ScreenQA Short Answers Generation
        • G. Complex Question Answering Datasets
        • H. New Benchmarks Repositories
      • 2402.07939_UFO: A UI-Focused Agent for Windows OS Interaction
        • Abstract
        • 1.Introduction
        • 2.Related Work
        • 3.The Design of UFO
        • 4.Experiment
        • 5.Limitations & Lessons Learned
        • 6.Conclusion
      • 2403.16971_AIOS: LLM Agent Operating System
        • Abstract
        • 1. Introduction
        • 2. The Architecture of AIOS
        • 3. AIOS Kernel
        • 4 Evaluation
        • Appendix E Discussion
      • 2406.01014_Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
      • 2411.00820_AutoGLM: Autonomous Foundation Agents for GUIs
        • 总结
        • Abstract
        • 1 Introduction
        • 2 AutoGLM: Techniques and Insights
        • 3 Results
        • 3.1 在 Web 上的评估
        • 3.2 在 Android 上的评估
        • 4 Conclusion
      • 2411.02059_TableGPT2: A Large Multimodal Model with Tabular Data Integration
        • Abstract
      • 2501.11733_Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
        • Abstract
        • 1. Introduction
        • 2. Mobile-Agent-E
        • 3. Experiments
        • 4. Results
        • 5. Related Work
        • 6. Conclusion and Future Work
        • Appendix A Full Trajectory Comparison Example with Previous SOTA
        • Appendix B Error Recovery with Escalation to Manager
        • Appendix C Remaining Limitations
        • Appendix D All Tasks in Mobile-Eval-E Benchmark
        • Appendix E Atomic Operation Space
        • Appendix F Full list of Self-Evolved Shortcuts
        • Appendix G Full list of Self-Evolved Tips
      • 2501.12326_UI-TARS: Pioneering Automated GUI Interaction with Native Agents
        • Abstract
        • 1. Introduction
        • 2. Evolution Path of GUI Agents
        • 3. Core Capabilities of Native Agent Model
        • 4. UI-TARS
        • 5. Experiment
        • 6. Conclusion
      • 2502.14282_PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
        • Abstract
        • 1. Introduction
        • 2. PC-Agent
        • 3. Experiments
        • 4. Related Work
        • 5. Conclusion
      • 2504.14603_UFO2: The Desktop AgentOS
        • Abstract
        • 1.Introduction
        • 2.Background
        • 3.System Design of UFO2
        • 4.Picture-in-Picture Interface
        • 5.Implementation and Specialized Engineering Design
        • 6.Evaluation
        • 7.Discussion & Future Work
        • 8.Related Work
        • 9.Conclusion
      • 2508.04037_SEA: Self-Evolution Agent with Step-wise Reward for Computer Use
        • 总结
        • Abstract
        • I Introduction
        • I 引言
        • II Related Works
        • II 相关工作
        • 总结
        • III Method
        • 总结
        • IV Experiments
        • IV 实验
        • V Conclusion
        • V 结论
    • 音频 Agent
      • 2509.06221_Beamforming-LLM: What, Where and When Did I Miss?
        • 总结
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methods
        • 4. Results
        • 5. Discussion and Conclusion
    • Tools
      • 2205.00445_MRKL
      • 2302.04761_Toolformer: Language Models Can Teach Themselves to Use Tools
      • 2303.17580_HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
      • 2307.16789_ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
        • 总结
        • LLM总结
        • Abstract
        • 1 Introduction
        • 2 Dataset Construction
        • 3 Experiments
        • 4 Related Work
        • 5 Conclusion
        • Appendix
        • Appendix A Implementation Details
    • AGI
      • 1905.10985_AI-GA: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence
      • 2408.06292_The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
  • RAG
    • 2005.11401_Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
    • 2312.10997_Retrieval-Augmented Generation for Large Language Models: A Survey
      • II. Overview of RAG
        • II-A Naive RAG
        • II-B Advanced RAG
        • II-C Modular RAG
        • II-D RAG vs Fine-tuning
      • III. Retrieval
        • III-A Retrieval Source
        • III-B Indexing Optimization
        • III-C Query Optimization
        • III-D Embedding
        • III-E Adapter
      • IV. Generation
        • IV-A Context Curation
        • IV-B LLM Fine-tuning
      • V. Augmentation process in RAG
        • V-A Iterative Retrieval
        • V-B Recursive Retrieval
        • V-C Adaptive Retrieval
      • VI. Task and Evaluation
        • VI-A Downstream Task
        • VI-B Evaluation Target
        • VI-C Evaluation Aspects
        • VI-D Evaluation Benchmarks and Tools
      • VII. Discussion and Future Prospects
        • VII-A RAG vs Long Context
        • VII-B RAG Robustness
        • VII-C Hybrid Approaches
        • VII-D Scaling laws of RAG
        • VII-E Production-Ready RAG
        • VII-F Multi-modal RAG
    • 2401.15884_CRAG: Corrective Retrieval Augmented Generation
    • 2403.14403_Adaptive-RAG
    • 2404.12457_RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
      • 总结
      • Abstract
      • 1. Introduction
        • 1. 引言概述
        • 2. 现有工作与局限
        • 3. RAGCache系统
        • 4. 实验结果
        • 5. 主要贡献
      • 2. Background
      • 3. RAG System Characterization
        • 一、性能瓶颈分析
        • 二、优化机会分析 —— 缓存中间状态
        • 总结
      • 4. RAGCache Overview
        • 主要内容总结如下:
        • 总结
      • 5. RAGCache Design
        • 5.1. Cache Structure and Replacement Policy
        • 5.2. Cache-aware Reordering
        • 5.3 动态推测流水线(Dynamic Speculative Pipelining)
        • 总结
      • 6. Implementation
        • 系统实现
        • 向量搜索优化(Pipelined Vector Search)
        • 容错机制(Fault Tolerance)
      • 7. Evaluation
        • 7.1 总体性能
        • 7.2 通用设置下的案例研究
        • 7.3 消融研究
        • 7.4 调度时间
        • 总结
      • 8. Discussion
      • 9. Related Work
      • 10. Conclusion
    • 2404.16130_GraphRAG: From Local to Global: A GraphRAG Approach to Query-Focused Summarization
      • 总结
      • LLM 总结
      • Abstract
      • 1 Introduction
      • 2 Background
        • 2.1 RAG方法与系统
        • 2.2 知识图谱在LLM与RAG中的应用
        • 2.3 自适应基准测试
        • 2.4 RAG评估标准
      • 3 Methods
        • 3.1 GraphRAG 工作流程
        • 3.2 全局理解问题生成
        • 3.3 全局理解评估标准
        • 总结
      • 4 Analysis
        • 4.1 实验1
        • 4.2 实验2
        • 总结
      • 5 Results
        • 5.1 实验一:不同方法在摘要任务中的表现比较
        • 5.2 实验二:基于声明的指标评估
        • 总结
      • 6 Discussion
        • 6.1 评估方法的局限性
        • 6.2 未来工作
        • 更广泛的影响
      • 7 Conclusion
      • Appendix A Entity and Relationship Extraction Approach
        • 1. 实体与关系抽取方法
        • 2. 自我反思(Self-Reflection)技术
        • 3. 分块大小与抽取效果的关系
        • 4. 实验结果(图3)
        • 总结
      • Appendix B Example Community Detection
      • Appendix C Context Window Selection
      • Appendix D Example Answer Comparison
      • Appendix E System Prompts
        • E.1 实体实例生成(Element Instance Generation)
        • E.2 社区摘要生成(Community Summary Generation)
        • E.3 社区问题回答生成(Community Answer Generation)
        • E.4 全局问题回答生成(Global Answer Generation)
      • Appendix F Evaluation Prompts
        • F.1 Relative Assessment Prompt
        • F.2 Relative Assessment Metrics
      • Appendix G Statistical Analysis
        • 统计方法:
        • 主要结果总结:
        • 总体趋势:
        • 重要结论:
    • 2405.16506_GRAG: Graph Retrieval-Augmented Generation
      • 总结
      • LLM 总结
      • Abstract
      • 1 Introduction
      • 2 Related Work
        • 2.1 Prompt Tuning
        • 2.2 LLMs在图相关任务中的应用
        • 2.3 图上的检索方法
      • 3 Problem Formalization
      • 4 Methodology
        • 概述
        • 4.1 文本子图检索
        • 文本子图索引(Indexing)
        • 文本子图排序(Ranking)
        • 文本子图软剪枝(Soft Pruning)
        • 总结
        • 4.2 Textual Graph Augmented Generation
        • 1. 文本视图(Text View of Textual Graphs)
        • 2. 图视图(Graph View of Textual Graphs)
        • 3. 生成阶段(Generation Phase)
        • 总结
      • 5 Experiments
        • 总结:第五章 实验部分
      • 6 Conclusion
      • 7 Limitations
      • Acknowledgments
      • Appendix A Appendix
        • 附录A 总结
        • 总结
    • 2406.13213_Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata
    • 2410.05779_LightRAG: Simple and Fast Retrieval-Augmented Generation
      • 总结
      • Abstract
      • 1 Introduction
      • 2 Retrieval-Augmented Generation
      • 3 The LightRAG Architecture
        • 一、LightRAG架构概述
        • 二、基于图的文本索引(Graph-based Text Indexing)
        • 三、双层检索范式(Dual-level Retrieval Paradigm)
        • 四、检索增强的答案生成(Retrieval-Augmented Answer Generation)
        • 五、复杂度分析
        • 总结
      • 4 Evaluation
        • 1. 实验设置(4.1 Experimental Settings)
        • 2. LightRAG 与现有 RAG 方法的对比(4.2 RQ1)
        • 3. 消融实验(4.3 RQ2)
        • 总结
        • 4.4 Case Study (RQ3)
        • 4.4 案例研究(RQ3)总结:
        • 4.5 模型成本与适应性分析(RQ4)总结:
        • 总体结论:
      • 5 Related Work
        • 第5章 相关工作(总结)
      • 6 Conclusion
      • 7 Appendix
    • 2410.10450_KBLaM: Knowledge Base augmented Language Model
      • Abstract
      • 1. Introduction
      • 2. Related work
      • 3. Background
        • Self-attention layer
      • 4. Augmenting LLM with the KB
        • Knowledge tokens
        • Rectangular Attention: Injecting knowledge token into prompt tokens
        • KB length generalization through attention score scaling
      • 5. KB instruction tuning
      • 6. EXPERIMENTS
        • 6.1 EXPERIMENT SETTING
        • 6.2 EXPERIMENT RESULTS
        • 总结亮点
      • 7. CONCLUSION
      • 8. LIMITATIONS AND FUTURE WORK
      • Appendix A Extended related work
      • Appendix B Ablation study
      • Appendix C Sample KB
      • SAMPLE Q&A
      • PROMPT
        • PROMPT FOR SYNTHETIC KB GENERATION
        • Prompt for open-ended Q&A generation
        • PROMPT FOR GPT EVALUATION OF OPEN-ENDED Q&A
        • PROMPT FOR LLAMA EVALUATION
        • QUESTION TEMPLATE
      • SAMPLE OUTPUT
        • SYNTHETIC KB
        • ENRON
    • 2504.03137_LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph
      • Abstract
      • Introduction
      • Related Work
        • LLM Prompt Engineering
        • KG-based LLM Reasoning
      • Preliminaries
        • 1. Knowledge Graph (KG)
        • 2. Anchor Entities
        • 3. Relation Link
        • 4. Reasoning Path
      • Methodology
        • Stage1: Reasoning Graph Retrieval
        • Stage2: Knowledge Embedding
        • Stage3: Knowledge Prompts Mixed Reasoning
      • Experiments
      • Conclusion
    • GraphRAG 官方文档
      • Indexing
        • > Indexing Architecture
        • > Indexing Dataflow
        • > Prompt Tuning
      • Query
  • 论文池
    • 2501.12948❇️_DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 1. Introduction(引言)
        • 2. Approach(方法)
        • 3. Experiment(实验)
        • 4. Discussion(讨论)
        • 5. Conclusion, Limitations, and Future Work(结论、局限与未来工作)
        • 6. A Contributions and Acknowledgments(贡献与致谢)
        • 总结重点
      • 1 Introduction
      • 1.1 Contributions(贡献)
        • 后训练:基于基础模型的大规模强化学习
        • 蒸馏:小模型也能很强大
      • 1.2 Summary of Evaluation Results(评估结果概要)
        • 推理任务
        • 知识任务
        • 其他任务
        • 小结
      • 2 Approach
      • 2.1 Overview(概述)
      • 2.2 DeepSeek-R1-Zero: 强化学习应用于基础模型
        • 2.2.1 强化学习算法
        • 2.2.2 奖励建模
        • 2.2.3 训练模板
        • 2.2.4 DeepSeek-R1-Zero的性能、自进化过程与“顿悟时刻”
      • 2.3 DeepSeek-R1: 强化学习结合冷启动
        • 2.3.1 冷启动(Cold Start)
        • 2.3.2 推理导向的强化学习
        • 2.3.3 拒绝采样与监督微调(SFT)
        • 2.3.4 面向所有场景的强化学习
      • 2.4 蒸馏:将推理能力赋予小型模型
        • 总结
      • 3 Experiment
        • 3 Experiment 实验部分总结
        • 3.1 DeepSeek-R1 评估
        • 3.2 蒸馏模型评估
        • 总体总结
      • 4 Discussion
      • 4 讨论
        • 4.1 知识蒸馏 与 强化学习
        • 4.2 不成功的尝试
        • 总结
      • 5 Conclusion, Limitations, and Future Work
        • 总结
        • 局限性与未来工作
        • 总结重点
      • Appendix
        • 1. 附录的作用
        • 2. 常见附录内容
        • 3. 附录的编写规范
        • 4. 注意事项
      • Appendix A Contributions and Acknowledgments
      • 附录 A 贡献与致谢
        • 贡献者
        • 特别说明
        • 生成信息
    • 2504.03182_Graphiti: Bridging Graph and Relational Database Queries
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 关键点总结:
        • 附加信息:
      • 1. Introduction
        • 背景与问题
        • 研究目标
        • 核心贡献
        • 方法流程(图1)
        • 实验与实现
        • 总结贡献
      • 2. Motivating Example
        • 2.1 图数据库与关系数据库的对应关系
        • 2.2 SQL 与 Cypher 查询的语义差异
        • 2.3 数据库转换器(Database Transformer)
        • 2.4 诱导关系模式(Induced Relational Schema)
        • 2.5 语法导向的转译(Syntax-Directed Transpilation)
        • 2.6 查询等价性验证
        • 总结
      • 3. Preliminaries
        • 3. Preliminaries(预备知识)
        • 总结
      • 4. Problem Statement
        • 4. 问题陈述(Problem Statement)总结
        • 4.1 数据库转换语言(Language for Database Transformers)
        • 4.2 等价性检查问题(Equivalence Checking Problem)
        • 总结
      • 5. Equivalence Checking Algorithm
        • 概述
        • 5.1. 诱导关系模式和标准转换器推断
        • 5.2. 语法导向的转译
        • 5.3. 归约到SQL等价性检查
      • 6. Evaluation
        • Benchmarks(基准测试集)
        • 6.1. 使用 BMC 后端的 Graphiti 评估(VeriEQL)
        • 6.2. 使用演绎验证器的 Graphiti 评估(Mediator)
        • 6.3. 转译质量评估
        • 总结
      • 7. Related Work
        • 1. SQL 的自动推理(Automated reasoning for SQL)
        • 2. 数据库实例之间的迁移(Migration between database instances)
        • 3. 数据表示重构(Data representation refactoring)
        • 4. 图数据库查询语言(Graph database query languages)
        • 5. 数据库查询测试(Testing database queries)
        • 6. Cypher 查询转译工具(Transpiling Cypher queries)
        • 总结
      • 8. Limitation
        • 主要局限:
        • 重点说明:
        • 实用性验证:
        • 未来方向:
        • 总结:
      • 9. Conclusion and Future Work
        • 9. 结论与未来工作
      • Appendix A Semantics of Cypher Queries
        • 查询语义
        • 子句语义
        • 路径模式语义
        • 表达式语义
        • 谓词语义
      • Appendix B Transpilation of Cypher Predicates and Expressions
        • 1. 表达式的转译规则(Figure 21)
        • 2. 谓词的转译规则(Figure 22)
        • 示例 B.1
        • 总结
      • Appendix C An Equivalent Cypher Query of Motivating Example
        • 原始 Cypher 查询的问题
        • 修正后的 Cypher 查询
        • 查询结构(重点内容)
        • 总结
      • Appendix D Qualitative Analysis of Manually-Written Buggy Queries
        • 1. 使用嵌套 MATCH 而非存在性模式(Existential Pattern)
        • 2. 错误使用路径模式(Path Pattern)进行 OPTIONAL MATCH
        • 3. 同一标签的节点或边使用不当
        • 总结
      • Appendix E Comparing Graphiti’s Transpiler with OpenCypherTranspiler
        • 原文结构总结
        • 1. 总体比较
        • 2. 转译结果表格分析(Table 5)
        • 3. OpenCypherTranspiler 的典型错误示例
        • 总结
      • Appendix F Proofs
        • 定理 F.1(翻译的正确性)
        • 引理 F.2
        • 引理 F.3
        • 引理 F.4
        • 引理 F.5
        • 定理 F.6(翻译的完备性)
        • 引理 F.7
        • 引理 F.8
        • 引理 F.9
        • 引理 F.10
        • 引理 F.11
        • 引理 F.12
        • 定理 F.13(正确性)
        • 定理 F.14(完备性)
    • 2505.00675_Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 1 Introduction
      • 2 Memory Foundations
      • 2 记忆基础
        • 2.1 记忆分类
        • 2.2 记忆操作
        • 2.3 记忆管理
        • 2.4 记忆使用
        • 小结
      • 3 From Operations to Key Research Topics
      • 3 从操作到关键研究主题
        • 3.1 长期记忆
        • 3.2 长上下文
        • 3.3 参数化记忆修改
        • 3.4 多源记忆
        • 未来方向
      • 4 Memory In Practice
      • 4 Memory In Practice(记忆在实践中的应用)
        • 4.1 Applications(应用)
        • 4.2 Products(产品)
        • 4.3 Tools(工具)
      • 5 Memory in Humans and AI Systems
      • 5 人类与人工智能系统的记忆
        • 记忆系统的基本功能与结构
        • 人类与人工智能记忆的差异
        • 面向未来:记忆系统带来的挑战
        • 表2:人类与智能体记忆的关键差异
      • 6 Open Challenges and Future Directions
      • 6 开放性挑战与未来方向
        • 6.1 专题方向
        • 6.2 更广泛视角(Broader Perspectives)
      • Appendix A GPT-based Pipeline Selection
      • 附录 A 基于 GPT 的流水线选择
      • Appendix B Relative Citation Index
      • 附录 B 相对引用指数
        • 1. 论文“年龄”计算方法
        • 2. 引用与年龄关系的建模方式
        • 3. 数据收集与处理
        • 4. RCI 的计算公式
        • 5. RCI 的应用与发现
        • 6. 图表说明
        • 总结
      • Appendix C Chord Analysis of Interactions Among Memory Types, Operations, Topics, and Venues
      • 附录C 记忆类型、操作、主题与发表场所交互的弦图分析
        • C.1 记忆类型、操作和主题的交互
        • C.2 各发表场所中的记忆交互
        • 附录C相关图表和表格总结
        • 附录C总结
    • 2507.19849_Agentic Reinforced Policy Optimization
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
    • Agentic Reinforced Policy Optimization
      • Agentic Reinforced Policy Optimization
        • 摘要(Abstract)
        • 1. 引言(Introduction)
        • 2. 相关工作(Related Work)
        • 3. 方法(Methodology)
        • 4. 实验(Experiments)
        • 5. 讨论(Discussion)
        • 6. 结论(Conclusion)
      • Agentic Reinforced Policy Optimization
        • 标题:Agentic Reinforced Policy Optimization(ARPO)
        • 作者与单位
        • 联系方式与项目链接
        • 备注说明
        • 总结
      • Abstract
      • 摘要(Abstract)总结
      • 1 Introduction
      • 1 引言(Introduction)
        • 背景与动机
        • 现有方法的局限性
        • 问题分析与观察
        • 提出方法:ARPO
        • 实验与结果
        • 主要贡献总结
      • 2 Preliminary
      • 2 预备知识(Preliminary)
        • 2.1 基于智能体的强化学习(Agentic Reinforcement Learning)
        • 2.2 推理过程中的Token熵分析(Analyzing Token Entropy in Agentic Reasoning)
        • 2.3 智能体工具设计(Agentic Tool Design)
      • 3 Agentic Reinforced Policy Optimization
        • 3. Agentic Reinforced Policy Optimization (ARPO)
        • 总结
      • 4 Experiment
      • 4 实验
        • 4.1 数据集
        • 4.2 基线方法
        • 4.3 训练指南
        • 4.4 评估指标
        • 4.5 主要结果
        • 4.6 定量分析
        • 4.7 ARPO的扩展性分析
      • 5 Related Work
      • 5 相关工作(Related Work)总结
        • 5.1 可验证奖励的强化学习(Reinforcement Learning with Verifiable Reward)
        • 5.2 代理式强化学习(Agentic Reinforcement Learning)
        • 总结
      • 6 Conclusion
      • 6 结论(Conclusion)
        • 核心内容讲解:
        • 小结:
      • Appendix
      • Appendix A Datasets
        • A.1 Mathematical Reasoning Benchmarks
        • A.2 Knowledge-Intensive Reasoning Benchmarks
        • A.3 Deep Search Benchmarks
        • 总结
      • Appendix B Baselines
        • 附录 B 基线模型
        • 总结
      • Appendix C Implementation Details
        • 附录 C 实现细节总结
        • 总结
      • Appendix D Theoretical Analysis and Proofs
      • 附录 D 理论分析与证明
        • D.1 软优势估计的理论分析
        • D.2 GPG 定理的理论证明
      • Appendix E The Algorithm Workflow of ARPO
      • 附录 E:ARPO 算法流程
        • 输入参数
        • 算法流程
        • 输出
        • 重点内容总结
        • 不重要内容精简
      • Appendix F Case Study
      • 附录 F 案例研究
        • 表 4:HLE 数据集中的一个例子
        • 表 5:GAIA 数据集中的一个例子
        • 表 6:GAIA 数据集中的另一个例子
        • 表 7:HLE 数据集中的另一个例子
        • 表 8:AIME24 数据集中的一个例子
        • 表 9:HotpotQA 数据集中的一个例子
    • 2511.20857_Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 1 Introduction
        • 核心问题:记忆系统的静态性
        • 现有基准的局限性
        • 解决方案:Evo-Memory
        • 覆盖任务类型与记忆模块
        • 新方法:ExpRAG 与 ReMem
        • 贡献总结
      • 2 Related Work
        • 2.1 测试时学习
        • 2.2 自演化记忆
        • 图3:ReMem 智能体框架概述
      • 3 Evo-Memory: Evaluating Self-Evolving Memory in LLM Agents
        • 概述
        • 3.1 问题设定(Problem Formulation)
        • 3.2 ExpRAG: Experience Retrieval and Aggregation
        • 3.3 ReMem: Synergizing Reasoning, Acting, and Memory
        • 总结
      • 4 Experiments
        • 4.1 实验设置
        • 4.2 实验
        • 4.3 结果分析(RQ1)
        • 4.4 记忆改进分析(RQ2)
        • 4.5 任务序列:简单 vs. 困难(RQ3)
        • 4.6 反馈分析(RQ4)
        • 4.7 时间步性能(RQ5)
      • 5 Conclusion
      • 附录(Appendix)
        • 2.1 Test-time Learning(测试时学习)
        • 2.2 Self-evolving Memory(自演化记忆)
        • 3.1 Problem Formulation(问题定义)
        • 3.2 ExpRAG: Experience Retrieval and Aggregation(经验检索与聚合)
        • 3.3 ReMem: Synergizing Reasoning, Acting, and Memory(推理、行为与记忆的协同)
        • 4.1 Experimental Setup(实验设置)
        • 4.2 Experiments(实验设计)
        • 4.3 Analysis of Results (RQ1)(结果分析 - 研究问题1)
        • 4.4 Analysis of Memory Improvement (RQ2)(记忆改进分析 - 研究问题2)
        • 4.5 Task Sequence: Easy vs. Hard (RQ3)(任务顺序影响 - 研究问题3)
        • 4.6 Analysis of Feedback (RQ4)(反馈机制分析 - 研究问题4)
        • 4.7 Performance w.r.t Time Steps (RQ5)(时间步长性能分析 - 研究问题5)
      • A Experimental Details(实验细节)
        • A.1 Datasets(数据集详情)
        • A.2 Configuration(配置参数)
        • A.3 Evaluation(评估细节)
        • A.4 Methods(方法实现细节)
      • B Experiments(补充实验)
        • B.1 Additional Experiments(附加实验)
        • B.2 Additional Analysis of Memory Pruning(记忆剪枝分析)
        • B.3 Additional Comparative Curves on Single-turn Tasks(单轮任务对比曲线)
      • C Prompts(提示模板)
      • D Limitations(局限性)
      • E Use of Large Language Models(大语言模型使用说明)
      • Appendix A Experimental Details
        • A.1 数据集(Datasets)
        • A.2 配置(Configuration)
        • A.3 评估(Evaluation)
        • A.4 方法(Methods)
        • 总结
      • Appendix B Experiments
        • B.1 附加实验
        • B.2 记忆剪枝的附加分析
        • B.3 单轮任务的附加对比曲线
        • 总结
      • Appendix D Limitations
      • Appendix E Use of Large Language Models
        • 1. 使用目的
        • 2. 使用范围
        • 3. 结论
    • 2512.10696_Framework for Experience-Driven Agent Evolution
      • 总结
      • 图解
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 核心创新点(ReMe 的三大机制)
        • 实验与结果
        • 总结
      • 1 Introduction
        • 1.1 背景与动机
        • 1.2 理想程序性记忆系统的三大核心标准
        • 1.3 当前方法的局限性
        • 1.4 提出的方法:ReMe 框架
        • 1.5 实验结果与贡献
        • 1.6 主要贡献
      • 2 Related Works
        • 2.1 增强记忆的LLM智能体(Memory-enhanced LLM Agents)
        • 2.2 经验学习策略(Experience Learning Strategies)
        • 图2:ReMe框架概述(图示说明)
      • 3 Methodology
        • 3.1 ReMe 概览
        • 3.2 经验获取
        • 3.3 经验复用
        • 3.4 经验精炼
        • 表1:ReMe 与基线模型在 BFCL-V3 和 AppWorld 上的性能对比
        • 总结
      • 4 Experiments
        • 4.1 实验设置
        • 4.2 主要结果
        • 4.3 消融研究
        • 4.4 更多分析
      • 总结
      • 5 Conclusion
      • Limitations
        • 1. 固定的经验检索策略
        • 2. 经验验证机制的局限性
        • 3. 模型规模与总结能力的关系
      • Appendix A Dataset Details
        • BFCL-V3
        • AppWorld
      • Appendix B Baseline Details
        • LangMem
        • A-Mem
        • 总结
      • Appendix C Implementation Details
        • C.1 经验获取(Experience Acquisition)
        • C.2 经验检索(Experience Retrieval)
      • Appendix D Experience Examples
        • 1. ReMe 方法的经验提取示例
        • 2. 经验粒度的影响分析
        • 3. 不同粒度经验的结构与内容对比
        • 总结
      • Appendix E Additional Experimental Results
        • E.1 Retrieval Key Analysis(检索键分析)
        • E.2 Prompt Examples for Experience Extraction(经验提取的提示示例)
        • 总结
  • 论文池-sum
  • 论文待回收池
    • 2009.01325_Learning to summarize from human feedback
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 研究背景
        • 研究方法
        • 研究成果
        • 分析与验证
        • 研究意义
      • 1 Introduction
        • 背景与问题
        • 研究目标与任务选择
        • 方法概述
        • 主要贡献
        • 长期意义
      • 2 Related work
        • 与我们工作最直接相关的工作
        • 其他使用人类反馈的研究
        • 强化学习与自动评价指标
        • 模型结构与预训练方法的改进
      • 3 Method and experiment details
        • 3.1 高层方法论(High-level methodology)
        • 3.2 数据集与任务
        • 3.3 收集人类反馈(Collecting human feedback)
        • 3.4 模型(Models)
      • 4 Results
        • 4.1 基于人类反馈的 Reddit 帖子摘要
        • 4.2 迁移到新闻文章摘要
        • 4.3 理解奖励模型
        • 4.4 摘要自动评估指标分析
      • 5 Discussion
        • 1. Limitations(局限性)
        • 2. Future directions(未来方向)
        • 3. Broader impacts(更广泛影响)
        • 4. Acknowledgements(致谢)
      • Appendix A TL;DR dataset details
        • 数据集构成
        • 数据预处理步骤
        • 数据集局限性说明
      • Appendix B Further model training details
        • B.1 超参数设置
        • B.2 输入格式
        • 总结重点
      • Appendix C Human data collection details
        • C.1 Process for ensuring high-quality human data
        • C.2 Assessing human feedback quality
        • C.3 Labeler demographics
        • C.4 Labeler website
        • C.5 Instructions for labelers
        • C.6 Composition of the labeled dataset
        • C.7 Example comparison tasks
      • Appendix D Choice of baselines
      • Appendix E CNN/DM lead-3 vs reference summaries
        • 主要发现
        • 控制长度后的分析
        • 对摘要方法的质疑
        • 标注者行为分析
        • 参考摘要表现差的原因
        • 结论
      • Appendix F Controlling for summary length
        • 1. 控制摘要长度的背景与方法
        • 2. 实验结果与分析
        • 3. CNN/DM数据集上的长度控制实验
      • Appendix G Additional results
        • G.1 价值函数消融实验
        • G.2 沿质量维度评估策略
        • G.3 最优-N 优化研究
        • G.4 ROUGE分数
        • G.5 二元组重叠统计
        • G.6 奖励模型验证集
        • G.7 不同评估指标的一致性
        • 总结
      • Appendix H Samples
        • H.1 随机样本
        • H.2 过度优化样本
    • 2305.16300_Random-Access Infinite Context Length for Transformers
      • Abstract
      • 1 Introduction
      • 2 Related Work
      • 3 Methodology
        • 总体思路
        • 方法详解
        • 位置编码处理
        • 与其他方法的对比
        • 总结
        • 3.3 Memory & Computation
      • 4 Experiments
        • 4.1 语言建模实验
        • 4.2 微调预训练模型
        • 总结
      • 5 Future Work
      • 6 Conclusion
      • Acknowledgment
      • Appendix A Grouped Softmax Example
      • Appendix B Dataset Description
      • Appendix C Number of Unique Retrieved Blocks
      • Appendix D Context Miss Token
      • Appendix E Positional Augmentation
      • Appendix F Additional Extensions and Details
        • 1. 掩码语言建模(Masked Language Modeling)
        • 2. 与 Flash Attention 的结合
        • 3. 检索块数量与块大小的权衡
        • 总结
      • Appendix G Offloading KV Cache to CPU
    • 2405.17935_Tool Learning with Large Language Models: A Survey
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 关键词总结:
        • 重点内容强调:
        • 不重要内容精简:
      • 1 Introduction
        • 核心观点:
        • 1.1 历史背景与工具的重要性
        • 1.2 当前技术趋势:LLMs 的发展与局限
        • 1.3 工具学习的兴起
        • 1.4 研究现状与趋势
        • 1.5 本文结构与贡献
        • 1.6 与其他综述的比较
        • 1.7 本文结构图(Figure 2)
        • 1.8 GitHub 资源
        • 总结:
      • 2 Background
      • 2 背景(Background)
        • 什么是工具(What is a Tool?)
        • 什么是工具学习(What is Tool Learning?)
        • 总结
      • 3 Why Tool Learning?
      • 3 为什么需要工具学习?
        • 3.1 知识获取
        • 3.2 专业能力增强
        • 3.3 自动化与效率提升
        • 3.4 交互增强
        • 3.5 增强可解释性与用户信任
        • 3.6 提升鲁棒性与适应性
        • 总结图示(图3)
      • 4 How Tool Learning?
      • 4 工具学习的机制
        • 4.1 工具学习的整体范式
        • 4.2 任务规划(Task Planning)
        • 4.3 工具选择(Tool Selection)
        • 4.4 工具调用(Tool Calling)
        • 4.5 响应生成(Response Generation)
        • 表格:工具学习基准数据集汇总
        • 总结
      • 5 Benchmarks, Toolkits, and Evaluation
      • 5. Benchmarks(基准测试)
        • 5.1.1 通用基准(General Benchmarks)
        • 5.1.2 特定任务基准(Other Benchmarks)
      • 5.2 Toolkits(工具包)
      • 5.3 Evaluation(评估方法)
        • 5.3.1 任务规划(Task Planning)
        • 5.3.2 工具选择(Tool Selection)
        • 5.3.3 工具调用(Tool Calling)
        • 5.3.4 响应生成(Response Generation)
      • 总结
      • 6 Challenges and Future Directions
      • 6 挑战与未来方向(Challenges and Future Directions)
        • 6.1 工具学习中的高延迟问题(High Latency in Tool Learning)
        • 6.2 严谨而全面的评估体系(Rigorous and Comprehensive Evaluation)
        • 6.3 全面且易获取的工具集(Comprehensive and Accessible Tools)
        • 6.4 安全与鲁棒的工具学习(Safe and Robust Tool Learning)
        • 6.5 统一的工具学习框架(Unified Tool Learning Framework)
        • 6.6 真实世界的工具学习基准(Real-World Benchmark for Tool Learning)
        • 6.7 多模态工具学习(Tool Learning with Multi-Modal)
      • 总结
      • 7 Conclusion
      • 7 结论(总结)
        • 主要内容结构如下:
        • 1. 引言与基础概念
        • 2. 工具学习的重要性
        • 3. 工具学习的四个阶段
        • 4. 评估方法与基准测试
        • 5. 挑战与未来方向
        • 最后
        • 其他信息:
    • 2409.20163_MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 摘要总结
      • 1 Introduction
        • 本文的主要贡献如下:
        • 后续章节安排如下:
      • 2 Related Works
      • 2 相关工作
        • LLM-based agents 的应用与记忆机制
        • LLM-based agents 记忆能力的评估
        • 知识库问答(KBQA)与记忆评估的关联
        • 本文工作的贡献
      • 3 Methods
      • 3.1 Overview of MemSim
      • 3.2 Bayesian Relation Network
      • 3.3 Causal Generation Mechanism
      • 3.4 MemDaily: A Dataset in the Daily-life Scenario
      • 总结
      • 4 Evaluations
      • 4 评估(Evaluations)
      • 4.1 用户画像评估(Evaluation on User Profiles)
        • 评估指标
        • 基线方法
        • 评估结果
      • 4.2 用户消息评估(Evaluation on User Messages)
        • 评估指标
        • 基线方法
        • 评估结果
      • 4.3 问题与答案评估(Evaluation on Questions and Answers)
        • 评估结果
        • 总结
      • 5 Benchmark
      • 5 Benchmark 总结
        • 5.1 Experimental Settings(实验设置)
        • 5.2 Memory Mechanisms 的有效性(Effectiveness of Memory Mechanisms)
        • 5.3 Memory Mechanisms 的效率(Efficiency of Memory Mechanisms)
        • 总结
      • 6 Limitations and Conclusions
      • 6 局限与结论
      • Appendix A Proof in Bayesian Relation Network
      • 附录 A 贝叶斯关系网络的证明
        • A.1 定理 1(因子化)的证明
        • A.2 定理 2(祖先采样)的证明
      • 总体总结
      • Appendix B Extensive Evaluation on User Messages by GPT-4o
      • 附录 B GPT-4o 对用户消息的广泛评估
        • 表 10:GPT-4o 对用户消息评估的结果
      • Appendix C Extensive Benchmark on More Composite Datasets
      • 附录 C:在更多复合数据集上的广泛基准测试
        • C.1 MemDaily-10 的结果
        • C.2 MemDaily-50 的结果
        • C.3 MemDaily-200 的结果
        • 总结
      • Appendix D Case Studies
      • D.1 Case Study on Generated User Profiles
      • D.2 Case Study on User Messages
      • D.3 Case Study on Questions and Answers
        • 总结
    • 2411.00489_Human-inspired Perspectives: A Survey on AI Long-term Memory
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Human-inspired Perspectives: A Survey on AI Long-term Memory
        • 1. 引言(Introduction)
        • 2. 人类长期记忆的结构与机制(Structure and Mechanisms of Human Long-term Memory)
        • 3. AI系统中的长期记忆建模(Modeling Long-term Memory in AI Systems)
        • 4. 与人类记忆的类比分析(Human-inspired Analysis of AI Memory Systems)
        • 5. 应用场景(Applications of AI Long-term Memory)
        • 6. 挑战与未来方向(Challenges and Future Directions)
      • 总结
    • Human-inspired Perspectives: A Survey on AI Long-term Memory
      • 1. 引言(Introduction)
      • 2. 人类长期记忆的结构与机制(Structure and Mechanisms of Human Long-term Memory)
      • 3. AI系统中的长期记忆建模(Modeling Long-term Memory in AI Systems)
        • 3.1 编码阶段建模(Encoding)
        • 3.2 巩固阶段建模(Consolidation)
        • 3.3 提取阶段建模(Retrieval)
      • 4. 人类启发的AI长期记忆系统(Human-inspired AI Long-term Memory Systems)
      • 5. 挑战与未来方向(Challenges and Future Directions)
      • 6. 结论(Conclusion)
      • 总结评价
      • Human-inspired Perspectives: A Survey on AI Long-term Memory
      • 第一章:引言(Introduction)
        • 内容概述:
        • 重点内容:
        • 其他:
      • 第二章:人类长期记忆机制(Human Long-term Memory Mechanisms)
        • 内容概述:
        • 重点内容:
        • 其他:
      • 第三章:AI系统中的长期记忆建模(Modeling Long-term Memory in AI Systems)
        • 内容概述:
        • 分类与重点内容:
        • 其他:
      • 第四章:评估与挑战(Evaluation and Challenges)
        • 内容概述:
        • 重点内容:
        • 其他:
      • 第五章:未来方向(Future Directions)
        • 内容概述:
        • 重点内容:
      • 第六章:结论(Conclusion)
        • 内容概述:
      • 附录与表格(如有)
        • 表格内容(假设):
      • 总结
      • Labs
      • 5 Meta
      • Abstract
      • Abstract(摘要)
      • 3. Long-term Memory in Human Brain(人脑中的长期记忆)
        • 3.1 Human Memory Hierarchy(人类记忆层次)
        • 3.2 Human Memory Processing(人类记忆处理机制)
        • 3.3 Summary(小结)
      • 4. Long-term Memory of AI: on Storage Formats(AI长期记忆:存储格式)
        • 4.1 Non-Parametric Memory(非参数记忆)
        • 4.2 Parametric Memory(参数记忆)
        • 4.3 Summary(小结)
      • 5. Long-term Memory of AI: on Human Perspectives(AI长期记忆:人类视角)
        • 5.1 Episodic Memory(情景记忆)
        • 5.2 Semantic Memory(语义记忆)
        • 5.3 Procedural Memory(程序性记忆)
        • 5.4 Summary(小结)
      • 6. A New Cognitive Architecture for Long-term Memory(新的长期记忆认知架构)
        • 6.1 Cognitive Architecture of Self-Adaptive Long-term Memory (SALM)
      • 7. Next Steps of AI Long-term Memory(AI长期记忆的未来方向)
        • 7.1 Measures of AI Long-term Memory(AI长期记忆的评估指标)
        • 7.2 Application of AI Long-term Memory(AI长期记忆的应用前景)
      • 总结
      • 1 Introduction
      • 1 引言(Introduction)
        • 核心观点:
        • 人类记忆对AI的启发:
        • 研究空白与本文贡献:
        • 文章结构概览:
      • 2 Research Background and Methodologies
        • 2 研究背景与方法
        • 总结
      • 3 Long-term Memory in Human Brain
        • 第三章:人脑中的长期记忆
        • 图表与数据说明
        • 重点总结
        • 数学与算法要点
        • 总结
      • 4 Long-term Memory of AI: on Storage Formats
      • 第4章:AI的长期记忆:存储形式
        • 概述
      • 4.1 非参数记忆(Non-Parametric Memory)
        • 4.1.1 存储方式
        • 4.1.2 检索方法
        • 4.1.3 遗忘机制
      • 4.2 参数记忆(Parametric Memory)
        • 4.2.1 存储机制
        • 4.2.2 检索机制
        • 4.2.3 遗忘机制
      • 4.3 总结
        • 与人类长期记忆的相似性(见图5):
        • 总体结论
      • 5 Long-term Memory of AI: on Human Perspectives
      • 5 人工智能的长期记忆:从人类视角出发
        • 5.1 情景记忆(Episodic Memory)
        • 5.2 语义记忆(Semantic Memory)
        • 5.3 程序记忆(Procedural Memory)
        • 5.4 总结(Summary)
      • 6 A New Cognitive Architecture for Long-term Memory
        • 6 面向长期记忆的新认知架构(A New Cognitive Architecture for Long-term Memory)
      • 7 Next Steps of AI Long-term Memory
        • 7 AI长期记忆的未来方向
        • 小结
      • 8 Conclusion
      • 8 总结(Conclusion)
        • 重点内容总结:
        • 数学公式、算法与数据:
    • 2501.00332_MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 核心内容讲解:
        • 小结:
      • 1 Introduction
        • 背景与问题
        • 解决方案:检索增强生成(RAG)
        • 问题与挑战
        • 提出的方法:MAIN-RAG
        • 主要贡献
        • 总结
      • 2 Preliminaries
        • 2.1 符号与目标(Notations and Objectives)
        • 2.2 噪声检索文档的影响(Impact of Noisy Retrieval Documents)
        • 2.3 相关工作(Related Works)
      • 3 Multi-Agent Filtering RAG (MAIN-RAG)
        • 3.1 MAIN-RAG 中 LLM 智能体的定义
        • 3.2 相关性判断的量化
        • 3.3 自适应判断阈值 τ_q
        • 总结要点
      • 4 Experiments
        • 4.1 任务与数据集
        • 4.2 基线模型
        • 4.3 实验设置
        • 4.4 定量分析(RQ1)
        • 4.5 自适应判断阈值 τ_q 的消融实验(RQ2)
        • 4.6 τ_q 的案例研究(RQ3)
        • 总结
      • 5 Conclusion and Future Work
        • 主要结论
        • 未来工作
        • 总结
      • 6 Limitations
        • 实验范围的限制
        • 环境影响的考量
        • 总结
      • Appendix A Computation Infrastructure
      • 附录A 计算基础设施
      • Appendix B Performance Comparison among MAIN-RAG and Its Variant Baselines
        • 核心结论:
        • 关键分析:
        • 图表支持:
        • 总结:
      • Appendix C System Instructions of Agent-1 (Predictor), Agent-2 (Judge), and Agent-3 (Final-Predictor)
        • Agent-1(预测器)的系统指令
        • Agent-2(评判器)的系统指令
        • Agent-3(最终预测器)的系统指令
        • 图 11:三个 Agent 的系统指令图示
        • 总结
      • Appendix D Case Studies of Different Adaptive Judge Bar τ_q in MAIN-RAG
        • 案例研究 1(高 τq)
        • 案例研究 2(低 τq)
        • 案例研究 3(中等 τq)
        • 图 12 - 15:不同数据集与 LLM 的案例对比
        • 总结
    • 2503.09149_MemVid: Memory-enhanced Retrieval Augmentation for Long Video Understanding
      • From Moonlight
        • 三行摘要
        • 关键词
        • 摘要
      • Abstract
      • 1. Introduction
      • 1. 引言(Introduction)总结
        • 贡献总结:
      • 2. Related Work
        • 2.1. 大型视觉-语言模型(Large Vision-language Models)
        • 2.2. 长视频视觉-语言模型(Long Large Vision-language Models)
        • 2.3. 基于检索增强的视频理解(Retrieval-augmented Video Understanding)
      • 3. Methodology
        • 总结
      • 4. Experiments
      • 4. 实验总结
        • 4.1. 实验设置
        • 4.2. 总体结果
        • 4.3. 消融实验
        • 4.4. 泛化性分析
        • 4.5. 效率分析
        • 4.6. 案例分析
      • 总结
      • 5. Conclusion
      • 5. 结论
    • 2505.02099_MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
        • 研究背景
        • 研究现状
        • 本文贡献
        • 项目开源
        • 重点内容
        • 总结
      • 1. Introduction
      • 1. 引言(Introduction)
        • 记忆模块的重要性
        • 现有研究的不足
        • MemEngine:一个统一且模块化的记忆库
        • 总结
      • 2. Comparison with Relevant Libraries
      • 2. 与相关库的比较
        • 已有库分类
        • MemEngine 的优势
        • 对比表格详解(Table 1)
      • 3. MemEngine Library
        • 3.1. Overview(概述)
        • 3.2. Memory Models(记忆模型)
        • 3.3. Memory Operations(记忆操作)
        • 3.4. Memory Functions(记忆功能)
        • 3.5. Memory Configurations(记忆配置)
        • 3.6. Memory Utilities(记忆工具)
      • 总结
      • 4. Usage of MemEngine
      • 4. MemEngine 的使用方式
        • 4.1 使用预实现的记忆模型
        • 4.2 定制新的记忆模型
      • 5. Conclusion
        • 5. 结论
        • 致谢
    • 2505.11271_Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models
      • Abstract
        • 重点内容强调
        • 补充信息
      • 1 Introduction
        • 1.1 现代大语言模型(LLMs)的应用与挑战
        • 1.2 链式流程中的中间输出与缓存机会
        • 1.3 现有优化方法与语义缓存
        • 1.4 语义缓存的应用场景
        • 1.5 本文的贡献
        • 1.6 实验结果与结论
        • 1.7 实际意义与价值
      • 2 Related Work
      • 2 相关工作
        • 2.1 提示缓存(Prompt Caching,基于KV的方法)
        • 2.2 语义缓存(Semantic Caching)
        • 2.3 其他缓存方法
        • 2.4 本文方法与现有方法的比较
        • 总结
      • 3 System design and Methodology
      • 3 系统设计与方法论
        • 3.1 观察与系统设计
        • 3.2 我们的语义缓存方法
      • 4 Experimental setup
      • 4 实验设计
        • 4.1 模拟设计
        • 4.2 数据集
        • 4.3 问题之间的相似性
        • 4.4 摘要
        • 4.5 评估指标
      • 5 Results and discussion
      • 5 实验结果与讨论
        • 5.1 检索方法的比较分析
        • 5.2 延迟细节
        • 5.3 不同相似度阈值与摘要长度的影响
        • 5.4 选择相似度阈值:效用与缓存命中率的权衡
        • 5.5 影响回答生成的因素
        • 5.6 对现实系统的影响
        • 5.7 挑战与限制
        • 总结
      • 6 Conclusion and future work
      • 6 结论与未来工作
        • 技术增强
        • 可扩展性与实际部署
        • 隐私问题
        • 更广泛的应用
    • 2505.13308_Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 1 Introduction
      • 1 引言(Introduction)
        • 1.1 大型语言模型(LLMs)的推理挑战
        • 1.2 现有改进方法及其局限性
        • 1.3 提出的替代方法:测试时实例级适应(TTIA)
        • 1.4 现有TTIA方法的局限性
        • 1.5 本文贡献:LatentSeek 框架
        • 1.6 实验结果与性能提升
        • 总结:
      • 2 Test-Time Instance-Level Policy Gradient in Latent Space
      • 2 测试时实例级潜在空间策略梯度
        • 2.1 问题定义:测试时实例级推理
        • 2.2 潜在空间中的策略梯度推理
        • 2.3 LatentSeek 算法
        • 总结
      • 3 Empirical Results
        • 3. Empirical Results 总结
        • 3.1 Experimental Setup(实验设置)
        • 3.2 State-of-the-art Test-time Reasoning Performance(测试时推理性能)
        • 3.3 Ideal Experiment: Perfect Sparse Reward Model(理想实验:完美稀疏奖励模型)
        • 3.4 Test-Time Scaling: scaling up the iteration of LatentSeek(测试时扩展:增加 LatentSeek 迭代次数)
        • 3.5 Algorithmic Statistics(算法统计)
        • 3.6 Qualitative Analysis(定性分析)
        • 总结
      • 4 Related Work
      • 4 相关工作(Related Work)总结
        • 一、语言模型的推理能力(Reasoning in Language Models)
        • 二、语言模型的强化学习(Reinforcement Learning for Language Models)
        • 三、可控生成与测试时优化(Controllable Generation and Test-Time Optimization)
        • 四、提示调优与软提示(Prompt Tuning and Soft Prompt)
        • 总体总结:
      • 5 Conclusion
      • 5 结论
        • 主要内容:
        • 总结:
      • Acknowledgement
      • Acknowledgement(致谢)
      • Appendix A Discussion and future works
      • A. 讨论与未来工作
        • Reward Models(奖励模型)
        • Latent Optimization(潜在空间优化)
        • Large Base Model(大基础模型)
      • Appendix B Methods of Test-Time Instance-Level Reasoning
        • 附录 B 测试时实例级推理方法
        • 总结
      • Appendix C Theoretical Analysis
      • 附录 C 理论分析总结
        • C.1 预备知识:多证明者交互证明与 NEXP
        • C.2 理论分析:独立更新
        • C.3 定理 C.10 与推论 C.11 的证明
      • 总结
      • Appendix D Derivation of Policy Gradient
      • 附录 D 策略梯度的推导
        • 1. 初始目标函数
        • 2. 对 z 求梯度
        • 3. 利用对数导数技巧
        • 4. 利用策略的分解形式
        • 5. 得到最终结果
        • 总结
      • Appendix E Additional Experimental Results
        • 附录 E:更多实验结果总结
        • 总结
      • Appendix F Experimental Details
      • 附录 F 实验细节总结
        • F.1 提示设计
        • F.2 模型主干
        • F.3 基线方法
        • F.4 GSM8K实验
          • 数据集
          • 实验细节
        • F.5 MATH-500实验
          • 数据集
          • 实验细节
        • F.6 AIME2024实验
          • 数据集
          • 实验细节
        • 评估提示模板
        • 计算量估计
      • Appendix G Detailed FLOPs Calculation
      • 附录 G:详细 FLOPs 计算总结
        • G.1 前向传播 FLOPs 估算
        • G.2 Genius 方法的总 FLOPs
        • G.3 LatentSeek 方法的总 FLOPs
        • G.4 效率阈值分析
        • 总结
      • Appendix H Qualitative Analysis and Case Studies
      • 附录 H 定性分析与案例研究(Qualitative Analysis and Case Studies)
        • 1. 生成序列的词云分析(Wordclouds of the First Three Words)
        • 2. 案例研究(Case Studies)
        • 关键发现总结
        • 总结
      • Appendix I Computational Resources
      • 附录I 计算资源
      • Appendix J The Use of Large Language Models (LLMs)
        • 附录 J:大语言模型(LLMs)的使用
    • 2506.22815_Memory as a Service (MaaS): Rethinking Contextual Memory as Service-Oriented Modules for Collaborative Agents
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 1 Introduction
      • 2 Related Works
        • 2.1 个体内部内存的持久性
        • 2.2 跨实体内存共享
        • 总结
      • 3 MaaS: A Service-Oriented Memory Perspective
        • 3.1 Core Principles: From Local State to Callable Service
        • 3.2 The MaaS Architecture: Granting Public Service Capabilities to Private Memory
        • 高层实现架构(High-Level Implementation)
      • 4 MaaS Design Space and Application Scenarios
        • 4.1 内部实体(Intra-Entity)
        • 4.2 跨实体(Inter-Entity)
        • 4.3 群体级(Group-Level)
        • 总结
      • 5 Open Research Agenda
        • 5.1 公共维度带来的挑战:治理与协议(Challenges Arising from Public-Side: Governance and Protocols)
        • 5.2 私有维度带来的挑战:安全与信任(Challenges Arising from Privacy-Side: Security and Trust)
        • 5.3 交互涌现带来的挑战:生态系统与伦理(Challenges from Interaction Emergence: Ecosystem and Ethics)
        • 总结
      • 6 Conclusion: A Timely Perspective
    • 2506.24019_Ella: Embodied Social Agents with Lifelong Memory
      • Abstract
      • 1 Introduction
      • 1 引言(Introduction)总结
        • 研究背景与动机
        • 本文的贡献与方法
        • 本文核心贡献总结(重点内容)
        • 总结
      • 2 Related Work
      • 2 相关工作
        • 2.1 具身社交智能
        • 2.2 智能体记忆
        • 图2说明(Figure 2)
      • 3 Problem Setting
      • 3 问题设定
        • 1. 智能体与社交群组
        • 2. 智能体的初始知识
        • 3. 模拟环境与交互机制
        • 4. 控制评估与干预方式
        • 总结
      • 4 Ella: Embodied Lifelong Learning Agent
        • 4.1 Name-centric Semantic Memory(名称中心语义记忆)
        • 4.2 Spatiotemporal Episodic Memory(时空情景记忆)
        • 4.3 Planning, Reaction, and Communication(规划、反应与通信)
        • 总结
      • 5 Experiments
      • 5 实验结果总结
        • 5.1 实验设置
        • 5.2 实验结果
        • 总结
      • 6 Limitations
      • 6 限制(Limitations)
        • Leverage the graph structure of the name-centric semantic memory.
        • Lifelong simulation of a community of agents in a visually rich, physics-realistic environment is computationally expensive.
        • All agents’ thinking processes are assumed to finish synchronously.
      • 7 Conclusion
      • 7 结论
      • Appendix A Broader Impact
      • Appendix A 更广泛的影响
      • Appendix B Additional Experiment Details
      • 附录 B 实验附加细节总结
        • B.1 虚拟社区 (Virtual Community)
        • B.2 计算资源 (Compute)
        • 总结要点:
      • Appendix C Additional Implementation Details
        • Appendix C Additional Implementation Details(附录C 额外的实现细节)
      • Appendix D Prompt Templates
        • Figure 8: 生成日常计划的提示模板
        • Figure 9: 生成反应的提示模板
        • Figure 10: 生成语言输出的提示模板
        • Figure 11: 生成对话总结的提示模板
        • Figure 12: 从对话中提取知识的提示模板
        • 总体说明
    • 2507.10524_Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 1 Introduction
        • 背景与动机
        • 递归 Transformer 与挑战
        • MoR:统一框架
        • 概念与意义
        • 贡献总结(Contributions)
        • 总结
      • 2 Method
        • 2.1 Preliminary
        • 2.2 Mixture-of-Recursions (MoR)
        • 总结
      • 3 Experiments
        • 表格总结(Table 3)
        • 3.1 主要结果
        • 3.2 IsoFLOP 分析
        • 3.3 推理吞吐量评估
        • 总结
      • 4 Ablation Studies
        • 4.1 Parameter Sharing Strategies
        • 4.2 Routing Strategies
        • 4.3 KV Caching Strategies
        • 总结
      • 5 Analysis
        • 5.1 Compute-optimal Scaling Analysis
        • 5.2 Routing Analysis
        • 5.3 Test-time Scaling Analysis
        • 总结
      • 6 Related Work
        • Recursive Transformers(递归Transformer)
        • Adaptive Computation(自适应计算)
        • Routing Mechanism(路由机制)
        • Key-value Caching(键值缓存)
        • Latent Reasoning(隐式推理)
      • 7 Conclusion
        • 7.1 局限性与未来工作
        • 7.2 致谢
      • Appendix A Details of Design Choices for Mixture-of-Recursions
        • A.1 参数共享策略(Parameter-sharing Strategy)
        • A.2 路由策略(Routing Strategy)
        • A.3 KV 缓存策略(KV Caching Strategy)
        • 总结
      • Appendix B Experimental Setup
        • 训练设置
        • 评估设置
        • 模型架构细节
        • 表6:模型架构参数总结(重点)
      • Appendix C Expanded Results of IsoFLOP Analysis
        • 总体比较
        • Transformer的FLOPs近似计算
        • 带检查点复用的梯形学习率调度
        • 结果概览
      • Appendix D Details of Experimental Settings for Throughput Measurement
        • 实验系统与评估方法
        • 模型吞吐量对比设置
        • 批处理设置
        • 实现细节
      • Appendix E Expanded Results of Parameter Sharing Strategy
        • Middle-Cycle 是最稳定的选择
        • 持续预训练(up-training)下的表现
      • Appendix F Expanded Results of Design Choices for Router
        • F.1 设计配置细节
        • F.2 路由器性能评估指标
        • F.3 路由器设计的扩展评估结果
      • Appendix G Expanded Results of KV Cache Sharing Mechanism
        • G.1 递归 Transformer 中的关键值表示趋势
        • G.2 KV 缓存共享策略的性能比较
        • 总结
      • Appendix H Expanded Qualitative Results
        • H.1 Analysis on Adaptive Computation Paths
        • H.2 Analysis on Router Weights
        • 总结
    • 2509.08151_Trust Semantics Distillation for Collaborator Selection via Memory-Augmented Agentic AI
      • 第一章:引言(Introduction)
      • 第二章:相关工作(Related Work)
      • 第三章:方法论(Methodology)
        • 3.1 信任语义建模(Trust Semantics Modeling)
        • 3.2 记忆增强智能体架构(Memory-Augmented Agentic Architecture)
        • 3.3 信任语义蒸馏(Trust Semantics Distillation)
      • 第四章:实验与评估(Experiments and Evaluation)
      • 第五章:讨论(Discussion)
      • 第六章:结论与未来工作(Conclusion and Future Work)
      • Abstract
      • 摘要(Abstract)总结:
        • 核心问题:
        • 解决方案:
        • 实验结果:
        • 重点内容:
        • 非重点内容(简略):
      • I Introduction
      • I Introduction(引言)
        • 背景与动机
        • 协作伙伴选择的关键性
        • 信任评估的挑战
        • 本文贡献
      • II Agentic AI-Aided Teacher-Student Architecture for Trust Semantics Evaluation
      • II 基于智能体AI的师生架构用于信任语义评估
        • II-A LAM驱动的智能体AI用于信任语义评估
        • II-B 师生代理架构
        • 图表说明
        • 总结
      • III Task-Specific Trust Semantics Distillation
        • III 任务特定信任语义蒸馏(Task-Specific Trust Semantics Distillation)
        • 总结
      • IV Experimental Analysis
      • IV 实验分析(总结)
        • 1. 协作者评估时间(Collaborator Evaluation Time)
        • 2. 数据收集次数(Number of Data Collections)
        • 3. 协作者选择准确性(Collaborator Selection Accuracy)
        • 总结
      • V Future Directions
      • V 未来方向(Future Directions)
        • 环境变化对动态信任的影响评估与缓解(Evaluation and Mitigation of Environmental Changes on Dynamic Trust)
        • 预测性信任提炼(Predictive Trust Distillation)
        • 数据缺失场景下的信任语义提取(Trust Semantics Extraction Under Data-Missing Scenarios)
      • VI Conclusion
      • VI 结论
        • 核心内容讲解:
        • 研究优势与创新点(重点内容):
        • 总结:
    • 2510.26493_The Context of Context Engineering
      • 总结
      • From Moonlight
        • 三句摘要
        • 关键词
        • 摘要
      • Abstract
      • 1 Introduction
        • 背景与问题提出
        • 上下文工程的兴起
        • 对上下文工程的误解与历史回顾
        • 核心观点:上下文工程是“熵减”过程
        • 上下文工程的演进阶段
        • 论文贡献
        • 后续章节安排
      • 2 Theoretical Framework
        • 2.1 形式化定义(Formal Definition)
        • 2.2 阶段划分(Stage Characterization)
        • 总结
      • 3 Historical Evolution
        • 3.1 超过20年前:1.0时代
        • 3.2 20年后:2.0时代
      • 4 Context Collection and Storage
        • 设计考虑(Design Considerations)
        • 基本设计原则
        • 4.1 典型策略(Era 1.0 和 Era 2.0)
        • 4.2 人类级上下文生态系统(Era 3.0)
        • 总结
      • 5 Context Management
        • 5.1 文本上下文处理
        • 5.2 多模态上下文处理
        • 5.3 上下文组织
        • 5.4 上下文抽象
      • 6 Context Usage
        • 6.1 系统内上下文共享
        • 6.2 跨系统上下文共享
        • 6.3 上下文选择与理解
        • 6.4 主动用户需求推断
        • 6.5 终身上下文的保存与更新
        • 6.6 新兴工程实践
      • 7 Applications
        • 7.1 命令行工具(CLI)
        • 7.2 深度研究(Deep Research)
        • 7.3 脑机接口(Brain-Computer Interfaces)
      • 8 Challenges and Future Directions
        • Context Collection Remains Limited and Inefficient
        • Storage and Management of Context at Scale
        • Limited Model Understanding of Context
        • Performance Bottlenecks in Long-Context Processing
        • Selecting Relevant Context
        • Digital Presence
        • Summary
      • 9 Conclusion
    • 2511.21689_ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
      • Summary
      • From Blog
      • From Moonlight
        • Three-Sentence Summary
        • Keywords
        • Abstract
      • Abstract
        • Core Content:
      • 1 Introduction
        • Core Problem and Background
        • Limitations of Existing Methods
        • Core Idea of the Orchestration Paradigm
        • Challenges in Implementing the Orchestrator
        • Overview of the ToolOrchestra Method
        • Experimental Results and Contributions
        • Summary of Main Contributions
      • 2 Agentic Problem Formulation
        • 2.1 Task Modeling
        • 2.2 Multi-Turn Interaction Flow
        • Summary
      • 3 ToolOrchestra
        • 3.1 Unified Tool-Calling Interface
        • 3.2 End-to-End Agentic Reinforcement Learning
        • 3.3 Data Synthesis
        • Summary
      • 4 Experimental Setting
        • 4.1 Tools
        • 4.2 Baselines
        • 4.3 Evaluation Configuration
        • 4.4 Training Configuration
        • Table 1: Orchestrator-8B vs. Baseline Models
        • Summary
      • 5 Experimental Results
        • 1. Baseline Methods Underperform
        • 2. Combining Tools and Models Improves Performance
        • 3. Orchestrator-8B Stands Out
        • 4. Key Advantages
        • 5. Conclusion
      • 6 Analysis
        • 6.1 Tool Use Analysis
        • 6.2 Cost Analysis
        • 6.3 Generalization
        • 6.4 User Preferences
        • Summary
      • 7 Related Work
        • 7.1 From Tool Learning to Generalist Agents
        • 7.2 From Tool-Use Accuracy to Efficiency and Controllability
        • Summary
      • 8 Conclusion
        • Main Content Summary:
        • 1. Method Overview
        • 2. Core Contributions
        • 3. Experimental Results
        • 4. Future Outlook
        • Key Points:
        • Math/Algorithm Notes:
        • Summary:
      • Appendix A Pilot Study
        • Experimental Setup
        • Experimental Results
        • Key Conclusions
        • Highlights
      • Appendix B Evaluation Benchmarks
        • 1. Humanity’s Last Exam (HLE)
        • 2. FRAMES
        • 3. τ²-Bench
      • Appendix C Model description for Qwen3-32B
        • Mathematical and Quantitative Reasoning
        • Scientific Domain Knowledge
        • Logical Reasoning
        • Humanities Knowledge
        • Coding and Function Calling
        • Overall Assessment
      • Appendix D Tools in training
        • Query Writer
        • Web Search
        • Local Search
        • Code Writer + Interpreter
        • Math Models
        • Generalist Models
        • Summary
      • Appendix E Third-party API
        • Summary Notes:
      • Appendix F Humane preference example
        • Content Summary:
        • Tools
        • Preference Instruction (P_I)
        • Preference Vector (P)
        • Summary
      • Appendix G Use of LLMs Disclosure
      • Appendix H Generalization of pricing configurations
        • Experimental Setup and Method
        • Experimental Results
        • Conclusion
      • Appendix I Data Synthesis
      • Appendix J Breakdown of ToolScale
        • Key Analysis:
        • Summary:
      • Appendix K Data synthesis prompts and examples
        • Table 6: Prompt for Generating Domain Topics
        • Table 7: Prompt for Generating Database Schemas
        • Table 8: Prompt for Generating Database Entries
        • Table 9: Prompt for Validating Database Entries
        • Table 10: Prompt for Generating Functions
        • Table 11: Prompt for Generating Intents
        • Table 12: Prompt for Generating Tasks
        • Table 13: Prompt for Evolving Tasks
        • Table 14: Example Database Schema
      • Appendix L Calculation of rewards for preference-aware benchmark
        • Reward Calculation Method
        • Table Analysis
        • Summary
  • Other
    • Datasets & Dataset Distillation
      • 1811.10959v3_Dataset Distillation
        • ABSTRACT
        • LLM Summary
        • 1. INTRODUCTION
        • 3. APPROACH
      • 2502.20653_Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 7. Conclusion
      • General
        • Dataset distillation
    • 3D
      • 2003.08934_NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Neural Radiance Field Scene Representation
        • 4. Volume Rendering with Radiance Fields
        • 5. Optimizing a Neural Radiance Field
        • 6. Result
        • 7. Conclusion
      • 2203.08586: Deep vanishing point detection: Geometric priors make dataset variations vanish
        • Concepts
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Geometric priors for VP detection
        • 4. Experiments
        • 5. Conclusion and limitations
      • 2312.14132_DUSt3R: Geometric 3D Vision Made Easy
        • Keywords
        • Related Concepts
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Method
        • 4. Experiments with DUSt3R
        • 5. Conclusion
        • Appendix A Overview
        • Appendix B. Qualitative results
        • Appendix C. Extended Related Work
        • Appendix D. Multi-view Pose Estimation
        • Appendix E. Visual Localization
        • Appendix F. Training details
      • 2406.09756_MASt3R: Grounding Image Matching in 3D with MASt3R
        • Preface
        • Abstract
        • 1. Introduction
        • 🧠 Mind-Map Summary
        • 2. Related works
        • 🧠 Summary Mind Map
        • 3. Method
        • 4. Experimental results
        • 5. Conclusion
        • Appendix
        • Appendix A Additional Qualitative Results
        • B. Fast Reciprocal Matching
        • C. Coarse-to-Fine
        • D. Detailed experimental settings
      • 2412.09401_SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos
        • Terminology
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Method
        • 4. Experiments
        • 5. Conclusion
        • 6. Acknowledgements
        • Appendix
        • Appendix A Implementation details
        • Appendix B Details for experimental settings
        • Appendix C Additional comparisons and analyses
        • D. More visual results
      • 2412.12392_MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
        • GPT
        • Prior Knowledge
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Method
        • 4. Results
        • 5. Limitations and Future Work
        • 🧾 6. Conclusion
        • 🧠 One-Sentence Summary:
        • 8. Initialisation
        • 9. Runtime Breakdown
        • 10. Evaluation Setup
        • 11. EuRoC Results Summary
      • 2503.11651_VGGT: Visual Geometry Grounded Transformer
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Method
        • 4. Experiments
        • 5. Discussions
        • 6. Conclusions
        • Appendix A Formal Definitions
        • Appendix B Implementation Details
        • Appendix C Additional Experiments
        • Appendix D Qualitative Examples
        • Appendix E Related Work
    • Other
      • 2204.00598_SocraticModels: Composing Zero-Shot Multimodal Reasoning with Language
        • Summary
        • Abstract
        • 1 Introduction
        • 2 Problem Setting, Background, and Related Work
        • 3 Socratic Models
        • 4 Evaluation: Methods and Results
        • 5 Applications: Methods and Demonstrations
        • 6 Discussion
        • Acknowledgments and Disclosure of Funding
        • Appendix A Overview
        • Appendix B Unsupervised Socratic Model Selection
        • Appendix C Additional Notes on Experiments
        • Appendix D Egocentric Perception Appendix
        • Appendix E Scaling Up Socratic Video Search
        • Appendix F Additional Notes on Robot Experiments
        • Appendix G Socratic Deductive Reasoning
        • Appendix H Broader Impact: Energy and Resource Consumption
      • A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
        • The Basic Idea Behind CRC Algorithms
        • Polynomial Arithmetic
        • Binary Arithmetic with No Carries
        • A Working Example
        • Choosing A Poly
        • A Straightforward CRC Implementation
        • A Table-Driven Implementation
        • A Slightly Mangled Table-Driven Implementation
        • References
      • Distributed Representations of Sentences and Documents

2304.08485_Visual Instruction Tuning

  • https://arxiv.org/abs/2304.08485

  • GitHub: https://github.com/haotian-liu/LLaVA



© Copyright 2010-2025, 新溪-gordon.

ICP Filing No. 京ICP备16018553号
Built with Sphinx using a theme provided by Read the Docs.