新溪-gordon
V2025.07
  • General
    • General
      • How to tell whether a paper is important
    • Academic websites
      • Overall analysis
      • 1. Academic search platforms (core function: searching and discovering literature)
        • Google Scholar
        • Semantic Scholar
        • Web of Science
        • Baidu Scholar
      • 2. Resource-sharing platforms (core function: free access to paywalled literature)
        • Sci-Hub
        • Library Genesis (LibGen)
        • Unpaywall
      • 3. Paper databases (core function: storing and providing full-text literature)
        • ACL Anthology
        • ArXiv
        • CNKI (知网)
        • Wanfang Data (万方数据库)
  • Evaluation Benchmarks
    • Evaluation Benchmarks
      • 02xx.xxxxx_BLEU: a Method for Automatic Evaluation of Machine Translation
        • Summary
        • Abstract
        • Worked example
        • 1. Introduction
        • 2. The Baseline BLEU Metric
        • 3. The BLEU Evaluation
        • 4. The Human Evaluation
        • 5. BLEU vs The Human Evaluation
        • 6. Conclusion
      • 0401.xxxxx_ROUGE: A Package for Automatic Evaluation of Summaries
        • Summary
        • Abstract
        • 1. Introduction
        • 2. ROUGE-N: N-gram Co-Occurrence Statistics
        • 3. ROUGE-L: Longest Common Subsequence (see the LCS sketch after this entry)
        • 4. ROUGE-W: Weighted Longest Common Subsequence
        • 5. ROUGE-S: Skip-Bigram Co-Occurrence Statistics
        • 6. Evaluations of ROUGE
        • 7. Conclusions
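A minimal illustrative ROUGE-L sketch (my own toy code, not the official ROUGE package): recall and precision are the LCS length over reference and candidate lengths, combined into the paper's F-measure; the beta value below is an arbitrary illustrative choice.

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    """Classic O(|a|*|b|) dynamic-programming LCS length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(reference: str, candidate: str, beta: float = 1.2) -> float:
    """F_lcs = (1 + beta^2) * R * P / (R + beta^2 * P), per the ROUGE paper."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    r, p = lcs / len(ref), lcs / len(cand)
    return (1 + beta**2) * r * p / (r + beta**2 * p)

print(rouge_l("the cat sat on the mat", "the cat was on the mat"))  # ~0.833
```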
      • 1803.01937_ROUGE2.0: Updated and Improved Measures for Evaluation of Summarization Tasks
        • Abstract
        • 1. Problems with the current ROUGE measures
        • 2. ROUGE 2.0
      • 1804.08771_SacreBLEU: A Call for Clarity in Reporting BLEU Scores
        • BLEU
        • Summary (a sacrebleu usage sketch follows this entry)
        • Abstract
        • 1 Introduction
        • 2 Problem Description
        • 3 A way forward
        • 4 Summary
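A minimal usage sketch for the pip-installable sacrebleu package (`pip install sacrebleu`; exact output may vary by version). `refs` is a list of reference streams: one inner list per reference set, aligned with the hypotheses.

```python
import sacrebleu

hyps = ["the cat sat on the mat", "it is raining today"]
refs = [["the cat was sitting on the mat", "it rains today"]]  # one stream of references

bleu = sacrebleu.corpus_bleu(hyps, refs)
print(bleu.score)  # corpus-level BLEU on a 0-100 scale
```

The point of the paper is that this single call (or the `sacrebleu` CLI) pins down tokenization and reference handling, so reported BLEU scores become comparable across papers.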
      • 2306.05685_Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 MT-Bench and Chatbot Arena
        • 3 LLM as a Judge
        • 4 Agreement Evaluation
        • 5 Human Preference Benchmark and Standardized Benchmark
        • 6 Discussion
        • 7 Conclusion
        • Appendix A Prompt templates
        • Appendix B Case Study
        • Appendix C Data Collection
        • Appendix D Additional Experimental Results
        • Appendix E Training Details of Vicuna Models
        • Appendix F Exploring Vicuna as a judge
    • Datasets - Agent
      • 2312.14033_T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
        • Summary
        • Abstract
        • 1 Introduction
        • 2 T-Eval
        • 3 Experiments
        • 4 Discussion
        • 5 Related Work
        • 6 Conclusion
        • Appendix A T-Eval Benchmark Details
        • Appendix B Implementation Details
        • Appendix C Detailed Evaluation Metrics
        • Appendix D API Documentation
      • 2406.12045_τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
        • Summary
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. τ-bench: A Benchmark for Tool-Agent-User Interaction
        • 4. Benchmark Construction
        • 5. Experiments
        • 6. Discussion
      • 2506.07982_τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 τ²-bench: Evaluating Agents in a Dual-Control Environment
        • 4 Experiments
        • 5 Conclusion
        • Broader Impact
        • Appendix
        • Appendix A Telecom Domain
        • Appendix B Verifying Original τ2-bench
        • Appendix C Prompts
        • Appendix D Domain Policies
        • Appendix E User Simulator Quality
    • Datasets - QA
      • 1809.09600_HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
        • Summary
        • Abstract
        • 1 Introduction
        • 2 Data Collection
        • 3 Processing and Benchmark Settings
        • 4 Dataset Analysis
        • 5 Experiments
        • 6 Related Work
        • 7 Conclusions
        • Appendix A Data Collection Details
        • Appendix B Further Data Analysis
        • Appendix C Full Wiki Setting Details
      • 2109.07958_TruthfulQA: Measuring How Models Mimic Human Falsehoods
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 The TruthfulQA Benchmark
        • 3 Experiments
        • 4 Results
        • 5 Discussion
        • 6 Related Work
        • 7 Conclusion
        • 8 Ethics and Impact
        • Appendix A Additional examples from TruthfulQA
        • Appendix B Additional results
        • Appendix C Dataset construction
        • Appendix D Human evaluations
        • Appendix E Prompts
        • Appendix F Checking for data quality and disagreement
      • 2311.12022_GPQA: A Graduate-Level Google-Proof Q&A Benchmark
        • Summary
        • Abstract
        • 1.Introduction
        • 2.Data Collection
        • 3.Dataset Analysis
        • 4.Baseline
        • 5.Related Work
        • 6.Limitations
        • 7.Conclusion
      • 2411.04368_SimpleQA: Measuring short-form factuality in large language models
        • Abstract
        • 1.Introduction
        • 2.Data Collection and Verification
        • 4.Measuring calibration
        • Appendix B Guessing strategy and F-score
    • Datasets - Coding
      • 2107.03374_HumanEval: Evaluating Large Language Models Trained on Code
        • Summary (a pass@k sketch follows this entry)
        • Abstract
        • 1.Introduction
        • 2.Evaluation Framework
        • 3.Code Fine-Tuning
        • 4.Supervised Fine-Tuning
        • 5.Docstring Generation
        • 6.Limitations
        • 7.Broader Impacts and Hazard Analysis
        • 8.Related Work
        • 9.Conclusions
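The unbiased pass@k estimator from this paper, pass@k = 1 - C(n-c, k) / C(n, k), computed in the numerically stable product form; a small sketch:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """n: total samples per task, c: samples passing the unit tests, k: budget."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

print(pass_at_k(n=200, c=10, k=1))    # equals c/n = 0.05 when k=1
print(pass_at_k(n=200, c=10, k=100))  # approaches 1.0 as the budget grows
```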
      • 2108.07732_MBPP: Program Synthesis with Large Language Models
        • Abstract
        • 1 Introduction
        • 2 Datasets
        • 3 Model and Methods
        • 4 MBPP Synthesis Results
        • 5 Human-Model Collaboration Results
        • 6 Program Execution Results
        • 7 MathQA Results
        • 8 Related Work
        • 9 Risks and Limitations
        • 10 Conclusion
        • Appendix A Appendix
      • 2310.06770_SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 SWE-bench
        • 3 SWE-Llama: Fine-tuning CodeLlama for SWE-bench
        • 4 Experimental Setup
        • 5 Results
        • 6 Related Work
        • 7 Discussion
        • 8 Ethics Statement
        • 9 Reproducibility Statement
        • Appendix
        • Appendix A Benchmark Details
        • Appendix B Additional Details on Training SWE-Llama
        • Appendix C Additional Results
        • Appendix D Additional Experimental Details
        • Appendix E Societal Impact
        • Appendix F In-depth Analysis of SWE-Llama Generations
      • 2402.16694_HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
        • Abstract
        • 1. Introduction
        • 2. Related work
        • 3. HumanEval-XL
        • 4. Experiments
        • 5. Conclusion
        • Acknowledgments
        • Appendix A Experiment Settings
        • Appendix B Comprehensive Experiment Results
      • 2403.07974_LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 Holistic Evaluation
        • 3 Benchmark Curation
        • 4 Experiment Setup
        • 5 Results
        • 6 Related Work
        • 7 Limitations
        • 8 Conclusion
        • Appendix A Dataset
        • Appendix B UI
        • Appendix C Experimental Setup
        • Appendix D Results
        • Appendix E Qualitative Examples
      • 2407.10499_CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
        • Summary
        • Abstract
        • 1 Introduction
        • 2 Related Works
        • 3 CIBench
        • 4 Experiments
        • 5 Conclusion
        • Appendix A Dataset Details
        • Appendix B Construction Prompts and Rules
        • Appendix C Experiment Example Demo
        • Appendix D Subjective Visualization Evaluation
        • Appendix E Dataset Error Analysis
        • Appendix F Human Annotator
        • Appendix G Ethical Consideration
      • 2410.03859_SWE-bench-Multimodal: Do AI Systems Generalize to Visual Software Domains?
        • Summary
        • Abstract
        • 1 Introduction
        • 2 SWE-bench Multimodal
        • 3 Evaluating on SWE-bench M
        • 4 Results
        • 5 Related Work
        • 6 Conclusion
        • Appendix A Dataset
        • Appendix B Collection
        • Appendix C Experiments
        • Appendix D Human Validation
        • Appendix E Limitations
      • 2410.06992_SWE-Bench+: Enhanced Coding Benchmark for LLMs
        • Summary
        • Abstract
        • 1 Introduction
        • 2 Robustness Analysis of SWE-Bench
        • 3 Building SWE-Bench+
        • 4 Robustness of SWE-Bench+
        • 5 Effectiveness-aware Evaluation
        • 6 Related Work
        • 7 Conclusion
      • 2501.01257_CodeForces: Benchmarking Competition-level Code Generation of LLMs on CodeForces
        • Summary (a generic Elo update sketch follows this entry)
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 CodeForces Benchmark
        • 4 Evaluation on Existing LLMs
        • 5 Analysis Experiments
        • 6 Discussion
        • 7 Conclusion
        • 8 Ethical Statement
        • Appendix A Model Cards
        • Appendix B Decoding Hyperparameters
        • Appendix C Analysis of Our Elo Rating Calculation System
        • Appendix D Human-comparable Elo Rating
        • Appendix E Problem Demonstration
        • Appendix F Special Judge
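For reference, the standard Elo update rule; this is a generic sketch, and the paper's actual rating system may differ in details such as the K-factor or tie handling.

```python
def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """score_a is 1.0 if A wins, 0.5 for a draw, 0.0 if A loses."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))  # logistic expected score
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta

print(elo_update(1500.0, 1600.0, 1.0))  # the lower-rated winner gains ~20.5 points
```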
    • Datasets - Long Context
      • 2402.05136_LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K
        • Summary
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. LV-Eval Benchmark
        • 4. Evaluation
        • Appendix
        • Appendix C Detailed Evaluation Results
        • Appendix D Detailed Ablation Results
      • 2402.17753_LoCoMo: Evaluating Very Long-Term Conversational Memory of LLM Agents
        • Summary
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Generative Pipeline for LoCoMo
        • 4 LoCoMo Evaluation Benchmark
        • 5 Experimental Setup
        • 6 Experimental Results
        • 7 Conclusion
        • 8 Limitations
        • 9 Broader Impacts
        • Appendix Overview
        • Appendix A Generative Pipeline for LoCoMo
        • Appendix B Dataset
        • Appendix C Experimental Setup
        • Appendix D Results
      • 2404.06654_RULER: What’s the Real Context Size of Your Long-Context Language Models?
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 The Ruler Benchmark
        • 4 Experiments & Results
        • 5 Task Error Analysis
        • 6 Model Analysis
        • 7 Conclusion
        • 8 Limitations
        • Appendix A Models
        • Appendix B Task Configurations
        • Appendix C Task Correlation Analysis
        • Appendix D Prompt Templates
        • Appendix E Passkey Retrieval and Vanilla NIAH Results
        • Appendix F Additional Results
      • 2407.11963_NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context
        • Summary
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Tasks and Datasets
        • 4 Experiments
        • 4.1.5 Impact of Language: Which Model Performs Better under the Bilingual Scenario?
        • 5 Conclusion and Future Work
        • Appendix A Evaluated Models
        • Appendix B NeedleBench Prompt Examples
        • Appendix C Error Analysis Examples
    • Datasets - Math
      • 2103.03874_MATH: Measuring Mathematical Problem Solving With the MATH Dataset
      • 2110.14168_GSM8K: Training Verifiers to Solve Math Word Problems
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 Dataset
        • 3 Related Work
        • 4 Methods
        • 5 Additional Experiments
        • 6 Conclusion
        • Appendix A Dataset Details
        • Appendix B Hyperparameters
        • Appendix C Calculator Annotations
        • Appendix D Example Model Solutions
        • Appendix E Verifier Details
        • Appendix F Verifier Visualization
      • 2405.12209_MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
        • Abstract
        • 1 Introduction
        • 2 Methodology
        • 3 Experiments and Analysis
        • 4 Discussion
        • 5 Related Work
        • 6 Conclusion
        • 7 Limitations
        • 8 Ethical Considerations
        • Appendix A MathBench Statistics
        • Appendix B Detailed Experimental Results
        • Appendix C Extra Analysis
    • Datasets - Image
      • 2306.13394_MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 MME Evaluation Suite
        • 3 Experiments
        • 4 Analysis
        • 5 Conclusion
      • 2307.06281_MMBench: Is Your Multi-modal Model an All-around Player?
        • Summary
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 The construction of MMBench
        • 4 Evaluation Strategy
        • 5 Evaluation Results
        • 6 Conclusion
        • Appendix A More Details about the Data
        • Appendix B More Details on MMBench Construction
        • Appendix C More Details on LLM-based Choice Extraction
        • Appendix D Evaluation Settings and Results
      • 2307.16125_SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 SEED-Bench
        • 4 Evaluation Results
        • 5 Conclusion
      • 2311.12793_ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 ShareGPT4V Dataset
        • 4 ShareGPT4V-7B Model
        • 4.1 Model Architecture
        • 4.2 Pre-training
        • 4.3 Supervised Fine-Tuning (SFT)
        • Summary
        • 5 Experiments
        • 6 Conclusion
        • Appendix A Data Sources
        • Appendix B Caption Analysis
        • Appendix C Prompts
        • Appendix D Examples
      • 2506.18095_ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
        • Summary
        • Abstract
        • 1 Introduction
        • 2 ShareGPT-4o-Image
        • 3 Janus-4o: Fine-Tuning with ShareGPT-4o-Image
        • 4 Experiments
        • 5 Conclusion
        • Appendix A Related Work
        • Appendix B Image Generation Categories
        • Appendix C Prompts for Generation
        • Appendix D Document Pipeline
        • Appendix E Ethical Considerations and Societal Impact
    • Datasets
      • General
        • Evaluation metrics (see the sketch after this list)
        • Accuracy
        • Precision
        • Recall
        • F1 Score
        • Visualizing precision and recall
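A self-contained sketch of the four metrics for binary classification, computed from TP/FP/FN/TN counts (the example labels are made up):

```python
def classification_metrics(y_true: list[int], y_pred: list[int]):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)              # all correct / all examples
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct positives / predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # correct positives / actual positives
    f1 = (2 * precision * recall / (precision + recall)  # harmonic mean of P and R
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

print(classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
# -> (0.667, 0.667, 0.667, 0.667)
```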
      • 2009.03300_MMLU: Measuring Massive Multitask Language Understanding
        • Summary
        • Abstract
        • 1.Introduction
        • 2.Related Work
        • 3.A Multitask Test
        • 4.Experiments
        • 5.Discussion
        • 6.Conclusion
      • 2305.08322_C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
        • Summary
        • Abstract
        • 1 Introduction
        • 2 The C-Eval Evaluation Suite
        • 3 Experiment
        • 4 Related Work
        • 5 Discussion
        • Acknowledgement
        • Appendix A Author Contributions
        • Appendix B Detailed Stats of C-Eval
        • Appendix C Explanation Data Generation
        • Appendix D Evaluation Prompts
        • Appendix E Details of the models being evaluated
        • Appendix F Breakdown of Model Performance
        • Appendix G Option Bias
        • Appendix H Compute and Resources Used for Evaluation
      • 2306.09212_CMMLU: Measuring massive multitask language understanding in Chinese
        • Summary
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 CMMLU
        • 4 Experiments
        • Impact of model size on performance
        • 5 Conclusion
        • Appendix A Comparison to concurrent benchmarks
        • Appendix B CMMLU Subjects
        • Appendix C CMMLU Examples
        • Appendix D CMMLU Difficulty Distribution
        • Appendix E Emergent Ability shown in CMMLU subjects
        • Appendix F Models being Evaluated
        • Appendix G Strategies for Estimating Model Choices
        • Appendix H Regular expressions matching algorithms
        • Appendix I Correlation to other Benchmarks
        • Appendix J Breakdown of Model Performance
        • J.3 The effect of chain-of-thought prompt
      • 2307.15020_SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 SuperCLUE Benchmark
        • 4 Experiments
        • 5 Additional Analysis
        • 6 Conclusion
        • Appendix A Evaluation Process
        • Appendix B Capability Categories
      • 2311.12983_GAIA: a benchmark for General AI Assistants
        • Summary
        • Abstract
        • 1.Introduction
        • 2.Related work
        • 3.GAIA
        • 4.LLMs results on GAIA
        • 5.Discussion
        • 6.Limitations
        • Appendix A Extended related work
        • Appendix C Extended description of GAIA
        • Appendix D Extended description of our question design framework
      • 2404.07972_OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
        • Summary
        • Abstract
        • 1. Introduction
        • 2. OSWORLD Environment
        • 3. OSWORLD Benchmark
        • 4. Benchmarking LLM and VLM Agent Baselines
        • 5. Analysis
        • 6. Related Work
        • 7. Conclusion and Future Work
        • A. Details of OSWORLD Environment
        • C. Details of Baseline Methods
        • D. Examples of Qualitative Analysis
      • 2501.14249_HLE: Humanity’s Last Exam
        • Abstract
        • 1.Introduction
        • 2.Related Work
        • 3.Dataset
        • 4.Evaluation
        • 5.Discussion
  • LLM Models
    • NLP Models
      • 1810.04805_BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
        • 1 Introduction
        • 2 Related Work
        • 3 BERT
        • Appendix A Additional Details for BERT
      • 18xx_GPT1: Improving Language Understanding by Generative Pre-Training
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Framework
        • 4 Experiments
        • 5 Analysis
        • 6 Conclusion
        • Citation reception
        • Key takeaways
      • 19xx_GPT2: Language Models are Unsupervised Multitask Learners
        • The Illustrated GPT-2
        • References
      • 2012.00413_CPM: A Large-scale Generative Chinese Pre-trained Language Model
      • 2302.13971_LLaMA: Open and Efficient Foundation Language Models
      • 2307.09288_Llama 2: Open Foundation and Fine-Tuned Chat Models
      • 2309.16609_Qwen Technical Report
        • 1. Introduction
        • 2. Pretraining
        • 3. Alignment
        • 4. CODE-QWEN: SPECIALIZED MODEL FOR CODING
        • 5. MATH-QWEN: SPECIALIZED MODEL FOR MATHEMATICS REASONING
        • 6. Related Work
        • 7. Conclusion
        • A.1 MORE TRAINING DETAILS
        • A.2 EVALUATION
      • 2310.19341_Skywork: A More Open Bilingual Foundation Model
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 Methodology
        • 3 Pre-training
        • 4 Evaluation
        • 5 Discussion
        • 6 Limitation
        • 7 Conclusion
        • Appendix A Details on GPT-7B vs. LLaMA-7B Experiment
        • Appendix B Preliminary Experiments on Distributed Training
        • Appendix C More Benchmark Results
        • Appendix D Details on LM Test Sets
      • 2401.14196_DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence
      • 2404.06395_MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
        • 5. Two-Stage Pre-training Strategy
        • 6. Model
        • 7. MiniCPM Family
      • 2405.04434_DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
      • 2406.12793_ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
      • 2407.10671_Qwen2 Technical Report
        • Abstract
        • 1. Introduction
        • 2. Tokenizer & Model
        • 3. Pre-training
        • 4. Post-training
        • 5. Evaluation
        • 6. Conclusion
      • 2412.15115_Qwen2.5
        • Abstract
        • 1. Introduction
        • 2. Architecture and Tokenizer
        • 3. Pre-training
        • 4. Post-training
        • 5. Evaluation
        • 6. Conclusion
      • 2505.09388_Qwen3
        • Abstract
        • 1. Introduction
        • 2. Architecture
        • 3. Pre-training
        • 4. Post-training
        • 5. Conclusion
    • Multimodal Models
      • 2112.15093_CTR: Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study
        • Abstract
        • 1. Introduction
        • 2. Preliminaries
        • 3. Datasets
        • 4. Baselines
        • 5. An Empirical Study
        • 6. Conclusions
        • Appendix A Details of PRAB
        • Appendix C Visualization of Failure Cases.
      • 2304.08485_LLaVA: Visual Instruction Tuning
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. GPT-assisted Visual Instruction Data Generation
        • 4. Visual Instruction Tuning
        • 5. Experiments
        • 6. Conclusion
      • 2308.12966_Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
        • Methodology
        • Training
        • Evaluation
        • B. Data Format Details of Training
      • 2310.03744_LLaVA2: Improved Baselines with Visual Instruction Tuning
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Approach
        • 4. Empirical Evaluation
        • 5. Open Problems in LMMs
        • 6. Conclusion
        • A. Implementation Details
        • B. Qualitative Results
      • 2312.07533_VILA: On Pre-training for Visual Language Models
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. On Pre-training for Visual Language Models
        • 4. Experiments
        • 5. Related Work
        • 6. Conclusion
      • 2403.05525_DeepSeek-VL: Towards Real-World Vision-Language Understanding
        • Abstract
      • 2408.01800_MiniCPM-V: A GPT-4V Level MLLM on Your Phone
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Model Architecture
        • 4. Training
        • 5. End-side Deployment
        • 6. Experiments
        • 7. Conclusion
      • 2409.17146_Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
        • Abstract
        • 1. Introduction
        • 2. Architecture
        • 3. Data
        • 4. Training
        • 5. Evaluation
        • 6. Ablations
        • Appendix A: Model Details
        • Appendix B: Training Details
        • Appendix C: Evaluation Results
        • Appendix D: Result Details
        • Appendix E Ablations Details
        • Appendix F Data Details
        • Appendix G Dataset Examples
        • Appendix H Related Work
      • 2410.13848_Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Janus: A Simple, Unified and Flexible Multimodal Framework
        • 4 Experiments
        • 5 Conclusion
        • Appendix
        • Appendix A Details of Semantic Tokenizer Mentioned in Ablation Study
        • Appendix B Additional Qualitative Results
      • 2411.00774_Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
        • Abstract
        • 1. Introduction
        • 2. Model
        • 3. Experience
        • 4. Conclusion and Future Work
      • 2412.04468_NVILA: Efficient Frontier Visual Language Models
        • Abstract
        • 1. Introduction
        • 2. Approach
        • 3. Experiments
        • 4. More Capabilities
        • 5. Related Work
        • 6. Conclusion
      • 2502.13923_Qwen2.5-VL
        • Abstract
        • 1. Introduction
        • 2. Approach
        • 3. Experiments
        • 4. Conclusion
      • 2503.20215_Qwen2.5-Omni Technical Report
        • Abstract
        • 1. Introduction
        • 2. Architecture
        • 3. Pre-training
        • 4. Post-training
        • 5. Evaluation
        • 6. Conclusion
      • 2506.13642_Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Stream-Omni
        • 3.2.1 Data Construction
        • 4 Experiments
        • 5 Results and Analyses
        • 6 Conclusion
        • Limitations
        • Appendix A Construction of InstructOmni
        • Appendix B Construction of SpokenVisIT
        • Appendix C Case Study
    • LLM Audio
      • 2005.08100_Conformer: Convolution-augmented Transformer for Speech Recognition
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 Conformer Encoder
        • 3 Experiments
        • 4 Conclusion
      • 2106.07447_HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
        • Summary
        • LLM Summary
        • Abstract
        • I Introduction
        • II Method
        • III Related Work
        • IV Experimental Details
        • V Results
        • VI Conclusion
      • 2112.02418_YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
        • Key concepts
        • Abstract
        • 1. Introduction
        • 2. YourTTS Model
        • 3. Experiments
        • 4. Results and Discussion
        • 5. Zero-Shot Voice Conversion
        • 6. Speaker Adaptation
        • 7. Conclusions, limitations and future work
      • 2212.04356_whisper: Robust Speech Recognition via Large-Scale Weak Supervision
        • Abstract
        • 1. Introduction
        • 2. Approach
        • 3. Experiments
        • 4. Analysis and Ablations
        • 5. Related Work
        • 6. Limitations and Future Work
        • 7. Conclusions
        • A. Evaluation Datasets
        • B. Compared Models
        • C. Text Standardization
      • 2301.02111_Vall-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Background: Speech Quantization
        • 4. VALL-E
        • 5. Experiments
        • 6. Conclusion, Limitations, and Future Work
      • 2303.03926_VALL-E_X: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Cross-Lingual Codec Language Model
        • 4. VALL-E X Application
        • 5. Experiments
        • 6. Conclusion
        • A. Appendix
      • 2406.05370_VALL-E2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. VALL-E 2
        • 4. Experiments
        • 5. Conclusion
      • 2407.05407_CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
        • Abstract
        • 1. Introduction
        • 2. CosyVoice: A Scalable TTS model using Supervised Semantic Tokens
        • 3. Dataset
        • 4. Experimental Settings
        • 6. Conclusion
      • 2407.10759_Qwen2-Audio Technical Report
        • Abstract
        • 1. Introduction
        • 2. Methodology
        • 3. Experiments
        • 5. Conclusion
      • 2410.00037_Moshi: a speech-text foundation model for real-time dialogue
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Model
        • 4. Datasets and Training
        • 5. Evaluation
        • 6. Safety
        • 7. Conclusion
      • 2412.10117_CosyVoice2: Scalable Streaming Speech Synthesis with Large Language Models
        • Abstract
        • 1. Introduction
        • 2. CosyVoice 2
        • 3. Experimental Settings
        • 4. Experimental Results
        • 5. Conclusion
      • 2501.06282_MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
        • Abstract
        • 1.Introduction
        • 2.Related Work
        • 3.MinMo
        • 4.Experiments
        • 5.Conclusion
        • 6.Limitations
        • A. Prompts for Voice Understanding Tasks
      • 2505.02707_Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Voila: Voice-Language Foundation Models
        • 4. Experiments
        • 5. Conclusion
      • 2505.17589_CosyVoice3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
        • Abstract
        • 1.Introduction
        • 2.CosyVoice 3
        • 3.The Multilingual Data Pipeline
        • 4.Experimental Settings
        • 5.Experimental Results
        • 6.Conclusion
        • 7.Limitations
    • LLM Video
      • 2301.12597_BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
        • Abstract
        • 1 Introduction
        • 2 Related Work
        • 3 Method
        • 4 Experiment
        • 5 Limitation
        • 6 Conclusion
      • 2308.01390_OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
        • Abstract
        • 1 Introduction
        • 2 Related work
        • 3 Approach
        • 4 Results
        • 5 Discussion
        • 6 Conclusion
        • Appendix A Extended results
        • Appendix B Additional notes on filtering MMC4
        • Appendix C Synthetic data prompt
        • Appendix D Image credits
    • LLM MoE
      • 2408.15664_Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
      • 2410.07490_MoDEM: Mixture of Domain Expert Models
    • Commercial Models
      • 2303.08774_GPT-4 Technical Report
      • 2312.11805_Gemini: A Family of Highly Capable Multimodal Models
        • Abstract
        • 1. Introduction
        • 2. Model Architecture
        • 3. Training Infrastructure
        • 5. Evaluation
        • 6. Post-Training Models
        • 7. Responsible Deployment
        • 8. Discussion and Conclusion
      • 2403.05530_Gemini1.5: Unlocking multimodal understanding across millions of tokens of context
      • 2406.02430_Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
        • Abstract
        • 1 Introduction
        • 2 Method
        • 3 Experiments
        • 4 Model extensions
        • 5 Model applications, limitations, and safety
        • 6 Authors (alphabetical order)
        • 7 Acknowledgement
      • 2407.04675_Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
        • Abstract
        • 1 Introduction
        • 2 Motivation
        • 3 Methods
        • 4 Model and Evaluation
        • 5 Conclusion
        • Appendix A Appendix
      • 2503.20020_Gemini2: Gemini Robotics: Bringing AI into the Physical World
      • 2504.xxxxx_Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning
      • 2505.07062_Seed1.5-VL Technical Report
        • Abstract
        • 1 Introduction
        • 2 Architecture
        • 3 Pre-training
        • 3.2 Training Recipe
        • 4 Post-training
        • 4.4 Hybrid Reinforcement Learning
        • 5 Training Infrastructure
        • 6 Evaluation
        • 6.1.3 Video Task Evaluation
        • 6.3.2 Comparison with State-of-the-arts
        • 7 Conclusion and Next Steps
        • 8 Contributions and Acknowledgments
        • 9 Qualitative examples
        • 9.7 Visual Reasoning: Visual Pattern Recognition
        • 9.19 Failure Cases: Combinatorial Search I
        • 10 Evaluation Details
        • DREAM-1K
  • LLM Supporting Technologies
    • Framework
      • 1712.05889_Ray: A Distributed Framework for Emerging AI Applications
        • Abstract
        • 1. Introduction
        • 2. Motivation and Requirements
        • 3. Programming and Computation Model
        • 4. Architecture
        • 5. Evaluation
        • 6. Related Work
        • 7. Discussion and Experiences
        • 8. Conclusion
      • 1910.02054_DeepSpeed_ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
        • Abstract
        • 1. Extended Introduction
        • 2. Related Work
        • 3. Where Did All the Memory Go?
        • 4. ZeRO: Insights and Overview
        • 5. Deep Dive into ZeRO-DP
        • 6. Deep Dive into ZeRO-R
        • 7. Communication Analysis of ZeRO-DP
        • 8. Communication Analysis of ZeRO-R
        • 9. Step Towards 1 Trillion Parameters
        • 10. Implementation and Evaluation
        • 11. Concluding Remarks
      • PyTorch: An Imperative Style, High-Performance Deep Learning Library
      • Transformers: State-of-the-Art Natural Language Processing
      • 2210.XX_Ray v2 Architecture
        • Overview
        • Architecture Overview
        • Object Management
        • Task Management
        • Resource Management and Scheduling
        • Actor management
        • Global Control Service
        • Cluster Management
        • Appendix
      • 2309.06180_vLLM: Efficient Memory Management for Large Language Model Serving with PagedAttention
        • Summary
        • 1. Introduction
        • 2. Background
        • 3. Memory Challenges in LLM Serving
        • 4. Method
        • 5. Implementation
        • 6. Evaluation
        • 7. Ablation Studies
        • 10. Conclusion
    • Large Model Tuning
      • 2101.00190_Prefix-Tuning: Optimizing Continuous Prompts for Generation
      • 2103.10385_p-tuning: GPT Understands, Too
      • 2104.08691_Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning
      • 2106.09685_LoRA: Low-Rank Adaptation of Large Language Models
      • 2401.01335_Self-Play: Fine-Tuning Converts Weak Language Models to Strong Language Models
      • 2402.09353_DoRA: Weight-Decomposed Low-Rank Adaptation
      • 2402.12354_LoRA+: Efficient Low Rank Adaptation of Large Models
      • 2403.03507_GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
      • 2403.13372_LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
        • Competing frameworks
        • 3. Efficient Fine-Tuning Techniques
        • 4 LlamaFactory Framework
        • 6 Conclusion and Future Work
    • Distributed Models
      • 1701.06538_MoE: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
      • 1806.03377_PipeDream: Fast and Efficient Pipeline Parallel DNN Training
        • Abstract
        • 1. Introduction
        • 2. Background & Related Work
        • 3. Parallel Training in PipeDream
        • 4. Implementation
        • 5. Evaluation
      • 1811.06965_GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
        • Collection
        • 1. Introduction
        • 2. The GPipe Library
        • 3. Performance Analyses
        • 4. Image Classification
        • 5. Massively Multilingual Machine Translation
        • 6. Design Features and Trade-Offs
      • 1909.08053_Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
        • Collection
        • Abstract
        • 1. Introduction
        • 2. Background and Challenges
        • 3. Model Parallel Transformers
      • 19xx_PipeDream: Generalized Pipeline Parallelism for DNN Training
        • Collection
        • Abstract
        • 1. Introduction
        • 2. Background and Related Work
        • 3. Pipeline Parallelism
        • 4. Implementation
        • 6. Conclusion
      • 2006.09503_PipeDream-2BW: Memory-Efficient Pipeline-Parallel DNN Training
        • Abstract
      • 2006.15704_PyTorch Distributed: Experiences on Accelerating Data Parallel Training
      • 2006.16668_GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
      • 2104.04473_Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
        • Abstract
      • 2205.14135_FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. FLASHATTENTION: Algorithm, Analysis, and Extensions
        • 4. Experiments
        • 5. Limitations and Future Directions
        • Appendix A Related Work
        • Appendix B Algorithm Details
        • Appendix C Proofs
        • Appendix D Extension Details
      • 2307.08691_FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. FlashAttention-2: Algorithm, Parallelism, and Work Partitioning
        • 4. Empirical Validation
        • 5. Discussion and Future Directions
      • General
    • LLM Quantization
      • General
        • Mixed precision
        • Floating-point formats
        • Weight-only quantization (see the sketch after this list)
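A minimal sketch of the two ideas above: inspecting floating-point formats with torch.finfo, and a toy per-tensor absmax int8 weight-only quantize/dequantize round trip (illustrative only, not any specific paper's scheme):

```python
import torch

# Floating-point formats: bf16 keeps fp32's range (8 exponent bits) but with
# coarser precision; fp16 has finer precision but a much smaller range.
for dtype in (torch.float32, torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    print(dtype, "max:", info.max, "eps:", info.eps)

# Weight-only quantization: store int8 weights plus one float scale,
# and dequantize back to float at matmul time.
w = torch.randn(4, 8)                       # toy weight matrix
scale = w.abs().max() / 127.0               # per-tensor absmax scale
w_int8 = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
w_deq = w_int8.float() * scale              # dequantized weights
print("max abs error:", (w - w_deq).abs().max().item())
```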
      • 2110.02861_bitsandbytes: 8-bit Optimizers via Block-wise Quantization
        • Abstract
        • 1. Background
        • 2. 8-bit Optimizers
        • 3. 8-bit vs 32-bit Optimizer Performance for common Benchmarks
        • 4. Analysis
        • 5. Related Work
      • 2206.01861_ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Background and Challenges
        • 4. Methodology
        • 5. Results
        • 6. Conclusions
        • Appendix A Background
        • Appendix D Details about System Optimization
      • 2206.09557_LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. Design Methodology of LUT-GEMM
        • 4. Experimental results
        • 5. Accelerating Quantized OPT-175B
        • 6. Conclusion
        • Appendix A LLM Inference Latency Breakdown
        • Appendix B Detailed Implementation
      • 2208.07339_LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
        • Related references
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. Int8 Matrix Multiplication at Scale
        • 4. Emergent Large Magnitude Features in Transformers at Scale
        • 5. Related Work
        • 6. Discussion and Limitations
        • 7. Broader Impacts
        • Other
      • 2209.05433_FP8: FP8 Formats For Deep Learning
        • Abstract
        • 1. Introduction
        • 2. Aspects of FP8 Usage in Deep Learning
        • 3. FP8 Binary Interchange Format
        • Worked example
        • 4. Empirical Results
        • 5. Conclusions
      • 2210.17323_GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Background
        • 4. The GPTQ Algorithm
        • 5. Experimental Validation
        • 6. Summary and Limitations
      • 2211.10438_SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
        • Abstract
        • 1. Introduction
        • 2. Preliminaries
        • 3. Review of Quantization Difficulty
        • 4. SmoothQuant
        • 5. Experiments
        • 6. Related Work
        • 7. Conclusion
        • Appendix A. Discussion on Weight-Only Quantization
      • 2305.14314_QLoRA: Efficient Finetuning of Quantized LLMs
        • Keywords
        • Abstract
        • 1. Introduction
        • 2. Background
        • 3. QLoRA Finetuning
        • 4. QLoRA vs. Standard Finetuning
        • 5. Pushing the Chatbot State-of-the-art with QLoRA
        • 6. Qualitative Analysis
        • 7. Related Work
        • 8. Limitations and Discussion
      • 2306.00978_AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. AWQ: Activation-aware Weight Quantization
        • 4. TinyChat: Mapping AWQ onto Edge Platforms
        • 5. Experiments
        • 6. Conclusion
      • 2309.05516_AutoRound: Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methodology
        • 4. Experiments
        • 5. Conclusion
    • LLM Safety
      • 2312.06674_Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
    • LLM Reinforcement Learning
      • 1703.03864_Evolution Strategies: as a Scalable Alternative to Reinforcement Learning
      • 2504.02495_DeepSeek-GRM: Inference-Time Scaling for Generalist Reward Modeling
        • Abstract
        • 1. Introduction
        • 2. Preliminaries
        • 3. Self-Principled Critique Tuning (SPCT)
        • 4. Inference-Time Scaling with SPCT
        • 5. Results on Reward Modeling Benchmarks
        • 6. Related Work
        • 7. Conclusion and Future Work
        • A. Additional Related Work
        • B. Limitations and Future Directions
        • G. Prompt Templates
      • 2504.13958_ToolRL: Reward is All Tool Learning Needs
    • Other
      • 2203.02155_Training language models to follow instructions with human feedback(InstructGPT)
        • Abstract
        • 1. Introduction
        • 2. Related work
        • 3. Methods and experimental details
        • 4. Results
        • 5. Discussion
        • Appendix A Additional prompt data details
        • Appendix B Additional human data collection details
        • Appendix C Additional model details
        • Appendix D Automatic evaluation details
      • 2305.20050_Let’s Verify Step by Step
        • 1. Research background
        • 2. Comparison of supervision methods
        • 3. Key findings
        • Summary
      • 2408.03314_Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
        • 1. Introduction
        • 3. How to Scale Test-Time Computation Optimally
        • 5. Scaling Test-Time Compute via Verifiers
        • 6. Refining the Proposal Distribution
        • Other
      • 2412.14135_Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
        • FromGPT
        • 1. Introduction
        • 2. Background
        • 3. Policy Initialization
        • 4. Reward Design
        • 5. Search
        • 6. Learning
        • 7. Open-source o1 Project
        • 8. Future Directions
  • Machine Learning
    • ML Vision
      • 1506.02640_You Only Look Once: Unified, Real-Time Object Detection
        • Abstract
      • 1612.08242_YOLO9000: Better, Faster, Stronger
        • Abstract
      • 1804.02767_YOLOv3
      • 2004.10934_YOLOv4: Optimal Speed and Accuracy of Object Detection
        • Abstract
      • 2205.00159_SVTR: Scene Text Recognition with a Single Visual Model
        • Abstract
        • 1. Introduction
        • 2. Method
        • 3. Experiments
        • 4. Conclusion
      • 2207.02696_YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
        • Abstract
      • Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
      • 2304.08485_Visual Instruction Tuning
      • 2402.13616_YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
        • Abstract
      • 2405.14458_YOLOv10: Real-Time End-to-End Object Detection
        • Abstract
      • 2411.15858_SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
        • Definitions
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Methods
        • 4 Experiments
        • 5. Conclusion
        • 8. More details of real-world datasets
    • ML
      • 2112.09332_WebGPT: Browser-assisted question-answering with human feedback
      • 2203.11147_GopherCite: Teaching language models to support answers with verified quotes
      • 2304.09848_Generative_Search: Evaluating Verifiability in Generative Search Engines
      • 2305.14251_FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
      • 2305.14627_ALCE: Enabling Large Language Models to Generate Text with Citations
        • Applying NLI to citation quality evaluation
        • Prompts used in the paper
      • 2307.02185_Citation: A Key to Building Responsible and Accountable Large Language Models
      • 2307.16883_HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
  • AI Agent
    • General Agents
      • 2210.03629_ReAct
      • 2303.08268_Chat-with-the-Environment
        • Main text
      • 2303.11366_Reflexion: Language Agents with Verbal Reinforcement Learning
      • 2303.16434_TaskMatrix.AI
        • Brain
        • Interface platform
        • API selector
      • 2304.03442_Generative-Agents
        • Generative Agent Architecture
      • 2307.07924_ChatDev: Communicative Agents for Software Development
      • 2308.00352_MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
      • 2308.04026_AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
      • 2308.08155_AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
      • 2308.10848_AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
        • Philosophy
      • 2310.06117_Step-Back: Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
      • 2402.18679_MetaGPT_DI: Data Interpreter: An LLM Agent For Data Science
        • Introduction
      • 2407.07061_IoA: Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
        • 2.1 OVERVIEW OF IOA
        • 2.2 ARCHITECTURE OF IOA
        • 2.3 KEY MECHANISMS
        • 2.5 Putting It All Together
      • 2408.08435_ADAS: Automated Design of Agentic Systems
        • Prompt
      • 2410.10762_AFlow: Automating Agentic Workflow Generation
        • Introduction
        • Preliminary
      • 2410.17238_SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning
        • 1 Introduction
        • 2 Related Works
        • 3 Method
      • 2410.21012_FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval
        • Introduction
      • 2504.01990_Advances and Challenges in Foundation Agents
      • 2506.12508_AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving
        • Abstract
        • 1.Introduction
        • 3.AgentOrchestra
        • 4.Experiments
    • Vision Agents & AIOS
      • 2108.03353_Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Dataset Creation
        • 4. Model Design
        • Other
      • 2209.08199_ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Problem Setting: Tasks and Metrics
        • 4. Data Annotation
        • 5. Dataset Analysis
        • 6. Experiments and Baselines
        • 7. Conclusion
        • 8. Limitations
        • 9. Ethical Considerations
        • A. Data Annotation Details
        • B. Data Examples
      • 2212.06817_RT-1: ROBOTICS TRANSFORMER FOR REAL-WORLD CONTROL AT SCALE
        • ABSTRACT
        • 1. Introduction
        • 2. Related Work
        • 3. Preliminaries
        • 4. System Overview
        • 5. RT-1: ROBOTICS TRANSFORMER
        • 6. EXPERIMENTS
        • 7. CONCLUSIONS, LIMITATIONS AND FUTURE WORK
        • B. MODEL CARD
        • C. MODEL AND DATA
        • D. EXPERIMENTS
      • 2312.13771_AppAgent: Multimodal Agents as Smartphone Users
        • 3.1 Environment and Action Space
        • 3.2 Exploration Phase
        • 3.3 Deployment Phase
      • 2401.10935_SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
        • Abstract
        • 1. Introduction
        • 2. Related work
        • 3. Approach
        • 4. ScreenSpot: A Grounding Benchmark
        • 5. Experiments
        • 6. Conclusion
        • Limitations
        • Ethical considerations
        • A. Details of SeeClick Pre-training
        • B ScreenSpot Annotation & Evaluation
        • C. Downstream Agent Tasks
      • 2402.04615_ScreenAI: A Vision-Language Model for UI and Infographics Understanding
        • Abstract
        • 1. Introduction
        • 2. Methodology
        • 3. Automatic data generation
        • 4. Data Mixtures
        • 5. Experiments and Results
        • 6. Conclusions
        • A. Definitions of Metrics
        • B. Screen Schema Examples
        • C. Prompts For LLM Generated Content
        • D. Screen Navigation Generated Examples
        • F. ScreenQA Short Answers Generation
        • G. Complex Question Answering Datasets
        • H. New Benchmarks Repositories
      • 2402.07939_UFO: A UI-Focused Agent for Windows OS Interaction
        • Abstract
        • 1.Introduction
        • 2.Related Work
        • 3.The Design of UFO
        • 4.Experiment
        • 5.Limitations & Lessons Learned
        • 6.Conclusion
      • 2403.16971_AIOS: LLM Agent Operating System
        • Abstract
        • 1. Introduction
        • 2. The Architecture of AIOS
        • 3. AIOS Kernel
        • 4 Evaluation
        • Appendix E Discussion
      • 2406.01014_Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
      • 2411.02059_TableGPT2: A Large Multimodal Model with Tabular Data Integration
        • Abstract
      • 2501.11733_Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
        • Abstract
        • 1. Introduction
        • 2. Mobile-Agent-E
        • 3. Experiments
        • 4. Results
        • 5. Related Work
        • 6. Conclusion and Future Work
        • Appendix A Full Trajectory Comparison Example with Previous SOTA
        • Appendix B Error Recovery with Escalation to Manager
        • Appendix C Remaining Limitations
        • Appendix D All Tasks in Mobile-Eval-E Benchmark
        • Appendix E Atomic Operation Space
        • Appendix F Full list of Self-Evolved Shortcuts
        • Appendix G Full list of Self-Evolved Tips
      • 2501.12326_UI-TARS: Pioneering Automated GUI Interaction with Native Agents
        • Abstract
        • 1. Introduction
        • 2. Evolution Path of GUI Agents
        • 3. Core Capabilities of Native Agent Model
        • 4. UI-TARS
        • 5. Experiment
        • 6. Conclusion
      • 2502.14282_PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
        • Abstract
        • 1. Introduction
        • 2. PC-Agent
        • 3. Experiments
        • 4. Related Work
        • 5. Conclusion
      • 2504.14603_UFO2: The Desktop AgentOS
        • Abstract
        • 1.Introduction
        • 2.Background
        • 3.System Design of UFO2
        • 4.Picture-in-Picture Interface
        • 5.Implementation and Specialized Engineering Design
        • 6.Evaluation
        • 7.Discussion & Future Work
        • 8.Related Work
        • 9.Conclusion
    • Memory
      • 2505.22101_MemOS: An Operating System for Memory-Augmented Generation (MAG) in LLM (Short Version)
        • Summary
        • Abstract
        • 1 Introduction
        • 2 Memory in Large Language Models
        • 3 MemOS Design Philosophy
        • 4 MemOS
        • 4.1 Memory types in MemOS
        • 4.2 MemCube: the core resource
        • 4.3 The MemOS architecture
        • 4.4 System execution flow
        • Summary
        • 5 Conclusion
    • Tools
      • 2205.00445_MRKL
      • 2302.04761_Toolformer: Language Models Can Teach Themselves to Use Tools
      • 2303.17580_HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
      • 2307.16789_ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
        • Summary
        • LLM Summary
        • Abstract
        • 1 Introduction
        • 2 Dataset Construction
        • 3 Experiments
        • 4 Related Work
        • 5 Conclusion
        • Appendix
        • Appendix A Implementation Details
    • AGI
      • 1905.10985_AI-GA: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence
      • 2408.06292_The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
  • RAG
    • 2005.11401_Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
    • 2312.10997_Retrieval-Augmented Generation for Large Language Models: A Survey
      • II. Overview of RAG
        • II-A Naive RAG
        • II-B Advanced RAG
        • II-C Modular RAG
        • II-D RAG vs Fine-tuning
      • III. Retrieval
        • III-A Retrieval Source
        • III-B Indexing Optimization
        • III-C Query Optimization
        • III-D Embedding
        • III-E Adapter
      • IV. Generation
        • IV-A Context Curation
        • IV-B LLM Fine-tuning
      • V. Augmentation process in RAG
        • V-A Iterative Retrieval
        • V-B Recursive Retrieval
        • V-C Adaptive Retrieval
      • VI. Task and Evaluation
        • VI-A Downstream Task
        • VI-B Evaluation Target
        • VI-C Evaluation Aspects
        • VI-D Evaluation Benchmarks and Tools
      • VII. Discussion and Future Prospects
        • VII-A RAG vs Long Context
        • VII-B RAG Robustness
        • VII-C Hybrid Approaches
        • VII-D Scaling laws of RAG
        • VII-E Production-Ready RAG
        • VII-F Multi-modal RAG
    • 2401.15884_CRAG: Corrective Retrieval Augmented Generation
    • 2403.14403_Adaptive-RAG
    • 2404.16130_GraphRAG: From Local to Global: A GraphRAG Approach to Query-Focused Summarization
      • Summary
      • LLM Summary
      • Abstract
      • 1 Introduction
      • 2 Background
        • 2.1 RAG approaches and systems
        • 2.2 Knowledge graphs in LLMs and RAG
        • 2.3 Adaptive benchmarking
        • 2.4 RAG evaluation criteria
      • 3 Methods
        • 3.1 The GraphRAG workflow
        • 3.2 Global sensemaking question generation
        • 3.3 Global sensemaking evaluation criteria
        • Summary
      • 4 Analysis
        • 4.1 Experiment 1
        • 4.2 Experiment 2
        • Summary
      • 5 Results
        • 5.1 Experiment 1: comparing methods on the summarization task
        • 5.2 Experiment 2: claim-based metric evaluation
        • Summary
      • 6 Discussion
        • 6.1 Limitations of the evaluation approach
        • 6.2 Future work
        • Broader impacts
      • 7 Conclusion
      • Appendix A Entity and Relationship Extraction Approach
        • 1. Entity and relationship extraction approach
        • 2. Self-reflection
        • 3. Relationship between chunk size and extraction quality
        • 4. Experimental results (Figure 3)
        • Summary
      • Appendix B Example Community Detection
      • Appendix C Context Window Selection
      • Appendix D Example Answer Comparison
      • Appendix E System Prompts
        • E.1 Element instance generation
        • E.2 Community summary generation
        • E.3 Community answer generation
        • E.4 Global answer generation
      • Appendix F Evaluation Prompts
        • F.1 Relative Assessment Prompt
        • F.2 Relative Assessment Metrics
      • Appendix G Statistical Analysis
        • Statistical methods
        • Summary of main results
        • Overall trends
        • Key conclusions
    • 2405.16506_GRAG: Graph Retrieval-Augmented Generation
    • 2406.13213_Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata
    • 2410.05779_LightRAG: Simple and Fast Retrieval-Augmented Generation
      • Summary
      • Abstract
      • 1 Introduction
      • 2 Retrieval-Augmented Generation
      • 3 The LightRAG Architecture
        • 1. Overview of the LightRAG architecture
        • 2. Graph-based text indexing
        • 3. Dual-level retrieval paradigm
        • 4. Retrieval-augmented answer generation
        • 5. Complexity analysis
        • Summary
      • 4 Evaluation
        • 1. Experimental settings (4.1)
        • 2. Comparison of LightRAG with existing RAG methods (4.2, RQ1)
        • 3. Ablation studies (4.3, RQ2)
        • Summary
        • 4.4 Case Study (RQ3)
        • Summary of the case study (4.4, RQ3)
        • Summary of model cost and adaptability analysis (4.5, RQ4)
        • Overall conclusions
      • 5 Related Work
        • 第5章 相关工作(总结)
      • 6 Conclusion
      • 7 Appendix
    • 2410.10450_KBLaM: Knowledge Base augmented Language Model
      • Abstract
      • 1. Introduction
      • 2. Related work
      • 3. Background
        • Self-attention layer
      • 4. Augmenting LLM with the KB
        • Knowledge tokens
        • Rectangular Attention: Injecting knowledge tokens into prompt tokens
        • KB length generalization through attention score scaling
      • 5. KB instruction tuning
      • 6. EXPERIMENTS
        • 6.1 EXPERIMENT SETTING
        • 6.2 EXPERIMENT RESULTS
        • 总结亮点
      • 7. CONCLUSION
      • 8. LIMITATIONS AND FUTURE WORK
      • Appendix A Extended related work
      • Appendix B Ablation study
      • Appendix C Sample KB
      • SAMPLE Q&A
      • PROMPT
        • PROMPT FOR SYNTHETIC KB GENERATION
        • PROMPT FOR OPEN-ENDED Q&A GENERATION
        • PROMPT FOR GPT EVALUATION OF OPEN-ENDED Q&A
        • PROMPT FOR LLAMA EVALUATION
        • QUESTION TEMPLATE
      • SAMPLE OUTPUT
        • SYNTHETIC KB
        • ENRON
    • 2504.03137_LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph
      • Abstract
      • Introduction
      • Related Work
        • LLM Prompt Engineering
        • KG-based LLM Reasoning
      • Preliminaries
        • 1. Knowledge Graph (KG)
        • 2. Anchor Entities
        • 3. Relation Link
        • 4. Reasoning Path
      • Methodology
        • Stage 1: Reasoning Graph Retrieval
        • Stage 2: Knowledge Embedding
        • Stage 3: Knowledge Prompts Mixed Reasoning
      • Experiments
      • Conclusion
    • GraphRAG 官方文档
      • Indexing
        • Indexing Architecture
        • Indexing Dataflow
        • Prompt Tuning
      • Query
  • 论文池
    • 2305.16300_Random-Access Infinite Context Length for Transformers
      • LLM 总结
        • 研究背景与动机
        • 核心问题
        • 主要贡献
        • 关键技术点
        • 实验结果
        • 意义与应用前景
        • 总结
      • Abstract
      • 1 Introduction
      • 2 Related Work
      • 3 Methodology
        • 总体思路
        • 方法详解
        • 位置编码处理
        • 与其他方法的对比
        • 总结
        • 3.3 Memory & Computation
      • 4 Experiments
        • 4.1 语言建模实验
        • 4.2 微调预训练模型
        • 总结
      • 5 Future Work
      • 6 Conclusion
      • Acknowledgment
      • Appendix A Grouped Softmax Example
      • Appendix B Dataset Description
      • Appendix C Number of Unique Retrieved Blocks
      • Appendix D Context Miss Token
      • Appendix E Positional Augmentation
      • Appendix F Additional Extensions and Details
        • 1. 掩码语言建模(Masked Language Modeling)
        • 2. 与 Flash Attention 的结合
        • 3. 检索块数量与块大小的权衡
        • 总结
      • Appendix G Offloading KV Cache to CPU
    • 2311.18743_AlignBench: Benchmarking Chinese Alignment of Large Language Models
      • 主要内容总结
      • 总结
      • Abstract
      • 1 Introduction
        • 1. 背景与挑战
        • 2. AlignBench的设计目标
        • 3. AlignBench的主要特点
        • 4. AlignBench的应用与成果
        • 5. 总体贡献
        • 6. 表格对比
      • 2 Dataset
      • 3 Methods
      • 4 Human Evaluation on AlignBench
        • 一、一致性评估(Agreement Evaluation)
        • 二、解释质量评估(Quality Evaluation)
        • 总结
      • 5 AlignBench: Benchmarking Results
      • 6 Related Work
      • 7 Conclusion
      • Appendix A
        • A.2 Prompts and Details of Methods
        • A.2 提示模板与方法细节
        • A.3 各维度表现
        • A.4 案例分析
        • 总结
        • 一、核心问题:参考材料缺失导致评估困难
        • 二、数学积分问题对比分析
        • 三、总结
    • 2401.15391_MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
      • 背景与动机
      • 贡献
      • 方法概览
      • 实验结果
      • 总结
      • Abstract
      • 1 Introduction
        • 主要内容总结
        • 总结
      • 2 RAG with Multi-Hop Queries
        • 2.1 RAG(检索增强生成)概述
        • 2.2 多跳查询(Multi-Hop Queries)
        • 2.3 评估指标
        • 总结
      • 3 A Benchmarking Dataset: MultiHop-RAG
        • 一、MultiHop-RAG 数据集构建流程
        • 二、MultiHop-RAG 数据集统计信息
        • 总结
      • 4 Benchmarking RAG system using MultiHop-RAG
        • 一、检索相关任务(Retrieval-related Task)
        • 二、生成相关任务(Generation-related Task)
        • 三、其他潜在改进方向(Other Use Cases)
        • 总结
      • 5 Related Work
      • 6 Conclusion
      • Limitations
      • Appendix A: GPT-4 Prompts Used for Data Generation
      • Appendix B: Dataset Examples
    • 2405.16506_GRAG: Graph Retrieval-Augmented Generation
      • Abstract
      • 1 Introduction
      • 2 Related Work
        • 2.1 Prompt Tuning
        • 2.2 LLMs在图相关任务中的应用
        • 2.3 图上的检索方法
      • 3 Problem Formalization
      • 4 Methodology
        • 概述
        • 4.1 文本子图检索
        • 文本子图索引(Indexing)
        • 文本子图排序(Ranking)
        • 文本子图软剪枝(Soft Pruning)
        • 总结
        • 4.2 Textual Graph Augmented Generation
        • 1. 文本视图(Text View of Textual Graphs)
        • 2. 图视图(Graph View of Textual Graphs)
        • 3. 生成阶段(Generation Phase)
        • 总结
      • 5 Experiments
        • 总结:第五章 实验部分
      • 6 Conclusion
      • 7 Limitations
      • Acknowledgments
      • Appendix A
        • 附录A 总结
        • 总结
    • 2407.01178_Memory3: Language Modeling with Explicit Memory
      • Language Modeling with Explicit Memory
        • 研究背景与动机:
        • 主要内容与方法:
        • 实验与结果:
        • 总结:
      • Abstract
        • 核心思想
        • Memory3 模型特点
        • 实验结果
        • 总结
      • 1 | Introduction
      • 1.1.1 | Retrieval-augmented Training
        • 1.1.1 | 基于检索的训练(Retrieval-augmented Training)
        • 1.1.2 | 稀疏计算(Sparse Computation)
        • 1.1.3 | 参数即记忆(Parameter as Memory)
        • 总结
      • 2 | Memory Circuitry Theory
        • 核心概念总结:
        • 总体贡献:
      • Definition 2.
        • 1. 定义与核心概念:计算图、同构与知识(电路)
        • 2. 知识的实例
        • 3. 知识的外部化与记忆
        • 4. 结论与断言
        • 总结
      • Remark 1.
        • 1. 电路构造的关键性质
        • 2. 记忆增强 LLM 的形式化定义
        • 3. 写入代价与读取代价的权衡(记忆层次结构)
        • 4. 知识使用频率与记忆分配
        • 5. 图示与结论
        • 小结
      • 3 | Design
        • 3 | Design
        • 3.1 | 推理过程
        • 3.2 | 写入与读取记忆
        • 总结
      • 3.3 | Memory Sparsification and Storage
        • 一、显式记忆的存储挑战
        • 二、各维度的稀疏化策略
        • 三、压缩效果
        • 四、部署方式
        • 五、补充说明与建议
        • 总结
      • 3.4 | Model Shape
        • 3.4 | 模型结构(Model Shape)
        • 3.5 | 训练设计(Training Designs)
        • 总结
      • 3.6 | Two-stage Pretrain
        • 一、预训练的两个阶段
        • 二、对 continual train 的优化
        • 三、防止信息泄露
        • 总结
      • 4 | Pretraining Data
        • 4.1 数据收集(Data Collection)
        • 4.2 数据过滤(Filtering)
        • 4.3 分词器(Tokenizer)
        • 4.4 知识库(Knowledge Base)
        • 总结
      • 5 | Pretrain
        • 1. 预训练总体设计
        • 2. 训练设置(Set-up)
        • 3. 预热阶段(Warmup Stage)
        • 4. 持续训练阶段(Continual Train Stage)
        • 总结
      • 6 | Fine-tuning and Alignment
        • 6.1 监督微调(Supervised Finetuning, SFT)
        • 6.2 直接偏好优化(Direct Preference Optimization, DPO)
      • 7 | Evaluation
        • 7.1 通用能力评估
        • 7.2 对话能力评估
        • 7.3 幻觉与事实性评估
        • 7.4 专业任务评估
        • 总结
      • 7.5 | Inference Speed
        • 主要内容总结
        • 总结
      • 8 | Conclusion
      • Acknowledgement
      • Appendix A Cost Estimation
        • 模型参数设定
        • 隐式记忆(Implicit Memory)成本
        • 显式记忆(Explicit Memory)成本
        • 外部信息(External Information,如 RAG)成本
        • 综合比较
        • 拓展讨论
      • Appendix B Vector Compression
      • Appendix C Supplementary Evaluation Results
    • 2505.14683_Emerging Properties in Unified Multimodal Pretraining
      • LLM 总结
      • Abstract
      • 1 Introduction
        • 核心内容总结:
        • 总结:
      • 2 Model
        • 1. 模型架构概览
        • 2. 生成策略
        • 3. 模型细节
        • 4. 广义因果注意力(Generalized Causal Attention)
        • 5. Transformer结构选择与实验
        • 总结
      • 3 Data
        • 数据特点与目标
        • 数据来源与统计
        • 数据构建方法
        • 数据训练策略
        • 总结
      • 4 Training
        • 1. 多阶段训练策略
        • 2. 关键超参数调整
        • 总结
      • 5 Evaluation
      • 6 Emerging Properties
        • 1. 新兴属性的定义与研究背景
        • 2. 任务表现与训练阶段的关系
        • 3. 多模态特征的重要性
        • 4. 定性分析与生成质量提升
        • 5. 核心发现与结论
        • 总结
      • 7 Main Results
        • 7.1 图像理解
        • 7.2 图像生成
        • 7.3 图像编辑
        • 7.4 带有推理的生成/编辑
        • 7.5 世界建模
        • 总结
        • 7.6 More Qualitative Results
      • 8 Conclusion
      • 9 Acknowledgement
    • MemOS: A Memory OS for AI System
      • LLM 总结
      • Abstract
      • 1 Introduction
        • 1. 背景与动机
        • 2. 现有方法的不足
        • 3. 四大典型挑战
        • 4. MemOS的提出与核心理念
        • 5. 总结与意义
      • 2 Memory in Large Language Models
        • 总结
        • 一、记忆研究的四个阶段
        • 二、第一阶段:记忆定义与探索
        • 三、MemOS 的初步构想
        • 四、总结
        • 2.1 显式长期记忆的建立(Stage 1)
        • 2.2 人脑式记忆机制的引入(Stage 2)
        • 2.3 基于工具的记忆管理(Stage 3)
        • 2.4 系统化记忆治理(Stage 4)
        • 总结
      • 3 MemOS Design Philosophy
        • 一、MemOS 的愿景(3.1 Vision of MemOS)
        • 二、从传统操作系统到记忆操作系统(3.2 From Computer OS to Memory OS)
        • 三、总结
      • 4 Memory Modeling in MemOS
        • 4.1 内存类型与语义演化路径
        • 4.2 Memory Cube(MemCube):内存的核心资源单元
        • 总结
      • 5 Architecture of MemOS
        • 总结:MemOS 架构与执行流程
        • 总结
        • 5.5.1 MemGovernance(内存治理模块)
        • 5.5.2 MemVault(内存存储与路由基础设施)
        • 5.5.3 MemLoader 与 MemDumper(内存加载与导出模块)
        • 5.5.4 MemStore(内存存储与分发接口)
        • 总结
      • 6 Evaluation
        • 1. 整体系统评估(End-to-End Evaluation on LOCOMO)
        • 2. 内存检索评估(Evaluation of Memory Retrieval)
        • 3. KV缓存加速评估(Evaluation of KV-Based Memory Acceleration)
        • 总结
      • 7 MemOS for Architecture Innovation and Applications
        • 一、MemOS推动的架构创新
        • 二、MemOS的应用场景
        • 总结
      • 8 Conclusion
  • 其他
    • 数据集&数据蒸馏
      • 1811.10959v3_Dataset Distillation
        • ABSTRACT
        • LLM总结
        • 1. INTRODUCTION
        • 3. APPROACH
      • 2502.20653_Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 7. Conclusion
      • 通用
        • Dataset distillation
    • 3D
      • 2003.08934_NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Neural Radiance Field Scene Representation
        • 4. Volume Rendering with Radiance Fields
        • 5. Optimizing a Neural Radiance Field
        • 6. Results
        • 7. Conclusion
      • 2203.08586_Deep vanishing point detection: Geometric priors make dataset variations vanish
        • 概念
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Geometric priors for VP detection
        • 4. Experiments
        • 5. Conclusion and limitations
      • 2312.14132_DUSt3R: Geometric 3D Vision Made Easy
        • 关键词
        • 相关概念
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Method
        • 4. Experiments with DUSt3R
        • 5. Conclusion
        • Appendix A 附录概览
        • Appendix B. Qualitative results
        • Appendix C. Extended Related Work
        • Appendix D. 多视角姿态估计(Multi-view Pose Estimation)
        • Appendix E. 视觉定位(Visual Localization)
        • Appendix F. Training details
      • 2406.09756_MASt3R: Grounding Image Matching in 3D with MASt3R
        • 前言
        • Abstract
        • 1. Introduction
        • 🧠 思维导图式总结
        • 2. Related works
        • 🧠 总结思维导图
        • 3. Method
        • 4. Experimental results
        • 5. Conclusion
        • Appendix
        • Appendix A Additional Qualitative Results
        • B. Fast Reciprocal Matching
        • C. Coarse-to-Fine
        • D. Detailed experimental settings
      • 2412.09401_SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos
        • 术语
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Method
        • 4. Experiments
        • 5. Conclusion
        • 6. 致谢
        • Appendix
        • Appendix A Implementation details
        • Appendix B Details for experimental settings
        • Appendix C Additional comparisons and analyses
        • D. More visual results
      • 2412.12392_MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
        • GPT
        • 先验知识
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Method
        • 4. Results
        • 5. Limitations and Future Work(局限与未来工作)
        • 🧾 6. Conclusion(总结)
        • 🧠 总结一句话版
        • 8. Initialisation(初始化)
        • 9. Runtime Breakdown(运行时分析)
        • 10. Evaluation Setup(评估设置)
        • 11. EuRoC 结果总结
      • 2503.11651_VGGT: Visual Geometry Grounded Transformer
        • Abstract
        • 1. Introduction
        • 2. Related Work
        • 3. Method
        • 4. Experiments
        • 5. Discussions
        • 6. Conclusions
        • Appendix A Formal Definitions
        • Appendix B Implementation Details
        • Appendix C Additional Experiments
        • Appendix D Qualitative Examples
        • Appendix E Related Work
    • 其他
      • A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
        • The Basic Idea Behind CRC Algorithms
        • Polynomial Arithmetic
        • Binary Arithmetic with No Carries
        • A Fully Worked Example
        • Choosing A Poly
        • A Straightforward CRC Implementation
        • A Table-Driven Implementation
        • A Slightly Mangled Table-Driven Implementation
        • References
      • Distributed Representations of Sentences and Documents

2410.07490_MoDEM: Mixture of Domain Expert Models

  • https://arxiv.org/html/2410.07490v1
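
MoDEM 的核心思路(按标题理解,细节以论文为准):先用一个轻量级路由器判断查询所属领域,再把查询转发给该领域的专家模型,用较小的模型组合逼近大型通用模型的效果。下面是一个极简的示意性草图(假设性 Python 实现,route、EXPERTS、KEYWORDS 等命名均为本笔记虚构,并非论文官方代码;论文中的路由器应是训练好的分类模型,这里仅用关键词计分示意"先分类、再分发"的结构):

    # 假设性示意:按领域路由到专家模型(非 MoDEM 官方实现)
    from typing import Callable, Dict, List

    # 每个领域对应一个"专家",这里用占位函数模拟对领域模型的调用
    EXPERTS: Dict[str, Callable[[str], str]] = {
        "math":    lambda q: f"[math-expert] {q}",
        "code":    lambda q: f"[code-expert] {q}",
        "health":  lambda q: f"[health-expert] {q}",
        "general": lambda q: f"[general-model] {q}",
    }

    # 极简路由器:用关键词计分代替真实的分类器
    KEYWORDS: Dict[str, List[str]] = {
        "math":   ["integral", "equation", "prove", "积分", "方程"],
        "code":   ["python", "bug", "function", "代码"],
        "health": ["symptom", "diagnosis", "症状", "用药"],
    }

    def route(query: str) -> str:
        q = query.lower()
        scores = {d: sum(kw in q for kw in kws) for d, kws in KEYWORDS.items()}
        best = max(scores, key=scores.get)                 # 得分最高的领域
        domain = best if scores[best] > 0 else "general"   # 无命中则回退通用模型
        return EXPERTS[domain](query)

    if __name__ == "__main__":
        print(route("Prove that the integral of x is x^2/2"))  # -> math 专家
        print(route("帮我看看这段 Python 代码的 bug"))            # -> code 专家
        print(route("今天天气怎么样"))                            # -> 通用模型

实际系统中,路由器可替换为任意文本分类器(例如在领域标注数据上微调的小模型),EXPERTS 中的占位函数则替换为对各领域模型的真实调用;此处只演示分发结构本身。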
