新溪-gordon
V2025.07
General
How to Tell Whether a Paper Is Important
Academic Websites
Overview
1. Academic search platforms (core function: searching for and discovering literature)
Google Scholar
Semantic Scholar
Web of Science
Baidu Scholar
2. Resource-sharing platforms (core function: free access to paywalled literature)
Sci-Hub
Library Genesis (LibGen)
Unpaywall
3. Paper databases (core function: storing and providing full-text literature)
ACL Anthology
ArXiv
CNKI
Wanfang Data
Evaluation Benchmarks
02xx.xxxxx_BLEU: a Method for Automatic Evaluation of Machine Translation
Summary
Abstract
Worked Example
1. Introduction
2. The Baseline BLEU Metric
3. The BLEU Evaluation
4. The Human Evaluation
5. BLEU vs The Human Evaluation
6. Conclusion
0401.xxxxx_ROUGE: A Package for Automatic Evaluation of Summaries
Summary
Abstract
1. Introduction
2. ROUGE-N: N-gram Co-Occurrence Statistics
3. ROUGE-L: Longest Common Subsequence
4. ROUGE-W: Weighted Longest Common Subsequence
5. ROUGE-S: Skip-Bigram Co-Occurrence Statistics
6. Evaluations of ROUGE
7. Conclusions
1803.01937_ROUGE2.0: Updated and Improved Measures for Evaluation of Summarization Tasks
Abstract
1. Problems with the current ROUGE measures
2. ROUGE 2.0
1804.08771_SacreBLEU: A Call for Clarity in Reporting BLEU Scores
BLEU
Summary
Abstract
1 Introduction
2 Problem Description
3 A way forward
4 Summary
2306.05685_Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Summary
LLM Summary
Abstract
1 Introduction
2 MT-Bench and Chatbot Arena
3 LLM as a Judge
4 Agreement Evaluation
5 Human Preference Benchmark and Standardized Benchmark
6 Discussion
7 Conclusion
Appendix A Prompt templates
Appendix B Case Study
Appendix C Data Collection
Appendix D Additional Experimental Results
Appendix E Training Details of Vicuna Models
Appendix F Exploring Vicuna as a judge
Datasets: Agent
2312.14033_T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step
Summary
Abstract
1 Introduction
2 T-Eval
3 Experiments
4 Discussion
5 Related Work
6 Conclusion
Appendix A T-Eval Benchmark Details
Appendix B Implementation Details
Appendix C Detailed Evaluation Metrics
Appendix D API Documentation
2406.12045_τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Summary
Abstract
1. Introduction
2. Related Work
3. τ-bench: A Benchmark for Tool-Agent-User Interaction
4. Benchmark Construction
5. Experiments
6. Discussion
2506.07982_τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
Summary
LLM Summary
Abstract
1 Introduction
2 Related Work
3 τ²-bench: Evaluating Agents in a Dual-Control Environment
4 Experiments
5 Conclusion
Broader Impact
Appendix
Appendix A Telecom Domain
Appendix B Verifying Original τ²-bench
Appendix C Prompts
Appendix D Domain Policies
Appendix E User Simulator Quality
Datasets: QA
1809.09600_HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Summary
Abstract
1 Introduction
2 Data Collection
3 Processing and Benchmark Settings
4 Dataset Analysis
5 Experiments
6 Related Work
7 Conclusions
Appendix A Data Collection Details
Appendix B Further Data Analysis
Appendix C Full Wiki Setting Details
2109.07958_TruthfulQA: Measuring How Models Mimic Human Falsehoods
Summary
LLM Summary
Abstract
1 Introduction
2 The TruthfulQA Benchmark
3 Experiments
4 Results
5 Discussion
6 Related Work
7 Conclusion
8 Ethics and Impact
Appendix A Additional examples from TruthfulQA
Appendix B Additional results
Appendix C Dataset construction
Appendix D Human evaluations
Appendix E Prompts
Appendix F Checking for data quality and disagreement
2311.12022_GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Summary
Abstract
1. Introduction
2. Data Collection
3. Dataset Analysis
4. Baseline
5. Related Work
6. Limitations
7. Conclusion
2411.04368_SimpleQA: Measuring short-form factuality in large language models
Abstract
1. Introduction
2. Data Collection and Verification
4. Measuring calibration
Appendix B Guessing strategy and F-score
Datasets: Coding
2107.03374_HumanEval: Evaluating Large Language Models Trained on Code
Summary
Abstract
1. Introduction
2. Evaluation Framework
3. Code Fine-Tuning
4. Supervised Fine-Tuning
5. Docstring Generation
6. Limitations
7. Broader Impacts and Hazard Analysis
8. Related Work
9. Conclusions
2108.07732_MBPP: Program Synthesis with Large Language Models
Abstract
1 Introduction
2 Datasets
3 Model and Methods
4 MBPP Synthesis Results
5 Human-Model Collaboration Results
6 Program Execution Results
7 MathQA Results
8 Related Work
9 Risks and Limitations
10 Conclusion
Appendix A Appendix
2310.06770_SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Summary
LLM Summary
Abstract
1 Introduction
2 SWE-bench
3 SWE-Llama: Fine-tuning CodeLlama for SWE-bench
4 Experimental Setup
5 Results
6 Related Work
7 Discussion
8 Ethics Statement
9 Reproducibility Statement
Appendix
Appendix A Benchmark Details
Appendix B Additional Details on Training SWE-Llama
Appendix C Additional Results
Appendix D Additional Experimental Details
Appendix E Societal Impact
Appendix F In-depth Analysis of SWE-Llama Generations
2402.16694_HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
Abstract
1. Introduction
2. Related work
3. HumanEval-XL
4. Experiments
5. Conclusion
Acknowledgments
Appendix A Experiment Settings
Appendix B Comprehensive Experiment Results
2403.07974_LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Summary
LLM Summary
Abstract
1 Introduction
2 Holistic Evaluation
3 Benchmark Curation
4 Experiment Setup
5 Results
6 Related Work
7 Limitations
8 Conclusion
Appendix A Dataset
Appendix B UI
Appendix C Experimental Setup
Appendix D Results
Appendix E Qualitative Examples
2407.10499_CIBench: Evaluating Your LLMs with a Code Interpreter Plugin
Summary
Abstract
1 Introduction
2 Related Works
3 CIBench
4 Experiments
5 Conclusion
Appendix A Dataset Details
Appendix B Construction Prompts and Rules
Appendix C Experiment Example Demo
Appendix D Subjective Visualization Evaluation
Appendix E Dataset Error Analysis
Appendix F Human Annotator
Appendix G Ethical Consideration
2410.03859_SWE-bench-Multimodal: Do AI Systems Generalize to Visual Software Domains?
Summary
Abstract
1 Introduction
2 SWE-bench Multimodal
3 Evaluating on SWE-bench M
4 Results
5 Related Work
6 Conclusion
Appendix A Dataset
Appendix B Collection
Appendix C Experiments
Appendix D Human Validation
Appendix E Limitations
2410.06992_SWE-Bench+: Enhanced Coding Benchmark for LLMs
Summary
Abstract
1 Introduction
2 Robustness Analysis of SWE-Bench
3 Building SWE-Bench+
4 Robustness of SWE-Bench+
5 Effectiveness-aware Evaluation
6 Related Work
7 Conclusion
2501.01257_CodeForces: Benchmarking Competition-level Code Generation of LLMs on CodeForces
Summary
Abstract
1 Introduction
2 Related Work
3 CodeForces Benchmark
4 Evaluation on Existing LLMs
5 Analysis Experiments
6 Discussion
7 Conclusion
8 Ethical Statement
Appendix A Model Cards
Appendix B Decoding Hyperparameters
Appendix C Analysis of Our Elo Rating Calculation System
Appendix D Human-comparable Elo Rating
Appendix E Problem Demonstration
Appendix F Special Judge
Datasets: Long Context
2402.05136_LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K
Summary
Abstract
1. Introduction
2. Related Work
3 LV-Eval Benchmark
4 Evaluation
Appendix
Appendix C Detailed Evaluation Results
Appendix D Detailed Ablation Results
2402.17753_LoCoMo: Evaluating Very Long-Term Conversational Memory of LLM Agents
Summary
Abstract
1 Introduction
2 Related Work
3 Generative Pipeline for LoCoMo
4 LoCoMo Evaluation Benchmark
5 Experimental Setup
6 Experimental Results
7 Conclusion
8 Limitations
9 Broader Impacts
Appendix Overview
Appendix A Generative Pipeline for LoCoMo
Appendix B Dataset
Appendix C Experimental Setup
Appendix D Results
2404.06654_RULER: What’s the Real Context Size of Your Long-Context Language Models?
Summary
LLM Summary
Abstract
1 Introduction
2 Related Work
3 The Ruler Benchmark
4 Experiments & Results
5 Task Error Analysis
6 Model Analysis
7 Conclusion
8 Limitations
Appendix A Models
Appendix B Task Configurations
Appendix C Task Correlation Analysis
Appendix D Prompt Templates
Appendix E Passkey Retrieval and Vanilla NIAH Results
Appendix F Additional Results
2407.11963_NeedleBench: Can LLMs Do Retrieval and Reasoning in Information-Dense Context
Summary
Abstract
1 Introduction
2 Related Work
3 Tasks and Datasets
4 Experiments
4.1.5 Impact of Language: Which Model Performs Better under the Bilingual Scenario?
5 Conclusion and Future Work
Appendix A Evaluated Models
Appendix B NeedleBench Prompt Examples
Appendix C Error Analysis Examples
Datasets: Math
2103.03874_MATH: Measuring Mathematical Problem Solving With the MATH Dataset
2110.14168_GSM8K: Training Verifiers to Solve Math Word Problems
Summary
LLM Summary
Abstract
1 Introduction
2 Dataset
3 Related Work
4 Methods
5 Additional Experiments
6 Conclusion
Appendix A Dataset Details
Appendix B Hyperparameters
Appendix C Calculator Annotations
Appendix D Example Model Solutions
Appendix E Verifier Details
Appendix F Verifier Visualization
2405.12209_MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
Abstract
1 Introduction
2 Methodology
3 Experiments and Analysis
4 Discussion
5 Related Work
6 Conclusion
7 Limitations
8 Ethical Considerations
Appendix A MathBench Statistics
Appendix B Detailed Experimental Results
Appendix C Extra Analysis
Datasets: Images
2306.13394_MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Summary
LLM Summary
Abstract
1 Introduction
2 MME Evaluation Suite
3 Experiments
4 Analysis
5 Conclusion
2307.06281_MMBench: Is Your Multi-modal Model an All-around Player?
Summary
Abstract
1 Introduction
2 Related Work
3 The construction of MMBench
4 Evaluation Strategy
5 Evaluation Results
6 Conclusion
Appendix A More Details about the Data
Appendix B More Details on MMBench Construction
Appendix C More Details on LLM-based Choice Extraction
Appendix D Evaluation Settings and Results
2307.16125_SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
Summary
LLM Summary
Abstract
1 Introduction
2 Related Work
3 SEED-Bench
4 Evaluation Results
5 Conclusion
2311.12793_ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Summary
LLM Summary
Abstract
1 Introduction
2 Related Work
3 ShareGPT4V Dataset
4 ShareGPT4V-7B Model
4.1 Model Architecture
4.2 Pre-training
4.3 Supervised Fine-Tuning (SFT)
Summary
5 Experiments
6 Conclusion
Appendix A Data Sources
Appendix B Caption Analysis
Appendix C Prompts
Appendix D Examples
2506.18095_ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
Summary
Abstract
1 Introduction
2 ShareGPT-4o-Image
3 Janus-4o: Fine-Tuning with ShareGPT-4o-Image
4 Experiments
5 Conclusion
Appendix A Related Work
Appendix B Image Generation Categories
Appendix C Prompts for Generation
Appendix D Document Pipeline
Appendix E Ethical Considerations and Societal Impact
Datasets
General
Evaluation Metrics
Accuracy
Precision
Recall
F1 Score
Visualizing Precision and Recall
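The metric entries above cover accuracy, precision, recall, and F1. As a minimal illustrative sketch (not taken from any of the listed papers; the function and variable names are our own), all four values follow directly from the confusion-matrix counts:

```python
# Minimal illustrative sketch: accuracy, precision, recall, and F1
# from binary labels; all names here are our own, not from the papers.

def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)              # correct / all
    precision = tp / (tp + fp) if tp + fp else 0.0  # among predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # among actual positives
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of P and R
    return accuracy, precision, recall, f1

print(binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))  # -> (0.6, 0.666..., 0.666..., 0.666...)
```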
2009.03300_MMLU: Measuring Massive Multitask Language Understanding
Summary
Abstract
1. Introduction
2. Related Work
3. A Multitask Test
4. Experiments
5. Discussion
6. Conclusion
2305.08322_C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
Summary
Abstract
1 Introduction
2 The C-Eval Evaluation Suite
3 Experiment
4 Related Work
5 Discussion
Acknowledgement
Appendix A Author Contributions
Appendix B Detailed Stats of C-Eval
Appendix C Explanation Data Generation
Appendix D Evaluation Prompts
Appendix E Details of the models being evaluated
Appendix F Breakdown of Model Performance
Appendix G Option Bias
Appendix H Compute and Resources Used for Evaluation
2306.09212_CMMLU: Measuring massive multitask language understanding in Chinese
Summary
Abstract
1 Introduction
2 Related Work
3 CMMLU
4 Experiments
Impact of model size on performance
5 Conclusion
Appendix A Comparison to concurrent benchmarks
Appendix B CMMLU Subjects
Appendix C CMMLU Examples
Appendix D CMMLU Difficulty Distribution
Appendix E Emergent Ability shown in CMMLU subjects
Appendix F Models being Evaluated
Appendix G Strategies for Estimating Model Choices
Appendix H Regular expression matching algorithms
Appendix I Correlation to other Benchmarks
Appendix J Breakdown of Model Performance
J.3 The effect of chain-of-thought prompt
2307.15020_SuperCLUE: A Comprehensive Chinese Large Language Model Benchmark
Summary
LLM Summary
Abstract
1 Introduction
2 Related Work
3 SuperCLUE Benchmark
4 Experiments
5 Additional Analysis
6 Conclusion
Appendix A Evaluation Process
Appendix B Capability Categories
2311.12983_GAIA: a benchmark for General AI Assistants
Summary
Abstract
1. Introduction
2. Related work
3. GAIA
4. LLMs results on GAIA
5. Discussion
6. Limitations
Appendix A Extended related work
Appendix C Extended description of GAIA
Appendix D Extended description of our question design framework
2404.07972_OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Summary
Abstract
1. Introduction
2. OSWORLD Environment
3. OSWORLD Benchmark
4. Benchmarking LLM and VLM Agent Baselines
5. Analysis
6. Related Work
7. Conclusion and Future Work
A. Details of OSWORLD Environment
C. Details of Baseline Methods
D. Examples of Qualitative Analysis
2501.14249_HLE: Humanity’s Last Exam
Abstract
1. Introduction
2. Related Work
3. Dataset
4. Evaluation
5. Discussion
LLM Models
NLP Models
1810.04805_BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
1 Introduction
2 Related Work
3 BERT
Appendix A Additional Details for BERT
18xx_GPT1: Improving Language Understanding by Generative Pre-Training
Abstract
1. Introduction
2. Related Work
3. Framework
4 Experiments
5 Analysis
6 Conclusion
Citation Reception
Key Takeaways
19xx_GPT2: Language Models are Unsupervised Multitask Learners
The Illustrated GPT-2
References
2012.00413_CPM: A Large-scale Generative Chinese Pre-trained Language Model
2302.13971_LLaMA: Open and Efficient Foundation Language Models
2307.09288_Llama 2: Open Foundation and Fine-Tuned Chat Models
2309.16609_Qwen Technical Report
1. Introduction
2. Pretraining
3. Alignment
4. CODE-QWEN: SPECIALIZED MODEL FOR CODING
5. MATH-QWEN: SPECIALIZED MODEL FOR MATHEMATICS REASONING
6. Related Work
7. Conclusion
A.1 MORE TRAINING DETAILS
A.2 EVALUATION
2310.19341_Skywork: A More Open Bilingual Foundation Model
Summary
LLM Summary
Abstract
1 Introduction
2 Methodology
3 Pre-training
4 Evaluation
5 Discussion
6 Limitation
7 Conclusion
Appendix A Details on GPT-7B vs. LLaMA-7B Experiment
Appendix B Preliminary Experiments on Distributed Training
Appendix C More Benchmark Results
Appendix D Details on LM Test Sets
2401.14196_DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence
2404.06395_MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
5. Two Stage Pre-training Strategy
6. Model
7. MiniCPM Family
2405.04434_DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
2406.12793_ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
2407.10671_Qwen2 Technical Report
Abstract
1. Introduction
2. Tokenizer & Model
3. Pre-training
4. Post-training
5. Evaluation
6. Conclusion
2412.15115_Qwen2.5
Abstract
1. Introduction
2. Architecture and Tokenizer
3. Pre-training
4. Post-training
5. Evaluation
6. Conclusion
2505.09388_Qwen3
Abstract
1. Introduction
2. Architecture
3. Pre-training
4. Post-training
5. Conclusion
Multimodal Models
2112.15093_CTR: Benchmarking Chinese Text Recognition: Datasets, Baselines, and an Empirical Study
Abstract
1. Introduction
2. Preliminaries
3. Datasets
4. Baselines
5. An Empirical Study
6. Conclusions
Appendix A Details of PRAB
Appendix C Visualization of Failure Cases.
2304.08485_LLaVA: Visual Instruction Tuning
Abstract
1. Introduction
2. Related Work
3. GPT-assisted Visual Instruction Data Generation
4. Visual Instruction Tuning
5. Experiments
6. Conclusion
2308.12966_Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Methodology
Training
Evaluation
B. Data Format Details of Training
2310.03744_LLaVA2: Improved Baselines with Visual Instruction Tuning
Abstract
1. Introduction
2. Related Work
3. Approach
4. Empirical Evaluation
5. Open Problems in LMMs
6. Conclusion
A. Implementation Details
B. Qualitative Results
2312.07533_VILA: On Pre-training for Visual Language Models
Abstract
1. Introduction
2. Background
3. On Pre-training for Visual Language Models
4. Experiments
5. Related Work
6. Conclusion
2403.05525_DeepSeek-VL: Towards Real-World Vision-Language Understanding
Abstract
2408.01800_MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Abstract
1. Introduction
2. Related Work
3. Model Architecture
4. Training
5. End-side Deployment
6. Experiments
7. Conclusion
2409.17146_Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
Abstract
1. Introduction
2. Architecture
3. Data
4. Training
5. Evaluation
6. Ablations
Appendix A: Model Details
Appendix B: Training Details
Appendix C: Evaluation Results
Appendix D: Result Details
Appendix E Ablations Details
Appendix F Data Details
Appendix G Dataset Examples
Appendix H Related Work
2410.13848_Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Summary
LLM Summary
Abstract
1 Introduction
2 Related Work
3 Janus: A Simple, Unified and Flexible Multimodal Framework
4 Experiments
5 Conclusion
Appendix
Appendix A Details of Semantic Tokenizer Mentioned in Ablation Study
Appendix B Additional Qualitative Results
2411.00774_Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Abstract
1. Introduction
2. Model
3. Experiments
4. Conclusion and Future Work
2412.04468_NVILA: Efficient Frontier Visual Language Models
Abstract
1. Introduction
2. Approach
3. Experiments
4. More Capabilities
5. Related Work
6. Conclusion
2502.13923_Qwen2.5-VL
Abstract
1. Introduction
2. Approach
3. Experiments
4. Conclusion
2503.20215_Qwen2.5-Omni Technical Report
Abstract
1. Introduction
2. Architecture
3. Pre-training
4. Post-training
5. Evaluation
6. Conclusion
2506.13642_Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
Abstract
1 Introduction
2 Related Work
3 Stream-Omni
3.2.1 Data Construction
4 Experiments
5 Results and Analyses
6 Conclusion
Limitations
Appendix A Construction of InstructOmni
Appendix B Construction of SpokenVisIT
Appendix C Case Study
LLM Audio
2005.08100_Conformer: Convolution-augmented Transformer for Speech Recognition
LLM Summary
Abstract
1 Introduction
2 Conformer Encoder
3 Experiments
4 Conclusion
2106.07447_HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Summary
LLM Summary
Abstract
I Introduction
II Method
III Related Work
IV Experimental Details
V Results
VI Conclusion
2112.02418_YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Key Concepts
Abstract
1. Introduction
2. YourTTS Model
3. Experiments
4. Results and Discussion
5. Zero-Shot Voice Conversion
6. Speaker Adaptation
7. Conclusions, limitations and future work
2212.04356_whisper: Robust Speech Recognition via Large-Scale Weak Supervision
Abstract
1. Introduction
2. Approach
3. Experiments
4. Analysis and Ablations
5. Related Work
6. Limitations and Future Work
7. Conclusions
A. Evaluation Datasets
B Compared Models
C. Text Standardization
2301.02111_Vall-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Abstract
1. Introduction
2. Related Work
3. Background: Speech Quantization
4. VALL-E
5. Experiments
6. Conclusion, Limitations, and Future Work
2303.03926_VALL-E_X: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Abstract
1. Introduction
2. Related Work
3 Cross-Lingual Codec Language Model
4. VALL-E X Application
5. Experiments
6. Conclusion
A. Appendix
2406.05370_VALL-E2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Abstract
1. Introduction
2. Related Work
3. VALL-E 2
4. Experiments
5. Conclusion
2407.05407_CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
Abstract
1. Introduction
2. CosyVoice: A Scalable TTS model using Supervised Semantic Tokens
3. Dataset
4. Experimental Settings
6. Conclusion
2407.10759_Qwen2-Audio Technical Report
Abstract
1. Introduction
2. Methodology
3. Experiments
5. Conclusion
2410.00037_Moshi: a speech-text foundation model for real-time dialogue
Abstract
1. Introduction
2. Related Work
3. Model
4. Datasets and Training
5. Evaluation
6. Safety
7. Conclusion
2412.10117_CosyVoice2: Scalable Streaming Speech Synthesis with Large Language Models
Abstract
1. Introduction
2. CosyVoice 2
3. Experimental Settings
4. Experimental Results
5. Conclusion
2501.06282_MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
Abstract
1. Introduction
2. Related Work
3. MinMo
4. Experiments
5. Conclusion
6. Limitations
A. Prompts for Voice Understanding Tasks
2505.02707_Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Abstract
1. Introduction
2. Related Work
3. Voila: Voice-Language Foundation Models
4. Experiments
5. Conclusion
2505.17589_CosyVoice3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
Abstract
1. Introduction
2. CosyVoice 3
3. The Multilingual Data Pipeline
4. Experimental Settings
5. Experimental Results
6. Conclusion
7. Limitations
LLM Video
2301.12597_BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Abstract
1 Introduction
2 Related Work
3 Method
4 Experiment
5 Limitation
6 Conclusion
2308.01390_OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
Abstract
1 Introduction
2 Related work
3 Approach
4 Results
5 Discussion
6 Conclusion
Appendix A Extended results
Appendix B Additional notes on filtering MMC4
Appendix C Synthetic data prompt
Appendix D Image credits
LLM MoE
2408.15664_AUXILIARY-LOSS-FREE LOAD BALANCING STRATEGY FOR MIXTURE-OF-EXPERTS
2410.07490_MoDEM: Mixture of Domain Expert Models
Commercial Models
2303.08774_GPT-4 Technical Report
2312.11805_Gemini: A Family of Highly Capable Multimodal Models
Abstract
1. Introduction
2. Model Architecture
3. Training Infrastructure
5. Evaluation
6. Post-Training Models
7. Responsible Deployment
8. Discussion and Conclusion
2403.05530_Gemini1.5: Unlocking multimodal understanding across millions of tokens of context
2406.02430_Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Abstract
1 Introduction
2 Method
3 Experiments
4 Model extensions
5 Model applications, limitations, and safety
6 Authors (alphabetical order)
7 Acknowledgement
2407.04675_Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Abstract
1 Introduction
2 Motivation
3 Methods
4 Model and Evaluation
5 Conclusion
Appendix A Appendix
2503.20020_Gemini2: Gemini Robotics: Bringing AI into the Physical World
2504.xxxxx_Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning
2505.07062_Seed1.5-VL Technical Report
Abstract
1 Introduction
2 Architecture
3 Pre-training
3.2 Training Recipe
4 Post-training
4.4 Hybrid Reinforcement Learning
5 Training Infrastructure
6 Evaluation
6.1.3 Video Task Evaluation
6.3.2 Comparison with State-of-the-arts
7 Conclusion and Next Steps
8 Contributions and Acknowledgments
9 Qualitative examples
9.7 Visual Reasoning: Visual Pattern Recognition
9.19 Failure Cases: Combinatorial Search I
10 Evaluation Details
DREAM-1K
LLM Supporting Technologies
Framework
1712.05889_Ray: A Distributed Framework for Emerging AI Applications
Abstract
1. Introduction
2. Motivation and Requirements
3. Programming and Computation Model
4. Architecture
5. Evaluation
6 Related Work
7 Discussion and Experiences
8. Conclusion
1910.02054_DeepSpeed_ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Abstract
1. Extended Introduction
2. Related Work
3 Where Did All the Memory Go?
4 ZeRO: Insights and Overview
5 Deep Dive into ZeRO-DP
6 Deep Dive into ZeRO-R
7 Communication Analysis of ZeRO-DP
8. Communication Analysis of ZeRO-R
9. Step Towards 1 Trillion Parameters
10. Implementation and Evaluation
11. Concluding Remarks
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Transformers: State-of-the-Art Natural Language Processing
2210.XX_Ray v2 Architecture
Overview
Architecture Overview
Object Management
Task Management
Resource Management and Scheduling
Actor management
Global Control Service
Cluster Management
Appendix
2309.06180_vLLM: Efficient Memory Management for Large Language Model Serving with PagedAttention
Summary
1. Introduction
2. Background
3. Memory Challenges in LLM Serving
4. Method
5. Implementation
6. Evaluation
7. Ablation Studies
10. Conclusion
LLM Fine-Tuning
2101.00190_Prefix-Tuning: Optimizing Continuous Prompts for Generation
2103.10385_p-tuning: GPT Understands, Too
2104.08691_Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning
2106.09685_LoRA: Low-Rank Adaptation of Large Language Models
2401.01335_Self-Play: Fine-Tuning Converts Weak Language Models to Strong Language Models
2402.09353_DoRA: Weight-Decomposed Low-Rank Adaptation
2402.12354_LoRA+: Efficient Low Rank Adaptation of Large Models
2403.03507_GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
2403.13372_LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Competing Frameworks
3. Efficient Fine-Tuning Techniques
4 LlamaFactory Framework
6 Conclusion and Future Work
Distributed Models
1701.06538_MoE: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
1806.03377_PipeDream: Fast and Efficient Pipeline Parallel DNN Training
Abstract
1. Introduction
2. Background & Related Work
3. Parallel Training in PipeDream
4. Implementation
5. Evaluation
1811.06965_GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Collection
1. Introduction
2. The GPipe Library
3. Performance Analyses
4. Image Classification
5. Massive Massively Multilingual Machine Translation
6. Design Features and Trade-Offs
1909.08053_Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Collection
Abstract
1. Introduction
2. Background and Challenges
3. Model Parallel Transformers
19xx_PipeDream: Generalized Pipeline Parallelism for DNN Training
Collection
ABSTRACT
1. Introduction
2. BACKGROUND AND RELATED WORK
3. Pipeline Parallelism
4. Implementation
6. Conclusion
2006.09503_PipeDream-2BW: Memory-Efficient Pipeline-Parallel DNN Training
Abstract
2006.15704_PyTorch Distributed: Experiences on Accelerating Data Parallel Training
2006.16668_GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
2104.04473_Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Abstract
2205.14135_FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Abstract
1. Introduction
2 Background
3. FLASHATTENTION: Algorithm, Analysis, and Extensions
4. Experiments
5. Limitations and Future Directions
Appendix A Related Work
Appendix B Algorithm Details
Appendix C Proofs
Appendix D Extension Details
2307.08691_FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Abstract
1. Introduction
2. Background
3. FlashAttention-2: Algorithm, Parallelism, and Work Partitioning
4. Empirical Validation
5. Discussion and Future Directions
General
LLM Quantization
General
Mixed Precision
Floating-Point Formats
weight-only quantization
2110.02861_bitsandbytes: 8-bit Optimizers via Block-wise Quantization
Abstract
1. Background
2. 8-bit Optimizers
3. 8-bit vs 32-bit Optimizer Performance for common Benchmarks
4. Analysis
5. Related Work
2206.01861_ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Abstract
1. Introduction
2. Related Work
3. Background and Challenges
4. Methodology
5. Results
6. Conclusions
Appendix A Background
Appendix D Details about System Optimization
2206.09557_LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
Abstract
1. Introduction
2. Background
3. Design Methodology of LUT-GEMM
4. Experimental results
5. Accelerating Quantized OPT-175B
6. Conclusion
Appendix A LLM Inference Latency Breakdown
Appendix B Detailed Implementation
2208.07339_LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Related References
Abstract
1. Introduction
2. Background
3. Int8 Matrix Multiplication at Scale
4. Emergent Large Magnitude Features in Transformers at Scale
5. Related Work
6. Discussion and Limitations
7. Broader Impacts
Other
2209.05433_FP8: FP8 Formats For Deep Learning
Abstract
1. Introduction
2. Aspects of FP8 Usage in Deep Learning
3. FP8 Binary Interchange Format
Worked Example
4. Empirical Results
5. Conclusions
2210.17323_GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Abstract
1. Introduction
2. Related Work
3. Background
4. The GPTQ Algorithm
5. Experimental Validation
6. Summary and Limitations
2211.10438_SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Abstract
1. Introduction
2. Preliminaries
3. Review of Quantization Difficulty
4. SmoothQuant
5. Experiments
6. Related Work
7. Conclusion
Appendix A. Discussion on Weight-Only Quantization
2305.14314_QLoRA: Efficient Finetuning of Quantized LLMs
Keywords
Abstract
1. Introduction
2. Background
3. QLoRA Finetuning
4. QLoRA vs. Standard Finetuning
5. Pushing the Chatbot State-of-the-art with QLoRA
6. Qualitative Analysis
7. Related Work
8. Limitations and Discussion
2306.00978_AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Abstract
1. Introduction
2. Related Work
3. AWQ: Activation-aware Weight Quantization
4. TinyChat: Mapping AWQ onto Edge Platforms
5. Experiments
6. Conclusion
2309.05516_AutoRound: Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
Abstract
1. Introduction
2. Related Work
3. Methodology
4. Experiments
5. Conclusion
LLM Safety
2312.06674_Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
LLM Reinforcement Learning
1703.03864_Evolution Strategies: as a Scalable Alternative to Reinforcement Learning
2504.02495_DeepSeek-GRM: Inference-Time Scaling for Generalist Reward Modeling
Abstract
1. Introduction
2. Preliminaries
3. Self-Principled Critique Tuning (SPCT)
4. Inference-Time Scaling with SPCT
5. Results on Reward Modeling Benchmarks
6. Related Work
7. Conclusion and Future Work
A. Additional Related Work
B. Limitations and Future Directions
G. Prompt Templates
2504.13958_ToolRL: Reward is All Tool Learning Needs
Other
2203.02155_Training language models to follow instructions with human feedback(InstructGPT)
Abstract
1. Introduction
2. Related work
3. Methods and experimental details
4. Results
5. Discussion
Appendix A Additional prompt data details
Appendix B Additional human data collection details
Appendix C Additional model details
Appendix D Automatic evaluation details
2305.20050_Let’s Verify Step by Step
1. Research Background
2. Comparison of Supervision Methods
3. Key Findings
Summary
2408.03314_Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
1. Introduction
3. How to Scale Test-Time Computation Optimally
5. Scaling Test-Time Compute via Verifiers
6. Refining the Proposal Distribution
Other
2412.14135_Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
FromGPT
1. Introduction
2. Background
3. Policy Initialization
4. Reward Design
5. Search
6. Learning
7 Open-source o1 Project
8. Future Directions
Machine Learning
ML Vision
1506.02640_You Only Look Once: Unified, Real-Time Object Detection
Abstract
1612.08242_YOLO9000: Better, Faster, Stronger
Abstract
1804.02767_YOLOv3
2004.10934_YOLOv4: Optimal Speed and Accuracy of Object Detection
Abstract
2205.00159_SVTR: Scene Text Recognition with a Single Visual Model
Abstract
1. Introduction
2. Method
3. Experiments
4. Conclusion
2207.02696_YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Abstract
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
2304.08485_Visual Instruction Tuning
2402.13616_YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Abstract
2405.14458_YOLOv10: Real-Time End-to-End Object Detection
Abstract
2411.15858_SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
Definitions
Abstract
1. Introduction
2. Related Work
3. Methods
4 Experiments
5. Conclusion
8. More details of real-world datasets
ML
2112.09332_WebGPT: Browser-assisted question-answering with human feedback
2203.11147_GopherCite: Teaching language models to support answers with verified quotes
2304.09848_Generative_Search: Evaluating Verifiability in Generative Search Engines
2305.14251_FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
2305.14627_ALCE: Enabling Large Language Models to Generate Text with Citations
Applying NLI to Citation-Quality Evaluation
Prompts Used in the Paper
2307.02185_Citation: A Key to Building Responsible and Accountable Large Language Models
2307.16883_HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
AI Agent
General Agents
2210.03629_ReAct
2303.08268_Chat-with-the-Environment
Main Text
2303.11366_Reflexion: Language Agents with Verbal Reinforcement Learning
2303.16434_TaskMatrix.AI
Brain
API Platform
API Selector
2304.03442_Generative-Agents
Generative Agent Architecture
2307.07924_ChatDev: Communicative Agents for Software Development
2308.00352_MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
2308.04026_AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
2308.08155_AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
2308.10848_AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
Philosophy
2310.06117_Step-Back: Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
2402.18679_MetaGPT_DI: Data Interpreter: An LLM Agent For Data Science
INTRODUCTION
2407.07061_IoA: Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
2.1 OVERVIEW OF IOA
2.2 ARCHITECTURE OF IOA
2.3 KEY MECHANISMS
2.5 Putting It All Together
2408.08435_ADAS: Automated Design of Agentic Systems
Prompt
2410.10762_AFlow: Automating Agentic Workflow Generation
Introduction
PRELIMINARY
2410.17238_SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning
1 Introduction
2 Related Works
3 Method
2410.21012_FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval
Introduction
2504.01990_Advances and Challenges in Foundation Agents
2506.12508_AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving
Abstract
1. Introduction
3. AgentOrchestra
4. Experiments
Visual Agents & AIOS
2108.03353_Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Abstract
1. Introduction
2. Related Work
3. Dataset Creation
4. Model Design
Other
2209.08199_ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
Abstract
1. Introduction
2. Related Work
3. Problem Setting: Tasks and Metrics
4. Data Annotation
5. Dataset Analysis
6. Experiments and Baselines
7. Conclusion
8. Limitations
9. Ethical Considerations
A. Data Annotation Details
B. Data Examples
2212.06817_RT-1: ROBOTICS TRANSFORMER FOR REAL-WORLD CONTROL AT SCALE
ABSTRACT
1. Introduction
2. Related Work
3. Preliminaries
4. System Overview
5. RT-1: ROBOTICS TRANSFORMER
6. EXPERIMENTS
7. CONCLUSIONS, LIMITATIONS AND FUTURE WORK
B. MODEL CARD
C. MODEL AND DATA
D. EXPERIMENTS
2312.13771_AppAgent: Multimodal Agents as Smartphone Users
3.1 Environment and Action Space
3.2 Exploration Phase
3.3 Deployment Phase
2401.10935_SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Abstract
1. Introduction
2. Related work
3. Approach
4. ScreenSpot: A Grounding Benchmark
5. Experiments
6. Conclusion
Limitations
Ethical considerations
A. Details of SeeClick Pre-training
B ScreenSpot Annotation & Evaluation
C. Downstream Agent Tasks
2402.04615_ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Abstract
1. Introduction
2. Methodology
3. Automatic data generation
4. Data Mixtures
5. Experiments and Results
6. Conclusions
A Definitions of Metrics
B. Screen Schema Examples
C. Prompts For LLM Generated Content
D. Screen Navigation Generated Examples
F. ScreenQA Short Answers Generation
G. Complex Question Answering Datasets
H. New Benchmarks Repositories
2402.07939_UFO: A UI-Focused Agent for Windows OS Interaction
Abstract
1. Introduction
2. Related Work
3. The Design of UFO
4. Experiment
5. Limitations & Lessons Learned
6. Conclusion
2403.16971_AIOS: LLM Agent Operating System
Abstract
1. Introduction
2. The Architecture of AIOS
3. AIOS Kernel
4 Evaluation
Appendix E Discussion
2406.01014_Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
2411.02059_TableGPT2: A Large Multimodal Model with Tabular Data Integration
Abstract
2501.11733_Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
Abstract
1. Introduction
2. Mobile-Agent-E
3. Experiments
4. Results
5. Related Work
6. Conclusion and Future Work
Appendix A Full Trajectory Comparison Example with Previous SOTA
Appendix B Error Recovery with Escalation to Manager
Appendix C Remaining Limitations
Appendix D All Tasks in Mobile-Eval-E Benchmark
Appendix E Atomic Operation Space
Appendix F Full list of Self-Evolved Shortcuts
Appendix G Full list of Self-Evolved Tips
2501.12326_UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Abstract
1. Introduction
2. Evolution Path of GUI Agents
3. Core Capabilities of Native Agent Model
4. UI-TARS
5. Experiment
6. Conclusion
2502.14282_PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Abstract
1. Introduction
2. PC-Agent
3. Experiments
4. Related Work
5. Conclusion
2504.14603_UFO2: The Desktop AgentOS
Abstract
1. Introduction
2. Background
3. System Design of UFO2
4. Picture-in-Picture Interface
5. Implementation and Specialized Engineering Design
6. Evaluation
7. Discussion & Future Work
8. Related Work
9. Conclusion
Memory
2505.22101_MemOS: An Operating System for Memory-Augmented Generation (MAG) in LLM (Short Version)
Summary
Abstract
1 Introduction
2 Memory in Large Language Models
3 MemOS Design Philosophy
4 MemOS
4.1 Memory Types in MemOS
4.2 MemCube: The Core Resource
4.3 MemOS Architecture
4.4 System Execution Flow
Summary
5 Conclusion
Tools
2205.00445_MRKL
2302.04761_Toolformer: Language Models Can Teach Themselves to Use Tools
2303.17580_HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
2307.16789_ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Summary
LLM Summary
Abstract
1 Introduction
2 Dataset Construction
3 Experiments
4 Related Work
5 Conclusion
Appendix
Appendix A Implementation Details
AGI
1905.10985_AI-GA: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence
2408.06292_The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
RAG
2005.11401_Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
2312.10997_Retrieval-Augmented Generation for Large Language Models: A Survey
II. Overview of RAG
II-A Naive RAG
II-B Advanced RAG
II-C Modular RAG
II-D RAG vs Fine-tuning
III. Retrieval
III-A Retrieval Source
III-B Indexing Optimization
III-C Query Optimization
III-D Embedding
III-E Adapter
IV. Generation
IV-A Context Curation
IV-B LLM Fine-tuning
V. Augmentation process in RAG
V-A Iterative Retrieval
V-B Recursive Retrieval
V-C Adaptive Retrieval
VI. Task and Evaluation
VI-A Downstream Task
VI-B Evaluation Target
VI-C Evaluation Aspects
VI-D Evaluation Benchmarks and Tools
VII. Discussion and Future Prospects
VII-A RAG vs Long Context
VII-B RAG Robustness
VII-C Hybrid Approaches
VII-D Scaling laws of RAG
VII-E Production-Ready RAG
VII-F Multi-modal RAG
2401.15884_CRAG: Corrective Retrieval Augmented Generation
2403.14403_Adaptive-RAG
2404.16130_GraphRAG: From Local to Global: A GraphRAG Approach to Query-Focused Summarization
Summary
LLM Summary
Abstract
1 Introduction
2 Background
2.1 RAG Approaches and Systems
2.2 Knowledge Graphs in LLMs and RAG
2.3 Adaptive Benchmarking
2.4 RAG Evaluation Criteria
3 Methods
3.1 GraphRAG Workflow
3.2 Global Sensemaking Question Generation
3.3 Global Sensemaking Evaluation Criteria
Summary
4 Analysis
4.1 Experiment 1
4.2 Experiment 2
Summary
5 Results
5.1 Experiment 1: Comparing Methods on Summarization Tasks
5.2 Experiment 2: Claim-Based Metric Evaluation
Summary
6 Discussion
6.1 Limitations of the Evaluation Approach
6.2 Future Work
Broader Impact
7 Conclusion
Appendix A Entity and Relationship Extraction Approach
1. Entity and Relationship Extraction Approach
2. Self-Reflection Techniques
3. Relationship between Chunk Size and Extraction Quality
4. Experimental Results (Figure 3)
Summary
Appendix B Example Community Detection
Appendix C Context Window Selection
Appendix D Example Answer Comparison
Appendix E System Prompts
E.1 Element Instance Generation
E.2 Community Summary Generation
E.3 Community Answer Generation
E.4 Global Answer Generation
Appendix F Evaluation Prompts
F.1 Relative Assessment Prompt
F.2 Relative Assessment Metrics
Appendix G Statistical Analysis
Statistical Methods
Summary of Main Results
Overall Trends
Key Conclusions
2405.16506_GRAG: Graph Retrieval-Augmented Generation
2406.13213_Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata
2410.05779_LightRAG: Simple and Fast Retrieval-Augmented Generation
Summary
Abstract
1 Introduction
2 Retrieval-Augmented Generation
3 The LightRAG Architecture
1. Overview of the LightRAG Architecture
2. Graph-Based Text Indexing
3. Dual-Level Retrieval Paradigm
4. Retrieval-Augmented Answer Generation
5. Complexity Analysis
Summary
4 Evaluation
1. Experimental Settings (4.1)
2. Comparing LightRAG with Existing RAG Methods (4.2, RQ1)
3. Ablation Studies (4.3, RQ2)
Summary
4.4 Case Study (RQ3)
Summary of 4.4 Case Study (RQ3)
Summary of 4.5 Model Cost and Adaptability Analysis (RQ4)
Overall Conclusions
5 Related Work
Chapter 5 Related Work (Summary)
6 Conclusion
7 Appendix
2410.10450_KBLaM: Knowledge Base augmented Language Model
Abstract
1. Introduction
2. Related work
3. Background
Self-attention layer
4. Augmenting LLM with the KB
Knowledge tokens
Rectangular Attention: Injecting knowledge token into prompt tokens
KB length generalization through attention score scaling
5. KB instruction tuning
6. EXPERIMENTS
6.1 EXPERIMENT SETTING
6.2 EXPERIMENT RESULTS
Summary of Highlights
7. CONCLUSION
8. LIMITATIONS AND FUTURE WORK
Appendix A Extended related work
Appendix B Ablation study
Appendix C Sample KB
SAMPLE Q&A
PROMPT
PROMPT FOR SYNTHETIC KB GENERATION
Prompt for open-ended Q&A generation
PROMPT FOR GPT EVALUATION OF OPEN-ENDED Q&A
PROMPT FOR LLAMA EVALUATION
QUESTION TEMPLATE
SAMPLE OUTPUT
SYNTHETIC KB
ENRON
2504.03137_LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph
Abstract
Introduction
Related Work
LLM Prompt Engineering
KG-based LLM Reasoning
Preliminaries
1. Knowledge Graph (KG)
2. Anchor Entities
3. Relation Link
4. Reasoning Path
Methodology
Stage1: Reasoning Graph Retrieval
Stage2: Knowledge Embedding
Stage3: Knowledge Prompts Mixed Reasoning
Experiments
Conclusion
GraphRAG Official Documentation
Indexing
> Indexing Architecture
> Indexing Dataflow
> Prompt Tuning
Query
Paper Pool
2305.16300_Random-Access Infinite Context Length for Transformers
LLM Summary
Research Background and Motivation
Core Problem
Main Contributions
Key Techniques
Experimental Results
Significance and Application Prospects
Summary
Abstract
1 Introduction
2 Related Work
3 Methodology
Overall Approach
Method Details
Handling of Positional Encoding
Comparison with Other Methods
Summary
3.3 Memory & Computation
4 Experiments
4.1 Language Modeling Experiments
4.2 Fine-Tuning Pretrained Models
Summary
5 Future Work
6 Conclusion
Acknowledgment
Appendix A Grouped Softmax Example
Appendix B Dataset Description
Appendix C Number of Unique Retrieved Blocks
Appendix D Context Miss Token
Appendix E Positional Augmentation
Appendix F Additional Extensions and Details
1. Masked Language Modeling
2. Combining with Flash Attention
3. Trade-off between Number of Retrieved Blocks and Block Size
Summary
Appendix G Offloading KV Cache to CPU
2311.18743_AlignBench: Benchmarking Chinese Alignment of Large Language Models
Main Content Summary
Summary
Abstract
1 Introduction
1. Background and Challenges
2. Design Goals of AlignBench
3. Main Features of AlignBench
4. Applications and Results of AlignBench
5. Overall Contributions
6. Comparison Table
2 Dataset
3 Methods
4 Human Evaluation on AlignBench
1. Agreement Evaluation
2. Quality Evaluation
Summary
5 AlignBench: Benchmarking Results
6 Related Work
7 Conclusion
Appendix A Appendix
A.2 Prompts and Details of Methods
A.3 Performance by Dimension
A.4 Case Studies
Summary
1. Core Problem: Missing Reference Materials Make Evaluation Difficult
2. Comparative Analysis of a Math Integration Problem
3. Summary
2401.15391_MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Background and Motivation
Contributions
Method Overview
Experimental Results
Summary
Abstract
1 Introduction
Main Content Summary
Summary
2 RAG with multi-Hop queries
2.1 Overview of RAG (Retrieval-Augmented Generation)
2.2 Multi-Hop Queries
2.3 Evaluation Metrics
Summary
3 A Benchmarking Dataset: MultiHop-RAG
1. MultiHop-RAG Dataset Construction Pipeline
2. MultiHop-RAG Dataset Statistics
Summary
4 Benchmarking RAG system using MultiHop-RAG
1. Retrieval-Related Tasks
2. Generation-Related Tasks
3. Other Use Cases
Summary
5 Related Work
6 Conclusion
Limitations
Appendix A: GPT-4 Prompts Used for Data Generation
Appendix B: Dataset Examples
2405.16506_GRAG: Graph Retrieval-Augmented Generation
Abstract
1 Introduction
2 Related Work
2.1 Prompt Tuning
2.2 LLMs for Graph-Related Tasks
2.3 Retrieval Methods on Graphs
3 Problem Formalization
4 Methodology
Overview
4.1 Textual Subgraph Retrieval
Textual Subgraph Indexing
Textual Subgraph Ranking
Textual Subgraph Soft Pruning
Summary
4.2 Textual Graph Augmented Generation
1. Text View of Textual Graphs
2. Graph View of Textual Graphs
3. Generation Phase
Summary
5 Experiments
Summary of Chapter 5 (Experiments)
6 Conclusion
7 Limitations
Acknowledgments
Appendix A Appendix
Appendix A Summary
Summary
2407.01178_Memory3: Language Modeling with Explicit Memory
Research Background and Motivation
Main Content and Methods
Experiments and Results
Summary
Abstract
Core Idea
Features of the Memory3 Model
Experimental Results
Summary
1 | Introduction
1.1.1 | Retrieval-augmented Training
1.1.2 | Sparse Computation
1.1.3 | Parameter as Memory
Summary
2 | Memory Circuitry Theory
Summary of Core Concepts
Overall Contributions
Definition 2.
1. Definitions and Core Concepts: Computation Graphs, Isomorphism, and Knowledge (Circuits)
2. Examples of Knowledge
3. Externalization of Knowledge into Memory
4. Conclusions and Assertions
Summary
Remark 1.
1. Key Properties of the Circuit Construction
2. Formal Definition of Memory-Augmented LLMs
3. Write-Cost vs. Read-Cost Trade-off (the Memory Hierarchy)
4. Knowledge Usage Frequency and Memory Allocation
5. Diagrams and Conclusions
Summary
3 | Design
3.1 | Inference Process
3.2 | Writing and Reading Memory
Summary
3.3 | Memory Sparsification and Storage
1. Storage Challenges of Explicit Memory
2. Sparsification Strategies along Each Dimension
3. Compression Results
4. Deployment Options
5. Additional Notes and Recommendations
Summary
3.4 | Model Shape
3.5 | Training Designs
Summary
3.6 | Two-Stage Pretrain
1. The Two Stages of Pretraining
2. Optimizations for Continual Training
3. Preventing Information Leakage
Summary
4 | Pretraining Data
4.1 Data Collection
4.2 Filtering
4.3 Tokenizer
4.4 Knowledge Base
Summary
5 | Pretrain
1. Overall Pretraining Design
2. Training Set-up
3. Warmup Stage
4. Continual Train Stage
Summary
6 | Fine-tuning and Alignment
6.1 Supervised Finetuning (SFT)
6.2 Direct Preference Optimization (DPO)
7 | Evaluation
7.1 General Abilities
7.2 Conversational Ability
7.3 Hallucination and Factuality
7.4 Professional Tasks
Summary
7.5 | Inference Speed
Main Content Summary
Summary
8 | Conclusion
Acknowledgement
Appendix A Cost Estimation
Model Parameter Settings
Implicit Memory Cost
Explicit Memory Cost
External Information (e.g., RAG) Cost
Overall Comparison
Extended Discussion
Appendix B Vector Compression
Appendix C Supplementary Evaluation Results
2505.14683_Emerging Properties in Unified Multimodal Pretraining
LLM Summary
Abstract
1 Introduction
Core Content Summary
Summary
2 Model
1. Model Architecture Overview
2. Generation Strategy
3. Model Details
4. Generalized Causal Attention
5. Transformer Architecture Choices and Experiments
Summary
3 Data
Data Characteristics and Goals
Data Sources and Statistics
Data Construction Methods
Data Training Strategy
Summary
4 Training
1. Multi-Stage Training Strategy
2. Key Hyperparameter Adjustments
Summary
5 Evaluation
6 Emerging Properties
1. Definition of Emerging Properties and Research Background
2. Relationship between Task Performance and Training Stage
3. Importance of Multimodal Features
4. Qualitative Analysis and Generation-Quality Improvements
5. Key Findings and Conclusions
Summary
7 Main Results
7.1 Image Understanding
7.2 Image Generation
7.3 Image Editing
7.4 Generation/Editing with Reasoning
7.5 World Modeling
Summary
7.6 More Qualitative Results
8 Conclusion
9 Acknowledgement
MemOS: A Memory OS for AI System
LLM Summary
Abstract
1 Introduction
1. Background and Motivation
2. Shortcomings of Existing Approaches
3. Four Typical Challenges
4. The MemOS Proposal and Its Core Ideas
5. Summary and Significance
2 Memory in Large Language Models
Summary
1. The Four Stages of Memory Research
2. Stage 1: Memory Definition and Exploration
3. The Preliminary Vision of MemOS
4. Summary
2.1 Establishing Explicit Long-Term Memory (Stage 1)
2.2 Introducing Brain-Inspired Memory Mechanisms (Stage 2)
2.3 Tool-Based Memory Management (Stage 3)
2.4 Systematic Memory Governance (Stage 4)
Summary
3 MemOS Design Philosophy
1. The Vision of MemOS (3.1)
2. From Computer OS to Memory OS (3.2)
3. Summary
4 Memory Modeling in MemOS
4.1 Memory Types and Semantic Evolution Paths
4.2 Memory Cube (MemCube): The Core Memory Resource Unit
Summary
5 Architecture of MemOS
Summary: MemOS Architecture and Execution Flow
Summary
5.5.1 MemGovernance (Memory Governance Module)
5.5.2 MemVault (Memory Storage and Routing Infrastructure)
5.5.3 MemLoader and MemDumper (Memory Loading and Export Modules)
5.5.4 MemStore (Memory Storage and Distribution Interface)
Summary
6 Evaluation
1. End-to-End Evaluation on LOCOMO
2. Evaluation of Memory Retrieval
3. Evaluation of KV-Based Memory Acceleration
Summary
7 MemOS for Architecture Innovation and Applications
1. Architectural Innovations Driven by MemOS
2. Application Scenarios of MemOS
Summary
8 Conclusion
Other
Datasets & Dataset Distillation
1811.10959v3_Dataset Distillation
ABSTRACT
LLM Summary
1. INTRODUCTION
3. APPROACH
2502.20653_Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
Abstract
1. Introduction
2. Related Work
7. Conclusion
General
Dataset distillation
3D
2003.08934_NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Abstract
1. Introduction
2. Related Work
3. Neural Radiance Field Scene Representation
4. Volume Rendering with Radiance Fields
5. Optimizing a Neural Radiance Field
6. Result
7. Conclusion
2203.08586_Deep vanishing point detection: Geometric priors make dataset variations vanish
Concepts
Abstract
1. Introduction
2. Related Work
3. Geometric priors for VP detection
4. Experiments
5. Conclusion and limitations
2312.14132_DUSt3R: Geometric 3D Vision Made Easy
Keywords
Related Concepts
Abstract
1. Introduction
2. Related Work
3. Method
4. Experiments with DUSt3R
5. Conclusion
Appendix A
Appendix Overview
Appendix B. Qualitative results
Appendix C. Extended Related Work
Appendix D. Multi-view Pose Estimation
Appendix E. Visual Localization
Appendix F. Training details
2406.09756_MASt3R: Grounding Image Matching in 3D with MASt3R
Preface
Abstract
1. Introduction
🧠 Mind-Map Summary
2. Related works
🧠 Summary Mind-Map
3. Method
4. Experimental results
5. Conclusion
Appendix
Appendix A Additional Qualitative Results
B. Fast Reciprocal Matching
C. Coarse-to-Fine
D. Detailed experimental settings
2412.09401_SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos
Terminology
Abstract
1. Introduction
2. Related Work
3. Method
4. Experiments
5. Conclusion
6. Acknowledgements
Appendix
Appendix A Implementation details
Appendix B Details for experimental settings
Appendix C Additional comparisons and analyses
D. More visual results
2412.12392_MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
GPT
Prior Knowledge
Abstract
1. Introduction
2. Related Work
3. Method
4. Results
5. Limitations and Future Work
🧾 6. Conclusion
🧠 One-Sentence Summary
8. Initialisation
9. Runtime Breakdown
10. Evaluation Setup
11. Summary of EuRoC Results
2503.11651_VGGT: Visual Geometry Grounded Transformer
Abstract
1. Introduction
2. Related Work
3. Method
4. Experiments
5. Discussions
6. Conclusions
Appendix A Formal Definitions
Appendix B Implementation Details
Appendix C Additional Experiments
Appendix D Qualitative Examples
Appendix E Related Work
Other
A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS
The Basic Idea Behind CRC Algorithms
Polynomial Arithmetic
Binary Arithmetic with No Carries
A Working Example
Choosing A Poly
A Straightforward CRC Implementation
A Table-Driven Implementation
A Slightly Mangled Table-Driven Implementation
References
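The chapters above build from polynomial arithmetic up to a table-driven implementation. As a minimal sketch of that table-driven idea, assuming the common reflected CRC-32 parameters (polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF); this is illustrative and not the guide's own code:

```python
# Minimal illustrative sketch of a table-driven CRC, using the common
# reflected CRC-32 parameters; not code from the guide itself.

def make_table(poly=0xEDB88320):
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):                      # process one bit at a time
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

TABLE = make_table()

def crc32(data: bytes) -> int:
    crc = 0xFFFFFFFF                            # initial register value
    for b in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ b) & 0xFF]   # one table lookup per byte
    return crc ^ 0xFFFFFFFF                     # final XOR

print(hex(crc32(b"123456789")))                 # standard check value: 0xcbf43926
```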
Distributed Representations of Sentences and Documents
2410.07490_MoDEM: Mixture of Domain Expert Models
https://arxiv.org/html/2410.07490v1