Papers¶
General¶
- 1806.03377_PipeDream: Fast and Efficient Pipeline Parallel DNN Training
- 1706.02677_Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
- 1811.06965_GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
- 1412.6980_Adam: A Method for Stochastic Optimization (a minimal update-rule sketch follows this list)
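Since Adam appears in the list above, here is a minimal NumPy sketch of its update rule (bias-corrected first and second moments); the function and variable names are illustrative, not from any particular library.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step (Kingma & Ba, 2014): exponential moving averages of the
    gradient and its square, bias-corrected for the first few steps."""
    m = beta1 * m + (1 - beta1) * grad        # first moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```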
Agents¶
- ReAct: Synergizing Reasoning and Acting in Language Models (a minimal agent-loop sketch follows this list)
- Chat with the Environment
- Reflexion: Language Agents with Verbal Reinforcement Learning
- TaskMatrix.AI
- Generative Agents
- ChatDev: Communicative Agents for Software Development
- MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
- AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
- Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
- Data Interpreter: An LLM Agent For Data Science
- Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
- ADAS: Automated Design of Agentic Systems
- SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning
- AFlow: Automating Agentic Workflow Generation
- FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval
- 2504.01990_Advances and Challenges in Foundation Agents
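For the ReAct entry above, a minimal sketch of the Thought/Action/Observation loop; `llm` and the `TOOLS` table are hypothetical stand-ins for a chat model and a tool registry, not a real API.

```python
def llm(prompt: str) -> str:
    """Stand-in for a model call; a real agent would query an LLM here."""
    return "Thought: the stub answers directly. Final: 42"

TOOLS = {"search": lambda q: f"(stub search result for {q!r})"}

def react(question: str, max_steps: int = 5) -> str:
    """ReAct loop: the model interleaves reasoning (Thought) with tool calls
    (Action); each tool Observation is fed back into the context."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        if "Final:" in step:                   # model committed to an answer
            return step.split("Final:", 1)[1].strip()
        if "Action:" in step:                  # e.g. "Action: search[pipelines]"
            name, arg = step.split("Action:", 1)[1].strip().split("[", 1)
            transcript += f"Observation: {TOOLS[name.strip()](arg.rstrip(']'))}\n"
    return "no answer within step budget"

print(react("What is 6 * 7?"))  # -> "42" with the stub model
```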
AIOS¶
LLM Tuning¶
- Prefix-Tuning: Optimizing Continuous Prompts for Generation
- P-Tuning: GPT Understands, Too
- Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning
- LoRA: Low-Rank Adaptation of Large Language Models (a minimal LoRA sketch follows this list)
- QLoRA: Efficient Finetuning of Quantized LLMs
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
- DoRA: Weight-Decomposed Low-Rank Adaptation
- LoRA+: Efficient Low Rank Adaptation of Large Models
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
- 2305.20050_Let’s Verify Step by Step
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
- Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
- 2203.02155_Training language models to follow instructions with human feedback (InstructGPT)
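For the LoRA entry above, a minimal PyTorch sketch of the idea: freeze the pretrained weight and learn a rank-r update B·A scaled by alpha/r, so the effective weight is W + (alpha/r)·B·A. The class name and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA wrapper: the base layer is frozen; only the low-rank
    factors A and B are trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                    # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))  # only A and B receive gradients
```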
Distributed Models¶
- General
- 1701.06538_MoE: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- 1806.03377_PipeDream: Fast and Efficient Pipeline Parallel DNN Training
- 1811.06965_GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (a micro-batching sketch follows this list)
- 1909.08053_Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- 19xx_PipeDream: Generalized Pipeline Parallelism for DNN Training
- 2006.15704_PyTorch Distributed: Experiences on Accelerating Data Parallel Training
- 2006.16668_GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
- 2006.09503_PipeDream-2BW: Memory-Efficient Pipeline-Parallel DNN Training
- 2104.04473_Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
- 2205.14135_FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- 2307.08691_FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
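For the GPipe entry above, a single-process sketch of the micro-batching schedule: the minibatch is split so that, with stages on separate devices, a later stage could start on micro-batch 0 while an earlier stage works on micro-batch 1. Here both stages run sequentially on one device, so only the schedule is illustrated, not the overlap.

```python
import torch
import torch.nn as nn

# Two pipeline "stages"; in real GPipe each stage lives on its own accelerator.
stage1 = nn.Sequential(nn.Linear(512, 512), nn.ReLU())
stage2 = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

def pipelined_forward(batch: torch.Tensor, n_micro: int = 4) -> torch.Tensor:
    """Split the minibatch into micro-batches (GPipe's core trick) and push
    each one through both stages; outputs are re-assembled at the end."""
    outputs = [stage2(stage1(mb)) for mb in batch.chunk(n_micro)]
    return torch.cat(outputs)

y = pipelined_forward(torch.randn(32, 512))  # -> shape (32, 10)
```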
MoE¶
NLP¶
- GPT1: Improving Language Understanding by Generative Pre-Training
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- GPT2: Language Models are Unsupervised Multitask Learners
- CPM: A Large-scale Generative Chinese Pre-trained Language Model
- LLaMA: Open and Efficient Foundation Language Models
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Qwen Technical Report
- DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
- ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
- 2407.10671_Qwen2 Technical Report
- 2412.15115_Qwen2.5 Technical Report
LLM MoE¶
LLM Multimodal¶
- 2304.08485_LLaVA: Visual Instruction Tuning
- 2308.12966_Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
- 2310.03744_LLaVA-1.5: Improved Baselines with Visual Instruction Tuning
- 2403.05525_DeepSeek-VL: Towards Real-World Vision-Language Understanding
- 2408.01800_MiniCPM-V: A GPT-4V Level MLLM on Your Phone
- 2409.17146_Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
- 2412.04468_NVILA: Efficient Frontier Visual Language Models
- 2502.13923_Qwen2.5-VL Technical Report
- 2503.20215_Qwen2.5-Omni Technical Report
LLM Reinforcement Learning¶
LLM Quantization¶
3D¶
- Deep vanishing point detection: Geometric priors make dataset variations vanish
- 2312.14132_DUSt3R: Geometric 3D Vision Made Easy
- 2406.09756_MASt3R: Grounding Image Matching in 3D with MASt3R
- 2412.09401_SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos
- 2412.12392_MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
- 2503.11651_VGGT: Visual Geometry Grounded Transformer
LLM Safety¶
Benchmarking¶
Datasets & Data Distillation¶
Framework¶
- 1712.05889_Ray: A Distributed Framework for Emerging AI Applications (a task-API sketch follows this list)
- 1910.02054_DeepSpeed_ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- PyTorch: An Imperative Style, High-Performance Deep Learning Library
- Transformers: State-of-the-Art Natural Language Processing
- 2210.XX_Ray v2 Architecture
- 2309.06180_Efficient Memory Management for Large Language Model Serving with PagedAttention
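For the Ray entry above, a minimal example of Ray's task model: `@ray.remote` and `ray.get` are Ray's actual interface, while the function itself is just a toy.

```python
import ray

ray.init()  # start a local Ray runtime

@ray.remote
def square(x: int) -> int:
    # A Ray task: scheduled anywhere in the cluster; calling .remote()
    # returns an object ref immediately instead of the result.
    return x * x

refs = [square.remote(i) for i in range(8)]  # submit 8 tasks in parallel
print(ray.get(refs))                         # [0, 1, 4, 9, 16, 25, 36, 49]
```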
ML¶
- WebGPT: Browser-assisted question-answering with human feedback
- Teaching language models to support answers with verified quotes
- FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
- Evaluating Verifiability in Generative Search Engines
- Citation: A Key to Building Responsible and Accountable Large Language Models
- HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
- Enabling Large Language Models to Generate Text with Citations
ML Multimodal¶
- 2108.03353_Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
- 2209.08199_ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
- 2212.06817_RT-1: Robotics Transformer for Real-World Control at Scale
- 2401.10935_SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
- 2402.04615_ScreenAI: A Vision-Language Model for UI and Infographics Understanding
- 2411.02059_TableGPT2: A Large Multimodal Model with Tabular Data Integration
ML Vision¶
- 1506.02640_You Only Look Once: Unified, Real-Time Object Detection (a greedy NMS sketch follows this list)
- 1612.08242_YOLO9000: Better, Faster, Stronger
- 1804.02767_YOLOv3: An Incremental Improvement
- 2004.10934_YOLOv4: Optimal Speed and Accuracy of Object Detection
- 2205.00159_SVTR: Scene Text Recognition with a Single Visual Model
- 2207.02696_YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
- Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
- 2304.08485_Visual Instruction Tuning
- 2402.13616_YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
- 2405.14458_YOLOv10: Real-Time End-to-End Object Detection
- 2411.15858_SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
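The YOLO family above relies on greedy non-maximum suppression to prune overlapping detections; a minimal NumPy sketch, with boxes as [x1, y1, x2, y2]:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it
    above the threshold, repeat on the remainder."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        order = order[1:][iou(boxes[i], boxes[order[1:]]) < iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))  # -> [0, 2]: box 1 is suppressed
```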
RAG¶
- 2005.11401_Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (a retrieval sketch follows this list)
- 2312.10997_Retrieval-Augmented Generation for Large Language Models: A Survey
- 2401.15884_CRAG: Corrective Retrieval Augmented Generation
- 2403.14403_Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
- 2404.16130_From Local to Global: A Graph RAG Approach to Query-Focused Summarization
- 2405.16506_GRAG: Graph Retrieval-Augmented Generation
- GraphRAG official documentation
- 2406.13213_Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata
- 2410.10450_KBLaM: Knowledge Base augmented Language Model
- 2504.03137_LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph
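For the base RAG entry above, a minimal sketch of the retrieve-then-prompt step; `embed` is a toy bag-of-characters stand-in for a real sentence encoder, and the prompt template is illustrative.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a sentence encoder; swap in a real embedding model."""
    v = np.zeros(64)
    for ch in text.lower():
        v[ord(ch) % 64] += 1.0
    return v

def retrieve(query: str, docs, doc_vecs: np.ndarray, k: int = 3):
    """Core RAG step: rank documents by cosine similarity to the query."""
    q = embed(query)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

def rag_prompt(query: str, docs, doc_vecs) -> str:
    """Stuff the retrieved passages into the prompt for a grounded answer."""
    context = "\n".join(retrieve(query, docs, doc_vecs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = ["LoRA adapts frozen weights.", "GPipe pipelines layers.", "RAG retrieves context."]
print(rag_prompt("What does RAG do?", docs, np.stack([embed(d) for d in docs])))
```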
Tools¶
Mobile Applications¶
- AppAgent: Multimodal Agents as Smartphone Users
- Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
- 2501.11733_Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
- 2501.12326_UI-TARS: Pioneering Automated GUI Interaction with Native Agents
- 2502.14282_PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
AGI¶
Others¶
Highlighting the top ML papers every week: https://github.com/dair-ai/ML-Papers-of-the-Week