LLM Ecosystem Technologies
Framework
- 1712.05889_Ray: A Distributed Framework for Emerging AI Applications (see the sketch after this list)
- 1910.02054_DeepSpeed_ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- PyTorch: An Imperative Style, High-Performance Deep Learning Library
- Transformers: State-of-the-Art Natural Language Processing
- 2210.XX_Ray v2 Architecture
- 2309.06180_vLLM: Efficient Memory Management for Large Language Model Serving with PagedAttention
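Ray's core abstraction in the first paper above is turning plain Python functions into distributed tasks that return futures. A minimal sketch of that task API, assuming a local `pip install ray`; the squaring workload is invented purely for illustration:

```python
# Minimal sketch of Ray's remote-task API; the workload is made up for illustration.
import ray

ray.init()  # start a local Ray runtime

@ray.remote
def square(x: int) -> int:
    # Executes as a stateless task on whichever worker the scheduler picks.
    return x * x

# Calls return ObjectRef futures immediately; the tasks run in parallel.
futures = [square.remote(i) for i in range(8)]

# ray.get blocks until the results are available in the object store.
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
```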
Large Model Fine-Tuning
- 2101.00190_Prefix-Tuning: Optimizing Continuous Prompts for Generation
- 2103.10385_p-tuning: GPT Understands, Too
- 2104.08691_Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning
- 2106.09685_LoRA: Low-Rank Adaptation of Large Language Models (see the sketch after this list)
- 2401.01335_SPIN: Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
- 2402.09353_DoRA: Weight-Decomposed Low-Rank Adaptation
- 2402.12354_LoRA+: Efficient Low Rank Adaptation of Large Models
- 2403.03507_GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
- 2403.13372_LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
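LoRA (2106.09685 above) freezes the pretrained weight `W` and trains only a low-rank update `ΔW = (α/r)·B·A`, so the trainable parameter count drops to `r·(d_in + d_out)` per adapted layer. A minimal PyTorch sketch of that idea; the `LoRALinear` wrapper, its initialization constants, and the example shapes are illustrative, not the reference implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update W + (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the pretrained weight stays frozen
        self.scale = alpha / r
        # A starts small and random, B starts at zero, so training begins at the base model.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

# Usage: swap in for a projection layer and train only A and B.
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
print(layer(torch.randn(2, 10, 768)).shape)  # torch.Size([2, 10, 768])
```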
Distributed Models
- 1701.06538_MoE: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (see the sketch after this list)
- 1806.03377_PipeDream: Fast and Efficient Pipeline Parallel DNN Training
- 1811.06965_GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
- 1909.08053_Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- 19xx_PipeDream: Generalized Pipeline Parallelism for DNN Training
- 2006.09503_PipeDream-2BW: Memory-Efficient Pipeline-Parallel DNN Training
- 2006.15704_PyTorch Distributed: Experiences on Accelerating Data Parallel Training
- 2006.16668_GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
- 2104.04473_Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
- 2205.14135_FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- 2307.08691_FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
- General
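The sparsely-gated Mixture-of-Experts layer (1701.06538 above) grows parameter count without growing per-token compute by routing each token to only a few experts. A compact single-device sketch of top-k routing under that idea; real systems shard experts across devices (GShard) and batch tokens per expert rather than looping as done here:

```python
# Single-device sketch of a sparsely-gated MoE layer with top-k routing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Route each token to its top-k experts and mix their outputs by gate weight."""

    def __init__(self, d_model: int, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                           nn.Linear(4 * d_model, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        scores = self.gate(x)                                # (tokens, num_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # keep k experts per token
        weights = F.softmax(topk_scores, dim=-1)             # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                # tokens whose slot-th choice is expert e
                if mask.any():
                    w = weights[mask][:, slot:slot + 1]      # (n_selected, 1)
                    out[mask] = out[mask] + w * expert(x[mask])
        return out

moe = TopKMoE(d_model=64)
print(moe(torch.randn(16, 64)).shape)  # torch.Size([16, 64])
```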
LLM Quantization
- General
- 2110.02861_bitsandbytes: 8-bit Optimizers via Block-wise Quantization (see the sketch after this list)
- 2206.01861_ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
- 2206.09557_LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
- 2208.07339_LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
- 2209.05433_FP8: FP8 Formats For Deep Learning
- 2210.17323_GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
- 2211.10438_SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
- 2305.14314_QLoRA: Efficient Finetuning of Quantized LLMs
- 2306.00978_AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
- 2309.05516_AutoRound: Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
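Most of the weight-only methods above share a common building block: block-wise (group-wise) absmax quantization, where each block of weights gets its own scale so an outlier only degrades its own block (this is the block-wise scheme of the bitsandbytes entry; GPTQ and AWQ add calibration-based refinements on top). A minimal round-to-nearest sketch of that building block; the function names and the 128-column group size are illustrative choices:

```python
import torch

def quantize_groupwise(w: torch.Tensor, group_size: int = 128, bits: int = 8):
    """Symmetric absmax quantization of a 2-D weight with one scale per group of columns."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    qmax = 2 ** (bits - 1) - 1                                  # 127 for int8, 7 for int4
    groups = w.reshape(out_features, in_features // group_size, group_size)
    scale = groups.abs().amax(dim=-1, keepdim=True).clamp_min(1e-8) / qmax
    q = torch.clamp((groups / scale).round(), -qmax - 1, qmax).to(torch.int8)
    return q, scale                                             # int4 values also fit in int8 storage here

def dequantize_groupwise(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return (q.to(torch.float32) * scale).reshape(q.shape[0], -1)

w = torch.randn(256, 512)
q, s = quantize_groupwise(w, group_size=128, bits=8)
err = (w - dequantize_groupwise(q, s)).abs().max().item()
print(q.dtype, s.shape, round(err, 5))   # torch.int8 torch.Size([256, 4, 1]) small round-off error
```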
LLM Safety
LLM Reinforcement Learning
Others
- 2203.02155_Training language models to follow instructions with human feedback (InstructGPT)
- 2305.20050_Let’s Verify Step by Step
- 2408.03314_Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (see the sketch after this list)
- 2412.14135_Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
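One concrete form of the test-time-compute scaling discussed in 2408.03314 is best-of-N sampling: draw several candidate answers and keep the one a reward model scores highest (the scorer can be an outcome- or process-supervised verifier in the spirit of 2305.20050). A hedged sketch under those assumptions; `generate` and `reward` are hypothetical stand-ins, not a real API:

```python
import random
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              reward: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n candidate answers and return the one the reward model scores highest."""
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda answer: reward(prompt, answer))

# Toy stand-ins so the sketch runs end to end; a real setup would call an LLM sampler
# and a trained reward model here.
def toy_generate(prompt: str) -> str:
    return f"candidate-{random.randint(0, 999)}"

def toy_reward(prompt: str, answer: str) -> float:
    return (hash(answer) % 1000) / 1000.0   # pretend score in [0, 1)

print(best_of_n("What is 17 * 24?", toy_generate, toy_reward, n=8))
```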