Papers¶
General¶
Agents¶
- 2210.03629_ReAct
- 2303.08268_Chat-with-the-Environment
- 2303.11366_Reflexion: Language Agents with Verbal Reinforcement Learning
- 2303.16434_TaskMatrix.AI
- 2304.03442_Generative-Agents
- 2307.07924_ChatDev: Communicative Agents for Software Development
- 2308.00352_MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
- 2308.04026_AgentSims: An Open-Source Sandbox for Large Language Model Evaluation
- 2308.08155_AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
- 2308.10848_AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
- 2310.06117_Step-Back: Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
- 2402.18679_MetaGPT_DI: Data Interpreter: An LLM Agent For Data Science
- 2407.07061_IoA: Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence
- 2408.08435_ADAS: Automated Design of Agentic Systems
- 2410.17238_SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning
- 2410.10762_AFlow: Automating Agentic Workflow Generation
- 2410.21012_FACT: Examining the Effectiveness of Iterative Context Rewriting for Multi-fact Retrieval
- 2504.01990_Advances and Challenges in Foundation Agents
- 2506.12508_AgentOrchestra: A Hierarchical Multi-Agent Framework for General-Purpose Task Solving
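Most of the agent papers above build on the Thought → Action → Observation loop introduced by ReAct (2210.03629). A minimal sketch of that loop, with a scripted stand-in for the model (`fake_llm` and the single-tool `TOOLS` table are hypothetical, not from any listed paper):

```python
import re

# One hypothetical tool; a real agent would expose search, code execution, etc.
TOOLS = {"calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def fake_llm(transcript: str) -> str:
    # Stand-in for an LLM call: scripts a single two-step ReAct episode.
    if "Observation:" not in transcript:
        return "Thought: I should compute this.\nAction: calculator[17 * 23]"
    return "Thought: I have the result.\nFinal Answer: 391"

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = fake_llm(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        m = re.search(r"Action: (\w+)\[(.+)\]", step)
        if m:  # run the requested tool and feed the observation back
            obs = TOOLS[m.group(1)](m.group(2))
            transcript += f"Observation: {obs}\n"
    return "no answer"

print(react("What is 17 * 23?"))  # -> 391
```

The multi-agent frameworks in this list (MetaGPT, AutoGen, AgentVerse) can be read as compositions of several such loops with structured message passing between them.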
Visual Agents & AIOS¶
- 2312.13771_AppAgent: Multimodal Agents as Smartphone Users
- 2402.07939_UFO: A UI-Focused Agent for Windows OS Interaction
- 2406.01014_Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
- 2501.11733_Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
- 2501.12326_UI-TARS: Pioneering Automated GUI Interaction with Native Agents
- 2502.14282_PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
- 2403.16971_AIOS: LLM Agent Operating System
- 2504.14603_UFO2: The Desktop AgentOS
LLM Tuning¶
- 2101.00190_Prefix-Tuning: Optimizing Continuous Prompts for Generation
- 2103.10385_p-tuning: GPT Understands, Too
- 2104.08691_Prompt Tuning: The Power of Scale for Parameter-Efficient Prompt Tuning
- 2106.09685_LoRA: Low-Rank Adaptation of Large Language Models
- 2401.01335_Self-Play: Fine-Tuning Converts Weak Language Models to Strong Language Models
- 2402.09353_DoRA: Weight-Decomposed Low-Rank Adaptation
- 2402.12354_LoRA+: Efficient Low Rank Adaptation of Large Models
- 2403.03507_GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
- 2403.13372_LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
- 2203.02155_Training language models to follow instructions with human feedback(InstructGPT)
- 2305.20050_Let’s Verify Step by Step
- 2408.03314_Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
- 2412.14135_Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
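Several entries above (LoRA, DoRA, LoRA+) share one core idea: freeze the pretrained weight W and learn a low-rank update scaled by alpha/r. A minimal sketch of the LoRA forward pass (shapes and init choices follow 2106.09685; the variable names are illustrative):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=4):
    """LoRA: the frozen weight W is augmented with a trainable low-rank
    update, so the adapted layer computes W x + (alpha / r) * B (A x)
    while only A and B receive gradients during fine-tuning."""
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 4
W = rng.standard_normal((d_out, d_in))   # pretrained, frozen
A = rng.standard_normal((r, d_in)) * 0.01  # small random init
B = np.zeros((d_out, r))                   # zero init: adapter starts as a no-op
x = rng.standard_normal(d_in)

# With B = 0 the update vanishes, so step 0 reproduces the base model exactly.
assert np.allclose(lora_forward(x, W, A, B), W @ x)
```

The zero-initialized B is what makes LoRA safe to bolt onto a trained model: the first forward pass is identical to the frozen baseline.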
Distributed Models¶
- General
- 1701.06538_MoE: Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
- 1806.03377_PipeDream: Fast and Efficient Pipeline Parallel DNN Training
- 1811.06965_GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
- 1909.08053_Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- 19xx_PipeDream: Generalized Pipeline Parallelism for DNN Training
- 2006.15704_PyTorch Distributed: Experiences on Accelerating Data Parallel Training
- 2006.16668_GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
- 2006.09503_PipeDream-2BW: Memory-Efficient Pipeline-Parallel DNN Training
- 2104.04473_Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
- 2205.14135_FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- 2307.08691_FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
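The MoE entries above (1701.06538, GShard) rest on sparse top-k gating: all experts are scored, only the best k run, and their outputs are mixed by renormalized softmax weights, so parameter count scales with the number of experts while compute scales with k. A toy single-token sketch (names illustrative, no load balancing or expert parallelism):

```python
import numpy as np

def moe_forward(x, gate_W, experts, k=2):
    """Sparsely-gated MoE: score all experts, run only the top-k,
    mix their outputs by a softmax over the selected logits."""
    logits = gate_W @ x
    top = np.argsort(logits)[-k:]            # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                             # softmax over selected experts only
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
d, n_experts = 4, 8
x = rng.standard_normal(d)
gate_W = rng.standard_normal((n_experts, d))
expert_W = [rng.standard_normal((d, d)) for _ in range(n_experts)]
experts = [lambda v, W=W: W @ v for W in expert_W]  # each expert: a linear map

y = moe_forward(x, gate_W, experts, k=2)  # only 2 of 8 experts executed
```

GShard's contribution is sharding these experts across devices and routing tokens to them automatically; the gating math stays the same.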
LLM NLP¶
- 18xx_GPT1: Improving Language Understanding by Generative Pre-Training
- 1810.04805_BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- 19xx_GPT2: Language Models are Unsupervised Multitask Learners
- 2012.00413_CPM: A Large-scale Generative Chinese Pre-trained Language Model
- 2302.13971_LLaMA: Open and Efficient Foundation Language Models
- 2307.09288_Llama 2: Open Foundation and Fine-Tuned Chat Models
- 2309.16609_Qwen Technical Report
- 2401.14196_DeepSeek-Coder: When the Large Language Model Meets Programming – The Rise of Code Intelligence
- 2404.06395_MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
- 2405.04434_DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
- 2406.12793_ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
- 2407.10671_Qwen2 Technical Report
- 2412.15115_Qwen2.5
- 2505.09388_Qwen3
LLM MoE¶
LLM Multimodal¶
- 2304.08485_LLaVA: Visual Instruction Tuning
- 2308.12966_Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
- 2310.03744_LLaVA-1.5: Improved Baselines with Visual Instruction Tuning
- 2312.07533_VILA: On Pre-training for Visual Language Models
- 2403.05525_DeepSeek-VL: Towards Real-World Vision-Language Understanding
- 2408.01800_MiniCPM-V: A GPT-4V Level MLLM on Your Phone
- 2409.17146_Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
- 2411.00774_Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
- 2412.04468_NVILA: Efficient Frontier Visual Language Models
- 2502.13923_Qwen2.5-VL
- 2503.20215_Qwen2.5-Omni Technical Report
- 2506.13642_Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model
LLM Audio¶
- 2005.08100_Conformer: Convolution-augmented Transformer for Speech Recognition
- 2112.02418_YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
- 2212.04356_whisper: Robust Speech Recognition via Large-Scale Weak Supervision
- 2301.02111_Vall-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
- 2303.03926_VALL-E_X: Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
- 2406.05370_VALL-E2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
- 2407.05407_CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
- 2407.10759_Qwen2-Audio Technical Report
- 2410.00037_Moshi: a speech-text foundation model for real-time dialogue
- 2412.10117_CosyVoice2: Scalable Streaming Speech Synthesis with Large Language Models
- 2501.06282_MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
- 2505.02707_Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
- 2505.17589_CosyVoice3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
LLM Reinforcement Learning¶
LLM Quantization¶
- General
- 2110.02861_bitsandbytes: 8-bit Optimizers via Block-wise Quantization
- 2206.01861_ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
- 2206.09557_LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
- 2208.07339_LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
- 2209.05433_FP8: FP8 Formats For Deep Learning
- 2210.17323_GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
- 2211.10438_SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
- 2305.14314_QLoRA: Efficient Finetuning of Quantized LLMs
- 2306.00978_AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
- 2309.05516_AutoRound: Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
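The common baseline that the post-training methods above (LLM.int8(), GPTQ, AWQ, SmoothQuant) refine is symmetric absmax int8 quantization: map the largest-magnitude weight to 127 and round everything else onto the int8 grid. A round-trip sketch (illustrative, per-tensor scaling only; the listed papers add per-channel/group scaling and error compensation):

```python
import numpy as np

def quantize_absmax(w):
    """Symmetric absmax quantization: one fp scale per tensor,
    int8 codes in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 2.54], dtype=np.float32)
q, scale = quantize_absmax(w)
w_hat = dequantize(q, scale)

# Round-trip error is bounded by half a quantization step.
assert np.all(np.abs(w - w_hat) <= scale / 2 + 1e-6)
```

The weakness this exposes, and which motivates LLM.int8() and SmoothQuant, is outliers: a single large value inflates `scale` and crushes the resolution left for the small weights.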
LLM Closed-Source Models¶
3D¶
- 2003.08934_NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
- 2203.08586_Deep vanishing point detection: Geometric priors make dataset variations vanish
- 2312.14132_DUSt3R: Geometric 3D Vision Made Easy
- 2406.09756_MASt3R: Grounding Image Matching in 3D with MASt3R
- 2412.09401_SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos
- 2412.12392_MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
- 2503.11651_VGGT: Visual Geometry Grounded Transformer
LLM Safety¶
Benchmarking¶
- General
- 2009.03300_MMLU: Measuring Massive Multitask Language Understanding
- 2103.03874_MATH: Measuring Mathematical Problem Solving With the MATH Dataset
- 2311.12022_GPQA: A Graduate-Level Google-Proof Q&A Benchmark
- 2311.12983_GAIA: a benchmark for General AI Assistants
- 2404.07972_OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
- 2411.04368_SimpleQA: Measuring short-form factuality in large language models
- 2501.14249_HLE: Humanity’s Last Exam
Datasets & Data Distillation¶
Framework¶
- 1712.05889_Ray: A Distributed Framework for Emerging AI Applications
- 1910.02054_DeepSpeed_ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- 1912.01703_PyTorch: An Imperative Style, High-Performance Deep Learning Library
- 1910.03771_Transformers: State-of-the-Art Natural Language Processing
- 2210.XX_Ray v2 Architecture
- 2309.06180_Efficient Memory Management for Large Language Model Serving with PagedAttention
ML¶
- 2112.09332_WebGPT: Browser-assisted question-answering with human feedback
- 2203.11147_GopherCite: Teaching language models to support answers with verified quotes
- 2305.14251_FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
- 2304.09848_Generative_Search: Evaluating Verifiability in Generative Search Engines
- 2307.02185_Citation: A Key to Building Responsible and Accountable Large Language Models
- 2307.16883_HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
- 2305.14627_ALCE: Enabling Large Language Models to Generate Text with Citations
ML Multimodal¶
- 2108.03353_Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
- 2209.08199_ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
- 2212.06817_RT-1: Robotics Transformer for Real-World Control at Scale
- 2401.10935_SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
- 2402.04615_ScreenAI: A Vision-Language Model for UI and Infographics Understanding
- 2411.02059_TableGPT2: A Large Multimodal Model with Tabular Data Integration
ML Vision¶
- 1506.02640_You Only Look Once: Unified, Real-Time Object Detection
- 1612.08242_YOLO9000: Better, Faster, Stronger
- 1804.02767_YOLOv3
- 2004.10934_YOLOv4: Optimal Speed and Accuracy of Object Detection
- 2205.00159_SVTR: Scene Text Recognition with a Single Visual Model
- 2207.02696_YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
- 2303.05499_Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
- 2402.13616_YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
- 2405.14458_YOLOv10: Real-Time End-to-End Object Detection
- 2411.15858_SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
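Across the YOLO line above, the shared post-processing step is greedy non-maximum suppression over IoU, which is exactly what the "end-to-end" design of YOLOv10 aims to remove. A minimal sketch (boxes as `(x1, y1, x2, y2)`; illustrative, not any paper's reference implementation):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the best-scoring box, drop rivals that overlap it
    above thresh, repeat on the remainder."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # -> [0, 2]: the second box overlaps the first and is suppressed
```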
RAG¶
- 2005.11401_Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- 2312.10997_Retrieval-Augmented Generation for Large Language Models: A Survey
- 2401.15884_CRAG: Corrective Retrieval Augmented Generation
- 2403.14403_Adaptive-RAG
- 2404.16130_From Local to Global: A Graph RAG Approach to Query-Focused Summarization
- 2405.16506_GRAG: Graph Retrieval-Augmented Generation
- GraphRAG official documentation
- 2406.13213_Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata
- 2410.10450_KBLaM: Knowledge Base augmented Language Model
- 2504.03137_LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph
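Every RAG variant above shares the same skeleton from 2005.11401: score documents against the query, stuff the top-k into the prompt, generate. A dependency-free sketch with bag-of-words cosine retrieval (names and toy corpus are illustrative; real systems use dense embeddings or, as in GraphRAG, graph traversal):

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by similarity to the query; return the top k."""
    q = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())),
                    reverse=True)
    return ranked[:k]

docs = [
    "LoRA adapts large language models with low-rank matrices",
    "FlashAttention reduces memory traffic for exact attention",
    "GraphRAG builds a knowledge graph before answering queries",
]
context = retrieve("how does low-rank adaptation work", docs, k=1)
prompt = "Answer using the context:\n" + "\n".join(context) + \
         "\nQ: how does low-rank adaptation work"
```

CRAG and Adaptive-RAG both intervene between `retrieve` and generation: grading the retrieved context and deciding whether to re-retrieve or answer directly.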
Tools¶
AGI¶
others¶
Highlighting the top ML papers every week: https://github.com/dair-ai/ML-Papers-of-the-Week