CoT
Chain of Thought + Reinforcement Learning: Innovations in DeepSeek-R1 and Kimi k1.5 Papers
1302 words · 7 mins
AI
LLM
CoT
Reinforcement Learning
DeepSeek
Kimi
Model Distillation
Chain of Thought
An in-depth analysis of the reasoning breakthroughs in DeepSeek-R1 and Kimi k1.5. It explores how DeepSeek improves reasoning through the GRPO algorithm and model distillation, and how Kimi innovates in long-form Chain of Thought and reinforcement learning.