Kimi
A Brief Look at Chain of Thought and Reinforcement Learning in the DeepSeek-R1 and Kimi k1.5 Papers
·1302 words·7 mins
AI
LLM
CoT
Reinforcement Learning
DeepSeek
Kimi
Model Distillation
Chain of Thought
A brief overview of the techniques behind the reasoning capabilities of DeepSeek-R1 and Kimi k1.5: DeepSeek-R1 uses the GRPO (Group Relative Policy Optimization) algorithm and model distillation to improve reasoning performance, while Kimi k1.5 explores combining long Chain of Thought with reinforcement learning.