Kimi

A Brief Look at Chain of Thought and Reinforcement Learning in DeepSeek-R1 and Kimi k1.5 Papers
·1302 words·7 mins
AI · LLM · CoT · Reinforcement Learning · DeepSeek · Kimi · Model Distillation · Chain of Thought
A brief overview of the techniques behind the reasoning capabilities of DeepSeek-R1 and Kimi k1.5: DeepSeek uses the GRPO algorithm and model distillation to improve reasoning performance, while Kimi explores combining long Chain of Thought with reinforcement learning.