-
GRPO From Scratch
This post explains my PyTorch implementation of the Group Relative Policy Optimization (GRPO) algorithm.
-
DPO From Scratch
This post explains my PyTorch implementation of the Direct Preference Optimization (DPO) algorithm.
-
On Reinforcement Learning for Large Language Models
Personal thoughts on why reinforcement learning is vital for Large Language Models. [Updated 02/21]
-
Flow models for Generative AI
As an alternative to diffusion models, Continuous Normalizing Flow Matching is one of the most powerful paradigms for generative AI modeling.
-
Optimal offline RL with the unified model-based framework
A model-based framework combined with the singleton absorbing MDP technique achieves the optimal rate for several challenging offline RL tasks.