-
GRPO From Scratch
This post explains my PyTorch implementation of the Group Relative Policy Optimization (GRPO) algorithm.
-
DPO From Scratch
This post explains my PyTorch implementation of the Direct Preference Optimization (DPO) algorithm.
-
On Reinforcement Learning for Large Language Models
My personal thoughts on why reinforcement learning is vital for large language models. [Updated 02/21]
-
Flow models for Generative AI
As an alternative to diffusion models, Continuous Normalizing Flow Matching is one of the most powerful paradigms for generative AI modeling.
-
a post with redirect
You can also redirect to assets such as PDFs.