See my posts on implementing DPO and GRPO from scratch! :sparkles: