On Reinforcement Learning for Large Language Models
Personal thoughts on why reinforcement learning is vital for Large Language Models. [Updated 02/21]
Flow models for Generative AI
As an alternative to diffusion models, Continuous Normalizing Flow Matching is one of the most powerful paradigms for generative modeling.
Optimal offline RL with the unified model-based framework
A model-based framework combined with the singleton absorbing MDP technique achieves the optimal rate for several challenging offline RL tasks.
A Brief Summary of Upper Bounds for Bandit Problems
This post summarizes the regret analysis of the Exploration-First algorithm and the Upper Confidence Bound (UCB) algorithm for multi-armed bandit (MAB) problems, and of the LinUCB algorithm for linear bandits.
A Brief Introduction to the Influence Function Technique
The influence function technique is powerful in that it provides a way to calculate the efficiency bound for semiparametric estimation problems.