-
A Brief Summary of Upper Bounds for Bandit Problems
This post summarizes the regret analysis of the Exploration-First algorithm and the Upper Confidence Bound (UCB) algorithm for multi-armed bandit (MAB) problems, and of the LinUCB algorithm for linear bandits.
-
A Brief Introduction to Influence Function Technique
The influence function technique is powerful because it provides a way to calculate the efficiency bound for semiparametric estimation problems.
-
Variance Reduction Technique for Optimal Offline RL
An algorithm that achieves the minimax rate for tabular RL
-
Why can't we surpass the speed of light? Einstein tells you
Only senior high school knowledge is needed to understand this phenomenon!
-
TMIS (Plug-in) estimator is statistically efficient for Tabular OPE
Surprisingly, the Monte Carlo on-policy estimator is actually statistically inefficient.