Variance Reduction Technique for Optimal Offline RL
An algorithm that achieves the minimax rate for tabular RL
Why can't we surpass the speed of light? Einstein explains
Only senior high school knowledge is needed to understand this phenomenon!
The TMIS (plug-in) estimator is statistically efficient for tabular OPE
Surprisingly, the Monte Carlo on-policy estimator is statistically inefficient.
Some nice images