Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction
Source: arXiv cs.AI
A new temporal-difference learning method, STHTD-MP, has been proposed that improves off-policy prediction using a linear function…