Volume 18, No. 2, 2020
Sanjiv R. Das and Subir Varma
We present a reinforcement learning (RL) algorithm to solve for a dynamically optimal goal-based portfolio. The solution converges to that obtained from dynamic programming. Our approach is model-free and builds its solution by forward simulation, whereas dynamic programming relies on backward recursion. We also give a brief overview of the main types of RL. Our example application illustrates how RL may be applied to problems with path dependency and very large state spaces, both of which are often encountered in finance.
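As a rough illustration of the forward-simulation idea, the following Python sketch applies tabular Q-learning to a toy goal-based problem in which the reward is 1 if terminal wealth meets the goal and 0 otherwise. It is a minimal sketch under stated assumptions, not the algorithm as implemented in the paper: the horizon, goal, candidate portfolios, wealth grid, return model (lognormal), and learning hyperparameters are all hypothetical values chosen for illustration.

```python
import math
import random

# Hypothetical problem setup (every value here is an illustrative assumption).
T = 5                                                     # rebalancing periods
W0, GOAL = 100.0, 150.0                                   # initial wealth, goal
PORTFOLIOS = [(0.03, 0.05), (0.06, 0.12), (0.09, 0.20)]   # (mean, volatility)
GRID = [40.0 * 1.1 ** i for i in range(30)]               # geometric wealth grid


def bucket(w):
    """Index of the grid point nearest to wealth w, measured in log space."""
    return min(range(len(GRID)), key=lambda i: abs(math.log(GRID[i] / w)))


def step(w, a, rng):
    """Simulate one period of lognormal growth under portfolio a."""
    mu, sigma = PORTFOLIOS[a]
    z = rng.gauss(0.0, 1.0)
    return w * math.exp(mu - 0.5 * sigma ** 2 + sigma * z)


def q_learning(episodes=20000, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-learning driven purely by forward-simulated wealth paths.

    Q[t][s][a] estimates the probability of reaching GOAL when portfolio a
    is chosen at time t in wealth bucket s and the greedy policy is
    followed afterwards.
    """
    rng = random.Random(seed)
    n_a = len(PORTFOLIOS)
    Q = [[[0.0] * n_a for _ in GRID] for _ in range(T)]
    for _ in range(episodes):
        w = W0
        for t in range(T):
            s = bucket(w)
            # Epsilon-greedy exploration over the candidate portfolios.
            if rng.random() < epsilon:
                a = rng.randrange(n_a)
            else:
                a = max(range(n_a), key=lambda k: Q[t][s][k])
            w = step(w, a, rng)
            if t == T - 1:
                # Terminal reward: 1 if the goal is met, else 0.
                target = 1.0 if w >= GOAL else 0.0
            else:
                # No intermediate reward; bootstrap from the next state.
                target = max(Q[t + 1][bucket(w)])
            Q[t][s][a] += alpha * (target - Q[t][s][a])
    return Q


def greedy_action(Q, t, w):
    """Portfolio index with the highest learned value at time t and wealth w."""
    row = Q[t][bucket(w)]
    return max(range(len(row)), key=lambda k: row[k])
```

Calling `Q = q_learning()` and then `greedy_action(Q, 0, W0)` returns the index of the portfolio the learned table prefers at the start. The learner only samples returns through `step` and never uses the transition probabilities directly, which is the sense in which the sketch is model-free. A backward-recursion solver would instead need those probabilities for every grid point at every period.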