Abstract
Options are important financial instruments whose prices are usually determined by computational methods. Computational finance is a compelling application area for reinforcement learning research: hard sequential decision-making problems abound there and have great practical significance. In this paper, we investigate reinforcement learning methods, in particular least squares policy iteration (LSPI), for the problem of learning an exercise policy for American options. We also investigate a related method by Tsitsiklis and Van Roy, referred to here as FQI. We compare LSPI and FQI with LSM, the standard least squares Monte Carlo method from the finance community, evaluating all three on both real and synthetic data. The results show that the exercise policies discovered by LSPI and FQI yield larger payoffs than those discovered by LSM, on both real and synthetic data. Our work shows that solution methods developed in reinforcement learning can advance the state of the art in an important and challenging application area, and it further demonstrates that computational finance remains an under-explored domain for the deployment of reinforcement learning methods.
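To make the baseline concrete, below is a minimal sketch of least squares Monte Carlo (LSM) in the style of Longstaff and Schwartz (2001): price an American put by simulating paths of the underlying and, stepping backwards in time, regressing continuation values on basis functions of the stock price. The market parameters (spot, strike, rate, volatility, horizon) and the quadratic polynomial basis are illustrative assumptions, not taken from the paper's experiments.

```python
# Sketch of least squares Monte Carlo (LSM) for an American put.
# All parameters below are illustrative, not from the paper.
import numpy as np

def lsm_american_put(s0=36.0, strike=40.0, r=0.06, sigma=0.2,
                     T=1.0, n_steps=50, n_paths=10000, seed=0):
    """Estimate the price of an American put by regressing continuation
    values on basis functions of the stock price (here: 1, x, x^2)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    disc = np.exp(-r * dt)

    # Simulate geometric Brownian motion paths of the underlying.
    z = rng.standard_normal((n_paths, n_steps))
    log_paths = np.cumsum((r - 0.5 * sigma**2) * dt
                          + sigma * np.sqrt(dt) * z, axis=1)
    s = s0 * np.exp(log_paths)            # shape: (n_paths, n_steps)

    # Work backwards from maturity; cashflow starts as the terminal payoff.
    cashflow = np.maximum(strike - s[:, -1], 0.0)
    for t in range(n_steps - 2, -1, -1):
        cashflow *= disc                  # discount one step back
        itm = (strike - s[:, t]) > 0.0    # regress on in-the-money paths only
        if not itm.any():
            continue
        x = s[itm, t]
        basis = np.column_stack([np.ones_like(x), x, x**2])
        coef, *_ = np.linalg.lstsq(basis, cashflow[itm], rcond=None)
        continuation = basis @ coef       # estimated value of holding on
        exercise_now = strike - x         # immediate exercise value
        ex = exercise_now > continuation  # exercise where payoff beats holding
        idx = np.where(itm)[0][ex]
        cashflow[idx] = exercise_now[ex]

    return disc * cashflow.mean()         # discount the final step to time 0

print(f"LSM American put price: {lsm_american_put():.3f}")
```

With these example parameters the estimate lands near the well-known benchmark value of roughly 4.48 for this contract. LSPI and FQI differ from this baseline in that they learn a state-action value function from sample trajectories rather than re-running a per-timestep regression, but the exercise decision (compare immediate payoff against estimated continuation value) has the same structure.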
References
Antos, A., Szepesvári, C., Munos, R.: Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning 71, 89–129 (2008)
Bertsekas, D.P.: Dynamic Programming and Optimal Control. Athena Scientific, Massachusetts (1995)
Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Massachusetts (1996)
Bradtke, S.J., Barto, A.G.: Linear least-squares algorithms for temporal difference learning. Machine Learning 22(1-3), 33–57 (1996)
Broadie, M., Detemple, J.B.: Option pricing: valuation models and applications. Management Science 50(9), 1145–1177 (2004)
Duffie, D.: Dynamic Asset Pricing Theory. Princeton University Press, Princeton (2001)
Glasserman, P.: Monte Carlo Methods in Financial Engineering. Springer, New York (2004)
Hull, J.C.: Options, Futures and Other Derivatives, 6th edn. Prentice Hall, Englewood Cliffs (2006)
Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. The Journal of Machine Learning Research 4, 1107–1149 (2003)
Longstaff, F.A., Schwartz, E.S.: Valuing American options by simulation: a simple least-squares approach. The Review of Financial Studies 14(1), 113–147 (2001)
Moody, J., Saffell, M.: Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks 12(4), 875–889 (2001)
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York (1994)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Tsitsiklis, J.N., Van Roy, B.: Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks (special issue on computational finance) 12(4), 694–703 (2001)
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
Cite this paper
Li, Y., Schuurmans, D. (2008). Policy Iteration for Learning an Exercise Policy for American Options. In: Girgin, S., Loth, M., Munos, R., Preux, P., Ryabko, D. (eds.) Recent Advances in Reinforcement Learning. EWRL 2008. Lecture Notes in Computer Science, vol. 5323. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89722-4_13
DOI: https://doi.org/10.1007/978-3-540-89722-4_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89721-7
Online ISBN: 978-3-540-89722-4