{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,10,3]],"date-time":"2023-10-03T05:10:26Z","timestamp":1696309826465},"reference-count":44,"publisher":"Wiley","issue":"4","license":[{"start":{"date-parts":[[2013,8,6]],"date-time":"2013-08-06T00:00:00Z","timestamp":1375747200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/onlinelibrary.wiley.com\/termsAndConditions#vor"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computational Intelligence"],"published-print":{"date-parts":[[2014,11]]},"abstract":"This article addresses reinforcement learning problems based on factored Markov decision processes<\/jats:italic> (MDPs) in which the agent must choose among a set of candidate abstractions<\/jats:italic>, each build up from a different combination of state components. We present and evaluate a new approach that can perform effective abstraction selection that is more resource\u2010efficient and\/or more general than existing approaches. The core of the approach is to make selection of an abstraction part of the learning agent's decision\u2010making process by augmenting the agent's action space with internal actions that select the abstraction it uses. We prove that under certain conditions this approach results in a derived MDP whose solution yields both the optimal abstraction for the original MDP and the optimal policy under that abstraction. We examine our approach in three domains of increasing complexity: contextual bandit problems, episodic MDPs, and general MDPs with context\u2010specific structure. \u00a9 2013 Wiley Periodicals, Inc.<\/jats:p>","DOI":"10.1111\/coin.12016","type":"journal-article","created":{"date-parts":[[2013,8,7]],"date-time":"2013-08-07T01:56:24Z","timestamp":1375840584000},"page":"657-699","source":"Crossref","is-referenced-by-count":3,"title":["EFFICIENT ABSTRACTION SELECTION IN REINFORCEMENT LEARNING"],"prefix":"10.1111","volume":"30","author":[{"given":"Harm","family":"van Seijen","sequence":"first","affiliation":[{"name":"Department of Computing Science University of Alberta Edmonton AB Canada"}]},{"given":"Shimon","family":"Whiteson","sequence":"additional","affiliation":[{"name":"Informatics Institute University of Amsterdam Amsterdam the Netherlands"}]},{"given":"Leon","family":"Kester","sequence":"additional","affiliation":[{"name":"Distributed Sensor Systems Group, TNO Defence Security and Safety The Hague the Netherlands"}]}],"member":"311","published-online":{"date-parts":[[2013,8,6]]},"reference":[{"key":"e_1_2_12_2_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013689704352"},{"key":"e_1_2_12_3_1","first-page":"679","article-title":"A Markov decision process","volume":"6","author":"Bellman R. E","year":"1957","journal-title":"Journal of Mathematical Mechanics"},{"key":"e_1_2_12_4_1","volume-title":"Dynamic Programming","author":"Bellman R. E","year":"1957"},{"key":"e_1_2_12_5_1","unstructured":"BoutilierC. R.Dearden andM.Goldszmidt.1995.Exploiting structure in policy construction.InInternational Joint Conference on Artificial Intelligence Montreal QC Canada pp.1104\u20131113."},{"key":"e_1_2_12_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(00)00033-3"},{"key":"e_1_2_12_7_1","doi-asserted-by":"publisher","DOI":"10.1162\/153244303765208377"},{"key":"e_1_2_12_8_1","unstructured":"ChakrabortyD. andP.Stone.2011.Structure learning in ergodic factored MDPs without knowledge of the transition functions in\u2010degree.InProceedings of the Twenty\u2010Eighth International Conference on Machine Learning (ICML) Bellevue WA."},{"key":"e_1_2_12_9_1","unstructured":"ChapmanD. andL.\u2009P.Kaelbling.1991.Input generalization in delayed reinforcement learning: An algorithm and performance comparisons.InProceedings of the Twelfth International Joint Conference on Artificial Intelligence Sydney Australia pp.726\u2013731."},{"key":"e_1_2_12_10_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1467-8640.1989.tb00324.x"},{"key":"e_1_2_12_11_1","unstructured":"DeanT. R.Givan andS.Leach.1997.Model reduction techniques for computing approximately optimal solutions for Markov decision processes.InProceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence Providence RI pp.124\u2013131."},{"key":"e_1_2_12_12_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.639"},{"key":"e_1_2_12_13_1","doi-asserted-by":"crossref","unstructured":"DiukC. L.Li andB.\u2009R.Leffler.2009.The adaptive k\u2010meteorologists problem and its application to structure learning and feature selection in reinforcement learning.InProceedings of the 26th Annual International Conference on Machine Learning Montreal QC Canada.","DOI":"10.1145\/1553374.1553406"},{"key":"e_1_2_12_14_1","unstructured":"FernsN. P.Panangaden andD.Precup.2004.Metrics for finite Markov decision processes.InProceedings of the 20th Conference on Uncertainty in Artificial Intelligence AUAI Press pp.162\u2013169."},{"key":"e_1_2_12_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(02)00376-4"},{"key":"e_1_2_12_16_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1994.6.6.1185"},{"key":"e_1_2_12_17_1","unstructured":"JongN.\u2009K. andP.Stone.2005.State abstraction discovery from irrelevant state variables.InProceedings of the Nineteenth International Joint Conference on Artificial Intelligence Edinburgh UK pp.752\u2013757."},{"key":"e_1_2_12_18_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.301"},{"key":"e_1_2_12_19_1","unstructured":"KearnsM. andD.Koller.1999.Efficient reinforcement learning in factored MDPs.InInternational Joint Conference on Artificial Intelligence Stockholm Sweden pp.740\u2013747."},{"key":"e_1_2_12_20_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1017984413808"},{"key":"e_1_2_12_21_1","unstructured":"KonidarisG. andA.Barto.2009.Efficient skill learning using abstraction selection.InProceedings of the Twenty First International Joint Conference on Artificial Intelligence Pasadena CA pp.1107\u20131112."},{"key":"e_1_2_12_22_1","doi-asserted-by":"crossref","unstructured":"KroonM. andS.Whiteson.2009.Automatic feature selection for model\u2010based reinforcement learning in factored MDPs.InICMLA 2009: Proceedings of the Eighth International Conference on Machine Learning and Applications Miami FL pp.324\u2013330.","DOI":"10.1109\/ICMLA.2009.71"},{"key":"e_1_2_12_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/0196-8858(85)90002-8"},{"key":"e_1_2_12_24_1","unstructured":"LangfordJ. andT.Zhang.2007.The epoch\u2010greedy algorithm for contextual multi\u2010armed bandits.InProceedings of the Twenty\u2010First Annual Conference on Neural Information Processing Systems Vancouver BC pp.817\u2013824."},{"key":"e_1_2_12_25_1","doi-asserted-by":"crossref","unstructured":"LangfordJ. A.Strehl andJ.Wortman.2008.Exploration scavenging.InProceedings of the Twenty\u2010Fifth International Conference on Machine Learning ACM pp.528\u2013535.","DOI":"10.1145\/1390156.1390223"},{"key":"e_1_2_12_26_1","unstructured":"LefflerB. M.Littman andT.Edmunds.2007.Efficient reinforcement learning with relocatable action models.InProceedings of the Twenty\u2010Second National Conference on Artificial Intelligence Vancouver BC pp.572\u2013577."},{"key":"e_1_2_12_27_1","doi-asserted-by":"crossref","unstructured":"LiL. M.\u2009L.Littman andT.\u2009J.Walsh.2008.Knows what it knows: A framework for self\u2010aware learning.InProceedings of the 25th International Conference on Machine Learning ACM New York NY pp.568\u2013575.","DOI":"10.1145\/1390156.1390228"},{"key":"e_1_2_12_28_1","unstructured":"McCallumA.\u2009K.1995.Reinforcement learning with selective perception and hidden states Ph.D. Dissertation University of Rochester."},{"key":"e_1_2_12_29_1","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1007\/BF00993104","article-title":"Prioritized sweeping: Reinforcement learning with less data and less real time","volume":"13","author":"Moore A.","year":"1993","journal-title":"Machine Learning"},{"key":"e_1_2_12_30_1","unstructured":"MurphyK.\u2009P.2002.Dynamic bayesian networks: representation inference and learning Ph.D. Dissertation University of California."},{"key":"e_1_2_12_31_1","doi-asserted-by":"crossref","unstructured":"PandeyS. D.Agarwal D.Chakrabarti andV.Josifovski.2007.Bandits for taxonomies: A model\u2010based approach.InSIAM Data Mining Conference Minneapolis MN.","DOI":"10.1137\/1.9781611972771.20"},{"key":"e_1_2_12_32_1","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.24.11.1127"},{"key":"e_1_2_12_33_1","unstructured":"RavindranB. andA.\u2009GBarto.2003.SMDP homomorphisms: An algebraic approach to abstraction in semi\u2010Markov decision processes.InInternational Joint Conference on Artificial Intelligence Citeseer pp.1011\u20131018."},{"key":"e_1_2_12_34_1","doi-asserted-by":"publisher","DOI":"10.1214\/aos\/1176344136"},{"key":"e_1_2_12_35_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007678930559"},{"key":"e_1_2_12_36_1","first-page":"645","volume-title":"Proceedings of the Twenty\u2010Second National Conference on Artificial Intelligence","author":"Strehl A. L.","year":"2007"},{"key":"e_1_2_12_37_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00115009"},{"key":"e_1_2_12_38_1","doi-asserted-by":"crossref","unstructured":"SuttonR.\u2009S.1990.Integrated architectures for learning planning and reacting based on approximating dynamic programming.InProceedings of the Seventh International Conference on Machine Learning Austin TX pp.216\u2013224.","DOI":"10.1016\/B978-1-55860-141-3.50030-4"},{"key":"e_1_2_12_39_1","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton R. S.","year":"1998"},{"key":"e_1_2_12_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0004-3702(99)00052-1"},{"key":"e_1_2_12_41_1","doi-asserted-by":"publisher","DOI":"10.2200\/S00268ED1V01Y201005AIM009"},{"key":"e_1_2_12_42_1","first-page":"1633","article-title":"Transfer learning for reinforcement learning domains: A survey","volume":"10","author":"Taylor M. E.","year":"2009","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_2_12_43_1","first-page":"2045","article-title":"Exploiting best\u2010match equations for efficient reinforcement learning","volume":"12","author":"Seijen H.","year":"2011","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_2_12_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2005.844079"},{"issue":"3","key":"e_1_2_12_45_1","first-page":"9","article-title":"Q\u2010Learning","volume":"8","author":"Watkins C.","year":"1992","journal-title":"Machine Learning"}],"container-title":["Computational Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/api.wiley.com\/onlinelibrary\/tdm\/v1\/articles\/10.1111%2Fcoin.12016","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1111\/coin.12016","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,3]],"date-time":"2023-10-03T00:38:22Z","timestamp":1696293502000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1111\/coin.12016"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,8,6]]},"references-count":44,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2014,11]]}},"alternative-id":["10.1111\/coin.12016"],"URL":"https:\/\/doi.org\/10.1111\/coin.12016","archive":["Portico"],"relation":{},"ISSN":["0824-7935","1467-8640"],"issn-type":[{"value":"0824-7935","type":"print"},{"value":"1467-8640","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,8,6]]}}}