{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,25]],"date-time":"2024-09-25T12:12:01Z","timestamp":1727266321395},"reference-count":12,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2020,5,1]],"date-time":"2020-05-01T00:00:00Z","timestamp":1588291200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,5,23]],"date-time":"2020-05-23T00:00:00Z","timestamp":1590192000000},"content-version":"vor","delay-in-days":22,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"crossref","award":["JP17J08724"],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["SN COMPUT. SCI."],"published-print":{"date-parts":[[2020,5]]},"abstract":"Abstract<\/jats:title>This paper proposes a goal selection method to operate agents get maximum reward values per time by noncommunicative learning. In particular, that method aims to enable agents to cooperate along to dynamism of reward values and goal locations. Adaptation against to these dynamisms can enable agents to learn cooperative actions along to changing transportation tasks and changing incomes\/rewards because of transporting tasks for heavy\/valuable and light\/valueless items in a storehouse. Concretely, this paper extends the previous noncommunicative cooperative action learning method (Profit minimizing reinforcement learning with oblivion of memory: PMRL-OM) and sets the two unified conditions combined of the number of time steps and the rewards. 
One of the unified conditions is calculated the approximated number of time steps if the expected reward values are the same each other for all purposes, and the other is the minimum number of time steps divided by the reward value. The proposed method makes all agents learn to achieve the purposes in the order in which they have the minimum number of the condition values. After that, each agent learns cooperative policy by PMRL-OM as the previous method. This paper analyzes the unified conditions and derives that the condition calculating the approximated time steps can be combined both evaluations with almost same weight unlike the value the other condition, that is, the condition can help the agents to select the appropriate purposes among them with the small difference in terms of the two evaluations. This paper tests empirically the performances of PMRL-OM with the two conditions by comparing with the PMRL-OM in three cases of grid world problems whose goal locations and reward values are changed dynamically. The results of this derive that the unified conditions perform better than PMRL-OM without some conditions in grid world problems. 
In particular, it is clear that the condition calculating the approximated time step can direct the appropriate goals for the agents.<\/jats:p>","DOI":"10.1007\/s42979-020-00191-2","type":"journal-article","created":{"date-parts":[[2020,5,23]],"date-time":"2020-05-23T14:02:16Z","timestamp":1590242536000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Reward Value-Based Goal Selection for Agents\u2019 Cooperative Route Learning Without Communication in Reward and Goal Dynamism"],"prefix":"10.1007","volume":"1","author":[{"ORCID":"http:\/\/orcid.org\/0000-0003-4139-2605","authenticated-orcid":false,"given":"Fumito","family":"Uwano","sequence":"first","affiliation":[]},{"given":"Keiki","family":"Takadama","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,5,23]]},"reference":[{"key":"191_CR1","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1016\/j.artint.2018.01.002","volume":"258","author":"S Albrecht","year":"2018","unstructured":"Albrecht S, Stone P. Autonomous agents modelling other agents: a comprehensive survey and open problems. Artif Intell. 2018;258:66\u201395.","journal-title":"Artif Intell"},{"key":"191_CR2","doi-asserted-by":"crossref","unstructured":"Godoy J, Karamouzas I, Guy S, Gini M. Implicit coordination in crowded multi-agent navigation. 2016. https:\/\/www.aaai.org\/ocs\/index.php\/AAAI\/AAAI16\/paper\/view\/12334. Accessed May 2019.","DOI":"10.1609\/aaai.v30i1.10131"},{"key":"191_CR3","unstructured":"H\u00f6nig W, Kiesel S, Tinka A, Durham JW, Ayanian N. Conflict-based search with optimal task assignment. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, international foundation for autonomous agents and multiagent systems, AAMAS \u201918, 2018;pp 757\u2013765"},{"key":"191_CR4","doi-asserted-by":"publisher","unstructured":"Liu Y, Liu L, Chen W. 
Intelligent traffic light control using distributed multi-agent q learning. In: 2017 IEEE 20th international conference on intelligent transportation systems (ITSC), 2017;pp 1\u20138. https:\/\/doi.org\/10.1109\/ITSC.2017.8317730","DOI":"10.1109\/ITSC.2017.8317730"},{"key":"191_CR5","unstructured":"Raileanu R, Denton E, Szlam A, Fergus R. Modeling others using oneself in multi-agent reinforcement learning. Tech. rep., 2018. http:\/\/proceedings.mlr.press\/v80\/raileanu18a\/raileanu18a.pdf"},{"key":"191_CR6","doi-asserted-by":"crossref","unstructured":"Sachiyo A, Katia S. Effective learning approach for planning and scheduling in multi-agent domain. In: 6th international conference on simulation of adaptive behavior, 2000;pp 507\u2013516","DOI":"10.7551\/mitpress\/3120.003.0054"},{"issue":"5","key":"191_CR7","doi-asserted-by":"publisher","first-page":"190","DOI":"10.9746\/jcmsi.12.190","volume":"12","author":"D Shiraishi","year":"2019","unstructured":"Shiraishi D, Miyazaki K, Kobayashi H. Proposal and evaluation of detour path suppression method in ps reinforcement learning. SICE J Control Meas Syst Integr. 2019;12(5):190\u20138.","journal-title":"SICE J Control Meas Syst Integr"},{"key":"191_CR8","volume-title":"Introduction to reinforcement learning","author":"RS Sutton","year":"1998","unstructured":"Sutton RS, Barto AG. Introduction to reinforcement learning. 1st ed. Cambridge: MIT Press; 1998.","edition":"1"},{"issue":"5","key":"191_CR9","doi-asserted-by":"publisher","first-page":"199","DOI":"10.9746\/jcmsi.12.199","volume":"12","author":"F Uwano","year":"2019","unstructured":"Uwano F, Takadama K. Utilizing observed information for no-communication multi-agent reinforcement learning toward cooperation in dynamic environment. SICE J Control Meas Syst Integr. 2019;12(5):199\u2013208. 
https:\/\/doi.org\/10.9746\/jcmsi.12.199.","journal-title":"SICE J Control Meas Syst Integr"},{"issue":"4","key":"191_CR10","doi-asserted-by":"publisher","first-page":"321","DOI":"10.9746\/jcmsi.11.321","volume":"11","author":"F Uwano","year":"2018","unstructured":"Uwano F, Tatebe N, Tajima Y, Nakata M, Kovacs T, Takadama K. Multi-agent cooperation based on reinforcement learning with internal reward in maze problem. SICE J Control Meas Syst Integr. 2018;11(4):321\u201330. https:\/\/doi.org\/10.9746\/jcmsi.11.321.","journal-title":"SICE J Control Meas Syst Integr"},{"key":"191_CR11","unstructured":"Watkins CJ. Learning from delayed rewards. Ph.D. thesis, King\u2019s College; 1989."},{"key":"191_CR12","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1007\/978-3-319-23240-9_13","volume-title":"Modeling decisions for artificial intelligence","author":"W Zemzem","year":"2015","unstructured":"Zemzem W, Tagina M. Cooperative multi-agent learning in a large dynamic environment. In: Torra V, Narukawa T, editors. Modeling decisions for artificial intelligence. Cham: Springer; 2015. p. 
155\u201366."}],"updated-by":[{"updated":{"date-parts":[[2023,9,28]],"date-time":"2023-09-28T00:00:00Z","timestamp":1695859200000},"DOI":"10.1007\/s42979-023-02168-3","type":"correction","label":"Correction"}],"container-title":["SN Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42979-020-00191-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s42979-020-00191-2\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s42979-020-00191-2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,6]],"date-time":"2024-08-06T07:33:36Z","timestamp":1722929616000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s42979-020-00191-2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,5]]},"references-count":12,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,5]]}},"alternative-id":["191"],"URL":"https:\/\/doi.org\/10.1007\/s42979-020-00191-2","relation":{},"ISSN":["2662-995X","2661-8907"],"issn-type":[{"type":"print","value":"2662-995X"},{"type":"electronic","value":"2661-8907"}],"subject":[],"published":{"date-parts":[[2020,5]]},"assertion":[{"value":"28 February 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 May 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 May 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 September 2023","order":4,"name":"change_date","label":"Change 
Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Correction","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"A Correction to this paper has been published:","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"https:\/\/doi.org\/10.1007\/s42979-023-02168-3","URL":"https:\/\/doi.org\/10.1007\/s42979-023-02168-3","order":7,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"182"}}