{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2022,9,24]],"date-time":"2022-09-24T02:54:11Z","timestamp":1663988051850},"reference-count":0,"publisher":"Research Institute for Intelligent Computer Systems","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IJC"],"abstract":"<jats:p>This paper introduces novel concepts for accelerating learning in an off-policy reinforcement learning algorithm for Partially Observable Markov Decision Processes (POMDPs) by leveraging a multi-agent framework. Reinforcement learning (RL) is an elegant but considerably slow approach to learning in an unknown environment. Although action-value learning (Q-learning) is faster than state-value learning, the rate of convergence to an optimal policy or maximum cumulative reward remains a constraint. Consequently, in an attempt to optimize the learning phase of an RL problem within a POMDP environment, we present two multi-agent learning paradigms: multi-agent off-policy reinforcement learning and an ingenious genetic algorithm (GA) approach for multi-agent offline learning using feedforward neural networks. At the end of training (episodes for reinforcement learning and epochs for the genetic algorithm), we compare the convergence rates of both algorithms with respect to creating the underlying MDPs for POMDP problems. Finally, we demonstrate the impact of layered resampling in the Monte Carlo particle filter on improving the accuracy of belief state estimation with respect to ground truth within POMDP domains. Initial empirical results suggest practicable solutions.<\/jats:p>","DOI":"10.47839\/ijc.19.3.1887","type":"journal-article","created":{"date-parts":[[2021,2,27]],"date-time":"2021-02-27T22:22:14Z","timestamp":1614464534000},"page":"377-386","source":"Crossref","is-referenced-by-count":5,"title":["A MULTI-AGENT APPROACH TO POMDPS USING OFF-POLICY REINFORCEMENT LEARNING AND GENETIC ALGORITHMS"],"prefix":"10.47839","author":[{"given":"Samuel","family":"Obadan","sequence":"first","affiliation":[]},{"given":"Zenghui","family":"Wang","sequence":"additional","affiliation":[]}],"member":"27386","published-online":{"date-parts":[[2020,9,27]]},"container-title":["International Journal of Computing"],"original-title":[],"link":[{"URL":"https:\/\/computingonline.net\/computing\/article\/download\/1887\/929","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2021,2,27]],"date-time":"2021-02-27T22:22:14Z","timestamp":1614464534000},"score":1,"resource":{"primary":{"URL":"https:\/\/computingonline.net\/computing\/article\/view\/1887"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,9,27]]},"references-count":0,"URL":"https:\/\/doi.org\/10.47839\/ijc.19.3.1887","relation":{},"ISSN":["2312-5381","1727-6209"],"issn-type":[{"value":"2312-5381","type":"electronic"},{"value":"1727-6209","type":"print"}],"subject":[],"published":{"date-parts":[[2020,9,27]]}}}