{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,14]],"date-time":"2024-09-14T23:10:01Z","timestamp":1726355401228},"reference-count":30,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2021,12,20]],"date-time":"2021-12-20T00:00:00Z","timestamp":1639958400000},"content-version":"vor","delay-in-days":353,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Complexity"],"published-print":{"date-parts":[[2021,1]]},"abstract":"Reinforcement learning (RL) agents can learn to control a nonlinear system without using a model of the system. However, having a model brings benefits, mainly in terms of a reduced number of unsuccessful trials before achieving acceptable control performance. Several modelling approaches have been used in the RL domain, such as neural networks, local linear regression, or Gaussian processes. In this article, we focus on techniques that have not been used much so far: symbolic regression (SR), based on genetic programming and local modelling. Using measured data, symbolic regression yields a nonlinear, continuous\u2010time analytic model. We benchmark two state\u2010of\u2010the\u2010art methods, SNGP (single\u2010node genetic programming) and MGGP (multigene genetic programming), against a standard incremental local regression method called RFWR (receptive field weighted regression). We have introduced modifications to the RFWR algorithm to better suit the low\u2010dimensional continuous\u2010time systems we are mostly dealing with. The benchmark is a nonlinear, dynamic magnetic manipulation system. The results show that using the RL framework and a suitable approximation method, it is possible to design a stable controller of such a complex system without the necessity of any haphazard learning. While all of the approximation methods were successful, MGGP achieved the best results at the cost of higher computational complexity. Index Terms\u2013AI\u2010based methods, local linear regression, nonlinear systems, magnetic manipulation, model learning for control, optimal control, reinforcement learning, symbolic regression.<\/jats:p>","DOI":"10.1155\/2021\/6617309","type":"journal-article","created":{"date-parts":[[2021,12,21]],"date-time":"2021-12-21T04:05:11Z","timestamp":1640059511000},"update-policy":"http:\/\/dx.doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Control of Magnetic Manipulator Using Reinforcement Learning Based on Incrementally Adapted Local Linear Models"],"prefix":"10.1155","volume":"2021","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-8675-8060","authenticated-orcid":false,"given":"Martin","family":"Brablc","sequence":"first","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0003-3302-6779","authenticated-orcid":false,"given":"Jan","family":"\u017degklitz","sequence":"additional","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0002-5674-3888","authenticated-orcid":false,"given":"Robert","family":"Grepl","sequence":"additional","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0001-9578-8598","authenticated-orcid":false,"given":"Robert","family":"Babu\u0161ka","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2021,12,20]]},"reference":[{"key":"e_1_2_7_1_2","unstructured":"GuS. LillicrapT. SutskeverI. andLevineS. 
Continuous deep Q-learning with model-based acceleration Proceedings of the 33rd Int. Conf. Mach. Learn. 19 June 2016 New York NY USA JMLR.org 2829\u20132838 https:\/\/arxiv.org\/abs\/1603.00748."},{"key":"e_1_2_7_2_2","doi-asserted-by":"crossref","unstructured":"Alibekov E. Kubalik J. and Babuska R. Symbolic method for deriving policy in reinforcement learning Proceedings of the 2016 IEEE 55th Conf. Decis. Control 12 December 2016 Las Vegas NV USA IEEE 2789\u20132795 https:\/\/ieeexplore.ieee.org\/document\/7798684\/.","DOI":"10.1109\/CDC.2016.7798684"},{"key":"e_1_2_7_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2017.02.006"},{"key":"e_1_2_7_4_2","unstructured":"Deisenroth M. P. and Rasmussen C. E. PILCO: a model-based and data-efficient approach to policy search Proceedings of the 28th Int. Conf. Mach. Learn. 28 June 2011 Bellevue Washington USA Omnipress 465\u2013472 https:\/\/dl.acm.org\/citation.cfm?id=3104482.3104541."},{"key":"e_1_2_7_5_2","unstructured":"Duan Y. Chen X. Houthooft R. Schulman J. and Abbeel P. Benchmarking deep reinforcement learning for continuous control 48 Proceedings of the 33rd Int. Conf. Mach. Learn. 19 June 2016 New York USA JMLR.org 1329\u20131338 https:\/\/dl.acm.org\/citation.cfm?id=3045390.3045531 http:\/\/arxiv.org\/abs\/1604.06778."},{"key":"e_1_2_7_6_2","doi-asserted-by":"crossref","unstructured":"Peters J. and Schaal S. Policy gradient methods for robotics Proceedings of the 2006 IEEE\/RSJ Int. Conf. Intell. Robot. Syst. 9 October 2006 Beijing China IEEE 2219\u20132225 https:\/\/ieeexplore.ieee.org\/document\/4058714\/.","DOI":"10.1109\/IROS.2006.282564"},{"key":"e_1_2_7_7_2","doi-asserted-by":"publisher","DOI":"10.4028\/www.scientific.net\/amm.464.369"},{"key":"e_1_2_7_8_2","doi-asserted-by":"publisher","DOI":"10.1155\/2020\/8564140"},{"volume-title":"Applications of Artificial Intelligence Techniques in Industry 4.0","year":"2019","key":"e_1_2_7_9_2"},{"key":"e_1_2_7_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ifacol.2017.08.805"},{"key":"e_1_2_7_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/tsmcb.2011.2170565"},{"key":"e_1_2_7_12_2","doi-asserted-by":"crossref","unstructured":"Grondman I. Busoniu L. and Babuska R. Model learning actor-critic algorithms: performance evaluation in a motion control task Proceedings of the 2012 51st IEEE Conf. Decis. Control 10 December 2012 Maui HI USA IEEE 5272\u20135277 https:\/\/ieeexplore.ieee.org\/document\/6426427\/.","DOI":"10.1109\/CDC.2012.6426427"},{"key":"e_1_2_7_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-94-017-2053-3_3"},{"key":"e_1_2_7_14_2","first-page":"3","volume-title":"Approximate Dynamic Programming and Reinforcement Learning","author":"Bu\u015foniu L.","year":"2010"},{"key":"e_1_2_7_15_2","doi-asserted-by":"publisher","DOI":"10.1162\/089976698300016963"},{"key":"e_1_2_7_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/tsmcb.2002.806485"},{"key":"e_1_2_7_17_2","doi-asserted-by":"publisher","DOI":"10.1023\/a:1013258808932"},{"key":"e_1_2_7_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/tii.2012.2209660"},{"key":"e_1_2_7_19_2","doi-asserted-by":"publisher","DOI":"10.1007\/s00170-018-1838-8"},{"key":"e_1_2_7_20_2","doi-asserted-by":"crossref","unstructured":"Yang T. Sun N. Fang Y. Xin X. and Chen H. 
New adaptive control methods for n-link robot manipulators with online gravity compensation: design and experiments IEEE Transactions on Industrial Electronics 69 1 2022.","DOI":"10.1109\/TIE.2021.3050371"},{"key":"e_1_2_7_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2017.12.004"},{"key":"e_1_2_7_22_2","doi-asserted-by":"crossref","unstructured":"Hurak Z. and Zemanek J. Feedback linearization approach to distributed feedback manipulation Proceedings of the 2012 Am. Control Conf. 27 June 2012 Montreal QC Canada IEEE 991\u2013996 https:\/\/ieeexplore.ieee.org\/document\/6315262\/.","DOI":"10.1109\/ACC.2012.6315262"},{"key":"e_1_2_7_23_2","first-page":"16032","article-title":"Time-optimal control for bilinear nonnegative-in-control systems: application to magnetic manipulation","volume":"50","author":"Zem\u00e1nek J.","year":"2017","journal-title":"IFAC-PapersOnLine"},{"key":"e_1_2_7_24_2","doi-asserted-by":"crossref","unstructured":"Damsteeg J.-W. Nageshrao S. P. and Babuska R. Model-based real-time control of a magnetic manipulator system Proceedings of the 2017 IEEE 56th Annu. Conf. Decis. Control 12 December 2017 Melbourne VIC Australia IEEE 3277\u20133282 https:\/\/ieeexplore.ieee.org\/document\/8264140\/.","DOI":"10.1109\/CDC.2017.8264140"},{"key":"e_1_2_7_25_2","unstructured":"Hinchliffe M. Hiden H. McKay B. Willis M. Tham M. and Barton G. Koza J. R. (ed.) Modelling chemical process systems using a multi-gene genetic programming algorithm Proceedings of the Late Break. Pap. Genet. Program. 1996 Conf. 28 July 1996 Stanford CA USA Stanford Bookstore 56\u201365 https:\/\/www.bibsonomy.org\/bibtex\/2ba500b4ed22826a3b171019d4a172229\/brazovayeye."},{"key":"e_1_2_7_26_2","doi-asserted-by":"crossref","unstructured":"\u017degklitz J. and Po\u0161\u00edk P. Linear combinations of features as leaf nodes in symbolic regression Proceedings of the Genet. Evol. Comput. Conf. Companion (GECCO \u201917) 15 July 2017 New York USA ACM Press 145\u2013146 https:\/\/dl.acm.org\/citation.cfm?doid=3067695.3076009.","DOI":"10.1145\/3067695.3076009"},{"key":"e_1_2_7_27_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-23923-1_87"},{"key":"e_1_2_7_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/s12239-010-0072-7"},{"key":"e_1_2_7_29_2","doi-asserted-by":"publisher","DOI":"10.1021\/ac60214a047"},{"key":"e_1_2_7_30_2","unstructured":"Brablc M. Sova V. and Grepl R. Maga D. and Brezina T. (eds.) Adaptive feedforward controller for a DC motor drive based on inverse dynamic model with recursive least squares parameter estimation Proceedings of the 2016 17th Int. Conf. 
Mechatronics - Mechatronika (ME) 7 December 2016 Prague Czech Republic IEEE 146\u2013150 https:\/\/www.scopus.com\/inward\/record.url?eid=2-s2.0-85015277645&partnerID=MN8TOARS."}],"container-title":["Complexity"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/complexity\/2021\/6617309.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/complexity\/2021\/6617309.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1155\/2021\/6617309","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,14]],"date-time":"2024-09-14T22:07:21Z","timestamp":1726351641000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1155\/2021\/6617309"}},"subtitle":[],"editor":[{"given":"Aydin","family":"Azizi","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["10.1155\/2021\/6617309"],"URL":"https:\/\/doi.org\/10.1155\/2021\/6617309","archive":["Portico"],"relation":{},"ISSN":["1076-2787","1099-0526"],"issn-type":[{"type":"print","value":"1076-2787"},{"type":"electronic","value":"1099-0526"}],"subject":[],"published":{"date-parts":[[2021,1]]},"assertion":[{"value":"2020-12-11","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-11-16","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}