{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,12,30]],"date-time":"2024-12-30T18:25:25Z","timestamp":1735583125829},"reference-count":41,"publisher":"World Scientific Pub Co Pte Lt","issue":"05","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Asia Pac. J. Oper. Res."],"published-print":{"date-parts":[[2013,10]]},"abstract":" Reinforcement learning (RL) is a state or action value based machine learning method which solves large-scale multi-stage decision problems such as Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) problems. We minimize the makespan of flow shop scheduling problems with an RL algorithm. We convert flow shop scheduling problems into SMDPs by constructing elaborate state features, actions and the reward function. Minimizing the accumulated reward is equivalent to minimizing the schedule objective function. We apply on-line TD(\u03bb) algorithm with linear gradient-descent function approximation to solve the SMDPs. To examine the performance of the proposed RL algorithm, computational experiments are conducted on benchmarking problems in comparison with other scheduling methods. The experimental results support the efficiency of the proposed algorithm and illustrate that the RL approach is a promising computational approach for flow shop scheduling problems worthy of further investigation. <\/jats:p>","DOI":"10.1142\/s0217595913500140","type":"journal-article","created":{"date-parts":[[2013,7,3]],"date-time":"2013-07-03T05:18:10Z","timestamp":1372828690000},"page":"1350014","source":"Crossref","is-referenced-by-count":30,"title":["FLOW SHOP SCHEDULING WITH REINFORCEMENT LEARNING"],"prefix":"10.1142","volume":"30","author":[{"given":"ZHICONG","family":"ZHANG","sequence":"first","affiliation":[{"name":"Department of Industrial Engineering, School of Mechanical Engineering, Dongguan University of Technology, Songshan Lake District, Dongguan 523808, Guangdong Province, P. R. China"}]},{"given":"WEIPING","family":"WANG","sequence":"additional","affiliation":[{"name":"Department of Industrial Engineering, School of Mechanical Engineering, Dongguan University of Technology, Songshan Lake District, Dongguan 523808, Guangdong Province, P. R. China"}]},{"given":"SHOUYAN","family":"ZHONG","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Dongguan University of Technology, Songshan Lake District, Dongguan 523808, Guangdong Province, P. R. China"}]},{"given":"KAISHUN","family":"HU","sequence":"additional","affiliation":[{"name":"Department of Industrial Engineering, School of Mechanical Engineering, Dongguan University of Technology, Songshan Lake District, Dongguan 523808, Guangdong Province, P. R. China"}]}],"member":"219","published-online":{"date-parts":[[2013,10,2]]},"reference":[{"key":"rf1","doi-asserted-by":"publisher","DOI":"10.1016\/S0921-8890(00)00087-7"},{"key":"rf2","volume-title":"Neuro-Dynamic Programming","author":"Bertsekas D. P.","year":"1996"},{"key":"rf4","unstructured":"R. H.\u00a0Crites and A. G.\u00a0Barto, Advances in Neural Information Processing Systems, Proceedings of the 1995 Conference, eds. D. S.\u00a0Touretzky, M. C.\u00a0Mozer and M. E.\u00a0Hasselmo (MIT Press, Cambridge, MA, 1996)\u00a0pp. 
1017\u20131023."},{"key":"rf5","doi-asserted-by":"publisher","DOI":"10.1007\/11559221_39"},{"key":"rf6","doi-asserted-by":"publisher","DOI":"10.1016\/j.aei.2006.01.001"},{"key":"rf7","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-45185-3_11"},{"key":"rf8","doi-asserted-by":"publisher","DOI":"10.1016\/S0377-2217(97)00019-2"},{"key":"rf9","volume-title":"Pattern Classification and Scene Analysis","author":"Duda R. O.","year":"1973"},{"key":"rf10","first-page":"503","volume":"6","author":"Ernst D.","year":"2005","journal-title":"Journal of Machine Learning Research"},{"key":"rf11","doi-asserted-by":"publisher","DOI":"10.1287\/moor.1060.0208"},{"key":"rf12","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2006.06.007"},{"key":"rf13","doi-asserted-by":"publisher","DOI":"10.1023\/B:APIN.0000011143.95085.74"},{"key":"rf14","doi-asserted-by":"publisher","DOI":"10.1002\/nav.3800010110"},{"key":"rf15","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCC.2004.843188"},{"key":"rf16","first-page":"263","volume":"2","author":"Lee J. M.","year":"2004","journal-title":"International Journal of Control, Automation, and Systems"},{"key":"rf17","doi-asserted-by":"publisher","DOI":"10.1016\/j.jprocont.2008.11.009"},{"key":"rf18","doi-asserted-by":"publisher","DOI":"10.1016\/S0165-0114(02)00299-3"},{"key":"rf19","doi-asserted-by":"publisher","DOI":"10.1111\/j.1475-3995.2000.tb00190.x"},{"key":"rf20","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1989.1.2.281"},{"key":"rf21","doi-asserted-by":"publisher","DOI":"10.1016\/j.cor.2007.02.010"},{"key":"rf22","doi-asserted-by":"publisher","DOI":"10.1016\/j.simpat.2004.12.003"},{"key":"rf23","volume-title":"Scheduling: Theory, Algorithms, and Systems","author":"Pinedo M.","year":"2002"},{"key":"rf24","first-page":"751","volume":"16","author":"Rasmussen C. E.","year":"2004","journal-title":"Advances in Neural Information Processing Systems"},{"key":"rf25","doi-asserted-by":"publisher","DOI":"10.1007\/11564096_32"},{"key":"rf27","volume-title":"Design of Experiments Using the Taguchi Approach","author":"Roy R. K.","year":"2001"},{"key":"rf28","first-page":"974","volume":"9","author":"Singh S.","year":"1997","journal-title":"Advances in Neural Information Processing Systems"},{"key":"rf29","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejor.2006.02.023"},{"key":"rf30","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton R. 
S.","year":"1998"},{"key":"rf31","first-page":"777","volume":"1999","author":"Tadi\u0107 V.","year":"1999","journal-title":"Lecture Notes in Computer Science"},{"key":"rf32","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007609817671"},{"key":"rf33","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-006-5835-z"},{"key":"rf35","volume-title":"Introduction to Scheduling","author":"Tang H.","year":"2002"},{"key":"rf37","doi-asserted-by":"publisher","DOI":"10.1145\/203330.203343"},{"key":"rf38","doi-asserted-by":"publisher","DOI":"10.1109\/9.580874"},{"key":"rf39","volume-title":"Handbook of Markov Decision Processes: Methods and Applications","author":"Van Roy B.","year":"2001"},{"key":"rf40","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2004.08.018"},{"key":"rf41","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2008.05.012"},{"key":"rf42","doi-asserted-by":"publisher","DOI":"10.1007\/s00170-006-0492-8"},{"key":"rf43","doi-asserted-by":"publisher","DOI":"10.1007\/s00170-007-1104-y"},{"key":"rf45","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1613\/jair.655","volume":"12","author":"Zhang W.","year":"2000","journal-title":"Journal of Artificial Intelligence Research"},{"key":"rf47","unstructured":"M.\u00a0Zweben, Intelligent Scheduling, eds. M.\u00a0Zweben and M. S.\u00a0Fox (Morgan Kaufmann, San Francisco, CA, 1994)\u00a0pp. 241\u2013255."}],"container-title":["Asia-Pacific Journal of Operational Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0217595913500140","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,6]],"date-time":"2019-08-06T09:02:01Z","timestamp":1565082121000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0217595913500140"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,10]]},"references-count":41,"journal-issue":{"issue":"05","published-online":{"date-parts":[[2013,10,2]]},"published-print":{"date-parts":[[2013,10]]}},"alternative-id":["10.1142\/S0217595913500140"],"URL":"https:\/\/doi.org\/10.1142\/s0217595913500140","relation":{},"ISSN":["0217-5959","1793-7019"],"issn-type":[{"value":"0217-5959","type":"print"},{"value":"1793-7019","type":"electronic"}],"subject":[],"published":{"date-parts":[[2013,10]]}}}