Abstract
Reinforcement learning (RL) is one of the most remarkable branches of machine learning, attracting the attention of researchers from numerous fields. Especially in recent years, RL methods have been applied to machine scheduling problems and rank among the top five most promising methods in the scheduling literature. This study therefore presents a comprehensive literature review of RL applications to machine scheduling problems. To this end, the Scopus and Web of Science databases were searched extensively using appropriate keywords, yielding 80 papers published between 1995 and 2020. These papers were analyzed with respect to different aspects of the problem, such as applied algorithms, machine environments, job and machine characteristics, objectives, and benchmark methods, and a detailed classification scheme was constructed. Job shop scheduling, unrelated parallel machine scheduling, and single machine scheduling were found to be the most studied problem types. The main contributions of the study are to examine the essential aspects of reinforcement learning in machine scheduling problems, to identify the most frequently investigated problem types, objectives, and constraints, and to reveal the deficiencies and promising areas in the related literature. Through this comprehensive analysis, the study can guide researchers who wish to work in this field.
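To make the surveyed setting concrete, the following is a minimal, hypothetical sketch, not drawn from any single reviewed study: a tabular Q-learning agent that learns which dispatching rule, shortest processing time (SPT) or earliest due date (EDD), to apply on a toy single-machine problem with total tardiness as the cost. All names, the state encoding, and the parameter values are illustrative assumptions, chosen only to mirror the rule-selection approach common in the reviewed papers.

```python
# Illustrative sketch (assumed setup, not from any surveyed paper): tabular
# Q-learning that selects a dispatching rule per decision point on a toy
# single-machine scheduling problem, minimizing total tardiness.
import random

random.seed(0)

RULES = ("SPT", "EDD")             # actions: shortest processing time / earliest due date
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # learning rate, discount factor, exploration rate

def make_jobs(n=20):
    """Generate random jobs as (processing_time, due_date) tuples."""
    jobs, t = [], 0
    for _ in range(n):
        p = random.randint(1, 9)
        jobs.append((p, t + random.randint(5, 30)))
        t += random.randint(0, 4)  # spread due dates out over time
    return jobs

def state(queue, now):
    """Coarse state: bucketed queue length and whether average slack is positive."""
    slack = sum(d - now - p for p, d in queue) / len(queue)
    return (min(len(queue) // 5, 3), slack > 0)

def episode(Q):
    """Run one episode, updating the Q-table in place; return total tardiness."""
    queue, now, total = make_jobs(), 0, 0
    while queue:
        s = state(queue, now)
        if random.random() < EPS:  # epsilon-greedy action choice
            a = random.randrange(len(RULES))
        else:
            a = max(range(len(RULES)), key=lambda i: Q.get((s, i), 0.0))
        key = (lambda j: j[0]) if RULES[a] == "SPT" else (lambda j: j[1])
        job = min(queue, key=key)  # apply the chosen dispatching rule
        queue.remove(job)
        now += job[0]
        cost = max(0, now - job[1])  # tardiness of the finished job
        total += cost
        nxt = max(Q.get((state(queue, now), i), 0.0)
                  for i in range(len(RULES))) if queue else 0.0
        # standard Q-learning update with negative tardiness as the reward
        Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (-cost + GAMMA * nxt - Q.get((s, a), 0.0))
    return total

Q = {}
for _ in range(2000):
    total = episode(Q)
print("total tardiness in final episode:", total)
```

In this sketch the agent does not build a schedule directly; it learns a mapping from coarse shop states to dispatching rules, which is the design choice taken by many of the surveyed studies because it keeps the action space small and the learned policy interpretable.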