{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,25]],"date-time":"2024-08-25T13:27:56Z","timestamp":1724592476813},"publisher-location":"New York, NY, USA","reference-count":38,"publisher":"ACM","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["U1811261, 62137001"],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,10,17]]},"DOI":"10.1145\/3511808.3557626","type":"proceedings-article","created":{"date-parts":[[2022,10,16]],"date-time":"2022-10-16T01:22:22Z","timestamp":1665883342000},"update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Learning Rate Perturbation"],"prefix":"10.1145","author":[{"given":"Hengyu","family":"Liu","sequence":"first","affiliation":[{"name":"Northeastern University, Shenyang, China"}]},{"given":"Qiang","family":"Fu","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia, Beijing, China"}]},{"given":"Lun","family":"Du","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia, Beijing, China"}]},{"given":"Tiancheng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Northeastern University, Shenyang, China"}]},{"given":"Ge","family":"Yu","sequence":"additional","affiliation":[{"name":"Northeastern University, Shenyang, China"}]},{"given":"Shi","family":"Han","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia, Beijing, China"}]},{"given":"Dongmei","family":"Zhang","sequence":"additional","affiliation":[{"name":"Microsoft Research Asia, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2022,10,17]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Training with noise is equivalent to Tikhonov regularization. Neural computation 7, 1","author":"Bishop Chris M","year":"1995","unstructured":"Chris M Bishop . 1995. Training with noise is equivalent to Tikhonov regularization. Neural computation 7, 1 ( 1995 ), 108--116. Chris M Bishop. 1995. Training with noise is equivalent to Tikhonov regularization. Neural computation 7, 1 (1995), 108--116."},{"key":"e_1_3_2_2_2_1","unstructured":"Christopher M Bishop etal 1995. Neural networks for pattern recognition. Oxford university press. Christopher M Bishop et al. 1995. Neural networks for pattern recognition. Oxford university press."},{"key":"e_1_3_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9414919"},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3488560.3498474"},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467228"},{"key":"e_1_3_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3485447.3512201"},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/MITS.2017.2776130"},{"key":"e_1_3_2_2_9_1","volume-title":"Neuron with Steady Response Leads to Better Generalization. arXiv preprint arXiv:2111.15414","author":"Fu Qiang","year":"2021","unstructured":"Qiang Fu , Lun Du , Haitao Mao , Xu Chen , Wei Fang , Shi Han , and Dongmei Zhang . 2021. Neuron with Steady Response Leads to Better Generalization. arXiv preprint arXiv:2111.15414 ( 2021 ). 
Qiang Fu, Lun Du, Haitao Mao, Xu Chen, Wei Fang, Shi Han, and Dongmei Zhang. 2021. Neuron with Steady Response Leads to Better Generalization. arXiv preprint arXiv:2111.15414 (2021)."},{"key":"e_1_3_2_2_10_1","volume-title":"International conference on machine learning. PMLR, 3059-- 3068","author":"Gulcehre Caglar","year":"2016","unstructured":"Caglar Gulcehre , Marcin Moczulski , Misha Denil , and Yoshua Bengio . 2016 . Noisy activation functions . In International conference on machine learning. PMLR, 3059-- 3068 . Caglar Gulcehre, Marcin Moczulski, Misha Denil, and Yoshua Bengio. 2016. Noisy activation functions. In International conference on machine learning. PMLR, 3059-- 3068."},{"key":"e_1_3_2_2_11_1","volume-title":"Inductive Representation Learning on Large Graphs. CoRR abs\/1706.02216","author":"Hamilton William L.","year":"2017","unstructured":"William L. Hamilton , Rex Ying , and Jure Leskovec . 2017. Inductive Representation Learning on Large Graphs. CoRR abs\/1706.02216 ( 2017 ). arXiv:1706.02216 http:\/\/arxiv.org\/abs\/1706.02216 William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. CoRR abs\/1706.02216 (2017). arXiv:1706.02216 http:\/\/arxiv.org\/abs\/1706.02216"},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_13_1","volume-title":"Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba . 2014 . Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)."},{"key":"e_1_3_2_2_14_1","volume-title":"Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907","author":"Kipf Thomas N","year":"2016","unstructured":"Thomas N Kipf and MaxWelling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 ( 2016 ). Thomas N Kipf and MaxWelling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)."},{"key":"e_1_3_2_2_15_1","unstructured":"Alex Krizhevsky Geoffrey Hinton etal 2009. Learning multiple layers of features from tiny images. (2009). Alex Krizhevsky Geoffrey Hinton et al. 2009. Learning multiple layers of features from tiny images. (2009)."},{"key":"e_1_3_2_2_16_1","unstructured":"Yann LeCun. 1998. The MNIST database of handwritten digits. http:\/\/yann. lecun. com\/exdb\/mnist\/ (1998). Yann LeCun. 1998. The MNIST database of handwritten digits. http:\/\/yann. lecun. com\/exdb\/mnist\/ (1998)."},{"key":"e_1_3_2_2_17_1","volume-title":"How to decay your learning rate. arXiv preprint arXiv:2103.12682","author":"Lewkowycz Aitor","year":"2021","unstructured":"Aitor Lewkowycz . 2021. How to decay your learning rate. arXiv preprint arXiv:2103.12682 ( 2021 ). Aitor Lewkowycz. 2021. How to decay your learning rate. arXiv preprint arXiv:2103.12682 (2021)."},{"key":"e_1_3_2_2_18_1","volume-title":"On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265","author":"Liu Liyuan","year":"2019","unstructured":"Liyuan Liu , Haoming Jiang , Pengcheng He , Weizhu Chen , Xiaodong Liu , Jianfeng Gao , and Jiawei Han . 2019. On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 ( 2019 ). 
Liyuan Liu, Haoming Jiang, Pengcheng He,Weizhu Chen, Xiaodong Liu, Jianfeng Gao, and Jiawei Han. 2019. On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265 (2019)."},{"key":"e_1_3_2_2_19_1","volume-title":"Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030","author":"Liu Ze","year":"2021","unstructured":"Ze Liu , Yutong Lin , Yue Cao , Han Hu , Yixuan Wei , Zheng Zhang , Stephen Lin , and Baining Guo . 2021. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030 ( 2021 ). Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030 (2021)."},{"key":"e_1_3_2_2_20_1","volume-title":"Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983","author":"Loshchilov Ilya","year":"2016","unstructured":"Ilya Loshchilov and Frank Hutter . 2016 . Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016). Ilya Loshchilov and Frank Hutter. 2016. Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983 (2016)."},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482153"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TENCON.2019.8929465"},{"key":"e_1_3_2_2_23_1","volume-title":"Adding gradient noise improves learning for very deep networks. arXiv preprint arXiv:1511.06807","author":"Neelakantan Arvind","year":"2015","unstructured":"Arvind Neelakantan , Luke Vilnis , Quoc V Le , Ilya Sutskever , Lukasz Kaiser , Karol Kurach , and James Martens . 2015. Adding gradient noise improves learning for very deep networks. arXiv preprint arXiv:1511.06807 ( 2015 ). Arvind Neelakantan, Luke Vilnis, Quoc V Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, and James Martens. 2015. Adding gradient noise improves learning for very deep networks. arXiv preprint arXiv:1511.06807 (2015)."},{"key":"e_1_3_2_2_24_1","volume-title":"CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity. arXiv preprint arXiv:1912.11493","author":"Preechakul Konpat","year":"2019","unstructured":"Konpat Preechakul and Boonserm Kijsirikul . 2019. CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity. arXiv preprint arXiv:1912.11493 ( 2019 ). Konpat Preechakul and Boonserm Kijsirikul. 2019. CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity. arXiv preprint arXiv:1912.11493 (2019)."},{"key":"e_1_3_2_2_25_1","volume-title":"Neural smithing: supervised learning in feedforward artificial neural networks","author":"Reed Russell","unstructured":"Russell Reed and Robert J MarksII. 1999. Neural smithing: supervised learning in feedforward artificial neural networks . Mit Press . Russell Reed and Robert J MarksII. 1999. Neural smithing: supervised learning in feedforward artificial neural networks. Mit Press."},{"key":"e_1_3_2_2_26_1","volume-title":"An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747","author":"Ruder Sebastian","year":"2016","unstructured":"Sebastian Ruder . 2016. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 ( 2016 ). Sebastian Ruder. 2016. An overview of gradient descent optimization algorithms. 
arXiv preprint arXiv:1609.04747 (2016)."},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"crossref","unstructured":"Olga Russakovsky Jia Deng Hao Su Jonathan Krause Sanjeev Satheesh Sean Ma Zhiheng Huang Andrej Karpathy Aditya Khosla Michael Bernstein etal 2015. Imagenet large scale visual recognition challenge. International journal of computer vision 115 3 (2015) 211--252. Olga Russakovsky Jia Deng Hao Su Jonathan Krause Sanjeev Satheesh Sean Ma Zhiheng Huang Andrej Karpathy Aditya Khosla Michael Bernstein et al. 2015. Imagenet large scale visual recognition challenge. International journal of computer vision 115 3 (2015) 211--252.","DOI":"10.1007\/s11263-015-0816-y"},{"key":"e_1_3_2_2_28_1","volume-title":"Neural Code Summarization: How Far Are We? arXiv preprint arXiv:2107.07112","author":"Shi Ensheng","year":"2021","unstructured":"Ensheng Shi , Yanlin Wang , Lun Du , Junjie Chen , Shi Han , Hongyu Zhang , Dongmei Zhang , and Hongbin Sun . 2021. Neural Code Summarization: How Far Are We? arXiv preprint arXiv:2107.07112 ( 2021 ). Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang, Dongmei Zhang, and Hongbin Sun. 2021. Neural Code Summarization: How Far Are We? arXiv preprint arXiv:2107.07112 (2021)."},{"key":"e_1_3_2_2_29_1","volume-title":"Cast: Enhancing code summarization with hierarchical splitting and reconstruction of abstract syntax trees. arXiv preprint arXiv:2108.12987","author":"Shi Ensheng","year":"2021","unstructured":"Ensheng Shi , YanlinWang, Lun Du , Hongyu Zhang , Shi Han , Dongmei Zhang , and Hongbin Sun . 2021 . Cast: Enhancing code summarization with hierarchical splitting and reconstruction of abstract syntax trees. arXiv preprint arXiv:2108.12987 (2021). Ensheng Shi, YanlinWang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, and Hongbin Sun. 2021. Cast: Enhancing code summarization with hierarchical splitting and reconstruction of abstract syntax trees. arXiv preprint arXiv:2108.12987 (2021)."},{"key":"e_1_3_2_2_30_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and AndrewZisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 ( 2014 ). Karen Simonyan and AndrewZisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_3_2_2_31_1","volume-title":"Cyclical learning rates for training neural networks. In 2017 IEEE winter conference on applications of computer vision (WACV)","author":"Smith Leslie N","unstructured":"Leslie N Smith . 2017. Cyclical learning rates for training neural networks. In 2017 IEEE winter conference on applications of computer vision (WACV) . IEEE , 464--472. Leslie N Smith. 2017. Cyclical learning rates for training neural networks. In 2017 IEEE winter conference on applications of computer vision (WACV). IEEE, 464--472."},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-017-9701-0"},{"key":"e_1_3_2_2_33_1","volume-title":"Graph attention networks. arXiv preprint arXiv:1710.10903","author":"Velickovi\u0107 Petar","year":"2017","unstructured":"Petar Velickovi\u0107 , Guillem Cucurull , Arantxa Casanova , Adriana Romero , Pietro Lio , and Yoshua Bengio . 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 ( 2017 ). Petar Velickovi\u0107, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 
2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)."},{"key":"e_1_3_2_2_34_1","volume-title":"A diffusion theory for deep learning dynamics: Stochastic gradient descent exponentially favors flat minima. arXiv preprint arXiv:2002.03495","author":"Xie Zeke","year":"2020","unstructured":"Zeke Xie , Issei Sato , and Masashi Sugiyama . 2020. A diffusion theory for deep learning dynamics: Stochastic gradient descent exponentially favors flat minima. arXiv preprint arXiv:2002.03495 ( 2020 ). Zeke Xie, Issei Sato, and Masashi Sugiyama. 2020. A diffusion theory for deep learning dynamics: Stochastic gradient descent exponentially favors flat minima. arXiv preprint arXiv:2002.03495 (2020)."},{"key":"e_1_3_2_2_35_1","volume-title":"How powerful are graph neural networks? arXiv preprint arXiv:1810.00826","author":"Xu Keyulu","year":"2018","unstructured":"Keyulu Xu , Weihua Hu , Jure Leskovec , and Stefanie Jegelka . 2018. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826 ( 2018 ). Keyulu Xu,Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2018. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826 (2018)."},{"key":"e_1_3_2_2_36_1","volume-title":"International conference on machine learning. PMLR, 40--48","author":"Yang Zhilin","year":"2016","unstructured":"Zhilin Yang , William Cohen , and Ruslan Salakhudinov . 2016 . Revisiting semisupervised learning with graph embeddings . In International conference on machine learning. PMLR, 40--48 . Zhilin Yang, William Cohen, and Ruslan Salakhudinov. 2016. Revisiting semisupervised learning with graph embeddings. In International conference on machine learning. PMLR, 40--48."},{"key":"e_1_3_2_2_37_1","volume-title":"Kaichao Long and Michael I Jordan","author":"Jianmin You Mingsheng Wang","year":"2019","unstructured":"Mingsheng Wang Jianmin You , Kaichao Long and Michael I Jordan . 2019 . How does learning rate decay help modern neural networks? arXiv preprint arXiv:1908.01878 (2019). Mingsheng Wang Jianmin You, Kaichao Long and Michael I Jordan. 2019. How does learning rate decay help modern neural networks? arXiv preprint arXiv:1908.01878 (2019)."},{"key":"e_1_3_2_2_38_1","volume-title":"Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701","author":"Zeiler Matthew D","year":"2012","unstructured":"Matthew D Zeiler . 2012. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 ( 2012 ). Matthew D Zeiler. 2012. Adadelta: an adaptive learning rate method. 
arXiv preprint arXiv:1212.5701 (2012)."}],"event":{"name":"CIKM '22: The 31st ACM International Conference on Information and Knowledge Management","location":"Atlanta GA USA","acronym":"CIKM '22","sponsor":["SIGWEB ACM Special Interest Group on Hypertext, Hypermedia, and Web","SIGIR ACM Special Interest Group on Information Retrieval"]},"container-title":["Proceedings of the 31st ACM International Conference on Information & Knowledge Management"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3511808.3557626","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,27]],"date-time":"2023-06-27T18:08:47Z","timestamp":1687889327000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3511808.3557626"}},"subtitle":["A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima"],"short-title":[],"issued":{"date-parts":[[2022,10,17]]},"references-count":38,"alternative-id":["10.1145\/3511808.3557626","10.1145\/3511808"],"URL":"https:\/\/doi.org\/10.1145\/3511808.3557626","relation":{},"subject":[],"published":{"date-parts":[[2022,10,17]]},"assertion":[{"value":"2022-10-17","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}