{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T22:14:09Z","timestamp":1730326449930,"version":"3.28.0"},"publisher-location":"New York, NY, USA","reference-count":67,"publisher":"ACM","funder":[{"name":"NSF","award":["2128725, 1919197, 2106893"]},{"name":"Virginia Commonwealth Cyber Initiative","award":[""]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,1,27]]},"DOI":"10.1145\/3575693.3575698","type":"proceedings-article","created":{"date-parts":[[2023,1,30]],"date-time":"2023-01-30T22:56:55Z","timestamp":1675119415000},"page":"791-803","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining"],"prefix":"10.1145","author":[{"given":"Liwei","family":"Guo","sequence":"first","affiliation":[{"name":"University of Virginia, USA"}]},{"given":"Wonkyo","family":"Choe","sequence":"additional","affiliation":[{"name":"University of Virginia, USA"}]},{"given":"Felix Xiaozhu","family":"Lin","sequence":"additional","affiliation":[{"name":"University of Virginia, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,1,30]]},"reference":[
{"key":"e_1_3_2_1_1_1","unstructured":"2022. Hugging Face:nlptown\/bert-base-multilingual-uncased-sentiment. https:\/\/huggingface.co\/nlptown\/bert-base-multilingual-uncased-sentiment (Accessed on 07\/07\/2022)"},
{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"2022. PyTorch. https:\/\/pytorch.org\/ (Accessed on 03\/14\/2022)","DOI":"10.5121\/ijdms.2022.14201"},
{"key":"e_1_3_2_1_3_1","unstructured":"2022. PyTorch 1.11, TorchData, and functorch are now available | PyTorch. https:\/\/pytorch.org\/blog\/pytorch-1.11-released\/ (Accessed on 07\/07\/2022)"},
{"key":"e_1_3_2_1_4_1","unstructured":"2022. TensorFlow. https:\/\/www.tensorflow.org\/ (Accessed on 03\/14\/2022)"},
{"key":"e_1_3_2_1_5_1","unstructured":"2022. Version 0.23.2 \u2014 scikit-learn 1.1.1 documentation. https:\/\/scikit-learn.org\/stable\/whats_new\/v0.23.html (Accessed on 07\/07\/2022)"},
{"volume-title":"Android: Low Memory Killer Daemon. https:\/\/source.android.com\/devices\/tech\/perf\/lmkd\/","year":"2022","key":"e_1_3_2_1_6_1","unstructured":"Android. 2022. Android: Low Memory Killer Daemon. https:\/\/source.android.com\/devices\/tech\/perf\/lmkd\/"},
{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.334"},
{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.636"},
{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-15742-5_7"},
{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2702123.2702486"},
{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3301418.3313946"},
{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-23535-2_30"},
{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-4828"},
{"key":"e_1_3_2_1_14_1","doi-asserted-by":"crossref","unstructured":"Leonardo Dagum and Ramesh Menon. 1998. OpenMP: an industry standard API for shared-memory programming. IEEE Computational Science and Engineering 5, 1 (1998), 46\u201355.","DOI":"10.1109\/99.660313"},
{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2020.113193"},
{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/n19-1423"},
{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00038"},
{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of Machine Learning and Systems 2019","author":"Eisenman Assaf","year":"2019","unstructured":"Assaf Eisenman, Maxim Naumov, Darryl Gardner, Misha Smelyanskiy, Sergey Pupyrev, Kim M. Hazelwood, Asaf Cidon, and Sachin Katti. 2019. Bandana: Using Non-Volatile Memory for Storing Deep Learning Models. In Proceedings of Machine Learning and Systems 2019, MLSys 2019, Stanford, CA, USA, March 31 - April 2, 2019, Ameet Talwalkar, Virginia Smith, and Matei Zaharia (Eds.). mlsys.org. https:\/\/proceedings.mlsys.org\/book\/277.pdf"},
{"key":"e_1_3_2_1_19_1","volume-title":"Reducing Transformer Depth on Demand with Structured Dropout. In 8th International Conference on Learning Representations, ICLR 2020","author":"Fan Angela","year":"2020","unstructured":"Angela Fan, Edouard Grave, and Armand Joulin. 2020. Reducing Transformer Depth on Demand with Structured Dropout. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https:\/\/openreview.net\/forum?id=SylO2yStDr"},
{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3241539.3241559"},
{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD45719.2019.8942147"},
{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3492321.3519565"},
{"key":"e_1_3_2_1_23_1","volume-title":"Dally","author":"Han Song","year":"2016","unstructured":"Song Han, Huizi Mao, and William J. Dally. 2016. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). arxiv:1510.00149"},
{"key":"e_1_3_2_1_24_1","volume-title":"Proceedings of the 38th International Conference on Machine Learning, ICML 2021","volume":"4159","author":"He Chaoyang","year":"2021","unstructured":"Chaoyang He, Shen Li, Mahdi Soltanolkotabi, and Salman Avestimehr. 2021. PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, Marina Meila and Tong Zhang (Eds.) (Proceedings of Machine Learning Research, Vol. 139). PMLR, 4150\u20134159. http:\/\/proceedings.mlr.press\/v139\/he21a.html"},
{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},
{"key":"e_1_3_2_1_26_1","volume-title":"DynaBERT: Dynamic BERT with Adaptive Width and Depth. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020","author":"Hou Lu","year":"2020","unstructured":"Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, and Qun Liu. 2020. DynaBERT: Dynamic BERT with Adaptive Width and Depth. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, Hugo Larochelle, Marc\u2019Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin (Eds.). https:\/\/proceedings.neurips.cc\/paper\/2020\/hash\/6f5216f8d89b086c18298e043bfe48ed-Abstract.html"},
{"key":"e_1_3_2_1_27_1","volume-title":"Pipeline Parallelism for Inference on Heterogeneous Edge Computing. CoRR, abs\/2110.14895","author":"Hu Yang","year":"2021","unstructured":"Yang Hu, Connor Imes, Xuanang Zhao, Souvik Kundu, Peter A. Beerel, Stephen P. Crago, and John Paul Walters. 2021. Pipeline Parallelism for Inference on Heterogeneous Edge Computing. CoRR, abs\/2110.14895 (2021), arXiv:2110.14895. arxiv:2110.14895"},
{"key":"e_1_3_2_1_28_1","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019","author":"Huang Yanping","year":"2019","unstructured":"Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Xu Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen. 2019. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d\u2019Alch\u00e9-Buc, Emily B. Fox, and Roman Garnett (Eds.). 103\u2013112. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/093f65e080a295f8076b1c5722a46aa2-Abstract.html"},
{"key":"e_1_3_2_1_29_1","volume-title":"ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In 8th International Conference on Learning Representations, ICLR 2020","author":"Lan Zhenzhong","year":"2020","unstructured":"Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2020. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https:\/\/openreview.net\/forum?id=H1eA7AEtvS"},
{"key":"e_1_3_2_1_30_1","volume-title":"End the Senseless Killing: Improving Memory Management for Mobile Operating Systems. In 2020 USENIX Annual Technical Conference, USENIX ATC 2020","author":"Lebeck Niel","year":"2020","unstructured":"Niel Lebeck, Arvind Krishnamurthy, Henry M. Levy, and Irene Zhang. 2020. End the Senseless Killing: Improving Memory Management for Mobile Operating Systems. In 2020 USENIX Annual Technical Conference, USENIX ATC 2020, July 15-17, 2020, Ada Gavrilovska and Erez Zadok (Eds.). USENIX Association, 873\u2013887. https:\/\/www.usenix.org\/conference\/atc20\/presentation\/lebeck"},
{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3074179"},
{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2815675.2815686"},
{"key":"e_1_3_2_1_33_1","volume-title":"RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR, abs\/1907.11692","author":"Liu Yinhan","year":"2019","unstructured":"Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR, abs\/1907.11692 (2019), arXiv:1907.11692. arxiv:1907.11692"},
{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9413631"},
{"key":"e_1_3_2_1_35_1","volume-title":"The DSP capabilities of arm cortex-m4 and cortex-m7 processors. ARM White Paper, 29","author":"Lorenser Thomas","year":"2016","unstructured":"Thomas Lorenser. 2016. The DSP capabilities of Arm Cortex-M4 and Cortex-M7 processors. ARM White Paper, 29 (2016)."},
{"volume-title":"The conversational interface. 6","author":"McTear Michael Frederick","key":"e_1_3_2_1_36_1","unstructured":"Michael Frederick McTear, Zoraida Callejas, and David Griol. 2016. The Conversational Interface. 6, Springer."},
{"key":"e_1_3_2_1_37_1","volume-title":"Enabling Large NNs on Tiny MCUs with Swapping. CoRR, abs\/2101.08744","author":"Miao Hongyu","year":"2021","unstructured":"Hongyu Miao and Felix Xiaozhu Lin. 2021. Enabling Large NNs on Tiny MCUs with Swapping. CoRR, abs\/2101.08744 (2021), arXiv:2101.08744. arxiv:2101.08744"},
{"key":"e_1_3_2_1_38_1","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019","author":"Michel Paul","year":"2019","unstructured":"Paul Michel, Omer Levy, and Graham Neubig. 2019. Are Sixteen Heads Really Better than One? In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d\u2019Alch\u00e9-Buc, Emily B. Fox, and Roman Garnett (Eds.). 14014\u201314024. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/2c601ad9d2ff9bc8b282670cdd54f69f-Abstract.html"},
{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359646"},
{"key":"e_1_3_2_1_40_1","volume-title":"TinyStack: A Minimal GPU Stack for Client ML. CoRR, abs\/2105.05085","author":"Park Heejin","year":"2021","unstructured":"Heejin Park and Felix Xiaozhu Lin. 2021. TinyStack: A Minimal GPU Stack for Client ML. CoRR, abs\/2105.05085 (2021), arXiv:2105.05085. arxiv:2105.05085"},
{"key":"e_1_3_2_1_41_1","volume-title":"Sinclair","author":"Pati Suchita","year":"2021","unstructured":"Suchita Pati, Shaizeen Aga, Nuwan Jayasena, and Matthew D. Sinclair. 2021. Demystifying BERT: Implications for Accelerator Design. CoRR, abs\/2104.08335 (2021), arXiv:2104.08335. arxiv:2104.08335"},
{"key":"e_1_3_2_1_42_1","volume-title":"BiBERT: Accurate Fully Binarized BERT. In The Tenth International Conference on Learning Representations, ICLR 2022","author":"Qin Haotong","year":"2022","unstructured":"Haotong Qin, Yifu Ding, Mingyuan Zhang, Qinghua Yan, Aishan Liu, Qingqing Dang, Ziwei Liu, and Xianglong Liu. 2022. BiBERT: Accurate Fully Binarized BERT. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net. https:\/\/openreview.net\/forum?id=5xEgrl_5FAJ"},
{"key":"e_1_3_2_1_43_1","volume-title":"Language models are unsupervised multitask learners. OpenAI blog, 1, 8","author":"Radford Alec","year":"2019","unstructured":"Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1, 8 (2019), 9."},
{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},
{"key":"e_1_3_2_1_45_1","volume-title":"a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR, abs\/1910.01108","author":"Sanh Victor","year":"2019","unstructured":"Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR, abs\/1910.01108 (2019), arXiv:1910.01108. arxiv:1910.01108"},
{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1925019.1925023"},
{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9414560"},
{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.195"},
{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480095"},
{"key":"e_1_3_2_1_50_1","volume-title":"Efficient Transformers: A Survey. CoRR, abs\/2009.06732","author":"Tay Yi","year":"2020","unstructured":"Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. 2020. Efficient Transformers: A Survey. CoRR, abs\/2009.06732 (2020), arXiv:2009.06732. arxiv:2009.06732"},
{"key":"e_1_3_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397271.3401441"},
{"key":"e_1_3_2_1_52_1","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5998\u20136008. https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},
{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/p19-1580"},
{"key":"e_1_3_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3476886.3477511"},
{"key":"e_1_3_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/w18-5446"},
{"volume-title":"Efficient Algorithms and Hardware for Natural Language Processing","author":"Wang Hanrui","key":"e_1_3_2_1_56_1","unstructured":"Hanrui Wang. 2020. Efficient Algorithms and Hardware for Natural Language Processing. Massachusetts Institute of Technology."},
{"key":"e_1_3_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.686"},
{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00881"},
{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447993.3448625"},
{"key":"e_1_3_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446763"},
{"key":"e_1_3_2_1_61_1","volume-title":"A Note on Latency Variability of Deep Neural Networks for Mobile Inference. CoRR, abs\/2003.00138","author":"Yang Luting","year":"2020","unstructured":"Luting Yang, Bingqian Lu, and Shaolei Ren. 2020. A Note on Latency Variability of Deep Neural Networks for Mobile Inference. CoRR, abs\/2003.00138 (2020), arXiv:2003.00138. arxiv:2003.00138"},
{"key":"#cr-split#-e_1_3_2_1_62_1.1","doi-asserted-by":"crossref","unstructured":"Rongjie Yi, Ting Cao, Ao Zhou, Xiao Ma, Shangguang Wang, and Mengwei Xu. 2022. Understanding and Optimizing Deep Learning Cold-Start Latency on Edge Devices. https:\/\/doi.org\/10.48550\/ARXIV.2206.07446","DOI":"10.3390\/electronics10182206"},
{"key":"#cr-split#-e_1_3_2_1_62_1.2","unstructured":"Rongjie Yi, Ting Cao, Ao Zhou, Xiao Ma, Shangguang Wang, and Mengwei Xu. 2022. Understanding and Optimizing Deep Learning Cold-Start Latency on Edge Devices. https:\/\/doi.org\/10.48550\/ARXIV.2206.07446"},
{"key":"e_1_3_2_1_63_1","volume-title":"8th International Conference on Learning Representations, ICLR 2020","author":"Yu Haonan","year":"2020","unstructured":"Haonan Yu, Sergey Edunov, Yuandong Tian, and Ari S. Morcos. 2020. Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https:\/\/openreview.net\/forum?id=S1xnXRVFwH"},
{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00071"},
{"key":"e_1_3_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/EMC2-NIPS53020.2019.00016"},
{"key":"e_1_3_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00716"}
],"event":{"name":"ASPLOS '23: 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","SIGOPS ACM Special Interest Group on Operating Systems","SIGPLAN ACM Special Interest Group on Programming Languages"],"location":"Vancouver BC Canada","acronym":"ASPLOS '23"},"container-title":["Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3575693.3575698","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T15:31:37Z","timestamp":1679931097000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3575693.3575698"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,27]]},"references-count":67,"alternative-id":["10.1145\/3575693.3575698","10.1145\/3575693"],"URL":"https:\/\/doi.org\/10.1145\/3575693.3575698","relation":{},"subject":[],"published":{"date-parts":[[2023,1,27]]},"assertion":[{"value":"2023-01-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}