{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,4,29]],"date-time":"2024-04-29T13:58:12Z","timestamp":1714399092342},"reference-count":17,"publisher":"Association for Computing Machinery (ACM)","issue":"13","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2014,8]]},"abstract":"MapReduce based data-intensive computing solutions are increasingly deployed as production systems. Unlike Internet companies who invent and adopt the technology from the very beginning, traditional enterprises demand easy-to-use software due to the limited capabilities of administrators. Automatic job optimization software for MapReduce is a promising technique to satisfy such requirements. In this paper, we introduce a toolkit from IBM, called MRTuner, to enable holistic optimization for MapReduce jobs. In particular, we propose a novel Producer-Transporter-Consumer (PTC) model, which characterizes the tradeoffs in the parallel execution among tasks. We also carefully investigate the complicated relations among about twenty parameters, which have significant impact on the job performance. We design an efficient search algorithm to find the optimal execution plan. Finally, we conduct a thorough experimental evaluation on two different types of clusters using the HiBench suite which covers various Hadoop workloads from GB to TB size levels. 
The results show that the search latency of MRTuner is a few orders of magnitude faster than that of the state-of-the-art cost-based optimizer, and the effectiveness of the optimized execution plan is also significantly improved.","DOI":"10.14778\/2733004.2733005","type":"journal-article","created":{"date-parts":[[2015,5,12]],"date-time":"2015-05-12T15:37:52Z","timestamp":1431445072000},"page":"1319-1330","source":"Crossref","is-referenced-by-count":57,"title":["MRTuner"],"prefix":"10.14778","volume":"7","author":[{"given":"Juwei","family":"Shi","sequence":"first","affiliation":[{"name":"IBM Research - China, Beijing, China and Renmin University of China, Beijing, China"}]},{"given":"Jia","family":"Zou","sequence":"additional","affiliation":[{"name":"IBM Research - China, Beijing, China"}]},{"given":"Jiaheng","family":"Lu","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}]},{"given":"Zhao","family":"Cao","sequence":"additional","affiliation":[{"name":"IBM Research - China, Beijing, China"}]},{"given":"Shiqiang","family":"Li","sequence":"additional","affiliation":[{"name":"IBM Research - China, Beijing, China"}]},{"given":"Chen","family":"Wang","sequence":"additional","affiliation":[{"name":"IBM Research - China, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2014,8]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/275487.275492"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213556.2213558"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/173284.155333"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465308"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/VTDC.2006.17"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402707.3402746"},{"key":"e_1_2_1_7_1","first-page":"261","volume-title":"CIDR","author":"Herodotou 
H.","year":"2011","unstructured":"H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F. B. Cetin, and S. Babu. Starfish: A self-tuning system for big data analytics. In CIDR, pages 261--272, 2011."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW.2010.5452747"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213840"},{"key":"e_1_2_1_10_1","volume-title":"Proc. of Nokia Mobile Data Challenge Workshop","author":"Laurila J. K.","year":"2012","unstructured":"J. K. Laurila, D. Gatica-Perez, I. Aad, O. Bornet, T.-M.-T. Do, O. Dousse, J. Eberle, and M. Miettinen. The mobile data challenge: Big data for mobile computing research. In Proc. of Nokia Mobile Data Challenge Workshop, 2012."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989426"},{"key":"e_1_2_1_12_1","volume-title":"Sort Benchmark","author":"O'Malley O.","year":"2009","unstructured":"O. O'Malley and A. C. Murthy. Winning a 60 second dash with a yellow elephant. Sort Benchmark, 2009."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2391229.2391250"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/79173.79181"},{"key":"e_1_2_1_15_1","first-page":"1","volume-title":"MASCOTS","author":"Wang G.","year":"2009","unstructured":"G. 
Wang, A. R. Butt, P. Pandey, and K. Gupta. A simulation approach to evaluating design decisions in mapreduce setups. In MASCOTS, pages 1--11, 2009."},{"key":"e_1_2_1_16_1","volume-title":"Hadoop: The Definitive Guide. O'Reilly","author":"White T.","year":"2012","unstructured":"T. White. Hadoop: The Definitive Guide. O'Reilly, 2012."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/SCC.2010.41"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/2733004.2733005","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:38:25Z","timestamp":1672220305000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/2733004.2733005"}},"subtitle":["a toolkit to enable holistic optimization for mapreduce jobs"],"short-title":[],"issued":{"date-parts":[[2014,8]]},"references-count":17,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2014,8]]}},"alternative-id":["10.14778\/2733004.2733005"],"URL":"https:\/\/doi.org\/10.14778\/2733004.2733005","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2014,8]]}}}