{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T10:40:11Z","timestamp":1740134411832,"version":"3.37.3"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2014,5,26]],"date-time":"2014-05-26T00:00:00Z","timestamp":1401062400000},"content-version":"vor","delay-in-days":25,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000145","name":"Division of Information and Intelligent Systems","doi-asserted-by":"publisher","award":["IIS-091710 and IIS-1018443"],"id":[{"id":"10.13039\/100000145","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Database Syst."],"published-print":{"date-parts":[[2014,5]]},"abstract":"\n Distributed stream processing systems must function efficiently for data streams that fluctuate in their arrival rates and data distributions. Yet repeated and prohibitively expensive load reallocation across machines may make these systems ineffective, potentially resulting in data loss or even system failure. To overcome this problem, we propose a comprehensive solution, called the Robust Load Distribution (RLD) strategy, that is resilient under data fluctuations. RLD provides \u03f5-optimal query performance under an expected range of load fluctuations without suffering from the performance penalty caused by load migration. RLD is based on three key strategies. First, we model robust distributed stream processing as a parametric query optimization problem in a parameter space that captures the stream fluctuations. The notions of both robust logical and robust physical plans that work together to proactively handle all ranges of expected fluctuations in parameters are abstracted as overlays of this parameter space. Second, our Early-terminated Robust Partitioning (\n ERP<\/jats:italic>\n ) finds a combination of robust logical plans that together cover the parameter space, while minimizing the number of prohibitively expensive optimizer calls with a\n probabilistic bound<\/jats:italic>\n on the space coverage. Third, we design a family of algorithms for physical plan generation. Our\n GreedyPhy<\/jats:italic>\n exploits a probabilistic model to efficiently find a robust physical plan that sustains most frequently used robust logical plans at runtime. Our\n CorPhy<\/jats:italic>\n algorithm exploits operator correlations for the robust physical plan optimization. The resulting physical plan smooths the workload on each node under all expected fluctuations. Our\n OptPrune<\/jats:italic>\n algorithm, using\n CorPhy<\/jats:italic>\n as baseline, is guaranteed to find the optimal physical plan that maximizes the parameter space coverage with a practical increase in optimization time. Lastly, we further expand the capabilities of our proposed RLD framework to also appropriately react under so-called \u201cspace drifts\u201d, that is, a space drift is a change of the parameter space where the observed runtime statistics deviate from the expected optimization-time statistics. Our RLD solution is capable of adjusting itself to the unexpected yet significant data fluctuations beyond those planned for via covering the parameter space. Our experimental study using stock market and sensor network streams demonstrates that our RLD methodology consistently outperforms state-of-the-art solutions in terms of efficiency and effectiveness in highly fluctuating data stream environments.\n <\/jats:p>","DOI":"10.1145\/2602138","type":"journal-article","created":{"date-parts":[[2014,5,27]],"date-time":"2014-05-27T12:56:59Z","timestamp":1401195419000},"page":"1-45","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Robust Distributed Query Processing for Streaming Data"],"prefix":"10.1145","volume":"39","author":[{"given":"Chuan","family":"Lei","sequence":"first","affiliation":[{"name":"Worcester Polytechnic Institute, Worcester, MA"}]},{"given":"Elke A.","family":"Rundensteiner","sequence":"additional","affiliation":[{"name":"Worcester Polytechnic Institute, Worcester, MA"}]}],"member":"320","published-online":{"date-parts":[[2014,5,26]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the Conference on Innovative Data Systems Research (CIDR'05)","author":"Abadi Daniel J.","year":"2005","unstructured":"Daniel J. Abadi , Yanif Ahmad , Magdalena Balazinska , Mitch Cherniack , Jeong Hyon Hwang , Wolfgang Lindner , Anurag S. Maskey , Er Rasin , Esther Ryvkina , Nesime Tatbul , Ying Xing , and Stan Zdonik . 2005 . The design of the borealis stream processing engine . In Proceedings of the Conference on Innovative Data Systems Research (CIDR'05) . Daniel J. Abadi, Yanif Ahmad, Magdalena Balazinska, Mitch Cherniack, Jeong Hyon Hwang, Wolfgang Lindner, Anurag S. Maskey, Er Rasin, Esther Ryvkina, Nesime Tatbul, Ying Xing, and Stan Zdonik. 2005. The design of the borealis stream processing engine. In Proceedings of the Conference on Innovative Data Systems Research (CIDR'05)."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-003-0095-z"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.2307\/1267471"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066172"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1066157.1066171"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007615"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/1251175.1251190"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10898-009-9496-x"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2008.160"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/872757.872857"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.5555\/977401.978067"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807226"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/INM.2011.5990564"},{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the Conference on Innovative Data Systems Research (CIDR'03)","author":"Cherniack Mitch","year":"2003","unstructured":"Mitch Cherniack , Hari Balakrishnan , Magdalena Balazinska , Don Carney , Ugur Cetintemel , 2003 . Scalable distributed stream processing . In Proceedings of the Conference on Innovative Data Systems Research (CIDR'03) . 257--268. Mitch Cherniack, Hari Balakrishnan, Magdalena Balazinska, Don Carney, Ugur Cetintemel, et al. 2003. Scalable distributed stream processing. In Proceedings of the Conference on Innovative Data Systems Research (CIDR'03). 257--268."},{"key":"e_1_2_1_15_1","volume-title":"Thomas","author":"Cover Thomas M.","year":"1991","unstructured":"Thomas M. Cover and Joy A . Thomas . 1991 . Elements of Information Theory. Wiley-Interscience , New York. Thomas M. Cover and Joy A. Thomas. 1991. Elements of Information Theory. Wiley-Interscience, New York."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453976"},{"volume-title":"Load Balancing Strategies for Distributed Memory Machines","author":"Diekman Ralf","key":"e_1_2_1_17_1","unstructured":"Ralf Diekman and Robert Preis . 1999. Load Balancing Strategies for Distributed Memory Machines . Civil-Comp Press , 124--157. Ralf Diekman and Robert Preis. 1999. Load Balancing Strategies for Distributed Memory Machines. Civil-Comp Press, 124--157."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/67544.66960"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/615172.615176"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/169725.169708"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/s007780050037"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/276304.276315"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2011.5767851"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/371578.371598"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2013.6544877"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/161982.162007"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0377-2217(02)00123-6"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007642"},{"key":"e_1_2_1_29_1","volume-title":"Machine Learning","author":"Mitchell Thomas M.","unstructured":"Thomas M. Mitchell . 1997. Machine Learning 1 st Ed. McGraw-Hill , New York . Thomas M. Mitchell. 1997. Machine Learning 1st Ed. McGraw-Hill, New York.","edition":"1"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the Conference on Innovative Data Systems Research (CIDR'03)","author":"Motwani Rajeev","year":"2003","unstructured":"Rajeev Motwani , Jennifer Widom , Arvind Arasu , Brian Babcock , Shivnath Babu , 2003 . Query processing, approximation, and resource management in a data stream management system . In Proceedings of the Conference on Innovative Data Systems Research (CIDR'03) . 245--256. Rajeev Motwani, Jennifer Widom, Arvind Arasu, Brian Babcock, Shivnath Babu, et al. 2003. Query processing, approximation, and resource management in a data stream management system. In Proceedings of the Conference on Innovative Data Systems Research (CIDR'03). 245--256."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1516360.1516452"},{"volume-title":"Random Variables, and Stochastic Processes","author":"Papoulis Athanasios","key":"e_1_2_1_32_1","unstructured":"Athanasios Papoulis . 1984. Probability , Random Variables, and Stochastic Processes . McGraw Hill . Athanasios Papoulis. 1984. Probability, Random Variables, and Stochastic Processes. McGraw Hill."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3402707.3402751"},{"volume-title":"Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05)","author":"Reddy Naveen","key":"e_1_2_1_34_1","unstructured":"Naveen Reddy and Jayant R. Haritsa . 2005. Analyzing plan diagrams of database query optimizers . In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05) . 1228--1239. Naveen Reddy and Jayant R. Haritsa. 2005. Analyzing plan diagrams of database query optimizers. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05). 1228--1239."},{"volume-title":"Proceedings of the 19th International Conference on Data Engineering (ICDE'03)","author":"Shah Mehul A.","key":"e_1_2_1_35_1","unstructured":"Mehul A. Shah , Joseph M. Hellerstein , Sirish Chandrasekaran , and Michael J. Franklin . 2003. Flux: An adaptive partitioning operator for continuous query systems . In Proceedings of the 19th International Conference on Data Engineering (ICDE'03) . 25--36. Mehul A. Shah, Joseph M. Hellerstein, Sirish Chandrasekaran, and Michael J. Franklin. 2003. Flux: An adaptive partitioning operator for continuous query systems. In Proceedings of the 19th International Conference on Data Engineering (ICDE'03). 25--36."},{"key":"e_1_2_1_36_1","unstructured":"Behrooz A. Shirazi Krishna M. Kavi and Ali R. Hurson Eds. 1995. Scheduling and Load Balancing in Parallel and Distributed Systems. IEEE Computer Society Press. Behrooz A. Shirazi Krishna M. Kavi and Ali R. Hurson Eds. 1995. Scheduling and Load Balancing in Parallel and Distributed Systems. IEEE Computer Society Press."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1099554.1099595"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the 29th International Conference on Very Large Data Bases (VLDB'03)","volume":"29","author":"Tian Feng","unstructured":"Feng Tian and David J . DeWitt. 2003. Tuple routing strategies for distributed eddies . In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB'03) , vol. 29 . 333--344. Feng Tian and David J. DeWitt. 2003. Tuple routing strategies for distributed eddies. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB'03), vol. 29. 333--344."},{"key":"e_1_2_1_39_1","unstructured":"TradingMarkets. 2013. http:\/\/www.tradingmarkets.com\/. TradingMarkets. 2013. http:\/\/www.tradingmarkets.com\/."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/11564126_73"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89856-6_16"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.5555\/1182635.1164194"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2005.53"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/1007568.1007617"}],"container-title":["ACM Transactions on Database Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2602138","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,30]],"date-time":"2022-12-30T20:38:24Z","timestamp":1672432704000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2602138"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,5]]},"references-count":44,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2014,5]]}},"alternative-id":["10.1145\/2602138"],"URL":"https:\/\/doi.org\/10.1145\/2602138","relation":{},"ISSN":["0362-5915","1557-4644"],"issn-type":[{"type":"print","value":"0362-5915"},{"type":"electronic","value":"1557-4644"}],"subject":[],"published":{"date-parts":[[2014,5]]},"assertion":[{"value":"2013-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-05-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}