Abstract
In the era of information explosion, huge amounts of data are generated continuously by various sensing devices. These data are often too low-level for analytics purposes and too massive to load into data warehouses for filtering and summarizing with reasonable latency. Distributed stream analytics for multilevel abstraction is the key to solving this problem.
We advocate a distributed infrastructure for CDR (Call Detail Record) stream analytics in the telecommunication network, in which stream processing is integrated into the database engine and carried out as continuous querying; the computation model is based on a network-distributed (rather than clustered) Map-Reduce scheme. We propose a window-based cooperation mechanism that keeps multiple engines synchronized and cooperating on the data falling within a common window boundary, defined by time, cardinality, etc. This mechanism allows the engines to cooperate window by window without centralized coordination. We further propose a quantization mechanism that integrates the discretization and abstraction of continuous-valued data, for efficient and incremental data reduction and, in turn, reduced network data movement. These mechanisms play key roles in scaling out CDR stream analytics.
The proposed approach has been integrated into the PostgreSQL engine.
Our preliminary experiments reveal its merit for large-scale distributed stream processing.
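As a rough illustration of the two mechanisms described above, the following Python sketch shows one way that window-based cooperation and quantization could fit together. It is not the PostgreSQL-integrated implementation; the window size, the quantization step, the record layout `(timestamp, region, duration)`, and all function names are assumptions made for this example. Each engine derives the window identifier from the data itself, so engines can line up their partial results window by window without a central coordinator.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 60   # assumed time-window size (hypothetical value)
DURATION_STEP = 30    # assumed quantization step for call duration, in seconds


def window_id(timestamp):
    """Map a timestamp (seconds) to the window it falls in."""
    return timestamp // WINDOW_SECONDS


def quantize(duration):
    """Discretize a continuous duration to the lower edge of its bucket."""
    return (duration // DURATION_STEP) * DURATION_STEP


def map_engine(cdrs):
    """Map side, run on each engine: reduce raw CDRs to per-window counts
    keyed by (region, quantized duration), so only summaries are shipped."""
    partial = defaultdict(Counter)
    for timestamp, region, duration in cdrs:
        partial[window_id(timestamp)][(region, quantize(duration))] += 1
    return partial


def reduce_window(partials, wid):
    """Reduce side: merge the partial counts of all engines for one window."""
    total = Counter()
    for partial in partials:
        total.update(partial.get(wid, Counter()))
    return total


# Two hypothetical engines, each seeing a different slice of the CDR stream.
engine_a = map_engine([(5, "north", 42), (61, "north", 10)])
engine_b = map_engine([(30, "north", 55), (59, "south", 95)])
window_0 = reduce_window([engine_a, engine_b], 0)
```

Because quantization happens before aggregation, the two `north` calls of 42 and 55 seconds collapse into a single bucket on the map side, which is the kind of data reduction that lowers network data movement.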
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, Q., Hsu, M. (2010). Scale Out Parallel and Distributed CDR Stream Analytics. In: Hameurlain, A., Morvan, F., Tjoa, A.M. (eds) Data Management in Grid and Peer-to-Peer Systems. Globe 2010. Lecture Notes in Computer Science, vol 6265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15108-8_11
DOI: https://doi.org/10.1007/978-3-642-15108-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15107-1
Online ISBN: 978-3-642-15108-8
eBook Packages: Computer Science (R0)