Abstract
Parallel computing provides the computing speed and scalability needed for large-scale data mining applications, and good performance depends on scheduling the parallel tasks well. This paper proposes and evaluates several scheduling strategies for the parallel FI-growth data mining algorithm. We show that the execution time of parallel data mining on multicore cluster systems depends on the task scheduling strategy used. Using simulation, we compare nine strategies on multicore cluster systems with 8 to 64 cores. The results show that selecting the right strategy can substantially reduce the execution time of parallel data mining on such systems.
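To illustrate why the ordering of tasks can change the total execution time, the following minimal sketch simulates greedy list scheduling on identical cores. It is not one of the nine strategies evaluated in the paper: the function name `makespan`, the task costs, and the two orderings compared (arrival order and longest task first) are all assumptions chosen only for illustration.

```python
import heapq


def makespan(task_costs, num_cores):
    """Simulate greedy list scheduling on identical cores.

    Tasks are taken in the given order, and each one is assigned to the
    core that becomes free earliest. Returns the finish time of the last
    core, i.e. the simulated total execution time.
    """
    cores = [0] * num_cores  # current finish time of every core
    heapq.heapify(cores)
    for cost in task_costs:
        earliest = heapq.heappop(cores)         # core that frees up first
        heapq.heappush(cores, earliest + cost)  # run the task on that core
    return max(cores)


# Hypothetical mining task costs (arbitrary time units), not from the paper.
tasks = [2, 3, 4, 6, 2, 2, 9, 1]

in_arrival_order = makespan(tasks, 4)
longest_first = makespan(sorted(tasks, reverse=True), 4)

print(in_arrival_order)  # 13
print(longest_first)     # 9
```

With these made-up costs, scheduling the longest tasks first finishes in 9 time units instead of 13 on four cores, because the single expensive task no longer starts late. Real mining tasks also incur communication and data-distribution costs that this sketch ignores.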
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Benjamas, N., Uthayopas, P. (2009). An Impact of Scheduling Strategy to Parallel FI-Growth Data Mining Algorithm. In: Papasratorn, B., Chutimaskul, W., Porkaew, K., Vanijja, V. (eds) Advances in Information Technology. IAIT 2009. Communications in Computer and Information Science, vol 55. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10392-6_5
DOI: https://doi.org/10.1007/978-3-642-10392-6_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10391-9
Online ISBN: 978-3-642-10392-6
eBook Packages: Computer Science (R0)