Abstract
The LISp-Miner system for data mining and knowledge discovery uses the GUHA method to comb through a large database and find 2 × 2 contingency tables that satisfy a condition specified by generalised quantifiers and thereby suggest possible relations between attributes. In this paper, we show how Bayesian statistical methods can provide a more detailed interpretation of the data in the tables found by GUHA. Using a multinomial sampling model and a Dirichlet prior, we derive posterior distributions for parameters that correspond to GUHA generalised quantifiers. Examples illustrate the new Bayesian post-processing tools implemented in LISp-Miner. A statistical model for analysing contingency tables for data from two subpopulations is also presented.
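As a rough illustration of the kind of computation the abstract describes, the following minimal sketch assumes a 2 × 2 table with cell counts (a, b, c, d), a multinomial sampling model, and a uniform Dirichlet(1, 1, 1, 1) prior, so that the posterior over the four cell probabilities is Dirichlet(a + 1, b + 1, c + 1, d + 1). It estimates by Monte Carlo the posterior probability that the parameter matching the founded implication quantifier, θ_a / (θ_a + θ_b), is at least a threshold p. The function names, the choice of prior, the choice of quantifier, and the example counts are assumptions made here for illustration; they are not taken from the paper or from LISp-Miner.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalising independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a_i, 1.0) for a_i in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def posterior_prob_founded_implication(table, p, prior=(1.0, 1.0, 1.0, 1.0),
                                       n_samples=20000, seed=0):
    """Monte Carlo estimate of P(theta_a / (theta_a + theta_b) >= p | data).

    table: hypothetical 2 x 2 counts (a, b, c, d) in row-major order.
    prior: Dirichlet prior parameters (uniform by default; an assumption).
    """
    # Conjugate update: posterior parameters are prior parameters plus counts.
    alpha_post = [count + alpha0 for count, alpha0 in zip(table, prior)]
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        theta = sample_dirichlet(alpha_post, rng)
        confidence = theta[0] / (theta[0] + theta[1])
        if confidence >= p:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    # Hypothetical table: a = 30, b = 10, c = 20, d = 40.
    print(posterior_prob_founded_implication((30, 10, 20, 40), p=0.6))
```

Sampling is used here only to keep the sketch dependency-free. Under a Dirichlet posterior the ratio θ_a / (θ_a + θ_b) follows a Beta distribution with parameters a + 1 and b + 1 (for the uniform prior), so the same probability could instead be read off a Beta cumulative distribution function.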




Cite this article
Piché, R., Järvenpää, M., Turunen, E. et al. Bayesian analysis of GUHA hypotheses. J Intell Inf Syst 42, 47–73 (2014). https://doi.org/10.1007/s10844-013-0255-6
DOI: https://doi.org/10.1007/s10844-013-0255-6