Abstract
This paper describes an investigation into extracting useful information from the free-text element of questionnaires, using a semi-automated summarisation extraction technique to generate text summarisation classifiers. A realisation of the proposed technique, SARSET (Semi-Automated Rule Summarisation Extraction Tool), is presented and evaluated using real questionnaire data. The results are compared against those obtained using two alternative techniques for building text summarisation classifiers: the first uses standard rule-based classifier generators, and the second is founded on the concept of building classifiers using secondary data. The results demonstrate that the proposed semi-automated approach outperforms the other two approaches considered.
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Garcia-Constantino, M., Coenen, F., Noble, P.J., Radford, A., Setzkorn, C. (2012). A Semi-Automated Approach to Building Text Summarisation Classifiers. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2012. Lecture Notes in Computer Science, vol 7376. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_39
DOI: https://doi.org/10.1007/978-3-642-31537-4_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31536-7
Online ISBN: 978-3-642-31537-4
eBook Packages: Computer Science, Computer Science (R0)