Abstract
The Internet is a powerful source of information. However, as with other media, each type of information is targeted at a different audience; in particular, adult content should not be accessible to children. In this context, several content-filtering approaches have been proposed in both industry and academia. Some of these approaches use the text content of a webpage to build a classic bag-of-words model, which is then used to categorise the page and filter out inappropriate content. To the best of our knowledge, these methods use no semantic information at all and, therefore, may be evaded by attacks that exploit the well-known ambiguity of natural language. Against this background, we present the first semantics-aware adult content filtering approach, which applies a word sense disambiguation (WSD) step before modelling webpages in order to deal with this ambiguity. We show that this approach can improve the filtering results of the classic statistical models.
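To make the contrast described in the abstract concrete, the following Python sketch compares a plain bag-of-words representation with one in which ambiguous words are first replaced by sense identifiers. It is an illustration under stated assumptions, not the authors' pipeline: the tiny sense inventory `SENSE_INVENTORY`, the helper names, and the simplified Lesk-style overlap heuristic used for disambiguation are all hypothetical stand-ins chosen for brevity.

```python
import re
from collections import Counter

# Hypothetical toy sense inventory: word -> {sense_id: set of gloss keywords}.
# A real system would draw senses from a lexical resource such as WordNet.
SENSE_INVENTORY = {
    "breast": {
        "breast.anatomy": {"body", "nude", "naked", "explicit"},
        "breast.food": {"chicken", "recipe", "cook", "oven", "grill"},
    },
}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Classic bag-of-words: raw term frequencies, no semantic information."""
    return Counter(tokenize(text))


def disambiguate(tokens, window=3):
    """Replace each ambiguous token with the sense whose gloss keywords
    overlap most with the surrounding context window (simplified Lesk).
    If no sense overlaps the context, the surface word is kept unchanged.
    Ties between senses go to the first sense listed in the inventory."""
    out = []
    for i, tok in enumerate(tokens):
        senses = SENSE_INVENTORY.get(tok)
        if not senses:
            out.append(tok)
            continue
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        best = max(senses, key=lambda s: len(senses[s] & context))
        out.append(best if senses[best] & context else tok)
    return out


def sense_bag(text):
    """Bag of words built after the (hypothetical) WSD step."""
    return Counter(disambiguate(tokenize(text)))


if __name__ == "__main__":
    recipe = "grill the chicken breast in the oven"
    adult = "naked breast photos"
    # Plain bag-of-words: both pages share the same 'breast' feature.
    print(bag_of_words(recipe)["breast"], bag_of_words(adult)["breast"])
    # After disambiguation the two uses become distinct features.
    print(sense_bag(recipe)["breast.food"], sense_bag(adult)["breast.anatomy"])
```

With the plain bag-of-words model, a cooking page and an explicit page contribute the same `breast` feature; after disambiguation they map to different features, which a downstream classifier can weigh separately. A realistic system would replace the toy inventory and overlap heuristic with a full lexical resource and a stronger WSD algorithm.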
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Santos, I. et al. (2014). An Empirical Study on Word Sense Disambiguation for Adult Content Filtering. In: de la Puerta, J., et al. International Joint Conference SOCO’14-CISIS’14-ICEUTE’14. Advances in Intelligent Systems and Computing, vol 299. Springer, Cham. https://doi.org/10.1007/978-3-319-07995-0_53
DOI: https://doi.org/10.1007/978-3-319-07995-0_53
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07994-3
Online ISBN: 978-3-319-07995-0
eBook Packages: Engineering, Engineering (R0)