Abstract
The Internet and the social Web have made it possible to acquire information to feed a growing number of Machine Learning (ML) applications. They have also drawn attention to crowdsourcing approaches, commonly applied to problems that are easy for humans but difficult for computers to solve, giving rise to crowd-powered systems. In this work, we consider the issue of semantic drift in a bootstrap learning algorithm and propose the novel idea of a crowd-powered approach to diminish its effects. To test this idea, we built a hybrid of the Coupled Pattern Learner (CPL), a bootstrap learning algorithm that extracts contextual patterns from unstructured text, and SSCrowd, a component that allows conversation between learning systems and Web users. The hybrid actively and autonomously seeks human supervision by asking people to take part in the knowledge acquisition process, thus using the intelligence of the crowd to improve the learning capabilities of CPL. We take advantage of the ease with which humans understand language in unstructured text, and we show the results of using a hybrid crowd-powered approach to diminish the effects of semantic drift.
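The interaction between bootstrapping and semantic drift described in the abstract can be illustrated with a toy sketch. This is not CPL or SSCrowd: the two-word context patterns, the tiny corpus, and every name below (`extract_patterns`, `apply_patterns`, `bootstrap`, `simulated_crowd`) are hypothetical, and the crowd is stood in for by a fixed set of answers. It only aims to show how one wrongly promoted instance can enter the learned set, and how a human validation step before promotion might stop it.

```python
from typing import Callable, List, Set, Tuple

Pattern = Tuple[str, str]  # the two tokens that precede an instance


def extract_patterns(sentences: List[str], instances: Set[str]) -> Set[Pattern]:
    """Collect the two-word left context of every known instance."""
    patterns: Set[Pattern] = set()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(2, len(tokens)):
            if tokens[i] in instances:
                patterns.add((tokens[i - 2], tokens[i - 1]))
    return patterns


def apply_patterns(sentences: List[str], patterns: Set[Pattern]) -> Set[str]:
    """Return every token that appears right after a learned context."""
    candidates: Set[str] = set()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(2, len(tokens)):
            if (tokens[i - 2], tokens[i - 1]) in patterns:
                candidates.add(tokens[i])
    return candidates


def bootstrap(
    sentences: List[str],
    seeds: Set[str],
    validate: Callable[[str], bool],
    rounds: int = 2,
) -> Set[str]:
    """Alternate pattern extraction and instance promotion.

    `validate` is a hypothetical hook standing in for human supervision:
    a candidate is promoted only if it returns True.
    """
    instances = set(seeds)
    for _ in range(rounds):
        patterns = extract_patterns(sentences, instances)
        candidates = apply_patterns(sentences, patterns) - instances
        instances |= {c for c in candidates if validate(c)}
    return instances


# Toy corpus (assumed for illustration): "live in" is an ambiguous context.
SENTENCES = [
    "cities such as Paris",
    "cities such as London",
    "we live in Paris",
    "I live in London",
    "I live in denial",
]


def simulated_crowd(candidate: str) -> bool:
    """Stand-in for asking Web users whether the candidate is a city."""
    return candidate in {"Paris", "London"}


unsupervised = bootstrap(SENTENCES, {"Paris"}, validate=lambda c: True)
supervised = bootstrap(SENTENCES, {"Paris"}, validate=simulated_crowd)
```

In this sketch the unsupervised run promotes "denial" as a city because it shares the "live in" context with "Paris", while the run with the validation hook keeps only "Paris" and "London". A real system would also have to decide when to ask and how to aggregate conflicting answers, which this example leaves out.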
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Pedro, S.D.S., Hruschka, E.R. (2019). Crowd-Powered Systems to Diminish the Effects of Semantic Drift. In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2019. Lecture Notes in Computer Science, vol 11734. Springer, Cham. https://doi.org/10.1007/978-3-030-29859-3_59
DOI: https://doi.org/10.1007/978-3-030-29859-3_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29858-6
Online ISBN: 978-3-030-29859-3
eBook Packages: Computer Science (R0)