Abstract
Several recent papers have described how the lexical properties of words can be captured by simple measurements of which other words tend to occur close to them. In practice, word co-occurrence statistics are used to generate high-dimensional vector space representations, and appropriate distance metrics are defined on those spaces. The resulting co-occurrence vectors have been used to account for phenomena ranging from semantic priming to vocabulary acquisition. We have developed a simple and highly efficient system for computing useful word co-occurrence statistics, along with a number of criteria for optimizing and validating the resulting representations. Other workers have advocated various methods for reducing the number of dimensions in the co-occurrence vectors. Lund & Burgess [10] have suggested using only the highest-variance components; Landauer & Dumais [5] stress that, to be of explanatory value, the dimensionality of the co-occurrence vectors must be reduced to around 300 using singular value decomposition, a procedure related to principal components analysis; and Lowe & McDonald [8] have used a statistical reliability criterion. We have used a simpler framework that orders the dimensions by the corpus frequency of their context words and truncates accordingly. Here we compare how the different methods perform on two evaluation criteria and briefly discuss the consequences of the different methodologies for work in cognitive and neural computation.
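As an informal illustration of the approaches being compared, the sketch below (not the authors' implementation; the toy corpus, window size, and target dimensionality k are illustrative assumptions) builds raw word co-occurrence vectors and then reduces them in three ways: frequency-ordered truncation, selection of the highest-variance components, and truncated singular value decomposition.

```python
# A minimal sketch, assuming a toy corpus and arbitrary parameter choices.
from collections import Counter
import numpy as np

corpus = ("the cat sat on the mat the dog sat on the rug "
          "a cat and a dog ran on the grass").split()
window = 2  # co-occurrence window of +/- 2 positions (illustrative)

# Vocabulary ordered by corpus frequency, most frequent first.
freqs = Counter(corpus)
vocab = [w for w, _ in freqs.most_common()]
index = {w: i for i, w in enumerate(vocab)}

# Count how often each context word occurs within the window of each target word.
V = len(vocab)
counts = np.zeros((V, V))
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            counts[index[w], index[corpus[j]]] += 1

k = 5  # target dimensionality (illustrative; far smaller than real systems use)

# 1. Frequency-ordered truncation: keep the dimensions for the k most
#    frequent context words (columns are already in frequency order here).
vectors_freq = counts[:, :k]

# 2. Highest-variance components (cf. Lund & Burgess [10]).
var_order = np.argsort(counts.var(axis=0))[::-1]
vectors_var = counts[:, var_order[:k]]

# 3. Truncated SVD (cf. Landauer & Dumais [5], who use ~300 dimensions
#    on real corpora).
U, S, _ = np.linalg.svd(counts, full_matrices=False)
vectors_svd = U[:, :k] * S[:k]

print(vectors_freq.shape, vectors_var.shape, vectors_svd.shape)
```

On a toy corpus like this the three reductions look interchangeable; the paper's point is that on realistic corpora they can behave quite differently under the evaluation criteria considered.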
References
[1] Aston, G. & Burnard, L. (1998). The BNC Handbook: Exploring the British National Corpus with SARA. Edinburgh: Edinburgh University Press.
[2] Battig, W.F. & Montague, W.E. (1969). Category norms for verbal items in 56 categories: A replication and extension of the Connecticut category norms. Journal of Experimental Psychology Monograph, 80, 1–45.
[3] Dagan, I., Marcus, S. & Markovitch, S. (1993). Contextual word similarity and estimation from sparse data. In Proceedings of the 31st Annual Meeting of the ACL, 164–171.
[4] Finch, S. & Chater, N. (1992). Bootstrapping syntactic categories. In Proceedings of the 14th Annual Meeting of the Cognitive Science Society, 820–825.
[5] Landauer, T. & Dumais, S. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.
[6] Levy, J.P., Bullinaria, J.A. & Patel, M. (1998). Explorations in the derivation of word co-occurrence statistics. South Pacific Journal of Psychology, 10(1), 99–111.
[7] Levy, J.P. & Bullinaria, J.A. (1999). The emergence of semantic representations from language usage. Paper given at the EPSRC Workshop on Self-Organising Systems: Future Prospects for Computing, UMIST, October 1999.
[8] Lowe, W. & McDonald, S. (2000). The direct route: Mediated priming in semantic space. In Proceedings of the 22nd Annual Meeting of the Cognitive Science Society.
[9] Lund, K., Burgess, C. & Atchley, R.A. (1995). Semantic and associative priming in high-dimensional semantic space. In Proceedings of the 17th Annual Meeting of the Cognitive Science Society, 660–665.
[10] Lund, K. & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28(2), 203–208.
[11] Manning, C.D. & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
[12] Patel, M., Bullinaria, J.A. & Levy, J.P. (1998). Extracting semantic representations from large text corpora. In Bullinaria, J.A., Glasspool, D.W. & Houghton, G. (eds), 4th Neural Computation and Psychology Workshop, London, 9–11 April 1997: Connectionist Representations, 199–212. London: Springer-Verlag.
[13] Redington, M. & Chater, N. (1997). Probabilistic and distributional approaches to language acquisition. Trends in Cognitive Sciences, 1(7), 273–281.
[14] Schütze, H. (1993). Word Space. In S.J. Hanson, J.D. Cowan & C.L. Giles (eds), Advances in Neural Information Processing Systems 5, 895–902. San Mateo, CA: Morgan Kaufmann.
[15] Zhu, H. (1997). Bayesian geometric theory of learning algorithms. In Proceedings of the International Conference on Neural Networks (ICNN'97), Vol. 2, 1041–1044.
© 2001 Springer-Verlag London
Cite this paper
Levy, J.P., Bullinaria, J.A. (2001). Learning Lexical Properties from Word Usage Patterns: Which Context Words Should be Used? In: French, R.M., Sougné, J.P. (eds) Connectionist Models of Learning, Development and Evolution. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0281-6_27
Print ISBN: 978-1-85233-354-6
Online ISBN: 978-1-4471-0281-6