Abstract
The aim of this paper is to illustrate the potential of a parallel corpus in the context of (computer-assisted) language learning. To do so, we address two main questions: (1) what corpus (data) to use and (2) how to use the corpus (data). We answer the what-question by describing the importance and particularities of compiling and processing a corpus for pedagogical purposes. To answer the how-question, we first investigate the central concepts of the interactionist theory of second language acquisition: comprehensible input, input enhancement, comprehensible output and output enhancement. By means of two case studies, we illustrate how these concepts can be realized in concrete corpus-based language learning activities. We propose a design for a receptive and a productive language task and describe how a parallel corpus can form the basis of powerful language learning activities. The Dutch Parallel Corpus, a ten-million-word, sentence-aligned and annotated parallel corpus, is used to develop these language tasks.







Notes
Other linguistic research domains such as descriptive linguistics, contrastive linguistics, and translation studies, and more application-oriented research areas such as machine translation and natural language processing (NLP), have all illustrated the benefits of using corpora in their domain.
Sinclair (2004: 9) states that the changes brought forth by corpora “were likely to have a profound effect on the teaching and learning of languages, because the new descriptions would represent language in a different way”.
In a more general context, Rundell and Stock (1992) also referred to “The corpus revolution”.
For a corpus typology, we refer to the overview of Fuster and Clavel (2010).
As pointed out by Gilquin and Granger (2010: 362): “(i)t is probably fair to say that most DDL activities involve concordances of some sort”. Frequency lists may also be used in this context.
E.g. KWIC (Key-Word-In-Context). The context is usually expressed as a number of words to the left and right of the keyword.
As Frankenberg-García (2005: 197) stated in her paper on monolingual and parallel concordances, “the two types of concordances have non-conflicting, complementary roles to play” (e.g. parallel concordances might be more interesting to optimize comprehension whereas monolingual concordances may be useful to provide grammatical evidence) and it is important to judge which type of concordance matches the particular learning context.
For more information on the corpus and its availability, see http://www.inl.nl/tst-centrale/nl/producten/corpora/dutch-parallel-corpus-niet-commercieel/6-65, visited 21 March 2013.
For more information, we refer to http://www.tei-c.org/Guidelines/P5/, visited 21 March 2013.
In his 1999 book “Learning a second language through interaction”, Ellis argued that there are two main types of interaction: interpersonal and intrapersonal interaction. Chapelle (2003) stated that interpersonal interaction takes place not only between people but also between learner and computer.
We could also refer to a learner-corpus interaction in the context of Data Driven Learning (e.g. Fuster and Clavel 2010).
Chapelle’s conceptualization (1998, 2003, 2005a, 2005b) is largely based on the one made by Sharwood Smith (1993). In his 1993 article, he referred to the notion as a manipulation of “aspects of the input” but no “further assumptions about the consequences of that input on the learner” could be made.
Because words in the French-Dutch part of the DPC are not aligned below sentence level, the isolated translation of the French word “la percée” cannot be given.
Translation into English: “daily cleansing, wash your face, the upper part of the body and your hands; brush your teeth”.
The aim of the DPC project was to create a multifunctional corpus, which is also suited for translation studies, machine translation, descriptive linguistics, etc.
For a complete overview, see van Baardewijk-Rességuier and van Willigen-Sinemus (1989).
References
Abraham, L. (2008). Computer-mediated glosses in second language reading comprehension and vocabulary learning: A meta-analysis. Computer Assisted Language Learning, 21(3), 199–226.
Ackerley, K., & Coccetta, F. (2007). Enriching language learning through a multimedia corpus. ReCALL, 19(3), 351–370.
Barlow, M. (1996). Corpora for theory and practice. International Journal of Corpus Linguistics, 1(1), 1–37.
Barlow, M. (2000). Parallel texts in language teaching. In S. P. Botley, M. A. McEnery, & A. Wilson (Eds.), Multilingual corpora in teaching and research (pp. 107–115). Amsterdam/Atlanta: Rodopi.
Bernardini, S. (2002). Exploring new directions for discovery learning. In B. Kettemann & G. Marko (Eds.), Teaching and learning by doing corpus analysis. Proceedings from the Fourth International Conference on Teaching and Language Corpora, Graz 19–24 July (pp. 165–182). Amsterdam: Rodopi.
Bland, S. K., Noblitt, J. S., Armington, S., & Gay, G. (1990). The naive lexical hypothesis: Evidence from computer-assisted language learning. The Modern Language Journal, 74(4), 440–450.
Bormuth, J. R. (1966). Readability: A new approach. Reading Research Quarterly, 1, 79–132.
Braun, S. (2005). From pedagogically relevant corpora to authentic language learning contents. ReCALL, 17(1), 47–64.
Braun, S. (2006). ELISA: A pedagogically enriched corpus for language learning purposes. In S. Braun, K. Kohn, & J. Mukherjee (Eds.), Corpus technology and language pedagogy (pp. 25–47). Frankfurt am Main: Peter Lang.
Braun, S. (2007). Integrating corpus work into secondary education: From data-driven learning to needs-driven corpora. ReCALL, 19(3), 307–328.
Cárdenas-Claros, M. S., & Gruba, P. A. (2009). Help options in CALL: A systematic review. CALICO Journal, 27(1), 69–90.
Carter, R. A., & McCarthy, M. J. (2006). Cambridge grammar of English: A comprehensive guide to spoken and written English grammar and usage. Cambridge: Cambridge University Press.
Chall, J. S., & Dale, E. (1995). Readability revisited: The new Dale-Chall readability formula. Cambridge, MA: Brookline Books.
Chambers, A. (2010). What is data-driven learning? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 345–358). London/New York: Routledge.
Chapelle, C. A. (1997). CALL in the Year 2000: Still in search of research paradigms? Language Learning and Technology, 1(1), 19–43.
Chapelle, C. A. (1998). Multimedia CALL: Lessons to be learned from research on instructed SLA. Language Learning and Technology, 2(1), 22–34.
Chapelle, C. A. (2003). English language learning and technology. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Chapelle, C. A. (2005a). Interactionist SLA theory in CALL research. In J. L. Egbert & G. M. Petrie (Eds.), CALL research perspectives (pp. 53–64). Mahwah, NJ: Lawrence Erlbaum Associates.
Chapelle, C. A. (2005b). Computer-assisted language learning. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 743–755). Mahwah, NJ: Lawrence Erlbaum Associates.
Cheng, W. (2010). What can a corpus tell us about language teaching? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 319–332). London/New York: Routledge.
Council of Europe. (2001). Common European framework of reference for languages: learning, teaching, assessment. Cambridge: Cambridge University Press.
Crossley, S., Greenfield, J., & McNamara, D. (2008). Assessing text readability using cognitively based indices. TESOL Quarterly, 42(3), 475–493.
De Clercq, O., & Montero Perez, M. (2010). Data collection and IPR in multilingual parallel corpora: Dutch parallel corpus. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner, et al. (Eds.), Proceedings of the seventh international conference on language resources and evaluation (LREC’10) (pp. 3383–3388). Valletta, Malta.
Dickens, A., & Salkie, R. (1996). Comparing bilingual dictionaries with a parallel corpus. In M. Gellerstam, J. Järborg, S. Malmgren, K. Norén, L. Rogström, & C. Papmehl (Eds.), EURALEX’96 Proceedings (pp. 551–559). Gothenburg: Göteborg University, Department of Swedish.
Doughty, C. J., & Williams, J. (1998). Focus on form in classroom second language acquisition. Cambridge: Cambridge University Press.
Ellis, R. (1999). Learning a second language through interaction. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32, 221–233.
Flowerdew, J. (1996). Concordancing in language learning. In M. Pennington (Ed.), The power of CALL (pp. 97–113). Houston, Texas: Athelstan.
François, T. (2009). Combining a statistical language model with logistic regression to predict the lexical and syntactic difficulty of texts for FFL. Proceedings of the EACL 2009 Student Research Workshop (pp. 19–27).
Frankenberg-García, A. (2003). Lost in parallel concordances. In G. Aston, S. Bernardini, & D. Stewart (Eds.), Corpora in translation education (pp. 15–24). Manchester: St Jerome.
Frankenberg-García, A. (2005). Pedagogical uses of monolingual and parallel concordances. ELT Journal, 59(3), 189–198.
Fuster, M., & Clavel, B. (2010). Corpus linguistics and its applications in higher education. Revista Alicantina de Estudios Ingleses, 23, 51–67.
Gabrielatos, C. (2005). Corpora and language teaching: Just a fling or wedding bells? TESL-EJ, 8(4), 1–37.
Gao, Z.-M. (2011). Exploring the effects and use of a Chinese-English parallel concordancer. Computer Assisted Language Learning, 24(3), 255–275.
Gass, S. M., & Mackey, A. (2007). Input, interaction, and output in second language acquisition. In B. Van Patten & J. Williams (Eds.), Theories in second language acquisition: An introduction (pp. 175–199). Mahwah, NJ: Lawrence Erlbaum Associates.
Gavioli, L. (2005). Exploring corpora for ESP learning. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Gilquin, G., & Granger, S. (2010). How can data-driven learning be used in language teaching? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 359–370). London/New York: Routledge.
Granath, S. (2009). Who benefits from learning how to use corpora? In K. Aijmer (Ed.), Corpora and language teaching (pp. 47–65). Amsterdam/Philadelphia: John Benjamins Publishing Company.
Granger, S. (2002). A bird’s-eye view of learner corpus research. In S. Granger, J. Hung, & S. Petch-Tyson (Eds.), Computer learner corpora, second language acquisition and foreign language teaching (pp. 3–33). Amsterdam/Philadelphia: John Benjamins Publishing Company.
Granger, S. (2003). Error-tagged learner corpora and CALL: A promising synergy. CALICO Journal, 20(3), 465–480.
Granger, S. (2009). The contribution of learner corpora to second language acquisition and foreign language teaching: A critical evaluation. In K. Aijmer (Ed.), Corpora and language teaching (pp. 13–32). Amsterdam/Philadelphia: John Benjamins Publishing Company.
Greenfield, J. (2004). Readability formulas for EFL. Japan Association for Language Teaching, 26(1), 5–24.
Hegelheimer, V. (2006). Helping ESL writers through a multimodal, corpus-based, online grammar resource. CALICO Journal, 24(1), 5–32.
Hunston, S., & Francis, G. (1998). Verbs observed: A corpus-driven pedagogic grammar. Applied Linguistics, 19(1), 45–72.
Ide, N., Erjavec, T., & Tufiş, D. (2002). Sense discrimination with parallel corpora. Proceedings of the SIGLEX/SENSEVAL workshop on word sense disambiguation: Recent successes and future directions (pp. 54–60). Philadelphia.
Johansson, S. (2009). Some thoughts on corpora and second-language acquisition. In K. Aijmer (Ed.), Corpora and language teaching (pp. 33–44). Amsterdam/Philadelphia: John Benjamins Publishing Company.
Johns, T. (1991). Should you be persuaded: Two examples of data-driven learning. In T. Johns & P. King (Eds.), Classroom concordancing (pp. 1–13). Birmingham: ELR.
Johns, T. (1997). Contexts: The background, development and trialing of a concordance-based CALL program. In A. Wichmann, S. Fligelstone, T. McEnery, & G. Knowles (Eds.), Teaching and language corpora (pp. 100–115). New York: Addison Wesley Longman.
Kaur, J., & Hegelheimer, V. (2005). ESL students’ use of concordance in the transfer of academic word knowledge: An exploratory study. Computer Assisted Language Learning, 18(4), 287–310.
Kennedy, C., & Miceli, T. (2010). Corpus-assisted creative writing: introducing intermediate Italian learners to a corpus as a reference resource. Language Learning and Technology, 14(1), 28–44.
Kenning, M.-M. (2010). What are parallel and comparable corpora and how can we use them? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 487–500). London/New York: Routledge.
Krashen, S. D. (1985). The input hypothesis: Issues and implications. London/New York: Longman.
Lee, D. Y. W. (2001). Genres, registers, text types, domains and styles: clarifying the concepts and navigating a path through the BNC jungle. Language Learning and Technology, 5(3), 37–72.
Leech, G. (1997). Teaching and language corpora: A convergence. In A. Wichmann, S. Fligelstone, T. McEnery, & G. Knowles (Eds.), Teaching and language corpora (pp. 1–23). New York: Addison Wesley Longman.
Lefever, E., Hoste, V., & De Cock, M. (2011). Parasense or how to use parallel corpora for word sense disambiguation. Proceedings of the 49th annual meeting of the Association for Computational Linguistics: Human language technologies (pp. 317–322). Portland, Oregon, USA.
Loewen, S. (2011). Focus on form. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 576–592). New York/London: Routledge.
Lexicografie, Van Dale. (2002). Van Dale Groot Woordenboek Frans Nederlands, Nederlands Frans. Utrecht/Antwerpen: Van Dale.
Liu, N., & Nation, I. S. P. (1985). Factors affecting guessing vocabulary in context. RELC Journal, 16(1), 33–42.
Lixun, W. (2001). Exploring parallel concordancing in English and Chinese. Language Learning and Technology, 5(3), 174–184.
Macken, L. (2010). Sub-sentential alignment of translational correspondences. PhD thesis. Antwerp, University of Antwerp.
McEnery, T., & Xiao, R. (2011). What corpora can offer in language teaching and learning. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 364–380). London/New York: Routledge.
Mishan, F. (2004). Authenticating corpora for language learning: A problem and its resolution. ELT Journal, 58(3), 219–227.
Montero Perez, M., De Clercq, O., Desmet, P., Verlinde, S., & Peeters, G. (2009). Dutch parallel corpus: un nouveau corpus multilingue disponible en ligne. Romaneske, 4, 2–8.
Mukherjee, J. (2006). Corpus linguistics and language pedagogy: The state of the art – and beyond. In S. Braun, K. Kohn, & J. Mukherjee (Eds.), Corpus technology and language pedagogy (pp. 5–24). Frankfurt am Main: Peter Lang.
Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Nerbonne, J. (2000). Parallel texts in computer-assisted language learning. In J. Veronis (Ed.), Parallel text processing (pp. 354–369). Dordrecht/Boston: Kluwer.
Nesselhauf, N. (2004). Learner corpora and their potential for language teaching. In J. Sinclair (Ed.), How to use corpora in language teaching (pp. 125–152). Amsterdam/Philadelphia: John Benjamins Publishing Company.
O’Sullivan, I., & Chambers, A. (2006). Learners’ writing skills in French: Corpus consultation and learner evaluation. Journal of Second Language Writing, 15, 49–68.
Paulussen, H., Macken, L., Vandeweghe, W., & Desmet, P. (2013). Dutch parallel corpus: A balanced parallel corpus for Dutch-English and Dutch-French. In P. Spyns & J. Odijk (Eds.), Essential speech and language technology for Dutch (pp. 185–199). Heidelberg: Springer.
Peters, C., Picchi, E., & Biagini, L. (2000). Parallel and comparable bilingual corpora in language teaching and learning. In J. Aarts & W. Meijs (Eds.), Multilingual corpora in teaching and research (pp. 73–85). Amsterdam/Atlanta: Rodopi.
Römer, U. (2008–2009). Corpora and language teaching. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (pp. 112–131). Berlin/New York: Mouton de Gruyter.
Rundell, M., & Stock, P. (1992). The corpus revolution. English Today, 30, 9–14.
Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied Linguistics, 11, 129–158.
Sharwood Smith, M. (1993). Input enhancement in instructed SLA: Theoretical bases. Studies in Second Language Acquisition, 15, 165–179.
Sinclair, J. (1987). Looking Up: An account of the COBUILD project in lexical computing. London: Collins.
Sinclair, J. (2004). Introduction. In J. Sinclair (Ed.), How to use corpora in language teaching (pp. 1–10). Amsterdam/Philadelphia: John Benjamins Publishing Company.
St. John, E. (2001). A case for using a parallel corpus and concordancer. Language Learning and Technology, 5(3), 185–203.
Swain, M. (1985). Communicative competence: Some roles of comprehensible input and comprehensible output in its development. In S. Gass & C. Madden (Eds.), Input in second language acquisition (pp. 235–256). New York: Newbury House.
Swain, M. (2005). The output hypothesis: Theory and research. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 471–484). Mahwah, NJ: Lawrence Erlbaum Associates.
Szirmai, M. (2002). Corpus linguistics in Japan: Its status and role in language education. In P. Lewis (Ed.), The changing face of CALL: A Japanese perspective (pp. 91–107). Lisse: Swets and Zeitlinger Publishers.
Takashima, H., & Ellis, R. (1999). Output enhancement and the acquisition of the past tense. In R. Ellis (Ed.), Learning a second language through interaction (pp. 173–188). Amsterdam/Philadelphia: John Benjamins Publishing Company.
Tribble, C. (2000). Practical uses for language corpora in ELT. In P. Brett & G. Motteram (Eds.), A special interest in computers: Learning and teaching with information and communications technologies (pp. 31–41). Whitstable, Kent: IATEFL.
Uitdenbogerd, A. L. (2005). Readability of French as a foreign language and its uses. In J. Kay, A. Turpin, & R. Wilkinson (Eds.), Proceedings of the Australasian Document Computing Symposium (ADCS) (pp. 19–25). Sydney, NSW, Australia.
van Baardewijk-Rességuier, J., & van Willigen-Sinemus, M. (1989). Matériaux pour la traduction du néerlandais en français. Muiderberg: Dick Coutinho.
Van Patten, B., Williams, J., & Rott, S. (2004). Form-meaning connections in second language acquisition. In B. Van Patten, J. Williams, S. Rott, & M. Overstreet (Eds.), Form-meaning connections in second language acquisition (pp. 1–26). Mahwah, NJ: Lawrence Erlbaum Associates.
Widdowson, H. G. (1998). Context, community and authentic language. TESOL Quarterly, 32(4), 705–716.
Widdowson, H. G. (2000). On the limitations of linguistics applied. Applied Linguistics, 21(1), 3–25.
Widdowson, H. G. (2003). Defining issues in English language teaching. Oxford: Oxford University Press.
Xu, J. (2010). Using multimedia vocabulary annotations in L2 reading and listening activities. CALICO Journal, 27(2), 311–327.
Yoon, H. (2008). More than a linguistic reference: The influence of corpus technology on L2 academic writing. Language Learning and Technology, 12(2), 31–48.
Acknowledgments
The DPC project has been carried out within the STEVIN programme, which is funded by the Dutch and Flemish Governments. The DPC was created by a Flemish consortium (KU Leuven Kulak and the Faculty of Translation Studies of Ghent University College): Piet Desmet, Willy Vandeweghe, Hans Paulussen, Lieve Macken, Maribel Montero Perez, Orphée De Clercq, Lidia Rura, Julia Trushkina, and Antoine Besnehard.
Cite this article
Montero Perez, M., Paulussen, H., Macken, L. et al. From input to output: the potential of parallel corpora for CALL. Lang Resources & Evaluation 48, 165–189 (2014). https://doi.org/10.1007/s10579-013-9241-4
DOI: https://doi.org/10.1007/s10579-013-9241-4