From input to output: the potential of parallel corpora for CALL

  • SI: Resources for language learning
Language Resources and Evaluation

Abstract

The aim of this paper is to illustrate the potential of a parallel corpus in the context of (computer-assisted) language learning. To do so, we address two main questions: (1) what corpus (data) to use and (2) how to use the corpus (data). We answer the what-question by describing the importance and particularities of compiling and processing a corpus for pedagogical purposes. To answer the how-question, we first investigate the central concepts of the interactionist theory of second language acquisition: comprehensible input, input enhancement, comprehensible output and output enhancement. By means of two case studies, we illustrate how these concepts can be realized in concrete corpus-based language learning activities. We propose a design for a receptive and a productive language task and describe how a parallel corpus can form the basis of powerful language learning activities. The Dutch Parallel Corpus, a ten-million-word, sentence-aligned and annotated parallel corpus, is used to develop these language tasks.

Notes

  1. Other linguistic research domains such as descriptive linguistics, contrastive linguistics, and translation studies, and more application-oriented research areas such as machine translation and natural language processing research (NLP) have all illustrated the benefits of using corpora in their domain.

  2. Sinclair (2004: 9) states that the changes brought forth by corpora “were likely to have a profound effect on the teaching and learning of languages, because the new descriptions would represent language in a different way”.

  3. In a more general context, Rundell and Stock (1992) also referred to “The corpus revolution”.

  4. For a corpus typology, we refer to the overview of Fuster and Clavel (2010).

  5. As pointed out by Gilquin and Granger (2010: 362): “(i)t is probably fair to say that most DDL activities involve concordances of some sort”. Frequency lists may also be used in this context.

  6. E.g. KWIC (Key-Word-In-Context). The context is usually expressed as a number of words to the left and right of the keyword.

  7. While extensive research has been published on the use of learner corpora in language pedagogy, it is not our aim to provide a detailed overview of the importance of learner corpora. For more information on the use of learner corpora, we refer to, for instance, Granger (2002, 2009).

  8. As Frankenberg-García (2005: 197) stated in her paper on monolingual and parallel concordances, “the two types of concordances have non-conflicting, complementary roles to play” (e.g. parallel concordances might be more interesting to optimize comprehension whereas monolingual concordances may be useful to provide grammatical evidence) and it is important to judge which type of concordance matches the particular learning context.

  9. For more information on the corpus and its availability, see http://www.inl.nl/tst-centrale/nl/producten/corpora/dutch-parallel-corpus-niet-commercieel/6-65, visited 21 March 2013.

  10. For more information, we refer to http://www.tei-c.org/Guidelines/P5/, visited 21 March 2013.

  11. In his 1999 book “Learning a second language through interaction”, Ellis argued that there are two main types of interaction: interpersonal and intrapersonal. Chapelle (2003) stated that interpersonal interaction takes place not only between people but also between learner and computer.

  12. We could also refer to a learner-corpus interaction in the context of Data Driven Learning (e.g. Fuster and Clavel 2010).

  13. Chapelle’s conceptualization (1998, 2003, 2005a, 2005b) is largely based on that of Sharwood Smith (1993). In his 1993 article, he referred to the notion as a manipulation of “aspects of the input”, adding that no “further assumptions about the consequences of that input on the learner” could be made.

  14. Because words in the French-Dutch part of the DPC corpus are not aligned below sentence level, the isolated translation of the French word “la percée” cannot be given.

  15. Translation into English: “daily cleansing, wash your face, the upper part of the body and your hands; brush your teeth”.

  16. The aim of the DPC project was to create a multifunctional corpus, which is also suited for translation studies, machine translation, descriptive linguistics, etc.

  17. For a complete overview, see van Baardewijk-Rességuier and van Willigen-Sinemus (1989).
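The KWIC display mentioned in note 6 can be illustrated with a minimal Python sketch. It assumes a naive regex tokenizer and a plain string as input; the function name `kwic` and the `window` parameter are hypothetical and are not taken from the DPC tooling or from the paper.

```python
import re


def kwic(text, keyword, window=4):
    """Return (left, keyword, right) tuples for each hit of `keyword`.

    `window` is the number of context words kept on each side.
    Tokenization is a simplifying assumption: lowercase word characters only.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


if __name__ == "__main__":
    sample = "The corpus revolution changed how a corpus is used."
    for left, key, right in kwic(sample, "corpus", window=2):
        # Right-align the left context so the keywords line up in a column.
        print(f"{left:>20}  [{key}]  {right}")
```

A parallel concordancer would additionally print the aligned sentence in the other language next to each hit, which requires the sentence alignment that a corpus such as the DPC provides.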

References

  • Abraham, L. (2008). Computer-mediated glosses in second language reading comprehension and vocabulary learning: A meta-analysis. Computer Assisted Language Learning, 21(3), 199–226.

  • Ackerley, K., & Coccetta, F. (2007). Enriching language learning through a multimedia corpus. ReCALL, 19(3), 351–370.

  • Barlow, M. (1996). Corpora for theory and practice. International Journal of Corpus Linguistics, 1(1), 1–37.

  • Barlow, M. (2000). Parallel texts in language teaching. In S. P. Botley, A. M. McEnery, & A. Wilson (Eds.), Multilingual corpora in teaching and research (pp. 107–115). Amsterdam/Atlanta: Rodopi.

  • Bernardini, S. (2002). Exploring new directions for discovery learning. In B. Kettemann & G. Marko (Eds.), Teaching and learning by doing corpus analysis. Proceedings from the Fourth International Conference on Teaching and Language Corpora, Graz 19–24 July (pp. 165–182). Amsterdam: Rodopi.

  • Bland, S. K., Noblitt, J. S., Armington, S., & Gay, G. (1990). The naive lexical hypothesis: Evidence from computer-assisted language learning. The Modern Language Journal, 74(4), 440–450.

  • Bormuth, J. R. (1966). Readability: A new approach. Reading Research Quarterly, 1, 79–132.

  • Braun, S. (2005). From pedagogically relevant corpora to authentic language learning contents. ReCALL, 17(1), 47–64.

  • Braun, S. (2006). ELISA: A pedagogically enriched corpus for language learning purposes. In S. Braun, K. Kohn, & J. Mukherjee (Eds.), Corpus technology and language pedagogy (pp. 25–47). Frankfurt am Main: Peter Lang.

  • Braun, S. (2007). Integrating corpus work into secondary education: From data-driven learning to needs-driven corpora. ReCALL, 19(3), 307–328.

  • Cárdenas-Claros, M. S., & Gruba, P. A. (2009). Help options in CALL: A systematic review. CALICO Journal, 27(1), 69–90.

  • Carter, R. A., & McCarthy, M. J. (2006). Cambridge grammar of English: A comprehensive guide to spoken and written English grammar and usage. Cambridge: Cambridge University Press.

  • Chall, J. S., & Dale, E. (1995). Readability revisited: The new Dale-Chall readability formula. Cambridge, MA: Brookline Books.

  • Chambers, A. (2010). What is data-driven learning? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 345–358). London/New York: Routledge.

  • Chapelle, C. A. (1997). CALL in the Year 2000: Still in search of research paradigms? Language Learning and Technology, 1(1), 19–43.

  • Chapelle, C. A. (1998). Multimedia CALL: Lessons to be learned from research on instructed SLA. Language Learning and Technology, 2(1), 22–34.

  • Chapelle, C. A. (2003). English language learning and technology. Amsterdam/Philadelphia: John Benjamins Publishing Company.

  • Chapelle, C. A. (2005a). Interactionist SLA theory in CALL research. In J. L. Egbert & G. M. Petrie (Eds.), CALL research perspectives (pp. 53–64). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Chapelle, C. A. (2005b). Computer-assisted language learning. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 743–755). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Cheng, W. (2010). What can a corpus tell us about language teaching? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 319–332). London/New York: Routledge.

  • Council of Europe. (2001). Common European framework of reference for languages: learning, teaching, assessment. Cambridge: Cambridge University Press.

  • Crossley, S., Greenfield, J., & McNamara, D. (2008). Assessing text readability using cognitively based indices. TESOL Quarterly, 42(3), 475–493.

  • De Clercq, O., & Montero Perez, M. (2010). Data collection and IPR in multilingual parallel corpora: Dutch parallel corpus. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner, et al. (Eds.), Proceedings of the seventh international conference on language resources and evaluation (LREC’10) (pp. 3383–3388). Valletta, Malta.

  • Dickens, A., & Salkie, R. (1996). Comparing bilingual dictionaries with a parallel corpus. In M. Gellerstam, J. Järborg, S. Malmgren, K. Norén, L. Rogström, & C. Papmehl (Eds.), EURALEX’96 Proceedings (pp. 551–559). Gothenburg: Göteborg University, Department of Swedish.

  • Doughty, C. J., & Williams, J. (1998). Focus on Form in classroom second language acquisition. Cambridge: Cambridge University Press.

  • Ellis, R. (1999). Learning a second language through interaction. Amsterdam/Philadelphia: John Benjamins Publishing Company.

  • Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32, 221–233.

  • Flowerdew, J. (1996). Concordancing in language learning. In M. Pennington (Ed.), The power of CALL (pp. 97–113). Houston, Texas: Athelstan.

  • François, T. (2009). Combining a statistical language model with logistic regression to predict the lexical and syntactic difficulty of texts for FFL. Proceedings of the EACL 2009 Student Research Workshop (pp. 19–27).

  • Frankenberg-García, A. (2003). Lost in parallel concordances. In G. Aston, S. Bernardini, & D. Stewart (Eds.), Corpora in translation education (pp. 15–24). Manchester: St Jerome.

  • Frankenberg-García, A. (2005). Pedagogical uses of monolingual and parallel concordances. ELT Journal, 59(3), 189–198.

  • Fuster, M., & Clavel, B. (2010). Corpus linguistics and its applications in higher education. Revista Alicantina de Estudios Ingleses, 23, 51–67.

  • Gabrielatos, C. (2005). Corpora and language teaching: Just a fling or wedding bells? TESL-EJ, 8(4), 1–37.

  • Gao, Z.-M. (2011). Exploring the effects and use of a Chinese-English parallel concordancer. Computer Assisted Language Learning, 24(3), 255–275.

  • Gass, S. M., & Mackey, A. (2007). Input, interaction, and output in second language acquisition. In B. Van Patten & J. Williams (Eds.), Theories in second language acquisition: An introduction (pp. 175–199). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Gavioli, L. (2005). Exploring corpora for ESP learning. Amsterdam/Philadelphia: John Benjamins Publishing Company.

  • Gilquin, G., & Granger, S. (2010). How can data-driven learning be used in language teaching? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 359–370). London/New York: Routledge.

  • Granath, S. (2009). Who benefits from learning how to use corpora? In K. Aijmer (Ed.), Corpora and language teaching (pp. 47–65). Amsterdam/Philadelphia: John Benjamins Publishing Company.

  • Granger, S. (2002). A Bird’s-eye view of learner corpus research. In S. Granger, J. Hung, & S. Petch-Tyson (Eds.), Computer learner corpora, second language acquisition and foreign language teaching (pp. 3–33). Amsterdam/Philadelphia: John Benjamins Publishing Company.

  • Granger, S. (2003). Error-tagged learner corpora and CALL: A promising synergy. CALICO Journal, 20(3), 465–480.

  • Granger, S. (2009). The contribution of learner corpora to second language acquisition and foreign language teaching: A critical evaluation. In K. Aijmer (Ed.), Corpora and language teaching (pp. 13–32). Amsterdam/Philadelphia: John Benjamins Publishing Company.

  • Greenfield, J. (2004). Readability formulas for EFL. Japan Association for Language Teaching, 26(1), 5–24.

  • Hegelheimer, V. (2006). Helping ESL writers through a multimodal, corpus-based, online grammar resource. CALICO Journal, 24(1), 5–32.

  • Hunston, S., & Francis, G. (1998). Verbs observed: A corpus-driven pedagogic grammar. Applied Linguistics, 19(1), 45–72.

  • Ide, N., Erjavec, T., & Tufiş, D. (2002). Sense discrimination with parallel corpora. Proceedings of the SIGLEX/SENSEVAL workshop on word sense disambiguation: Recent successes and future directions (pp. 54–60). Philadelphia.

  • Johansson, S. (2009). Some thoughts on corpora and second-language acquisition. In K. Aijmer (Ed.), Corpora and language teaching (pp. 33–44). Amsterdam/Philadelphia: John Benjamins Publishing Company.

  • Johns, T. (1991). Should you be persuaded: Two examples of data-driven learning. In T. Johns & P. King (Eds.), Classroom concordancing (pp. 1–13). Birmingham: ELR.

  • Johns, T. (1997). Contexts: The background, development and trialing of a concordance-based CALL program. In A. Wichmann, S. Fligelstone, T. McEnery, & G. Knowles (Eds.), Teaching and language corpora (pp. 100–115). New York: Addison Wesley Longman.

  • Kaur, J., & Hegelheimer, V. (2005). ESL students’ use of concordance in the transfer of academic word knowledge: An exploratory study. Computer Assisted Language Learning, 18(4), 287–310.

  • Kennedy, C., & Miceli, T. (2010). Corpus-assisted creative writing: introducing intermediate Italian learners to a corpus as a reference resource. Language Learning and Technology, 14(1), 28–44.

  • Kenning, M.-M. (2010). What are parallel and comparable corpora and how can we use them? In A. O’Keeffe & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics (pp. 487–500). London/New York: Routledge.

  • Krashen, S. D. (1985). The input hypothesis: Issues and implications. London/New York: Longman.

  • Lee, D. Y. W. (2001). Genres, registers, text types, domains and styles: clarifying the concepts and navigating a path through the BNC jungle. Language Learning and Technology, 5(3), 37–72.

  • Leech, G. (1997). Teaching and language corpora: A convergence. In A. Wichmann, S. Fligelstone, T. McEnery, & G. Knowles (Eds.), Teaching and language corpora (pp. 1–23). New York: Addison Wesley Longman.

  • Lefever, E., Hoste, V., & De Cock, M. (2011). Parasense or how to use parallel corpora for word sense disambiguation. Proceedings of the 49th annual meeting of the Association for Computational Linguistics: Human language technologies (pp. 317–322). Portland, Oregon, USA.

  • Loewen, S. (2011). Focus on form. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 576–592). New York/London: Routledge.

  • Van Dale Lexicografie. (2002). Van Dale Groot Woordenboek Frans Nederlands, Nederlands Frans. Utrecht/Antwerpen: Van Dale.

  • Liu, N., & Nation, I. S. P. (1985). Factors affecting guessing vocabulary in context. RELC Journal, 16(1), 33–42.

  • Lixun, W. (2001). Exploring parallel concordancing in English and Chinese. Language Learning and Technology, 5(3), 174–184.

  • Macken, L. (2010). Sub-sentential alignment of translational correspondences. PhD thesis. Antwerp, University of Antwerp.

  • McEnery, T., & Xiao, R. (2011). What corpora can offer in language teaching and learning. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 364–380). London/New York: Routledge.

  • Mishan, F. (2004). Authenticating corpora for language learning: A problem and its resolution. ELT Journal, 58(3), 219–227.

  • Montero Perez, M., De Clercq, O., Desmet, P., Verlinde, S., & Peeters, G. (2009). Dutch parallel corpus: un nouveau corpus multilingue disponible en ligne. Romaneske, 4, 2–8.

  • Mukherjee, J. (2006). Corpus linguistics and language pedagogy: The state of the art – and beyond. In S. Braun, K. Kohn, & J. Mukherjee (Eds.), Corpus technology and language pedagogy (pp. 5–24). Frankfurt am Main: Peter Lang.

  • Nation, I. S. P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.

  • Nerbonne, J. (2000). Parallel texts in computer-assisted language learning. In J. Veronis (Ed.), Parallel text processing (pp. 354–369). Dordrecht/Boston: Kluwer.

  • Nesselhauf, N. (2004). Learner corpora and their potential for language teaching. In J. Sinclair (Ed.), How to use corpora in language teaching (pp. 125–152). Amsterdam/Philadelphia: John Benjamins Publishing Company.

  • O’Sullivan, I., & Chambers, A. (2006). Learners’ writing skills in French: Corpus consultation and learner evaluation. Journal of Second Language Writing, 15, 49–68.

  • Paulussen, H., Macken, L., Vandeweghe, W., & Desmet, P. (2013). Dutch parallel corpus: A balanced parallel corpus for Dutch-English and Dutch-French. In P. Spyns & J. Odijk (Eds.), Essential speech and language technology for Dutch (pp. 185–199). Heidelberg: Springer.

  • Peters, C., Picchi, E., & Biagini, L. (2000). Parallel and comparable bilingual corpora in language teaching and learning. In J. Aarts & W. Meijs (Eds.), Multilingual corpora in teaching and research (pp. 73–85). Amsterdam/Atlanta: Rodopi.

  • Römer, U. (2008–2009). Corpora and language teaching. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (pp. 112–131). Berlin/New York: Mouton de Gruyter.

  • Rundell, M., & Stock, P. (1992). The corpus revolution. English Today, 30, 9–14.

  • Schmidt, R. W. (1990). The role of consciousness in second language learning. Applied Linguistics, 11, 129–158.

  • Sharwood Smith, M. (1993). Input enhancement in instructed SLA: Theoretical bases. Studies in Second Language Acquisition, 15, 165–179.

  • Sinclair, J. (1987). Looking Up: An account of the COBUILD project in lexical computing. London: Collins.

  • Sinclair, J. (2004). Introduction. In J. Sinclair (Ed.), How to use corpora in language teaching (pp. 1–10). Amsterdam/Philadelphia: John Benjamins Publishing Company.

  • St. John, E. (2001). A case for using a parallel corpus and concordancer. Language Learning and Technology, 5(3), 185–203.

  • Swain, M. (1985). Communicative competence: Some roles of comprehensible input and comprehensible output in its development. In S. Gass & C. Madden (Eds.), Input in second language acquisition (pp. 235–256). New York: Newbury House.

  • Swain, M. (2005). The output hypothesis: Theory and research. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 471–484). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Szirmai, M. (2002). Corpus linguistics in Japan: Its status and role in language education. In P. Lewis (Ed.), The changing face of CALL: A Japanese perspective (pp. 91–107). Lisse: Swets and Zeitlinger Publishers.

  • Takashima, H., & Ellis, R. (1999). Output enhancement and the acquisition of the past tense. In R. Ellis (Ed.), Learning a second language through interaction (pp. 173–188). Amsterdam/Philadelphia: John Benjamins Publishing Company.

  • Tribble, C. (2000). Practical uses for language corpora in ELT. In P. Brett & G. Motteram (Eds.), A special interest in computers: Learning and teaching with information and communications technologies (pp. 31–41). Whitstable, Kent: IATEFL.

  • Uitdenbogerd, A. L. (2005). Readability of French as a foreign language and its uses. In J. Kay, A. Turpin, & R. Wilkinson (Eds.), Proceedings of the Australasian Document Computing Symposium (ADCS) (pp. 19–25). Sydney, NSW, Australia.

  • van Baardewijk-Rességuier, J., & van Willigen-Sinemus, M. (1989). Matériaux pour la traduction du néerlandais en français. Muiderberg: Dick Coutinho.

  • Van Patten, B., Williams, J., & Rott, S. (2004). Form-meaning connections in second language acquisition. In B. Van Patten, J. Williams, S. Rott, & M. Overstreet (Eds.), Form-meaning connections in second language acquisition (pp. 1–26). Mahwah, NJ: Lawrence Erlbaum Associates.

  • Widdowson, H. G. (1998). Context, community and authentic language. TESOL Quarterly, 32(4), 705–716.

  • Widdowson, H. G. (2000). On the limitations of linguistics applied. Applied Linguistics, 21(1), 3–25.

  • Widdowson, H. G. (2003). Defining issues in English language teaching. Oxford: Oxford University Press.

  • Xu, J. (2010). Using multimedia vocabulary annotations in L2 reading and listening activities. CALICO Journal, 27(2), 311–327.

  • Yoon, H. (2008). More than a linguistic reference: The influence of corpus technology on L2 academic writing. Language Learning and Technology, 12(2), 31–48.

Acknowledgments

The DPC project has been carried out within the STEVIN programme, which is funded by the Dutch and Flemish Governments. The DPC was created by a Flemish consortium (KU Leuven Kulak and the Faculty of Translation Studies of Ghent University College): Piet Desmet, Willy Vandeweghe, Hans Paulussen, Lieve Macken, Maribel Montero Perez, Orphée De Clercq, Lidia Rura, Julia Trushkina, and Antoine Besnehard.

Author information

Corresponding author

Correspondence to Maribel Montero Perez.

About this article

Cite this article

Montero Perez, M., Paulussen, H., Macken, L. et al. From input to output: the potential of parallel corpora for CALL. Lang Resources & Evaluation 48, 165–189 (2014). https://doi.org/10.1007/s10579-013-9241-4

  • DOI: https://doi.org/10.1007/s10579-013-9241-4
