Abstract
This paper proposes a novel technique that applies case-based reasoning to generate templates for reusable parse tree fragments. The templates are based on the PoS tags of bigrams and trigrams whose syntactic analyses show low variability in prior data. The aim is to speed up dependency parsing by avoiding redundant computation: instead of parsing a similar text fragment again, the stored structure of a matching template is assigned directly to the new n-gram. The study shows that heuristically selecting and reusing these partial results increases parsing speed by reducing the length of the input that the parser has to process, at some expense of accuracy. Experiments on English show promising results: the input length can be reduced by more than 20% at the cost of less than 3 points of Unlabeled Attachment Score.
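The template-reuse idea can be illustrated with a minimal sketch. It is not the authors' implementation: the template table `TEMPLATES`, the function `reduce_input`, the preference for trigrams over bigrams, and the simplification that every non-head token attaches directly to the head of its n-gram are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical template store: PoS n-gram -> position of the head inside the n-gram.
# Simplifying assumption: all other tokens of the n-gram attach directly to that head.
TEMPLATES: Dict[Tuple[str, ...], int] = {
    ("DET", "NOUN"): 1,
    ("DET", "ADJ", "NOUN"): 2,
}


def reduce_input(
    tags: List[str],
    templates: Dict[Tuple[str, ...], int] = TEMPLATES,
) -> Tuple[List[int], Dict[int, int]]:
    """Return the token indices left for the parser and the pre-assigned arcs.

    Arcs map a dependent's index to its head's index.
    """
    kept: List[int] = []
    arcs: Dict[int, int] = {}
    i = 0
    while i < len(tags):
        for n in (3, 2):  # try the longer n-gram first (an assumption)
            key = tuple(tags[i:i + n])
            if len(key) == n and key in templates:
                head = i + templates[key]
                for j in range(i, i + n):
                    if j != head:
                        arcs[j] = head  # reuse the stored structure
                kept.append(head)  # only the head is passed on to the parser
                i += n
                break
        else:
            kept.append(i)  # no template matched: the token is parsed as usual
            i += 1
    return kept, arcs


tags = ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"]
kept, arcs = reduce_input(tags)
print(kept)  # indices of the tokens the parser still has to process
print(arcs)  # arcs assigned from templates without parsing
```

In this toy sentence, three of the six tokens are attached from templates, so the parser would only see the remaining three. The reductions reported in the abstract come from the paper's experiments, not from this sketch.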
This work has received funding from the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150), from the TELEPARES-UDC project (FFI2014-51978-C2-2-R) and the ANSWER-ASAP project (TIN2017-85160-C2-1-R) from MINECO, and from Xunta de Galicia (ED431B 2017/01). We gratefully acknowledge NVIDIA Corporation for the donation of a GTX Titan X GPU.
Notes
1. In this implementation we only consider projective trees for trigrams.
2. Intel Core i7-7700 CPU 4.2 GHz.
3. In case a bigram and trigram overlap, the n-gram with higher head confidence will be chosen and its dependents will be removed.
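The overlap rule in the last note can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the `Match` record, its `confidence` field, and the greedy selection by descending confidence are hypothetical. The note itself only says that, of two overlapping n-grams, the one with higher head confidence is chosen.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    start: int         # index of the first token of the matched n-gram
    length: int        # 2 for a bigram, 3 for a trigram
    head: int          # absolute index of the head token
    confidence: float  # hypothetical head-confidence score of the template


def resolve_overlaps(matches: List[Match]) -> List[Match]:
    """Keep the highest-confidence match among overlapping ones (greedy)."""
    chosen: List[Match] = []
    used = set()
    for m in sorted(matches, key=lambda m: m.confidence, reverse=True):
        span = set(range(m.start, m.start + m.length))
        if span.isdisjoint(used):
            chosen.append(m)
            used |= span
    return sorted(chosen, key=lambda m: m.start)


def tokens_to_remove(chosen: List[Match]) -> List[int]:
    """Dependents of the chosen n-grams, which are dropped from the parser input."""
    return sorted(
        i
        for m in chosen
        for i in range(m.start, m.start + m.length)
        if i != m.head
    )


candidates = [
    Match(start=0, length=2, head=1, confidence=0.90),  # overlaps the trigram
    Match(start=1, length=3, head=3, confidence=0.95),
    Match(start=4, length=2, head=5, confidence=0.80),
]
chosen = resolve_overlaps(candidates)
print(chosen)
print(tokens_to_remove(chosen))
```

Here the trigram wins over the overlapping bigram because its confidence is higher, and only the dependents of the chosen n-grams are removed from the input.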
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Strzyz, M., Gómez-Rodríguez, C. (2023). Speeding up Natural Language Parsing by Reusing Partial Results. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_46
DOI: https://doi.org/10.1007/978-3-031-24337-0_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24336-3
Online ISBN: 978-3-031-24337-0
eBook Packages: Computer Science, Computer Science (R0)