Abstract
The theory of context-free languages is well-understood and context-free parsers can be used as off-the-shelf tools in practice. In particular, to use a context-free parser framework, a user does not need to understand its internals but can specify a language declaratively as a grammar. However, many languages in practice are not context-free. One particularly important class of such languages is layout-sensitive languages, in which the structure of code depends on indentation and whitespace. For example, Python, Haskell, F#, and Markdown use indentation instead of curly braces to determine the block structure of code. Their parsers (and lexers) are not declaratively specified but hand-tuned to account for layout-sensitivity.
To support declarative specifications of layout-sensitive languages, we propose a parsing framework in which a user can annotate layout in a grammar. Annotations take the form of constraints on the relative positioning of tokens in the parsed subtrees. For example, a user can declare that a block consists of statements that all start on the same column. We have integrated layout constraints into SDF and implemented a layout-sensitive generalized parser as an extension of generalized LR parsing. We evaluate the correctness and performance of our parser by parsing 33 290 open-source Haskell files. Layout-sensitive generalized parsing is easy to use, and its performance overhead compared to layout-insensitive parsing is small enough for practical application.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Bravenboer, M., Vermaas, R., Vinju, J.J., Visser, E.: Generalized Type-Based Disambiguation of Meta Programs with Concrete Object Syntax. In: Glück, R., Lowry, M. (eds.) GPCE 2005. LNCS, vol. 3676, pp. 157–172. Springer, Heidelberg (2005)
Dijkstra, A., Fokker, J., Swierstra, S.D.: The architecture of the Utrecht Haskell compiler. In: Proceedings of Haskell Symposium, pp. 93–104. ACM (2009)
Erdweg, S., Giarrusso, P.G., Rendel, T.: Language composition untangled. In: Proceedings of Workshop on Language Descriptions, Tools and Applications, LDTA (to appear, 2012)
Erdweg, S., Rendel, T., Kästner, C., Ostermann, K.: SugarJ: Library-based syntactic language extensibility. In: Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pp. 391–406. ACM (2011)
Erdweg, S., Rieger, F., Rendel, T., Ostermann, K.: Layout-sensitive language extensibility with SugarHaskell. In: Proceedings of Haskell Symposium, pp. 149–160. ACM (2012)
Ford, B.: Packrat parsing: Simple, powerful, lazy, linear time, functional pearl. In: Proceedings of International Conference on Functional Programming (ICFP), pp. 36–47. ACM (2002)
Ford, B.: Parsing expression grammars: A recognition-based syntactic foundation. In: Proceedings of Symposium on Principles of Programming Languages (POPL), pp. 111–122. ACM (2004)
Heering, J., Hendriks, P.R.H., Klint, P., Rekers, J.: The syntax definition formalism SDF – reference manual. SIGPLAN Notices 24(11), 43–75 (1989)
Jim, T., Mandelbaum, Y., Walker, D.: Semantics and algorithms for data-dependent grammars. In: Proceedings of Symposium on Principles of Programming Languages (POPL), pp. 417–430. ACM (2010)
Kats, L.C.L., de Jonge, M., Nilsson-Nyman, E., Visser, E.: Providing rapid feedback in generated modular language environments: Adding error recovery to scannerless generalized-LR parsing. In: Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pp. 445–464. ACM (2009)
Kats, L.C.L., Visser, E., Wachsmuth, G.: Pure and declarative syntax definition: Paradise lost and regained. In: Proceedings of Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pp. 918–932. ACM (2010)
Landin, P.J.: The next 700 programming languages. Communication of the ACM 9(3), 157–166 (1966)
Leijen, D., Meijer, E.: Parsec: Direct style monadic parser combinators for the real world. Technical Report UU-CS-2001-27, Universiteit Utrecht (2001)
Marlow, S. (ed.): Haskell 2010 language report (2010), http://www.haskell.org/onlinereport/haskell2010
Mason, T., Brown, D.: Lex & yacc. O’Reilly (1990)
McBride, C.: Epigram: Practical Programming with Dependent Types. In: Vene, V., Uustalu, T. (eds.) AFP 2004. LNCS, vol. 3622, pp. 130–170. Springer, Heidelberg (2005)
Parr, T., Fisher, K.: LL(*): The foundation of the ANTLR parser generator. In: Proceedings of Conference on Programming Language Design and Implementation (PLDI), pp. 425–436. ACM (2011)
Parr, T., Quong, R.W.: ANTLR: A predicated-LL(k) parser generator. Software Practice and Experience 25, 789–810 (1994)
Tomita, M.: An efficient augmented-context-free parsing algorithm. Computational Linguistics 13(1-2), 31–46 (1987)
den van Brand, M.G.J., Scheerder, J., Vinju, J.J., Visser, E.: Disambiguation Filters for Scannerless Generalized LR Parsers. In: Nigel Horspool, R. (ed.) CC 2002. LNCS, vol. 2304, pp. 143–158. Springer, Heidelberg (2002)
Visser, E.: Scannerless generalized-LR parsing. Technical Report P9707, Programming Research Group, University of Amsterdam (1997)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Erdweg, S., Rendel, T., Kästner, C., Ostermann, K. (2013). Layout-Sensitive Generalized Parsing. In: Czarnecki, K., Hedin, G. (eds) Software Language Engineering. SLE 2012. Lecture Notes in Computer Science, vol 7745. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36089-3_14
Download citation
DOI: https://doi.org/10.1007/978-3-642-36089-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36088-6
Online ISBN: 978-3-642-36089-3
eBook Packages: Computer ScienceComputer Science (R0)