I've recently been writing a bit of parsing code in Rust, and I've been jumping back and forth between a few different parsing libraries - they all have different advantages and disadvantages, so I wanted to write up some notes here to help folks who are undecided choose what libraries and techniques to consider, and also to offer some suggestions for the future of the Rust parsing ecosystem.
This is a sequel to the previous post about Pratt parsing. Here, we’ll study the relationship between top-down operator precedence (Pratt parsing) and the more famous shunting yard algorithm. Spoiler: they are the same algorithm, the difference is implementation style with recursion (Pratt) or a manual stack (Dijkstra).
I stumbled across a very useful paper called "Syntax error recovery in parsing expression grammars" by (Medeiros, S. and Fabio Mascarenhas, 2018) that I would like to share, and I'll be testing some of its concepts using a prototype parser written in Rust with the help of the nom crate.
The main motivation behind this post comes from the fact that, with the implementation of the lambda calculus we wrote in the last post has poor readability and is difficult to write, it also forces the user of our implementation to have Rust installed to use it, instead of being able to run an standalone executable.
One step in this direction is to write a parser for our small language, so we are able to take raw text and transform it into an adequate term. Rust has a lot of parsing libraries, such as nom, pest, combine and lalrpop, each one with its pros and cons. However, I think is better to just write our own parser from scratch so we can take a look on how parsing actually works. In case you're interested the parsing pattern we are going to use here is loosely based on the very first predictive parser from the second chapter of the Dragon Book.
Welcome to my article about Pratt parsing — the monad tutorial of syntactic analysis. The goals of this article are:
- Raising an issue that the so-called left-recursion problem is overstated.
- Complaining about inadequacy of BNF for represeting infix expressions.
- Providing a description and implementation of Pratt parsing algorithm which sticks to the core and doesn’t introduce a DSL-y abstraction.
- Understanding the algorithm myself for hopefully the last time. I’ve implemented a production-grade Pratt parser once, but I no longer immediately understand that code :-)
This post assumes a fair bit of familiarity with parsing techniques, and, for example, does not explain what a context free grammar is.
This is going to be interesting project for me.
I'm gonna build modern parsing library in Rust and I'll share here my experience.
Note - I'm not going to sell you golden solution - I have no final answers. Instead, I'd like to think about these articles as my notes and diary. Hopefully when I finish both you and I will learn something new.
Ok. Let's start from the beginning.
tutorial parsing nom
Nom's official documentation includes trivially simple examples (e.g. how to parse a hexadecimal RGB color code) and very complicated examples (e.g. how to parse json). When I first learned nom I found a steep learning curve in between the simple and complex examples. Furthermore, previous versions of nom, and most of the existing documentation, use macros. From nom 5.0 onward macros are soft-deprecated in favor of functions. This tutorial aims to fill the gap between simple and complex parsers by parsing the contents of /proc/mounts, and it demonstrates the use of functions instead of macros.
Combine is a parser combinator library for the Rust programming language. I first announced version 3 of Combine back in August and back then I definitely expected to have a stable version by now. However other projects (cough gluon cough) got in the way and Combine fell to the wayside. It didn’t help that I didn’t have a killer feature for 3.0 either, user-defined error types make it possible to define parsers usable in #[no_std] crates which is great when you need it but it is still a fairly niche use-case.
View all tags