Sunday, March 20, 2016

Parser Refactoring

Not a lot of sexy work in the past few months. I wanted to add features - local variables, interfaces, .NET interop, something - but just found myself dreading going into the parser code to extend it. So instead, I spent a little time refactoring that to suck less.

I should note that I'm talking about the actual parser code for type and function declarations, not Tangent's fancy inference engine for function bodies. The parser code used to be just a series of functions. They took tokens, they returned an optional type, popping tokens as needed on success, perhaps calling other parser functions. When the grammar was a dozen or so rules, that was fine. It was easy to step through, and it was easy to unit test.

That quickly got squirrelly as I added generics, function params, product types, and sum types. The grammar is still only about two dozen rules, but they interact a lot more. This means each parsing function is doing more, and that added complexity was harming unit testing and my ability to jump back into the code after some time doing real work.

So, I went back to an old stand-by the compositional parser. One of the first things I did when I learned C# was to create a parsing framework akin to boost::spirit for the first version of Tangent. Not using that though. It generates full parse trees, which are awkward to work with. I just tossed together something similar, but specific to the existing structure in the Tangent code.

What this allows me to do is to define the grammar more declaratively. That makes it easier to see what's going on. And since the declarations get real ugly real fast, it pushes me to separate things out into testable, reusable chunks. And since I was smart enough to have a small set of regression tests, I could make sure that the refactoring didn't break anything I expected to work before.



The stable checkpoint is available at https://github.com/Telastyn/Tangent/releases/tag/Milestone3.1

No comments:

Post a Comment