Tangent Developer Journal: 2014

Monday, December 15, 2014

Progress - Expression Parsing

Work continues. After an aborted first attempt that did not deal with ambiguity well, and a bout of coder's block, the first swipe at the second stage of the parser is dev-done. The code can (read: should) now parse statements when given an arbitrary identifier stream and a scope. Scopes are simple now. No nesting. Just type constants (which aren't used yet), parameter declarations and functions in separate bundles to keep things simple.

Tests exist for:

step from identifiers to parameter reference
step from identifiers to function binding
step from function binding to function invocation
function consumption of non-terminals (values of a given type)
priority given to parameters when ambiguous
use of lower priority when it can successfully parse

Not that it actually does anything yet. All the code does is take identifiers and give you a parse tree for that statement. More than one parse tree actually if the statement is ambiguous in the current scope. It also likely behaves poorly when there are cycles in the grammar/function declarations.
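To make the idea concrete, here is a minimal Python sketch of that second stage. It is not the actual Tangent code: the names `Param`, `Function`, `Scope`, `parse` and `match` are made up for illustration, and I am assuming that "priority given to parameters" means a parameter reading of a span wins outright, with function phrases only tried when no parameter of the right type fits. The sketch also refuses to re-enter a span and type that is already being parsed, which is one crude way to survive cycles.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Param:
    """A parameter declaration such as (x: int). Hypothetical shape."""
    name: tuple
    type: str

@dataclass(frozen=True)
class Function:
    """A phrase of identifiers and parameter slots that yields `returns`."""
    parts: tuple      # each part is a str (identifier) or a Param (slot)
    returns: str

@dataclass(frozen=True)
class Scope:
    """Flat scope: parameters and functions kept in separate bundles."""
    params: tuple = ()
    functions: tuple = ()

def parse(tokens, target, scope, _active=None):
    """Return every parse tree that reduces all of `tokens` to `target`."""
    tokens = tuple(tokens)
    active = set() if _active is None else _active
    key = (tokens, target)
    if key in active:          # same span and type already in progress: a cycle
        return []
    active.add(key)
    try:
        # Highest priority: the span is exactly a parameter of the right type.
        trees = [("param", p.name) for p in scope.params
                 if p.name == tokens and p.type == target]
        if trees:
            return trees
        # Lower priority: bind a function phrase and invoke it.
        for fn in scope.functions:
            if fn.returns == target:
                for args in match(fn.parts, tokens, scope, active):
                    trees.append(("call", fn, args))
        return trees
    finally:
        active.discard(key)

def match(parts, tokens, scope, active):
    """Return every argument tuple that lets `parts` consume all of `tokens`."""
    if not parts:
        return [()] if not tokens else []
    head, rest = parts[0], parts[1:]
    if isinstance(head, str):  # a plain identifier must match literally
        if tokens and tokens[0] == head:
            return match(rest, tokens[1:], scope, active)
        return []
    results = []               # a slot consumes a value of the slot's type
    for cut in range(1, len(tokens) + 1):
        for arg in parse(tokens[:cut], head.type, scope, active):
            for tail in match(rest, tokens[cut:], scope, active):
                results.append((arg,) + tail)
    return results

# Hypothetical scope: parameter `x: int` plus three function phrases.
show = Function(("print", Param(("v",), "int")), "void")
x_stmt = Function(("x",), "void")
x_int = Function(("x",), "int")
scope = Scope(params=(Param(("x",), "int"),), functions=(show, x_stmt, x_int))
```

With this scope, `parse(["print", "x"], "void", scope)` gives a single tree in which `x` is the parameter rather than the `x_int` function, while `parse(["x"], "void", scope)` falls back to `x_stmt` because the parameter has the wrong type. Two functions with the same phrase and return type would yield two trees, which is the ambiguous case.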

Still, I am happy to have the core that makes Tangent different there and working. And happy that progress continues.

Saturday, December 6, 2014

Progress - Initial Iteration

The first iteration is progressing. As much as I would like type declarations and functions to be very similar, they are currently very distinct. This is for simplicity and the whole "don't get fancy" focus. The current language is dead simple (not Lisp dead simple, but dead simple compared to what I'd like eventually):

program ::= (type-decl | function-decl)*
type-decl ::= id+ :> enum { comma-delimited-id-list }
function-decl ::= phrase-part+ => id+ { statement* }
phrase-part ::= id | parameter-decl
parameter-decl ::= '(' id+ : id+ ')'
statement ::= id+ ;
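For illustration, a recursive-descent parser for this grammar fits in a page of Python. This is only a sketch, not the real first step: the lexical rules (ASCII identifiers), the tuple-shaped output, the non-empty enum list with one identifier per entry, and every name here (`tokenize`, `Parser`, `parse_program`) are my assumptions.

```python
import re

# Assumed lexical rules: ASCII identifiers plus the punctuation in the grammar.
TOKEN = re.compile(r"\s*(:>|=>|[{}():,;]|[A-Za-z_]\w*)")

def tokenize(text):
    text, pos, out = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input near offset {pos}")
        out.append(m.group(1))
        pos = m.end()
    return out

def is_id(tok):
    return tok is not None and (tok[0].isalpha() or tok[0] == "_")

class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def take_id(self):
        if not is_id(self.peek()):
            raise SyntaxError(f"expected an identifier, got {self.peek()!r}")
        return self.take()

    def ids(self):                        # one or more identifiers
        out = [self.take_id()]
        while is_id(self.peek()):
            out.append(self.take())
        return out

    def phrase_part(self):                # identifier or parameter declaration
        if self.peek() != "(":
            return self.take_id()
        self.take("(")
        name = self.ids()
        self.take(":")
        type_name = self.ids()
        self.take(")")
        return ("param", name, type_name)

    def declaration(self):
        parts = []
        while self.peek() not in (":>", "=>"):
            parts.append(self.phrase_part())
        if not parts:
            raise SyntaxError("a declaration needs at least one phrase part")
        if self.take() == ":>":           # type declaration
            if not all(isinstance(p, str) for p in parts):
                raise SyntaxError("a type name may only contain identifiers")
            self.take("enum")
            self.take("{")
            values = [self.take_id()]
            while self.peek() == ",":
                self.take(",")
                values.append(self.take_id())
            self.take("}")
            return ("type", parts, values)
        returns = self.ids()              # function declaration
        self.take("{")
        body = []
        while self.peek() != "}":
            body.append(self.ids())       # statements stay raw identifier lists
            self.take(";")
        self.take("}")
        return ("function", parts, returns, body)

def parse_program(text):
    parser, decls = Parser(tokenize(text)), []
    while parser.peek() is not None:
        decls.append(parser.declaration())
    return decls
```

Feeding it `main => void { print true; }` yields `("function", ["main"], ["void"], [["print", "true"]])`. Note that the statement comes back as a bare list of identifiers; nothing at this stage tries to work out what `print true` means.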

The rest of the work is done in the second step of the parser, the part that we're focusing on this time around. That part takes the various statements and performs bottom-up parsing on them using the type declarations from the first (conventional) step of the parser. The void type is used to define the start rule for statements, and other rules will need to be used to help deal with potential ambiguities (like preferring an id literal over a parameter reference).

Otherwise, it should proceed pretty much like any other bottom-up parsing approach, albeit providing an extensible syntax while demanding nothing more from programmers than they're used to - declaring and annotating types. Currently, the first step of the parser is dev-done.
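One way to picture the link to ordinary bottom-up parsing: each function declaration reads as a grammar production whose left-hand side is the return type, with plain identifiers as terminals and parameter types as non-terminals. The Python sketch below shows that mapping only; `function_production`, `show` and the tuple encoding are hypothetical names and shapes chosen for the example, not anything from the Tangent code.

```python
def function_production(parts, returns):
    """Map a function phrase to (lhs, rhs), with types as non-terminals."""
    rhs = []
    for part in parts:
        if isinstance(part, str):
            rhs.append(("terminal", part))          # plain identifier in the phrase
        else:
            _name, type_name = part                 # (parameter name, type)
            rhs.append(("nonterminal", type_name))
    return (returns, tuple(rhs))

def show(production):
    """Render a production in the usual arrow notation."""
    lhs, rhs = production
    return f"{lhs} -> " + " ".join(
        sym if kind == "terminal" else f"<{sym}>" for kind, sym in rhs)

START = "void"  # a statement has to reduce to this type

rule = function_production(("print", ("x", "bool")), START)
print(show(rule))  # void -> print <bool>
```

Seen this way, declaring a function with annotated parameter types is the same act as adding a rule to the grammar, which is where the extensible syntax comes from.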

Wednesday, November 26, 2014

Once more unto the breach

It has been a while.

After the last failure, work (and school) showed up in force. I did not have much time for Tangent, or more interesting things for that matter. But now work has died down, as has class. I've (finally) picked up the Dragon Book, and realized that much of what I was doing with Tangent types was very similar to the use of non-terminals in formal grammars. I asked a question about it on StackExchange, which... didn't really go as hoped. Though it did remind me of the vast gap between professionals and academics.

Which in turn provided a bit of motivation to get off my ass and figure things out on my own. So I've started in on yet another version of Tangent - this time on GitHub. And this time, the approach will be to have an exceptionally minimal end-to-end implementation that is built upon. The first step is to do enough to declare symbols, declare (unary) functions, parse expressions using the type info, and debug-print symbols. No subtyping. No records. No type operators. No strings, no ints, no bools.

Just enough to implement the core extensible syntax successfully. Don't get fancy. Don't get greedy.