Sunday, March 8, 2015

Milestone 1 revisited

So it turns out that I am an idiot. Well, I was being an idiot. Now perhaps not so much. I talked in the Milestone 1 post  about how looping eventually will overflow because of how it's implemented. In short, that I can't use tail call optimization because the compiler can't tell if the recursive call is in tail position, since conditionals are implemented in-language.

That was entirely true. But what I failed to realize (until last night, idly thinking on the edge of sleep) is that it doesn't matter if the recursive call is in tail position. Let's look again at our while code:


while (condition: ~>bool) (body: ~>void) => void {
    if condition { 
        body; 
        while condition body; 
    };
}

and think about how this gets compiled into CIL:

temp1 = condition
temp2 = new Closure({ body; while condition body; });
call if(temp1,temp2)

So while the recursive call is in tail position for the closure, the real eureka moment was realizing that the conditional was the thing that mattered conceptually. It was the function call in tail position for the loop. All of the internet articles reiterate how conditionals are special for determining tail call usage, and because I was so focused on the recursive aspect I did not stop to consider how doing things differently invalidated conventional wisdom.

Anyways, loops now work as expected without stack overflow and without needing to build them into the language. The code is in github marked as Milestone 2.

Thursday, March 5, 2015

Changes to return and added debugging support

Two updates to talk about today.

The first is a change to how return works in the language. It now does not work. This is a side effect of making blocks into implicit closures. By doing that, it lets you pass them around (cool), but it makes it really awkward to return from them. I had intended to have return simply set a return value and exit, allowing the calling function to use the return value if it was expecting one. But even there, it becomes confusing if the return value is for the function or for the closure. In the end, I decided it wasn't really worth the confusion (for now).

So instead, Tangent will now do a little type inference with blocks (and functions). Similar to CIL, the return value for a function (or block/closure) will be the last statement in the function. So it works something like this:


f(x: ~> int) => void { }
g => string { 
    must-return-void;
    must-return-void;
    f {
       must-return-void;
       // ...
       must-return-int;
    };
    must-return-void;
    must-return-string;
}

So the normal Tangent order of operation inference works, but the target of the algorithm differs. For normal blocks, the last statement will need to return void, just like any other statement (or any other block). Otherwise, it provides a mechanism for basic anonymous functions.

The second change was the addition of debugging info to the compiler. If you start debugging versus the exe the compiler generates, you'll now be able to step through code, set breakpoints, inspect variables, etc. While it was a lot of work to wire the line/column info from source to compiler, there's not much to actually making the debugging work. One extra ILGenerator call to mark the reductions with their location in code. Visual Studio does the rest. I had done this with earlier revisions of the language, but it is a nice reminder of how powerful well made software can be.

Friday, February 20, 2015

Milestone 1

Finally, some approximation of success. Milestone 1 has been marked on github and is free for to download. The language has enough functionality to meet my initial goals, which is to allow in-language conditionals and in-language loops. It does the order of operations inference and actually runs. Unfortunately there is one small issue and one fairly significant issue. The small issue is that you can't return out of blocks currently. Since blocks are implemented as closures, I can't jump to the end of the function since I'm no longer in the function when the return is hit.

The bigger problem is that because blocks are implemented as closures, using a block in a while loop will (eventually) cause a stack overflow. There are a few ways around that, the most natural to this design being implementation of Tail Call Optimization. Except tail call optimization can only be done when I know the closure call is in tail position. And since conditionals are done in-language, the compiler can't tell. I asked a question on stack exchange about it and... did not get help really. The academics don't care about the implementation and the pragmatists don't care about the theory.

Oh, and the error reporting is horrific and there are no debugging symbols yet. Sorry.

Anyways, here is the test program for milestone 1 - basic booleans, if statements and while loops - all defined in-language with only enum declaration, function declaration, void/unit-type, specialization and closures built into the language itself:


bool :> enum {
    true,
    false
}

if (condition: bool) (positive: ~>void) => void {
}

if (condition: bool) (positive: ~>void) else (negative: ~>void) => void {
    negative;
}

if (condition: bool.true) (positive: ~>void) => void {
    positive;
}

if (condition: bool.true) (positive: ~>void) else (negative: ~>void) => void {
    positive;
}

(lhs: bool) equals (rhs: bool) => bool {
    return false;
}

(lhs: bool.true) equals (rhs: bool.true) => bool {
    return true;
}

(lhs: bool.false) equals (rhs: bool.false) => bool {
 return true;
}

while (condition: ~>bool) (body: ~>void) => void {
    if condition { body; while condition body; };
}

entrypoint => void {
    while true equals true print ".";
}

Which in turn compiles to CIL (yes, CIL lets you define functions with spaces in the name):

I'm not sure what the next steps will be quite yet. I expect it will be implementing ints with proper order of operations in-language. Depending on the approach, I may first move this stuff into built-ins to help improve performance and structure of the built code. I also might get distracted by the lack of general equality and push generics in to support that, or some fancy ill-conceived idea like usual.

But keeping things simple has worked fairly well so far (and cs.stackexchange less so). A good lesson to remember.

Saturday, January 31, 2015

Progress - Compilation.

Sorry for the long layoff in posts, half dozen of you who wander here lost.

Work continues in fits and starts. I had gotten the language compiling using Expression Trees, which quickly ran into the issue that Expression Trees can't work with MethodBuilders, since they're not really MethodInfos. So that sucked. Not quite sure how previous iterations ever worked...

So things are now using Reflection.Emit to generate the actual IL, which seems like a particularly bad idea. It means I need to learn new things to make this already challenging adventure more challenging, and it means that things are likely to be even slower once running. But whatever. I don't need it to be fast, and I don't particularly want to build strings of C# and then recompile that.

Speaking of bad ideas, I've been working on getting conditionals working ("whoa, conditionals" you reply in your mock impressed voice). Yes, and sadly, I've gotten it into my head to not make conditionals built in statements, but something you can define in the language itself. And Tangent is now to the point where you can do that (I think, there's no way to easily execute the code yet):


bool :> enum {
    true,
    false
}

if (condition: bool) (positive: ~>void) => void {
}

if (condition: bool) (positive: ~>void) else (negative: ~>void) => void {
    negative;
}

if (condition: bool.true) (positive: ~>void) => void {
    positive;
}

if (condition: bool.true) (positive: ~>void) else (negative: ~>void) 
 => void {
    positive;
}

So there are a few parts added since the last post to make this work. The first is the lazy operator ~>. It is a type constructor that takes a type and creates a "lazy" form of that type. In C# terms, it converts ~>void to Action, or ~>bool to Func<bool>. In Tangent terms, it allows you to pass in a fully bound function to the parameter without actually executing (reducing) it. And since everything in Tangent is a function, it allows you to pass things around unevaluated. Tangent does some trickery where a block is implicitly of type ~>void so they too can be passed around.

The other trickery is the bool.true sort of stuff. I'm not thrilled by the syntax, but the concept is likely to carry forward. This is the only current specialization/subtyping relation in the language. This sort of overloading is handled via dynamic dispatch by the compiler. Ideally as other subtyping relations show up, similar sort of specialization should be allowed. But that is for later.

For now, the dynamic dispatch and lazy evaluation is there in combination to allow you the user to create your own syntax for things as elementary as if statements.

Now to get enough working to actually run it and see some results.


Monday, December 15, 2014

Progress - Expression Parsing

Work continues. After an aborted first attempt that did not deal with ambiguity well, and a bout of coder's block, the first swipe at the second stage of the parser is dev-done. The code can (read: should) now parse statements when given an arbitrary identifier stream and a scope. Scopes are simple now. No nesting. Just type constants (which aren't used yet), parameter declarations and functions in separate bundles to keep things simple.

Tests exist for:
  • step from identifiers to parameter reference
  • step from identifiers to function binding
  • step from function binding to function invocation
  • function consumption of non-terminals (values of a given type)
  • priority given to parameters when ambiguous
  • use of lower priority when it can successfully parse
Not that it actually does anything yet. All the code does is take identifiers and give you a parse tree for that statement. More than one parse tree actually if the statement is ambiguous in the current scope. It also likely behaves poorly when there are cycles in the grammar/function declarations.

Still, I am happy to have the core that makes Tangent different there and working. And happy that progress continues

Saturday, December 6, 2014

Progress - Initial Iteration

The first iteration is progressing. As much as I would like type declarations and functions be very similar, they are currently very distinct. This is for simplicity and the whole "don't get fancy" focus. The current language is dead simple (not lisp dead simple, but dead simple compared to what I'd like eventually):

program ::= (type-decl | function-decl)*
type-decl ::= id+ :> enum { comma-delimited-id-list
function-decl ::= phrase-part+ => id+ { statement* }
phrase-part ::= id | parameter-decl
parameter-decl ::= '(' id+ : id+ ')'
statement ::= id+ ;

The rest of the work is done in the second step of the parser, the part that we're focusing on this time around. That part takes the various statements and performs bottom-up parsing on them using the type declarations from the first (conventional) step of the parser. The void type is used to define the start rule for statements, and other rules will need to be used to help deal with potential ambiguities (like preferring an id literal over a parameter reference).

Otherwise, it should proceed pretty much like any other bottom-up parsing approach, albeit providing an extensible syntax while demanding nothing more from programmers than they're used to - declaring and annotating types. Currently, the first step of the parser is dev-done. 

Wednesday, November 26, 2014

Once more unto the breach

It has been a while.

After the last failure, work (and school) showed up in force. I did not have much time for Tangent, or more interesting things for that matter. But now work has died down, as has class. I've (finally) picked up the Dragon Book, and realized that much of what I was doing with Tangent types was very similar to the use of non-terminals in formal grammars. I asked a question about it on StackExchange, which... didn't really go as hoped. Though it did remind me of the vast gap between professionals and academics.

Which in turn provided a bit of motivation to get off my ass and figure things out on my own. So I've started in on yet another version of Tangent - this time on GitHub. And this time, the approach will be to have an exceptionally minimal end-to-end implementation that is built upon. First step is do enough to declare symbols, declare (unary) functions, parse expressions using the type info, and debug-print symbols. No subtyping. No records. No type operators. No strings, no ints, no bools.

Just enough to implement the core extensible syntax successfully. Don't get fancy. Don't get greedy.