Tangent Developer Journal: 2015

Thursday, December 31, 2015

Delegates - Part 1

Turns out it was delegates. Having lists without higher order functions was going to annoy me sooner rather than later, and allowing mutable state - even local mutable state - was going to add complexity and bugs I didn't want to deal with.

But if you remember, we don't actually have a way to declare higher order function types in Tangent. And since functions look like phrases, it's not quite clear how to allow users to both specify higher order functions and anonymous functions that fit that style of calling convention. What I settled on is a little bit weird, so bear with me.

To say that a parameter takes a higher order function, you build a small phrase in the parameter declaration. For example:

twofer(fn(int): void) => void {
fn 2;
}

or

sort (collection: list int) with (compare (int) to (int): int) => list int { ... }

Since you can't ever access the delegate parameter directly, just specifying the type is sufficient. fn above compiles to a .NET Action<int>, with the phrase style calling, and compare (int) to (int) to a Func<int, int, int>, again allowing the function to call it with the right phrase pattern.

To call these functions, you just use lambda syntax similar to C#:

twofer (x) => {print x;};

or

sort stuff with (a) (b) => { a.Compare(b); };

Right now, the parens and curly braces are required. I expect the parens to become not required and the curly requirement likely to stay. Annoying, but this is better than many of my alternatives.

Now, there's a little bit of a trick to this. In C#, it is very clear what types the lambda's parameters are. But in Tangent, we infer the order of operations - both using the lambda as an argument, and in the body of the lambda itself. As the types of the lambda parameters change, the actual operations taken in the body of the lambda can also change.

What happens is that the entire lambda is considered a node in the expression. When the pattern twofer (Action<int>) sees the tokens "twofer" <lambda> it checks to see if the argument matches the parameter. Does it have the same number of parameters? If x is an int and we need to return void, does the lambda parse? If not, then the match fails and the compiler moves along, trying different interpretations of the statement.

This will allow some mildly weird behavior, like the body of lambdas effecting which overload of a function is used. I'll look to have an example of that once I'm more sure it works properly. Likewise, I want to make sure that my closure implementation actually is correct. It was a little too easy.

Wednesday, December 2, 2015

Cleanup

So, at the core of Tangent is the code that takes all of the phrases the user has defined and figures out how to grok that statement. In previous iterations of the language, that code has been hairy, nasty, nested cruft that has made it difficult to add features and debug things when they go south.

In this iteration, things were better. I was explicit in defining the various things that the language could define, and good about keeping things simple. Unfortunately, this led to a bit of duplication in this core code because the simplest way to deal with these definitions was just to have different handlers that did almost the same thing.

Not great. So I spent some time making some better abstractions and cleaning up that core code. In the end, it works akin to a shift reduce parser - the phrases define what pattern should be matched and depending on what is being declared (a parameter, a type, a function, etc.) it creates a different expression type in the abstract syntax tree. The new design better shares the matching and use of phrases, while leaving the "what happens when this is matched" variable. That allows better testing too, which is nice.

That should let me more easily move into local variables or delegates depending on what I feel like doing.

Wednesday, October 28, 2015

Software Engineering FTW!

Sorry about the delay. After the success in finally getting partial specialization working, I figured that it would be a good time to do a little cleanup. You know, address some of the tech debt I incurred hacking away before moving on to another feature that would be more complicated than I anticipated.

First on the list as you might expect was a lack of unit tests. Unfortunately, the stuff I wanted to test (code generation) doesn't exactly lend itself to unit testing well. For simplicity, the compiler emits directly to files. For even basic cases, the setup is a bit onerous to reproduce a sane intermediate structure. And the super basic cases aren't the ones that have been causing problems - it's been when things start interacting.

So I decided to do the next best thing, automated integration tests. I took all of the little test programs I've written (and posted here) and put them into a csproj along with their expected output. A little bit of code to kick off the compiler and run the resultant temp file executable, and I had a nice little regression suite.

You can probably guess what happened next. Yeah, half the tests failed.

Some were super quick to fix (typos during test creation). Some were kinda quick to fix (functionality I had removed). A few were slow to fix (wait, that should work...). And one took about a month and a half of intermittent spare time to fix. It was the basic Algebraic Data Type code:

int list :> int | (a: int),(b: int list) {
(this).head => int { a }
(this).tail => int list { b }
print (this) => void {
print this.head;
print this.tail;
}
}

print (il: int list) => void {}

entrypoint => void {
print 1,2,3;
}

It first blew up because of ambiguity errors between print(int list) and print(_anonymousProductType1). That ended up just being some overzealous refactoring on my part during the partial specialization work. I had removed the fanning out of implicit conversions for sum types, so the compiler didn't see the two functions as a specialization.

The second problem was more troublesome. Because of partial specialization fixes, the relationship between int list and _anonymousProductType1 was tighter. Under the covers, int list inherits from Variant<int, _anonymousProductType1>, and _anonymousProductType1 has a constructor that takes an int and an int list. That cyclic reference was biting me in the ass. The compiler was written to take an intermediate type representation and compile up a .NET type for it. I couldn't quite get the ordering right to build most of one type and then the other so that reflection didn't choke on stuff that was half-built.

In the end, I ended up having to take a bit of the nuclear option and change how the code generation worked. Now the compiler will populate all of the TypeBuilders and pretty much always work with half-built types until function generation at which point it uses AppDomain events to build the types in the right order. (taken from this SO answer) I think that this may cause me problems when types get more member methods, and it has made stuff a bit messy.

But at least now all my tests pass and things are stable again. Next on the docket is likely local variables.

Sunday, August 23, 2015

Runtime Partial Specialization Dispatch

Sorry about the delay. After last post, I decided to implement runtime partial specialization to keep the behavior of the language consistent - if you have a method override that is a better "match" for your arguments, then Tangent will use it.

It turns out that was not quite as straightforward as I would've liked. The issues were mostly in how I was working with product types. In Tangent, a type tends to be anonymous and self-contained. Type Declarations on the other hand supply names and generic parameters for types. This caused an issue with compilation since generic product type constructors didn't really know where their generic bits came from. So by the time they made it down to the code-gen, I couldn't construct the appropriate .NET type.

Of course, it took a while to actually figure out that was the issue - after it took a while to write up the partial specialization dispatch code into the code-gen. Anyways - this example code now works:

foo(T) :> f(x: T){}

do(x: (T)) => void {
print "in generic";
}

do(x: foo(T)) => void {
print "in foo generic";
}

do(x: foo int) => void {
print "in foo int";
}

bar(x: (T))=> void{
do x;
}

entrypoint => void {
bar "bar";
bar f "bar";
bar f 42;
}

Producing the properly dispatched output:

in generic
in foo generic
in foo int

I'm not sure if it's going to stay in the implementation or not. The dispatch for the most generic function is about 60 CIL operations, and doing type inference for the partial specialization requires reflection trickery. In short - it is slow, and slow in a way that I doubt could be improved too much.

I'll revisit it once I have users to complain about it.

Saturday, July 25, 2015

Runtime Generic Dispatch

As promised, doing some of the basic compile time generic specialization I showed last time during runtime was next on the list. It is now done. The following test code will now properly dispatch at runtime based on the type of the parameters passed into bar:

foo (x: (T)) => void {
print "in inference";
}

foo (x: int) => void {
print "in int.";
}

bar (x: (T)) => void {
foo x;
}

entrypoint => void {
bar "bar";
bar 42;
}

This though had a bit of a sticking point. Visual Studio compiles the basic type check as something like this:

IL_0001: ldtoken [mscorlib]System.Int32
IL_0006: call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
IL_000b: ldarga.s 'value'
IL_000d: constrained. !!fooT
IL_0013: callvirt instance class [mscorlib]System.Type [mscorlib]System.Object::GetType()
IL_0018: call bool [mscorlib]System.Type::op_Equality(class [mscorlib]System.Type, class [mscorlib]System.Type)

Note this weird constrained. prefix. It is some magic that lets CIL call generic parameters without boxing them. The problem comes with the ldarga.s instruction above it. Instead of pulling out the parameter value like everything else under the sun, this is getting the address of that parameter. If you don't get the address and make that call, you get some nondescript ExecutionEngineException during runtime. Awesome.

The problem for me was that at that point of the compilation process, I don't know the parameter's index since it was abstracted a few layers above. So for now, Tangent just boxes the thing and goes about its business.

I'm not sure what is next on the docket. Perhaps improving this to allow partial specialization, perhaps local variables, perhaps .NET imports. We'll see what strikes my fancy.

Wednesday, July 22, 2015

Generic Inference

Last post, I talked about Generic Types - or parameterized types or type constructors if you prefer. Things like List<T> where the type needs another type to be usable. Today, we're going to take a quick look at the companion to these, Inferred Types.

Generic Types are used during a type declaration to give you, well, a generic type. Inferred Types are used during function declaration to let you the programmer reference the type of a parameter (or part of a parameter) in the function without having to explicitly pass in the type when calling the function. Consider this basic C# function:

public void Echo<T>(T value) {
Console.WriteLine(value);
}

When you call Echo, you don't need to specify T - it is inferred from the type of value you pass in. That's what we're talking about today for Tangent. They are slightly different from generic type parameters, so took a little work, despite having to implement a basic version of them to get the normal generic types working.

Anyways, on to the test code for the day:

foo (x: (T)) => void {
print "in inference";
}

foo (x: int) => void {
print "in int.";
}

entrypoint => void {
foo "bar";
foo 42;
}

Again, nothing fancy. All I want to see is that the compiler parses the inference properly, figures out the right overload, and generates a functional exe. As you can see/remember, the parameter declaration syntax in Tangent is (parameter-name: type-expression). This syntax is also used for generic parameters in a type declaration as I explained in the last post. It will also be used for inferred generics.

To declare an inferred generic type, all you need to do is add the parameter declaration syntax to the parameter's type expression. So above, (T) says "please infer the type of x, and assign it to the variable T". Eventually, when the language supports constraints, it would look something like (T: constraint). This also works for inferring the parameters of generic functions - just like in C#. If you wanted to infer the T out of a List, it would look something like foo (x: List<(T)>) => void...

Now for the caveats. The first is that constraints don't actually work. The parser will grab them and resolve them to a type, but the type checker doesn't care and the code generator really doesn't care. The second is that specialization only happens at compile time. The above example correctly dispatches to the generic version and the int version respectively, but only because they are statically known. Fixing that is probably the next work. And lastly, there's a little bit of a subtle issue with how things work.

Consider this C# function:

public void Add<T>(List<T> collection, T value) { ... }

If you pass in a list of objects and an int, the compiler is smart enough to find the most common base type to use. At time of writing, Tangent doesn't have intersection types, or any similar mechanism to find that. So specifying the same generic inference type twice in the same function... will do something undefinedly bad. What you can (should be able to?) do is something like this:

add (value: T) to (collection: List<(T)>) => void { ... }

Here, value has no influence on the inference. The type of the List is what matters, and the type checker will verify that the value's type is (at least) that. That should be a good enough work around for the majority of cases.

Regardless, this basic functionality now works and is available in github. Next will likely be getting runtime dispatch of generic inferred functions to specific ones to work. After that I think that I'll move towards more practical realms so I can do vaguely useful things in the language. Local variable declarations and .NET interop being tops of the list.

Thursday, July 9, 2015

Generic Types

Well, that kinda sucked.

Even though implementing Sum Types was a bit painful, it was really rewarding to get things working. From that success, I moved on to implementing basic generic types. It involved a bit of work, but was pretty straight-forward once I figured out what I wanted. Previously, type declarations were pretty simple:

(identifier)+ :> (type-definition)

Product types added a new type definition, and sum types let them be composed, but the general syntax stayed the same. The design I ended up on looks very similar to the function syntax:

(identifier|parameter-decl)+ :> (type-definition)

Instead of being a simple name, a type can also have parameters. That is what a generic is after all, right? The type-definition would then have access to the parameter and could make references to it. The parameter's type would be a kind - essentially a type of a type, or in C# parlance, the constraint of the generic parameter.* The awesome thing then is that the right side is effectively the constructor for this type, so any generic references could be used to allow type inference in constructors. Haven't you always wanted that? Yes, it seems many of you did.

Presumably many of you have been humans for more than a few years, so you know that this is the point in the story where it comes crashing down. First, I ran into the issue of how to parse the generic parameters and then make them available to the rest of the declaration when the original code didn't know or care about such things. Scope is always the thing that takes more work then expected. Also note that I started off working on generic types, not any sort of generic inference in constructors. Once my parser said it couldn't figure out how to deal with things, I realized that the design required me to do at least some basic inference. Okay, whatever. Inference is tedious but I'd done it before and it's pretty straight-forward once you view it as a pattern matching exercise.

But then I ran into the big problem. Here is my test code for the feature:

Nothing :> enum { null }
(T)? :> T | Nothing;
do(x: int?) => void {
print "fn.";
}

entrypoint => void {
do 42;
do null;
}

A stupid little program that verifies that C#-like nullable types can be done. I don't particularly care if the value is real or null, and I don't really care what the output is. All I want is int? to parse properly, and do to take an int and null and execute.

The big problem here is a little subtle - remember that the right side of a type declaration is the constructor of the type, and I use type inference to figure out what type to build from it. When the constructor sees null, what is T?

Since the parser works akin to a normal shift-reduce parser, I don't have any sort of context. null doesn't know that I'm trying to pass it into an int?, all it knows is that it is a Nothing and needs to reduce if possible. There is no way for it to know/guess T. So that sucks. What really sucks is that it took me all that time and effort to realize that T (anything) or Nothing (something) is just anything again.

About 2 weeks of brooding and thinking of things that didn't work led me to a good enough compromise. I already have dynamic dispatch on functions (see ADTs), so I can flip it around. Instead of having Nothing build T? and call do, I would generate do(T) and do(Nothing). Then when I have a clear Nothing, it can call the function directly without any inference. That though leads to Tangent's first "Really Weird Thing It Won't Allow For Arcane Reasons": you cannot declare a function that takes a generic Sum Type and uses a generic parameter of that type in its return type and that generic parameter isn't used in all of the alternatives of the sum type. The compiler will give you an error about the offending function and generic parameter, and fail. This seems like a tiny limitation, and lets stuff like the example above work.

An annoying, twisty, frustrating road - but now functional.

* - yes, I am aware that kinds in other languages do not refer to this sort of thing.

Sunday, May 31, 2015

Algebraic Data Types

So about a month ago, I posted about the addition of product types to Tangent. These allow you to combine data together into things that look like classes. Well today is the other side of that coin, sum types. These let your data be either one type or another. Together, product types and sum types provide what is known as Algebraic Data Types, which functional languages have long touted for their expressiveness. Types can then be defined like "Foo is a string and a datetime or a string and a timespan".

The example test code for today:

int list :> int | (a: int),(b: int list) {
(this).head => int { a }
(this).tail => int list { b }
print (this) => void {
print this.head;
print this.tail;
}
}

print (il: int list) => void {}

entrypoint => void {
print 1,2,3;
}

This defines a nice simple (non-empty) list of integers and functions to print the list. The pipe ('|') is used as the type constructor for sum types in type declarations. So the declaration above is read as "int list is either an int or an int and an int list". The recursive type then cascades out indefinitely. You'll find this sort of thing all over Lisp and other functional languages. Here, I've defined the comma as the list constructor. And because of how Tangent works, list constructs are directly usable in code (as in the entrypoint). No boatloads of parens necessary like in Lisp. No need to extend the syntax like C# and other languages with special list initialization syntax. You can do it right in-language with Tangent.

Alas, there's still some cruft around things. First, is the need for the empty print function for int lists. Right now there's no real code for working with sum types. The only thing that is there is specialization logic like there is for single value types. So the "base" function is needed for the code to compile against, but the compiler is smart enough to dispatch calls to more specific versions of the function (int and the anonymous product type in this case). Not great.

Also not great is the passthroughs for head and tail on the anonymous product type. I'd like to have some syntax that would automatically re-expose constructor parameters, but I'm not sure what it is, or what it would look like.

And lastly, this is a pretty specific implementation. Lists should be generic. I have a good idea about what the syntax will look for that, and I expect that stuff to be the next large chunk of code worked on.

Why are these things here you ask? Sure, this sort of code seems pretty academic. After all, I don't expect people to use these lists instead of .NET's List. The main motivation for these things is so I don't have null in the language. The pains of null are well documented, and I'm curious how thing would work in a world where you need explicit nullable types. Beyond that, I hope that with sum types I don't need to implement exceptions. If your function can return something or an error, use a sum type that says that explicitly. That's very like Java checked exceptions, which suck hard, but I'm curious if the language flexibility can't lessen that pain in practice.

Anyways, these are now available in source control.

Tidbits
One bug I ran into was the lack of type checking in CIL. It happily sent the wrong parameter into a function and the function happily ran it, leading to badness. Similarly, if you don't unbox value types, they too will happily run the function on the wrong bits of data. Something to remember in the future since it was troublesome to track down. I also imagine that I could use such behavior in hacky terrible ways should they be necessary.

Also, sum types were a pain to implement. Product types are fairly simple, since the "does the type of that match what we need" logic can work the same as it did with simple types. Sum types required fairly distributed changes to a lot of different code since they add not only a selection step to type matching, but a layer of hierarchy to the type structure (though that was maybe since I insisted on anonymous types like above). It was interesting to see, and may be why so few languages implement them.

Saturday, May 9, 2015

Order of Operations

So I've finally gotten this iteration of Tangent to the point where I can start to see if it actually does what I think it should do. As you might expect, it does not.

I've gotten basic classes working enough that I can start making actual arithmetic with type shenanigans to coerce things to take the right order of operations. Here is the test code for the day:

term :> term (x: int){
(this) => int { x; }
}

factor :> factor (x: int){
(this) => term { term x; }
}

(lhs: int) + (rhs: term) => int { asm add lhs rhs; }
(lhs: term) * (rhs: factor) => term { term asm mul lhs rhs; }
(x: int) => factor { factor x; }

entrypoint => void {
print 2 + 3 * 4;
}

Now for a slight aside. The first time I ran this, I got 86. How did I get 86? Well, read on! The answer is at the bottom.

Anyways, this now produces 20, due to a subtlety in parsers that I didn't really fully grok until today when it bit me full on the ass. When you look at a formal grammar for arithmetic, there are always conversions to take function invocations, constants, paren expressions... bunches of stuff into a factor. What normal parsers do though is they determine the order of operations there, before ever knowing anything about the function invocation, constants, paren expressions, etc. They basically form the structure of the code ignoring any sort of conversions or trickery that needs to be done to turn that token into an int.

Since I'm doing the order of operations after knowing what trickery is necessary, the parser has forgotten if the int it sees is a constant, or the result of a function, or all of the limitations that formal grammars put on conversions to a factor. So Tangent happily converts 3 to a factor, then to a term to do the addition, and then circles the int result back around to a factor to multiply afterwards. To get this working, I'll need to break the cycle which will make normal use of integer parameters broken and/or weird.

So, 86. The root cause of the bug was in how the compiler generated its CIL. If you remember back to my post about loops, Tangent uses tail call optimization to keep function invocation workable in those sort of scenarios. The issue was that I always added the tail call to the last statement in a function. And I didn't change that when I added built-in opcodes like add and multiply (and constructor calls). So in the CIL, you saw arg1; arg2; tailcall; add;. .NET was happy to let that run, but I expect that the tailcall fubar'd my stack so that when the add actually ran, the arguments had been replaced by whatever random stuff was on the stack when that method started. Bad times.

Friday, May 8, 2015

Symbols in phrases

Quick update. One of the things that has been around in previous iterations of the language, which I've neglected to implement so far has been the ability to use symbols when defining a phrase.That lead to verbose operators in previous examples:

(x: int) plus (y: int) => int {...}
(a: bool) equals (b: bool) => bool {...}

With today's update, you can now use symbols (at the moment, pretty much any non-ascii, non-whitespace, non-open-curly/open-parens unicode character) in your phrase delcarations:

(x: int) + (y: int) => int {...}
(a: bool) = (b: bool) => bool {...}

Thursday, May 7, 2015

Product Types

Sorry for the delay in posting, but I found myself stuck deciding where to go next with the language. On one hand, I want to get the thing built enough that I can build non-trivial programs with it, to see how useful/terrible the inference works in practice. On the other, I want to produce some more academic results showing that the order of operation inference is equivalent to existing methods for doing this sort of stuff. All while knowing that I was unlikely to achieve either. Let's just say that it was not the most motivational of situations.

I still haven't really decided on which path to take, but I'm moving forward anyways. Today's post is about Product Types. Wikipedia and other sources have nice formal descriptions if you're so interested. Practically, it means that I've started in on what most of you know as classes. Tangent so far has only allowed enums, which allowed types to be one of some well defined set of values. What I've added is the phrase syntax to type declarations. Consider this trivial test program:

int :> enum {a}
factor :> factor (x: int){}

foo(f: factor) => void { 
  print "in foo";
}

entrypoint => void {
  foo factor a;
}

Here, factor is a product type which takes only a single input a (poorly named) int. That's a terrible example. Let's look at something better:

height :> (ft: int) feet (in: int) inches {}

This is modeled a bit off of Haskell. The left side of the declaration is the type name (and eventually, generic parameters). The right side of the declaration is the constructor. It will act akin to C# 6 Primary Constructors. The parameters declared in the constructor can be used within the class declaration (in curlies, empty here and currently unsupported). But since the constructor uses the same phrase syntax that functions do, you're free to make it more descriptive than new Foo(blah, blah).

The compiler will now take this code and generate a nice POCO, as well as call the constructor at the appropriate time. It only took about 4 hours too, even after the time away from the code, which is a nice sign that the underlying code is solid. You can't actually do anything with the things yet. I need to decide if I want to work with them via pattern matching or via a more OO style approach.

Sunday, March 8, 2015

Milestone 1 revisited

So it turns out that I am an idiot. Well, I was being an idiot. Now perhaps not so much. I talked in the Milestone 1 post about how looping eventually will overflow because of how it's implemented. In short, that I can't use tail call optimization because the compiler can't tell if the recursive call is in tail position, since conditionals are implemented in-language.

That was entirely true. But what I failed to realize (until last night, idly thinking on the edge of sleep) is that it doesn't matter if the recursive call is in tail position. Let's look again at our while code:

while (condition: ~>bool) (body: ~>void) => void {
    if condition { 
        body; 
        while condition body; 
    };
}

and think about how this gets compiled into CIL:

temp1 = condition
temp2 = new Closure({ body; while condition body; });
call if(temp1,temp2)

So while the recursive call is in tail position for the closure, the real eureka moment was realizing that the conditional was the thing that mattered conceptually. It was the function call in tail position for the loop. All of the internet articles reiterate how conditionals are special for determining tail call usage, and because I was so focused on the recursive aspect I did not stop to consider how doing things differently invalidated conventional wisdom.

Anyways, loops now work as expected without stack overflow and without needing to build them into the language. The code is in github marked as Milestone 2.

Thursday, March 5, 2015

Changes to return and added debugging support

Two updates to talk about today.

The first is a change to how return works in the language. It now does not work. This is a side effect of making blocks into implicit closures. By doing that, it lets you pass them around (cool), but it makes it really awkward to return from them. I had intended to have return simply set a return value and exit, allowing the calling function to use the return value if it was expecting one. But even there, it becomes confusing if the return value is for the function or for the closure. In the end, I decided it wasn't really worth the confusion (for now).

So instead, Tangent will now do a little type inference with blocks (and functions). Similar to CIL, the return value for a function (or block/closure) will be the last statement in the function. So it works something like this:

f(x: ~> int) => void { }
g => string { 
    must-return-void;
    must-return-void;
    f {
       must-return-void;
       // ...
       must-return-int;
    };
    must-return-void;
    must-return-string;
}

So the normal Tangent order of operation inference works, but the target of the algorithm differs. For normal blocks, the last statement will need to return void, just like any other statement (or any other block). Otherwise, it provides a mechanism for basic anonymous functions.

The second change was the addition of debugging info to the compiler. If you start debugging versus the exe the compiler generates, you'll now be able to step through code, set breakpoints, inspect variables, etc. While it was a lot of work to wire the line/column info from source to compiler, there's not much to actually making the debugging work. One extra ILGenerator call to mark the reductions with their location in code. Visual Studio does the rest. I had done this with earlier revisions of the language, but it is a nice reminder of how powerful well made software can be.

Friday, February 20, 2015

Milestone 1

Finally, some approximation of success. Milestone 1 has been marked on github and is free for to download. The language has enough functionality to meet my initial goals, which is to allow in-language conditionals and in-language loops. It does the order of operations inference and actually runs. Unfortunately there is one small issue and one fairly significant issue. The small issue is that you can't return out of blocks currently. Since blocks are implemented as closures, I can't jump to the end of the function since I'm no longer in the function when the return is hit.

The bigger problem is that because blocks are implemented as closures, using a block in a while loop will (eventually) cause a stack overflow. There are a few ways around that, the most natural to this design being implementation of Tail Call Optimization. Except tail call optimization can only be done when I know the closure call is in tail position. And since conditionals are done in-language, the compiler can't tell. I asked a question on stack exchange about it and... did not get help really. The academics don't care about the implementation and the pragmatists don't care about the theory.

Oh, and the error reporting is horrific and there are no debugging symbols yet. Sorry.

Anyways, here is the test program for milestone 1 - basic booleans, if statements and while loops - all defined in-language with only enum declaration, function declaration, void/unit-type, specialization and closures built into the language itself:

bool :> enum {
    true,
    false
}

if (condition: bool) (positive: ~>void) => void {
}

if (condition: bool) (positive: ~>void) else (negative: ~>void) => void {
    negative;
}

if (condition: bool.true) (positive: ~>void) => void {
    positive;
}

if (condition: bool.true) (positive: ~>void) else (negative: ~>void) => void {
    positive;
}

(lhs: bool) equals (rhs: bool) => bool {
    return false;
}

(lhs: bool.true) equals (rhs: bool.true) => bool {
    return true;
}

(lhs: bool.false) equals (rhs: bool.false) => bool {
 return true;
}

while (condition: ~>bool) (body: ~>void) => void {
    if condition { body; while condition body; };
}

entrypoint => void {
    while true equals true print ".";
}

Which in turn compiles to CIL (yes, CIL lets you define functions with spaces in the name):

I'm not sure what the next steps will be quite yet. I expect it will be implementing ints with proper order of operations in-language. Depending on the approach, I may first move this stuff into built-ins to help improve performance and structure of the built code. I also might get distracted by the lack of general equality and push generics in to support that, or some fancy ill-conceived idea like usual.

But keeping things simple has worked fairly well so far (and cs.stackexchange less so). A good lesson to remember.

Saturday, January 31, 2015

Progress - Compilation.

Sorry for the long layoff in posts, half dozen of you who wander here lost.

Work continues in fits and starts. I had gotten the language compiling using Expression Trees, which quickly ran into the issue that Expression Trees can't work with MethodBuilders, since they're not really MethodInfos. So that sucked. Not quite sure how previous iterations ever worked...

So things are now using Reflection.Emit to generate the actual IL, which seems like a particularly bad idea. It means I need to learn new things to make this already challenging adventure more challenging, and it means that things are likely to be even slower once running. But whatever. I don't need it to be fast, and I don't particularly want to build strings of C# and then recompile that.

Speaking of bad ideas, I've been working on getting conditionals working ("whoa, conditionals" you reply in your mock impressed voice). Yes, and sadly, I've gotten it into my head to not make conditionals built in statements, but something you can define in the language itself. And Tangent is now to the point where you can do that (I think, there's no way to easily execute the code yet):

bool :> enum {
    true,
    false
}

if (condition: bool) (positive: ~>void) => void {
}

if (condition: bool) (positive: ~>void) else (negative: ~>void) => void {
    negative;
}

if (condition: bool.true) (positive: ~>void) => void {
    positive;
}

if (condition: bool.true) (positive: ~>void) else (negative: ~>void) 
 => void {
    positive;

So there are a few parts added since the last post to make this work. The first is the lazy operator ~>. It is a type constructor that takes a type and creates a "lazy" form of that type. In C# terms, it converts ~>void to Action, or ~>bool to Func<bool>. In Tangent terms, it allows you to pass in a fully bound function to the parameter without actually executing (reducing) it. And since everything in Tangent is a function, it allows you to pass things around unevaluated. Tangent does some trickery where a block is implicitly of type ~>void so they too can be passed around.

The other trickery is the bool.true sort of stuff. I'm not thrilled by the syntax, but the concept is likely to carry forward. This is the only current specialization/subtyping relation in the language. This sort of overloading is handled via dynamic dispatch by the compiler. Ideally as other subtyping relations show up, similar sort of specialization should be allowed. But that is for later.

For now, the dynamic dispatch and lazy evaluation is there in combination to allow you the user to create your own syntax for things as elementary as if statements.

Now to get enough working to actually run it and see some results.