Wednesday, January 6, 2016

Delegates - Part 2

I talked a bit last time about how Tangent's lambdas are a little weird compared to those in C# and other languages. Today, I'm going to go a little more in depth into that, as well as show some other examples of their use. The first is an example that uses the return value of the lambda:

apply (fn(int): int) twice to (x: int) => int {
  fn fn x;
}

entrypoint => void {
  print apply (x)=>{x*x} twice to 2;
}

Because of the order of operation inference, fn fn x gets parsed into the right order, passing the result of one call into the next.
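
For reference, here's a rough C# analogue of that example (my sketch, not anything Tangent actually emits):

using System;

static class ApplyTwiceSketch {
  static int ApplyTwice(Func<int, int> fn, int x) {
    return fn(fn(x)); // "fn fn x"
  }

  static void Main() {
    Console.WriteLine(ApplyTwice(x => x * x, 2)); // prints 16
  }
}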

Now for some weirder stuff. Because of how Tangent parses statements, a lambda can have different interpretations depending on what type it resolves to. So when the compiler matches a lambda to a function parameter it needs to make sure that the lambda works for those types. A simple example:

foo(fn(int):void) => void {
  fn 42;
}

foo(fn(string):void) => void {
  fn "foo";
}

entrypoint => void {
  foo (x) => { print x + 2; };

}

Since string + int makes no sense, the int -> void overload is chosen and this prints 44. Weirder yet, Tangent includes the return type in this resolution work. If the lambda can't return the requested type, that interpretation is discarded. Consider a similar example:

foo(fn(int): int) => void {
  print "with int";
  print fn 42;
}

foo(fn(int): void) => void {
  print "with void";
  fn 42;
}

entrypoint => void {
  foo (x) => {x;};
  foo (x) => {print x;};

}

Here the different bodies return different types, leading to dispatching to the different overloads and avoiding ambiguity.
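
For comparison, C# plays a similar (if more limited) game when converting lambdas to delegate types during overload resolution. Here's a rough analogue of the example above - mine, not output from Tangent:

using System;

static class OverloadSketch {
  static void Foo(Func<int, int> fn) {
    Console.WriteLine("with int");
    Console.WriteLine(fn(42));
  }

  static void Foo(Action<int> fn) {
    Console.WriteLine("with void");
    fn(42);
  }

  static void Main() {
    Foo(x => x);                    // only convertible to Func<int, int>, so "with int"
    Foo(x => Console.WriteLine(x)); // only convertible to Action<int>, so "with void"
  }
}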

Nifty, and it all worked once I wrote up the tests. Next up is maybe cleaning up a little of the lambda syntax and testing more of the closure functionality.

Thursday, December 31, 2015

Delegates - Part 1

Turns out it was delegates. Having lists without higher order functions was going to annoy me sooner rather than later, and allowing mutable state - even local mutable state - was going to add complexity and bugs I didn't want to deal with.

But if you remember, we don't actually have a way to declare higher order function types in Tangent. And since functions look like phrases, it's not quite clear how to allow users to both specify higher order functions and anonymous functions that fit that style of calling convention. What I settled on is a little bit weird, so bear with me.

To say that a parameter takes a higher order function, you build a small phrase in the parameter declaration. For example:

twofer(fn(int): void) => void {
  fn 2;
}

or

sort (collection: list int) with (compare (int) to (int): int) => list int { ... }

Since you can't ever access the delegate parameter directly, just specifying the type is sufficient. fn above compiles to a .NET Action<int> with phrase-style calling, and compare (int) to (int) to a Func<int, int, int>, again allowing the function to call it with the right phrase pattern.
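
In .NET terms, that lowering looks roughly like this - a sketch of my reading of it, not the actual generated code (the sort body is elided here just like above):

using System;
using System.Collections.Generic;

static class LoweringSketch {
  static void Twofer(Action<int> fn) {
    fn(2); // "fn 2" in phrase form
  }

  static List<int> SortWith(List<int> collection, Func<int, int, int> compare) {
    // body elided in the original; only the delegate parameter's shape matters here
    throw new NotImplementedException();
  }
}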

To call these functions, you just use lambda syntax similar to C#:

twofer (x) => {print x;};

or

sort stuff with (a) (b) => { a.Compare(b); };

Right now, the parens and curly braces are required. I expect the parens to become optional, while the curly brace requirement is likely to stay. Annoying, but this is better than many of my alternatives.

Now, there's a little bit of a trick to this. In C#, it is very clear what types the lambda's parameters are. But in Tangent, we infer the order of operations - both using the lambda as an argument, and in the body of the lambda itself. As the types of the lambda parameters change, the actual operations taken in the body of the lambda can also change.

What happens is that the entire lambda is considered a node in the expression. When the pattern twofer (Action<int>) sees the tokens "twofer" <lambda>, it checks to see if the argument matches the parameter. Does it have the same number of parameters? If x is an int and we need to return void, does the lambda parse? If not, then the match fails and the compiler moves along, trying different interpretations of the statement.

This will allow some mildly weird behavior, like the body of a lambda affecting which overload of a function is used. I'll look to have an example of that once I'm more sure it works properly. Likewise, I want to make sure that my closure implementation actually is correct. It was a little too easy.

Wednesday, December 2, 2015

Cleanup

So, at the core of Tangent is the code that takes all of the phrases the user has defined and figures out how to grok that statement. In previous iterations of the language, that code has been hairy, nasty, nested cruft that has made it difficult to add features and debug things when they go south.

In this iteration, things were better. I was explicit in defining the various things that the language could define, and good about keeping things simple. Unfortunately, this led to a bit of duplication in this core code because the simplest way to deal with these definitions was just to have different handlers that did almost the same thing.

Not great. So I spent some time making some better abstractions and cleaning up that core code. In the end, it works akin to a shift-reduce parser - the phrases define what pattern should be matched, and depending on what is being declared (a parameter, a type, a function, etc.) a different expression type is created in the abstract syntax tree. The new design better shares the matching and use of phrases, while leaving the "what happens when this is matched" part variable. That allows better testing too, which is nice.
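
If I were to sketch the shape of that abstraction in C# (purely illustrative - these aren't the actual types in the Tangent source), it would be something like:

using System.Collections.Generic;

// Matching a declared phrase is shared; what gets built when the phrase matches
// varies per declaration kind (parameter, type, function, ...).
abstract class PhraseRule {
  public abstract IReadOnlyList<string> Pattern { get; }                  // the pattern to shift/match against
  public abstract object Reduce(IReadOnlyList<object> matchedArguments);  // the reduction: build the AST node for this match
}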

That should let me more easily move into local variables or delegates depending on what I feel like doing.

Wednesday, October 28, 2015

Software Engineering FTW!

Sorry about the delay. After the success in finally getting partial specialization working, I figured that it would be a good time to do a little cleanup. You know, address some of the tech debt I incurred hacking away before moving on to another feature that would be more complicated than I anticipated.

First on the list, as you might expect, was a lack of unit tests. Unfortunately, the stuff I wanted to test (code generation) doesn't lend itself well to unit testing. For simplicity, the compiler emits directly to files, and even for basic cases the setup needed to reproduce a sane intermediate structure is a bit onerous. And the super basic cases aren't the ones that have been causing problems - it's when things start interacting.

So I decided to do the next best thing, automated integration tests. I took all of the little test programs I've written (and posted here) and put them into a csproj along with their expected output. A little bit of code to kick off the compiler and run the resultant temp file executable, and I had a nice little regression suite.
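
Each test boils down to something like the following sketch (the helper and file names and the MSTest attributes are just for illustration - the real suite simply pairs each sample program with its expected output):

using System.Diagnostics;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class SamplePrograms {
  [TestMethod]
  public void SampleCompilesAndPrintsExpectedOutput() {
    // hypothetical helpers standing in for "kick off the compiler" and "load the expected output"
    string exe = TestHarness.CompileToTempExe("Samples/Sample.tang");
    string expected = TestHarness.ExpectedOutputFor("Samples/Sample.tang");

    var startInfo = new ProcessStartInfo(exe) { RedirectStandardOutput = true, UseShellExecute = false };
    using (var process = Process.Start(startInfo)) {
      string actual = process.StandardOutput.ReadToEnd();
      process.WaitForExit();
      Assert.AreEqual(expected, actual);
    }
  }
}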

You can probably guess what happened next. Yeah, half the tests failed.

Some were super quick to fix (typos during test creation). Some were kinda quick to fix (functionality I had removed). A few were slow to fix (wait, that should work...). And one took about a month and a half of intermittent spare time to fix. It was the basic Algebraic Data Type code:

int list :> int | (a: int),(b: int list) {
  (this).head => int { a }
  (this).tail => int list { b }
  print (this) => void {
    print this.head;
    print this.tail;
  }
}

print (il: int list) => void {}

entrypoint => void {
  print 1,2,3;
}

It first blew up because of ambiguity errors between print(int list) and print(_anonymousProductType1). That ended up just being some overzealous refactoring on my part during the partial specialization work. I had removed the fanning out of implicit conversions for sum types, so the compiler didn't see the two functions as a specialization.

The second problem was more troublesome. Because of partial specialization fixes, the relationship between int list and _anonymousProductType1 was tighter. Under the covers, int list inherits from Variant<int, _anonymousProductType1>, and _anonymousProductType1 has a constructor that takes an int and an int list. That cyclic reference was biting me in the ass. The compiler was written to take an intermediate type representation and compile up a .NET type for it. I couldn't quite get the ordering right - build most of one type, then the other - such that reflection didn't choke on stuff that was half-built.

In the end, I had to take a bit of the nuclear option and change how the code generation worked. Now the compiler populates all of the TypeBuilders and pretty much always works with half-built types until function generation, at which point it uses AppDomain events to build the types in the right order (an approach taken from this SO answer). I think this may cause me problems when types get more member methods, and it has made things a bit messy.
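
The core of that trick looks roughly like this - my sketch of the technique, assuming a dictionary of still-open TypeBuilders keyed by name, not the actual Tangent code generator:

using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

static class TypeResolveSketch {
  // TypeBuilders that haven't been baked yet, keyed by full type name (assumed bookkeeping).
  static readonly Dictionary<string, TypeBuilder> openBuilders = new Dictionary<string, TypeBuilder>();

  static void HookTypeResolve() {
    AppDomain.CurrentDomain.TypeResolve += (sender, args) => {
      // The runtime hit a type it can't finish loading (one half of a cycle like
      // int list / _anonymousProductType1). Bake it on demand and let the runtime retry.
      TypeBuilder builder;
      if (openBuilders.TryGetValue(args.Name, out builder)) {
        builder.CreateType();
        return builder.Assembly;
      }
      return null;
    };
  }
}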

But at least now all my tests pass and things are stable again. Next on the docket is likely local variables.

Sunday, August 23, 2015

Runtime Partial Specialization Dispatch

Sorry about the delay. After last post, I decided to implement runtime partial specialization to keep the behavior of the language consistent - if you have a method override that is a better "match" for your arguments, then Tangent will use it.

It turns out that was not quite as straightforward as I would've liked. The issues were mostly in how I was working with product types. In Tangent, a type tends to be anonymous and self-contained. Type Declarations, on the other hand, supply names and generic parameters for types. This caused an issue with compilation, since generic product type constructors didn't really know where their generic bits came from. So by the time they made it down to the code-gen, I couldn't construct the appropriate .NET type.

Of course, it took a while to actually figure out that this was the issue - after it took a while to write the partial specialization dispatch code into the code-gen in the first place. Anyways - this example code now works:

foo(T) :> f(x: T){}

do(x: (T)) => void {
  print "in generic";
}

do(x: foo(T)) => void {
  print "in foo generic";
}

do(x: foo int) => void {
  print "in foo int";
}

bar(x: (T))=> void{
  do x;
}

entrypoint => void {
  bar "bar";
  bar f "bar";
  bar f 42;
}

Producing the properly dispatched output:

in generic
in foo generic
in foo int

I'm not sure if it's going to stay in the implementation or not. The dispatch for the most generic function is about 60 CIL operations, and doing type inference for the partial specialization requires reflection trickery. In short - it is slow, and slow in a way that I doubt can be improved much.
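
To give a feel for why, here is roughly the kind of check the most generic do(x: (T)) has to make at runtime before it can forward to a more specific overload. This is an illustrative C# sketch with a stand-in Foo<T> type, not the emitted CIL:

using System;

class Foo<T> { }  // stand-in for the Tangent foo(T) type; purely illustrative

static class DispatchSketch {
  static void Do(object x) {
    var t = x.GetType();
    if (t == typeof(Foo<int>)) {
      Console.WriteLine("in foo int");
    } else if (t.IsGenericType && t.GetGenericTypeDefinition() == typeof(Foo<>)) {
      Console.WriteLine("in foo generic");
    } else {
      Console.WriteLine("in generic");
    }
  }
}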

I'll revisit it once I have users to complain about it.

Saturday, July 25, 2015

Runtime Generic Dispatch

As promised, doing some of the basic compile time generic specialization I showed last time during runtime was next on the list. It is now done. The following test code will now properly dispatch at runtime based on the type of the parameters passed into bar:

foo (x: (T)) => void {
  print "in inference";
}

foo (x: int) => void {
  print "in int.";
}

bar (x: (T)) => void {
  foo x;
}

entrypoint => void {
  bar "bar";
  bar 42;
}

This though had a bit of a sticking point. Visual Studio (well, the C# compiler) compiles the basic type check to something like this:

  IL_0001:  ldtoken    [mscorlib]System.Int32
  IL_0006:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
  IL_000b:  ldarga.s   'value'
  IL_000d:  constrained. !!fooT
  IL_0013:  callvirt   instance class [mscorlib]System.Type [mscorlib]System.Object::GetType()
  IL_0018:  call       bool [mscorlib]System.Type::op_Equality(class [mscorlib]System.Type, class [mscorlib]System.Type)
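
That IL corresponds to a type check along these lines - my reconstruction of the source shape, not Tangent's emitted code:

using System;

static class TypeCheckSketch {
  // the !!fooT in the IL above is this method's generic parameter
  static bool IsInt<fooT>(fooT value) {
    return typeof(int) == value.GetType();
  }
}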

Note this weird constrained. prefix. It is some magic that lets CIL call methods on generic parameters without boxing them. The problem comes with the ldarga.s instruction above it. Instead of pulling out the parameter value like everything else under the sun, this is getting the address of that parameter. If you don't get the address before making that call, you get some nondescript ExecutionEngineException at runtime. Awesome.

The problem for me was that at that point of the compilation process, I don't know the parameter's index since it was abstracted a few layers above. So for now, Tangent just boxes the thing and goes about its business.

I'm not sure what is next on the docket. Perhaps improving this to allow partial specialization, perhaps local variables, perhaps .NET imports. We'll see what strikes my fancy.

Wednesday, July 22, 2015

Generic Inference

Last post, I talked about Generic Types - or parameterized types or type constructors if you prefer. Things like List<T> where the type needs another type to be usable. Today, we're going to take a quick look at the companion to these, Inferred Types.

Generic Types are used during a type declaration to give you, well, a generic type. Inferred Types are used during function declaration to let you, the programmer, reference the type of a parameter (or part of a parameter) in the function without having to explicitly pass in the type when calling the function. Consider this basic C# function:

public void Echo<T>(T value) {
  Console.WriteLine(value);
}

When you call Echo, you don't need to specify T - it is inferred from the type of the value you pass in. That's what we're talking about today for Tangent. Inferred types are slightly different from generic type parameters, so they took a little work, even though I had already implemented a basic version of them to get the normal generic types working.

Anyways, on to the test code for the day:

foo (x: (T)) => void {
  print "in inference";
}

foo (x: int) => void {
  print "in int.";
}

entrypoint => void {
  foo "bar";
  foo 42;
}

Again, nothing fancy. All I want to see is that the compiler parses the inference properly, figures out the right overload, and generates a functional exe. As you can see/remember, the parameter declaration syntax in Tangent is (parameter-name: type-expression). This syntax is also used for generic parameters in a type declaration as I explained in the last post. It will also be used for inferred generics. 

To declare an inferred generic type, all you need to do is add the parameter declaration syntax to the parameter's type expression. So above, (T) says "please infer the type of x, and assign it to the variable T". Eventually, when the language supports constraints, it would look something like (T: constraint). This also works for inferring the parameters of generic functions - just like in C#. If you wanted to infer the T out of a List, it would look something like foo (x: List<(T)>) => void...

Now for the caveats. The first is that constraints don't actually work. The parser will grab them and resolve them to a type, but the type checker doesn't care and the code generator really doesn't care. The second is that specialization only happens at compile time. The above example correctly dispatches to the generic version and the int version respectively, but only because they are statically known. Fixing that is probably the next work. And lastly, there's a little bit of a subtle issue with how things work.

Consider this C# function:

public void Add<T>(List<T> collection, T value) { ... }

If you pass in a list of objects and an int, the compiler is smart enough to find the most common base type to use. At the time of writing, Tangent doesn't have intersection types, or any similar mechanism to find that. So specifying the same generic inference type twice in the same function... will do something undefined and bad. What you can (should be able to?) do is something like this:

add (value: T) to (collection: List<(T)>) => void { ... }
  
Here, value has no influence on the inference. The type of the List is what matters, and the type checker will verify that the value's type is (at least) that. That should be a good enough workaround for the majority of cases.

Regardless, this basic functionality now works and is available on github. Next will likely be getting runtime dispatch of generic inferred functions to specific ones working. After that I think I'll move towards more practical realms so I can do vaguely useful things in the language. Local variable declarations and .NET interop are at the top of the list.