Sunday, August 23, 2015

Runtime Partial Specialization Dispatch

Sorry about the delay. After last post, I decided to implement runtime partial specialization to keep the behavior of the language consistent - if you have a method override that is a better "match" for your arguments, then Tangent will use it.

It turns out that was not quite as straightforward as I would've liked. The issues were mostly in how I was working with product types. In Tangent, a type tends to be anonymous and self-contained. Type Declarations on the other hand supply names and generic parameters for types. This caused an issue with compilation since generic product type constructors didn't really know where their generic bits came from. So by the time they made it down to the code-gen, I couldn't construct the appropriate .NET type.

Of course, it took a while to actually figure out that was the issue - after it took a while to write up the partial specialization dispatch code into the code-gen. Anyways - this example code now works:

foo(T) :> f(x: T){}

do(x: (T)) => void {
  print "in generic";
}

do(x: foo(T)) => void {
  print "in foo generic";
}

do(x: foo int) => void {
  print "in foo int";
}

bar(x: (T))=> void{
  do x;
}

entrypoint => void {
  bar "bar";
  bar f "bar";
  bar f 42;
}

Producing the properly dispatched output:

in generic
in foo generic
in foo int

I'm not sure if it's going to stay in the implementation or not. The dispatch for the most generic function is about 60 CIL operations, and doing type inference for the partial specialization requires reflection trickery. In short - it is slow, and slow in a way that I doubt could be improved too much.

I'll revisit it once I have users to complain about it.

Saturday, July 25, 2015

Runtime Generic Dispatch

As promised, doing some of the basic compile time generic specialization I showed last time during runtime was next on the list. It is now done. The following test code will now properly dispatch at runtime based on the type of the parameters passed into bar:

foo (x: (T)) => void {
  print "in inference";
}

foo (x: int) => void {
  print "in int.";
}

bar (x: (T)) => void {
  foo x;
}

entrypoint => void {
  bar "bar";
  bar 42;
}

This though had a bit of a sticking point. Visual Studio compiles the basic type check as something like this:

  IL_0001:  ldtoken    [mscorlib]System.Int32
  IL_0006:  call       class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
  IL_000b:  ldarga.s   'value'
  IL_000d:  constrained. !!fooT
  IL_0013:  callvirt   instance class [mscorlib]System.Type [mscorlib]System.Object::GetType()
  IL_0018:  call       bool [mscorlib]System.Type::op_Equality(class [mscorlib]System.Type, class [mscorlib]System.Type)

Note this weird constrained. prefix. It is some magic that lets CIL call generic parameters without boxing them. The problem comes with the ldarga.s instruction above it. Instead of pulling out the parameter value like everything else under the sun, this is getting the address of that parameter. If you don't get the address and make that call, you get some nondescript ExecutionEngineException during runtime. Awesome.

The problem for me was that at that point of the compilation process, I don't know the parameter's index since it was abstracted a few layers above. So for now, Tangent just boxes the thing and goes about its business.

I'm not sure what is next on the docket. Perhaps improving this to allow partial specialization, perhaps local variables, perhaps .NET imports. We'll see what strikes my fancy.

Wednesday, July 22, 2015

Generic Inference

Last post, I talked about Generic Types - or parameterized types or type constructors if you prefer. Things like List<T> where the type needs another type to be usable. Today, we're going to take a quick look at the companion to these, Inferred Types.

Generic Types are used during a type declaration to give you, well, a generic type. Inferred Types are used during function declaration to let you the programmer reference the type of a parameter (or part of a parameter) in the function without having to explicitly pass in the type when calling the function. Consider this basic C# function:

public void Echo<T>(T value) {
  Console.WriteLine(value);
}

When you call Echo, you don't need to specify T - it is inferred from the type of value you pass in. That's what we're talking about today for Tangent. They are slightly different from generic type parameters, so took a little work, despite having to implement a basic version of them to get the normal generic types working.

Anyways, on to the test code for the day:

foo (x: (T)) => void {
  print "in inference";
}

foo (x: int) => void {
  print "in int.";
}

entrypoint => void {
  foo "bar";
  foo 42;
}

Again, nothing fancy. All I want to see is that the compiler parses the inference properly, figures out the right overload, and generates a functional exe. As you can see/remember, the parameter declaration syntax in Tangent is (parameter-name: type-expression). This syntax is also used for generic parameters in a type declaration as I explained in the last post. It will also be used for inferred generics. 

To declare an inferred generic type, all you need to do is add the parameter declaration syntax to the parameter's type expression. So above, (T) says "please infer the type of x, and assign it to the variable T". Eventually, when the language supports constraints, it would look something like (T: constraint). This also works for inferring the parameters of generic functions - just like in C#. If you wanted to infer the T out of a List, it would look something like foo (x: List<(T)>) => void...

Now for the caveats. The first is that constraints don't actually work. The parser will grab them and resolve them to a type, but the type checker doesn't care and the code generator really doesn't care. The second is that specialization only happens at compile time. The above example correctly dispatches to the generic version and the int version respectively, but only because they are statically known. Fixing that is probably the next work. And lastly, there's a little bit of a subtle issue with how things work.

Consider this C# function:

public void Add<T>(List<T> collection, T value) { ... }

If you pass in a list of objects and an int, the compiler is smart enough to find the most common base type to use. At time of writing, Tangent doesn't have intersection types, or any similar mechanism to find that. So specifying the same generic inference type twice in the same function... will do something undefinedly bad. What you can (should be able to?) do is something like this:

add (value: T) to (collection: List<(T)>) => void { ... }
  
Here, value has no influence on the inference. The type of the List is what matters, and the type checker will verify that the value's type is (at least) that. That should be a good enough work around for the majority of cases.

Regardless, this basic functionality now works and is available in github. Next will likely be getting runtime dispatch of generic inferred functions to specific ones to work. After that I think that I'll move towards more practical realms so I can do vaguely useful things in the language. Local variable declarations and .NET interop being tops of the list.

Thursday, July 9, 2015

Generic Types

Well, that kinda sucked.

Even though implementing Sum Types was a bit painful, it was really rewarding to get things working. From that success, I moved on to implementing basic generic types. It involved a bit of work, but was pretty straight-forward once I figured out what I wanted. Previously, type declarations were pretty simple:

(identifier)+ :> (type-definition)

Product types added a new type definition, and sum types let them be composed, but the general syntax stayed the same. The design I ended up on looks very similar to the function syntax:

(identifier|parameter-decl)+ :> (type-definition)

Instead of being a simple name, a type can also have parameters. That is what a generic is after all, right? The type-definition would then have access to the parameter and could make references to it. The parameter's type would be a kind - essentially a type of a type, or in C# parlance, the constraint of the generic parameter.* The awesome thing then is that the right side is effectively the constructor for this type, so any generic references could be used to allow type inference in constructors. Haven't you always wanted that? Yes, it seems many of you did.

Presumably many of you have been humans for more than a few years, so you know that this is the point in the story where it comes crashing down. First, I ran into the issue of how to parse the generic parameters and then make them available to the rest of the declaration when the original code didn't know or care about such things. Scope is always the thing that takes more work then expected. Also note that I started off working on generic types, not any sort of generic inference in constructors. Once my parser said it couldn't figure out how to deal with things, I realized that the design required me to do at least some basic inference. Okay, whatever. Inference is tedious but I'd done it before and it's pretty straight-forward once you view it as a pattern matching exercise.

But then I ran into the big problem. Here is my test code for the feature:

Nothing :> enum { null }
(T)? :> T | Nothing;
do(x: int?) => void { 
  print "fn."; 


entrypoint => void { 

  do 42; 
  do null; 
}

A stupid little program that verifies that C#-like nullable types can be done. I don't particularly care if the value is real or null, and I don't really care what the output is. All I want is int? to parse properly, and do to take an int and null and execute.

The big problem here is a little subtle - remember that the right side of a type declaration is the constructor of the type, and I use type inference to figure out what type to build from it. When the constructor sees null, what is T?

Since the parser works akin to a normal shift-reduce parser, I don't have any sort of context. null doesn't know that I'm trying to pass it into an int?, all it knows is that it is a Nothing and needs to reduce if possible. There is no way for it to know/guess T. So that sucks. What really sucks is that it took me all that time and effort to realize that T (anything) or Nothing (something) is just anything again.

About 2 weeks of brooding and thinking of things that didn't work led me to a good enough compromise. I already have dynamic dispatch on functions (see ADTs), so I can flip it around. Instead of having Nothing build T? and call do, I would generate do(T) and do(Nothing). Then when I have a clear Nothing, it can call the function directly without any inference. That though leads to Tangent's first "Really Weird Thing It Won't Allow For Arcane Reasons": you cannot declare a function that takes a generic Sum Type and uses a generic parameter of that type in its return type and that generic parameter isn't used in all of the alternatives of the sum type. The compiler will give you an error about the offending function and generic parameter, and fail. This seems like a tiny limitation, and lets stuff like the example above work.

An annoying, twisty, frustrating road - but now functional.


* - yes, I am aware that kinds in other languages do not refer to this sort of thing.

Sunday, May 31, 2015

Algebraic Data Types

So about a month ago, I posted about the addition of product types to Tangent. These allow you to combine data together into things that look like classes. Well today is the other side of that coin, sum types. These let your data be either one type or another. Together, product types and sum types provide what is known as Algebraic Data Types, which functional languages have long touted for their expressiveness. Types can then be defined like "Foo is a string and a datetime or a string and a timespan".

The example test code for today:

int list :> int | (a: int),(b: int list) {
  (this).head => int { a }
  (this).tail => int list { b }
  print (this) => void {
    print this.head;
    print this.tail;
  }
}

print (il: int list) => void {}

entrypoint => void {
  print 1,2,3;
}


This defines a nice simple (non-empty) list of integers and functions to print the list. The pipe ('|') is used as the type constructor for sum types in type declarations. So the declaration above is read as "int list is either an int or an int and an int list". The recursive type then cascades out indefinitely. You'll find this sort of thing all over Lisp and other functional languages. Here, I've defined the comma as the list constructor. And because of how Tangent works, list constructs are directly usable in code (as in the entrypoint). No boatloads of parens necessary like in Lisp. No need to extend the syntax like C# and other languages with special list initialization syntax. You can do it right in-language with Tangent.

Alas, there's still some cruft around things. First, is the need for the empty print function for int lists. Right now there's no real code for working with sum types. The only thing that is there is specialization logic like there is for single value types. So the "base" function is needed for the code to compile against, but the compiler is smart enough to dispatch calls to more specific versions of the function (int and the anonymous product type in this case). Not great.

Also not great is the passthroughs for head and tail on the anonymous product type. I'd like to have some syntax that would automatically re-expose constructor parameters, but I'm not sure what it is, or what it would look like.

And lastly, this is a pretty specific implementation. Lists should be generic. I have a good idea about what the syntax will look for that, and I expect that stuff to be the next large chunk of code worked on.

Why are these things here you ask? Sure, this sort of code seems pretty academic. After all, I don't expect people to use these lists instead of .NET's List. The main motivation for these things is so I don't have null in the language. The pains of null are well documented, and I'm curious how thing would work in a world where you need explicit nullable types. Beyond that, I hope that with sum types I don't need to implement exceptions. If your function can return something or an error, use a sum type that says that explicitly. That's very like Java checked exceptions, which suck hard, but I'm curious if the language flexibility can't lessen that pain in practice.

Anyways, these are now available in source control.

Tidbits
One bug I ran into was the lack of type checking in CIL. It happily sent the wrong parameter into a function and the function happily ran it, leading to badness. Similarly, if you don't unbox value types, they too will happily run the function on the wrong bits of data. Something to remember in the future since it was troublesome to track down. I also imagine that I could use such behavior in hacky terrible ways should they be necessary.

Also, sum types were a pain to implement. Product types are fairly simple, since the "does the type of that match what we need" logic can work the same as it did with simple types. Sum types required fairly distributed changes to a lot of different code since they add not only a selection step to type matching, but a layer of hierarchy to the type structure (though that was maybe since I insisted on anonymous types like above). It was interesting to see, and may be why so few languages implement them.

Saturday, May 9, 2015

Order of Operations

So I've finally gotten this iteration of Tangent to the point where I can start to see if it actually does what I think it should do. As you might expect, it does not.

I've gotten basic classes working enough that I can start making actual arithmetic with type shenanigans to coerce things to take the right order of operations. Here is the test code for the day:

term :> term (x: int){
  (this) => int { x; }
}

factor :> factor (x: int){
  (this) => term { term x; }
}

(lhs: int) + (rhs: term) => int { asm add lhs rhs; }
(lhs: term) * (rhs: factor) => term { term asm mul lhs rhs; }
(x: int) => factor { factor x; }

entrypoint => void {
  print 2 + 3 * 4;
}


Now for a slight aside. The first time I ran this, I got 86. How did I get 86? Well, read on! The answer is at the bottom.

Anyways, this now produces 20, due to a subtlety in parsers that I didn't really fully grok until today when it bit me full on the ass. When you look at a formal grammar for arithmetic, there are always conversions to take function invocations, constants, paren expressions... bunches of stuff into a factor. What normal parsers do though is they determine the order of operations there, before ever knowing anything about the function invocation, constants, paren expressions, etc. They basically form the structure of the code ignoring any sort of conversions or trickery that needs to be done to turn that token into an int.

Since I'm doing the order of operations after knowing what trickery is necessary, the parser has forgotten if the int it sees is a constant, or the result of a function, or all of the limitations that formal grammars put on conversions to a factor. So Tangent happily converts 3 to a factor, then to a term to do the addition, and then circles the int result back around to a factor to multiply afterwards. To get this working, I'll need to break the cycle which will make normal use of integer parameters broken and/or weird.

So, 86. The root cause of the bug was in how the compiler generated its CIL. If you remember back to my post about loops, Tangent uses tail call optimization to keep function invocation workable in those sort of scenarios. The issue was that I always added the tail call to the last statement in a function. And I didn't change that when I added built-in opcodes like add and multiply (and constructor calls). So in the CIL, you saw arg1; arg2; tailcall; add;. .NET was happy to let that run, but I expect that the tailcall fubar'd my stack so that when the add actually ran, the arguments had been replaced by whatever random stuff was on the stack when that method started. Bad times.

Friday, May 8, 2015

Symbols in phrases

Quick update. One of the things that has been around in previous iterations of the language, which I've neglected to implement so far has been the ability to use symbols when defining a phrase.That lead to verbose operators in previous examples:

(x: int) plus (y: int) => int {...}
(a: bool) equals (b: bool) => bool {...}

With today's update, you can now use symbols (at the moment, pretty much any non-ascii, non-whitespace, non-open-curly/open-parens unicode character) in your phrase delcarations:

(x: int) + (y: int) => int {...}
(a: bool) = (b: bool) => bool {...}