Tangent Developer Journal: July 2015

Saturday, July 25, 2015

Runtime Generic Dispatch

As promised, doing some of the basic compile time generic specialization I showed last time during runtime was next on the list. It is now done. The following test code will now properly dispatch at runtime based on the type of the parameters passed into bar:

foo (x: (T)) => void {
print "in inference";
}

foo (x: int) => void {
print "in int.";
}

bar (x: (T)) => void {
foo x;
}

entrypoint => void {
bar "bar";
bar 42;
}

This though had a bit of a sticking point. Visual Studio compiles the basic type check as something like this:

IL_0001: ldtoken [mscorlib]System.Int32
IL_0006: call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
IL_000b: ldarga.s 'value'
IL_000d: constrained. !!fooT
IL_0013: callvirt instance class [mscorlib]System.Type [mscorlib]System.Object::GetType()
IL_0018: call bool [mscorlib]System.Type::op_Equality(class [mscorlib]System.Type, class [mscorlib]System.Type)

Note this weird constrained. prefix. It is some magic that lets CIL call generic parameters without boxing them. The problem comes with the ldarga.s instruction above it. Instead of pulling out the parameter value like everything else under the sun, this is getting the address of that parameter. If you don't get the address and make that call, you get some nondescript ExecutionEngineException during runtime. Awesome.

The problem for me was that at that point of the compilation process, I don't know the parameter's index since it was abstracted a few layers above. So for now, Tangent just boxes the thing and goes about its business.

I'm not sure what is next on the docket. Perhaps improving this to allow partial specialization, perhaps local variables, perhaps .NET imports. We'll see what strikes my fancy.

Wednesday, July 22, 2015

Generic Inference

Last post, I talked about Generic Types - or parameterized types or type constructors if you prefer. Things like List<T> where the type needs another type to be usable. Today, we're going to take a quick look at the companion to these, Inferred Types.

Generic Types are used during a type declaration to give you, well, a generic type. Inferred Types are used during function declaration to let you the programmer reference the type of a parameter (or part of a parameter) in the function without having to explicitly pass in the type when calling the function. Consider this basic C# function:

public void Echo<T>(T value) {
Console.WriteLine(value);
}

When you call Echo, you don't need to specify T - it is inferred from the type of value you pass in. That's what we're talking about today for Tangent. They are slightly different from generic type parameters, so took a little work, despite having to implement a basic version of them to get the normal generic types working.

Anyways, on to the test code for the day:

foo (x: (T)) => void {
print "in inference";
}

foo (x: int) => void {
print "in int.";
}

entrypoint => void {
foo "bar";
foo 42;
}

Again, nothing fancy. All I want to see is that the compiler parses the inference properly, figures out the right overload, and generates a functional exe. As you can see/remember, the parameter declaration syntax in Tangent is (parameter-name: type-expression). This syntax is also used for generic parameters in a type declaration as I explained in the last post. It will also be used for inferred generics.

To declare an inferred generic type, all you need to do is add the parameter declaration syntax to the parameter's type expression. So above, (T) says "please infer the type of x, and assign it to the variable T". Eventually, when the language supports constraints, it would look something like (T: constraint). This also works for inferring the parameters of generic functions - just like in C#. If you wanted to infer the T out of a List, it would look something like foo (x: List<(T)>) => void...

Now for the caveats. The first is that constraints don't actually work. The parser will grab them and resolve them to a type, but the type checker doesn't care and the code generator really doesn't care. The second is that specialization only happens at compile time. The above example correctly dispatches to the generic version and the int version respectively, but only because they are statically known. Fixing that is probably the next work. And lastly, there's a little bit of a subtle issue with how things work.

Consider this C# function:

public void Add<T>(List<T> collection, T value) { ... }

If you pass in a list of objects and an int, the compiler is smart enough to find the most common base type to use. At time of writing, Tangent doesn't have intersection types, or any similar mechanism to find that. So specifying the same generic inference type twice in the same function... will do something undefinedly bad. What you can (should be able to?) do is something like this:

add (value: T) to (collection: List<(T)>) => void { ... }

Here, value has no influence on the inference. The type of the List is what matters, and the type checker will verify that the value's type is (at least) that. That should be a good enough work around for the majority of cases.

Regardless, this basic functionality now works and is available in github. Next will likely be getting runtime dispatch of generic inferred functions to specific ones to work. After that I think that I'll move towards more practical realms so I can do vaguely useful things in the language. Local variable declarations and .NET interop being tops of the list.

Thursday, July 9, 2015

Generic Types

Well, that kinda sucked.

Even though implementing Sum Types was a bit painful, it was really rewarding to get things working. From that success, I moved on to implementing basic generic types. It involved a bit of work, but was pretty straight-forward once I figured out what I wanted. Previously, type declarations were pretty simple:

(identifier)+ :> (type-definition)

Product types added a new type definition, and sum types let them be composed, but the general syntax stayed the same. The design I ended up on looks very similar to the function syntax:

(identifier|parameter-decl)+ :> (type-definition)

Instead of being a simple name, a type can also have parameters. That is what a generic is after all, right? The type-definition would then have access to the parameter and could make references to it. The parameter's type would be a kind - essentially a type of a type, or in C# parlance, the constraint of the generic parameter.* The awesome thing then is that the right side is effectively the constructor for this type, so any generic references could be used to allow type inference in constructors. Haven't you always wanted that? Yes, it seems many of you did.

Presumably many of you have been humans for more than a few years, so you know that this is the point in the story where it comes crashing down. First, I ran into the issue of how to parse the generic parameters and then make them available to the rest of the declaration when the original code didn't know or care about such things. Scope is always the thing that takes more work then expected. Also note that I started off working on generic types, not any sort of generic inference in constructors. Once my parser said it couldn't figure out how to deal with things, I realized that the design required me to do at least some basic inference. Okay, whatever. Inference is tedious but I'd done it before and it's pretty straight-forward once you view it as a pattern matching exercise.

But then I ran into the big problem. Here is my test code for the feature:

Nothing :> enum { null }
(T)? :> T | Nothing;
do(x: int?) => void {
print "fn.";
}

entrypoint => void {
do 42;
do null;
}

A stupid little program that verifies that C#-like nullable types can be done. I don't particularly care if the value is real or null, and I don't really care what the output is. All I want is int? to parse properly, and do to take an int and null and execute.

The big problem here is a little subtle - remember that the right side of a type declaration is the constructor of the type, and I use type inference to figure out what type to build from it. When the constructor sees null, what is T?

Since the parser works akin to a normal shift-reduce parser, I don't have any sort of context. null doesn't know that I'm trying to pass it into an int?, all it knows is that it is a Nothing and needs to reduce if possible. There is no way for it to know/guess T. So that sucks. What really sucks is that it took me all that time and effort to realize that T (anything) or Nothing (something) is just anything again.

About 2 weeks of brooding and thinking of things that didn't work led me to a good enough compromise. I already have dynamic dispatch on functions (see ADTs), so I can flip it around. Instead of having Nothing build T? and call do, I would generate do(T) and do(Nothing). Then when I have a clear Nothing, it can call the function directly without any inference. That though leads to Tangent's first "Really Weird Thing It Won't Allow For Arcane Reasons": you cannot declare a function that takes a generic Sum Type and uses a generic parameter of that type in its return type and that generic parameter isn't used in all of the alternatives of the sum type. The compiler will give you an error about the offending function and generic parameter, and fail. This seems like a tiny limitation, and lets stuff like the example above work.

An annoying, twisty, frustrating road - but now functional.

* - yes, I am aware that kinds in other languages do not refer to this sort of thing.