Saturday, May 21, 2016

Interfaces - Part 4 - Compilation

(Part 1, Part 2, Part 3)

The last part of implementing basic interfaces is the back end compiler. This part of the code takes the intermediate representation we made in Part 3 and turns it into CIL that actually runs.

There are two parts to this: creating the actual .NET type that represents our interface, and making the functions that properly dispatch to the right implementation. What is interesting about implementing interfaces as type classes is that both parts piggyback on features the language already supports - but each resembles a different one.

The .NET type compilation reused the sum-type compilation, mirroring the intermediate code from Part 3. All I needed were a few tweaks to let the "what are all of the types this could be?" part of the code get its answer either from the sum-type directly or from the interface mapping. Oh, and allowing sum-types to have only one implementation in the runtime (since an interface may have just one implementation, even if a sum-type can't).

The function compilation on the other hand used the dispatch logic for generics. If you remember, interface functions infer their correct type based on the location of the (this) parameter. But since the type coming in is guaranteed to be a sum-type, the generic dispatch needs to check the type of the stored variant, not the sum-type itself.
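To make that concrete, here's a rough Python sketch (the names are mine, not what the compiler emits) of what the compiled dispatch amounts to: the interface value behaves like a closed sum over its known implementations, and dispatch inspects the stored variant rather than the sum-type wrapper itself.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical implementing types.
@dataclass
class Cow: pass

@dataclass
class Duck: pass

# The compiled interface type, viewed as a closed sum over its implementations.
ConvertibleToString = Union[Cow, Duck]

# Dispatch checks the type of the stored variant, not the sum-type itself.
def to_string(value: ConvertibleToString) -> str:
    if isinstance(value, Cow):
        return "moo..."
    if isinstance(value, Duck):
        return "quack"
    raise TypeError("no implementation for this variant")
```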

Wrap that all together, and my basic test code now compiles and runs properly:

convertible to string :> interface {
  (this) to string => string;
}

Cow :> cow :< convertible to string {
  (this) to string => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x to string;
}

entrypoint => void {
  cow speak;
}

Hopefully this is straightforward - the code first declares an interface, which requires a single function. It then declares a class Cow with nullary constructor cow and implements the interface. And then it declares a function which works with that interface to print something to the screen so we can tell if it works. Finally, the entry point ties it all together by invoking that function with a concrete instance of Cow. Super dumb test code.

"So what?" you say, "Any plain interface implementation can do that."

That is entirely true. So let's convert this over to something that .NET-style interfaces can't do. All the interface needs is some way to convert the object to a string. So why not just declare that?

convertible to string :> interface {
  (this) => string;
}

Cow :> cow :< convertible to string {
  (this) => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x;
}

entrypoint => void {
  cow speak;
}

Yeah, that works too. Here our convertible to string interface requires its implementors to supply an implicit conversion to string. Since print requires a string as its input, the language is smart enough to infer that you want the conversion to happen there.
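One way to picture the mechanics (this sketch and its names are mine, not the actual compiler output): each implementing type registers its conversion, and the conversion is applied exactly where a string is required.

```python
# A per-type conversion table stands in for the compiler's inference
# of where the implicit conversion should happen.
to_string_impls = {}

def implements_convertible_to_string(cls):
    """Register cls's conversion, like declaring (this) => string."""
    def register(fn):
        to_string_impls[cls] = fn
        return fn
    return register

class Cow: pass

@implements_convertible_to_string(Cow)
def _cow_to_string(_cow):
    return "moo..."

def speak(x) -> str:
    # A str is required here, so the conversion is applied at this point,
    # much like Tangent inferring where print needs one.
    return to_string_impls[type(x)](x)
```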

Next on the list is non-inline interface binding, fancier examples, and .NET interop. I'm not sure which will come first - we'll see where whimsy takes me.

Friday, May 6, 2016

Interfaces - Part 3 - Intermediate Steps

(Part 1, Part 2)

The second step to getting interfaces working is to actually do something with the stuff we parsed in Part 2. The grammar just spots interface declarations from the tokens it sees as input. The intermediate code is what takes those declarations and resolves them into real types. It's also where the function implementations are turned into a syntax tree, which is extra challenging since Tangent has no fixed order of operations.

The first order of business was to have a class to keep track of interface declarations. Since under the covers, all type declarations (enums, sum types, product types, type aliases, even built in types) are a phrase and a Tangent Type, I went with a new Tangent Type to represent the interface declarations themselves. The object has references to the functions required to satisfy it and... well, that's it. For the moment, that's all interfaces are.

If you remember from Part 2, those functions are actually generic, and (this) needs to be replaced by a generic inference. That was the next bit of work to be done. Easy enough: a little bit of code between the grammar and intermediate objects to do the replacement.

Now I have interfaces, and the functions they need. To allow the code to know what types should implement what interfaces, I needed to add something to the type resolver to make those relations. Since interfaces are just a different sort of type declaration, I already had code to resolve type declarations, making that relatively straightforward. Now I have all of my interfaces in my list of type declarations, and all of the relations between them and concrete types.

But how to let the order of operation inference code know about the relation? It doesn't know about subtyping, and based on my previous iterations, I know that subtyping really complicates that code if it shows up. And these interfaces aren't really subtyping anyways - they work like type classes. Type classes in turn work like sum types: you have some symbol that means "this could be A or B or C or...". So interfaces got implemented similarly to sum types in this regard. The compiler adds an implicit conversion function that goes from the implementation to the interface. The order of operation inference code then treats that like any other function, requiring no changes there. Awesome.
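Here's a small Python sketch (names mine) of that bookkeeping: the interface acts like a sum over its implementors, and the compiler-added implicit conversions are just ordinary functions that wrap a value into that sum.

```python
# The interface, viewed as a sum type; .variant holds the stored value.
class Printable:
    def __init__(self, variant):
        self.variant = variant

class Cow: pass
class Duck: pass

# One conversion per implementing type; the order of operation inference
# treats these like any other function in scope.
def cow_to_printable(cow: Cow) -> Printable:
    return Printable(cow)

def duck_to_printable(duck: Duck) -> Printable:
    return Printable(duck)
```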

After that thing runs, I get a syntax tree out with functions that call the interfaces' functions if appropriate to go with the type declaration information. The compiler is ready to pass that to the back end which will turn it into executable CIL. More on that next time.

Sunday, April 10, 2016

Interfaces - Part 2 - Grammar

The first step to getting interfaces working is to tweak the parser to actually recognize the damned things. Well, that's not quite right. The first step is that I need to figure out what I want, and what parts make that up. Let's look at what I wanted from Part 1:

  • I want interfaces to be some contract that types can supply. "I don't care what type is passed in here, as long as it can be enumerated." 
  • I need interfaces to be able to be generic - I am not writing IntEnumerable, StringEnumerable, BoolEnumerable, and so on. 
  • I want interfaces to be multiply-implemented. Just because something is enumerable should not prevent it from being comparable.
  • I would like interfaces to be open. That is, if you wrote AwesomeParser and I have IParser, I should be able to adapt your implementation to my interface without changing your code.
  • It'd be nice if interfaces could support operators better. .NET interfaces suck at things like T+T => T.
Some of you out there will notice that these requirements are eerily close to what Haskell type classes do. That is no mistake. Haskell's type system is widely regarded as one of the most awesome made by man, and type classes in particular are one of the unique parts that set it apart. Type classes though work a lot more like .NET generic constraints than .NET interfaces. So how to design the language to support these things in a way that is easy/familiar, without breaking the formalisms necessary to keep everything from falling apart?
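Python's structural protocols give a rough feel for the "constraint, not base class" idea (a sketch with hypothetical names; Haskell type classes do this far more completely):

```python
from typing import Iterator, Optional, Protocol, TypeVar

T = TypeVar("T")

# A structural contract: anything that can be iterated satisfies it,
# which is the "as long as it can be enumerated" requirement.
class Enumerable(Protocol[T]):
    def __iter__(self) -> Iterator[T]: ...

# The function stays generic; the protocol only names a capability -
# the generic-constraint flavor of type classes, not .NET subtyping.
def first_or_none(xs: Enumerable[T]) -> Optional[T]:
    for x in xs:
        return x
    return None
```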

For the grammar, there are three main things I need to change to support interfaces:


  • I need to be able to declare an interface - "Here is a contract, and here is how you access it."
  • I need to be able to say that a type implements that interface - "Type X implements that contract"
  • I need to be able to use that interface in functions - "This function needs something that is enumerable as its first argument"

To declare an interface in Tangent, you use the standard type declaration syntax similarly to .NET interfaces. Say we wanted an interface to allow classes to provide a human readable view:

    human readable :> interface {
        human readable (this) => string;
    }

This defines an interface called "human readable" that has a single required function, "human readable" followed by a variable of the implementing type, which returns a string. In the grammar, this means defining new rules for the interface declaration and adding them to the type declaration rule.

To declare a class that uses the interface, it is included in the declaration, akin to .NET style inheritance:

    point :> (x: int), (y: int) :< human readable {
        human readable (this) => string {
            x + ", " + y;
        }
    }

The only difference is that you use that funky duck face :< rather than a plain colon. This better visually distinguishes the interface implementations from the constructor parameters, and it helps when non-inline interface binding comes into play (we'll get to that in later parts). Again, this declares a type named "point", whose constructor is two ints with a comma in-between, and which implements the human readable interface. It then goes on to implement the required function.
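The same pair of declarations, sketched in Python (names mine): the contract is a required function, and point supplies its implementation.

```python
from typing import Protocol

# The contract: anything human readable can render itself as a string.
class HumanReadable(Protocol):
    def human_readable(self) -> str: ...

class Point:
    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    # point's implementation of the required function.
    def human_readable(self) -> str:
        return f"{self.x}, {self.y}"

def display(value: HumanReadable) -> str:
    return value.human_readable()
```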

In the grammar, this means adding the inline interface binding part as optional. The challenge here is that the sum-type extension (x|y|z) is in the same area and also optional. That broke some tests and will likely be a complexity in the parser.

And to use the interface in a function, you can just specify it as the type:

    display (value: human readable) => void {
        print human readable value;
    }

Which works as you would expect from .NET land. You get a function that takes anything that satisfies the interface, and you can use any function guaranteed to be there. In the grammar... that isn't any change at all. But remember when I said that type classes are more like generic constraints than real interfaces? Yeah, here is where that comes into play. When that gets compiled, it will look more like this C# declaration:

    void display<T>(T value) where T: IHumanReadable { ... }

and Tangent does the magic to include the generic parameter that you don't really care about. But what if you do care about it? That's where the grammar change comes into play. If you care about the actual type being passed in, then you can declare parameters like this:

    display (value: T: human readable) => void { ... }

which will bind the actual concrete type to the phrase T so you can work with it. That isn't important here, but does come into play in other scenarios where the type-classiness of Tangent interfaces is important. Consider this interface to do a C-style compare (0 is equals, >0 is greater than, <0 is less than):

    comparable :> interface {
        compare (this) to (this) => int;
    }

Great, now we have a nice interface that says you can take two things and compare them. Let's use this in a trivial function:

    smaller (a: comparable) or (b: comparable) => comparable {
        if compare a to b < 0
          then a
          else b;
    }

That does not work.

What the interface says is that the compare function takes two arguments of the same type. The smaller function takes two comparables, but they might not be the same type. So that's where the three part parameter declaration comes into play:

    smaller (a: T: comparable) or (b: T) => T {
        if compare a to b < 0
          then a
          else b;
    }

In this function, we're specifically saying that the second argument (and the return value) must be the same type as the first argument. Since that meets the interface's constraints (and because Tangent doesn't have subtyping), that works. 
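A Python rendering (names mine) makes the "same type" constraint visible: a single type variable covers both arguments and the result, mirroring (a: T: comparable) or (b: T) => T.

```python
from typing import Protocol, TypeVar

# C-style comparison: <0 less, 0 equal, >0 greater.
class CStyleComparable(Protocol):
    def compare_to(self, other) -> int: ...

# One type variable T for both parameters and the return value.
T = TypeVar("T", bound=CStyleComparable)

def smaller(a: T, b: T) -> T:
    return a if a.compare_to(b) < 0 else b

class Level:
    def __init__(self, n: int):
        self.n = n

    def compare_to(self, other: "Level") -> int:
        return self.n - other.n
```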

So those are the three grammar changes going into the language for basic interface support. Now I just need to figure out how to use them now that the parser can recognize them. Part 3 and more to come!


Monday, March 28, 2016

Interfaces - Part 1

One of the motivations for making Tangent compile into CIL was that it would make it dead simple to interop with the huge pile of .NET code out there in the world. I mean, I would love to rewrite a bunch of library code, but I expect a lot of what I'd write would be worse than the stuff people spent years researching and polishing.

And I'm lazy.

So instead, I'm going to do a whole bunch of work implementing a well known feature in a weird way for a language that nobody will ever use. \o/

The problem is that Tangent does not have subtyping. And .NET code has... subtyping. I mean, I want IEnumerable to work - that doesn't seem to be too much to ask. Tangent though has some challenges with subtyping. For one, subtyping adds a lot of ambiguity to the order of operations inference engine. When an expression could be a few different types, you have more branches in the possible options to take. Another is dispatch. For .NET interfaces and single dispatch, there is a clear path to the implementation. Since Tangent allows multiple dispatch, it'd be weird to have interfaces behave differently.

So, what do I want interfaces to do?


  • I want interfaces to be some contract that types can supply. "I don't care what type is passed in here, as long as it can be enumerated." 
  • I need interfaces to be able to be generic - I am not writing IntEnumerable, StringEnumerable, BoolEnumerable, and so on. 
  • I want interfaces to be multiply-implemented. Just because something is enumerable should not prevent it from being comparable.
  • I would like interfaces to be open. That is, if you wrote AwesomeParser and I have IParser, I should be able to adapt your implementation to my interface without changing your code.
  • It'd be nice if interfaces could support operators better. .NET interfaces suck at things like T+T => T.
Those are the goals. Now to work out the kinks of how I plan on actually doing this stuff. I need to come up with an idea that actually works. It needs to integrate nicely with the existing language. And it needs to get compiled into CIL that is at least a little performant. More to come in Part 2.

Sunday, March 20, 2016

Parser Refactoring

Not a lot of sexy work in the past few months. I wanted to add features - local variables, interfaces, .NET interop, something - but just found myself dreading going into the parser code to extend it. So instead, I spent a little time refactoring that to suck less.

I should note that I'm talking about the actual parser code for type and function declarations, not Tangent's fancy inference engine for function bodies. The parser code used to be just a series of functions. They took tokens, they returned an optional type, popping tokens as needed on success, perhaps calling other parser functions. When the grammar was a dozen or so rules, that was fine. It was easy to step through, and it was easy to unit test.

That quickly got squirrelly as I added generics, function params, product types, and sum types. The grammar is still only about two dozen rules, but they interact a lot more. This means each parsing function is doing more, and that added complexity was harming unit testing and my ability to jump back into the code after some time doing real work.

So, I went back to an old stand-by: the compositional parser. One of the first things I did when I learned C# was to create a parsing framework akin to boost::spirit for the first version of Tangent. I'm not using that though. It generates full parse trees, which are awkward to work with. I just tossed together something similar, but specific to the existing structure in the Tangent code.

What this allows me to do is to define the grammar more declaratively. That makes it easier to see what's going on. And since the declarations get real ugly real fast, it pushes me to separate things out into testable, reusable chunks. And since I was smart enough to have a small set of regression tests, I could make sure that the refactoring didn't break anything I expected to work before.
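If you've never seen the style, a toy compositional parser (this sketch is mine, not Tangent's actual code) is just a handful of combinators over a token stream: each parser takes a token list and returns a (value, remaining) pair, or None on failure.

```python
def token(t):
    """Match one literal token."""
    def parse(tokens):
        if tokens and tokens[0] == t:
            return t, tokens[1:]
        return None
    return parse

def and_then(pa, pb):
    """Sequence two parsers; nested combinators read like the grammar."""
    def parse(tokens):
        a = pa(tokens)
        if a is None:
            return None
        value_a, rest = a
        b = pb(rest)
        if b is None:
            return None
        value_b, rest = b
        return (value_a, value_b), rest
    return parse

def or_else(pa, pb):
    """Try the first alternative, falling back to the second."""
    def parse(tokens):
        result = pa(tokens)
        return result if result is not None else pb(tokens)
    return parse
```

Because the grammar is built by composing these small pieces, each piece can be unit tested on its own, which was the whole point of the refactoring.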



The stable checkpoint is available at https://github.com/Telastyn/Tangent/releases/tag/Milestone3.1

Wednesday, January 6, 2016

Delegates - Part 2

I talked a little bit last time about how Tangent's lambdas are a little weird compared to C# and other languages. Today, I'm going to go a little more in depth into that, as well as showing some other examples of use. The first is an example that uses the return value of the lambda:

apply (fn(int): int) twice to (x: int) => int {
  fn fn x;
}

entrypoint => void {
  print apply (x)=>{x*x} twice to 2;
}

Because of the order of operation inference, fn fn x gets parsed out to the right order, passing the result of one operation to the next.
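Stripped of the phrase syntax, the function above is just this (name mine):

```python
# apply (fn(int): int) twice to (x: int), minus the phrase syntax:
# apply the function, then apply it again to the result.
def apply_twice(fn, x: int) -> int:
    return fn(fn(x))
```

So apply_twice(lambda x: x * x, 2) squares 2, then squares the result, giving 16 - the same value the Tangent snippet prints.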

Now for some weirder stuff. Because of how Tangent parses statements, a lambda can have different interpretations depending on what type it resolves to. So when the compiler matches a lambda to a function parameter it needs to make sure that the lambda works for those types. A simple example:

foo(fn(int):void) => void {
  fn 42;
}

foo(fn(string):void) => void {
  fn "foo";
}

entrypoint => void {
  foo (x) => { print x + 2; };
}

Since string + int makes no sense, the int -> void overload is chosen and this prints 44. Weirder yet, Tangent includes the lambda's return type in that matching work. If the lambda can't return the requested type, that interpretation is discarded. Consider a similar example:

foo(fn(int): int) => void {
  print "with int";
  print fn 42;
}

foo(fn(int): void) => void {
  print "with void";
  fn 42;
}

entrypoint => void {
  foo (x) => {x;};
  foo (x) => {print x;};
}

Here the different bodies return different types, leading to dispatching to the different overloads and avoiding ambiguity.

Nifty, and it all worked once I wrote up the tests. Next will be maybe cleaning up a little of the lambda syntax and testing more of the closure functionality.

Thursday, December 31, 2015

Delegates - Part 1

Turns out it was delegates. Having lists without higher order functions was going to annoy me sooner rather than later, and allowing mutable state - even local mutable state - was going to add complexity and bugs I didn't want to deal with.

But if you remember, we don't actually have a way to declare higher order function types in Tangent. And since functions look like phrases, it's not quite clear how to allow users to both specify higher order functions and anonymous functions that fit that style of calling convention. What I settled on is a little bit weird, so bear with me.

To say that a parameter takes a higher order function, you build a small phrase in the parameter declaration. For example:

twofer(fn(int): void) => void {
  fn 2;
}

or

sort (collection: list int) with (compare (int) to (int): int) => list int { ... }

Since you can't ever access the delegate parameter directly, just specifying the type is sufficient. fn above compiles to a .NET Action<int>, with the phrase style calling, and compare (int) to (int) to a Func<int, int, int>, again allowing the function to call it with the right phrase pattern.
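In Python terms (a sketch; the name is mine), the comparator parameter is just a function-typed argument, which is exactly what the Func<int, int, int> amounts to:

```python
from functools import cmp_to_key

# sort (collection: list int) with (compare (int) to (int): int), minus the
# phrases. cmp_to_key adapts the C-style comparator (<0, 0, >0) for sorted().
def sort_with(collection, compare):
    return sorted(collection, key=cmp_to_key(compare))
```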

To call these functions, you just use lambda syntax similar to C#:

twofer (x) => {print x;};

or

sort stuff with (a) (b) => { a.Compare(b); };

Right now, the parens and curly braces are required. I expect the parens to become not required and the curly requirement likely to stay. Annoying, but this is better than many of my alternatives.

Now, there's a little bit of a trick to this. In C#, it is very clear what types the lambda's parameters are. But in Tangent, we infer the order of operations - both using the lambda as an argument, and in the body of the lambda itself. As the types of the lambda parameters change, the actual operations taken in the body of the lambda can also change.

What happens is that the entire lambda is considered a node in the expression. When the pattern twofer (Action<int>) sees the tokens "twofer" <lambda> it checks to see if the argument matches the parameter. Does it have the same number of parameters? If x is an int and we need to return void, does the lambda parse? If not, then the match fails and the compiler moves along, trying different interpretations of the statement.

This will allow some mildly weird behavior, like the body of lambdas affecting which overload of a function is used. I'll look to have an example of that once I'm more sure it works properly. Likewise, I want to make sure that my closure implementation actually is correct. It was a little too easy.