Sunday, April 10, 2016

Interfaces - Part 2 - Grammar

The first step to getting interfaces working is to tweak the parser to actually recognize the damned things. Well, that's not quite right. The first step is to figure out what I want and what parts make that up. Let's revisit the goals from Part 1:

  • I want interfaces to be some contract that types can supply. "I don't care what type is passed in here, as long as it can be enumerated." 
  • I need interfaces to be able to be generic - I am not writing IntEnumerable, StringEnumerable, BoolEnumerable, and so on. 
  • I want interfaces to be multiply-implemented. Just because something is enumerable should not prevent it from being comparable.
  • I would like interfaces to be open. That is, if you wrote AwesomeParser and I have IParser, I should be able to adapt your implementation to my interface without changing your code.
  • It'd be nice if interfaces could support operators better. .NET interfaces suck at things like T+T => T.

Some of you out there will notice that these requirements are eerily close to what Haskell type classes do. That is no mistake. Haskell's type system is widely regarded as one of the most awesome made by man, and type classes in particular are one of the unique parts that set it apart. Type classes though work a lot more like .NET generic constraints than .NET interfaces. So how to design the language to support these things in a way that is easy/familiar, without breaking the formalisms necessary to keep everything from falling apart?

For the grammar, there are three main things I need to change to support interfaces:

  • I need to be able to declare an interface - "Here is a contract, and here is how you access it."
  • I need to be able to say that a type implements that interface - "Type X implements that contract"
  • I need to be able to use that interface in functions - "This function needs something that is enumerable as its first argument"

To declare an interface in Tangent, you use the standard type declaration syntax similarly to .NET interfaces. Say we wanted an interface to allow classes to provide a human readable view:

    human readable :> interface {
        human readable (this) => string;
    }

This defines an interface called "human readable" that has a single required function, "human readable" followed by a variable of the implementing type, which returns a string. In the grammar, this means defining new rules for the interface declaration and adding them to the type declaration rule.
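
For .NET folks, the nearest analogue is a plain interface. Here's a rough C# sketch of the same contract; the names are my own illustration, not what the compiler emits:

    // Rough .NET analogue of the "human readable" contract. In Tangent
    // the requirement is a free-standing phrase rather than a member
    // method, but the shape of the obligation is the same.
    public interface IHumanReadable
    {
        string HumanReadable();
    }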

To declare a class that uses the interface, it is included in the declaration, akin to .NET style inheritance:

    point :> (x: int), (y: int) :< human readable {
        human readable (this) => string {
            x + ", " + y;
        }
    }

The only difference is that you use that funky duck face :< rather than a plain colon. This is done to better visually distinguish the interface implementations from the parameters, and because it will matter when non-inline interface binding comes into play (we'll get to that in later parts). Again, this is declaring a type named "point", whose constructor is two ints with a comma in-between, and which implements the human readable interface. It then goes on to implement the required function.

In the grammar, this means adding the inline interface binding as an optional part of the type declaration. The challenge here is that the sum-type extension (x|y|z) lives in the same area and is also optional. That broke some tests and will likely be a source of complexity in the parser.

And to use the interface in a function, you can just specify it as the type:

    display (value: human readable) => void {
        print human readable value;
    }

Which works as you would expect from .NET land. You get a function that takes anything that satisfies the interface, and you can use any function guaranteed to be there. In the grammar... that isn't any change at all. But remember when I said that type classes are more like generic constraints than real interfaces? Yeah, here is where that comes into play. When that gets compiled, it will look more like this C# declaration:

    void display<T>(T value) where T: IHumanReadable { ... }

and Tangent does the magic to include the generic parameter that you don't really care about. But what if you do care about it? That's where the grammar change comes into play. If you care about the actual type being passed in, then you can declare parameters like this:

    display (value: T: human readable) => void { ... }

which will bind the actual concrete type to the phrase T so you can work with it. That isn't important here, but it does come into play in other scenarios where the type-classiness of Tangent interfaces matters. Consider this interface to do a C-style compare (0 means equal, >0 greater than, <0 less than):

    comparable :> interface {
        compare (this) to (this) => int;
    }

Great, now we have a nice interface that says you can take two things and compare them. Let's use this in a trivial function:

    smaller (a: comparable) or (b: comparable) => comparable {
        if compare a to b < 0
          then a
          else b;
    }

That does not work.

What the interface says is that the compare function takes two arguments of the same type. The smaller function takes two comparables, but they might not be the same type. So that's where the three-part parameter declaration comes into play:

    smaller (a: T: comparable) or (b: T) => T {
        if compare a to b < 0
          then a
          else b;
    }

In this function, we're specifically saying that the second argument (and the return value) must be the same type as the first argument. Since that meets the interface's constraints (and because Tangent doesn't have subtyping), that works. 
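
For comparison, here's roughly how the same constraint reads in C# generics, swapping in the BCL's IComparable<T> for the Tangent interface (my sketch, not compiler output):

    // The generic constraint forces a, b, and the return value to all be
    // the same type T - exactly what (a: T: comparable) ... (b: T) => T
    // expresses in Tangent.
    public static T Smaller<T>(T a, T b) where T : IComparable<T>
    {
        return a.CompareTo(b) < 0 ? a : b;
    }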

So those are the three grammar changes going into the language for basic interface support. Now I just need to figure out how to actually use them once the parser can recognize them. Part 3 and more to come!

Monday, March 28, 2016

Interfaces - Part 1

One of the motivations for making Tangent compile into CIL was that it would make it dead simple to interop with the huge pile of .NET code out there in the world. I mean, I would love to rewrite a bunch of library code, but I expect a lot of it would end up worse than the stuff that people spent years researching.

And I'm lazy.

So instead, I'm going to do a whole bunch of work implementing a well known feature in a weird way for a language that nobody will ever use. \o/

The problem is that Tangent does not have subtyping. And .NET code has... subtyping. I mean, I want IEnumerable to work - that doesn't seem to be too much to ask. Tangent though has some challenges with subtyping. For one, subtyping adds a lot of ambiguity to the order of operations inference engine. When an expression could be a few different types, you have more branches in the possible options to take. Another is dispatch. For .NET interfaces and single dispatch, there is a clear path to the implementation. Since Tangent allows multiple dispatch, it'd be weird to have interfaces behave differently.

So, what do I want interfaces to do?

  • I want interfaces to be some contract that types can supply. "I don't care what type is passed in here, as long as it can be enumerated." 
  • I need interfaces to be able to be generic - I am not writing IntEnumerable, StringEnumerable, BoolEnumerable, and so on. 
  • I want interfaces to be multiply-implemented. Just because something is enumerable should not prevent it from being comparable.
  • I would like interfaces to be open. That is, if you wrote AwesomeParser and I have IParser, I should be able to adapt your implementation to my interface without changing your code (see the sketch just after this list).
  • It'd be nice if interfaces could support operators better. .NET interfaces suck at things like T+T => T.
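
To make that "open" goal concrete: in C# today, when the types don't share an interface up front, you end up hand-writing a wrapper. All of the names in this sketch are hypothetical:

    // IParser is my interface; AwesomeParser is your third-party class
    // (hypothetical, with a hypothetical DoAwesomeParse method).
    public interface IParser
    {
        object Parse(string input);
    }

    // .NET gives no way to declare, after the fact, that AwesomeParser
    // satisfies IParser - so I have to wrap it by hand.
    public class AwesomeParserAdapter : IParser
    {
        private readonly AwesomeParser inner;

        public AwesomeParserAdapter(AwesomeParser inner) { this.inner = inner; }

        public object Parse(string input) { return inner.DoAwesomeParse(input); }
    }
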
Those are the goals. Now to work out the kinks of how I plan on actually doing this stuff. I need to come up with an idea that actually works. It needs to integrate nicely with the existing language. And it needs to get compiled into CIL that is at least a little performant. More to come in Part 2.

Sunday, March 20, 2016

Parser Refactoring

Not a lot of sexy work in the past few months. I wanted to add features - local variables, interfaces, .NET interop, something - but just found myself dreading going into the parser code to extend it. So instead, I spent a little time refactoring that to suck less.

I should note that I'm talking about the actual parser code for type and function declarations, not Tangent's fancy inference engine for function bodies. The parser code used to be just a series of functions. They took tokens, they returned an optional type, popping tokens as needed on success, perhaps calling other parser functions. When the grammar was a dozen or so rules, that was fine. It was easy to step through, and it was easy to unit test.

That quickly got squirrelly as I added generics, function params, product types, and sum types. The grammar is still only about two dozen rules, but they interact a lot more. This means each parsing function is doing more, and that added complexity was harming unit testing and my ability to jump back into the code after some time doing real work.

So, I went back to an old stand-by: the compositional parser. One of the first things I did when I learned C# was to create a parsing framework akin to boost::spirit for the first version of Tangent. I'm not using that here though. It generates full parse trees, which are awkward to work with. Instead, I tossed together something similar, but specific to the existing structure in the Tangent code.

What this allows me to do is to define the grammar more declaratively. That makes it easier to see what's going on. And since the declarations get real ugly real fast, it pushes me to separate things out into testable, reusable chunks. And since I was smart enough to have a small set of regression tests, I could make sure that the refactoring didn't break anything I expected to work before.
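
To give a flavor of what that looks like, here's a minimal sketch of the combinator idea in C# (these names are mine, not Tangent's actual internals):

using System.Collections.Generic;

// A parse either succeeds with a value (consuming some tokens) or fails.
class ParseResult<T>
{
  public bool Success;
  public T Value;
  public int Consumed;
}

delegate ParseResult<T> Parser<T>(IReadOnlyList<string> tokens, int start);

static class Combinators
{
  // Try each alternative in order; the first success wins.
  public static Parser<T> Or<T>(params Parser<T>[] options)
  {
    return (tokens, start) =>
    {
      foreach (var option in options)
      {
        var result = option(tokens, start);
        if (result.Success) { return result; }
      }
      return new ParseResult<T> { Success = false };
    };
  }
}
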
The stable checkpoint is available at https://github.com/Telastyn/Tangent/releases/tag/Milestone3.1

Wednesday, January 6, 2016

Delegates - Part 2

I talked a little bit last time about how Tangent's lambdas are a little weird compared to C# and other languages. Today, I'm going to go a little more in depth on that, as well as show some other examples of use. The first is an example that uses the return value of the lambda:

apply (fn(int): int) twice to (x: int) => int {
  fn fn x;
}

entrypoint => void {
  print apply (x)=>{x*x} twice to 2;
}

Because of the order of operation inference, fn fn x gets parsed out to the right order, passing the result of one operation to the next.
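
In C# terms (a translation of mine, mirroring the delegate mapping from last time), that body is just:

// What "fn fn x" resolves to: apply fn, then apply fn to the result.
int ApplyTwice(Func<int, int> fn, int x)
{
  return fn(fn(x));
}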

Now for some weirder stuff. Because of how Tangent parses statements, a lambda can have different interpretations depending on what type it resolves to. So when the compiler matches a lambda to a function parameter it needs to make sure that the lambda works for those types. A simple example:

foo(fn(int):void) => void {
  fn 42;
}

foo(fn(string):void) => void {
  fn "foo";
}

entrypoint => void {
  foo (x) => { print x + 2; };
}

Since string + int makes no sense, the int -> void overload is chosen and this prints 44. Weirder yet, Tangent includes the return type in this matching. If the lambda can't return the requested type, that interpretation is discarded. Consider a similar example:

foo(fn(int): int) => void {
  print "with int";
  print fn 42;
}

foo(fn(int): void) => void {
  print "with void";
  fn 42;
}

entrypoint => void {
  foo (x) => {x;};
  foo (x) => {print x;};
}

Here the different bodies return different types, leading to dispatching to the different overloads and avoiding ambiguity.
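
For what it's worth, C# overload resolution does something similar when lambdas are involved. A standalone comparison (my example, not from the Tangent tests):

using System;

static class OverloadDemo
{
  static void Foo(Func<int, int> fn) { Console.WriteLine("with int: " + fn(42)); }
  static void Foo(Action<int> fn) { Console.WriteLine("with void"); fn(42); }

  static void Main()
  {
    Foo(x => x);                     // body is a value: only Func<int, int> fits
    Foo(x => Console.WriteLine(x));  // body returns void: only Action<int> fits
  }
}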

Nifty, and it all worked once I wrote up the tests. Next will be maybe cleaning up a little of the lambda syntax and testing more of the closure functionality.

Thursday, December 31, 2015

Delegates - Part 1

Turns out it was delegates. Having lists without higher order functions was going to annoy me sooner rather than later, and allowing mutable state - even local mutable state - was going to add complexity and bugs I didn't want to deal with.

But if you remember, we don't actually have a way to declare higher order function types in Tangent. And since functions look like phrases, it's not quite clear how to allow users to both specify higher order functions and anonymous functions that fit that style of calling convention. What I settled on is a little bit weird, so bear with me.

To say that a parameter takes a higher order function, you build a small phrase in the parameter declaration. For example:

twofer(fn(int): void) => void {
  fn 2;
}

or

sort (collection: list int) with (compare (int) to (int): int) => list int { ... }

Since you can't ever access the delegate parameter directly, just specifying the type is sufficient. fn above compiles to a .NET Action<int> (invoked with the phrase-style call), and compare (int) to (int) to a Func<int, int, int>, again letting the function call it with the right phrase pattern.
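
Put in C# terms, my mental model of the compiled shapes is something like this (method and parameter names invented; List<int> standing in for list int):

// Roughly what the two declarations become. The phrase-style calling
// convention disappears; only the delegate types remain.
void Twofer(Action<int> fn)
{
  fn(2);
}

List<int> SortWith(List<int> collection, Func<int, int, int> compare)
{
  var result = new List<int>(collection);
  result.Sort((a, b) => compare(a, b));  // adapt the Func to a Comparison<int>
  return result;
}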

To call these functions, you just use lambda syntax similar to C#:

twofer (x) => {print x;};

or

sort stuff with (a) (b) => { a.Compare(b); };

Right now, the parens and curly braces are required. I expect the parens will eventually become optional, while the curly brace requirement is likely to stay. Annoying, but this is better than many of my alternatives.

Now, there's a little bit of a trick to this. In C#, it is very clear what types the lambda's parameters are. But in Tangent, we infer the order of operations - both using the lambda as an argument, and in the body of the lambda itself. As the types of the lambda parameters change, the actual operations taken in the body of the lambda can also change.

What happens is that the entire lambda is considered a single node in the expression. When the pattern twofer (Action<int>) sees the tokens "twofer" <lambda>, it checks to see if the argument matches the parameter. Does it have the right number of parameters? If x is an int and we need to return void, does the lambda body parse? If not, then the match fails and the compiler moves along, trying different interpretations of the statement.
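
If I were to sketch that check in C# (all of these types and helpers are my invention to illustrate the idea; the real compiler code differs):

using System;
using System.Collections.Generic;

// A lambda stays untyped until a candidate signature arrives.
class LambdaNode
{
  public IReadOnlyList<string> ParameterNames;
  public IReadOnlyList<string> BodyTokens;  // the body is still unparsed
}

static class LambdaMatching
{
  static bool Matches(LambdaNode lambda, Type[] parameterTypes, Type returnType)
  {
    // Arity has to agree before anything else is worth trying.
    if (lambda.ParameterNames.Count != parameterTypes.Length) { return false; }

    // Bind each parameter name to the candidate type and run the normal
    // order-of-operations inference over the body, requiring that it can
    // produce the requested return type.
    return TryParseBody(lambda.BodyTokens, lambda.ParameterNames, parameterTypes, returnType);
  }

  // Stand-in for the real inference engine.
  static bool TryParseBody(IReadOnlyList<string> body, IReadOnlyList<string> names,
                           Type[] types, Type returnType)
  {
    return false;  // elided - this is where the phrase matching happens
  }
}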

This will allow some mildly weird behavior, like the body of a lambda affecting which overload of a function is used. I'll look to have an example of that once I'm more sure it works properly. Likewise, I want to make sure that my closure implementation is actually correct. It was a little too easy.

Wednesday, December 2, 2015

Cleanup

So, at the core of Tangent is the code that takes all of the phrases the user has defined and figures out how to grok that statement. In previous iterations of the language, that code has been hairy, nasty, nested cruft that has made it difficult to add features and debug things when they go south.

In this iteration, things were better. I was explicit in defining the various things that the language could define, and good about keeping things simple. Unfortunately, this led to a bit of duplication in this core code because the simplest way to deal with these definitions was just to have different handlers that did almost the same thing.

Not great. So I spent some time making some better abstractions and cleaning up that core code. In the end, it works akin to a shift-reduce parser - the phrases define what pattern should be matched, and depending on what is being declared (a parameter, a type, a function, etc.) it creates a different expression type in the abstract syntax tree. The new design better shares the matching and use of phrases, while leaving the "what happens when this is matched" part variable. That allows better testing too, which is nice.
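
As a sketch of that abstraction in C# (names invented; the real code is richer):

using System;
using System.Collections.Generic;

// Every declared phrase contributes a pattern to match; the only thing
// that varies is what expression gets built when it matches.
class PhraseRule
{
  public IReadOnlyList<string> Pattern;               // what to shift/match
  public Func<IReadOnlyList<object>, object> Reduce;  // what to build on a match
}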

That should let me more easily move into local variables or delegates depending on what I feel like doing.

Wednesday, October 28, 2015

Software Engineering FTW!

Sorry about the delay. After the success in finally getting partial specialization working, I figured that it would be a good time to do a little cleanup. You know, address some of the tech debt I incurred hacking away before moving on to another feature that would be more complicated than I anticipated.

First on the list, as you might expect, was a lack of unit tests. Unfortunately, the stuff I wanted to test (code generation) doesn't lend itself well to unit testing. For simplicity, the compiler emits directly to files. Even for basic cases, the setup to reproduce a sane intermediate structure is a bit onerous. And the super basic cases aren't the ones that have been causing problems - it's when things start interacting.

So I decided to do the next best thing, automated integration tests. I took all of the little test programs I've written (and posted here) and put them into a csproj along with their expected output. A little bit of code to kick off the compiler and run the resultant temp file executable, and I had a nice little regression suite.
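
The harness itself is only a few lines. A sketch of the shape (the sample file names and the Compiler API are stand-ins for the real ones):

using System.Diagnostics;
using System.IO;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class RegressionTests
{
  [TestMethod]
  public void IntListSample()
  {
    // Compile the sample to a temp executable, run it, and compare
    // stdout against the checked-in expectation.
    var exe = Compiler.CompileToTempExe("Samples/intlist.tan");  // stand-in API
    var psi = new ProcessStartInfo(exe)
    {
      RedirectStandardOutput = true,
      UseShellExecute = false
    };
    using (var process = Process.Start(psi))
    {
      var output = process.StandardOutput.ReadToEnd();
      process.WaitForExit();
      Assert.AreEqual(File.ReadAllText("Samples/intlist.output"), output);
    }
  }
}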

You can probably guess what happened next. Yeah, half the tests failed.

Some were super quick to fix (typos during test creation). Some were kinda quick to fix (functionality I had removed). A few were slow to fix (wait, that should work...). And one took about a month and a half of intermittent spare time to fix. It was the basic Algebraic Data Type code:

int list :> int | (a: int),(b: int list) {
  (this).head => int { a }
  (this).tail => int list { b }
  print (this) => void {
    print this.head;
    print this.tail;
  }
}

print (il: int list) => void {}

entrypoint => void {
  print 1,2,3;
}

It first blew up because of ambiguity errors between print(int list) and print(_anonymousProductType1). That ended up just being some overzealous refactoring on my part during the partial specialization work. I had removed the fanning out of implicit conversions for sum types, so the compiler didn't see the two functions as a specialization.

The second problem was more troublesome. Because of partial specialization fixes, the relationship between int list and _anonymousProductType1 was tighter. Under the covers, int list inherits from Variant<int, _anonymousProductType1>, and _anonymousProductType1 has a constructor that takes an int and an int list. That cyclic reference was biting me in the ass. The compiler was written to take an intermediate type representation and compile up a .NET type for it. I couldn't quite get the ordering right to build most of one type and then the other so that reflection didn't choke on stuff that was half-built.

In the end, I had to take a bit of the nuclear option and change how the code generation worked. Now the compiler populates all of the TypeBuilders and works with half-built types pretty much everywhere until function generation, at which point it uses AppDomain events to build the types in the right order (taken from this SO answer). I think this may cause me problems when types get more member methods, and it has made stuff a bit messy.
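
The core of the trick looks roughly like this (condensed from the pattern in that SO answer; the variable names are mine):

using System;
using System.Collections.Generic;
using System.Reflection.Emit;

static class TypeBaker
{
  // Bake every pending TypeBuilder. When CreateType() hits a dependency
  // that is still half-built, the AppDomain raises TypeResolve; the
  // handler bakes the needed type on demand, so creation happens in a
  // workable order.
  public static void BakeAll(Dictionary<string, TypeBuilder> builders)
  {
    AppDomain.CurrentDomain.TypeResolve += (sender, args) =>
    {
      TypeBuilder needed;
      if (builders.TryGetValue(args.Name, out needed))
      {
        // CreateType may itself recurse back into this handler.
        return needed.CreateType().Assembly;
      }
      return null;
    };

    foreach (var builder in builders.Values)
    {
      builder.CreateType();
    }
  }
}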

But at least now all my tests pass and things are stable again. Next on the docket is likely local variables.