Wednesday, July 20, 2016

Basic Interop

(For those wondering, I did get delegate fields working. They were not too much trouble. You can read about them in the release notes.)

As promised, I'm taking a bit of a shift towards pragmatism in my work, meaning some efforts to do interop with existing .NET code. Doing that comes with some design decisions and coding challenges. Let's go through some of them, since they'll impact most anyone writing a compiler:


How can I do .NET interop without making it neigh impossible to target other languages?


Well, the back end of the compiler (the bits that take structured intermediate forms and generate CIL from them) already are highly coupled with .NET. That code can know all it wants to about .NET. The front end though (the bits that take text and turn it into structured intermediate forms) doesn't know much about .NET. Even the built in types are just placeholders that the back end uses to tie to real ints and strings and the such.

But because of how Tangent works, the parser needs to see all of the legal phrases so it can infer meaning in the code. That means I need something from .NET interop land to generate the phrases for the Tangent code to refer to. And that something needs to carry enough information so that the back end can turn the phrase back into .NET interaction.

In the end, I erred towards YAGNI. I mean, I'll be lucky if I get a dozen people to read this blog, let alone write code in the language. There's no need to support shifting target languages. But the engineer in me can't just couple that code meaninglessly, so it ended up as a mostly separate set of pieces with some interop agnostic interfaces to tie it into the front end. The code that imports .NET code into the front end is just sitting in the front end for ease of implementation, but is mostly separate.

How can I deal with reference types when my language doesn't have null?


I'd really like it to not have the nulls ever, but .NET interop is going to require it somehow. I could make all reference types Optional<T> sort of things, which makes the null very explicit, but the interop code verbose and unweildy. I could make the .NET types nullable, but not Tangent types which kinda works, but will carry over the problems of null and add the leaky abstraction that Tangent types behave differently from interop types. I could just not allow Tangent code to assign or check null, using exceptions to handle nullref... but that is terrible.

In the current implementation, I mostly went the optional route. Constructors return real types, this-params take real types, but all parameters and return values are Optional<T>, requiring explicit null handling in Tangent. It might suck.

How can I deal with subtyping when my language doesn't have it?


Tangent only has interface implementation sorts of subtyping. The compiler doesn't support it, and since the inference engine needs to know that sort of stuff to know what a "legal" interpretation of a phrase is, the information needs to get there somehow. I could add subtyping, but adding something to the language just for interop seems dirty. I could not do subtyping and force explicit casts, but that'd be obnoxious. The tentative plan is to implement conversion functions so that the language can shift types upwards, and have the implementation be some sort of no-op that the back end compiler can deal with. That still will require some sort of inlining so that I'm not kneecapping performance due to my own stubbornness. (More on that later).

The basics are there now though. I need to get it wired into the command line interface, but in the test code, you can specify a collection of assemblies which the code will import the best it can. Non-generic types are largely there. Field access/assignment, properties, functions, constructors, extension methods - all there for instance and static forms. This test code will now compile and run properly with the right imports:

entrypoint => void {
  print "foo".Length;
  print "foobar".Length;
}

Which is awesome. No need to manually specify bindings to .NET functions any more, and the back end work was dead simple.

The bad news is that the above example takes about 50 seconds to compile and run. The other examples I've posted, including the ones doing weird partial specialization took about 200 milliseconds. The addition of a few thousand phrases for the inference code to wade through will do that. It's not terribly surprising, but disappointing none the less.

I guess those optimizations aren't premature anymore...

Sunday, June 19, 2016

Fields

The next step after adding local variables was to extend that concept along to actual fields in classes. Not the sexiest feature in the world, but as the language pushes towards doing stuff, pragmatism will win. Fields (hopefully) look and act like you'd expect - they look like local variables without the preceding colon, and initialize when the class is created. Let's look at the test code:

foo :> new Foo(x: int) {
  (this).test : int := 42 + x;
  (this) => int { this.test } 
}

entrypoint => void {
  print new Foo 5;     // 47
  :x:foo := new Foo 5;
  print x;             // still 47
  x.test = 5;
  print x;             // 5
} 

The only weird part is that (this) bit in there. Since fields can be phrases like everything else in the language, you need to specify where the owner of the field is fetched from. The initializers can pull from the constructor parameters, and from other fields - though there is a bug currently where the compiler may not order the dependencies of initializers properly.

Slightly weird, but nothing ground breaking. You have fields, you can read from them and you can write to them. Now to something a little more weird. I also added support for fields to interfaces. They don't have initializers, but otherwise use the same syntax. Let's look at an example:

Pirate :> pirate (x: string) :< has name {
  (this).name : string := x;
}

Doctor :> doctor (x: string) :< has name {
  (this).name : string := x;
}

has name :> interface {
  (this).name : string;
}

print name of (x: has name) => void {
  print x.name;
}

entrypoint => void {
  print name of doctor "?";
  print name of pirate "Bob"; 

}

Here the interface allows unified access to the fields of two otherwise distinct types. Under the hood, this works because while I've been using "field" to refer to fields, they are implemented similarly to properties. Each field has a getter and a setter function, so the interface simply requires they be implemented. In the example above, we're only using the getter, so could be changed to this and still work:

Pirate :> pirate (x: string) :< has name {
  (this).name : string := x;
}

Doctor :> doctor (x: string) :< has name {
  (this).name : string := x;
}

has name :> interface {
  (this).name => string;
}

print name of (x: has name) => void {
  print x.name;
}

entrypoint => void {
  print name of doctor "?";
  print name of pirate "Bob"; 
}

Next on the list is delegates. Once they're in, the language should be mostly done I think. It will have all of the big language things you need to do stuff, along with a pile of bugs and none of the library/interop things necessary for you to do interesting stuff. Hopefully, delegates will go quickly and I'll figure out some project that will push the pragmatism agenda.

Saturday, June 4, 2016

Basic Local Variables

After this morning's work, Tangent now supports the most basic of local variables. Yes, the language had user defined syntax, delegates, generics, type-class style interfaces, and... no variable support. But now as I move my focus more from building cool stuff to using cool stuff to build other cool stuff, I need to fill in the gaps. That means fixing the inevitable bugs I find, and basic usability stuff like local variables, which are still going to cause trouble since there wasn't any mutable state at all in the language before today.

I wanted local variables to be able to be phrases, just like parameters and types and everything else in the language. Yet I needed them to be clear both to the compiler and the user that they were declarations and not statements, and I didn't have a lot of tokens to work with, since I want most of them to be available for users to build their own phrases with. Let's look at the test code:

entrypoint => void {
  :x:int := 42;
  print x;
  x = x + 1;
  print x;
}

So local variable declarations start with a colon, and then work like (I hope) you'd expect them to - you declare a phrase and its type. Right now, locals require initialization, which uses pascal-style assignment to differentiate it from `=`, which helps the parser (and user) avoid ambiguity. This code works as of today.

What does not work is capturing locals in lambdas. That will just throw you an exception right off. Nested scopes probably don't work. Variable hiding probably works, but is untested. And due to implementation details, this code probably compiles, but will do unknown and terrible things:

entrypoint => void {
  print x;
  :x:int := 42;
}


Because while the code is smart enough to initialize the locals at the right time, it isn't smart enough to only make them available to the scope when declared, mostly because the scope is immutable and I didn't want to rabbit hole it. 

I have not yet decided if I hate this enough to fix it.

In other usability news, I added support to the command line compiler and test suite to support multiple input files. This should let me reuse code for different programs. I expect that I'm not going to want to reimplement conditionals and loops for every program I write...

Sunday, May 29, 2016

Standalone Interface Bindings

Hello again. In Interfaces - Part 4, I talked through the basic interface support which looks mostly like C# and other popular languages, with a little more functionality. You can declare an interface, and when you declare a class, you can say that it implements that interface. Now for some things that are not like what is available in popular languages.

One of the big things about Haskell type classes is that they don't work like that. They don't require you to include the interface reference when declaring your type. Let's look back at the example from Part 4:

convertible to string :> interface {
  (this) => string;
}

Cow :> cow :< convertible to string {
  (this) => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x;
}

entrypoint => void {
  cow speak;
}

The second declaration is our type declaration for Cow, along with our inline interface binding. With today's check-in you can do standalone interface binding as well:

convertible to string :> interface {
  (this) => string;
}

Cow :> cow { }

Cow :< convertible to string {
  (this) => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x;
}

entrypoint => void {
  cow speak;
}

Here, Cow is tied to convertible to string in its own standalone interface binding declaration, along with the implementation for that interface. This looks and acts a lot more like Haskell type classes. With this, you can take classes defined in some other library and adapt them to your interfaces (or vice versa).

Saturday, May 21, 2016

Interfaces - Part 4 - Compilation

(Part 1Part 2, Part 3)

The last part of implementing basic interfaces is the back end compiler. This part of the code takes the intermediate representation we made in Part 3 and turns it into CIL that actually runs.

There's two parts to this, creating the actual .NET type that represents our interface, and making the functions that properly dispatch to the right implementation. What is interesting about implementing interfaces like type-classes is that the two parts work like things the language already supports, but the two parts resemble different things.

The .NET type compilation used the sum-type compilation, similar to the intermediate code from part 3. All I needed was a few tweaks to allow the "what are all of the types this could be?" part of the code to get that from the sum-type directly or from the interface mapping. Oh, and allowing sum-types to have only one implementation in the runtime (since interfaces may have only one implementation, even if sum-types can't).

The function compilation on the other hand used the dispatch logic for generics. If you remember, interface functions infer their correct type based on the location of the (this) parameter. But since the type coming in is guaranteed to be a sum-type, the generic dispatch needs to check the type of the stored variant, not the sum-type itself.

Wrap that all together, and my basic test code now compiles and runs properly:

convertible to string :> interface {
  (this) to string => string;
}

Cow :> cow :< convertible to string {
  (this) to string => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x to string;
}

entrypoint => void {
  cow speak;
}

Hopefully this is straightforward - the code first declares an interface, which requires a single function. It then declares a class Cow with nullary constructor cow and implements the interface. And then it declares a function which works with that interface to print something to the screen so we can tell if it works. Finally, the entry point ties it all together by invoking that function with a concrete instance of Cow. Super dumb test code.

"So what?" you say, "Any plain interface implementation can do that."

That is entirely true. So let's covert this over to something that .NET style interfaces can't do. All the interface needs is some way to convert the object to a string. So why not just declare that?

convertible to string :> interface {
  (this) => string;
}

Cow :> cow :< convertible to string {
  (this) => string { "moo..." }
}

(x: convertible to string) speak => void {
  print x;
}

entrypoint => void {
  cow speak;

}

Yeah, that works too. Here our convertible to string interface requires its implementors to supply an implicit conversion to string. Since print requires a string as its input, the language is smart enough to infer that you want the conversion to happen there.

Next on the list is non-inline interface binding, fancier examples, and .NET interop. I'm not sure which will come first - we'll see where whimsy takes me.

Friday, May 6, 2016

Interfaces - Part 3 - Intermediate Steps

(Part 1, Part 2)

The second step to getting interfaces working is to actually do something with the stuff we parsed in Part 2. The grammar just spots interface declarations from the tokens it sees as input. The intermediate code is what takes those declarations and resolves them into real types. It's also where the function implementations are turned into a syntax tree, which is extra challenging since Tangent has no fixed order of operations.

The first order of business was to have a class to keep track of interface declarations. Since under the covers, all type declarations (enums, sum types, product types, type aliases, even built in types) are a phrase and a Tangent Type, I went with a new Tangent Type to represent the interface declarations themselves. The object has references to the functions required to satisfy it and... well, that's it. For the moment, that's all interfaces are.

If you remember from part 2, those functions are actually generic, and (this) needs to be replaced by a generic inference. That was the next bit of work to be done. Easy enough, a little bit of code between the grammar and intermediate objects to do a replacement.

Now I have interfaces, and the functions they need. To allow the code to know what types should implement what interfaces, I needed to add something to the type resolver to make those relations. Since interfaces are just a different sort of type declaration, I already had code to resolve type declarations, making that relatively straight forward. Now I have all of my interfaces in my list of type declarations, and all of the relations between them and concrete types.

But how to let the order of operation inference code know about the relation? It doesn't know about subtyping, and based on my previous iterations, I know that really complicates that code if it shows up. And these interfaces aren't really subtyping anyways - they work like type classes. Type classes in turn work like sum types: you have some symbol that means "this could be A or B or C or...". So interfaces got implemented similarly to sum types in this regard. The compiler adds an implicit conversion function that goes from the implementation to the interface. The order of operation inference code then treats that like any other function, allowing me to do no changes there. Awesome.

After that thing runs, I get a syntax tree out with functions that call the interfaces' functions if appropriate to go with the type declaration information. The compiler is ready to pass that to the back end which will turn it into executable CIL. More on that next time.

Sunday, April 10, 2016

Interfaces - Part 2 - Grammar

The first step to getting interfaces working is to tweak the parser to actually recognize the damned things. Well, that's not quite right. The first step is that I need to figure out what I want, and what parts make that up. Let's look at what I want from Part 1. I want them to do:

  • I want interfaces to be some contract that types can supply. "I don't care what type is passed in here, as long as it can be enumerated." 
  • I need interfaces to be able to be generic - I am not writing IntEnumerable, StringEnumerable, BoolEnumerable, and so on. 
  • I want interfaces to be multiply-implemented. Just because something is enumerable should not prevent it from being comparable.
  • I would like interfaces to be open. That is, if you wrote AwesomeParser and I have IParser, I should be able to adapt your implementation to my interface without changing your code.
  • It'd be nice if interfaces could support operators better. .NET interfaces suck at things like T+T => T.
Some of you out there will notice that these requirements are eerily close to what Haskell type classes do. That is no mistake. Haskell's type system is widely regarded as one of the most awesome made by man, and type classes in particular are one of the unique parts that set it apart. Type classes though work a lot more like .NET generic constraints than .NET interfaces. So how to design the language to support these things in a way that is easy/familiar, without breaking the formalisms necessary to keep everything from falling apart?

For the grammar, there are three main things I need to change to support interfaces:


  • I need to be able to declare an interface - "Here is a contract, and here is how you access it."
  • I need to be able to say that a type implements that interface - "Type X implements that contract"
  • I need to be able to use that interface in functions - "This function needs something that is enumerable as its first argument"

To declare an interface in Tangent, you use the standard type declaration syntax similarly to .NET interfaces. Say we wanted an interface to allow classes to provide a human readable view:

    human readable :> interface {
        human readable (this) => string;
    }

This defines an interface called "human readable" that has a single required function, "human readable" followed by a variable of the implementing type, which returns a string. In the grammar, this means defining new rules for the interface declaration and adding them to the type declaration rule.

To declare a class that uses the interface, it is included in the declaration, akin to .NET style inheritance:

    point :> (x: int), (y: int) :< human readable {
       human readable (this) => string {
           x + ", " + y;
       }
   
    } 

The only difference is that you use that funky duck face :< rather than a plain colon. This is done to better visually distinguish the interface implementations from the parameters, and when non-inline interface binding comes into play (we'll get to that in later parts). Again, this is declaring a type named "point", whose constructor is two ints with a comma in-between, and implements the human readable interface. It then goes in and implements the required function.

In the grammar, this means adding the inline interface binding part as optional. The challenge here is that the sum-type extension (x|y|z) is in the same area and also optional. That broke some tests and will likely be a complexity in the parser.

And to use the interface in a function, you can just specify it as the type:

    display (value: human readable) => void {
        print human readable value;
    }

Which works as you would expect from .NET land. You get a function that takes anything that satisfies the interface, and you can use any function guaranteed to be there. In the grammar... that isn't any change at all. But remember when I said that type classes are more like generic constraints than real interfaces? Yeah, here is where that comes into play. When that gets compiled, it will look more like this C# declaration:

    void display<T>(T value) where T: IHumanReadable { ... }

and Tangent does the magic to include the generic parameter that you don't really care about. But what if you do care about it? That's where the grammar change comes into play. If you care about the actual type being passed in, then you can declare parameters like this:

    display (value: T: human readable) => void { ... }

which will bind the actual concrete type to the phrase T so you can work with it. That isn't important here, but does come into play in other scenarios where the type-classiness of Tangent interfaces is important. Consider this interface to do a C-style compare (0 is equals, >0 is greater than, <0 is less than):

    comparable :> interface {
        compare (this) to (this) => int;
    }

Great, now we have a nice interface that says you can take two things and compare them. Let's use this in a trivial function:

    smaller (a: comparable) or (b: comparable) => comparable {
        if compare a to b < 0
          then a
          else b;
    }

That does not work.

What the interface says is that the compare function takes two arguments of the same type. The smallest function takes two comparables, but they might not be the same type. So that's where the three part parameter declaration comes into play:

    smaller (a: T: comparable) or (b: T) => T {       
        if compare a to b < 0

          then a
          else b;   
    }

In this function, we're specifically saying that the second argument (and the return value) must be the same type as the first argument. Since that meets the interface's constraints (and because Tangent doesn't have subtyping), that works. 

So those are the three grammar changes going into the language for basic interface support. Now I just need to figure out how to use them now that the parser can recognize them. Part 3 and more to come!